`
The objective of the sequel is to prepare you to be a professional GenAI engineer/developer. I will take you from the ground-up in the realm of LLMs and GenAI, starting from the very basics to building working and production level apps. The spirit of the sequel is to be “hands-on”. All examples are code-based, with final projects, built step-by-step either in python, Google Colab, and deployed in streamlit. By the end of the courses sequel, you will have built chatgpt clone, Midjourney clone, Chat with your data app, Youtube assistant app, Ask YouTube Video, Study Mate App, Recommender system, Image Description App with GPT-V, Image Generation app with DALL-E and StableDiffusion, Video commentator app using Whisper and others. We will cover prompt engineering approaches, and use them to build custom apps, that go beyond what ChatGPT knows. We will use OpenAI APIs, LangChain and many other tools. We will build together different applications using streamlit as our UI and cloud deployment framework. Streamlit is known of its ease of use, and easy python coding. With the power of GPT models, either by OpenAI, or opensource (like Llama or Mixtral on Huggingface), we will be able to build interesting applications, like chatting with your documents, chatting with youtube videos, building a state-of-the art recommender systems, video auto commentator and translator from one voice to another. We will mix modalities, like Image with GPT-V and DALL-E, Text with ChatGPT: GPT3.5 and GPT4, and voice with Whisper. We will be able to feed the AI models with our custom data, that is not available on the internet, and open models like ChatGPT do not know about. We will cover advanced and state-of-the art topics like RAG models, and LLM Agents. You will also work with different kinds of LLMs, being opensource or not. You will get exposed to GPT models by OpenAI, Llama models by Meta, Gemini and Bard by Google, Orca by Microsoft, Mixtral by Mistral AI and others. You will use pre-trained models, and also finetune them on your own data. We will learn about huggingface and use it for model finetuining using the Parameter Efficient Training or PEFT models. We will use Low-Rank Adaptation or LoRA for efficient training. You will learn how to deploy a model in the cloud, or privately host it for privacy concerns of your company data. You will learn how to use existing pre-trained models as teachers and use model distillation to train your custom version of the model.
Python
Probability
Linear Algebra
CV
NLP
Generative AI Foundations
Transformers
GenAI and LLM foundations
OpenAI API basics
ChatGPT Clone in Streamlit
Prompt Engineering
Agents
OpenAI Assistant API
LangChain
Chat with Your Data App
Retrieval Augmented Generation (RAG) model
Vector Databases
Chat with Youtube video
Build Recommender system with LLMs
Midjourney Clone App with Streamlit with DALL-E
Automatic Captions generation App with GPT-V
Automatic Voice-Over App with GPT-V and Whisper
Youtube translator App with Whisper and GPT-4
Huggingface Transformers Review
OpenSource LLMs: Llama 2, Mixtral and others
Privately hosted LLMs: GPT4all Clone with streamlit
Small Language Models (SLM)
LLM fine tuning with Huggningface using QLoRA
LLM finetuning with RLHF
GPT-3 Finetuning
LLM finetuning with Distillation
Builld good awareness of cloud systems, to be able to use and deploy his models
Be able to build apps using pre-trained models via open source LLMs or OpenAI APIs
Be able to train/finetune the LLM in different scenarios, and aware of training methods like TL, PEFT/LoRA or RLHF.
Be able to build multi-modal apps with different LLM models including Text, Speech and Image
Be able to work with RAG models to augment LLM knowledge
Build excellent knowledge of the underlying mechanics of transformers, LLMs
Hello and Welcome to a new Journey in the vast area of Generative AI Generative AI is changing our definition of the way of interacting with machines, mobiles and computers. It is changing our day-to-day life, where AI is an essential component. This new way of interaction has many faces: the good, the bad and the ugly. In this course we will sail in the vast sea of Generative AI, where we will cover both the theoretical foundations of Generative models, in different modalities mappins: Txt2Txt, Img2Txt, Txt2Img, Img2Txt and Txt2Voice and Voice2Text. We will discuss the SoTA models in each area at the time of this course. This includes the SoTA technology of Transformers, Language models, Large LM or LLM like Generative Pre-trained Transformers (GPT), paving the way to ChatGPT for Text Generation, and GANs, VAE, Diffusion models like DALL-E and StabeDiffusion for Image Generation, and VALL-E foe Voice Generation. In addition, we will cover the practical aspects, where we will build simple Language Models, Build a ChatGPT clone using OpenAI APIs where we will take a tour in OpenAI use cases with GPT3.5 and ChatGPT and DALL-E. In addition we will cover Huggingface transformers and StableDiffusion. Hope you enjoy our journey!
AI, ML and Deep Learning foundations
NLP: RNN, LSTM, Transformers basics
CV: ConvNets
What is Generative AI?
Applications of Generative AI
Uni-modal Generative AI: Text2Text: GPT, ChatGPT, Image2Image: GANs
Multi-modal Generative AI: Txt2Img: DALL-E, StableDiffusion, Img2Txt: Captioning
GenAI: The good, the bad and the ugly
Conclusion
Generative AI definition, areas of applications, mappings like txt2txt, img2txt, txt2img and txt2voice
How ChatGPT works, and the underlying tech behind like GPT, Large-Scale Language Models (LLM) and Transformers
How Latent Diffusion, StableDiffusion and DALL-E systems work
Generative Adversarial Networks (GANs) and Variational Auto Encoder (VAE)
The good, bad and ugly faces of GenAI, and how to adapt to the new tech
Build ChatGPT clone using OpenAI API and Streamlit
Build NLP applications using OpenAI API like Summarization, Text Classification and fine tuning GPT models
Build NLP applications using Huggingface transformers library like Language Models, Summarization, Translation, QA systems and others
Build Midjourney clone application using OpenAI DALL-E and StableDiffusion on Huggingface
This is part 1 of the Practical GenAI Sequel. The objective of the sequel is to prepare you to be a professional GenAI engineer/developer. I will take you from the ground-up in the realm of LLMs and GenAI, starting from the very basics to building working and production level apps. The spirit of the sequel is to be “hands-on”. All examples are code-based, with final projects, built step-by-step either in python, Google Colab, and deployed in streamlit. By the end of the courses sequel, you will have built chatgpt clone. We will cover prompt engineering approaches, and use them to build custom apps, that go beyond what ChatGPT knows. We will use OpenAI APIs, LangChain and many other tools. We will build together different applications using streamlit as our UI and cloud deployment framework. Streamlit is known of its ease of use, and easy python coding.
Python
NLP
Generative AI Foundations
Transormers
GenAI and LLM foundations
OpenAI API basics
ChatGPT Clone in Streamlit
Prompt Engineering
Agents
OpenAI Assistant API
LangChain
Understand the foundations of GenAI
Master OpenAI API Chat API
Build ChatGPT Clone with OpenAI API and Streamlit
Understand and apply Prompt Enigneering techniques
Practice with advanced OpenAI Assitants API and Function Calling
Get introduced to LangChain
This is Part 2 of the Practical GenAI Sequel. The objective of the sequel is to prepare you to be a professional GenAI engineer/developer. I will take you from the ground-up in the realm of LLMs and GenAI, starting from the very basics to building working and production level apps. The spirit of the sequel is to be “hands-on”. All examples are code-based, with final projects, built step-by-step either in python, Google Colab, and deployed in streamlit. By the end of the courses sequel, you will have built chatgpt clone, Midjourney clone, Chat with your data app, Youtube assistant app, Ask YouTube Video, Study Mate App, Recommender system, Image Description App with GPT-V, Image Generation app with DALL-E and StableDiffusion, Video commentator app using Whisper and others. We will cover prompt engineering approaches, and use them to build custom apps, that go beyond what ChatGPT knows. We will use OpenAI APIs, LangChain and many other tools. We will build together different applications using streamlit as our UI and cloud deployment framework. Streamlit is known of its ease of use, and easy python coding. With the power of GPT models, either by OpenAI, or opensource (like Llama or Mixtral on Huggingface), we will be able to build interesting applications, like chatting with your documents, chatting with youtube videos, building a state-of-the art recommender systems, video auto commentator and translator from one voice to another. We will mix modalities, like Image with GPT-V and DALL-E, Text with ChatGPT: GPT3.5 and GPT4, and voice with Whisper. We will be able to feed the AI models with our custom data, that is not available on the internet, and open models like ChatGPT do not know about. We will cover advanced and state-of-the art topics like RAG models, and LLM Agents.
Python
NLP
Transformers
Generative AI Foundations
Chat with Your Data App
Retrieval Augmented Generation (RAG) model
Vector Databases
Chat with Youtube video
Build Recommender system with LLMs
Midjourney Clone App with Streamlit with DALL-E
Automatic Captions generation App with GPT-V
Automatic Voice-Over App with GPT-V and Whisper
Youtube translator App with Whisper and GPT-4
Build RAG models to augment LLM knowledge
Build multi-modal apps with different LLM models including Text, Speech and Image
Understand the different RAG components and Design Patterns
Design RAG systems and identify the best design choices for each application
This is part 3 of the Practical GenAI Sequel. The objective of the sequel is to prepare you to be a professional GenAI engineer/developer. I will take you from the ground-up in the realm of LLMs and GenAI, starting from the very basics to building working and production level apps. The spirit of the sequel is to be “hands-on”. All examples are code-based, with final projects, built step-by-step either in python, Google Colab, and deployed in streamlit. By the end of the courses sequel, you will have built chatgpt clone, Midjourney clone, Chat with your data app, Youtube assistant app, Ask YouTube Video, Study Mate App, Recommender system, Image Description App with GPT-V, Image Generation app with DALL-E and StableDiffusion, Video commentator app using Whisper and others. In this part you will work with different kinds of LLMs, being opensource or not. You will get exposed to GPT models by OpenAI, Llama models by Meta, Gemini and Bard by Google, Orca by Microsoft, Mixtral by Mistral AI and others. You will use pre-trained models, and also finetune them on your own data. We will learn about huggingface and use it for model finetuining using the Parameter Efficient Training or PEFT models. We will use Low-Rank Adaptation or LoRA for efficient training. You will learn how to deploy a model in the cloud, or privately host it for privacy concerns of your company data. You will learn how to use existing pre-trained models as teachers and use model distillation to train your custom version of the model.
Python
NLP
Transformers
Generative AI Foundations
Huggingface Transformers Review
OpenSource LLMs: Llama 2, Mixtral and others
Privately hosted LLMs: GPT4all Clone with streamlit
Small Language Models (SLM)
LLM fine tuning with Huggningface using QLoRA
LLM finetuning with RLHF
GPT-3 Finetuning
LLM finetuning with Distillation
Build excellent knowledge of the underlying mechanics of transformers, LLMs
Go through the full training cycle of LLMs
Work with opensource LLMs
Work with privately hosted models
Fine tune pre-trained models with your own data
In this course, we will dive into the world of Natural Language Processing. We will demonstrate how Deep Learning has re-shaped this area of Artificial Intelligence using concepts like word vectors and embeddings, strucutured deep learning, collaborative filtering, recurrent neural networks, sequence-to-sequence models and transformer networks. In our journey, we will be mostly concerned with how to represent the language tokens, being at the word or character level, and and how to represent their aggregation, like sentences or documents, in a semantically sound way. We start the journey by going through the traditional pipeline of text pre-processing and the different text features like binary and TF-IDF features with the Bag-of-Words model. Then we will dive into the concepts of word vectors and embeddings as a general deep learning concept, with detailed discussion of famous word embedding techniques like word2vec, GloVe, Fasttext and ELMo. This will enable us to divert into recommender systems, using collaborative filtering and twin-tower model as an example of the generic usage of embeddings beyond word representations. In the second part of the course, we will be concerned with sentence and sequence representations. We will tackle the core NLP of Langauge Modeling, at statistical and neural levels, using recurrent models, like LSTM and GRU. In the following part, we tackle sequence-to-sequence models, with the flagship NLP task of Machine Translation, which paves the way to talk about many other tasks under the same design seq2seq pattern, like Question-Answering and Chatbots. We present the core idea idea of Attention mechanisms with recurrent seq2seq, before we generalize it as a generic deep learning concept. This generalization leads to the to the state-of-the art Transformer Network, which revolutionized the world of NLP, using full attention mechanisms. In the final part of the course, we present the ImageNet moment of NLP, where Transfer Learning comes into play together with pre-trained Transfomer architectures like BERT, GPT 1-2-3, RoBERTa, ALBERT, XLTransformer and XLNet.
Python
Probability
Linear Algebra
Machine Learning
Introduction to NLP
DL in NLP: Bag-of-Words models
Word Vectors and Word Embeddings
Pre-trained word embeddings: Word2Vec, GloVe, ELMo, Fasttext
Sequence models: Recurrent Nerual Networks, LSTM, GRU
Language Modeling: Statistical Language Models (SLM) and Neural Language Models (NLM)
Seq2seq models for Neural Machine Translation (NMT), Question-Answering (QA) and Chatbots
Transfomer Models for NMT
Transfer Learning in NLP: ULMFiT, BERT, GPT, XLNet
Build solid understanding of NLP traditional and Deep Learning techniques
Practice DL NLP in real problems like sentiment classification, machine translation, chatbots and question-answering
Build solid understanding of state-of-the art NLP models like BERT and GPT
Understand the evolution of DL NLP word and sentence embedding models using word2vec, GloVe, Fasttext, ELMo, BERT
Master the use of Transfer Learning in modern NLP models
Transformer Networks are the new trend in Deep Learning nowadays. Transformer models have taken the world of NLP by storm since 2017. Since then, they become the mainstream model in almost ALL NLP tasks. Transformers in CV are still lagging, however they started to take over since 2020. We will start by introducing attention and the transformer networks. Since transformers were first introduced in NLP, they are easier to be described with some NLP example first. From there, we will understand the pros and cons of this architecture. Also, we will discuss the importance of unsupervised or semi supervised pre-training for the transformer architectures, discussing Large Scale Language Models (LLM) in brief, like BERT and GPT. This will pave the way to introduce transformers in CV. Here we will try to extend the attention idea into the 2D spatial domain of the image. We will discuss how convolution can be generalized using self attention, within the encoder-decoder meta architecture. We will see how this generic architecture is almost the same in image as in text and NLP, which makes transformers a generic function approximator. We will discuss the channel and spatial attention, local vs. global attention among other topics. In the next three modules, we will discuss the specific networks that solve the big problems in CV: classification, object detection and segmentation. We will discuss Vision Transformer (ViT) from Google, Shifter Window Transformer (SWIN) from Microsoft, Detection Transformer (DETR) from Facebook research, Segmentation Transformer (SETR) and many others. Then we will discuss the application of Transformers in video processing, through Spatio-Temporal Transformers with application to Moving Object Detection, along with Multi-Task Learning setup. Finally, we will show how those pre-trained arcthiectures can be easily applied in practice using the famous Huggingface library using the Pipeline interface.
Practical Machine Learning course
Practical Computer Vision course (ConvNets)
Introduction to NLP course
Overview of Transformer Networks
Transformers in CV
Transformers for image classification
Transformers for object detection
Transformers for semantic segmentation
Huggingface transformers in CV
What are transformer networks?
State of the Art architectures for CV Apps like Image Classification, Semantic Segmentation, Object Detection and Video Processing
Practical application of SoTA architectures like ViT, DETR, SWIN in Huggingface vision transformers
Attention mechanisms as a general Deep Learning idea
Inductive Bias and the landscape of DL models in terms of modeling assumptions
Transformers application in NLP and Machine Translation
Transformers in Computer Vision
Different types of attention in Computer Vision
Hello and welcome to our course; Reinforcement Learning. Reinforcement Learning is a very exciting and important field of Machine Learning and AI. Some call it the crown jewel of AI. In this course, we will cover all the aspects related to Reinforcement Learning or RL. We will start by defining the RL problem, and compare it to the Supervised Learning problem, and discover the areas of applications where RL can excel. This includes the problem formulation, starting from the very basics to the advanced usage of Deep Learning, leading to the era of Deep Reinforcement Learning. In our journey, we will cover, as usual, both the theoretical and practical aspects, where we will learn how to implement the RL algorithms and apply them to the famous problems using libraries like OpenAI Gym, Keras-RL, TensorFlow Agents or TF-Agents and Stable Baselines. The course is divided into 6 main sections: 1- We start with an introduction to the RL problem definition, mainly comparing it to the Supervised learning problem, and discovering the application domains and the main constituents of an RL problem. We describe here the famous OpenAI Gym environments, which will be our playground when it comes to practical implementation of the algorithms that we learn about. 2- In the second part we discuss the main formulation of an RL problem as a Markov Decision Process or MDP, with simple solution to the most basic problems using Dynamic Programming. 3- After being armed with an understanding of MDP, we move on to explore the solution space of the MDP problem, and what the different solutions beyond DP, which includes model-based and model-free solutions. We will focus in this part on model-free solutions, and defer model-based solutions to the last part. In this part, we describe the Monte-Carlo and Temporal-Difference sampling based methods, including the famous and important Q-learning algorithm, and SARSA. We will describe the practical usage and implementation of Q-learning and SARSA on control tabular maze problems from OpenAI Gym environments. 4- To move beyond simple tabular problems, we will need to learn about function approximation in RL, which leads to the mainstream RL methods today using Deep Learning, or Deep Reinforcement Learning (DRL). We will describe here the breakthrough algorithm of DeepMind that solved the Atari games and AlphaGO, which is Deep Q-Networks or DQN. We also discuss how we can solve Atari games problems using DQN in practice using Keras-RL and TF-Agents. 5- In the fifth part, we move to Advanced DRL algorithms, mainly under a family called Policy based methods. We discuss here Policy Gradients, DDPG, Actor-Critic, A2C, A3C, TRPO and PPO methods. We also discuss the important Stable Baseline library to implement all those algorithms on different environments in OpenAI Gym, like Atari and others. 6- Finally, we explore the model-based family of RL methods, and importantly, differentiating model-based RL from planning, and exploring the whole spectrum of RL methods. Hopefully, you enjoy this course, and find it useful.
Machine Learning basics
Deep Learning basics
Probability
Programming and Problem solving basics
Python programming
Introduction to Reinforcement Learning
Markov Decision Process (MDP)
MDP Solution Space
Deep Reinforcement Learning (DRL)
Advanced DRL
Model based RL
Define what is Reinforcement Learning?
Apply all what is learned using state-of-the art libraries like OpenAI Gym, StabeBaselines, Keras-RL and TensorFlow Agents
Define what are the applications domains and success stories of RL?
Define what are the difference between Reinforcement and Supervised Learning?
Define the main components of an RL problem setup?
Define what are the main ingredients of an RL agent and their taxonomy?
Define what is Markov Reward Process (MRP) and Markov Decision Process (MDP)?
Define the solution space of RL using MDP framework
Solve the RL problems using planning with Dynamic Programming algorithms, like Policy Evaluation, Policy Iteration and Value Iteration
Solve RL problems using model free algorithms like Monte-Carlo, TD learning, Q-learning and SARSA
Differentiate On-policy and Off-policy algorithms
Master Deep Reinforcement Learning algorithms like Deep Q-Networks (DQN), and apply them to Large Scale RL
Master Policy Gradients algorithms and Actor-Critic (AC, A2C, A3C)
Master advanced DRL algorithms like DDPG, TRPO and PPO
Define what is model-based RL, and differentiate it from planning, and what are their main algorithms and applications?
This course is a comprehensive introduction to AI and Machine Learning, targeting Data Scientists and Machine Learning engineers. It starts with setting the boundaries of Artificial Intelligence, Machine Learning, Deep Learning, and their relation to Data Science. What is expected as a member an AI team, and how to speak the same language. What is possible and what is not, and what defines a good AI project. The basics of supervised learning are covered, including the main ingredients of the Machine Learning problem, and the different solution setups. We cover both Linear models (Linear Regression, Logistic Regression, Support Vector Machines (SVM)) and Non-linear models (Polynomial Regression, Kernel SVM, Deep Neural Networks (DNN)). A universal approach is given to tackle any ML problem in a systematic way, covering data preparation, Exploratory Data Analysis (EDA), Model selection, Model evaluation, Model design, Fine tuning and Regularization. An end-to-end is given to illustrate this process with code in Google Colab Notebooks. We also cover the Machine Learning Meta algorithms and Ensemble methods: Voting, BAGGing, Boosting Decision Trees and Random Forests. Finally, we introduce unsupervised learning, covering dimensionality reduction algorithms, like Manifold Learning like Locally Linear Embedding (LLE) and Projection methods like Principal Component Analysis (PCA) and Clustering, like K-Means. Throughout the course, Python language is used. Popular Machine Learning libraries are used, like scikit-learn, in addition to pandas and keras.
Python
Probability
Linear Algebra
Traditional programming vs. Statistical learning
AI vs. Machine learning vs. Deep learning
Different ML types: Supervised learning vs. Unsupervised learning vs. Self-supervised learning vs. Reinforcement learning
Linear models: Linear Regression, Logistic Regression, SVM
Non-Linear Classifiers: Polynomial Regression, Kernel SVM, Deep Neural Networks
Universal ML process: hyperparameters tuning, Regularization, Overfitting and underfitting
Evaluation protocols: Model Selection, Sampling, CrossValidation, Bootstrapping
Meta-Algorithms: Model Ensembles, Voting, BAGGing, Boosting, DecisionTrees, RandomForests
Unsupervised learning: clustering and dimensionality reduction
Build solid knowledge necessary for data scientists about AI, Machine Learning and Deep Learning
Understand the basics and underlying dynamics of supervised learining models: LinearRegression, LogisiticRegression, SVM, DNN, DecisionTrees and RandomForests.
Get introduced to unsupervised learning approaches for dimensionality reduction and clustering.
Build practical Machine Learning models and pipelines using python, scikit-learn, pandas, keras and tensorflow
Solve practical problems like image classification, text classification, price prediction.
This course is for AI and ML Engineers, Practitioners and Researchers who already built an awesome Deep Learning model, and they have a great idea for an app. But they discovered that it is not straight forward to deploy their model in a production App. Another example, say you want to build a robot that uses the Camera sensor to perceive the surrounding environment, build a map of it and eventually navigate it. Here also you discover that you still have a long Journey to go after your model is already performing great on your training machine. Finally, Software Engineers, who have their primary job is to build a working system or an app, often find themselves in a situation where they need to integrate an AI model in their software, which happens a lot today with the expansion of AI applications. They might get this model from a research team in their firm or company, or even use an API or pre-trained model on the internet to do their task. We cover all those deployment scenarios, covering the journey from working trained model to an optimized deployed model. Our focus will be on CV deployment mainly. We cover Mobile deployment like on Android devices, Edge deployment on Embedded boards like Rasperry Pi, and Browser deployment where your AI model is running in the browser like Chrome, Edge, Safari or any other browser. Also, we cover server deployment scenarios, which are often found in highly scalable apps and systems with millions of users, and also in industrial scenarios like AI visual inspection in factories. While the course is mostly practical, focusing on “How” things are done and the best way of doing it, we cover also some theoretical parts about the “what” and “why” those techniques are used. This requires sometimes to understand new types of convolution operations that are optimized for speed and memory, or understanding some model compression techniques that makes them suitable for Embedded and Edge deployments, which was not in scope during building the initial model that was already performing great.
Machine Learning Basics, including model building process
Deep learning basics and neural networks training process
Computer vision basics, including ConvNets, transfer learning and pre-trained models architectures
Deep Learning models Deployment Scenarios - Client and Server sides
DL Model compression: Distillation, Pruning and Quantization
Optimized DL models architectures and Sepcial Convolution types
Deployment of DL models on Edge Devices: Mobile – TFLite Android and TFLite Rasperry Pi
Deployment of DL models in the Browser: TFJS
Cloud deployment and Cloud-based APIs: TFHub, TF-API OBB, Torchhub
Model serving in the cloud: Flask, Django and TFServing
Define and understand the different deployment scenarios, being it Edge or Server deployment
Understand the constraints on each deployment scenario
Be able to choose the scenario suitable to your practical case and put the proper system architecture for it
Deploy ML models into Edge and Mobile devices using TLite tools
Deploy ML models into Browsers using TFJS
Define the different model serving qualities and understand their settings for production-level systems
Define the landscape of model serving options and be able to choose the proper one based on the needed qualities
Build a server model that uses Cloud APIs like TFHub, Torchhub or TF-API and customize it on custom data, or even build it from scratch
Serve a model using Flask, Django or TFServing, using custom infrastructure or in the Cloud like AWS EC2 and using Docker containers
Convert different models built in any framework to a common runtime format using ONNX
Understand the full ML development cycle and phases
Be able to define MLOps, model drift and monitoring
Hello and Welcome to a new Journey in the vast area of Generative AI Generative AI is changing our definition of the way of interacting with machines, mobiles and computers. It is changing our day-to-day life, where AI is an essential component. This new way of interaction has many faces: the good, the bad and the ugly. In this course we will sail in the vast sea of Generative AI, where we will cover both the theoretical foundations of Generative models, in different modalities mappins: Txt2Txt, Img2Txt, Txt2Img, Img2Txt and Txt2Voice and Voice2Text. We will discuss the SoTA models in each area at the time of this course. This includes the SoTA technology of Transformers, Language models, Large LM or LLM like Generative Pre-trained Transformers (GPT), paving the way to ChatGPT for Text Generation, and GANs, VAE, Diffusion models like DALL-E and StabeDiffusion for Image Generation, and VALL-E foe Voice Generation. In addition, we will cover the practical aspects, where we will build simple Language Models, Build a ChatGPT clone using OpenAI APIs where we will take a tour in OpenAI use cases with GPT3.5 and ChatGPT and DALL-E. In addition we will cover Huggingface transformers and StableDiffusion. Hope you enjoy our journey!
AI, ML and Deep Learning foundations
NLP: RNN, LSTM, Transformers basics
CV: ConvNets
What is Generative AI?
Applications of Generative AI
Uni-modal Generative AI: Text2Text: GPT, ChatGPT, Image2Image: GANs
Multi-modal Generative AI: Txt2Img: DALL-E, StableDiffusion, Img2Txt: Captioning
GenAI: The good, the bad and the ugly
Conclusion
Generative AI definition, areas of applications, mappings like txt2txt, img2txt, txt2img and txt2voice
How ChatGPT works, and the underlying tech behind like GPT, Large-Scale Language Models (LLM) and Transformers
How Latent Diffusion, StableDiffusion and DALL-E systems work
Generative Adversarial Networks (GANs) and Variational Auto Encoder (VAE)
The good, bad and ugly faces of GenAI, and how to adapt to the new tech
Build ChatGPT clone using OpenAI API and Streamlit
Build NLP applications using OpenAI API like Summarization, Text Classification and fine tuning GPT models
Build NLP applications using Huggingface transformers library like Language Models, Summarization, Translation, QA systems and others
Build Midjourney clone application using OpenAI DALL-E and StableDiffusion on Huggingface
Welcome to our course, Deep Learning for Computer Vision: From Pixels to Semantics. In this course, we will cover three main parts. The first part covers the essentials of traditional computer vision pipeline, and how to deal with images in OpenCV and Pillow libraries, including the image pre-processing pipeline like: thresholding, denoising, blurring, filtering, edge detection, contours...etc. We will build simple apps like Car License Plate Detection (LPD) and activity recogntion. This will lead us to the revolution that deep learning brought to the game of computer vision, turning traditional filters into learnable parameters using Convolution Neural Networks. We will cover all the basics of ConvNets, including the details of the Vanilla architecture for image classification, hyper parameters like kernels, strides, maxpool and feature maps sizes calculations. Beyond the Vanilla architecture, we also cover the state-of-the art ConvNet meta-architectures and design patters, like skip-connnections, Inception, DenseNet...etc. In the second part, we will learn how to use ConvNets to solve practical problems in different situations, with small amount of data, how to use transfer learning and the different scenarios for that, and finally how to debug and visualize the leant kernels in ConvNets. In the last part, we will learn about different CV apps using ConvNets. We will learn about the Encoder-Decoder design pattern. We start by the task of semantic segmentation, where we will build a U-Net architecture from scratch for the Cambridge Video (CAMVID) dataset. Then we will learn about Object Detection, covering both 2-stage and one-shot architectures like SSD and YOLO. Next, we will learn how to deal with the video data using the Spatio-Temporal ConvNet architectures. Finally we will introduce 3D Deep Learning to extend ConvNets usage to deal with 3D data, like LiDAR data.
Python
Probability
Linear Algebra
Machine Learning
From traditional Computer Vision to Deep Learning
The basics of ConvNets in Computer Vision
The practical aspects of DL in CV, like data augmentation and transfer learning
ConvNets Architectures and Pre-trained ConvNets
Debugging ConvNets by visualization of ConvNets filters and features
Image Classification
Semantic Segmentation
Object Detection
Video Analysis: Spatio-Temporal Models
3D Deep Learning in Computer Vision
Build solid understanding of Computer vision foundations, using traditional and Deep Learning methods
Deep understanding of Conolutional Neural Networks and their usage in computer vision
Build practical projects with ConvNets, like image classification, multi-object detection and semantic segmentations
Understand and practice the concepts of Transfer Learning in practical problems
Learn how to visualize and debug ConvNets and understand their underlying dynamics in a practical way
Learn how to use and apply data augmentation and how to deal with large and small datasets using ConvNets
Understand the basics of dealing with time and video data using Spatio-temporal models
Understand the basics of 3D Deep Learning and how to deal with 3D data sets
Hello and welcome to our course; Reinforcement Learning. Reinforcement Learning is a very exciting and important field of Machine Learning and AI. Some call it the crown jewel of AI. In this course, we will cover all the aspects related to Reinforcement Learning or RL. We will start by defining the RL problem, and compare it to the Supervised Learning problem, and discover the areas of applications where RL can excel. This includes the problem formulation, starting from the very basics to the advanced usage of Deep Learning, leading to the era of Deep Reinforcement Learning. In our journey, we will cover, as usual, both the theoretical and practical aspects, where we will learn how to implement the RL algorithms and apply them to the famous problems using libraries like OpenAI Gym, Keras-RL, TensorFlow Agents or TF-Agents and Stable Baselines. The course is divided into 6 main sections: 1- We start with an introduction to the RL problem definition, mainly comparing it to the Supervised learning problem, and discovering the application domains and the main constituents of an RL problem. We describe here the famous OpenAI Gym environments, which will be our playground when it comes to practical implementation of the algorithms that we learn about. 2- In the second part we discuss the main formulation of an RL problem as a Markov Decision Process or MDP, with simple solution to the most basic problems using Dynamic Programming. 3- After being armed with an understanding of MDP, we move on to explore the solution space of the MDP problem, and what the different solutions beyond DP, which includes model-based and model-free solutions. We will focus in this part on model-free solutions, and defer model-based solutions to the last part. In this part, we describe the Monte-Carlo and Temporal-Difference sampling based methods, including the famous and important Q-learning algorithm, and SARSA. We will describe the practical usage and implementation of Q-learning and SARSA on control tabular maze problems from OpenAI Gym environments. 4- To move beyond simple tabular problems, we will need to learn about function approximation in RL, which leads to the mainstream RL methods today using Deep Learning, or Deep Reinforcement Learning (DRL). We will describe here the breakthrough algorithm of DeepMind that solved the Atari games and AlphaGO, which is Deep Q-Networks or DQN. We also discuss how we can solve Atari games problems using DQN in practice using Keras-RL and TF-Agents. 5- In the fifth part, we move to Advanced DRL algorithms, mainly under a family called Policy based methods. We discuss here Policy Gradients, DDPG, Actor-Critic, A2C, A3C, TRPO and PPO methods. We also discuss the important Stable Baseline library to implement all those algorithms on different environments in OpenAI Gym, like Atari and others. 6- Finally, we explore the model-based family of RL methods, and importantly, differentiating model-based RL from planning, and exploring the whole spectrum of RL methods. Hopefully, you enjoy this course, and find it useful.
Machine Learning basics
Deep Learning basics
Probability
Programming and Problem solving basics
Python programming
Introduction to Reinforcement Learning
Markov Decision Process (MDP)
MDP Solution Space
Deep Reinforcement Learning (DRL)
Advanced DRL
Model based RL
Define what is Reinforcement Learning?
Apply all what is learned using state-of-the art libraries like OpenAI Gym, StabeBaselines, Keras-RL and TensorFlow Agents
Define what are the applications domains and success stories of RL?
Define what are the difference between Reinforcement and Supervised Learning?
Define the main components of an RL problem setup?
Define what are the main ingredients of an RL agent and their taxonomy?
Define what is Markov Reward Process (MRP) and Markov Decision Process (MDP)?
Define the solution space of RL using MDP framework
Solve the RL problems using planning with Dynamic Programming algorithms, like Policy Evaluation, Policy Iteration and Value Iteration
Solve RL problems using model free algorithms like Monte-Carlo, TD learning, Q-learning and SARSA
Differentiate On-policy and Off-policy algorithms
Master Deep Reinforcement Learning algorithms like Deep Q-Networks (DQN), and apply them to Large Scale RL
Master Policy Gradients algorithms and Actor-Critic (AC, A2C, A3C)
Master advanced DRL algorithms like DDPG, TRPO and PPO
Define what is model-based RL, and differentiate it from planning, and what are their main algorithms and applications?
Transformer Networks are the new trend in Deep Learning nowadays. Transformer models have taken the world of NLP by storm since 2017. Since then, they become the mainstream model in almost ALL NLP tasks. Transformers in CV are still lagging, however they started to take over since 2020. We will start by introducing attention and the transformer networks. Since transformers were first introduced in NLP, they are easier to be described with some NLP example first. From there, we will understand the pros and cons of this architecture. Also, we will discuss the importance of unsupervised or semi supervised pre-training for the transformer architectures, discussing Large Scale Language Models (LLM) in brief, like BERT and GPT. This will pave the way to introduce transformers in CV. Here we will try to extend the attention idea into the 2D spatial domain of the image. We will discuss how convolution can be generalized using self attention, within the encoder-decoder meta architecture. We will see how this generic architecture is almost the same in image as in text and NLP, which makes transformers a generic function approximator. We will discuss the channel and spatial attention, local vs. global attention among other topics. In the next three modules, we will discuss the specific networks that solve the big problems in CV: classification, object detection and segmentation. We will discuss Vision Transformer (ViT) from Google, Shifter Window Transformer (SWIN) from Microsoft, Detection Transformer (DETR) from Facebook research, Segmentation Transformer (SETR) and many others. Then we will discuss the application of Transformers in video processing, through Spatio-Temporal Transformers with application to Moving Object Detection, along with Multi-Task Learning setup. Finally, we will show how those pre-trained arcthiectures can be easily applied in practice using the famous Huggingface library using the Pipeline interface.
Practical Machine Learning course
Practical Computer Vision course (ConvNets)
Introduction to NLP course
Overview of Transformer Networks
Transformers in CV
Transformers for image classification
Transformers for object detection
Transformers for semantic segmentation
Huggingface transformers in CV
Conclusion
What are transformer networks?
State of the Art architectures for CV Apps like Image Classification, Semantic Segmentation, Object Detection and Video Processing
Practical application of SoTA architectures like ViT, DETR, SWIN in Huggingface vision transformers
Attention mechanisms as a general Deep Learning idea
Inductive Bias and the landscape of DL models in terms of modeling assumptions
Transformers application in NLP and Machine Translation
Transformers in Computer Vision
Different types of attention in Computer Vision