ExplainingAI
India
Joined Sep 8, 2023
Hello, I am Tushar, and this channel is the product of two things I am very passionate about.
Learning something new every day, specifically related to my field, which is ML, Deep Learning & Computer Vision (hoping to expand this list with this channel :) )
Teaching and explaining things as simply as I can to people who are interested in knowing what I already know, making their learning process a little bit easier and a lot more fun.
If you are interested in continuously learning and improving, and your interests intersect with mine, then do subscribe (obviously only if you like the content; otherwise just ignore me for now and come back when my content has become worthy of your subscription, which it will someday :) )
Thank you so much for visiting my channel.
Tushar,
Ex-Amazon | Last seen building AI for a cooking robot @ Nymble (www.eatwithnymble.com/)
www.linkedin.com/in/tushar-kumar-40299b19/
Faster R-CNN Explanation | Region Proposal Network
In this tutorial we cover Faster R-CNN for object detection. It's an attempt to provide an in-depth Faster R-CNN explanation: the video covers what Faster R-CNN is and how its training works, and dives deep into its architecture. We start with the difference between Fast R-CNN and Faster R-CNN, then go step by step through anchor boxes and the region proposal network (RPN), which together with the Fast R-CNN detector makes up the two main components of the Faster R-CNN architecture, and finally look at the object detection performance we get using Faster R-CNN.
⏱️ Timestamps
00:00 Intro
00:28 Need for Faster R-CNN
02:41 Region Proposal Network (RPN) in Faster R-CNN
06:28 Anchor Boxes in Faster R-CNN
09:57 Anchors to Proposals in Region Proposal Network
12:58 RPN + Fast R-CNN
14:17 Training Region Proposal Network
19:53 Why have both RPN and Fast R-CNN
22:18 Joint Training of Faster R-CNN
25:23 Alternate Training of Faster R-CNN
27:58 Results for Object Detection Using Faster R-CNN
38:15 Difference between R-CNN, Fast R-CNN and Faster R-CNN
40:34 Outro
📖 Resources:
Faster R-CNN Paper - tinyurl.com/exai-faster-rcnn-paper
🔔 Subscribe:
tinyurl.com/exai-channel-link
📌 Keywords:
#ObjectDetection
Background Track - Fruits of Life by Jimena Contreras
Email - explainingai.official@gmail.com
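The anchor-box and anchors-to-proposals steps listed in the timestamps can be sketched in plain Python. This is a minimal illustration, not the video's code: it assumes the Faster R-CNN paper's 3 scales (128, 256, 512) and 3 aspect ratios (0.5, 1, 2) on a stride-16 feature map, places each anchor centre at `x * stride` (an assumed convention; implementations differ, e.g. `(x + 0.5) * stride`), and uses the paper's (tx, ty, tw, th) box parametrisation. The names `generate_anchors` and `apply_deltas` are hypothetical.

```python
import math

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image coordinates.

    One anchor per (scale, ratio) pair is centred on every feature-map cell,
    giving feat_h * feat_w * len(scales) * len(ratios) anchors in total.
    """
    base = []
    for ratio in ratios:          # ratio = height / width
        for scale in scales:      # scale = sqrt(area) of the anchor
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            base.append((-w / 2, -h / 2, w / 2, h / 2))
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Assumed centre convention: cell index times stride.
            cx, cy = x * stride, y * stride
            for bx1, by1, bx2, by2 in base:
                anchors.append((cx + bx1, cy + by1, cx + bx2, cy + by2))
    return anchors

def apply_deltas(anchor, deltas):
    """Turn an anchor plus predicted (tx, ty, tw, th) into a proposal box."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    tx, ty, tw, th = deltas
    pcx, pcy = cx + tx * w, cy + ty * h            # shift the centre
    pw, ph = w * math.exp(tw), h * math.exp(th)    # scale width/height
    return (pcx - pw / 2, pcy - ph / 2, pcx + pw / 2, pcy + ph / 2)
```

The RPN predicts an objectness score and four deltas for each of the 9 anchors at every location; decoding the deltas and then filtering the boxes (for example with NMS) yields the proposals passed on to the Fast R-CNN head.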
Views: 345
Videos
Fast R-CNN Explained | ROI Pooling
628 views · months ago
In this tutorial, I dive deep into Fast R-CNN, explaining its architecture, the role of ROI pooling and how it differs from R-CNN. Through this video you will learn how Fast R-CNN works, understand Region of Interest (ROI) pooling, and discover the advantages it brings to object detection tasks over previous approaches. I specifically go through how Fast R-CNN compares to R-CNN in terms of p...
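ROI max pooling divides each region into a fixed grid of roughly equal sub-windows and takes the maximum in each, so every region yields the same output size. A minimal single-channel sketch in plain Python; the name `roi_max_pool`, the inclusive integer RoI coordinates and the floor/ceil bin edges are assumptions for illustration (real implementations also handle channels, batches and the image-to-feature-map scaling):

```python
import math

def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool one RoI of a 2-D feature map into a fixed out_h x out_w grid.

    feature: list of rows (H x W) of numbers.
    roi: (x1, y1, x2, y2) in feature-map cells, inclusive; assumed to be
         already scaled down from image coordinates.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    bin_h, bin_w = roi_h / out_h, roi_w / out_w
    pooled = []
    for i in range(out_h):
        # Floor for the bin start and ceil for its end, so no cell is skipped
        # and every bin contains at least one cell.
        ys = y1 + math.floor(i * bin_h)
        ye = y1 + math.ceil((i + 1) * bin_h)
        row = []
        for j in range(out_w):
            xs = x1 + math.floor(j * bin_w)
            xe = x1 + math.ceil((j + 1) * bin_w)
            row.append(max(feature[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

For a 4x4 map holding 1..16 row by row, pooling the whole map gives `[[6, 8], [14, 16]]`; a 3x2 region still comes out as 2x2, which is what lets Fast R-CNN feed regions of any size into the same fully connected layers.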
Mean Average Precision (mAP) | Explanation and Implementation for Object Detection
544 views · 2 months ago
In this video we go over Mean Average Precision (mAP), Non-Maximum Suppression (NMS), and Intersection over Union (IOU) in object detection. We dive deep into understanding these crucial concepts for improving the accuracy of object detection algorithms. We first discuss Intersection over Union (IOU) as a ...
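IoU and greedy NMS, two of the concepts listed above, fit in a few lines. A sketch assuming continuous (x1, y1, x2, y2) boxes without the +1 pixel convention some codebases use; `iou` and `nms` are illustrative names, and the 0.5 default threshold is an assumption rather than necessarily the value used in the video:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

Two nearly identical boxes scored 0.9 and 0.8 collapse to the 0.9 one, while a distant box survives, so `nms` returns their indices in descending-score order.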
R-CNN Explained
1.5K views · 2 months ago
This is an R-CNN tutorial video in which I dive deep into what R-CNN is and R-CNN basics. This video is part of an object detection series, and the first one in that series is R-CNN for object detection. By the end of this video you will be able to understand the R-CNN algorithm in detail and see clearly how R-CNN works. We start with what selective search is and how R-CNN uses selective searc...
Stable Diffusion from Scratch in PyTorch | Conditional Latent Diffusion Models
4.4K views · 3 months ago
In this video, we cover all the different types of conditioning in latent diffusion and finish the Stable Diffusion implementation in PyTorch; after this you will be able to build and train Stable Diffusion from scratch. This is Part II of the tutorial, where I get into conditioning in latent diffusion models. We dive deep into class conditioning in latent diffusion models, implementing class...
Stable Diffusion from Scratch in PyTorch | Unconditional Latent Diffusion Models
8K views · 4 months ago
In this video, we cover everything from the building blocks of Stable Diffusion to its implementation in PyTorch, and see how to build and train Stable Diffusion from scratch. This is Part I of the tutorial, where I explain latent diffusion models, specifically unconditional latent diffusion models. We dive deep into what latent diffusion is, how latent diffusion works, what are the component...
DCGAN Tutorial with PyTorch Implementation
862 views · 5 months ago
In this video I cover DCGAN with the goal of understanding and implementing DCGAN from scratch in PyTorch. We dive deep into its architecture, which will guide us on how to build a DCGAN model. We then implement DCGAN from scratch and see what the entire code for DCGAN looks like and the different layers involved. We train DCGAN for image generation on the MNIST dataset. W...
Generative Adversarial Networks | Tutorial with Math Explanation and PyTorch Implementation
2.1K views · 5 months ago
In this video, we delve into the core concepts of Generative Adversarial Networks (GANs), going over the different components and how to build a generative adversarial network, understanding how to train generative adversarial networks, exploring the underlying math and providing a step-by-step PyTorch implementation. This video is an attempt to give you a thorough understanding of how do...
Denoising Diffusion Probabilistic Models Code | DDPM Pytorch Implementation
11K views · 6 months ago
In this video I get into the implementation of Denoising Diffusion Probabilistic Models (DDPM) and walk through the complete DDPM code in PyTorch. I give a quick overview of the math behind diffusion models before getting into the DDPM implementation. I cover the DDPM PyTorch implementation in 4 parts: 1. Noise scheduler in DDPM - codin...
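The first part, the noise scheduler, can be sketched without any framework. This assumes the linear schedule from the DDPM paper (beta rising from 1e-4 to 0.02 over T = 1000 steps) and 0-indexed timesteps; `linear_schedule` and `add_noise` are hypothetical names, and the video's implementation works on tensors rather than Python lists:

```python
import math

def linear_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus the cumulative products alpha_bar_t."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_timesteps - 1)
             for t in range(num_timesteps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b          # alpha_t = 1 - beta_t
        alpha_bars.append(prod)  # alpha_bar_t = product of alpha_s for s <= t
    return betas, alpha_bars

def add_noise(x0, noise, t, alpha_bars):
    """Closed-form forward process: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]
```

Because alpha_bar_t shrinks towards 0 as t grows, `add_noise` at the last timestep returns almost pure noise, which is why sampling can start from a standard Gaussian.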
Denoising Diffusion Probabilistic Models | DDPM Explained
21K views · 6 months ago
In this video, I get into diffusion models, and specifically we look into denoising diffusion probabilistic models (DDPM). I try to provide a comprehensive guide to understanding the entire math behind them and training diffusion models (denoising diffusion probabilistic models). 🔍 Video Highlights: 1. Overview of Diffusion Models: We first look at the core idea in diffusion models 2. DDPM Demystif...
Image Classification Using Vision Transformer | An Image is Worth 16x16 Words
768 views · 7 months ago
This video covers the implementation of the Vision Transformer (ViT) in PyTorch. This is the third part of the Vision Transformer (ViT) series, in which I build the entire vision transformer from scratch. In this video I also cover visualizations of the attention map using attention rollout, as well as positional embedding visualization, to get more intuition on what the model is learning. I have covered Vis...
ATTENTION | An Image is Worth 16x16 Words | Vision Transformers (ViT) Explanation and Implementation
1.6K views · 7 months ago
This video covers everything about self-attention in the Vision Transformer (ViT) and its implementation from scratch. I go over all the details and explain everything happening inside attention in the vision transformer through visualizations, and also go over what an implementation of self-attention from scratch looks like in PyTorch. I cover Vision Transformer (ViT) in three parts: ...
PATCH EMBEDDING | Vision Transformers explained
3.6K views · 7 months ago
I will cover the Vision Transformer in three parts. The first part, which is this video, focuses on patch embedding in the vision transformer. I will go over all the details and explain everything happening inside patch embedding in ViT. I will also go over what an implementation of patch embedding for the vision transformer in PyTorch looks like. The second part, which goes through attenti...
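Patch embedding amounts to a reshape followed by a linear projection. A minimal PyTorch sketch, assuming square single-channel 28x28 inputs, 7x7 patches, a 64-dim embedding, a learnable CLS token and learnable positional embeddings; these sizes and the class name `PatchEmbedding` are illustrative, not taken from the video (many implementations use an equivalent strided `nn.Conv2d` instead):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping p x p patches and project each to emb_dim.

    Hypothetical defaults: 28x28 single-channel images, 7x7 patches, emb_dim = 64.
    """
    def __init__(self, im_channels=1, im_size=28, patch_size=7, emb_dim=64):
        super().__init__()
        assert im_size % patch_size == 0, "image size must be divisible by patch size"
        self.patch_size = patch_size
        self.num_patches = (im_size // patch_size) ** 2
        self.proj = nn.Linear(im_channels * patch_size * patch_size, emb_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, emb_dim))
        self.pos_emb = nn.Parameter(torch.zeros(1, self.num_patches + 1, emb_dim))

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.patch_size
        # (B, C, H, W) -> (B, C, H/p, p, W/p, p) -> (B, H/p, W/p, C, p, p)
        x = x.reshape(b, c, h // p, p, w // p, p).permute(0, 2, 4, 1, 3, 5)
        x = x.reshape(b, (h // p) * (w // p), c * p * p)   # (B, N, C*p*p)
        x = self.proj(x)                                    # (B, N, D)
        cls = self.cls_token.expand(b, -1, -1)              # (B, 1, D)
        return torch.cat([cls, x], dim=1) + self.pos_emb    # (B, N + 1, D)
```

With these defaults a batch of two 28x28 images becomes 2 x 17 x 64: 16 patch tokens plus the CLS token.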
I implement DALLE 1 from SCRATCH on MNIST
1.7K views · 8 months ago
In this video I go over DALL-E 1. Specifically, I talk through all the DALL-E components and how to train each of them. I show results and visualizations of what DALL-E 1 ends up learning in both stages, and finally how to actually implement DALL-E yourself and generate images in PyTorch. #dalle #generativeai *Timestamps* 00:00 Intro 00:31 Components of Dalle 01:42 Stage I - Learning the ...
VQ-VAE | Everything you need to know about it | Explanation and Implementation
10K views · 8 months ago
In this video I go over the Vector Quantised Variational AutoEncoder (VQ-VAE). Specifically, I talk through how it's different from a VAE, its theory and its implementation. I also train a VQ-VAE model and show what the generation process looks like once the VQ-VAE model is trained. Timestamps 00:00 Intro 00:27 Need and difference from VAE 02:43 VQVAE Components 04:31 Argmin and Straight Through Gradient Esti...
Implementing Variational Auto Encoder from Scratch in Pytorch
3.2K views · 9 months ago
Understanding Variational Autoencoder | VAE Explained
4K views · 9 months ago
I have a doubt at this timestamp: th-cam.com/video/H45lF4sUgiE/w-d-xo.htmlsi=mzOMzB0uACX8mPd6&t=528 - when you do the summation of the GP, won't the common factor be sqrt(1-beta)? Hence the final summation equation seems wrong to me. I need some help understanding that formulation. Captions during the timestamp: ... the rest of the terms are all Gaussian with zero mean but different variances; however, since all are independent, we can formulate them as one Gaussian with mean zero and variance equal to the sum of all individual variances. Thanks
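The caption's step can be checked numerically. Assuming the standard DDPM forward step x_t = sqrt(alpha_t) x_{t-1} + sqrt(1 - alpha_t) eps_t with alpha_t = 1 - beta_t, unrolling down to x_0 multiplies each eps_s by a different product of sqrt(alpha) factors, so it is the variances (squared coefficients) that get summed, and that sum telescopes to 1 - alpha_bar_t. The helper `unrolled_noise_variance` is hypothetical:

```python
import math

def unrolled_noise_variance(alphas):
    """Sum of the variances of the independent noise terms after unrolling.

    Unrolling x_t = sqrt(a_t) x_{t-1} + sqrt(1 - a_t) eps_t down to x_0, the
    eps_s term is multiplied by sqrt(1 - a_s) * prod_{r > s} sqrt(a_r), so its
    variance is (1 - a_s) * prod_{r > s} a_r.
    """
    total = 0.0
    for s in range(len(alphas)):
        later = math.prod(alphas[s + 1:])   # empty product is 1
        total += (1.0 - alphas[s]) * later
    return total
```

With a constant beta the variances do form a geometric progression, with ratio alpha = 1 - beta (the standard deviations have ratio sqrt(1 - beta)), and the sum is again 1 - alpha^t.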
Loved every bit of it. The amount of effort you put in to explain these complex concepts in a simple manner is NEXT LEVEL. This has become my favourite Deep Learning channel. THANKS A LOT!! Keep up the amazing work.
Thank you for the continuous encouragement and appreciation Vikram. It means a lot! Will keep trying my best to put out videos that are worthy of this.
Wow this was awesome!!
Thank you
Thanks. Many interesting nuggets that I had missed from reading the paper.
Hello, will you implement it from scratch, especially in TensorFlow? Plus, I have been struggling a bit with my YOLOv2 implementation. Would also need help on that😅
Hello, actually I have limited experience (and by limited I mean zero) with TensorFlow, so the implementation will be in PyTorch. That video should be out in 3-4 days, and my hope is that it will be of some help to you and give you a better understanding for a TensorFlow implementation as well. And yes, this series will indeed have videos on different YOLO versions. It will take some time to get through all of them, but it will have them.
@Explaining-AI Thanks for the information.
Great video. Very well explained
Don't have enough words to describe it. This is presented and explained so beautifully. Thanks, Legend.
Thank you for these kind words Vikram :)
now do flow matching
Added this one to my list :)
excellent, clear explanation of diffusion
Thank You!
The code in line 65 is wrong. It should be codebook_loss = torch.mean((quant_out - quant_input.detach())**2)
Yes indeed. It's correct in the repo - github.com/explainingai-code/VQVAE-Pytorch/blob/main/run_simple_vqvae.py#L65 - but the video version has a typo where, instead of torch.mean((quant_out - quant_input.detach())**2), it's incorrectly implemented as torch.mean((quant_out - quant_input.detach()**2)).
Such a good explanation and illustration! Please make one about YOLO.
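A sketch of where that line sits in a VQ-VAE quantization step, assuming (N, D) encoder outputs, a (K, D) codebook and a commitment weight of 0.25 (the value used in the VQ-VAE paper); `quantize` is a hypothetical helper, not the repo's code. The parentheses matter: `(a - b.detach()) ** 2` squares the difference, whereas `a - b.detach() ** 2` squares only the detached term.

```python
import torch

def quantize(quant_input, codebook, beta=0.25):
    """Nearest-codebook quantization with VQ-VAE losses (sketch).

    quant_input: (N, D) encoder outputs; codebook: (K, D) embedding vectors.
    beta is the commitment weight (assumed 0.25 here).
    """
    dists = torch.cdist(quant_input, codebook)   # (N, K) Euclidean distances
    idx = dists.argmin(dim=1)                    # nearest code for each vector
    quant_out = codebook[idx]                    # (N, D)
    # Codebook loss: encoder output detached, so gradients reach only the codebook.
    codebook_loss = torch.mean((quant_out - quant_input.detach()) ** 2)
    # Commitment loss: codes detached, so gradients reach only the encoder output.
    commitment_loss = torch.mean((quant_out.detach() - quant_input) ** 2)
    loss = codebook_loss + beta * commitment_loss
    # Straight-through estimator: the forward pass uses quant_out, the backward
    # pass copies gradients straight to quant_input.
    quant_out = quant_input + (quant_out - quant_input).detach()
    return quant_out, loss, idx
```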
Thank You! Yes, as part of this series, will be making videos on different versions of YOLO as well.
Nicely explained! Keep the good work going! 😁
*Github Implementation* - github.com/explainingai-code/DetectionUtils/blob/main/compute_ap.py
Very good and detailed explanation. I think the curve shown in the example @18:55 is wrong, since the mAP is always monotonically decreasing due to the prior confidence-based sorting.
Thank you. While the predictions and confidence scores used in the video are hypothetical, it's still plausible for the curve not to be monotonically decreasing. Even after confidence sorting, a higher-confidence detection could be a false positive and a lower-confidence detection a true positive, which would first decrease the precision and then increase it. That is why we replace each precision value with the maximum precision value to its right @20:14: it smooths out the zig-zag pattern, and after that the curve is monotonically decreasing. Let me know if that makes sense.
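The smoothing step described in the reply can be sketched as follows. This is the all-point (VOC-style) interpolated AP, assuming the detections have already been sorted by confidence and each one marked as a true or false positive; `average_precision` is an illustrative name, and it is not necessarily identical to `compute_ap.py` in the linked repo:

```python
def average_precision(tp_flags, num_gt):
    """Area under the interpolated precision-recall curve.

    tp_flags: 1/0 per detection, already sorted by descending confidence
              (1 = matched a ground-truth box, 0 = false positive).
    num_gt:   number of ground-truth boxes for this class (assumed > 0).
    """
    tp = fp = 0
    recalls, precisions = [], []
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Pad so the curve starts at recall 0 and ends at recall 1.
    rec = [0.0] + recalls + [1.0]
    prec = [0.0] + precisions + [0.0]
    # Replace each precision by the max precision to its right -> monotone curve.
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])
    # Sum rectangle areas; segments where recall does not change contribute zero.
    return sum((rec[i + 1] - rec[i]) * prec[i + 1] for i in range(len(rec) - 1))
```

For `[1, 0, 1]` with 2 ground-truth boxes the raw precisions are 1, 0.5, 0.67 (a zig-zag); after the right-to-left max they become 1, 0.67, 0.67, and AP = 0.5 * 1 + 0.5 * 0.67, about 0.83.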
Outstanding video! Although I'm wondering if some dependencies are missing from various code segments? I'm using JupyterLab to get an interactive PyTorch environment. When I attempted to compile the VQVAE.PY code, Jupyter returned an error saying the models module hadn't been defined. Here's the error:
-------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 3
      1 import torch
      2 import torch.nn as nn
----> 3 from models.blocks import DownBlock, MidBlock, UpBlock
      6 class VQVAE(nn.Module):
      7     def __init__(self, im_channels, model_config):

ModuleNotFoundError: No module named 'models'
So models.blocks wouldn't load. Where might I find this dependency? Or is it a question of file compiling order? Edit: I think I found my problem: requirements.txt :) Still, thanks for the great video!!
great video!
Thank You!
Amazing video! Thanks
nice.
Can you share the code, please?
I am really sorry for the late reply. I had planned to provide this code together with an object detection implementation so that one can clearly understand how to use it, but I haven't been able to get to that yet. For now, I have pushed the code for this video here - github.com/explainingai-code/DetectionUtils/blob/main/compute_ap.py - with comments on what the mAP computation method expects as input and how to create it.
Try a version with an AI voice for clarity?
I haven't given the AI voice option any thought until now, but as a viewer, was the audio clarity that bad for you? The entire video or some specific part?
@Explaining-AI It's just an idea...... 😊👍👍
@Explaining-AI Maybe you could do a test with an American voice, maybe even a female one, to see how it impacts view count?
Nice explanation..!
Thank You!
Amazing. The only part I did not understand is the classification part. Does this achieve that through zero-shot learning, or do we need to fine-tune the pretrained model with hard labels to make it a classifier? Thanks.. amazing transformers series.. best best best!!!!
Yes, you would need to fine-tune/train it on your dataset. Typically you would have an FC layer on top of the CLS token representation, and by training the model (say on MNIST), it will learn to attend to the right patches and build a CLS representation that allows it to correctly classify which digit it is.
Amazing explanation... I had not come across such a beautiful and easy explanation of transformers, which seem extremely difficult... this channel deserves millions of subscribers 🎉
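The FC-on-CLS setup from the reply, as a minimal PyTorch sketch. It assumes the encoder returns (B, N + 1, D) tokens with the CLS token first, a 64-dim embedding and 10 classes (MNIST digits); `ClsHead`, the LayerNorm before the linear layer, and the random stand-in tensors are assumptions for illustration:

```python
import torch
import torch.nn as nn

class ClsHead(nn.Module):
    """Linear classifier on the CLS token of a ViT encoder output (sketch).

    Assumes the encoder output is (B, N + 1, D) with the CLS token at index 0.
    """
    def __init__(self, emb_dim=64, num_classes=10):
        super().__init__()
        self.norm = nn.LayerNorm(emb_dim)
        self.fc = nn.Linear(emb_dim, num_classes)

    def forward(self, tokens):
        return self.fc(self.norm(tokens[:, 0]))   # logits, (B, num_classes)

# One training step on stand-in data: random "encoder output" and labels.
head = ClsHead()
tokens = torch.randn(4, 17, 64)
labels = torch.tensor([3, 1, 4, 1])
logits = head(tokens)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()   # in real fine-tuning the gradients would also flow into the encoder
```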
Thank you for the kind words :)
Thanks for the video, it is really great. I have fine-tuned a Stable Diffusion v1.5 model, and now I am trying to build a Stable Diffusion model from scratch without using any pretrained checkpoints, running it locally. Is it possible to train the model without using any pretrained checkpoint?
Hello, yes, it's definitely possible. Depending on your dataset and image resolution, though, you might need a lot of compute time. Also, if the pretrained checkpoint was trained on images similar to your task, then your generation results without pretraining would be of lower quality than with pretraining.
Keep up the great work 👍
Thank you!
Love your work! This channel is really helpful.
Thank you for this
Brilliant video, thanks so much!
You're very welcome!
From the video it is evident that using a variational autoencoder for image denoising works better than using just an autoencoder, as the latent representation of the images generated by the encoder in an autoencoder is mapped to a point instead of a latent space with a distribution of the latent masks generated by the encoder.
Thank you, the visuals really helped me understand, especially the backpropagation part!
Happy that the video was some help to you :)
I watched your video again, and cannot give you enough compliments on it! Great job!
@bayesianmonk Thank you so much for taking the time to comment these words of appreciation (and twice at that) 🙂
Great explanation, covering theory and implementation. Nice visualisations. Thanks!
Thank You!
Very well done
Thank You!
Very nice! Keep the good work going!!
Thank You!
Hi sir, I have scanning electron microscope (SEM) images that have some defects in them, or where some part of the pattern is missing from printing on the wafer. How can we use a VAE to classify an SEM image into faulty and faultless categories? Please guide me.
Hello, is it possible for you to send me an email so we can have a conversation about this there? You could train a VAE on all the non-faulty images to get their latent representation and then use reconstruction error as a measure to classify a test image as faulty or non-faulty, but I would be able to help better if I knew more specific details. So do drop me an email if you feel it would benefit you to have a discussion around this.
@Explaining-AI Hello sir, I have sent an email. Please check. Thanks
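The reconstruction-error idea from the reply, as a framework-free sketch. It assumes a VAE already trained only on non-faulty images, flattened images given as lists of pixel values, and a threshold taken as a high quantile of reconstruction errors on held-out non-faulty images; `mse`, `fit_threshold` and `is_faulty` are hypothetical helpers, and the quantile is a tuning choice rather than a recommended value:

```python
def mse(a, b):
    """Mean squared error between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def fit_threshold(normal_errors, quantile=0.99):
    """Threshold = a high quantile of reconstruction errors on non-faulty images."""
    errs = sorted(normal_errors)
    k = min(len(errs) - 1, int(quantile * len(errs)))
    return errs[k]

def is_faulty(image, reconstruction, threshold):
    """Flag an image whose VAE reconstruction is worse than the normal range."""
    return mse(image, reconstruction) > threshold
```

The intuition is that a VAE trained only on defect-free patterns reconstructs them well but struggles with unseen defects, so those images tend to land above the threshold; how well this works depends on how visible the defects are at the chosen resolution.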
crazy stuff
Hello, can we do the following steps for a simple dataset? 1. Encode our text into embeddings using pretrained models like T5, and then use those text embeddings to train a simple LSTM to predict the next batch of image embedding tokens (from the VQ-VAE codebook, once the image passes through the encoder). 2. The predicted image embedding tokens go through the decoder of the trained VQ-VAE to output images. By the way, thanks for the video.
Hello, yes absolutely. In fact, in my VQ-VAE video this is exactly what I did, although I trained an LSTM to generate the sequence of codebook tokens unconditionally (followed by the step 2 that you mentioned). All one would need to change is to prepend the condition representation to the codebook token sequence so that the LSTM generates data conditionally. If you are interested, you can take a look at that here - th-cam.com/video/1ZHzAOutcnw/w-d-xo.html
I don't have enough words to describe this masterpiece. VERY WELL EXPLAINED. Thanks. :)
Thank you so much for this appreciation :)
I derived the whole equation for the reverse diffusion process, and at 21:26, in the last term of the equation in the last line, I did not get \sqrt{\alpha_{t-1}}. Could you share the complete derivation? Also, the third-last line seems to be incorrect: it should be \alpha_{t-1} instead of (\alpha_{t-1})^2.
Hello, yes, the square on \bar{\alpha}_{t-1} is a mistake, which gets corrected in the next line. But thank you for pointing that out! Regarding the last term in the last line, I just wanted to mention that it's \bar{\alpha}_{t-1}, which comes from rewriting \bar{\alpha}_t in the last term of the second-last line as \alpha_t \bar{\alpha}_{t-1}.
@Explaining-AI Ahh yes, ignorant me. Thank you for your time in deriving the equations. I have not found this derivation anywhere else yet :)
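For reference, the standard DDPM posterior (Ho et al., 2020) written with the identity mentioned in the reply; this is the textbook form, not a transcription of the video's slide. The x_0 coefficient is where a \sqrt{\bar{\alpha}_{t-1}} factor appears.

```latex
\alpha_t = 1 - \beta_t, \qquad
\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s = \alpha_t \, \bar{\alpha}_{t-1}
\quad\Longrightarrow\quad
\frac{\sqrt{\bar{\alpha}_t}}{\sqrt{\alpha_t}} = \sqrt{\bar{\alpha}_{t-1}}

q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right)

\tilde{\mu}_t(x_t, x_0)
  = \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t
  + \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\, x_0,
\qquad
\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t
```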
Very nicely explained, Kudos
Great video, but as feedback, I'd suggest breathing and pausing a bit after each bigger step. You're jumping between statements really fast, so you don't give people time to think a little about what you just said.
Thank you so much for this feedback, makes perfect sense. Will try to improve on this in future videos.
Fantastic explanation.
What was the compute you used to train this? And how long did it take? Great video btw!
Thanks! For the diffusion model I used a single Nvidia V100, which took around 15 minutes per epoch, and as far as I remember, I trained for about 50 epochs (to get these outputs; I stopped at that point, but ideally one should train for much longer to get better-quality outputs).
@Explaining-AI Thank you for your prompt reply! I am building something similar for generating synthetic galaxy images to learn about LDMs. Your videos are a lifesaver.
@Explaining-AI Can we also use Flash Attention instead of normal attention?
@HardikBishnoi Yes. I have not used it myself, but I remember reading about an implementation where diffusers (Hugging Face) + FlashAttention gave a 3x speedup.
@HardikBishnoi Glad these were helpful to you!
Superb, the math doesn't look all that scary after your explanation! Now I just need pen and paper to let it sink in.
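In PyTorch 2.0 and later, `torch.nn.functional.scaled_dot_product_attention` computes the same softmax(QK^T / sqrt(d)) V as a hand-written attention and can dispatch to a FlashAttention kernel on supported GPUs and dtypes (falling back to other kernels elsewhere), so swapping it in is usually a small change. A sketch comparing the two on random CPU tensors; the shapes and `manual_attention` are illustrative, and no speedup figure is implied:

```python
import math
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    """Reference softmax(Q K^T / sqrt(d)) V, as usually written by hand."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return scores.softmax(dim=-1) @ v

torch.manual_seed(0)
# Illustrative shapes: (batch, heads, tokens, head_dim)
q, k, v = (torch.randn(2, 4, 16, 32) for _ in range(3))
fused = F.scaled_dot_product_attention(q, k, v)   # PyTorch >= 2.0
reference = manual_attention(q, k, v)
```

Both use the default scale of 1/sqrt(head_dim), so the two outputs should agree up to floating-point error.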
Thank You!
Thank you so much!
Great video! It was very helpful for understanding DDPM! Thank you so much! :)
Thank you :) Glad that the video was helpful to you!
Hi, I have two questions. Shouldn't decoder_fcs have a final nn.Tanh() layer so that the output range (-1, 1) matches the way the images are rescaled to (-1, 1)? The other question is: how are we supposed to generate new data points? Do we need to feed an input through the network, get the mean and log-variance, and generate a vector z, or could we just navigate through vectors of the same shape as z and expect to get something? Thank you!!
As for the tanh activation, I think this was missed while I was recording, but the repo code does indeed have the activation layer - github.com/explainingai-code/VAE-Pytorch/blob/main/run_simple_vae.py#L46. If your goal is generation, you can just sample z from a standard normal distribution and feed it to the decoder layers, i.e. create torch.randn((num_images, latent_dim)) and feed this to the decoder.
Will you cover a Mamba implementation later? I think there's currently no video with a clear explanation. It would be very nice if you did it.
Hello, I do plan to cover it, but it won't be part of this series. I have 3-4 topics that I intend to cover first, and after that I will do a video on Mamba.
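Generation as described in the reply, as a minimal PyTorch sketch. The decoder below (`decoder_fcs`, a 2-dim latent, 28x28 outputs, layer widths) is hypothetical and only illustrates the two points raised: a final `nn.Tanh()` so outputs match images rescaled to [-1, 1], and sampling z from the N(0, I) prior instead of encoding an input:

```python
import torch
import torch.nn as nn

latent_dim, num_images = 2, 16   # hypothetical sizes

# Hypothetical decoder: latent -> 28x28 image, with a final Tanh so outputs
# fall in [-1, 1], matching images rescaled to [-1, 1] during training.
decoder_fcs = nn.Sequential(
    nn.Linear(latent_dim, 128),
    nn.ReLU(),
    nn.Linear(128, 28 * 28),
    nn.Tanh(),
)

with torch.no_grad():
    z = torch.randn((num_images, latent_dim))           # z ~ N(0, I), the prior
    samples = decoder_fcs(z).reshape(num_images, 1, 28, 28)
    images = (samples + 1) / 2                          # back to [0, 1] for display
```

Since the KL term pushes the encoder's posteriors towards N(0, I) during training, samples from that prior should land in regions the decoder has learned to map to plausible images.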
Best explanation of diffusion process with connection to VAE process!
Thank you for the kind words!
Amazing explanation and implementation. Thank you so much.
Thank you!
Legendary video