ExplainingAI
YOLOv4 Explained | CIOU Loss, CSPDarknet53, SPP, PANet | Everything about it
This video aims to explain YOLOv4, a real-time object detection model, including all the features and techniques used in it. We thoroughly get into the YOLOv4 architecture and its unique features, such as DropBlock, Cross mini-Batch Normalization, the SPP (Spatial Pyramid Pooling) module, and CSP (Cross Stage Partial) connections, and how they all improve object detection performance. We start the video by covering all the features that improve backbone performance, like CutMix, Mosaic, label smoothing, and cross stage partial connections. Each of these features is covered in great detail to give you an idea of how YOLOv4 works.
We then dive deep into DropBlock, CIOU loss (Complete IOU loss), self-adversarial training, grid sensitivity, DIOU NMS, and so on.
We end with a complete review of the YOLOv4 architecture and the performance of YOLOv4, to understand how it fares specifically as a real-time object detector, and also compare it to YOLOv3.
⏱️ Timestamps:
00:00 Intro
01:23 Typical Object Detection Model Architecture
03:03 YOLOv4 - Bag of freebies and Bag of specials
05:15 Cutmix Data Augmentation
07:10 Mosaic Data Augmentation
09:32 DropBlock Regularization in YOLOv4
20:19 Class Label Smoothing in YOLOv4
23:40 Mish in Backbone
24:53 Cross Stage Partial Connections
29:26 MiWRC
31:27 Cross Mini Batch Normalization in YOLOv4
39:33 CIOU Loss (Complete IOU Loss)
47:47 Self Adversarial Training
49:11 Eliminating Grid Sensitivity in YOLOv4
53:33 Genetic Algorithm
56:26 Spatial Pyramid Pooling
57:36 Spatial Attention Module for YOLOv4
59:50 Path Aggregation Network in YOLOv4
01:02:33 DIOU NMS
01:04:52 Performance of YOLOv4
01:05:43 YOLOv4 Architecture Explained
📖 Resources:
YOLOv4 Paper - arxiv.org/pdf/2004.10934
YOLOv4 Repo - github.com/AlexeyAB/darknet
Cutmix Paper - arxiv.org/pdf/1905.04899
Spatial Dropout Paper - arxiv.org/pdf/1411.4280
DropBlock Paper - arxiv.org/pdf/1810.12890
Mish Paper - arxiv.org/pdf/1908.08681
Cross Stage Partial Connections Paper - arxiv.org/pdf/1911.11929
EfficientDet Paper - arxiv.org/pdf/1911.09070
Cross Iteration Batch Normalization Paper - arxiv.org/pdf/2002.05712
Generalized IOU Loss Paper - arxiv.org/pdf/1902.09630
DIOU and Complete IOU Loss Paper - arxiv.org/pdf/1911.08287
Grid Sensitivity Issue Link - github.com/AlexeyAB/darknet/issues/3293
Path Aggregation Paper - arxiv.org/pdf/1803.01534
🔔 Subscribe:
tinyurl.com/exai-channel-link
Email - explainingai.official@gmail.com
Views: 250

Videos

YOLOv2 (YOLO9000) and YOLOv3 Explained
529 views • a month ago
In this YOLO object detection series tutorial, we dive into the details of the YOLOv2 (YOLO9000) and YOLOv3 models for object detection. The video explores how the yolov2 and yolov3 models work, their architectures, the losses for training them, and their advancements over earlier versions like YOLOv1. We will get into the features that make YOLOv2 better, faster, and stronger, as described in the YOLO9000 pap...
Building a Video Generation Model with Diffusion Transformers | Explanation and Implementation
2.1K views • 2 months ago
In this video, we dive deep into Latte, a latent diffusion transformer for video generation. This generative video diffusion model combines diffusion techniques with the transformer architecture and is trained on latent frames of videos. We start with a quick recap of diffusion transformers, as the core building block of this latent transformer for video generation is similar to the adaptive layer ...
Single Shot Multibox Detector | SSD Object Detection Explained and Implemented
3.1K views • 3 months ago
In this video, I get into the Single Shot Multibox Detector, or SSD, a popular real-time object detection model. We will understand how the Single Shot Multibox Detector algorithm works and also do a step-by-step walkthrough of the implementation of SSD in PyTorch. This video is part of my object detection series, where I've previously covered YOLO, and now we're exploring SSD object detection to get an unde...
Scalable Diffusion Models with Transformers | DiT Explanation and Implementation
6K views • 3 months ago
In this video, we'll dive deep into Diffusion with Transformers (DiT), a scalable approach to diffusion models that leverages the transformer architecture. We will first get an overview of the vision transformer, then see the changes the authors make to get to DiT. We will look in detail at the different block designs that the DiT authors explore for Diffusion Transformers and also see the results of e...
YOLO Object Detection | YoloV1 Explanation and Implementation Tutorial
4.4K views • 4 months ago
This video is on YOLO object detection, specifically the yolov1 object detection algorithm. In this tutorial we try to understand how the YOLO algorithm works, from its real-time object detection capabilities to its approach to bounding box predictions. We will also go through a YOLOv1 implementation from scratch in PyTorch. By the end of this video you will be able to get a complete explanation of ...
ControlNet with Diffusion Models | Explanation and PyTorch Implementation
3.5K views • 5 months ago
In this tutorial we get into ControlNet for diffusion models. We delve into the architecture of ControlNet for Stable Diffusion, explaining how it enhances the final model's performance on a conditional dataset. We cover the need for ControlNet and the goal it tries to achieve, and give an architecture overview of ControlNet for a simple block. Then we get into how to use ControlNet for controlling the generation output of di...
Faster RCNN PyTorch Code Walkthrough | Fine-Tuning and Custom Dataset Training
3.8K views • 6 months ago
This tutorial covers all the details of Faster R-CNN with an in-depth PyTorch code walkthrough! It will guide you through the implementation of Faster R-CNN in PyTorch, including training on custom datasets and fine-tuning Faster R-CNN. We first do a walkthrough of Faster R-CNN with a resnet50 FPN backbone, wherein we cover the backbone initialization part, the RPN, the ROI head, and also dive ...
Faster R-CNN PyTorch Implementation
6K views • 7 months ago
In this tutorial, I go step-by-step into how to implement Faster R-CNN for object detection using PyTorch. I cover everything from building Faster R-CNN from scratch to training the model and running object detection. This video builds the code for Faster R-CNN in Python and provides detailed explanations of the different components involved in implementing Faster R-CNN. We start with building RPN...
Faster R-CNN Explanation | Region Proposal Network
7K views • 8 months ago
In this tutorial we cover Faster R-CNN for object detection. It's an attempt to provide an in-depth Faster R-CNN explanation. The video covers what Faster R-CNN is, how Faster R-CNN training works, and we also dive deep into its architecture. We start with the difference between Fast R-CNN and Faster R-CNN, understand anchor boxes and region proposal networks (RPNs) step by step, the two main components of ...
Fast R-CNN Explained | ROI Pooling
4.7K views • 9 months ago
In this tutorial, I dive deep into Fast R-CNN, explaining its architecture, the role of ROI pooling, and how it differs from R-CNN. Through this video you will learn how Fast R-CNN works, understand Region of Interest (ROI) pooling, and discover the advantages it brings to object detection tasks over previous approaches. I specifically go through how Fast R-CNN compares to R-CNN in terms of p...
Mean Average Precision (mAP) | Explanation and Implementation for Object Detection
4.2K views • 9 months ago
In this video we go over Mean Average Precision (mAP), Non-Maximum Suppression (NMS), and Intersection over Union (IOU) in object detection. We dive deep into understanding these crucial concepts for improving the accuracy of object detection algorithms. We first discuss Intersection over Union (IOU) as a ...
R-CNN Explained
8K views • 10 months ago
This is an R-CNN tutorial video in which I dive deep into what R-CNN is and the basics of R-CNN. This video is part of my object detection series, and the first one in it is R-CNN for object detection. By the end of this video you will be able to understand the R-CNN algorithm in detail and see clearly how R-CNN works. We start with what selective search is and how R-CNN uses selective searc...
Stable Diffusion from Scratch in PyTorch | Conditional Latent Diffusion Models
12K views • 10 months ago
In this video, we'll cover all the different types of conditioning in latent diffusion and finish the Stable Diffusion implementation in PyTorch, after which you will be able to build and train Stable Diffusion from scratch. This is Part II of the tutorial, where I get into conditioning in latent diffusion models. We dive deep into class conditioning in latent diffusion models, implementing class...
Stable Diffusion from Scratch in PyTorch | Unconditional Latent Diffusion Models
21K views • 11 months ago
In this video, we'll cover everything from the building blocks of Stable Diffusion to its implementation in PyTorch and see how to build and train Stable Diffusion from scratch. This is Part I of the tutorial, where I explain latent diffusion models, specifically unconditional latent diffusion models. We dive deep into what latent diffusion is, how latent diffusion works, and what the component...
DCGAN Tutorial with PyTorch Implementation
1.8K views • a year ago
Generative Adversarial Networks | Tutorial with Math Explanation and PyTorch Implementation
3.5K views • a year ago
Denoising Diffusion Probabilistic Models Code | DDPM Pytorch Implementation
25K views • a year ago
Denoising Diffusion Probabilistic Models | DDPM Explained
53K views • a year ago
Image Classification Using Vision Transformer | An Image is Worth 16x16 Words
1.7K views • a year ago
ATTENTION | An Image is Worth 16x16 Words | Vision Transformers (ViT) Explanation and Implementation
3.5K views • a year ago
PATCH EMBEDDING | Vision Transformers explained
7K views • a year ago
I implement DALLE 1 from SCRATCH on MNIST
2.5K views • a year ago
VQ-VAE | Everything you need to know about it | Explanation and Implementation
19K views • a year ago
Implementing Variational Auto Encoder from Scratch in Pytorch
6K views • a year ago
Understanding Variational Autoencoder | VAE Explained
10K views • a year ago

Comments

  • @ddozzi
    @ddozzi 9 hours ago

    Could you please explain why at 7:48 that last term is a constant? Thank you!

  • @ShreyaBanik-fb2gd
    @ShreyaBanik-fb2gd 15 hours ago

    Very informative! A step-by-step approach to object detection in CV. Best video so far 👍❤

    • @Explaining-AI
      @Explaining-AI 49 minutes ago

      Thank You :)

  • @qwertytechnology
    @qwertytechnology 17 hours ago

    This explanation is absolutely fantastic.

  • @ddozzi
    @ddozzi a day ago

    Amazing video. Probably the best explanation I've seen on the internet. However, I'm still struggling to understand the encoding and statistical explanation of the model.

    You say that we need to compute p(z|x), but we can't because it's computationally intractable, so instead we estimate it using q(z|x). My first question is: how do we calculate the KL-divergence between q(z|x) and p(z|x) if we don't actually know p(z|x) (at 7:34)? If we did know it, why couldn't we just use that instead?

    Next, you say that we sample from the distribution p(z) to generate new pieces of data. This does not make sense to me. Isn't p(z) a standard gaussian? If we sampled from p(z), wouldn't we just get nonsensical, random results? Why don't we sample from the learned distribution q(z|x) instead?

    Here's my thought process, please correct me where my understanding is wrong:
    1) Imagine we have images of circles that a VAE must reconstruct.
    2) We encode them into a 2-dimensional latent space.
    3) The decoder decodes a point sampled from the latent space and generates a new image of a circle.

    In step 2 we encode images by estimating p(z|x) through q(z|x). Let's say the encoder learns that the two dimensions of the latent space are radius and position. Then, for every single image, the encoder finds the latent variables z and turns them into a distribution of radius-position combinations which approximates a standard gaussian p(z) (I imagine this looks something like a joint distribution between two normally-spread random variables). We do this enough times and eventually we have a latent space represented by the probability density function q(z|x), which organizes the space of all seen circles into varying regions within q(z|x) (I imagine this looks like approximately standard gaussian clusters spread throughout a 2D latent space with axes radius and position, where each cluster represents a certain type of circle x).

    In step 3 we decode these images by sampling from q(z|x) (which is differentiable through the reparameterization trick). Then we can compute the reconstruction loss between the generated output x' (or f(x)) and the original input x (this makes sense to me), and calculate the KL-divergence between our estimate of the latent space q(z|x) and a standard gaussian p(z) (this doesn't make sense to me). Why do we take the KL-divergence between these two terms? To my understanding, q(z|x) is our prediction of the true latent space p(z|x). If we tried to make q(z|x) as similar to p(z) as possible, would we not just see q(z|x) turn into a giant standard gaussian? Why would we want that?

    Am I understanding things correctly? (Probably not.) But more importantly, could you please correct me on what I'm misunderstanding and answer my questions? Again, excellent video. It just seems like there are some kinks which I have yet to work out because of my inexperience.

    • @ddozzi
      @ddozzi 9 hours ago

      Nevermind, I figured most of it out.

  • @zakirkerimov3421
    @zakirkerimov3421 a day ago

    Man, I love you. You're the person I was looking for. Thanks for a great explanation that doesn't omit details and things that can be unclear.

    • @Explaining-AI
      @Explaining-AI 49 minutes ago

      Thank you for this comment :) Really happy that my videos are of help to you.

  • @adhemardesenneville1115
    @adhemardesenneville1115 a day ago

    Amazing !!! v8 is coming...

  • @PraveenKottari
    @PraveenKottari 4 days ago

    what a clean explanation!!!!!!!!!!

  • @Ssadesc
    @Ssadesc 5 days ago

    Awesome explanation, Thanks!

  • @muhamadnursyami
    @muhamadnursyami 6 days ago

    Please make a video on Mask R-CNN, sir 🙏🙏

  • @xiaolongli-lx5hz
    @xiaolongli-lx5hz 8 days ago

    Can I get the parameter file of this DiT model (trained on MNIST) directly?

  • @debanganmandal2524
    @debanganmandal2524 8 days ago

    You have made my literature survey 10 times easier. May I recommend you look into transformer- and attention-based object detection models, starting with DETR. Love your content <3

    • @Explaining-AI
      @Explaining-AI 6 days ago

      Happy that my content was helpful to you :) The next one in the detection series is indeed DETR.

  • @just_exist_ezz
    @just_exist_ezz 9 days ago

    God thanks

  • @alivecoding4995
    @alivecoding4995 9 days ago

    :) thanks!!!

  • @alivecoding4995
    @alivecoding4995 9 days ago

    Does YOLOv7 have a similar architecture?

    • @Explaining-AI
      @Explaining-AI 9 days ago

      Hello, there are some similarities in terms of the presence of pyramid pooling and top-down/bottom-up pathways, but the designs of those blocks are quite different. Also, YOLOv7 uses E-ELAN rather than CSP residual blocks. If you are interested, do take a look at this paper - arxiv.org/pdf/2304.00501 . It provides the highlights and changes of all the different YOLO versions. For YOLOv7, refer to Figure 16 (page 21).

    • @alivecoding4995
      @alivecoding4995 9 days ago

      @ ☺️👍

  • @Coldgpu
    @Coldgpu 10 days ago

    yoyo

  • @ansalrobinson
    @ansalrobinson 11 days ago

    Can you please share the prediction code?

    • @Explaining-AI
      @Explaining-AI 9 days ago

      Hello, the repo (github.com/explainingai-code/SSD-PyTorch/blob/main/tools/infer.py) has prediction as well as evaluation code.

  • @Bioinforere99
    @Bioinforere99 11 days ago

    Best explanation of Denoising Diffusion Probabilistic Models!

  • @jeffmacleod5194
    @jeffmacleod5194 13 days ago

    So much great info in 8 minutes. Thank you so much!

    • @Explaining-AI
      @Explaining-AI 12 days ago

      Thank you for the appreciation :)

  • @moiirani8827
    @moiirani8827 14 days ago

    The quality of the writing is too poor to see the equations.

  • @坨坨王
    @坨坨王 14 days ago

    Thank you so much, this video is very helpful to me. You are very generous.

    • @Explaining-AI
      @Explaining-AI 14 days ago

      I'm happy that the video ended up being of help to you :)

  • @davidjennicson5614
    @davidjennicson5614 16 days ago

    Hey, I am a new subscriber. Can you explain the implementations of LayoutLMv3 and UDOP and help implement them from scratch?

  • @nickbakker9747
    @nickbakker9747 17 days ago

    Can you explain to me why there is a break on line 114 in train_torchvision_frcnn.py? It now looks like, because of the break, it will only use one batch and then break out of the epoch. I really like your videos, thanks!

    • @Explaining-AI
      @Explaining-AI 17 days ago

      The only explanation is my oversight :D I must have been debugging something before pushing the code and ended up forgetting to remove the 'break' at the end. My apologies for the confusion, and thank you so much for pointing it out. I have fixed it now in the repo.

  • @luisangeld9894
    @luisangeld9894 20 days ago

    I am trying to use this model to train on COCO, but I am having issues using it; it seems the model is very structured around being trained on PASCAL VOC. Any idea how I can adapt it to COCO? Great video.

    • @Explaining-AI
      @Explaining-AI 12 days ago

      Hello, apologies for the late reply. I think the model should work once you set the right number of classes, but you would need changes in the dataset class. If you are still facing problems after making the dataset class changes (or if you need help with that), please open an issue on the repo and I can try to help resolve it.

  • @alivecoding4995
    @alivecoding4995 21 days ago

    If you look at the latent space images at 37:26 you cannot believe the decoder can regenerate the original image from them, as there is simply a lot of missing information. Any explanation of how it does it? At first I thought this was due to original image information leaking through skip-connections between the down and up blocks, but we are not using those in the auto-encoder.

  • @frimlinso1894
    @frimlinso1894 22 days ago

    In the sampling algorithm (algo 2), I don't understand why we have to add noise z back in. Can anyone explain this to me?

    • @Explaining-AI
      @Explaining-AI 18 days ago

      In the reverse process, at each time step we have a distribution P(x_{t-1}|x_t), which is a gaussian N(mu_{t-1}, sigma). We use the predicted noise at each timestep to compute the predicted mean, mu_theta. The "adding noise" part is actually the reparameterization trick for sampling from the predicted P(x_{t-1}) distribution: we sample a random noise z, scale it by sigma, and shift it by the mean of the predicted distribution. Also, if we straightaway used mu_theta (so always returned the mean instead of sampling from P(x_{t-1}) via the reparameterization trick), then the entire reverse process would end up being deterministic.
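
      A minimal sketch of that sampling step (hypothetical tensor names, assuming the usual DDPM schedules; not the exact repo code):

      ```python
      import torch

      def sample_step(model, xt, t, alphas, alphas_cumprod, betas):
          eps = model(xt, t)  # predicted noise in x_t
          alpha_t, alpha_bar_t = alphas[t], alphas_cumprod[t]
          # predicted mean mu_theta of P(x_{t-1} | x_t)
          mean = (xt - betas[t] / torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_t)
          if t == 0:
              return mean  # no noise is added at the final step
          sigma = torch.sqrt(betas[t])  # one common choice: sigma_t^2 = beta_t
          z = torch.randn_like(xt)      # z ~ N(0, I)
          return mean + sigma * z       # scale by sigma, shift by the mean
      ```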

    • @frimlinso1894
      @frimlinso1894 18 days ago

      @Explaining-AI that makes sense, thank you very much!

  • @Martingrossman78
    @Martingrossman78 25 days ago

    Hi, great explanation of R-CNN with very useful insights which are often skipped. I am especially grateful for answering questions like "Why SVM?", "Why different IoU thresholds?", etc.

    • @Explaining-AI
      @Explaining-AI 25 days ago

      Happy that you found the explanation helpful!

  • @rannvijaysingh1
    @rannvijaysingh1 26 days ago

    Brother, so good. Keep it up!

  • @arizmohammadi5354
    @arizmohammadi5354 26 days ago

    Can we train our dataset with VQ-GAN (without the transformer) and then use it in train_ddpm_vqve?

    • @Explaining-AI
      @Explaining-AI 25 days ago

      Hello, I might be misunderstanding your question, so do let me know if that's the case. But for stable diffusion we don't need the transformer. What you are mentioning is exactly what I implemented in the repo (github.com/explainingai-code/StableDiffusion-PyTorch?tab=readme-ov-file#training): train VQVAE + perceptual loss + discriminator (the same as VQ-GAN without the transformer, which is only needed for generating new latent images) on a dataset. Once the auto-encoder part of VQGAN is trained, we save the latent representations of all the training images using the trained encoder of VQGAN. Finally, we use these latent representations to train the LDM. We don't need to train the transformer, as that is for generating new latent images, for which we are using DDPM.
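
      In pseudocode, that two-stage pipeline looks roughly like this (all function names here are hypothetical placeholders, not the actual repo API):

      ```python
      # Stage 1: train the autoencoder (VQVAE + perceptual loss + discriminator)
      autoencoder = train_vqvae_with_perceptual_and_disc_loss(dataset)

      # Cache latent representations of the training images with the trained encoder
      latents = [autoencoder.encode(img) for img in dataset]

      # Stage 2: train the DDPM (the LDM) on the cached latents;
      # no transformer is needed, since DDPM does the latent generation
      ldm = train_ddpm(latents)
      ```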

    • @arizmohammadi5354
      @arizmohammadi5354 21 days ago

      @@Explaining-AI Thank you so much for your kind reply. Yeah, you are right.

  • @maximestudio2513
    @maximestudio2513 28 days ago

    Can you explain to us how the multi-view diffusion base model works, please?

    • @Explaining-AI
      @Explaining-AI 25 days ago

      Hello, I have added this to my list. But since I am not familiar with it as of now, it will take me some time to cover this.

    • @maximestudio2513
      @maximestudio2513 25 days ago

      @@Explaining-AI thank you, you are amazing

  • @hammadkhan7927
    @hammadkhan7927 29 days ago

    Can you please share the notes for all the object detection videos?

  • @yoverale
    @yoverale a month ago

    Just what I needed. Thanks 🙏🏻

  • @hoangduong5954
    @hoangduong5954 a month ago

    Please answer me: how do they train a CLASS-SPECIFIC bounding box regressor? Do they feed the class as an input to one model and regress the bounding box, or do they build multiple models (if they detect 6 classes then we build 6 models), each trained as a bounding box regressor for a specific class?

    • @Explaining-AI
      @Explaining-AI a month ago

      Hello, I have tried to explain this a bit; do let me know if it does not clarify everything for you. This is how the official R-CNN repo does it: we create as many box regressor models as there are classes, then train each of these regressors separately using the proposals assigned to the respective classes (github.com/rbgirshick/rcnn/blob/master/bbox_regression/rcnn_train_bbox_regressor.m#L76). During inference, given the predicted class for a proposal, we use the trained regressor for that class to modify the proposal (github.com/rbgirshick/rcnn/blob/master/bbox_regression/rcnn_test_bbox_regressor.m#L58-L65).

    • @Explaining-AI
      @Explaining-AI a month ago

      Btw, you could also do this with one fc layer. Let's say you have 10 classes. Then your bounding box regressor fc layer predicts 10 × 4 = 40 values; these are tx, ty, tw, th for all 10 classes. During training, the bounding box regression loss is computed between the ground truth transformation targets and the predicted values at the indexes corresponding to the ground truth class. At inference, you take the class index with the highest predicted probability; the predicted tx, ty, tw, th are then the 4 values (out of the 40) corresponding to this most probable class. A minimal sketch of this variant follows below.
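
      Here is that sketch (hypothetical dimensions, for illustration only):

      ```python
      import torch
      import torch.nn as nn

      num_classes, feat_dim = 10, 2048
      bbox_head = nn.Linear(feat_dim, num_classes * 4)    # 10 x 4 = 40 outputs

      feats = torch.randn(8, feat_dim)                    # pooled proposal features
      deltas = bbox_head(feats).view(-1, num_classes, 4)  # (N, 10, 4)

      cls_scores = torch.randn(8, num_classes)            # from the classification head
      pred_class = cls_scores.argmax(dim=1)               # most probable class per proposal
      # tx, ty, tw, th for the predicted class only
      pred_deltas = deltas[torch.arange(deltas.size(0)), pred_class]
      ```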

    • @hoangduong5954
      @hoangduong5954 a month ago

      @@Explaining-AI Thank you a lot!!!! I fully understand it now. So they do train multiple models and choose the model based on the class. That's crazy though!

  • @adeirman2705
    @adeirman2705 a month ago

    Please create a YOLO panoptic video, sir; it would be a huge help and it has so many applications.

    • @Explaining-AI
      @Explaining-AI a month ago

      Added this to my list. Will try to get to this as soon as I can.

  • @arizmohammadi5354
    @arizmohammadi5354 a month ago

    It was great! Good luck!

  • @BlueQuantum
    @BlueQuantum a month ago

    Why is only Gaussian noise added, and not Rician, Laplacian, etc.? There are so many other probability distributions.

    • @Explaining-AI
      @Explaining-AI a month ago

      Hello, I have replied to something similar here (highlighted comment) - th-cam.com/video/H45lF4sUgiE/w-d-xo.html&lc=Ugznn1UksOPa3NfWLXR4AaABAg

  • @Kamalsai369
    @Kamalsai369 a month ago

    Bro, no one on this platform explains as clearly as you. Thank you for providing these lectures free of cost. I think even in paid courses no one can explain this well. Thank you again.

    • @Explaining-AI
      @Explaining-AI a month ago

      Really happy that you found the explanation helpful :)

  • @rishidixit7939
    @rishidixit7939 a month ago

    Subscribed

  • @sartq_333
    @sartq_333 a month ago

    One of the finest videos on YOLO available on the internet. It contains an intuitive as well as detailed explanation (right from the research paper). Concepts like these are hard to explain in so much detail. Thanks a lot for the amazing work, cheers!

    • @Explaining-AI
      @Explaining-AI a month ago

      Thank you for this comment :)

  • @Mahan_Veisi
    @Mahan_Veisi a month ago

    Fantastic video! You’re undoubtedly on your way to becoming one of the top lecturers in Generative AI. I’m excited to see more of your work in the future!

    • @Explaining-AI
      @Explaining-AI a month ago

      Thank you so much for your words of encouragement and support :)

  • @Andrey41k
    @Andrey41k a month ago

    Thank you very much for the video, it is very interesting. Though I have one question, at the 15:40 timestamp: you mention that there may be a situation where a ground truth box doesn't have a big IoU with any of the anchor boxes. How do we pick these anchor boxes (I just can't get which methodology we have to follow when picking the dimensions for anchor boxes)?

    • @Explaining-AI
      @Explaining-AI a month ago

      Thank You! For Faster R-CNN, that is the reason why we add low-overlap anchor boxes as well (if they are indeed the best anchor boxes available). Here the authors did not tune anchor box selection for a dataset at all; they just pick boxes that capture a large enough variation in scale and aspect ratio. Models like YOLOv2 use the anchor box strategy but run k-means to pick the best anchor boxes: once you run k-means on your ground truth box dimensions, you end up with cluster centres that are good representatives of the box dimensions in your dataset. These cluster centres then become a good choice for your anchor box widths and heights (see the sketch below).
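
      A minimal sketch of that k-means step (note: the YOLOv2 paper actually uses a 1 - IoU distance; plain Euclidean k-means via scikit-learn is shown here just for illustration):

      ```python
      import numpy as np
      from sklearn.cluster import KMeans

      # (w, h) of every ground-truth box in the dataset, e.g. normalized to [0, 1]
      gt_wh = np.random.rand(5000, 2)  # placeholder for real annotations

      kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(gt_wh)
      anchors = kmeans.cluster_centers_  # 9 representative (w, h) anchor shapes
      ```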

  • @alivecoding4995
    @alivecoding4995 a month ago

    How does self attention work in convnets (instead of transformers)? 😊

    • @Explaining-AI
      @Explaining-AI a month ago

      After a reshape of the input, the self attention works exactly the same as in transformers. Assume you have a BxCxHxW feature map at a certain stage of the network. During self attention you reshape it into Bx(H*W)xC, and it becomes very similar to what you would have seen in transformers: H*W is the number of grid cells (tokens) and C is the embedding dimension of each token. We just compute attention between all spatial grid cells, as in the sketch below.
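
      A minimal PyTorch sketch of that reshape-then-attend idea:

      ```python
      import torch
      import torch.nn as nn

      B, C, H, W = 2, 64, 16, 16
      x = torch.randn(B, C, H, W)                  # conv feature map

      tokens = x.flatten(2).transpose(1, 2)        # B x (H*W) x C, one token per cell
      attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)
      out, _ = attn(tokens, tokens, tokens)        # attention across all grid cells
      out = out.transpose(1, 2).reshape(B, C, H, W)  # back to the feature-map layout
      ```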

    • @alivecoding4995
      @alivecoding4995 a month ago

      @ Thank you 😊

  • @robbegeusens1302
    @robbegeusens1302 a month ago

    Great video, but why do you add 1e-6 when calculating your IoU?

    • @Explaining-AI
      @Explaining-AI a month ago

      Thank You! That is just to ensure the IoU method never ends up dividing by 0, say in some degenerate case where the bounding box area is zero (for both gt and prediction). It makes the IoU computation numerically stable no matter what the predicted and ground truth boxes are (see the sketch below).
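
      A minimal sketch of an IoU function with that stabilizer (not the exact repo code):

      ```python
      def iou(box1, box2, eps=1e-6):
          # boxes given as (x1, y1, x2, y2)
          x1, y1 = max(box1[0], box2[0]), max(box1[1], box2[1])
          x2, y2 = min(box1[2], box2[2]), min(box1[3], box2[3])
          inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
          area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
          area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
          # eps keeps the division finite even for degenerate zero-area boxes
          return inter / (area1 + area2 - inter + eps)
      ```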

  • @raihanpahlevi6870
    @raihanpahlevi6870 a month ago

    What do you mean by top-k proposals 2000? Does this mean we take 2000 proposals from a single image?

    • @Explaining-AI
      @Explaining-AI a month ago

      Hello @raihanpahlevi6870, yes, that's correct. 2000 proposals are taken from a single image.

  • @yusuphajuwara1490
    @yusuphajuwara1490 a month ago

    Thanks for this wonderfully intuitive video! It provided a fantastic breakdown of the fundamentals of diffusion models. Let me try to answer your question about why the reverse process in diffusion models is also a (reverse) diffusion with Gaussian transitions.

    1. Forward diffusion introduces noise gradually. Remember the β term? In the forward process, β is chosen to be very small (close to 0). This ensures that Gaussian noise is added gradually to the data over many steps. Each step introduces only a tiny amount of noise, meaning the transition from the original image to pure noise happens slowly and smoothly. This gradual noise addition is crucial because it preserves the structure of the data for longer, making it easier for the reverse process to reconstruct high-quality images. If we added large amounts of noise in one go, as in VAEs, the original structure would be harder to recover, leading to blurrier reconstructions.

    2. Reverse diffusion needs "Gaussian-like" inputs. The forward process only involves adding isotropic Gaussian noise at each step, which means the model learns to work with samples that are progressively noised in a Gaussian way. However, in the reverse process, when the model predicts the noise at each step, the resulting sample isn't guaranteed to remain Gaussian-like. To fix this, after subtracting the model's predicted noise, we add a small Gaussian noise with a carefully chosen variance. This step helps "Gaussianize" the sample, ensuring it aligns with what the model expects at the next time step. This small added noise smooths out any irregularities and makes the reverse process more stable, resulting in higher-quality outputs.

    3. Step-by-step noise removal. The reverse process works by removing noise step by step, moving from pure noise back to a clean image (closer to x0). This gradual approach is crucial because predicting small changes (i.e., removing a little noise at a time) is much easier for the model than trying to reconstruct the clean image in one big jump. This is why diffusion models produce sharper and more realistic images compared to VAEs, where predictions often result in blurry outputs due to the lack of such gradual refinement.
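
    For reference, the Gaussian transitions discussed above, in standard DDPM notation (the usual formulation, added here for clarity):

    ```latex
    % forward step: scale the sample down slightly and add a little Gaussian noise
    q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(\sqrt{1-\beta_t}\, x_{t-1},\; \beta_t I\right)
    % learned reverse step: also Gaussian, with a predicted mean
    p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(\mu_\theta(x_t, t),\; \sigma_t^2 I\right)
    ```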

  • @mariolinovalencia7776
    @mariolinovalencia7776 a month ago

    Excellent. Complete, clear and to the point

  • @GouravJoshi-z7j
    @GouravJoshi-z7j a month ago

    I am new to this field; can anyone provide me with the prerequisites to understand this video?

    • @Explaining-AI
      @Explaining-AI a month ago

      Hello @GouravJoshi-z7j, I think this list covers the prerequisites:
      - Gaussian distribution and its properties (mean/variance of the sum of two independent gaussians)
      - Reparameterization trick
      - Maximum likelihood estimation
      - Variational lower bound
      - Bayes theorem, conditional independence
      - KL divergence, and the KL divergence between two gaussians
      - VAE (because the video incorrectly assumes knowledge of it)
      I may have missed something, so if there is some aspect of the video that you aren't able to understand even after that, please do let me know.

  • @alivecoding4995
    @alivecoding4995 a month ago

    Thank you very much!!! Great explanation ❤

  • @Mahan_Veisi
    @Mahan_Veisi a month ago

    Thank you! It was amazing. While there is limited content available on diffusion models, you did really well. ❤

    • @Explaining-AI
      @Explaining-AI a month ago

      Thank you for your kind words :)

  • @abdulkarimatrash
    @abdulkarimatrash a month ago

    Excellent work! Please don't stop :)

    • @Explaining-AI
      @Explaining-AI a month ago

      Thank you so much for your support

  • @comunedipadova1790
    @comunedipadova1790 a month ago

    Please don't use music in the background, it's very distracting, thanks.

    • @Explaining-AI
      @Explaining-AI a month ago

      Thank you for the feedback. Have taken care of this in my recent videos.