YOLO V1 - YOU ONLY LOOK ONCE || YOLO OBJECT DETECTION SERIES

  • Published on 5 Jan 2025

Comments •

  • @MLForNerds
    @MLForNerds  2 years ago +4

    Watch my latest detailed video on the YOLO-V2 object detector.
    th-cam.com/video/PYpn1GSwWnc/w-d-xo.html

  • @TimidMeercat
    @TimidMeercat 1 year ago +17

    After viewing multiple videos on YOLO workings, I found your video very detailed and helpful. Thanks!

    • @MLForNerds
      @MLForNerds  1 year ago +1

      Thank you Nitin, glad it helped you.

  • @madhavpr
    @madhavpr 1 year ago +11

    Hands down the BEST explanation of the Yolo family found online. Great job brother!! Keep up the great work.

  • @devvfakeaccount
    @devvfakeaccount 19 days ago

    This is as close to perfect as it's going to get for explaining the core of YOLO. Thank you

  • @shubha07m
    @shubha07m 1 year ago +19

    I am so surprised that you are doing such a phenomenal job (trust me: almost no YouTube channel does such a deep dive into the theory!), yet you do not have many subscribers! I will definitely spread the word about this excellent channel.

    • @MLForNerds
      @MLForNerds  1 year ago

      Glad you enjoy it! Thank you!

  • @SonNguyen-y2e3o
    @SonNguyen-y2e3o 10 months ago +2

    Bro! I was stuck trying to understand YOLO until I found your video. This deserves more than 15k views. Now I at least know how YOLO works.

  • @Dontknow-s2x
    @Dontknow-s2x 3 months ago

    2 minutes of silence for those who can't find this video! Best video on YOLO. I read the paper, watched videos and read articles, and I was still confused as hell about the loss function and the bounding boxes. Now it's clear, thank you so much. I recommend it to everyone who is planning to study deep learning.
    ❤♥

  • @ahsentahir4473
    @ahsentahir4473 9 months ago +2

    Great! I have not seen such an in-depth explanation anywhere. God bless you!

    • @MLForNerds
      @MLForNerds  9 months ago

      Glad it was helpful!

  • @aryangaur276
    @aryangaur276 11 months ago +3

    You are really awesome. All my concepts are clear now.

  • @ankitsharma-ol9qn
    @ankitsharma-ol9qn 3 months ago

    Greatest lecture I have ever seen on YouTube... Thank you so much.

  • @chandank5266
    @chandank5266 1 year ago +1

    The best video on yolo v1 so far.

  • @zaidazhari9386
    @zaidazhari9386 1 year ago +5

    Thank you very much sir. I've been watching a few videos on YOLO v1 but had difficulty grasping the loss function. Your video has helped a lot in understanding it 👍👍👍

    • @MLForNerds
      @MLForNerds  1 year ago

      You are most welcome

  • @ParbatSingh-sl3ko
    @ParbatSingh-sl3ko 9 months ago +2

    Loved the simplicity of explaining, and the presentation was also very minimal and apt. You really deserve more subs and views
    🙌❤

  • @consumeentertainment9310
    @consumeentertainment9310 5 months ago

    You rock!!! It was very detailed. Clearly, you have put a lot of work into this. Thank you so much🙏🙏🙏🙏🙏🙏

  • @kvnptl4400
    @kvnptl4400 9 months ago +2

    🌟A very in-depth analysis of the paper. I would say this is one of the best easy-to-understand explanations of YOLOv1. Keep up the good work

    • @MLForNerds
      @MLForNerds  9 months ago +1

      Glad it was helpful!

  • @AyushAgarwal-r4v
    @AyushAgarwal-r4v 1 year ago

    You are a god, man! Thanks for such clear and deep explanations of YOLO.

  • @rogribas
    @rogribas 2 months ago

    Incredible explanation, thank you very much

  • @mdminhazurrahman8673
    @mdminhazurrahman8673 1 year ago +1

    Your videos are gems, bro!! I have not found such a clear explanation of YOLO anywhere. Please make a video on YOLOv5 as well. Thank you!!!

    • @MLForNerds
      @MLForNerds  1 year ago

      Thank you Rahman, sure, I will make one.

  • @giriprasad5221
    @giriprasad5221 1 year ago +1

    Great work. In the image, the class probability map says that a cell gets the class that occupies the most area in it, but when building targets we are just giving zeros to the cells which contain the center of the object.

  • @jayeshshinde9625
    @jayeshshinde9625 1 month ago

    03:40 I am confused: how would the cell know that the ground-truth object's center falls inside it, both in training and at inference? And after that, how does the cell predict the x, y, w, h coordinates (anchors) when we don't know the size or shape of the object? After training, the CNN would be able to extract the object features, so the objectness scores and class probabilities for each cell are understandable.

  • @harshans7712
    @harshans7712 4 months ago

    Thanks a lot for your video, this helped me a lot to understand its working

  • @salilbhatnagar3260
    @salilbhatnagar3260 1 year ago +1

    A great lecture about YOLO! Thanks!

  • @adityaa8918
    @adityaa8918 6 months ago

    Underrated. Keep going man!

  • @azama74
    @azama74 1 year ago

    Thanks, your videos are the best among the related YOLO explanation videos.

  • @poojaverma9168
    @poojaverma9168 1 year ago +1

    Great content, very informative. Waiting for the next versions...🙂

  • @AryanKumarBaghel-cp1jv
    @AryanKumarBaghel-cp1jv 7 months ago

    Fantastic explanation. Super clear

  • @pavantripathi1890
    @pavantripathi1890 1 year ago +1

    Great explanation of loss function.

    • @MLForNerds
      @MLForNerds  1 year ago +1

      Glad you liked it

  • @Тима-щ2ю
    @Тима-щ2ю 8 months ago

    Hi! Thank you for your wonderful explanation! Unfortunately, the original paper has many unclear points. Your video helped me a lot, but I still have some questions.
    1) "A grid cell is 'responsible' if the center of a bbox falls into it." In training data we have annotated bboxes, but in test data there are no annotated bboxes and therefore no centers. So which grid cell will be "responsible" in that case?
    2) If c < threshold, do we simply zero out all the values in the vector, or should we train the model to zero out the vector on its own?
    3) If only 2 grid cells (in your case) predict the coordinates of bboxes, what is the use of the other 47 grid cells? Are they useless or not?
    4) How does one small grid cell (64x64) predict a box for an object that is way bigger than this cell (450x450)?
    5) Why do you say that there are only 2 object cells if the woman overlaps at least 6 cells? Maybe you mean only 2 "responsible" cells?

  • @Endless_emotions1
    @Endless_emotions1 1 year ago

    Amazing video overall 👏

  • @lovekesh88
    @lovekesh88 3 months ago

    Class confidence is not a conditional probability. The individual probabilities are conditional, Pr(Class_i | Object), and when you multiply them by C_1, aka Pr(Object), you get unconditional probabilities, i.e. only Pr(Class_i).
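
For reference, the point in this comment can be written out using the definitions in the YOLOv1 paper, where the box confidence is the objectness probability times the overlap between the predicted and ground-truth box. The symbol $C$ for the box confidence follows the comment's notation; treat this as a restatement, not as new material from the video.

```latex
\begin{aligned}
C &= \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} \\
\Pr(\text{Class}_i \mid \text{Object}) \cdot C
  &= \Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} \\
  &= \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}
\end{aligned}
```

The last step uses the fact that a cell can only contain class $i$ if it contains an object at all, so the product of the conditional class probability and the objectness is the unconditional class probability, still scaled by the IOU term.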

  • @aishwaryamahajan6773
    @aishwaryamahajan6773 2 years ago +2

    Awesome content. Could you also create videos on RCNN, SPPNet, Fast RCNN, SSD and FPN? I would be very grateful, if possible. Very well explained. Waiting for more videos🙂

  • @giuliorusso1046
    @giuliorusso1046 9 months ago +1

    But in the paper they say the objectness is Pr(Object) x IoU. Can anyone explain that? Why does the video say 1?

  • @adityakrishnajaiswal8663
    @adityakrishnajaiswal8663 5 months ago

    Just wow!

  • @sagaradoshi
    @sagaradoshi 9 months ago

    Hello, great explanation of the content. I have not seen such detailed content on YOLO. I have some questions and look forward to your support.
    1. Each cell can have two bounding boxes, but how can the size of the bounding boxes differ for each grid cell? For example, in grid cell 1 one bounding box could be a rectangle and the other a square, or both are rectangles with different dimensions. So how is this possible?
    2. Each bounding box provides x, y, w, h relative to the grid cell's starting coordinate and the original/ground-truth bounding box width and height. Correct? What I didn't understand further is how each cell calculates its C score per bounding box and how it calculates the probability values.
    3. Later you mentioned that, out of the two bounding boxes, one is chosen for each cell based on the confidence score of that bounding box * class probability, right?
    4. When you are calculating the final loss:
    a. For a cell with an object, we took one of the two bounding boxes and compared its x, y, w, h and c values with the ground-truth values. Right?
    b. For a cell with no object, we took the C values from both bounding boxes and subtracted 0, since the ground-truth confidence score is 0 for that cell. Right?
    5. Do we use IOU to calculate the C value per bounding box per grid cell? If yes, how is it possible to calculate the C value per grid cell, as IOU depends on the original size of the bounding box, which may spread across cells?
    6. To get this ground-truth value for each cell (x, y, w, h, c, p1....p20), do we manually annotate all the images in the dataset if it's a custom dataset?
    Looking forward to your support

  • @jeffreydanowitz3083
    @jeffreydanowitz3083 1 year ago

    This is a very clear and concise video. It really helped me to put everything together. Question: each supergrid box has 2 bounding boxes associated with it for the object, so the algorithm allows for dual results. If surrounding supergrid boxes decide to give some confidence, say for a larger object, is there some non-maximal suppression or other mechanism that makes sure that each object is reported, in the end, only once?
    Also, just for clarity: in training, the two 5-valued vectors for the box are identical, I assume. Is this correct? In my understanding, we are just giving the algorithm some breathing space by potentially finding 2 bounding boxes per supergrid box. Is this also correct?
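
On the non-maximal suppression question: the YOLOv1 paper does apply NMS to remove duplicate detections of the same object. The block below is a minimal greedy NMS sketch in plain Python, not the code used in the video or in Darknet. The corner box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, and the default threshold of 0.5 are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two overlap heavily, the third is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In practice NMS is usually run separately for each class, so that overlapping objects of different classes do not suppress each other.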

  • @vishnum7985
    @vishnum7985 2 years ago +2

    Great content.
    Can you create videos on the latest YOLO models (7)?
    Waiting for more. Good Luck!

    • @MLForNerds
      @MLForNerds  2 years ago

      Thank you Vishnu, I will make all the yolo versions one by one.

  • @Jayanth_mohan
    @Jayanth_mohan 1 year ago

    It was really awesome. Learnt a lot!! Thanks

    • @MLForNerds
      @MLForNerds  1 year ago

      Glad you liked it! Thank you 👍

  • @mallaswetha5629
    @mallaswetha5629 1 year ago +1

    Sir, can you explain YOLOv5 or suggest the best video on YOLOv5?

  • @prodbyprodigy
    @prodbyprodigy 6 months ago

    this is such a great vid

  • @sumanthpichika5295
    @sumanthpichika5295 6 months ago

    Very detailed explanation, thanks for making it clearer. I haven't found any other video that explains things in such depth. I have a doubt about total loss = obj loss + no-obj loss: in the example, only 2 grid cells contain an object, which means the obj loss is calculated for those 2 grid cells and the remaining 47 grid cells fall under the no-obj loss, right?
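
For reference, the loss as given in the YOLOv1 paper is reproduced below, with $S = 7$, $B = 2$, $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$. Here $\mathbb{1}_{ij}^{\text{obj}}$ is 1 when predictor $j$ in cell $i$ is responsible for an object, and $\mathbb{1}_{i}^{\text{obj}}$ is 1 when an object center falls in cell $i$.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
   \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
   \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
   + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
   \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \,\in\, \text{classes}}
   \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

With two object cells in a 7x7 grid, the coordinate, object-confidence and class terms are non-zero only for those two cells, while the no-object confidence term covers the predictors in the other 47 cells. Whether the non-responsible predictor inside an object cell is also counted in the no-object term varies between implementations, so check the code you are reading.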

  • @daminirijhwani5792
    @daminirijhwani5792 4 months ago

    This is amazing! Could you do a transformer series?

  • @fotoluminescencjastudiesai1239
    @fotoluminescencjastudiesai1239 6 months ago +1

    Great video, now I finally understand it :) Could you just please clarify why at 22:32 only 2 grid cells contain objects? The woman appears in a few other cells as well, so why only two?

    • @MLForNerds
      @MLForNerds  6 months ago

      Wherever the object centroid falls, only those cells are considered

    • @fotoluminescencjastudiesai1239
      @fotoluminescencjastudiesai1239 6 months ago

      @MLForNerds Thank you!! Just to make sure that I understand correctly: in this example, one cell has a centroid for the horse and one has a centroid for the person?
      Also, are you planning on making a video on YOLO v7? :)

    • @MLForNerds
      @MLForNerds  6 months ago +1

      Yes, you are right regarding object centers. I will continue this series and finish all yolo versions
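
The centroid rule discussed in this thread can be sketched in a few lines of Python. This is an illustration only: the function name `responsible_cell`, the corner box format and the example boxes are hypothetical, and the 448x448 image with a 7x7 grid follows the YOLOv1 setup.

```python
def responsible_cell(box, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the box center.

    box is assumed to be (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Scale the center to grid units; clamp so a center on the far edge
    # still maps to the last cell.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# Hypothetical ground-truth boxes in a 448x448 image.
horse_cell = responsible_cell((100, 200, 300, 400), 448, 448)
person_cell = responsible_cell((150, 50, 250, 350), 448, 448)
```

Both boxes span several cells, yet each one is assigned to exactly one cell, which is why only two cells count as object cells in the example.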

  • @AndreiChegurovRobotics
    @AndreiChegurovRobotics 8 months ago

    great material!

  • @YarkoFFXI
    @YarkoFFXI 1 year ago +4

    Great content man, I'm really grateful for your videos. I have 2 questions regarding YOLO v1 that I hope you can help me with.
    1) how did the authors pretrain the model on 224x224 images, and then "resize" their network to accommodate 448x448 images for further training? Were you able to find details about this step?
    2) the authors state that yolo considers the whole image as opposed to more classical sliding window techniques such as overfeat. Is this thanks to the fully connected layers at the end? Because up until the 7x7x1024 conv layer, each activation has a receptive field that is smaller than the full image. So the only step that is a function of the whole image are the last FC layers.. And that's one weird architecture, my brain has a hard time keeping track what is going on, considering the flattening, the dense layers, and then reshaping again. Ugh.

    • @thuytran2880
      @thuytran2880 1 year ago

      Hello, I read your comment and it's such an amazing question. It was almost 5 months ago, but I want to ask: have you found the answer? If you have, can you share it with me?

    • @YarkoFFXI
      @YarkoFFXI 1 year ago

      @thuytran2880 No, unfortunately I haven't made any progress in finding these answers :/

    • @praveenkandula8011
      @praveenkandula8011 11 months ago

      "the authors state that yolo considers the whole image as opposed to more classical sliding window techniques such as overfeat. Is this thanks to the fully connected layers at the end?" The network structure doesn't play any role, but the way they train does. In sliding-window methods, slices of the image pass through a classifier multiple times, whereas in YOLO the image is passed through once and the bounding box predictions are calculated.

  • @spencergameing3575
    @spencergameing3575 1 year ago +1

    There are two C scores if the grid cell contains an object. Then for (C_i - Ĉ_i)^2, which one should we consider?

    • @MLForNerds
      @MLForNerds  1 year ago

      Consider the highest confidence score and its corresponding object.

  • @benna_plusplus
    @benna_plusplus 11 months ago +1

    Thank you for the video, very well detailed. I have a question: how does YOLO create 2 bounding boxes for each cell? By randomly creating the coordinates? This is still not clear to me.

    • @MLForNerds
      @MLForNerds  9 months ago

      Yes, correct. Box coordinates are learned as regression parameters.

  • @nayabwaris-pl8lj
    @nayabwaris-pl8lj 7 months ago +1

    Please make videos on the remaining YOLO variants soon.

  • @thuytran2880
    @thuytran2880 1 year ago +2

    Thank you, you helped me so much. But can I ask you a question? I tried to learn about YOLOv1 from the paper, websites and so on, but I didn't find any source as detailed as your video. Could you please share how you search for material and gain this deep understanding? I will be very, very happy if you see my comment and reply to me.

    • @MLForNerds
      @MLForNerds  1 year ago +1

      Yes, of course! Read the paper and look into the code implementation to understand it in detail. Once you look at the implementation, most of your doubts get clarified. Hope it helps!

    • @tttmgwl
      @tttmgwl 1 year ago

      Thank you very much❤❤, i will try reading the code

  • @pratikpatil2866
    @pratikpatil2866 5 months ago

    Could you please mention the source of the mathematical explanations? It would be a great help.

  • @akarshjain7141
    @akarshjain7141 1 year ago +1

    Why is IoU not taken into account when selecting one of the 2 predicted bounding boxes?

    • @MLForNerds
      @MLForNerds  1 year ago +1

      During prediction, there is no groundtruth, how can we calculate IOU?

    • @ZakiMubarak-wk1vl
      @ZakiMubarak-wk1vl 1 year ago

      @MLForNerds Then when do we use IoU?
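
As context for this question: in the YOLOv1 paper, IoU against the ground truth is used during training (to pick which of a cell's predictors is responsible and as the target for the confidence score), and IoU between predictions is used at inference inside non-maximal suppression. The computation itself can be sketched as below; the function name `iou` and the corner format `(x1, y1, x2, y2)` are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that share a 1x1 corner: intersection 1, union 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

During training, the same function would be called once per predicted box against the ground-truth box of the cell, and the predictor with the larger value treated as responsible.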

  • @mdabdullahalhasib1730
    @mdabdullahalhasib1730 11 months ago +1

    Please release all the versions of YOLO. Thanks

  • @matecaste
    @matecaste 1 year ago

    I have two questions. If YOLO predicts two boxes, how do you create the label? Do you repeat (x,y,w,h,c) two times? And finally, what would you do when creating the label if the centers of two objects are in the same cell? Thank you, NICE VIDEO!!

  • @asthapatidar5507
    @asthapatidar5507 11 months ago +1

    How is the center of the object marked for calculating the target?

    • @MLForNerds
      @MLForNerds  9 months ago

      That happens during training. You can calculate the center from the bounding box obtained from the GT.

  • @srivaasjaideep522
    @srivaasjaideep522 9 months ago +1

    How is the center of the object detected?

    • @MLForNerds
      @MLForNerds  9 months ago +1

      From the groundtruth box, we can calculate the center of the object. It's used to identify which grid is responsible for detecting that object.

  • @karthickmurugan8478
    @karthickmurugan8478 3 months ago

    Please explain more yolo versions from yolov5

  • @luansouzasilva31
    @luansouzasilva31 8 months ago +1

    If only one grid cell is labeled as class X, how does it get the bbox for the entire object?

    • @MLForNerds
      @MLForNerds  8 months ago +1

      The grid cell is only for the box centre; the box dimensions are learned as regression parameters.
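
To make this reply concrete, here is a small sketch of how one cell's prediction can be decoded into a box that is much larger than the cell. It assumes the encoding described elsewhere in this thread (x and y as offsets inside the cell, w and h as fractions of the image) and that the network outputs w and h directly. The function name `decode_box` and the example values are hypothetical.

```python
def decode_box(row, col, x_off, y_off, w_norm, h_norm, img_w, img_h, S=7):
    """Turn one cell's (x, y, w, h) prediction into pixel corner coordinates.

    x_off, y_off: center offset inside the cell, each in [0, 1].
    w_norm, h_norm: box size as a fraction of the whole image, each in [0, 1].
    """
    # The cell index plus the offset gives the center in grid units.
    cx = (col + x_off) / S * img_w
    cy = (row + y_off) / S * img_h
    # Size is independent of the cell, so it can span the entire image.
    w = w_norm * img_w
    h = h_norm * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# The center cell of a 448x448 image predicting a box that covers everything.
full_box = decode_box(3, 3, 0.5, 0.5, 1.0, 1.0, 448, 448)
```

The cell only fixes where the center lies; the width and height are free regression outputs, so a 64x64 cell can describe a 448x448 object.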

  • @prathameshdinkar2966
    @prathameshdinkar2966 1 year ago +1

    Very nicely explained!
    I have a doubt: what if there is more than one GT box center in one cell?

    • @neeru1196
      @neeru1196 1 year ago

      That's one of the limitations I guess. Each cell can only output one class.

    • @prathameshdinkar2966
      @prathameshdinkar2966 1 year ago

      @neeru1196 Ok thanks

  • @dmlane_sougata
    @dmlane_sougata 1 year ago

    Sir, can you please explain the YOLOv5 architecture?

  • @abbasjafarpour4506
    @abbasjafarpour4506 1 year ago

    Very informative, thank you.

  • @Raj-xz4vz
    @Raj-xz4vz 1 year ago +1

    How did we get the ground truth values here, i.e. 200, 311, 142, 250?

    • @MLForNerds
      @MLForNerds  1 year ago +1

      Groundtruth values are provided by dataset.

  • @dimitrisspiridonidis3284
    @dimitrisspiridonidis3284 2 years ago

    I often see people in other videos saying that width and height are relative to the grid cell, contrary to the paper, which clearly states they are relative to the image. Even Andrew Ng himself says in his courses that they are relative to the grid cell, meaning that width and height can be greater than 1. I wonder why everyone gets it wrong; maybe they changed it to be relative to the grid cell in the later papers.

    • @MLForNerds
      @MLForNerds  2 years ago +2

      Yes, but I checked a few implementations and they implement it as in the paper. Only x and y are encoded with respect to the grid cell. Width and height are just normalised by the image dimensions.
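
The encoding described in this reply can be sketched as follows. This is an illustration under stated assumptions, not code from the paper or the video: the ground-truth box is taken to be in center format `(cx, cy, w, h)` in pixels, and the function name `encode_box` and the example values are hypothetical.

```python
def encode_box(cx, cy, w, h, img_w, img_h, S=7):
    """Encode a center-format pixel box as a YOLOv1-style target.

    Returns (row, col, x_off, y_off, w_norm, h_norm) where x_off and y_off
    are offsets inside the responsible cell and w_norm and h_norm are
    fractions of the whole image.
    """
    # Center position in grid units, e.g. 1.56 means 56% into column 1.
    gx = cx / img_w * S
    gy = cy / img_h * S
    col = min(int(gx), S - 1)
    row = min(int(gy), S - 1)
    x_off = gx - col  # relative to the grid cell
    y_off = gy - row
    w_norm = w / img_w  # relative to the image, so never greater than 1
    h_norm = h / img_h
    return row, col, x_off, y_off, w_norm, h_norm


# Hypothetical box centered at (100, 300), 112x224 pixels, in a 448x448 image.
target = encode_box(100, 300, 112, 224, 448, 448)
```

Normalising w and h by the image keeps every regression target in the range 0 to 1, whereas a grid-relative encoding would give values up to S for an object that fills the image.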

  • @techie_gangwar
    @techie_gangwar 1 year ago +1

    Can you share the ppt? It's really helpful

    • @MLForNerds
      @MLForNerds  1 year ago

      github.com/MLForNerds/YOLO-OBJECT-DETECTION-TUTORIALS

  • @yasht1328
    @yasht1328 1 year ago +1

    Bro please upload YOLOv5 model as soon as possible 🙏

  • @jeffg4686
    @jeffg4686 9 months ago

    Grounding Dino, what do you guys need a refresher course?
    It's all YOLO World these days...
    th-cam.com/video/SjJYNZirQCU/w-d-xo.html