Lecture 15: Object Detection

  • Published on 25 Jan 2025

Comments • 44

  • @zhaobryan4441
    @zhaobryan4441 11 months ago +4

    This is the best lecture I have seen since SICP. So beautiful.

  • @glowish1993
    @glowish1993 1 year ago +1

    thank you for posting such high-quality lectures online for free!! amazing lecturer, slides and content

  • @neilteng1735
    @neilteng1735 3 years ago +18

    Really love this step-by-step walkthrough! Huge improvement over the 2017 cs231n course!

  • @sachavanweeren9578
    @sachavanweeren9578 2 years ago +4

    Great lecture, very well explained, step by step. Maybe the best I have found so far.

  • @tunaipm
    @tunaipm 3 years ago +6

    Another amazing class! I look forward to watching the updated version describing the use of Transformers in the coming years. Thank you Dr. Justin.

    • @terrelldean9481
      @terrelldean9481 3 years ago

      I know it's quite off topic but does anyone know of a good site to watch new series online?

    • @chiendvhust8122
      @chiendvhust8122 2 years ago

      @@samuelimran3429 Can you send a link? I searched Google but didn't find anything :(

    • @sagniksinha5831
      @sagniksinha5831 5 months ago

      @@chiendvhust8122 latest videos are not publicly available

  • @kamranmehdiyev8561
    @kamranmehdiyev8561 1 year ago +1

    57:53 should be "from anchor box to proposal box"

  • @DED_Search
    @DED_Search 3 years ago +2

    49:59 How exactly is the RoI projected onto the feature map? 50:10 Does snapping the projection to the feature-map grid affect the transformation parameters of the bounding-box regression?

    • @itchainx4375
      @itchainx4375 1 year ago

      No, I think you have misunderstood. The box was obtained using heuristic methods on the original picture. The convnet can be seen as a transformation: it converts the cat picture into a feature map, and that conversion is the projection.
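
For readers with the same question, here is a minimal sketch of the projection step, assuming the backbone downsamples the image by a fixed total stride (32 here, a hypothetical value) and that snapping uses floor/ceil rounding, which is only one of several possible conventions. The function name `project_roi` is made up for illustration. As far as I understand it, the regression targets in Fast R-CNN are still defined relative to the original proposal box in image coordinates, so snapping changes which features get pooled rather than the meaning of the transform parameters.

```python
import math

def project_roi(box, stride=32, snap=True):
    """Map an (x1, y1, x2, y2) box from image pixels to feature-map cells."""
    # Dividing by the total stride converts pixel coordinates to grid coordinates.
    x1, y1, x2, y2 = (c / stride for c in box)
    if snap:
        # RoI Pool style: round outward to whole cells (assumed convention).
        return (math.floor(x1), math.floor(y1), math.ceil(x2), math.ceil(y2))
    # RoI Align style: keep fractional coordinates for bilinear sampling later.
    return (x1, y1, x2, y2)

snapped = project_roi((100, 60, 420, 300))
exact = project_roi((100, 60, 420, 300), snap=False)
```

The gap between `snapped` and `exact` is the misalignment that RoI Align is meant to avoid.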

  • @davidrwasserman
    @davidrwasserman 5 months ago

    When we compute the average precision (42:52) is this for one image? a batch? the whole training set?

    • @lumin-ec1mf
      @lumin-ec1mf 3 months ago

      all test images
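
A minimal sketch of that computation for a single category, assuming the detections from all test images have already been pooled and each one has been flagged as a true or false positive by IoU matching (that matching step is not shown). It uses the plain, uninterpolated area under the precision-recall curve; benchmarks such as PASCAL VOC and COCO use interpolated variants, so exact numbers can differ. No score threshold is fixed in advance: the loop sweeps down the ranked list, so every detection counts as a positive prediction at some point of the curve.

```python
def average_precision(scores, is_tp, num_gt):
    """Uninterpolated AP for one category.

    scores: confidence of each detection, pooled over all test images
    is_tp:  True if the detection matched a previously unmatched ground-truth box
    num_gt: number of ground-truth boxes of this category in the test set
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if is_tp[i]:
            true_pos += 1
            precision = true_pos / rank
            # Recall rises by 1/num_gt at each true positive.
            ap += precision / num_gt
    return ap

# Five detections, three ground-truth boxes; the last detection is correct,
# so precision at full recall is 3/5.
ap = average_precision([0.99, 0.95, 0.90, 0.50, 0.10],
                       [True, True, False, False, True], num_gt=3)
```

Mean average precision is then the mean of this value over categories.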

  • @DED_Search
    @DED_Search 3 years ago

    59:00 I don’t quite get the 2k anchor (2 scores) vs 1k (1 score) part. Hmmm
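
One way to see why the two options are interchangeable: a two-way softmax over (background, object) scores gives the same object probability as a sigmoid applied to the difference of the two scores. So predicting 2 scores per anchor (2K per position) or 1 score per anchor (K per position) describes the same classifier. The sketch below only checks that identity numerically; the function names are made up for illustration.

```python
import math

def softmax_object_prob(bg_score, obj_score):
    """P(object) from two scores per anchor (the 2K-output variant)."""
    m = max(bg_score, obj_score)  # subtract the max for numerical stability
    e_bg = math.exp(bg_score - m)
    e_obj = math.exp(obj_score - m)
    return e_obj / (e_bg + e_obj)

def sigmoid_object_prob(logit):
    """P(object) from a single score per anchor (the K-output variant)."""
    return 1.0 / (1.0 + math.exp(-logit))

p_two_scores = softmax_object_prob(0.3, 1.7)
p_one_score = sigmoid_object_prob(1.7 - 0.3)
```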

  • @DED_Search
    @DED_Search 3 years ago

    42:12 I am really confused about why all dog detections are considered positive here (precision = 3/5)? Shouldn’t we set a threshold? Thanks.

  • @satyamgaba
    @satyamgaba 2 years ago

    31:20 The purple box should be the union of both boxes. Here it is overflowing.
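
For reference, a minimal IoU sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2). The union is the area covered by either box, not the smallest rectangle enclosing both, and it is computed as the sum of the two areas minus the intersection.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Union area: both areas, with the overlap counted only once.
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```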

  • @TomChenyangJI
    @TomChenyangJI 6 months ago

    I watched a lecture on RNNs that he delivered on the Stanford channel on YouTube. That was good.

  • @mailoisback
    @mailoisback 3 years ago +1

    He is a great lecturer!

  • @NielsRogge
    @NielsRogge 4 years ago +4

    Looking at this coming from NLP, NLP seems so much easier: you just have a Transformer with a sequence classification/token classification head on top. Here you have a very complex way of computing mAP, region proposals, a non-maximum suppression procedure, anchor generation... Luckily, the introduction of DETR by Facebook AI (which replaces a lot of these handcrafted features with a Transformer that learns everything end-to-end) seems really refreshing :)

  • @daitran8266
    @daitran8266 3 years ago

    Thank you very much for sharing these useful resources.

  • @DED_Search
    @DED_Search 3 years ago

    23:00 and 23:41 How is the learnt transformation invariant to the RoI warp? 1. Warping changes the height and width. 2. Warped RoIs are fed into the CNN. I'd appreciate it if anyone could shed some light here. Thanks.

    • @itchainx4375
      @itchainx4375 1 year ago

      Do you know the answer now? I have the same question.
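
A possible answer, sketched under the assumption that the transform is parameterized as in the R-CNN paper: offsets are divided by the proposal's width and height, and sizes are encoded as log ratios. Because everything is expressed relative to the proposal, stretching the proposal and the target box by the same factors (which is what warping the region to a fixed input size does) leaves the targets unchanged. The helper names `encode`, `decode`, and `stretch` are made up for illustration, and boxes are given as (cx, cy, w, h).

```python
import math

def encode(proposal, target):
    """Regression targets (tx, ty, tw, th) of a target box relative to a proposal."""
    px, py, pw, ph = proposal
    bx, by, bw, bh = target
    return ((bx - px) / pw, (by - py) / ph,
            math.log(bw / pw), math.log(bh / ph))

def decode(proposal, t):
    """Apply a transform (tx, ty, tw, th) to a proposal to get the output box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

def stretch(box, sx, sy):
    """Scale a box anisotropically, as a warp of the image region would."""
    cx, cy, w, h = box
    return (cx * sx, cy * sy, w * sx, h * sy)

proposal, target = (50.0, 40.0, 20.0, 10.0), (55.0, 42.0, 30.0, 12.0)
t_original = encode(proposal, target)
t_warped = encode(stretch(proposal, 3.0, 0.5), stretch(target, 3.0, 0.5))
```

Here `t_original` and `t_warped` agree up to floating-point error, which is the invariance in question.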

  • @wireghost897
    @wireghost897 1 year ago

    Great lecture. Thanks a lot.

  • @yahaisha
    @yahaisha 2 years ago

    best lecture..i like..tq

  • @shazzadhasan4067
    @shazzadhasan4067 2 years ago

    thank you for making available, amazing lec

  • @itchainx4375
    @itchainx4375 1 year ago

    1:04:13 where is yolo :)

  • @Davide-bx3js
    @Davide-bx3js 2 years ago

    Amazing lecture

  • @krishnatibrewal6640
    @krishnatibrewal6640 2 years ago +3

    Surprisingly there's no mention of YOLO which makes RCNN family obsolete

    • @zainbaloch5541
      @zainbaloch5541 2 years ago

      Yeah!

    • @itchainx4375
      @itchainx4375 1 year ago +1

      Seems like the teacher doesn't like YOLO. In the Winter 2022 lectures, not even a word about YOLO was mentioned.

    • @lifanzhong9782
      @lifanzhong9782 1 year ago

      yes I'm curious about it too. Only a flash of yolo paper reference at 1:03:57

  • @neelambujchaturvedi6886
    @neelambujchaturvedi6886 4 years ago

    Why do the authors of the RCNN paper use a log-scale transform to get the new scale factors for the width?
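
One likely reading, offered as an interpretation rather than a quote from the paper: the network's output is an unconstrained real number, so exponentiating it guarantees a positive width, and working in log space makes growing and shrinking symmetric (doubling the width gives +log 2, halving gives -log 2). With proposal width p_w, output width b_w, and predicted value t_w:

```latex
b_w = p_w \exp(t_w), \qquad t_w = \log\frac{b_w}{p_w},
\qquad
\log\frac{2\,p_w}{p_w} = \log 2, \quad \log\frac{p_w/2}{p_w} = -\log 2 .
```

The same form applies to the height.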

  • @lifanzhong9782
    @lifanzhong9782 1 year ago

    Thank you Justin!!

  • @zubaidaalsadi4313
    @zubaidaalsadi4313 11 months ago

    I can't download the slides. Is there any other way to get them?

    • @cc98-oe7ol
      @cc98-oe7ol 8 months ago

      The resolution of these slides is quite high, so their size often exceeds 100 MB. Maybe the network is the main issue.

  • @hehehe5198
    @hehehe5198 1 year ago

    Does anyone have a link to the 2020 version?

    • @davidrwasserman
      @davidrwasserman 5 months ago

      drive.google.com/drive/folders/1LXriM9h8WNJGErlYQXIrNNytAzVaHBjF?usp=sharing

  • @QuyetNguyen-sg9dq
    @QuyetNguyen-sg9dq 4 years ago +2

    thank you very much

  • @elkwang4357
    @elkwang4357 2 years ago

    Is Johnson the guy from Stanford University?

    • @geen160
      @geen160 1 year ago

      yessss

  • @phangb580
    @phangb580 9 months ago

    37:10

  • @harshdeepsingh3872
    @harshdeepsingh3872 6 months ago +1

  • @lukealexanderhwilson
    @lukealexanderhwilson 3 years ago

    I wonder if mean average precision could be calculated faster, while still accounting for the quality of the bounding boxes, by simply weighting the detections by their IoUs and using the result, instead of rerunning at many different thresholds and averaging.
    For example, a perfect mean average precision would (implausibly) mean that the first detections all correctly identify the detectable objects in the image and that they all have an IoU of 1.0. Essentially, rather than calculating the area under a 2D precision-recall curve and replotting it many times at various thresholds, we would calculate a 3D volume: the 2D plot of detections set against a third dimension that represents the IoU (or some weighted IoU, if that works better).
    It seems to me that this would achieve the same results more quickly and elegantly. If anyone knows more, I would love to hear about it!