Thank you for this amazing series of DETR videos. It really helps me understand how transformers work for the object detection task.
Hey there, I'm really inspired by your work. There is currently a new SOTA DETR model called RT-DETR. Could you make a video about it?
Certainly, RT DETR is next on my list!
Though I believe RT-DETR is better in terms of the inference speed/model size to quality trade-off, its absolute accuracy is not better than Co-DETR's.
Dope!
Hey man! Great work.
I have a question that I searched online but couldn't find any intuitive answers to. Why do DETRs use separate backbones? Why not use a transformer-based backbone as both the backbone and the encoder?
I could think of the following reasons:
- Co-DETR input is multi-scale. A backbone encoder (like Swin-L) projects the image into smaller spatial dimensions in later layers, while for object detection we need all the little pixel-wise details. For this purpose, we take later outputs of the backbone as well as earlier ones, flatten them, concatenate them together and pass them to the DETR encoder, thus allowing information exchange between scales.
- Scale of the data: ViT was pretrained on the 300M-image JFT dataset, which probably cost millions of dollars. DETRs train on the much smaller COCO dataset with around 100k train images. In this regard, the DETR encoder can be seen as a smaller "adapter" to quickly finetune on a different target.
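The flatten-and-concatenate step from the first point can be sketched in a few lines. This is a shapes-only NumPy illustration (the map sizes assume a hypothetical 640x640 input and a 256-dim hidden size, not Co-DETR's actual configuration):

```python
import numpy as np

# Hypothetical multi-scale backbone outputs for a 640x640 image,
# with channels already projected to a common hidden size of 256.
hidden = 256
feature_maps = {
    8:  np.zeros((hidden, 80, 80)),   # stride-8 map keeps small-object detail
    16: np.zeros((hidden, 40, 40)),
    32: np.zeros((hidden, 20, 20)),
}

# Flatten each (C, H, W) map into (H*W, C) tokens and concatenate all
# scales into one sequence, so the encoder can attend across scales.
tokens = np.concatenate(
    [fmap.reshape(hidden, -1).T for fmap in feature_maps.values()], axis=0
)
print(tokens.shape)  # (80*80 + 40*40 + 20*20, 256) = (8400, 256)
```

The point of the concatenation is that a single attention layer over `tokens` mixes information between strides, which a per-scale CNN head would not do.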
Great answer. I think the fusion of CNNs and transformers is really hyped right now because you get benefits from both worlds: the inductive bias plus smaller & faster models from CNNs, and then the unbiased refinement of those features by (computationally expensive) transformers. You should have a look at parameter counts and training-time benchmarks for a ResNet-50 versus a ViT - the AP stays in a relatively moderate range while the cost differs a lot.
@@makgaiduk Actually that can't be the sole reason, because you can also extract intermediate outputs of the transformer encoder and create an FPN by downsampling later layers and upsampling earlier ones to get those 4/8/16/32-stride features. I have actually done that; it works pretty well. IMO the only intuitive reason I can come up with is that CNNs are better at aggregating local neighbourhood features, while transformers reason better globally. But then Faster R-CNN produces its best results with a Swin transformer backbone in documented experiments. I will try someday to train DETR without a backbone and see for myself how it pans out.
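For readers unfamiliar with the trick described above: a plain ViT emits a single-stride feature map, and a simple FPN (in the spirit of ViTDet) rebuilds the 4/8/16/32-stride pyramid from it by up/down-sampling. A minimal shapes-only NumPy sketch, with sizes assumed for a 640x640 input (real models would use learned deconv/conv layers rather than nearest-neighbour resizing):

```python
import numpy as np

# A plain ViT encoder emits one stride-16 map; rebuild the pyramid from it.
c, h, w = 256, 40, 40            # stride-16 map for an assumed 640x640 input
vit_out = np.zeros((c, h, w))

def upsample2x(x):
    # nearest-neighbour 2x upsampling along both spatial axes
    return x.repeat(2, axis=1).repeat(2, axis=2)

pyramid = {
    4:  upsample2x(upsample2x(vit_out)),  # 160x160
    8:  upsample2x(vit_out),              # 80x80
    16: vit_out,                          # 40x40
    32: vit_out[:, ::2, ::2],             # 20x20, strided downsample
}
for stride, fmap in pyramid.items():
    print(stride, fmap.shape)
```

Since all pyramid levels are derived from one stride-16 map, the fine levels contain no extra high-frequency detail - which is exactly why the comment's "CNNs aggregate local detail better" intuition is plausible.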
@@davidro00 The question was not about replacing ResNet with a transformer. The question is: why use ResNet at all, why not use the transformer encoder as both backbone and encoder?
@@saeedahmad4925 I see, my thought about this was:
Encoder (but with more layers) = ViT + encoder
But if you remove the backbone and don't scale up the encoder, that intuitively seems like way too low representational power to me. If you can prove this wrong, please tell me!
👍👍
Can I ask for your ppt?
github.com/adensur/blog/blob/main/computer_vision_zero_to_hero/28_CoDetr/presentation.key