DAB-DETR (Dynamic Anchor Boxes)

  • Published on 17 Oct 2024

Comments • 10

  • @eliaweiss1 · 6 months ago · +1

    Great content!
    Some remarks:
    * You say that the residual connection makes 'half of the gradient flow to the anchor box', but that is not precise: the + operation (residual connection) passes the gradient through as is (i.e. not halved), and this is the key feature that helps the gradient propagate to the anchor (i.e. the lower layers), essentially avoiding vanishing gradients (see the sketch after these remarks).
    * The division by H, W is indeed strange; I would have expected a multiplication. Still, it is not intuitive how any operation on a sine embedding should reflect the width and height. Anyway, I just want to suggest that, for the network, it doesn't actually matter whether it's a division or a multiplication, since it will learn to treat it as needed according to the loss function. Maybe they chose division to keep the numbers in a limited range, and the learnable weights are just scalars that the network learns in order to (again) keep the numbers in a reasonable range.
    I'm no expert, so take these remarks with a grain of salt :)
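    A minimal PyTorch sketch of the gradient point (illustrative only; the variable names are made up): the identity path of y = x + f(x) delivers the upstream gradient to x unchanged, regardless of what f contributes.

    import torch

    anchor = torch.ones(4, requires_grad=True)   # stands in for the anchor box / lower-layer input
    layer = torch.nn.Linear(4, 4)                # stands in for the residual branch f(x)

    out = anchor + layer(anchor)                 # residual connection: out = x + f(x)
    out.sum().backward()

    # d(out)/d(anchor) = I + d(layer(anchor))/d(anchor), so the identity term
    # passes the upstream gradient to the anchor as is (not halved).
    print(anchor.grad)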

    • @eliaweiss1 · 6 months ago

      To me the query sine-embedding code seems a bit messy and ad hoc; I wouldn't be surprised if future research improves on this point.

    • @makgaiduk · 6 months ago

      Thanks for the clarification about the gradient!
      I think I understand the height/width modulation better after David's comment.
      The modulation happens because of Softmax properties:
      import torch

      x = torch.tensor([0, 0.25, 0.5, 0.75, 1, 0.75, 0.5, 0.25, 0])
      s = torch.nn.Softmax(dim=0)
      s(x)
      # tensor([0.0675, 0.0867, 0.1113, 0.1429, 0.1834, 0.1429, 0.1113, 0.0867, 0.0675])
      s(x / 2)  # dividing the logits flattens the distribution
      # tensor([0.0878, 0.0995, 0.1127, 0.1277, 0.1447, 0.1277, 0.1127, 0.0995, 0.0878])
      s(x * 2)  # multiplying the logits sharpens it
      # tensor([0.0369, 0.0609, 0.1004, 0.1655, 0.2728, 0.1655, 0.1004, 0.0609, 0.0369])
      Without Softmax, multiplying or dividing a tensor by a constant wouldn't change the relative ratios between coordinates. With Softmax it does, and to achieve the correct modulation we indeed need the height/width in the denominator: with a large height/width in the denominator, the relative ratios between coordinates become smaller after Softmax, effectively making the attention more spread out around the same center; with a small height/width in the denominator, the ratios become starker, making the attention more focused around the central point.

  • @davidro00 · 8 months ago · +1

    Regarding the width and height modulation: I think of it a bit like they obtain a relational vector between the w and h of the content query and of the anchor box via the division. They then multiply it element-wise to scale the attention map (or, more specifically, the positionally embedded input). Essentially, by dividing the content-query height by the anchor-box height, you get a scaling factor that is used to modulate the positional embeddings in the transformer block BEFORE the softmax. This can increase or decrease the similarity between key and query and thus can be optimized during training (rough sketch below).
    At least this is how I think about it, but it could have been fleshed out a bit more in their paper 😅
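    A rough sketch of that idea (illustrative only; the actual DAB-DETR formulation differs in its details): the positional dot-product logits are scaled by a hypothetical width ratio before the softmax, which changes how spread out the attention is around the same center.

    import math
    import torch

    def sine_embed(pos, dim=64, temp=10000):
        # standard 1-D sine/cosine positional embedding (DETR-style 2*pi scaling)
        i = torch.arange(dim // 2, dtype=torch.float32)
        freq = temp ** (2 * i / dim)
        angle = 2 * math.pi * pos / freq
        return torch.cat([torch.sin(angle), torch.cos(angle)], dim=-1)

    query_x = torch.tensor(0.5)                  # anchor center, normalized to [0, 1]
    keys_x = torch.linspace(0, 1, 9)             # candidate key positions
    logits = torch.stack([sine_embed(query_x) @ sine_embed(k) for k in keys_x])

    softmax = torch.nn.Softmax(dim=0)
    for w_ratio in (0.5, 1.0, 2.0):              # hypothetical scaling factor, e.g. w_ref / w
        attn = softmax(logits * w_ratio)
        print(w_ratio, attn)                     # smaller ratio -> flatter attention, same peak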

    • @makgaiduk · 8 months ago

      Nice point! So the softmax seems to be the key here.

    • @davidro00 · 8 months ago · +1

      @makgaiduk Yeah, I think that makes the most sense 👍🏼

  • @A.El-Taher · 7 months ago · +1

    Great content ❤
    DETR can be used for instance segmentation if we add a mask head ... can the same be done with Deformable DETR and DAB-DETR?

    • @makgaiduk · 7 months ago · +1

      Great question!
      Looks like it can: arxiv.org/pdf/2206.02777.pdf
      DINO is a well-known object detection model that held SOTA status for some time. It uses both dynamic anchor boxes and deformable attention, as well as a new technique that I am about to cover in the next video: query denoising.
      Mask DINO builds on top of that by adding a mask head and modifying some components slightly to make them fit the segmentation task better. It still uses both dynamic anchor boxes and deformable attention as key components in the decoder.

    • @A.El-Taher · 7 months ago · +2

      @makgaiduk Exciting 🎉
      I'm waiting for the next video 😊

  • @makgaiduk · 6 months ago

    Check out my next video, reading the DAB-DETR source code: th-cam.com/video/eClBoEnn9k4/w-d-xo.html