Reading CoDETR source code - part 1 of 2

This video dives into the source code of CoDETR, the current SOTA object detection model.
Part 1 covers the setup, the debugging setup, and the transformer part of the model: encoder, decoder, deformable attention, one-to-one (Hungarian) matching, and loss.
Part 2 will cover the auxiliary heads: region proposal network, ROI heads, and ATSS, how they are used to boost encoder training, and the auxiliary queries setup that boosts decoder training.
Important links:
- Original paper: arxiv.org/pdf/2211.12860
- Source code: github.com/Sense-X/Co-DETR
- Jupyter notebook shown in the video: github.com/adensur/blog/blob/main/computer_vision_zero_to_hero/28_CoDetr/sandbox.ipynb
00:00 - Intro
04:47 - Setup, Dataloaders
11:06 - Transformer part of the model
15:00 - Backbone
20:44 - Positional Encoding
24:32 - Valid ratios and Reference points
30:26 - Encoder
33:35 - Multi-Scale Deformable Attention
43:30 - (Mixed) Query Selection
51:15 - Reference Points for Decoder
52:34 - Decoder code
58:53 - Converting Decoder output to Detections
01:02:26 - Loss, Hungarian Matching
01:10:52 - Outro
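
The one-to-one matching step listed above can be illustrated with a small sketch. This is not the code from the Co-DETR repository: the cost values are made up, the function name `match_one_to_one` is hypothetical, and the search is a brute-force enumeration rather than the Hungarian algorithm itself. DETR-style implementations typically call `scipy.optimize.linear_sum_assignment` for this, which returns the same optimal assignment far more efficiently.

```python
from itertools import permutations


def match_one_to_one(cost):
    """Return (total_cost, pairs) for the cheapest one-to-one assignment.

    cost[q][g] is an illustrative cost of matching query q to ground-truth
    object g. Assumes there are at least as many queries as objects.
    """
    num_queries, num_gt = len(cost), len(cost[0])
    best_total, best_pairs = float("inf"), []
    # Try every ordered choice of distinct queries, one per ground-truth object.
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_pairs = sorted((q, g) for g, q in enumerate(queries))
    return best_total, best_pairs


# Made-up costs: 3 queries (rows) vs. 2 ground-truth objects (columns).
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.6],
]
total, pairs = match_one_to_one(cost)
print(pairs)  # (query, ground-truth) index pairs; unmatched queries count as "no object"
```

Each ground-truth object gets exactly one query, and the remaining queries are left unmatched, which is what makes the matching one-to-one.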

Views: 37

Videos

39:27

CoDETR - SOTA object detection with transformers

518 views, 28 days ago

This video talks about CoDETR, the current state-of-the-art transformer-based model that builds on previous generations, such as DETR, Deformable DETR and DINO, by adding extra "auxiliary heads" during training while introducing no overhead during inference. This video focuses on explaining the model itself. My next video will be a "source code read" with more in-depth explanations a...

1:51:06

Multiple Object Tracking Metrics - MOTA, IDF1, HOTA. Algorithm and source code reading

547 views, several months ago

This video takes a deep dive into the metrics used for assessing trackers for multiple object tracking. It covers the motivations behind them, the pros and cons of earlier metrics (MOTA/IDF1), and the actual metric algorithm. The second half of the video goes over the official source code. Important links: - Metrics repo: github.com/JonathonLuiten/TrackEval - MOT challenge home - the video primarily talks...
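
For orientation, the two earlier metrics named above can be written in a few lines. This is a minimal sketch using the standard textbook definitions, not code taken from the TrackEval repository. The function and argument names are hypothetical, and the counts are assumed to be already summed over all frames.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Made-up counts, for illustration only.
mota_score = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
idf1_score = idf1(idtp=80, idfp=10, idfn=30)
print(mota_score, idf1_score)
```

MOTA is driven mostly by detection errors, while IDF1 is driven by how consistently identities are preserved.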

1:34:06

Reading MOTR source code

232 views, 2 months ago

This video talks about MOTR, one of the modern transformer-based models for multiple object tracking. The focus of the video is on the source code. Check out my previous video for the algorithm explanations! Important links: - My previous video about MOTR: th-cam.com/video/jK6xsNbnE-8/w-d-xo.html - Installation guide and notebook shown in the video: github.com/adensur/blog/tree/main/Object Tra...

48:20

MOTR - Object Tracking with Transformers

287 views, 2 months ago

This video is about the MOTR model, one of the modern attempts to perform multiple object tracking end-to-end with transformers, the other notable attempt being TrackFormer. This is the first video about MOTR; it focuses on the algorithm, its similarities to and differences from TrackFormer, and the results. The second (upcoming) video will focus more on the source code. Important links: - MOTR paper arxiv.org...

1:41:54

Reading Trackformer source code

305 views, 3 months ago

This video talks about the source code of TrackFormer, a multi-object tracking model based on Transformers. Important links: - Source code: github.com/timmeinhardt/trackformer/tree/main - My setup instructions & fixes: github.com/adensur/blog/blob/main/Object Tracking/02_reading_trackformer_source_code/Installation.md - Jupyter notebook from the video: github.com/adensur/blog/blob/main/Object Tr...

46:31

Trackformer - Multi-Object Tracking with Transformers

396 views, 3 months ago

This video talks about Trackformer, a model built on the Detr object detector that performs object tracking end-to-end. Some links: - Original paper arxiv.org/pdf/2101.02702 - My previous video about Detr th-cam.com/video/A2f4w54fSsM/w-d-xo.html - Challenge home motchallenge.net/ 00:00 - Intro 01:07 - Multi Object Tracking task overview 10:01 - Detr overview 21:38 - Trackformer Method 28:07 - Trackfo...

1:29:52

Reading Grounding Dino source code

497 views, 3 months ago

This video talks about Grounding Dino, Dino's "open set" object detection brother, which can detect objects from novel categories zero-shot and can also detect objects from referring expressions like "the lion most to the right". This video is part of a broader series: Modern Object Detection - from YOLO to Transformers th-cam.com/play/PL1HdfW5-F8AQlPZCJBq2gNjERTDEAl8v3.html. Check out this ...

55:14

Grounding Dino for open set object detection

763 views, 3 months ago

1:20:24

Reading BERT source code

186 views, 4 months ago

This video talks about BERT, the second model in my "modern NLP" list, the first being GPT. By using bidirectional attention, it loses the ability to generate language but becomes more useful in old-school tasks requiring regression or classification based on text: question answering, search, text classification, etc. The video explains the main concepts of BERT and shows an example implementation of BE...
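
The contrast between GPT-style causal attention and BERT's bidirectional attention can be sketched as a boolean attention mask. This is an illustrative toy, not the implementation shown in the video; the function name `attention_mask` is hypothetical.

```python
def attention_mask(seq_len, causal):
    """Return mask[i][j], True when position i may attend to position j.

    causal=True  -> each token sees only itself and earlier tokens (GPT-style).
    causal=False -> every token sees every token (BERT-style, bidirectional).
    """
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


gpt_style = attention_mask(3, causal=True)
bert_style = attention_mask(3, causal=False)

for row in gpt_style:
    print(row)
```

Because every position in the bidirectional mask can see tokens to its right, the model cannot be used directly for left-to-right generation, which is the trade-off the description refers to.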

1:05:25

Reading DINO source code - DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

619 views, 4 months ago

This video is focused on reading the source code of the official implementation of the DINO model in PyTorch. This video is part of a broader series: Modern Object Detection - from YOLO to Transformer th-cam.com/play/PL1HdfW5-F8AQlPZCJBq2gNjERTDEAl8v3.html. Check out this playlist for other object detection videos, including source code reads for DINO's predecessors - DETR, Deformable DETR, DAB DETR ...

1:19:54

Reading GPT-2 source code

697 views, 4 months ago

This video talks about GPT and GPT2, "Generative PreTraining" models that first proved to the world that solving the language modeling task with transformers is a good way to create models capable of a wide variety of tasks. They also provide a good starting point for learning modern NLP, because they are really simple in concept and implementation. This video is part of a broader series about NLP...

41:10

DINO - DETR with Improved DeNoising AnchorBoxes for End-to-End Object Detection

1.5K views, 4 months ago

This video talks about DINO - the first state-of-the-art, Detr-like, transformer-based model. This video is part of a broader series: Modern Object Detection - from YOLO to Transformer th-cam.com/play/PL1HdfW5-F8AQlPZCJBq2gNjERTDEAl8v3.html. The model itself builds on top of the concepts introduced in Detr, Deformable Detr, DAB Detr and DN Detr, improving on them and remixing them to achieve supe...

56:07

Reading SWIN transformer source code - Image Recognition with Transformers

585 views, 5 months ago

This video goes through the source code of the PyTorch "vision" implementation of the SWIN image recognition model. This is not the original implementation from the paper, but rather a "torchvision" reimplementation that attempts to follow the original as closely as possible and achieves the same results. This is another video from my "Modern Object Detection" series: th-cam.com/play/PL1HdfW5-F8AQlPZCJBq2gN...

27:30

SWIN transformer (image recognition)

461 views, 5 months ago

This video talks about SWIN transformer - a model trained for image classification, but also used in a variety of tasks as a backbone, replacing ResNet/ViT. It is currently the main part of SOTA object detection models like DINO. This is another video from my "Modern Object Detection" series: th-cam.com/play/PL1HdfW5-F8AQlPZCJBq2gNjERTDEAl8v3.html Important links: - Original paper: arxiv.org/pd...