It was very comprehensive, thanks a lot Soroush
- 0:00: The video discusses the ViTPose paper, which currently leads 2D pose estimation on the MS COCO dataset.
- 0:13: Previous attempts to use Transformers for 2D pose estimation include TransPose and TokenPose.
- 0:26: TransPose uses a CNN backbone to extract local features from the input image and a Transformer encoder to reason about the skeleton keypoints.
- 0:58: TokenPose takes a similar approach but adds extra tokens to represent missing or occluded keypoints.
- 1:33: Another attempt, HRFormer, combines Transformer blocks and convolutional blocks for downsampling and upsampling.
- 2:11: ViTPose simplifies the pipeline by using only Transformers, making the problem easier to handle.
- 2:21: ViTPose uses a Transformer encoder to turn an input image into tokens.
- 3:50: ViTPose offers two decoder options: a classic decoder and a simple decoder.
- 6:15: ViTPose allows multi-dataset training, using a different decoder for each dataset.
- 7:03: The video presents the ViTPose variants, base, large, huge, and gigantic, which differ in the number of layers and the channel size.
- 7:27: The video discusses the simplicity and scalability of ViTPose.
- 8:33: The video discusses the influence of pre-training data on the performance of ViTPose.
- 10:11: The video discusses the influence of input resolution on the performance of ViTPose.
- 11:32: The video discusses the influence of attention type on the performance of ViTPose.
- 14:55: The video discusses the influence of partial fine-tuning on the performance of ViTPose.
- 16:02: The video discusses the influence of multi-dataset training on the performance of ViTPose.
- 16:21: The video discusses the use of knowledge distillation to improve the generalizability of the model.
- 21:12: The video presents the results of ViTPose in comparison with other models for 2D pose estimation on the MS COCO dataset.
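The encoder/decoder pipeline summarized above (2:21 and 3:50) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual configuration: the depth, embedding dimension, and head count are made-up small values, and the "simple decoder" is approximated as bilinear upsampling plus a single convolution.

```python
import torch
import torch.nn as nn

class SimpleViTPose(nn.Module):
    """Minimal sketch of the ViTPose idea: a plain ViT-style encoder turns
    image patches into tokens; a lightweight decoder maps the token grid to
    keypoint heatmaps. Dimensions here are illustrative assumptions."""

    def __init__(self, img_size=(256, 192), patch=16, dim=256, depth=4,
                 heads=8, num_keypoints=17):
        super().__init__()
        # Patch embedding: one strided conv splits the image into tokens.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # "Simple decoder" stand-in: upsample the feature map, then predict
        # one heatmap channel per keypoint.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dim, num_keypoints, kernel_size=3, padding=1),
        )

    def forward(self, x):
        tokens = self.patch_embed(x)                # (B, dim, H/16, W/16)
        b, c, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens)
        feat = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(feat)                   # (B, K, H/4, W/4)

model = SimpleViTPose()
heatmaps = model(torch.randn(1, 3, 256, 192))
print(heatmaps.shape)  # torch.Size([1, 17, 64, 48])
```

The point of the sketch is the simplicity the video emphasizes: there is no CNN backbone and no convolutional down/upsampling stages inside the encoder, just patchify, attend, and decode.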
Positive Learnings:
- ViTPose simplifies 2D pose estimation by using only Transformers.
- Using a Transformer encoder to create tokens from an input image has proven effective.
- The different variants, base, large, huge, and gigantic, scale up the performance of ViTPose.
- Pre-training data can improve the performance of ViTPose.
- Knowledge distillation can improve the generalizability of the model.
Negative Learnings:
- Previous attempts to use Transformers for 2D pose estimation, such as TransPose and TokenPose, had limitations.
- The use of a CNN backbone in TransPose limits its effectiveness.
- TokenPose's use of extra tokens to represent missing or occluded keypoints is not the most efficient approach.
- HRFormer's combination of Transformer blocks and convolutional blocks for downsampling and upsampling makes it complicated.
- Partial fine-tuning can negatively affect the performance of ViTPose.
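The multi-dataset training mentioned at 6:15 can be pictured as one shared backbone with a separate lightweight decoder per dataset. The sketch below is a hypothetical illustration: the dataset names and keypoint counts are assumptions, and the backbone is a trivial stand-in for the real Transformer encoder.

```python
import torch
import torch.nn as nn

class MultiDatasetPose(nn.Module):
    """Sketch of the multi-dataset idea: a single shared feature extractor,
    plus one small decoder head per dataset (each skeleton may have a
    different number of keypoints)."""

    def __init__(self, backbone, dim=256, keypoints_per_dataset=None):
        super().__init__()
        self.backbone = backbone  # shared encoder producing (B, dim, H, W)
        # One decoder per dataset, selected by name at forward time.
        self.decoders = nn.ModuleDict({
            name: nn.Conv2d(dim, k, kernel_size=3, padding=1)
            for name, k in (keypoints_per_dataset or {}).items()
        })

    def forward(self, x, dataset):
        feat = self.backbone(x)
        return self.decoders[dataset](feat)  # heatmaps for that skeleton

# Stand-in backbone: a single strided conv instead of a real ViT encoder.
backbone = nn.Conv2d(3, 256, kernel_size=16, stride=16)
model = MultiDatasetPose(backbone,
                         keypoints_per_dataset={"coco": 17, "mpii": 16})
out = model(torch.randn(1, 3, 256, 192), "mpii")
print(out.shape)  # torch.Size([1, 16, 16, 12])
```

During training, batches from each dataset would be routed through the matching head while gradients from all datasets update the shared backbone.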
Congratulations, a perfect and neat job
Thank you for your videos! They are a great way to keep up with the state of the art.
Glad you enjoyed it
Hey, I'm wondering how to train ViTPose myself. Have you happened to train it yourself? If so, could you share your experience?
Great job!
Thank you for this interesting video. It would be interesting to see bottom-up pose estimation using Transformers, like ED-Pose. ViTPose is top-down, so (a) inference time increases with the number of people, and (b) it cannot handle overlapping-person scenarios well.
Thanks for the feedback. I didn't know about ED-Pose; I will surely read it soon.
That was great. Thanks.
Thanks for the feedback
nice job
Thanks
How do they detect poses from heatmaps for, say, k people?
nevermind it doesn't detect multiple poses
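For context on the exchange above: a top-down model like ViTPose predicts one set of heatmaps per detected person crop, and each keypoint is typically read off its heatmap by taking the peak location. A minimal sketch of that decoding step (my own illustration, not code from the paper):

```python
import torch

def heatmaps_to_keypoints(heatmaps):
    """Decode (B, K, H, W) heatmaps into (x, y) coordinates per keypoint by
    taking each channel's argmax. Top-down pipelines run this once per
    detected person crop, so k people means k forward passes."""
    b, k, h, w = heatmaps.shape
    flat = heatmaps.view(b, k, -1)
    conf, idx = flat.max(dim=-1)                       # peak value + flat index
    ys = torch.div(idx, w, rounding_mode="floor").float()
    xs = (idx % w).float()
    return torch.stack([xs, ys], dim=-1), conf         # (B, K, 2), (B, K)

# Toy example: one 17-keypoint heatmap stack with a single planted peak.
hm = torch.zeros(1, 17, 64, 48)
hm[0, 0, 10, 20] = 1.0  # keypoint 0 peaks at x=20, y=10
coords, conf = heatmaps_to_keypoints(hm)
print(coords[0, 0])  # tensor([20., 10.])
```

Real implementations usually refine the integer argmax with sub-pixel offsets and map the crop coordinates back into the full image, but the core idea is this per-channel peak search.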
Please give me the code.