
Vision Transformer and its Applications

  • Published Aug 17, 2024
  • The vision transformer is a recent breakthrough in the area of computer vision. While transformer-based models have dominated the field of natural language processing since 2017, CNN-based models were still demonstrating state-of-the-art performance on vision problems. Last year, a group of researchers from Google figured out how to make a transformer work on image recognition; they called it the "vision transformer". Follow-up work by the community has demonstrated the superior performance of vision transformers not only in recognition but also in other downstream tasks such as detection, segmentation, multi-modal learning, and scene text recognition, to name a few.
    In this talk, Rowel Atienza will build a deeper understanding of the model architecture of vision transformers. Most importantly, he will focus on the concept of self-attention and its role in vision. He will then present different model implementations that use the vision transformer as the main backbone.
    Since self-attention can be applied beyond transformers, Rowel Atienza will also discuss a promising direction for building general-purpose model architectures: networks that can process a variety of data formats such as text, audio, image, and video.
    → To watch more videos like this, visit aiplus.training ←
    Do You Like This Video? Share Your Thoughts in Comments Below
    Also, You can visit our website and choose the nearest ODSC Event to attend and experience all our Trainings and Workshops:
    odsc.com/calif...
    odsc.com/apac/
    Sign up for the newsletter to stay up to date with the latest trends in data science: opendatascienc...
    Follow Us Online!
    • Facebook: / opendatasci
    • Instagram: / odsc
    • Blog: opendatascienc...
    • LinkedIn: / open-data-science
    • Twitter: / odsc

Comments • 25

  • @jhjbm1959 • 8 months ago • +3

    This video provides a clear, step-by-step explanation of how to get from images to input features for Transformer encoders, which has proven hard to find anywhere else.
    Thank you.

  • @SarangBanakhede • 9 days ago

    10:58
    Scale Equivariance:
    Definition: A function is scale equivariant if a scaling (resizing) of the input results in a corresponding scaling of the output.
    Convolution in CNNs: Standard convolutions are not scale equivariant. This means that if you resize an object in an image (e.g., making it larger or smaller), the CNN may not recognize it as the same object. Convolutional filters have fixed sizes, so they may fail to detect features that are significantly larger or smaller than the size of the filter.
    Example: If a CNN is trained to detect a small object using a specific filter size, it might struggle to detect the same object when it appears much larger in the image because the filter is not capable of adjusting to different scales.
    Why is Convolution Not Scale Equivariant?
    The filters in a CNN have a fixed receptive field, meaning they look for patterns of a specific size. If the size of the pattern changes (e.g., due to scaling), the fixed-size filters may no longer detect the pattern effectively.
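    To make this concrete, here is a minimal sketch of the point (our illustration, assuming PyTorch; not from the talk): convolving a 2x-upscaled image with a fixed filter is not the same as 2x-upscaling the convolved image, so convolution is not scale equivariant.

```python
# Hypothetical demo (not from the talk): test whether conv(scale(x))
# equals scale(conv(x)) for a fixed 3x3 filter.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(1, 1, 32, 32)   # toy single-channel image
kernel = torch.randn(1, 1, 3, 3)    # fixed-size convolutional filter

def scale2x(x: torch.Tensor) -> torch.Tensor:
    """Bilinear 2x upscaling."""
    return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

conv_then_scale = scale2x(F.conv2d(image, kernel, padding=1))
scale_then_conv = F.conv2d(scale2x(image), kernel, padding=1)

# If convolution were scale equivariant, the two results would agree up to
# resampling error; in practice the difference is large.
print((conv_then_scale - scale_then_conv).abs().max().item())
```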

  • @crapadopalese • 1 year ago • +8

    10:46 - this is a mistake; convolution is not equivariant to scaling. If the bird is scaled, the output of the convolution will not simply be a scaling of the original output. That would only be true if you also rescaled the filters.

  • @PrestonRahim • 1 year ago • +5

    Super helpful. Was very lost on the process from image patch to embedded vector until I watched this.
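    For readers looking for that same step, the image-to-embedding pipeline can be sketched briefly. The snippet below is our assumption of the standard ViT recipe (not the speaker's code): split the image into non-overlapping P×P patches, flatten each patch, and project it linearly to the embedding dimension.

```python
# Hypothetical sketch of the standard ViT patch-embedding step.
import torch
import torch.nn as nn

B, C, H, W, P, D = 1, 3, 224, 224, 16, 768  # batch, channels, size, patch, dim

image = torch.randn(B, C, H, W)

# Unfold into non-overlapping P x P patches: (B, C, 14, 14, P, P)
patches = image.unfold(2, P, P).unfold(3, P, P)
# Flatten each patch into a vector: (B, 196, C*P*P)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * P * P)

embed = nn.Linear(C * P * P, D)   # learned linear projection
tokens = embed(patches)           # (B, 196, D)
print(tokens.shape)               # torch.Size([1, 196, 768])
```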

  • @DrAIScience • 3 months ago

    Very, very nice explanation! I like learning the foundations and origins of the concepts from which models are derived.

  • @ailinhasanpour • 1 year ago • +4

    Thanks for sharing, it was extremely helpful 💯

  • @xXMaDGaMeR • 1 year ago • +3

    amazing lecture, thank you sir!

  • @sahil-vz8or • 1 year ago • +1

    You said 196 patches for ImageNet data, but the number of patches depends on the input image size and the patch size. For example, if the input image is 400×400 and the patch size is 8×8, the number of patches will be (400×400)/(8×8) = 50×50 = 2500.
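    This arithmetic fits in a couple of lines; the helper below is hypothetical, using the 224×224 image and 16×16 patch defaults that yield the 196 patches mentioned in the talk.

```python
# Hypothetical helper: number of non-overlapping PxP patches in an HxW image.
def num_patches(height: int, width: int, patch: int) -> int:
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    return (height // patch) * (width // patch)

print(num_patches(224, 224, 16))  # 14 * 14 = 196 (ViT default on ImageNet)
print(num_patches(400, 400, 8))   # 50 * 50 = 2500 (the commenter's example)
```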

  • @rikki146 • 1 year ago • +1

    20:17 I think the encoder blocks are stacked in a parallel fashion rather than sequentially?

  • @DrAIScience • 3 months ago

    Do you have a video about BEiT or DINO?

  • @scottkorman4953 • 1 year ago • +4

    What exactly is happening in the self-attention and MLP blocks of the encoder module? Could you describe it in a simple way?
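    As a rough answer: each encoder block applies self-attention, which lets every patch token mix information from all other tokens, followed by a small MLP applied to each token independently, with layer norm and residual connections around both. Below is a minimal PyTorch sketch of the standard pre-norm ViT block (our assumption, not the speaker's code).

```python
# Hypothetical sketch of one pre-norm ViT encoder block:
# LayerNorm -> multi-head self-attention -> residual,
# then LayerNorm -> 2-layer MLP (GELU) -> residual.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 12, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention: every token attends to every other token.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # MLP: transforms each token independently.
        return x + self.mlp(self.norm2(x))

tokens = torch.randn(1, 197, 768)    # 196 patch tokens + 1 class token
print(EncoderBlock()(tokens).shape)  # torch.Size([1, 197, 768])
```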

  • @PRASHANTKUMAR-ze6mj • 1 year ago • +1

    thanks for sharing

  • @mohammedrakib3736 • 5 months ago

    Fantastic video! Really loved the detailed step-by-step explanation.

  • @DrAIScience • 3 months ago

    Are you the channel owner?

  • @user-co6pu8zv3v • 1 year ago

    Thank you, sir

  • @anirudhgangadhar6158 • 1 year ago

    Great resource!

  • @muhammadshahzaibiqbal7658 • 2 years ago

    Thanks for sharing.

  • @liangcheng9856 • 1 year ago

    awesome

  • @hoangtrung.aiengineer • 1 year ago

    Thank you for making such a great video

  • @capocianni1043 • 1 year ago

    Thank you for this genuine knowledge.

  • @saimasideeq7254 • 9 months ago

    Thank you, much clearer.
