Learning To Classify Images Without Labels (Paper Explained)

  • Published on Sep 10, 2024

Comments • 102

  • @BeyondTheBrink · 4 years ago +7

    The fact that you allowed us to participate in your confusion about the norm-not-norm issue is sooo valuable. Great fan of your work, thx!

  • @eternalsecretforgettingfor8525 · 4 years ago +14

    OUTLINE:
    0:00 - Intro & High-level Overview
    2:15 - Problem Statement
    4:50 - Why Naive Clustering Does Not Work
    9:25 - Representation Learning
    13:40 - Nearest-Neighbor-Based Clustering
    28:00 - Self-Labeling
    32:10 - Experiments
    38:20 - ImageNet Experiments
    41:00 - Overclustering

  • @twmicrosheep · 4 years ago +12

    Great explanations!
    The self-labeling step reminds me of the paper "ClusterFit: Improving Generalization of Visual Representations", which shows a lot of promising results by using pseudo labels from clustering to retrain a new classifier.

  • @gforman44 · 3 years ago +5

    This is very nice, and a nice explanation of it. It works so well in this paper partly because the input dataset is nicely separable into discrete clusters. Try this with photos from the wild, not cropped to put the frog/car/object in the center of the photo. Unsupervised, it's pretty unlikely that you'll get classes you like.

  • @katharinahochkamp5415 · 3 years ago +3

    I am currently bingeing your videos during my work hours - but, as a PhD student in this field, I don't even feel guilty, because I am learning so much. Great work, keep it up!

  • @Phobos11 · 4 years ago +6

    Cool! I was actually going to try doing this myself, exactly the same steps and all, unsupervised learning -> k-means -> self labeling. Awesome to see I wasn't so crazy after all, great explanation 😁

    • @tedp9146 · 4 years ago +1

      I also had something similar (and simpler) in mind before I watched this video: clustering the bottleneck-encodings of images. Surely that's been done before, but I haven't found any results on the internet.

    • @esthermukoyagwada8578 · 3 years ago

      @Victor Which dataset are you aiming to work with?

  • @MrAmirhossein1 · 4 years ago +31

    Thanks for the great content
    Honestly, the entire channel is an actual gold mine!
    Please keep up the excellent work :)

  • @dmc1308 · 6 months ago

    I'd been wandering around inside the paper for hours, so finding this vid is a big gift for me.

  • @kumarrajamani2135 · 1 year ago

    Wonderful video @Yannic. A couple of years back, during my postdoc, I learnt attention by going through your video on "Attention Is All You Need" and then started my research work building on the intuition I got. I now have a good idea of self-supervised learning!!!

  • @ruskinrajmanku2753 · 4 years ago +4

    There were some really interesting RL papers at ICLR'20. You should cover a few of them. Great explanation again, keep up this work!

    • @rongxinzhu · 4 years ago

      Can you provide some links? I'm really interested in those papers.

  • @Squirrelonsand · 3 years ago

    When I started watching the video, I was not sure if I'd be able to sit for 45 minutes to understand this paper, but thanks to your great explanation skills, I sailed through it...

  • @dippatel1739 · 4 years ago +17

    Label exists.
    Augmentation: I am about to end this man's career.

  • @MrPrakalp · 3 years ago +1

    Great paper review and explanation!! Thanks a ton!!! It definitely saved a lot of my time in reading and understanding the entire paper. Now it's easy to go back and implement things.

  • @myelinsheathxd · 1 year ago

    Amazing method! I hope RL can use this method for self-curiosity rewards; then there would be less need for manual rewards for a bunch of locomotion tasks.

  • @rahuldeora5815 · 3 years ago

    The point made in the last 30 seconds is such an important one. All hyperparameter choices are based on label information, making this more of a playground experiment than something robust.

  • @ehtax · 4 years ago +7

    Super helpful, keep up the great work Yannic! Your ability to filter out the intuitions makes you an incredible instructor.
    PS: what is the note-taking software you're using?

    • @YannicKilcher · 4 years ago +2

      OneNote, thanks :)

    • @IndranilBhattacharya_1988 · 4 years ago +2

      @@YannicKilcher Fantastic, good job, keep going. I myself, before reading a paper, look through your videos in case you have reviewed it already.

    • @shix5592 · 1 year ago

      @@IndranilBhattacharya_1988 Me too, very good channel.

  • @dippatel1739 · 4 years ago +10

    Summary of the paper (see the sketch after this thread):
    1. Learn a good embedding.
    2. Learn classes based on the embedding.
    3. (Extra) Use the learned classes to train a new NN.

    • @herp_derpingson · 4 years ago +2

      K-nearest neighbours but with neural networks
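
    A minimal toy sketch of the three stages (hedged: PCA stands in for the self-supervised embedding and k-means for the neighbor-based clustering; the paper itself uses SimCLR pretraining and the SCAN loss, not these stand-ins):

    ```python
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    X, _ = load_digits(return_X_y=True)  # ground-truth labels deliberately unused

    # 1. "Learn" an embedding (stand-in for contrastive pretraining).
    Z = PCA(n_components=32).fit_transform(X)

    # 2. Learn classes from the embedding.
    pseudo = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)

    # 3. (Extra) Train a fresh classifier on the pseudo-labels (self-labeling).
    clf = LogisticRegression(max_iter=1000).fit(X, pseudo)
    print("agreement with pseudo-labels:", clf.score(X, pseudo))
    ```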

  • @acl21 · 4 years ago +2

    Great explanation as always, thank you!
    It would have been even better if you had explained the evaluation metrics: ACC (clustering accuracy), NMI (normalized mutual information) and ARI (adjusted Rand index).
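
    For the latter two, a quick sketch with scikit-learn; both are permutation-invariant, so the cluster ids need not match the class ids:

    ```python
    from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

    y_true = [0, 0, 1, 1, 2, 2]  # ground-truth classes
    y_pred = [1, 1, 0, 0, 2, 2]  # cluster assignments (ids permuted)

    # Both scores hit 1.0 here: the clustering is perfect up to relabeling.
    print(normalized_mutual_info_score(y_true, y_pred))  # 1.0
    print(adjusted_rand_score(y_true, y_pred))           # 1.0
    ```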

  • @ShivaramKR · 4 years ago +1

    Don't worry so much about the mistakes you make. You are doing a great job!

  • @kapilchauhan9774 · 4 years ago +3

    Thank you for such an amazing overview.

  • @BanjiLawal · 1 year ago

    This is what I have been looking for

  • @herp_derpingson · 4 years ago +5

    This is too good to be true. I wouldn't be surprised if nobody is able to replicate this. But if it does work, it could open up a lot of possibilities in unexplored territories in computer vision.

    • @TijsMaas · 4 years ago +1

      Many hyperparams indeed. The authors claim code + configuration files will be released soon, which sounds really promising. Defining the class (dis)agreement on the embedding neighbourhood is a fine piece of representation learning 👌.

    • @simonvandenhende5227 · 3 years ago +1

      We released the code over here :) github.com/wvangansbeke/Unsupervised-Classification

    • @dennyw2383 · 3 years ago

      @@simonvandenhende5227 Great work! What's the best way to communicate with you guys? For example, CIFAR100 ACC is significantly lower than ImageNet-100 ACC, any thoughts why?

    • @simonvandenhende5227 · 3 years ago

      @@dennyw2383 You can contact me through email. CIFAR100 is evaluated using superclasses, e.g. vehicles = {bicycle, bus, motorcycle, pickup truck, train}, trees = {maple, oak, palm, pine, willow}. These groups were composed based on prior human knowledge, and not on visual similarities alone. This is the main reason I see for the lower accuracy on CIFAR100. Another reason, which also relates to the use of superclasses, could be the increased intra-class variability.

  • @Renan-st1zb · 1 year ago

    Awesome explanation. Thanks a ton

  • @CodeShow. · 4 years ago +1

    Can you explain the basics of deep learning using the published papers for the algorithms, as you do now? You have a way of teaching that makes me not fear scientific papers 🙂

  • @dimitrispolitikos1246 · 2 years ago

    Nice explanation! Thank you Yannic!

  • @ekkkkkeeee · 4 years ago +2

    I have a little question about equation 2 in the paper. How is the soft assignment Φ^c calculated? They simply say "the probability of sample X_i being assigned to cluster c is denoted as Φ_η^c(X_i)", but never mention how to calculate it. Am I missing something?

    • @YannicKilcher · 4 years ago

      It's probably a softmax after some linear layers
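
      A sketch of that guess (hedged: the paper describes Φ_η as a network terminated by a softmax, but the exact head here is an assumption):

      ```python
      import torch
      import torch.nn as nn

      feature_dim, n_clusters = 512, 10
      head = nn.Linear(feature_dim, n_clusters)  # hypothetical clustering head

      features = torch.randn(4, feature_dim)              # backbone output for 4 images
      soft_assign = torch.softmax(head(features), dim=1)  # rows sum to 1: Φ^c(X_i)
      print(soft_assign.sum(dim=1))                       # tensor([1., 1., 1., 1.])
      ```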

  • @Fortnite_king954 · 4 years ago +1

    Amazing review, thank you so much. Keep going....

  • @clivefernandes5435 · 3 years ago +1

    Hi, I was training the model in the SCAN stage and the total loss displayed is negative, hence to reduce it we need to go from, say, -4 to -9, right? Silly question.

  • @mohammadxahid5984 · 4 years ago +1

    Yannic, could you please make a video on the essential mathematics required to be a DL researcher? I am a CS undergrad and I always find myself not knowing enough mathematics while reading papers. Is this the case for everyone?
    I am amazed at your ability to go through papers with such understanding. Could you share with us how you prepared yourself that way?
    PS: excuse my English.

    • @YannicKilcher · 4 years ago

      Hey, never be ashamed of your English, it's cool that you participate :)
      That's a good idea, but the answer will be a bit boring: linear algebra, real (multidimensional) calculus, probability / stats and numerics are most relevant

  • @choedward3380 · 1 year ago

    I have one question: if I have no labeled images, is it possible? When updating the memory bank (with SimCLR as the pretext task), does it need labels?

  • @NehadHirmiz · 4 years ago

    Your videos are amazing. Not only do you have the technical knowledge, but you also do a wonderful job explaining things. If I may, I'd suggest creating an advanced course where you show researchers/students how to implement the algorithms in these papers. I would be your first student lol :).

    • @YannicKilcher · 4 years ago

      God that sounds like work.... just kidding, thanks for the feedback :)

    • @NehadHirmiz · 4 years ago

      @@YannicKilcher I know there is a fine line between too much fun and work :P. This would be a graduate-level course.

  • @Vroomerify · 2 years ago

    How do we avoid the network projecting all images to 0 in the first step if we are not using a contrastive loss function?

  • @vsiegel · 3 years ago

    From the examples, I had the suspicion that it may work based on *colour and structure of the background*, combined with *confirmation bias*.
    The shark cluster may not care much about the sharks, but more so about the blue water that surrounds them. The spiders may just be things in focus in front of a blurred background, caused by the small depth of field of a macro photo.
    It may also be based on the shape of the colour histogram, which covers more of the example clusters shown and includes information about the structure and colours of object and background.
    At least in some examples it is a very strong effect, so strong that it takes confirmation bias by the authors to miss it. Maybe it is discussed in the paper, I did not check.

  • @23kl104 · 3 years ago +1

    I would suspect that overclustering is evaluated by shoving in a whole block of data from one class and assigning the output with the highest peak the corresponding label, though I can't be sure.
    And shouldn't the accuracy be expected to be lower with more classes, since the entropy term is maximizing the number of different clusters?

  • @saikannanravichandar6171 · 4 years ago +2

    This video is good 👌... If possible, can you explain the concept with code?

  • @tamooora87 · 4 years ago +1

    Thanks for the great effort 👍

  • @thebigmouth · 2 years ago

    Thanks for the amazing content!

  • @sherryxia4763 · 3 years ago

    The thing I love most is the sassy hand drawing lmao

  • @MyTobirama · 4 years ago +2

    At 15:06, why do they use the log in the first term of the equation?

    • @YannicKilcher · 4 years ago +1

      I guess it's so they can interpret the inner product as a likelihood
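
      For reference, equation 2 as I read it in the paper: since Φ_η outputs softmax probabilities, the inner product lies in (0, 1] and behaves like the probability that X and its neighbor k get the same cluster, so the log turns the first term into a log-likelihood. Note also that the entropy term λ Σ_c Φ'^c log Φ'^c is never positive, which is why the reported SCAN loss can go below zero.

      ```latex
      % SCAN objective, reconstructed from the paper's notation:
      \Lambda = -\frac{1}{|\mathcal{D}|} \sum_{X \in \mathcal{D}} \sum_{k \in \mathcal{N}_X}
                \log \big\langle \Phi_\eta(X), \Phi_\eta(k) \big\rangle
                + \lambda \sum_{c \in \mathcal{C}} \Phi_\eta'^{\,c} \log \Phi_\eta'^{\,c},
      \qquad
      \Phi_\eta'^{\,c} = \frac{1}{|\mathcal{D}|} \sum_{X \in \mathcal{D}} \Phi_\eta^{c}(X)
      ```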

  • @nahakuma · 4 years ago +3

    Nice videos, in particular your skepticism. How do you select the papers you review? I find myself with a mountain of papers to read, but time is never enough.

    • @YannicKilcher · 4 years ago +1

      Same here, I just read what seems interesting.

  • @tedp9146 · 4 years ago +2

    How well would it work to cluster the bottleneck-encoding of an autoencoder?

    • @YannicKilcher · 4 years ago +1

      Good question, worth a try
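
      A minimal sketch of that experiment (hedged: toy random data stands in for images; a real run would use a convolutional autoencoder on an actual dataset):

      ```python
      import torch
      import torch.nn as nn
      from sklearn.cluster import KMeans

      X = torch.randn(256, 784)  # stand-in for flattened 28x28 images
      enc = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
      dec = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784))
      opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

      for _ in range(200):  # train on the reconstruction objective
          opt.zero_grad()
          loss = nn.functional.mse_loss(dec(enc(X)), X)
          loss.backward()
          opt.step()

      codes = enc(X).detach().numpy()  # bottleneck encodings
      clusters = KMeans(n_clusters=10, n_init=10).fit_predict(codes)
      ```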

  • @ravipashchapur5803 · 2 years ago

    Hi there, hope you are doing well. I want to know: can we use only supervised learning for an unlabeled image dataset?

  • @tuanad121 · 3 years ago

    At 32:34 the accuracy of self-supervised learning followed by K-means is 35.9%. How do they decide the representative label of a cluster? Is it the majority label in the cluster?
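
    As far as I can tell, the usual convention (and what such papers report as ACC) generalizes the majority-vote intuition: pick the one-to-one cluster-to-class mapping that maximizes agreement via the Hungarian algorithm, then count matches. A small sketch of that convention:

    ```python
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def clustering_accuracy(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        k = max(y_true.max(), y_pred.max()) + 1
        counts = np.zeros((k, k), dtype=int)
        for t, p in zip(y_true, y_pred):
            counts[p, t] += 1  # co-occurrence of cluster p and class t
        rows, cols = linear_sum_assignment(-counts)  # maximize total matches
        return counts[rows, cols].sum() / len(y_true)

    print(clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
    ```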

  • @DarioCazzani · 2 years ago

    That's not a flute, it's an oboe! :p
    Always enjoy your videos btw :)

  • @dinnerplanner9381 · 3 years ago

    I have a question: what would happen if we passed images through a pretrained model such as Inception and then used the obtained feature map for clustering?
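
    A hedged sketch of that baseline, assuming a recent torchvision (ResNet-18 as a lighter stand-in for Inception; note that ImageNet-pretrained features were trained with labels, so this is no longer fully unsupervised):

    ```python
    import torch
    import torchvision.models as models
    from sklearn.cluster import KMeans

    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()  # strip the classification head
    backbone.eval()

    imgs = torch.randn(16, 3, 224, 224)  # stand-in for real, normalized images
    with torch.no_grad():
        feats = backbone(imgs)  # (16, 512) pooled features

    clusters = KMeans(n_clusters=4, n_init=10).fit_predict(feats.numpy())
    ```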

  • @jacobkritikos3499 · 3 years ago

    Congrats on your video!!!

  • @bibiworm · 2 years ago

    6:58, a stupid question here: if the downstream task is not a classification task, would Euclidean distance still make sense in the learned representation space? I think it does, but I am not sure. I'd really appreciate it if anyone could shed some light here. Thanks.

  • @MrjbushM · 4 years ago

    Thanks, cool videos! Very informative; I always try to distill knowledge from your explanations :-)

  • @julespoon2884 · 4 years ago +1

    43:30 I've not read the paper yet, but your argument about overclustering does not apply if the authors evaluated the model on a different set than the one they trained on.

  • @bowenzhang4471 · 3 years ago

    25:19 Why is the inner product always 1 in L2 space?

  • @egexiang588 · 4 years ago +1

    Should I be familiar with an information theory textbook to appreciate this paper? I'm really not sure which math textbooks to read to understand ML papers better.

    • @YannicKilcher · 4 years ago

      Nope, this is very practical

  • @ProfessionalTycoons · 4 years ago

    such dope research!!

  • @nightline9868 · 3 years ago

    Great video, really easy to understand. Thanks for that!
    Can I ask you something?
    I'm trying to compare the clustering results of different clustering approaches on image data. Is it possible to use internal validation indices, e.g. the Davies-Bouldin score? Or are there problems in terms of the Euclidean space?
    Keep it up!

  • @linminhtoo · 3 years ago +1

    Could you re-explain why Euclidean distance would not work for raw images?

    • @YannicKilcher · 3 years ago +1

      because two images can be very similar to humans, but every pixel is different

    • @linminhtoo · 3 years ago

      @@YannicKilcher this makes sense. Thanks!
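
      A tiny illustration of the point (toy array standing in for an image):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      img = rng.random((32, 32))
      shifted = np.roll(img, shift=1, axis=1)  # one-pixel horizontal shift

      # Large Euclidean distance, although the content is "the same" to a human.
      print(np.linalg.norm(img - shifted))
      ```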

  • @bryand3576 · 4 years ago

    It would be great to contact the authors to see what they think of your videos!

  • @hafezfarazi5513 · 3 years ago

    I have a question: in representation learning, why won't the network cheat and classify everything (all kinds of classes) the same? Is there a regularization that is not shown here (for example, encouraging a diverse output)?

    • @YannicKilcher · 3 years ago +1

      There are a number of tricks, but mostly it's because of stochasticity, normalization and the inclusion of negatives.
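
      To illustrate the "negatives" part, a minimal NT-Xent / SimCLR-style loss (a hedged sketch, not the paper's code): if all embeddings collapsed to one point, every negative pair in the softmax would be maximally similar too, so the loss would be large rather than small.

      ```python
      import torch
      import torch.nn.functional as F

      def nt_xent(z1, z2, tau=0.5):
          # z1, z2: L2-normalized embeddings of two augmented views, shape (N, d)
          z = torch.cat([z1, z2], dim=0)     # (2N, d)
          sim = z @ z.t() / tau              # pairwise cosine similarities
          sim.fill_diagonal_(float("-inf"))  # exclude self-similarity
          n = z1.shape[0]
          targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
          return F.cross_entropy(sim, targets)  # positive = the other view

      z1 = F.normalize(torch.randn(8, 16), dim=1)
      z2 = F.normalize(torch.randn(8, 16), dim=1)
      print(nt_xent(z1, z2))
      ```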

  • @sarvagyagupta1744 · 4 years ago +1

    Hey, great videos. I have a question though: the representation learning part seems very similar to image reconstruction using, say, a variational autoencoder, which some people consider unsupervised learning. So what exactly is the difference between self-supervised and unsupervised learning?

    • @YannicKilcher · 4 years ago +2

      It's pretty much the same thing. Self-supervised tries to make it more explicit that it's "like" supervised, but with a self-invented label.

    • @sarvagyagupta1744 · 4 years ago

      @@YannicKilcher Thanks for the reply. So what, according to you, is a clear-cut example that differentiates self-supervised from unsupervised learning?

  • @huseyintemiz5249 · 4 years ago

    Nice overview.

  • @antonio.7557 · 4 years ago

    great video, thanks!

  • @sarc007 · 3 years ago +1

    Hi, very interesting and informative video. I have a question: how do I go about detecting symbols in an engineering drawing using the technique explained here?

    • @YannicKilcher · 3 years ago +1

      You'd need a dataset.

    • @sarc007 · 3 years ago

      @@YannicKilcher Then it will be labeled data, right? Can you elaborate? My email id is sarc007@gmail.com

  • @clivefernandes5435 · 3 years ago

    So the first step is the most important thing, right? Because the later ones are learning from it.

  • @Lord2225 · 4 years ago

    Woooho that is smart xD

  • @samipshah5977 · 1 year ago

    nice dog drawing

  • @AbdennacerAyeb · 4 years ago

    Would you make a tutorial every time?

  • @ivan.zhidkov · 4 years ago +1

    Stop doing clicky sounds with your tongue. So annoying.

  • @vevui7503 · 3 years ago

    0:33x.