Thank you, Yannic! Extremely well explained and enjoyed your correct critique of the paper as well. Look forward to seeing more such content. Kudos! Subscribed to your channel
I wonder if combining image pyramids with transformer networks would make the small features less useful than larger ones, or make them more independent, kind of like the "Processing Megapixel Images" paper. In the image pyramid case, larger features would show up somewhere in the most shrunken image as well as in several larger images, while smaller features would only show up at the bottom and would usually, though not always, be part of the larger features. I think recognizing images this way could improve recognizing drawings of cats after only seeing actual cats before.
This is a very valid thought. The counterpoint would be that if there is a signal that generalizes well, a good classifier will pick up on it, regardless of how well you "hide" it. I don't know which effect would win out, but I guess it's at least worth a shot.
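To make the image-pyramid idea above concrete, here is a minimal sketch of building a pyramid with plain NumPy. The downsampling method is my assumption (non-overlapping 2x2 average pooling; Gaussian blurring before subsampling is more common in practice), and the function name is just for illustration.

```python
import numpy as np

def image_pyramid(img, levels=3):
    """Build a simple image pyramid by 2x average pooling at each level.

    img: (H, W) array with H and W divisible by 2**(levels - 1).
    Returns a list of arrays from full resolution down to the coarsest level.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape
        # Average each non-overlapping 2x2 block to halve the resolution.
        coarse = pyramid[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(coarse)
    return pyramid

img = np.arange(16.0).reshape(4, 4)
levels = image_pyramid(img, levels=3)
print([lvl.shape for lvl in levels])  # [(4, 4), (2, 2), (1, 1)]
```

Each level could then be fed to the model separately, so a large feature survives at the coarse levels while a small one only exists at full resolution.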
Very interesting paper. But it seems unnecessary to go all the way and create a new robust dataset... Why didn't they simply train a classifier and add a penalty term to the loss function that makes the first layers invariant to small changes of x, i.e., penalizing dy/dx?
Forget what I said. I see there is already work on adversarial training that does what I suggested. This was more of a theoretical work, which is why they decided to modify the images themselves: to show which parts of the image were fooling the classifier.
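For anyone curious what the dy/dx penalty suggested above would look like, here is a toy sketch. It assumes a logistic-regression "network" and approximates the input gradient by finite differences purely for illustration; real implementations use autodiff (this is sometimes called double backpropagation or input-gradient regularization). All names and the lam value are my own choices, not from the paper.

```python
import numpy as np

def logistic_loss_with_grad_penalty(w, x, y, lam=0.1, eps=1e-5):
    """Logistic loss plus a penalty on the model's sensitivity to its input.

    The penalty approximates ||d f(x) / d x||^2 by central finite
    differences, pushing the classifier to be locally invariant to
    small changes of x, in the spirit of the dy/dx idea above.
    """
    def f(xv):
        return 1.0 / (1.0 + np.exp(-w @ xv))  # model output for input xv

    p = f(x)
    data_loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Finite-difference estimate of the gradient of the output w.r.t. x.
    grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])
    return data_loss + lam * np.sum(grad ** 2)

w = np.array([1.0, -2.0])
x = np.array([0.5, 0.3])
print(logistic_loss_with_grad_penalty(w, x, y=1))
```

With lam=0 this reduces to the plain logistic loss; increasing lam trades accuracy on the training data for local input invariance.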
You have a gift of explaining clearly! Keep up with the excellent work!
Thank you for doing this, I can't tell you how much I appreciate these videos!
Great explanation. Thanks
Helps a lot!!! Thx
@rpcruz I'm interested in the paper you mentioned. Could you share its title? Thank you!
omg thank you very much!