ERRATUM: At 3:44, where the two equations are shown:
The second one should have the derivative with respect to x_t rather than w_t, so that we increase the loss as much as possible by travelling in the direction of the gradient of the loss with respect to the INPUT, rather than with respect to the weights. Thanks to Hannes Whittingham for pointing this out! 🎯
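For reference, the corrected pair of update rules would read roughly as follows (my notation: α is the SGD learning rate, ε the FGSM step size, and the sign(·) follows the standard FGSM formulation):

w_{t+1} = w_t - \alpha \, \frac{\partial L}{\partial w_t} \qquad \text{(SGD: step against the weight gradient to decrease the loss)}
x_{t+1} = x_t + \epsilon \, \operatorname{sign}\!\left(\frac{\partial L}{\partial x_t}\right) \qquad \text{(FGSM: step along the sign of the input gradient to increase the loss)}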
Thank you for the good video! Just watched this as part of the BlueDot AI safety fundamentals course, and I'm excited to learn more about adversarial examples.
Incredible that such great information is just free on TH-cam! Thanks for the video! Great job!!!
Thanks for your heartwarming message!
This channel is going to get super popular soon.
Wonderful explanation, Ma'am! Thank you so much.
This video truly deserves more views!! Very informative content explained in a simple way. Thank you very much for uploading it, I love it.
Excellent video
Awesome content!! Such a great and concise explanation💕.
Nice work!
Thanks for such incredible videos.
Great explanation.
Thanks. On spring break.
The paper "On Adaptive Attacks to Adversarial Example Defenses" by Tramèr et al. shows that none of the studied defense mechanisms against adversarial examples are robust.
It is not clear to me from the video how FGSM modifies the input to offset the SGD weight update calculated on the loss. The input x is not on the axes of the graph. Why can changing the input interfere with the weight update?
Thanks for the question. What an old video, yes, I could have made it clearer.
The idea is to backpropagate the loss through the weights all the way to the input neurons (input x), and then update the input x in the same way in which SGD updates the weights. I showed it for the weights because we can treat the input x, which is now a variable, as an additional set of weights.
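A minimal PyTorch sketch of that idea (the pretrained model and the ε value are placeholders I chose for illustration, not the exact setup from the video):

import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1").eval()  # any differentiable classifier works here

def fgsm(x, y, eps=0.007):
    # treat the input like a set of weights: it gets a gradient of its own
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()  # backpropagate through the (frozen) weights all the way down to the input
    # SGD on weights:  w <- w - lr * dL/dw      (decrease the loss)
    # FGSM on inputs:  x <- x + eps * sign(dL/dx)  (increase the loss)
    return (x + eps * x.grad.sign()).detach()

# usage: adv = fgsm(images, labels); each pixel changes by at most eps,
# so adv usually looks identical to images but can flip the prediction.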
awesome
The initial panda-gibbon example would be an example of a targeted black-box attack, correct?
Correct. :)
@@AICoffeeBreak You actually specified in the video that this is a white-box attack (untargeted or targeted, we need access to the gradients, which is white-box, no?)
@@orellavie6233 Bonus points to you for paying this much attention. 👍 Yes, in the paper they used a white-box algorithm (access to gradients), true. But the same result could be achieved with a black-box algorithm too.
@@AICoffeeBreak Thanks :)! How is it possible to achieve it with a black-box attack? By using a transfer surrogate model like Papernot proposed? Or have I missed something? You need either the gradients of the model, or to query the model until you find the right path?
@@orellavie6233 Brute-forcing is indeed an approach. And yes, the Papernot et al. local substitute model could also be a thing.
Here is a great survey on black-box adversarial attacks: arxiv.org/abs/1912.01667
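For intuition, here is a toy sketch of that substitute-model (transfer) idea — tiny random MLPs stand in for real image classifiers, and the architecture, query budget and ε are all made up for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in "black box": we may only call it for labels, never touch its gradients.
target = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)).eval()

def query(x):
    with torch.no_grad():
        return target(x).argmax(dim=1)  # label-only access, like a remote API

# Train a local substitute on (input, queried label) pairs.
substitute = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
for _ in range(500):
    x = torch.randn(128, 20)
    loss = F.cross_entropy(substitute(x), query(x))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Craft FGSM examples on the substitute and hope they transfer to the black box.
x = torch.randn(64, 20, requires_grad=True)
F.cross_entropy(substitute(x), query(x)).backward()
x_adv = (x + 0.5 * x.grad.sign()).detach()
print("flipped predictions:", (query(x_adv) != query(x.detach())).float().mean().item())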
What about contrastive learning? For example, I think that the image that most matches CLIP's "a panda" would be a realistic image of a panda.
Why don't you try installation tutorials alongside these? That could reach a broader audience for your work.
BTW awesome work 👌
Why is nobody interested in WHY it is possible, instead of just how to apply it?
Hi Pavel! 1:54 explains one very simple way of how to do it. Here I try to break it down even further: We have the model with its specific decision boundary (fixed and given). So instead of changing the parameters of the model, we change the *input* slightly, just enough to cross to the other side of the *decision boundary*. How do we achieve that? By FGSM at 1:54, for example.
This could have been a wonderful diagram to make and explain in the video, in hindsight...
@@AICoffeeBreak
No, that is how to do it, but not why it works at all.
I mean, why does it take so little to cross the decision boundary?
If you and I didn't know about adversarial examples before, and you came up with the idea and told me that you can fool a neural network by a small change in pixel values, I wouldn't believe you.
Why, when we create an adversarial example for some image, for example a "car", and we want it to be classified as an "airplane", do we not see something like "wings" start to appear? Instead, the added values look like noise.
When I first saw it, I thought it was an overfitting problem: the decision boundary has a very complicated shape, and hence almost every input image is placed near the decision boundary.
But that raises some questions:
1) Why do neural nets become more confident in the prediction of the adversarial example than in the original image, if the decision boundary is so complicatedly shaped?
2) Why doesn't random noise change the predicted class, and why do we need specific directions? We would expect random predictions if the boundaries had irregular shapes.
3) Why can we add the same adversarial difference to any other image and still get the same misclassification with the same predicted class? We would also expect random results.
It means that there is something interesting going on. And when I was searching for the answer, I found an interesting video by Ian Goodfellow: th-cam.com/video/CIfsB_EYsVI/w-d-xo.html which I recommend.
He proposed a very interesting idea: that it may be not because of overfitting but because of underfitting, and that neural networks, in spite of the non-linearities in the activation functions, are to some extent piecewise-linear models. And because of the linearity of the model, we can find some direction which goes deep beyond the decision boundary. That would explain the previous questions:
1) It's simply because in linear models, if we go very deep beyond the decision boundary, we have more confidence in the prediction.
2) If the goal is to move far in a certain direction, then it can be explained why a random direction wouldn't give us the desired results.
3) Because of the linearity of the decision boundary, we can cross this boundary from any point, if the adversarial direction vector is long enough.
And it gives us some interesting insights about how neural networks actually work and how difficult the problem of adversarial examples actually is.
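A quick numerical illustration of that linearity argument (plain numpy; the dimensions and ε are chosen arbitrarily by me): for a linear score w·x, nudging every pixel by only ε in the direction sign(w) shifts the score by ε·Σ|w_i|, which grows linearly with the input dimension, so in image-sized spaces a visually imperceptible per-pixel change can push the score far across the boundary.

import numpy as np

rng = np.random.default_rng(0)
eps = 0.01  # imperceptibly small change per pixel
for d in (10, 1000, 150528):  # 150528 = 224*224*3, a typical image size
    w = rng.normal(size=d)    # weights of a linear score w.x
    x = rng.normal(size=d)
    shift = eps * np.abs(w).sum()  # score change from x -> x + eps*sign(w)
    print(f"d={d:7d}  typical |w.x| ~ {np.sqrt(d):7.1f}  adversarial shift = {shift:9.1f}")
# The per-pixel perturbation stays at 0.01, but the score shift scales with d,
# while the typical score only grows like sqrt(d) -- Goodfellow's linearity point.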
Now I understand your question much better, thanks for the lengthy answer! But there you have it: the "why" is not at all trivial to answer. I also recommend the link you suggested to everyone who prefers all the juicy details in an hour and a half instead of a 10-minute taste bite. 😃 Thank you! I'll add it to the video description (th-cam.com/video/CIfsB_EYsVI/w-d-xo.html).
What is the reasoning behind using the sign of the gradients instead of the gradients themselves? It feels like you are just throwing away useful information when you only use the sign.
Hi, and thanks for the question. The sign determines the direction to move in for each pixel, and the step size ε determines how far to move. That way every pixel changes by the same small amount, so the perturbation stays bounded and visually imperceptible, and under that constraint the signed step is what increases the loss the most for a locally linear model. So less information is thrown away than it seems.
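In symbols (a sketch; ε is the per-pixel budget, α a generic step size):

x_{\mathrm{adv}} = x + \epsilon \, \operatorname{sign}\!\big(\nabla_x L(x, y)\big) \qquad \text{(FGSM: every pixel changes by exactly } \epsilon \text{, an } L_\infty\text{-bounded step)}
x_{\mathrm{adv}} = x + \alpha \, \nabla_x L(x, y) \qquad \text{(raw gradient step: keeps the magnitude, used by other iterative attacks)}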