I hope you're as excited as I am for this new series, where we will start with the basics of how GANs work, then implement the most basic GAN and work our way up to more influential/state-of-the-art GAN architectures, completely from scratch. The more challenging ones will have a paper walkthrough and start with a quick summary/presentation of how they work.
I learned a lot and was inspired to make these GAN videos by the GAN specialization on Coursera, which I recommend. Below you'll find both affiliate and non-affiliate links; the pricing for you is the same, but a small commission goes back to the channel if you buy it through the affiliate link.
affiliate: bit.ly/2OECviQ
non-affiliate: bit.ly/3bvr9qy
Timestamps:
0:00 - Introduction
0:49 - Why GANs are awesome
4:53 - How GANs work
11:45 - Ending
What about Multimodal Unsupervised Image-to-image translation? Please make awesome video tutorials for it just like for the rest of the GANs.
This is really one of the best explanations of the GAN loss function I've found on YouTube. Thanks.
Thanks for this series. Through your first videos I discovered GANs, and since then I've started learning about them on my own; now I want to go through all your videos as well. Great job!
I've been trying to develop a GAN image generation model, and quickly learned that it's not easy.
At 9:30 the answer should be minimize: for fake values, the output from the 2nd part of the loss function would be close to 0, and hence the output from the discriminator should be 0, and since the discriminator's job is to identify the generator's fake outputs, it will try to minimize its loss function. I might be wrong as I have just started studying this, so any help would be appreciated.
Can't wait for this amazing series
Congrats on 4k subs! :D
Thanks! 😁
The example of fake money explains the idea of GANs really well!
9:30 The discriminator wants to maximize the objective, which means making D(x) large and D(G(z)) as small as possible.
In the question, "this" refers to the objective, exactly like in the logistic regression loss function.
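For reference, the objective being discussed is the minimax value function from the original GAN paper (Goodfellow et al., 2014):

min_G max_D V(D, G) = E_{x ~ p_data}[ log D(x) ] + E_{z ~ p_z}[ log(1 - D(G(z))) ]

The discriminator D tries to maximize V (pushing D(x) toward 1 and D(G(z)) toward 0), while the generator G tries to minimize it.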
Great video! The content was explained in a way that's extremely easy to understand. Thank you... I will be watching ALL of your videos! Please keep 'em coming.
you deserve more views
Love this series. Very informative. Thanks for this!!
Man, you've got great skills in explaining topics with ease. Thank you for your efforts in making such great videos. Can you please share your LinkedIn profile link?
I don't use Linkedin much at all :p
@@AladdinPersson okk
Excellent content. I'm excited to watch the rest of the playlist (sorry for the bad English).
Just amazing 🤩
Hey. I am considering using GANs for data augmentation to tackle imbalanced classes in the task of Facial Emotion Recognition. I am planning to use Google Colab to train my GAN model with datasets containing specific emotions. Which GAN model would do best at creating the most realistic results? CycleGAN?
Just fabulous video, thank you 😀
Thanks for your videos. Can you make a video on SAGAN?
Hey,
Really great video. But I was a bit confused about why the discriminator would want to maximize the loss.
According to my understanding, the discriminator could output 1 for all the images (real or fake), which would lead to an increase in the loss since we are taking log(0). Time stamp: 9:20
Can you please explain where I am going wrong?
The discriminator does not want to output 1 for the second term in the loss function (log(1 - D(G(z)))) because then it would be log(0), which is -infinity loss. This also means that the generator has managed to fool the discriminator, and that needs to be reflected in the loss, so if we want to maximize the loss term with respect to the discriminator, -infinity is a huge price to pay.
@@AladdinPersson Ahh, I got it. It would be negative infinity if the generator fools the discriminator.
@@AladdinPersson 1) I'm not sure I understand what's meant by -infinity loss, and specifically by negative loss?
2) In addition, to minimize the total expression of the loss function, the discriminator needs to output 1 in the left term D(xi), and zero in the right term
log(1 - D(G(z))) - which makes sense because whatever the generator outputs, the discriminator wants to say that it's fake (=0).
And if that's what happens for each i, then we'll get 0 loss. Or am I wrong?
3) Why is the answer to the question mentioned in the slide "maximize"? With loss functions, the goal is usually to minimize them.
I guess there's something I'm missing...
the relevant time stamp for my comment is 9:34
Thanks a lot for your answers and your videos.
@@alonalon8794 This loss is similar to binary cross-entropy with a few changes, one of which is the lack of a -1 in front of the 1/m. That way the bad case is -inf instead of +inf, and you have to maximize the loss.
@@alonalon8794 Correct, I have the same issue - can you see my comments?
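To make the sign convention concrete, here is a minimal PyTorch-style sketch of how this objective is typically implemented in practice (the tiny networks, shapes, and names are placeholders, not taken from the video); "maximizing" log D(x) + log(1 - D(G(z))) is done by minimizing binary cross-entropy with the appropriate labels:

```python
import torch
import torch.nn as nn

# Toy stand-in networks just so the sketch runs; the real series builds proper models.
z_dim, img_dim, batch_size = 64, 784, 32
gen = nn.Sequential(nn.Linear(z_dim, img_dim), nn.Tanh())        # placeholder generator
disc = nn.Sequential(nn.Linear(img_dim, 1), nn.Sigmoid())        # placeholder discriminator
criterion = nn.BCELoss()  # BCE = -[y*log(p) + (1-y)*log(1-p)]; minimizing it maximizes the log terms

real = torch.randn(batch_size, img_dim)   # pretend batch of real data
z = torch.randn(batch_size, z_dim)        # latent noise

# Discriminator step: "maximize log D(x) + log(1 - D(G(z)))"
disc_real = disc(real).view(-1)                                   # D(x), values in (0, 1)
disc_fake = disc(gen(z)).view(-1)                                 # D(G(z))
loss_disc = (criterion(disc_real, torch.ones_like(disc_real))     # -log D(x)
             + criterion(disc_fake, torch.zeros_like(disc_fake))) # -log(1 - D(G(z)))

# Generator step (non-saturating version): "maximize log D(G(z))"
output = disc(gen(z)).view(-1)
loss_gen = criterion(output, torch.ones_like(output))             # -log D(G(z))
```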
Really good stuff, thank you for making this
Thank you very much for this amazing explanation.
thanks for this amazing video and series.
Awesome!!! thank you
At 9:30 he says the discriminator wants to maximize the loss? Shouldn't it want to minimize it? Can someone help me understand? TIA
Fast transform (fixed filter bank) neural networks trained as autoencoders behave like GANs: feed in noise and get out images. However, there are no libraries for that; you still have to code it yourself. Basically, fast transform nets use fixed dot products (enacted with fast transforms) and adjustable (parametric) activation functions. Adjustability is swapped compared to conventional nets. The fixed dot products force very statistical behavior.
That's interesting, got any resource for checking that out some more?
@@AladdinPersson There is a blog post somewhere on the internet about that. Basically what is adjustable in a neural net is swapped around. The dot products become fixed (and enacted by fast transforms) and the activation functions become adjustable. Parametric (adjustable) ReLU is anyway an already known thing. The fixed dot products force more statistical type behavior from the neural network than conventional neural networks which may account for the GAN type effect.
There are some slight technical things. To stop the first transform from simply taking a spectrum of the input data, you apply a random fixed pattern of sign flips to the input data. You can use a final transform as a sort of readout layer. You can use parametric-ReLU-style activation functions fi(x) = ai·x (on one side of zero), i = 0 to n-1. The fast Walsh-Hadamard transform is good. The net is then: sign flips, transform, activation functions, transform, activation functions, ..., transform. There is no need for bias terms.
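There doesn't seem to be a standard library for this, but here is a rough sketch of what the description above appears to mean - a fixed fast Walsh-Hadamard transform between layers, a random sign-flip pattern on the input, and two-slope (parametric-ReLU-like) activations as the only adjustable part. The exact activation form, sizes, and initialization are assumptions, since the original comment's formula is ambiguous:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized); len(x) must be a power of two."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

class FastTransformNet:
    """Sketch: sign flips -> [WHT -> parametric activation] x layers -> final WHT readout."""
    def __init__(self, n, layers, seed=0):
        rng = np.random.default_rng(seed)
        self.flips = rng.choice([-1.0, 1.0], size=n)        # fixed random sign pattern on the input
        # Two-slope activations: slope a_i below zero, b_i above (the adjustable/trained part).
        self.a = rng.normal(0.0, 0.1, size=(layers, n))
        self.b = rng.normal(0.0, 0.1, size=(layers, n))

    def forward(self, x):
        x = x * self.flips                                   # stop the first WHT from just taking a spectrum
        for a, b in zip(self.a, self.b):
            x = fwht(x)                                      # fixed dot products via the fast transform
            x = np.where(x < 0, a * x, b * x)                # adjustable activation, no bias terms
        return fwht(x)                                       # final transform as a readout layer

net = FastTransformNet(n=256, layers=3)
out = net.forward(np.random.randn(256))                      # feed in noise, get an output vector
```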
9:58 The generator wants the discriminator D(G(z)) to give a number close to 1, but then that loss term becomes very negative since log(1 - 0.9) is a large negative number. This is not what the discriminator wants, but it is what the generator wants, so the discriminator and generator fight here: the generator wants to maximize D(G(z)) and the discriminator wants to minimize it.
The slide says the generator wants to minimize the term, and I don't really get why, but I have written what I understand.
I had similar confusion, but maximising the loss here means not going to -infinity and staying close to 0. Here is the answer from the author:
The discriminator does not want to output 1 for the second term in the loss function (log(1 - D(G(z)))) because then it would be log(0), which is -infinity loss. This also means that the generator has managed to fool the discriminator, and that needs to be reflected in the loss, so if we want to maximize the loss term with respect to the discriminator, -infinity is a huge price to pay.
Where did you learn this cool stuff, brother?? By the way, thanks for sharing it with us!
Awesome courses, papers, blog posts:)
@@AladdinPersson Shouldn't we minimize the loss of the discriminator, since its job is to find the fakes? Video around 9:30.
The output of discriminator is in range [0, 1]
Thanks, I am motivated to complete this series as a priority. But I have heard that a lot of data is required for GANs. Is this true?
thnx man!
The name at 2:36 is Russian and is pronounced "de-'nis' 'shee-'riaef'".
Thanks Slav, I definitely butchered that name :\
@@AladdinPersson nah, you did it great, I have enjoyed and appreciate to hear your version 💖
@@DenisShiryaev Heey 😍 Honor having you here. I'm a big fan!
Thanks bro
How does the discriminator, at the start, know how to distinguish, for instance, a dollar bill?
Started GANs - see you all at the end of this series.
Bro, my next video is also on GANs... I'm kinda struggling to generate really good quality images of size 128x128 and 256x256... that's why the delay... I'm sure you'll like the video when it comes out :)
Hey, that's awesome :) Yeah, training GANs can be a nightmare. What architecture are you using to try and get 256x256?
@@AladdinPersson Before, I tried quite complex blocks like ResNets and DenseNets... but none of them worked... in fact, I've found that unnecessarily complex blocks don't work well easily... to make them work you have to couple them with very specific initialization and stuff...
After many experiments I've found that simple blocks don't necessarily produce amazing results either, but they can be trained very easily... so ultimately they produce better results for me.
For now, in the generator I'm using a block of
- conv > leaky-relu > pixel-norm
And in the discriminator, it's
- conv > batchnorm > leaky-relu
Yup, it's just that simple :)
And I'm getting pretty good results...
Although I'm also using some other known techniques like minibatch discrimination, gradient penalty and adding noise to the real data etc... which are actually very important...
All of which I'll explain in my video :)
Is the video out?
@@talha_anwar Yes, here: th-cam.com/video/cqXKTC4IP10/w-d-xo.html
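For readers curious what blocks like the ones described above might look like, here is a minimal PyTorch-style sketch (PixelNorm is written out by hand since it isn't a built-in layer; the kernel sizes, strides, and channel counts are illustrative assumptions, not taken from the comment):

```python
import torch
import torch.nn as nn

class PixelNorm(nn.Module):
    """Normalize each pixel's feature vector across channels (as used in ProGAN)."""
    def forward(self, x):
        return x / torch.sqrt(torch.mean(x ** 2, dim=1, keepdim=True) + 1e-8)

def gen_block(in_c, out_c):
    # generator block: conv > leaky-relu > pixel-norm
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2),
        PixelNorm(),
    )

def disc_block(in_c, out_c):
    # discriminator block: conv > batchnorm > leaky-relu
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_c),
        nn.LeakyReLU(0.2),
    )

x = torch.randn(1, 64, 32, 32)
print(gen_block(64, 128)(x).shape)   # torch.Size([1, 128, 32, 32])
print(disc_block(64, 128)(x).shape)  # torch.Size([1, 128, 16, 16])
```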
Dude, I can help you with the pronunciation of that difficult name. It is Денис Ширяев. That simple!
Aladin on face generation: "Honestly, that makes me a little bit sad"... Just 2 years later, Midjourney and gpt4 shows up. Aladin, you better buckle up for the next 2 years... 😂 By the way, it also kind of scares me 🤔
Why does discriminator want to maximise the loss and the generator want to minimise the loss?
I will never sleep again (0:13)
I think the discriminator wants to minimize the loss, and the generator's purpose is to fool it - that's why the generator wants to maximize the loss.
This confused me as well at first, but remember that the log function goes to negative infinity when its argument is close to 0, not positive infinity.