Pix2Pix Paper Walkthrough

  • Published Jan 9, 2025

Comments • 47

  • @AladdinPersson  3 years ago +23

    The next video will be a from-scratch implementation of Pix2Pix. Like the video if you want to see more paper implementations!
    Timestamps:
    0:00 - Introduction
    1:29 - Overview of paper
    2:25 - Why GANs for Pix2Pix
    3:16 - Loss Function
    5:12 - Generator Architecture
    9:24 - Discriminator Architecture
    12:00 - Some training details
    13:24 - Turkers to evaluate GANs
    14:10 - Patch size for Discriminator
    15:19 - Generator works for larger images
    15:50 - More details for implementation
    19:05 - Ending

  • @superaluis  3 years ago +10

    Normally people don't explain the implementation details of papers the way you did (and very clearly). Awesome video.

    • @AladdinPersson  3 years ago +5

      I appreciate the kind words, Antonio :) Since I actually implemented it from scratch before making the paper review, I know all the details and the tricky parts, which is different (and understandably so) from a lot of other people who make paper reviews. Because of that, I feel there's a really high probability that what I'm saying is not nonsense but actually valuable information. That's the goal, anyway.

  • @Lutz64  3 years ago +24

    I really like and appreciate your videos; there are hardly any other good channels for practical deep learning coding.

  • @joelsabiti4828  2 months ago

    A good friend of mine shared this entire video series with me. I like how we start from the theory and work up to the practical bit. You do not just see things being done and wonder why; it also triggers you to think about how you would solve a different problem.

  • @verve2831  3 years ago +4

    I didn't know how much I needed this until I saw this :")

  • @oliverl7312  3 years ago

    Thanks for the video! I think that "we alternate between one gradient descent step on D, then one step on G. We use minibatch SGD and apply the Adam solver" might resolve your confusion at 12:44.
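
A minimal PyTorch sketch of that alternating schedule, roughly as the paper describes it (lr 2e-4, betas (0.5, 0.999), L1 weight 100). The one-layer networks and the single random batch below are stand-ins so the snippet runs, not the real U-Net/PatchGAN code:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs; the real pix2pix generator is a U-Net and
# the discriminator a PatchGAN (see 5:12 and 9:24 in the video).
gen = nn.Conv2d(3, 3, 3, padding=1)              # input image -> fake target image
disc = nn.Conv2d(6, 1, 4, stride=2, padding=1)   # sees (input, target) stacked on channels

opt_disc = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_gen = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

# One mini-batch of (input, target) pairs; a real run loops over a DataLoader.
x, y = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)

# --- one gradient descent step on D ---
y_fake = gen(x)
d_real = disc(torch.cat([x, y], dim=1))
d_fake = disc(torch.cat([x, y_fake.detach()], dim=1))
d_loss = 0.5 * (bce(d_real, torch.ones_like(d_real)) +
                bce(d_fake, torch.zeros_like(d_fake)))
opt_disc.zero_grad()
d_loss.backward()
opt_disc.step()

# --- then one step on G: adversarial term + lambda * L1 (lambda = 100 in the paper) ---
d_fake = disc(torch.cat([x, y_fake], dim=1))
g_loss = bce(d_fake, torch.ones_like(d_fake)) + 100 * l1(y_fake, y)
opt_gen.zero_grad()
g_loss.backward()
opt_gen.step()
```

"Minibatch SGD" in the quote just refers to this loop running on mini-batches; Adam is the update rule applied to those mini-batch gradient estimates.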

  • @damascenoalisson  3 years ago

    Amazing explanation, I've read this paper many times before but only now I really understood it!

  • @23kl104  3 years ago +3

    Minibatch SGD is just referring to running on mini-batches, I guess: not using the "full" gradient of the dataset, but stochastic estimates of it.
    Thanks for the video man.

    • @superaluis  3 years ago

      Indeed, they presumably just mean that the gradients are estimated on mini-batches, and then the optimization algorithm is Adam.

  • @sureshgohane1297  3 years ago

    Superlike!!! Can't wait for the implementation.

  • @alonalon8794  3 years ago

    @Aladdin Persson
    Great explanations.
    Some questions:
    At 13:25, in the paragraph below the yellow-marked text, what do they mean by applying dropout at test time, and why do they do that? The dropout technique is usually used to avoid overfitting during the training phase and isn't relevant to inference, if I'm not mistaken.
    Also, what's meant by applying batchnorm using the statistics of the test batch? Batchnorm is also something that is relevant to training and not inference, isn't it?
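
For what it's worth, in PyTorch terms that paragraph amounts to keeping the generator in train() mode at inference: dropout then stays active (the paper uses it as the noise source, since the generator tended to ignore an explicit z), and BatchNorm normalizes with the current test batch's statistics rather than the running averages. A tiny illustration with a stand-in network:

```python
import torch
import torch.nn as nn

# Stand-in block containing the two layers the question is about.
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Conv2d(8, 3, 3, padding=1),
)
x = torch.randn(1, 3, 256, 256)

net.eval()   # "usual" inference: dropout disabled, BatchNorm uses running (training) stats
with torch.no_grad():
    a, b = net(x), net(x)
print(torch.equal(a, b))        # True: fully deterministic

net.train()  # pix2pix-style inference: dropout active, BatchNorm uses the test batch's stats
with torch.no_grad():
    a, b = net(x), net(x)
print(torch.equal(a, b))        # False: dropout injects randomness, acting like a noise source
```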

    • @alonalon8794  3 years ago

      11:31
      Why is classifying each NxN patch in the image as real or fake a good approach? Say you have x patches classified as real and all the other patches classified as fake; what's the conclusion regarding the classification of the whole image? What's the tradeoff/ratio between the number of fake-classified and real-classified patches? Do they just average the responses and, if the average value is greater than 0.5, assume the whole image is real, otherwise fake?
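
Regarding the question above: during training the PatchGAN never has to produce a single real/fake verdict for the whole image. Its output is a grid of logits, one per overlapping receptive-field patch, and the BCE loss is simply taken over all of them, which is the same as averaging the per-patch responses; at test time D isn't used at all. A rough sketch, approximating the 70x70 configuration:

```python
import torch
import torch.nn as nn

# A PatchGAN-style discriminator: purely convolutional, so it outputs a grid of
# per-patch logits instead of one scalar for the whole image.
patch_disc = nn.Sequential(
    nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, 4, stride=1, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1, 4, stride=1, padding=1),      # one logit per ~70x70 patch
)

x = torch.randn(2, 3, 256, 256)   # conditioning (input) images
y = torch.randn(2, 3, 256, 256)   # real or generated target images
logits = patch_disc(torch.cat([x, y], dim=1))
print(logits.shape)               # torch.Size([2, 1, 30, 30]): a 30x30 grid of patch decisions

# Every patch logit is pushed towards 1 for real pairs and 0 for fake pairs;
# BCEWithLogitsLoss averages over the grid, i.e. the responses are averaged.
bce = nn.BCEWithLogitsLoss()
loss_if_real = bce(logits, torch.ones_like(logits))
```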

    • @alonalon8794  3 years ago

      11:25
      I'm not sure about the "fewer parameters and runs faster" claim. At the end of the day, I split the original image into multiple patches, and the convolutions are done over these patches, which compose the SAME ORIGINAL IMAGE, whether we split it into x patches or y patches where x > y, for example.
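
On the "fewer parameters and runs faster" point above: the patch size is the receptive field of the discriminator, which is set by how many conv layers it has. A 70x70 PatchGAN therefore needs a shallower stack than a discriminator whose receptive field covers the full 286x286 image, so it genuinely has fewer weights and less compute, even though both slide over the same image. A rough parameter count, loosely following the Ck configurations listed in the paper's appendix (BatchNorm omitted for brevity):

```python
import torch.nn as nn

def make_disc(channels):
    """Stack of 4x4 conv blocks (the paper's Ck notation): stride 2 except for the
    last Ck and the final 1-logit conv, which use stride 1."""
    layers, in_ch = [], 6                    # input and target images concatenated on channels
    for i, out_ch in enumerate(channels):
        stride = 1 if i == len(channels) - 1 else 2
        layers += [nn.Conv2d(in_ch, out_ch, 4, stride=stride, padding=1), nn.LeakyReLU(0.2)]
        in_ch = out_ch
    layers += [nn.Conv2d(in_ch, 1, 4, stride=1, padding=1)]
    return nn.Sequential(*layers)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

patch70 = make_disc([64, 128, 256, 512])              # ~70x70 receptive field
full286 = make_disc([64, 128, 256, 512, 512, 512])    # receptive field spans the whole 286x286 image

print(n_params(patch70))   # roughly 2.8M parameters
print(n_params(full286))   # roughly 11.2M parameters: deeper stack, more weights, slower
```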

  • @baohuynh5462  3 years ago

    The best channel. Thank you so much

  • @googlable  2 years ago

    You rock bro. Keep it up

  • @harshmankodiya9397  3 years ago

    I think an in-depth explanation of the loss function, and of the notation the paper uses for it, would have helped more.

  • @TheAndryhaTV  3 years ago

    Thank you so much!

  • @yifeipei5484  3 years ago

    Always the best!

  • @IndrainKorea  3 years ago

    Nice video man, great explanation as well 👍👍

  • @muhammadzubairbaloch3224  3 years ago

    Great research work

  • @hanantanasra5121  2 years ago

    Thanks for the tutorial!! Great job. Do you have a tutorial regarding GauGAN? Thanks!

  • @sfaroy  3 years ago

    What PDF viewer do you use? I like the annotation toolbar

  • @joviandsouza6008  3 years ago +1

    Hi Aladdin, awesome video! Just curious, which software are you using to annotate the PDFs?

  • @saharmokarrami7722  3 years ago

    Please implement adversarial attacks in NLP, thanks.

  • @prajotkuvalekar2348  3 years ago

    Can you please make a video on an implementation of SSD in PyTorch?

  • @riis08  3 years ago

    @Aladdin Persson.... thanks once again... for these good videos....

  • @Georgesbarsukov  3 years ago

    Personally, my favorite part of the paper is the PatchGAN.

  • @palashkamble2325  3 years ago

    Can you shed some light on what it means to learn a loss function, as mentioned in the paper? And how is it different from the other loss functions used in, say, conv nets? My interpretation is that the usual loss functions are hand-engineered, but I have no idea about the learned one.

    • @AladdinPersson  3 years ago

      The loss for the Generator is to try to fool the Discriminator, but what does that exactly mean? It's quite different from, say, minimizing the mean squared error, for which we know the exact formula. The Discriminator might focus on different things throughout training, so if you think about it this way we are "learning" the loss function, which essentially corresponds to "be indistinguishable from reality". It's perhaps an unusual way to think about it, but I like this framing that they proposed.
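
A small way to make that "learned loss" framing concrete: a hand-crafted loss such as L1 has a fixed formula, whereas the generator's adversarial loss is computed through the discriminator's weights, which change every iteration, so the objective the generator minimizes is itself learned. A sketch with toy one-layer networks:

```python
import torch
import torch.nn as nn

# Toy G and D; the point is only where G's training signal comes from.
G = nn.Conv2d(3, 3, 3, padding=1)
D = nn.Conv2d(6, 1, 4, stride=2, padding=1)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

x = torch.randn(1, 3, 256, 256)   # input image
y = torch.randn(1, 3, 256, 256)   # ground-truth target

fake = G(x)
hand_crafted = l1(fake, y)        # fixed, hand-engineered formula: never changes during training

pred = D(torch.cat([x, fake], dim=1))
learned = bce(pred, torch.ones_like(pred))   # "be judged real by D": the formula is D itself,
                                             # and D's weights keep changing as D is trained
```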

    • @palashkamble2325  3 years ago

      @@AladdinPersson Thanks. Now I got it. 👍

  • @shourabhpayal1198  3 years ago

    Great vid

  • @HamzaKhan-qi2rz  3 years ago

    Can you please do a vid2vid or fewshot-vid2vid paper.

  • @hoaxuan7074  3 years ago

    That's nice. What do you think about low-curvature initialization of neural nets versus high-curvature initialization with random noise? My view is that you will never squeeze all the randomness out of the system and that the net is actually harder to train.
    However, my personal view is that no training algorithm can do more than search the set of statistical solutions that fit a neural network. More than that is not really possible in higher-dimensional space.
    So you would expect random initialization not only to slow training but also to leave a residue of noisy responses in the net.
    I suppose pruning and retraining would help you move away from purely statistical behavior. And likewise 'explainable' neural networks, where you train a first net to map inputs to human concepts and then train a second net from those concepts to the wanted results.

    • @AladdinPersson  3 years ago +4

      Are you GPT3 bro? Your comments make no sense to me :\

    • @hoaxuan7074  3 years ago

      @@AladdinPersson No, GPT3 is built with the primitive topology. I use fast transforms as my essence. The chicken bones say Aladdin should be more magical and shake off the heavy shackles of dogma he is chained in.

    • @hoaxuan7074  3 years ago

      A low-curvature initialization would cause the least changes as the data moves through the net. Ideally the output would automatically be the same as the input, with no changes. However, that is difficult. It would be much easier in conventional nets if a 2-sided parametric ReLU were available; a one-sided parametric ReLU is relatively well known in conventional research.
      Another thing is to find YT videos on splines and neural nets, and then consider ReLU as an actual switch. You may find you end up with a better understanding than top researchers. E.g. the Ankit Patel breaking bad video.

  • @musashi_hp  3 years ago +1

    that's what we do babe

  • @mostafamousa7093  3 years ago

    Amazing

  • @ccuuttww  3 years ago

    I will try a kids' coloring book to see what it looks like lol

  • @ttaylor9916  1 year ago

    Doing examples... good... but you need to state which files you are using before each video. The numbering and names in the videos and in the git repo don't match. Confusing.

  • @madhuvarun2790  3 years ago

    Your voice and the way you talk somehow sound similar to Shawn Mendes.

  • @pdbsstudios7137  3 years ago

    Ok, all this talking and fucking math, but where do I play with this for myself?