What's great is I will usually look up a tutorial and it'll say something like "A residual block does this" And I'm like "Great... why?" And this lecture just put everything in perspective. It motivated the problems, showed the purpose of every step, showed how each iteration went to solve the problems of the previous. This is honestly a great lecture.
This is one of the best lectures on CNN architectures.
What a great documentary on CNN architectures. The slides are comprehensive, and the lecturer (Dr. Johnson) knows his stuff to an extreme.
That was a truly impressive and informative lecture. I wish I could also hear his summary and comments on the latest developments up to 2021, such as NFNet.
Great resource, thoroughly researched + beautifully curated! Thanks a lot for the teachings!
Amazing lecture explaining the history of convnets from AlexNet to ResNets and MobileNets; it also gives us an idea of which network to use if we are designing a custom convnet architecture for our problem.
The slides contain a ton of information.
Thank you, Justin Johnson!
28:26 why use a 3x3 kernel in VGG
39:20 Google's way of dealing with kernel size
49:33 ResNets
Truly amazing lecture, worth listening to many times.
The formula for calculating the output size of a convolutional layer in a Convolutional Neural Network (CNN) depends on several factors, including:
1. Input size (W_in, H_in): The spatial dimensions (width and height) of the input image or feature map to the convolutional layer.
2. Filter size (F): The spatial dimensions (width and height) of the convolutional filter (kernel).
3. Stride (S): The step size at which the filter is applied to the input. It defines how much the filter is shifted across the input.
4. Padding (P): The number of pixels added to the input on all sides to preserve spatial dimensions after convolution.
The formula to calculate the output size (W_out, H_out) of the convolutional layer can be given as:
W_out = floor((W_in - F + 2 * P) / S) + 1
H_out = floor((H_in - F + 2 * P) / S) + 1
If you want to maintain the spatial dimensions (W_in, H_in) of the input after convolution (i.e., no spatial downsampling), use stride S = 1 and set the padding as:
P = (F - 1) / 2
(This gives an integer only for odd filter sizes F, which is one reason odd kernel sizes such as 3x3 and 5x5 are the norm.)
This formula assumes that the stride S is the same in both the horizontal and vertical directions. If you use different strides for width and height, the formula will change accordingly.
It's worth noting that some frameworks and implementations may use slightly different conventions for padding (e.g., 'valid' or 'same' padding), so it's essential to check the documentation and specifications of the specific CNN implementation you are using.
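As a quick check, the formula above can be computed directly in Python (the function name `conv_output_size` is just for illustration):

```python
def conv_output_size(w_in, f, s=1, p=0):
    """Spatial output size of a conv layer: floor((W_in - F + 2P) / S) + 1."""
    return (w_in - f + 2 * p) // s + 1

# AlexNet's first layer: 227x227 input, 11x11 filters, stride 4, no padding
print(conv_output_size(227, 11, s=4, p=0))  # 55

# "Same" padding for a 3x3 filter at stride 1: P = (F - 1) / 2 = 1
print(conv_output_size(224, 3, s=1, p=1))  # 224
```

The same function applies to width and height separately, so rectangular inputs or per-axis strides just mean calling it twice with different arguments.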
1:08:22 nowadays, 1K GPUs for several months is common for the big corporations
Great lecture. Thank you so much
Thank you so much!
29:50 how does the output have the same dimensions as the input after 2x2 max pooling with stride 2?
Halve in the sense of making it half of the value it used to be.
Depthwise convolution... any references for this?
14:14 why didn't you include C_in when calculating FLOPs for pool1?
Each pooling operation is done on one input layer at a time, so it's kind of a 2D operation. Each pooling layer downsamples a corresponding input channel.
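A small NumPy sketch of this point (the helper `max_pool_2x2` is hypothetical, assuming even spatial dimensions): pooling slides a window over each channel independently, so the spatial dimensions shrink while the channel count is unchanged.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2, applied to each channel independently.

    x has shape (C, H, W); H and W are assumed even.
    """
    c, h, w = x.shape
    # Split each channel's H x W grid into non-overlapping 2x2 windows,
    # then take the max inside each window.
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
y = max_pool_2x2(x)
print(y.shape)  # (2, 2, 2): spatial dims halved, channel count unchanged
```

This is also why the FLOP count of a pooling layer has no multiply-accumulates across channels: each output value looks at one window in one channel.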
I believe there is a small error here: AlexNet has 96 (not 64) filters in its first layer (48 + 48 = 96).
But overall the lecture is awesome.
In the paper it is 96, but I also found 64 in the PyTorch AlexNet model: pytorch.org/docs/stable/_modules/torchvision/models/alexnet.html#alexnet