I have a hard time reading papers because my English isn't very good, but you've been very helpful in explaining it in your videos. Thank you.
Interesting. Building a CNN model always depended on my intuition from existing CNN models. I never questioned the significance of each scale-up method. The analysis by disentangling is very helpful to the community. Excellent.
Clean, simple, and great explanation! Thanks
At 1:07, is the image right? Shouldn't figures (b) and (d) be swapped? How does higher resolution result in deeper blocks?
Looking forward to EfficientNet-V2 paper!
Great explanations thank you!
I feel like the grid search to find alpha, beta and gamma was not elaborated on enough in the paper. Does anyone understand this more deeply, or how one could reproduce it?
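One way to picture the search, as a rough sketch rather than the authors' actual procedure: enumerate (alpha, beta, gamma) triples on a small grid, keep those that roughly satisfy alpha * beta^2 * gamma^2 = 2, then score each one. The grid step, the upper bound, the tolerance and the `evaluate` callback are all assumptions made for illustration; in practice `evaluate` would have to train the scaled baseline at phi = 1 and return its validation accuracy.

```python
from itertools import product


def candidate_coefficients(step=0.05, upper=2.0, target=2.0, tol=0.1):
    """Enumerate (alpha, beta, gamma) >= 1 with alpha * beta^2 * gamma^2 near target.

    step, upper and tol are illustrative choices, not values from the paper.
    """
    n = int(round((upper - 1.0) / step))
    grid = [round(1.0 + step * i, 4) for i in range(n + 1)]
    return [
        (a, b, g)
        for a, b, g in product(grid, repeat=3)
        if abs(a * b**2 * g**2 - target) <= tol
    ]


def grid_search(evaluate, **grid_kwargs):
    """Return the candidate that maximizes a user-supplied evaluate(alpha, beta, gamma)."""
    return max(candidate_coefficients(**grid_kwargs), key=lambda c: evaluate(*c))


candidates = candidate_coefficients()
print(len(candidates), "candidates satisfy the constraint within tolerance")
```

With this tolerance, the triple reported in the paper (1.2, 1.1, 1.15) is among the candidates, since 1.2 * 1.1^2 * 1.15^2 is about 1.92.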
Great intuitive explanation! Thank you!!
Maybe I am being a bit stupid here, but there is a mistake at 4:43, and I checked the paper too: w should be equal to -0.07 instead of 0.07. If we assume, as per the video, that w is 0.07, then if the FLOPS of the resulting model are half of the target FLOPS, the objective becomes ACC * (0.5)^0.07 ~ ACC * 0.95. That factor is less than 1, so it penalizes the model (since we need to maximize this, right?), which is wrong: the objective should actually favor such a model. If we instead keep w equal to -0.07, the objective function becomes ACC * (0.5)^-0.07 ~ ACC * 1.05.
I was a bit confused at the beginning of the video since I didn't read the paper first, but now I am quite certain of it.
I am quite surprised no one else noticed it!
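The arithmetic in this comment is easy to check with a few lines. This is only a sketch of the objective as quoted in the comment, ACC * (FLOPS / T)^w; the function name and the normalized FLOPS values are made up for illustration.

```python
def objective(acc, flops, target_flops, w):
    """ACC * (FLOPS / T) ** w, the objective form discussed in the comment above."""
    return acc * (flops / target_flops) ** w


# A model that uses half of the target FLOPS, with accuracy normalized to 1.0.
with_positive_w = objective(1.0, flops=0.5, target_flops=1.0, w=0.07)
with_negative_w = objective(1.0, flops=0.5, target_flops=1.0, w=-0.07)

print(f"w = +0.07: {with_positive_w:.4f}")  # factor below 1, cheaper model penalized
print(f"w = -0.07: {with_negative_w:.4f}")  # factor above 1, cheaper model rewarded
```

With w = -0.07 the same function also pushes the score below ACC for a model that exceeds the target FLOPS, which is the behavior the comment argues for.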
What a lovely summary thanks!
224*224 image resolution (r=1.0) --> 560*560 image resolution (r=2.5)
Well explained. Thank you!
great explanation! Thanks
At 2:37, shouldn't 2^n more computational resources imply a B^(2n) and a gamma^(2n) increase given the constraint A*B^2*gamma^2 = 2 ?
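A small sketch may help separate the two things being scaled here. It assumes the compound-scaling reading where depth, width and resolution are multiplied by alpha^phi, beta^phi and gamma^phi, and FLOPS grow roughly with depth * width^2 * resolution^2; the constants are the values reported in the paper, while the function names are just illustrative.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # coefficients reported in the EfficientNet paper


def multipliers(phi):
    """Depth, width and resolution multipliers for compound coefficient phi."""
    return ALPHA**phi, BETA**phi, GAMMA**phi


def flops_factor(phi):
    """Approximate FLOPS growth: depth * width^2 * resolution^2."""
    depth, width, resolution = multipliers(phi)
    return depth * width**2 * resolution**2


for phi in (1, 2, 3):
    d, w, r = multipliers(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPS x{flops_factor(phi):.2f}")
```

Under these assumptions the width itself grows by beta^n, while its contribution to FLOPS is beta^(2n), and likewise for gamma, so the total FLOPS factor is (alpha * beta^2 * gamma^2)^n, roughly 2^n.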
Also, I'm looking over the actual paper. The chart at 5:32 is a bit different from what I'm seeing. Everything's about the same but every BOLDED Top1 Acc. entry (recorded from their own architecture) has been boosted up a few percentage points to outshine their rival counterparts. I wonder if they updated the paper since you posted this video, or maybe they figure it best to fudge the numbers since this chart is located on the front page of the paper.
Great explanation!
Thank you!!
Very well explained. Thanks!
Thank you!
Thanks, great explanation.
Superb explanation!!!!
Hi, can you make a video that explains EfficientNet-Lite?
Thanks Henry!
Thank you!!
Really great video! One thing I don't understand though, is how the scaling works exactly. Are the network dimensions scaled while training and while keeping the weights from the smaller scale or is the entire network retrained from scratch on each scaling? Also, if I do transfer learning with a model pre-trained on efficientnet I could get the benefits of reducing the network size but wouldn't have to run through the same scaling process?
Thank you so much!!!
A question about the equation you mentioned: Alpha * (Beta^2) * (Gamma^2) = 2.
When I increase Alpha, should I decrease the other two variables?
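If the product is held fixed, that trade-off can be made concrete by solving the constraint for gamma. This is a minimal sketch assuming the equality holds exactly with a budget of 2; `gamma_for` is a hypothetical helper, not something from the paper.

```python
import math


def gamma_for(alpha, beta, budget=2.0):
    """Solve alpha * beta^2 * gamma^2 = budget for gamma (hypothetical helper)."""
    return math.sqrt(budget / (alpha * beta**2))


# Raising alpha with beta fixed leaves less of the budget for gamma.
print(f"alpha=1.2, beta=1.1 -> gamma={gamma_for(1.2, 1.1):.3f}")
print(f"alpha=1.4, beta=1.1 -> gamma={gamma_for(1.4, 1.1):.3f}")
```

So yes, under a fixed budget, increasing one coefficient means at least one of the others has to come down.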
Hi! I think you got the resolution scaling wrong. They don't change the input dimensions (from say 224 to 360) but rather increase the number of convolution filters in every convolution, effectively increasing the number of feature maps of the low-level representation of the input at any given point in the model.
I have a question. For my custom dataset I used EfficientNet B0-B5, and the results got worse each time I used a more complex model: B0 gave the best outcome while B5 gave the worst. Image sizes were 2000x1500. What could be the reason for that?
Did you find the reason for that?
It depends on the scale of your data!
Thanks a lot!
Thank you!
good explanation
thank you
MobileNetV2 and EfficientNet' video
Thank you very much!