I noticed that my models would not converge nicely (last assignment from C1W4, 3 ReLU + 1 sigmoid layers) compared to a notebook reference that I'm following.
If I just initialized my weights from a normal distribution, the cost would get stuck at a high value. I tried scaling the weights, switching to a uniform distribution, and changing the learning rate to various values; nothing worked.
Then, following your code, I saw that if I divided each layer's weights by the sqrt of the number of input features to that layer, it would start converging beautifully. It would be interesting to know why!
Thanks for your lessons!
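For anyone else hitting this, here is a minimal NumPy sketch of that scaling (the function name and the layer sizes are made up for illustration, not the notebook's exact code):

```python
import numpy as np

def initialize_parameters(layer_dims, scaled=True, seed=1):
    """Draw weights from N(0, 1), optionally dividing by sqrt(n_prev)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        W = rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
        if scaled:
            # Each pre-activation z is a sum of n_prev terms, so with unscaled
            # N(0, 1) weights Var(z) grows like n_prev and the activations
            # saturate or explode. Dividing by sqrt(n_prev) keeps Var(z) ~ 1.
            W /= np.sqrt(layer_dims[l - 1])
        params[f"W{l}"] = W
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

params = initialize_parameters([12288, 20, 7, 5, 1], scaled=True)
```

With scaled=False the early layers produce huge pre-activations, the sigmoid at the end saturates (gradient ≈ 0), and the cost plateaus; that's why the sqrt scaling makes it converge.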
It seems like the most general statement of the solution is that the weight matrices must be full rank.
Where can we access the practice questions?
If you use the tanh activation function you have an even bigger problem: the gradients will always be equal to zero, and no learning is feasible (not even the crippled kind of learning where all the weights move in the same direction).
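A quick toy check (the shapes and data here are made up for illustration) shows why: with zero weights, tanh(0) = 0 zeroes out the hidden activations, so every weight gradient comes out zero:

```python
import numpy as np

# Tiny 2-layer net with all weights and biases initialized to zero
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))           # 3 features, 5 examples
Y = rng.integers(0, 2, size=(1, 5))       # binary labels
m = X.shape[1]

W1, b1 = np.zeros((4, 3)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

# Forward pass
A1 = np.tanh(W1 @ X + b1)                 # tanh(0) = 0 -> A1 is all zeros
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))    # sigmoid output, all 0.5

# Backward pass
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m                      # zero, because A1 is zero
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)        # zero, because W2 is zero
dW1 = dZ1 @ X.T / m                       # zero as well

print(dW1.any(), dW2.any())               # False False -> the weights never move
```

Only the output bias gets a nonzero gradient, so the network never learns anything beyond a constant output.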
That random korean subtitle tho...
What is the best choice for the learning rate (alpha)?
@Amey Paranjape Can the learning rate be learned? Or is it meaningful to do so?
@@byteofcnn3519 how would you learn what the perfect learning rate is? It makes sense to initialize it at 0.01 as that rate is similar to the pace of learning in humans (tiny changes over time).
Since we are using leaky ReLU in most cases now, should we initialize the weights as extreme as possible so that when backpropagation takes place they have a higher chance of landing in different local extrema?
kiryu nil, what do you mean by 'as extreme'?
Using tf.random_normal to set a high standard deviation*
kiryu nil, well I think the best way to initialize the weights is with the Xavier initializer. From my observations it's the best way; I think that is why it's the default initializer in TensorFlow.
@@wolfisraging But why? (if you could explain)
@@dhirajupadhyay01 , In short, it helps signals reach deep into the network.
If the weights in a network start too small, then the signal shrinks as it passes through each layer until it’s too tiny to be useful.
If the weights in a network start too large, then the signal grows as it passes through each layer until it’s too massive to be useful.
Xavier initialization makes sure the weights are ‘just right’, keeping the signal in a reasonable range of values through many layers.
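For reference, Xavier/Glorot initialization just picks the weight scale from the layer's fan-in and fan-out; here is a minimal NumPy sketch of the uniform variant (the function name is mine):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    """Xavier/Glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    This keeps the variance of the activations (and of the backpropagated
    gradients) roughly constant from layer to layer."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W = glorot_uniform(fan_in=784, fan_out=128)
print(W.std())   # ~sqrt(2 / (784 + 128)) ≈ 0.047
```

This is the same scheme as the glorot_uniform kernel initializer that tf.keras uses by default for dense layers.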
**UNSTABLE EQUILIBRIUM**
If W = 0 and B = 0, then A = 0. Similarly, all the other vectors should be zero, shouldn't they?
I also think all nodes should be equal to zero. Interestingly, though, Andrew never mentions that property.
Yes. But the symmetry also explains why we cannot initialize all the weights to the same nonzero value.
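A tiny toy example (shapes and data made up for illustration) makes that concrete: if every weight starts at the same constant, the hidden units compute identical activations and receive identical gradients, so gradient descent can never make them different:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 8))
Y = rng.integers(0, 2, size=(1, 8))
m = X.shape[1]

W1 = np.full((2, 3), 0.5)                 # both hidden units start identical
b1 = np.zeros((2, 1))
W2 = np.full((1, 2), 0.5)
b2 = np.zeros((1, 1))

A1 = np.tanh(W1 @ X + b1)                 # row 0 == row 1
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m                      # both entries equal
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
dW1 = dZ1 @ X.T / m                       # row 0 == row 1

print(np.allclose(dW1[0], dW1[1]))        # True: the two units stay clones forever
```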
I would say that it depends on the chosen activation function.
Can anyone explain why gradient descent learns slowly when the slope is ~0 (flat)? Aren't we trying to find the max and min of this function? Thanks.
As you can see here www.desmos.com/calculator/hzsiwhfmdw
When x is too large, sigmoid(x) is ~flat, so the derivative ≈ 0,
and when we have a very small gradient/derivative we take very small steps towards the minimum, which means slow learning.
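To put rough numbers on it (a quick sketch, the values are just illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.0, 2.0, 5.0, 10.0])
grad = sigmoid(x) * (1 - sigmoid(x))      # derivative of the sigmoid
print(grad)  # ~[0.25, 0.105, 0.0066, 0.000045]
```

Since each update is learning_rate * gradient, a gradient of ~1e-5 barely moves the weight at all, which is what "slow learning" means here.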
Best activation function???
ReLU
Could have been a shorter video....