
  • Published Feb 5, 2025
  • Easy explanation for how backpropagation is done. Topics covered:
    gradient descent
    exploding gradients
    learning rate
    backpropagation
    cost functions
    optimization steps

Comments • 120

  • @shanruan2524
    @shanruan2524 2 years ago +7

    The best backpropagation explainer on youtube we have in 2022

  • @chinmay6144
    @chinmay6144 2 years ago +1

    I can't thank you enough. I paid so much money, taking out a loan for one course, but did not understand it there. Thank you for your help.

  • @Orthodoxforever71
    @Orthodoxforever71 4 years ago +8

    Hey! This is the best explanation I have ever seen on the internet. I was trying to understand these concepts by watching videos, etc., but without positive results. Now I understand how these networks function and their structure. I have forgotten my calculus, and here you explain the chain rule in very simple words anyone can understand. Thank you for these great videos, and God bless.

  • @originalandfunnyname8076
    @originalandfunnyname8076 2 years ago +1

    Amazing, I spent hours trying to understand this from different sources, and now I think I finally understand. Thank you!

  • @ekoprasetyo3999
    @ekoprasetyo3999 3 years ago

    I was struggling with this subject for weeks; now I have a better understanding after watching this video.

  • @alexandrefabretti1174
    @alexandrefabretti1174 4 years ago +3

    Hello Mikael, finally someone who is able to explain complexity through simplicity. Thank you very much for revealing the secrets hidden by most videos.

  • @allenjerjiss3163
    @allenjerjiss3163 4 years ago +2

    You guys know that you can just turn up the volume, right? Thank you Mike for breaking it down so clearly!

  • @qzwwzt
    @qzwwzt 6 years ago +22

    Good job! This is a tough subject, and you succeeded in simplifying the explanation as much as possible. I did the Andrew Ng course on Coursera, and his explanation was difficult to understand even for me, who had previous knowledge of the maths involved. Now I think you should implement this algorithm in Python, for example.

  • @farenhite4329
    @farenhite4329 4 years ago +1

    Knew what it was but never understood why. Thank you for this video!

  • @turkirob
    @turkirob 5 years ago +1

    Absolutely the best explanation for the backpropagation thank you thank you thank you

  • @vunpac5
    @vunpac5 5 years ago +2

    Hi Mike, I want to thank you for this great explanation. I was really struggling to grasp the concept. No one else went quite as far in depth.

  • @andrew-cb6lh
    @andrew-cb6lh 1 year ago

    very well explained👍

  • @rohitd7834
    @rohitd7834 4 years ago +3

    I was trying to understand this for so long! You made my day.

  • @denisvoronov6571
    @denisvoronov6571 4 years ago +1

    That's the best explanation I have seen. Thanks a lot!

  • @gillesgardy8957
    @gillesgardy8957 4 years ago +2

    Thank you so much Mikael. Extremely clear. A good foundation before going further!

  • @obsidianhead
    @obsidianhead 1 year ago

    Thank you, sir. Helped a smooth brain understand.

  • @nemuccio1
    @nemuccio1 4 years ago +1

    Great! Finally one understands something.
    Without a hidden layer it is a bit difficult to understand how to apply backpropagation. But the thing that no tutorial explains is this, and you would be the right person to teach us. I use Keras, but Python would also be good: "How to create your own classification or regression dataset". Thank you.

    • @mikaellaine9490
      @mikaellaine9490  4 years ago +1

      Thank you for your comment! At the end of the video the generalized case is briefly explained. If you follow the math exactly as in the single-weight case, you will see it works out. If I find time, I may make a video about that, but it might be a bit redundant.
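
      As a sketch of what that generalization looks like for a chain of two weights with no activation functions (a1 = i * w1, a2 = a1 * w2, C = (a2 - y)^2), the chain rule just picks up one extra factor per layer:

        dC/dw2 = 2(a2 - y) * a1
        dC/dw1 = 2(a2 - y) * w2 * i

      which is the same pattern as the single-weight case in the video.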

  • @klyntonh7168
    @klyntonh7168 2 years ago

    Thank you so much! Best explanation I’ve seen ever.

  • @xyzaex
    @xyzaex 3 years ago

    Simply outstanding, clear and concise explanation. I wonder how people with no calculus background learn deep learning?

  • @jeremygarrard6311
    @jeremygarrard6311 several months ago

    Amazing explanation!! Any chance you can add in a bias and show how that works too?

  • @trevortyne534
    @trevortyne534 2 years ago

    Excellent

  • @JoeBurnett
    @JoeBurnett 5 months ago

    Fantastic video! I wish you were still making videos on the subject of AI with this teaching method.

  • @AlejandroFernandezDaCosta
    @AlejandroFernandezDaCosta several months ago

    very good explanation

  • @prof.meenav1550
    @prof.meenav1550 2 years ago

    good effort

  • @flavialan4544
    @flavialan4544 3 years ago

    It is one of the best explanations on this subject! Thanks so much!

  • @faisalriazbhatti
    @faisalriazbhatti 4 years ago

    Thanks Mikael, simplest explanation. You made my day mate.

  • @mrinky8129
    @mrinky8129 several months ago

    amazing explanation

  • @JoeWong81
    @JoeWong81 5 years ago +1

    great explanation Mikael thanks a lot

  • @imed6240
    @imed6240 3 years ago

    Wow, so far the best explanation I've found. So simple, thanks a lot!

  • @muhammeddal9661
    @muhammeddal9661 6 years ago +1

    Great job Mikael, you explained it very clearly.
    Thank you

  • @murat2073
    @murat2073 2 years ago +1

    thanks man. You are a hero!

  • @cvsnreddy1700
    @cvsnreddy1700 5 years ago

    Extremely good and easy explanation

  • @ahmidiedu7112
    @ahmidiedu7112 2 years ago +1

    Good Job! …. Thanks

  • @joelmun2780
    @joelmun2780 2 years ago

    totally underrated video. love it.

  • @jarrodhaas
    @jarrodhaas 3 years ago

    good stuff! a clear, simple starting case to build on.

  • @talhanaeemrao4305
    @talhanaeemrao4305 1 year ago

    There are some videos which you wish would never end. This video is among the top of those.

  • @hasanabdlghani5244
    @hasanabdlghani5244 4 years ago +1

    It's not easy!! You made it easy. Thanks a lot.

  • @danikhan21
    @danikhan21 4 years ago

    Good stuff. Thanks for contributing

  • @gulamm1
    @gulamm1 8 months ago

    The best explanation.

  • @wilfredomartel7781
    @wilfredomartel7781 6 months ago

    great video!

  • @TheStrelok7
    @TheStrelok7 3 years ago

    Thank you very much best explanation ever!

  • @raaziyahshamim4761
    @raaziyahshamim4761 1 year ago

    What software did you use to write the stuff? Good lecture.

  • @atlantaguitar9689
    @atlantaguitar9689 2 years ago

    At 7:53, what are the values for a and y that have the parabola experiencing a minimum around 0.3334, when for a desired y value of 0.5 the value of "a" would have to be 0.5? That is, the min for the cost function occurs when a is 0.5, so why in the graph has the min been relocated to 0.3334?
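
    For what it's worth, the parabola there is presumably the cost plotted against the weight w rather than against a: with i = 1.5 and y = 0.5, C = (1.5*w - 0.5)^2, whose minimum sits at w = 0.5/1.5 ≈ 0.3334, which is exactly where a = i*w reaches the desired 0.5.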

  • @thechen6985
    @thechen6985 6 years ago

    Thank you very much. This helped a lot. I now understand the lecture given to me.

  • @АртурЗарипов-б2й
    @АртурЗарипов-б2й 11 months ago

    Good job! Thank you very much!

  • @ingoampt
    @ingoampt 7 months ago

    What about when we have an activation function like ReLU, etc.?

  • @datasow9493
    @datasow9493 6 years ago +1

    Thank you, it really helped me to understand the principle behind backpropagation. In the future I would like to see how to implement it with layers that have 2 or more neurons; how to calculate the error for each neuron in that case, to be precise.

  • @dabdas100
    @dabdas100 4 years ago +1

    Finally I understand this! Thanks

  • @AleksanderFimreite
    @AleksanderFimreite 4 years ago +3

    I understand the logic and the thoughts behind this concept. Unfortunately I just can't wrap my head around how to calculate it with these kinds of formulas.
    But if I saw a code example I would understand it without an issue. I don't know why my brain works like that. But mathematical formulas are mostly useless to me =(
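
    A minimal Python sketch of the single-weight case from the video (input i = 1.5, starting weight w = 0.8, target y = 0.5; the 0.1 learning rate and the variable names are illustrative assumptions, not taken from the video):

      # single neuron, single weight: a = i * w, cost C = (a - y)^2
      i, w, y = 1.5, 0.8, 0.5
      lr = 0.1                         # assumed learning rate

      for step in range(20):
          a = i * w                    # forward pass
          cost = (a - y) ** 2          # how far off we are
          dC_dw = 2 * (a - y) * i      # chain rule: dC/da * da/dw
          w = w - lr * dC_dw           # gradient descent step
          print(step, round(w, 4), round(cost, 6))

      # with these numbers the first step takes w from 0.8 to 0.59,
      # and w then settles near 0.333, where a = i * w hits the target 0.5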

  • @cachaceirosdohawai3070
    @cachaceirosdohawai3070 10 months ago

    Any help dealing with multi-neuron layers? The formulas at 11:19 look different for multi-neuron layers.

    • @mikaellaine9490
      @mikaellaine9490  10 months ago

      Check my channel for another example with multiple layers.

  • @stuartallen2001
    @stuartallen2001 5 years ago +2

    Thank you for this video it really helped me!

  • @georgeruellan
    @georgeruellan 4 years ago +3

    Amazing explanation but the audio is painful to listen to

  • @_FLOROID_
    @_FLOROID_ 4 years ago

    What changes in the equation if I have more than just 1 Neuron per Layer though? Especially since they are cross-connected via more weights, I don't know exactly how to deal with this.

  • @jancsi-vera
    @jancsi-vera 9 months ago

    Wow, thank you

  • @petermpeters
    @petermpeters 6 years ago +26

    something happened to the sound at 8:21

    • @jayanttanwar4703
      @jayanttanwar4703 5 years ago +4

      You got that right Peter Peters Peterss

    • @garychap8384
      @garychap8384 5 years ago +2

      Don't you hate it when the lecturer goes outside for a cigarette in the middle of a lecture... but continues teaching through the window.
      Yes, we get it... your powerpoint remote works through glass! But WE CAN'T HEAR YOU! XD

    • @mikaellaine9490
      @mikaellaine9490  5 years ago +2

      Yes, sorry about that!

    • @BrandonSLockey
      @BrandonSLockey 4 years ago

      @@garychap8384 LMFAO

    • @goksuceylan8844
      @goksuceylan8844 3 years ago +1

      Peter Peters Peterss Petersss

  • @safiasafia9950
    @safiasafia9950 5 years ago

    Thanks sir, it is a very good explanation.

  • @ksrajavel
    @ksrajavel 4 years ago

    Cool. Thanks Mikael!!!

  • @mehedeehassan208
    @mehedeehassan208 2 years ago

    How do we determine which way to go? I mean, the direction of the change in the weight, if we are on the left side of the concave curve?

  • @3r1kz
    @3r1kz 9 months ago

    I don't know anything about this subject but I was understanding it until the rate-of-change function. Probably a stupid question, but why is there a 2 in the rate-of-change function, as in 2(a-y)? Is this 2 * (1.2 - 0.5)? Why the 2? I can't really see the reference to y = x^2, but that's probably just me not understanding the basics. Maybe somebody can explain for a dummy like me.
    Wait, maybe I understand my mistake: the result should be 0.4, right? So it's actually 2(a-1), because otherwise multiplication goes first and you end up with 1.4?

    • @joemurray1
      @joemurray1 8 months ago

      The derivative of x^2 (x squared) is 2x. The cost function C is the square of the difference between actual and desired output i.e. (a-y)^2. Its derivative (slope) with respect to a is 2(a-y).
      We don't use the actual cost to make the adjustment, but the slope of the cost. That always points 'downhill' to zero cost.
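
      As a quick check with the video's numbers (a = 1.2, y = 0.5):

        C     = (a - y)^2 = (1.2 - 0.5)^2 = 0.49
        dC/da = 2(a - y)  = 2(1.2 - 0.5)  = 1.4

      Expanding first also works: (a - y)^2 = a^2 - 2ay + y^2, and differentiating with respect to a (treating y as a constant) gives 2a - 2y = 2(a - y), the same thing.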

  • @zeta4542
    @zeta4542 9 days ago

    I am lost at 6:53: how is it 2(a-y)? Isn't it a² - 2ay + y²? ... I don't see it.

  • @MukeshArethia
    @MukeshArethia 5 years ago

    very nice explanation!

  • @newcoder7166
    @newcoder7166 6 years ago +1

    Excellent job! Thank you!

  • @onesun3023
    @onesun3023 4 years ago +1

    Where does the '-1' come from? It looks like it is in the position of y but y is 0.5. Not -1.

    • @onesun3023
      @onesun3023 4 years ago

      Oh, I see. The 2 was distributed to it but not to a.

  • @cdxer
    @cdxer 6 years ago +1

    Do you move back a layer after getting w_1 = 0.59, or after getting w_1 = 0.333?

  • @FPChris
    @FPChris 2 years ago

    No one ever says, when multiple layers and multiple outputs exist, when the weights get adjusted: do you do numerous forward passes, one after each individual weight is adjusted? Or do you update ALL the weights and THEN do a single new forward pass?

    • @mikaellaine9490
      @mikaellaine9490  2 years ago +1

      Yeah, single forward pass (during which gradients get stored, see my other videos) followed by a single backpropagation pass through the entire network, updating all weights by a bit.

    • @FPChris
      @FPChris 2 years ago

      @@mikaellaine9490 Thanks. Much appreciated.
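
      A rough Python sketch of that schedule for a two-weight chain (the starting values w1 = 0.8, w2 = 0.6 and the 0.1 learning rate are illustrative assumptions, not from the video):

        # chain of two weights: a1 = i*w1, a2 = a1*w2, cost C = (a2 - y)^2
        i, y = 1.5, 0.5
        w1, w2 = 0.8, 0.6
        lr = 0.1

        for step in range(100):
            # one forward pass; keep the intermediate activations around
            a1 = i * w1
            a2 = a1 * w2
            # one backward pass: all gradients from the stored activations
            dC_da2 = 2 * (a2 - y)
            dC_dw2 = dC_da2 * a1         # dC/da2 * da2/dw2
            dC_dw1 = dC_da2 * w2 * i     # dC/da2 * da2/da1 * da1/dw1
            # update ALL the weights, then run the next forward pass
            w2 = w2 - lr * dC_dw2
            w1 = w1 - lr * dC_dw1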

  • @puppergump4117
    @puppergump4117 2 years ago

    If I had different amounts of neurons per layer, then would the formula at 11:30 be changed to (average of the activations of the last layer) * (average of the weights of the next layer) ... * (average cost of all outputs)?

    • @TheRainHarvester
      @TheRainHarvester 2 years ago

      From what I read, yes. But distributing the error can be varied too.

  • @RagibShahariar
    @RagibShahariar 4 years ago

    Thank you Mikael for this concise lecture. Can you share a lecture with the cost function of logistic regression implemented in Neural Network?

  • @bubblesgrappling736
    @bubblesgrappling736 4 years ago

    Nice video, I'm a little confused about which letters stand for which values:
    - a = value from the activation function / or just simply the output from any given neuron?
    - C = loss/error gradient?
    And which of these values qualifies as the gradient?

    • @mikaellaine9490
      @mikaellaine9490  4 years ago +1

      a=activation (with or without activation function)
      C=loss/error/cost (these are all the same thing, the naming varies between textbooks and frameworks)
      WRT gradients: this is a 1-dimensional case for educational/amusement purposes. In actual networks, you would have more weights, therefore more dimensions and you would use the term 'gradient' or 'jacobian', depending on how you implement it etc.
      I have an example with two dimensions here: th-cam.com/video/Bdrm-bOC5Ek/w-d-xo.html

  • @oposicionine4074
    @oposicionine4074 1 year ago

    There is one thing I don't understand.
    Suppose you have two inputs: for the first input the perfect value is w1 = 0.33,
    but for the second input, the perfect value would be w1 = 0.67.
    How would you compute the backpropagation to get the perfect value to minimize the cost function?

    • @Urban83
      @Urban83 1 year ago

      Run multiple experiments with different inputs and measure the outcome: if the outcome is perfect, there is no learning. How would you answer the question?
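
      One way to see it, assuming the same linear setup as the video: with two training examples the cost is usually the sum (or average) of the per-example costs,

        C(w) = (i1*w - y1)^2 + (i2*w - y2)^2,   dC/dw = 2*i1*(i1*w - y1) + 2*i2*(i2*w - y2)

      so gradient descent settles at the single w that minimizes the combined cost, somewhere between the two per-example "perfect" values rather than at either one.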

  • @semtex6412
    @semtex6412 1 year ago

    At 2:40, Mikael mentioned "...and the error therefore, is 0.5". I think he meant "and the *desired output*, therefore is 0.5"? Slight erratum, perhaps?

    • @semtex6412
      @semtex6412 1 year ago

      because otherwise, the cost (C) is 0.49, not 0.5

  • @chrischoir3594
    @chrischoir3594 4 years ago +1

    "As per usual"? Um, what is usual?

  • @maravilhasdobrasil4498
    @maravilhasdobrasil4498 2 years ago

    Maybe this is a dumb question, but how do you go from 2(a-y) to 2a-1? (7:17)
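
    With the video's target y = 0.5, it is just substitution: 2(a - y) = 2a - 2y = 2a - 2(0.5) = 2a - 1.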

  • @knowhowww
    @knowhowww 5 years ago +1

    simple and neat! thanks!

  • @dennistsai5348
    @dennistsai5348 5 years ago +1

    Would you please talk about the situation with an activation function (sigmoid)?
    It's a little bit confusing for me...
    Thanks a lot!

    • @mikaellaine9490
      @mikaellaine9490  4 years ago

      There is now a video about this: th-cam.com/video/CoPl2xn2nmk/w-d-xo.html

  • @hikmetdemir1032
    @hikmetdemir1032 1 year ago

    What if the number of neurons in the layer is more than one?

  • @dilbreenibrahim4128
    @dilbreenibrahim4128 4 years ago +1

    Please, how can I update the bias? Can someone answer me?

  • @xc5838
    @xc5838 4 years ago

    Can you please tell me how you graphed that cost function? I plotted this cost function on my calculator and I am getting a different polynomial. I graphed ((x*0.8)-0.5)**2. Thanks.

    • @mikaellaine9490
      @mikaellaine9490  4 years ago

      Hi and thank you for your question. I've used Apple's Grapher for all the plots. It should look like in the video. Your expression ((x*0.8)-0.5)**2 is correct.

  • @rafaelramos6320
    @rafaelramos6320 1 year ago

    Hi,
    a = i * w
    1.5 * 2(a - y) = 4.5 * w - 1.5
    What happened to the y?

    • @LaurentPrat
      @LaurentPrat 11 months ago

      y is given (the target value), here 0.5 => 1.5*2(1.2-0.5) = 2.1, which equals 4.5*0.8 - 1.5.
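
      Written out with a = 1.5*w and y = 0.5: 1.5 * 2(a - y) = 3*(1.5*w - 0.5) = 4.5*w - 1.5, so the y hasn't disappeared; it is folded into the constant -1.5.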

  • @bubblesgrappling736
    @bubblesgrappling736 4 years ago

    Also, I'm not really able to find anywhere what delta signifies here, only stuff on the delta rule.

  • @vijayyarabolu9067
    @vijayyarabolu9067 6 years ago

    Thanks Laine.

  • @jameshopkins3541
    @jameshopkins3541 4 years ago

    what about code?

  • @scottk5083
    @scottk5083 5 years ago +1

    Thank you!

  • @coxixx
    @coxixx 5 years ago +3

    It wasn't for dummies. It was for scientists.

  • @hakankosebas2085
    @hakankosebas2085 3 years ago

    What about 2D input?

    • @mikaellaine9490
      @mikaellaine9490  3 years ago

      There is a video for 2d input: th-cam.com/video/Bdrm-bOC5Ek/w-d-xo.html

  • @vudathabhavishya9629
    @vudathabhavishya9629 1 year ago

    Can anyone explain how to plot 2(a-y) and C = (a-y)^2, with i = 1.5?

  • @a3103-j7g
    @a3103-j7g 1 year ago

    It was more or less comprehensible until the "mirrored 6" character appeared with no explanation of what it was, what it was called, or why it was there. So let's move on to another video on backpropagation...

  • @theyonly7493
    @theyonly7493 4 years ago

    If all I want is:
    a = 0.5
    with:
    a = i · w
    then:
    w = a / i = 0.3333
    One simple division, no differential calculus, no gradient descent :-)

    • @mikaellaine9490
      @mikaellaine9490  4 years ago

      Brilliant. Now generalize that to any sized layer and any number of layers. I suppose you won't need bias units at all. You have just solved deep learning. Profit.

  • @SureshBabu-tb7vh
    @SureshBabu-tb7vh 5 years ago

    Thank you

  • @shameelfaraz
    @shameelfaraz 4 years ago +1

    Suddenly, I feel depressed... around 8:20

  • @bettercalldelta
    @bettercalldelta 2 years ago

    4:36 me watching this in 8th grade: bruh

  • @NavyCuda
    @NavyCuda 3 years ago

    During editing do you not notice how much you lip smack? Makes it so hard to listen to. Otherwise, thank you, the content is helpful.

  • @vast634
    @vast634 4 years ago

    7:04: 2(1.5 * w) - 1 = 2(1.5 * 0.8) - 1 = 1.4, not 1.5

  • @thamburus7332
    @thamburus7332 10 months ago

  • @dmitrikochubei3569
    @dmitrikochubei3569 4 years ago

    Thank you !