Watch the 3blue1brown one first and then come back to this one. If I hadn't done that, I would have had no idea what I just saw. Siraj is a great teacher for people who already know what they are doing.
These kinds of videos are superb for refreshing your memory when you already know the stuff but have forgotten it! Thanks!
I want an explanation in 2 hours, to actually understand it.
Gotta wait for the next part, though.
+EBR Read it. Watched it again. And it was still 5 minutes long. Your solution didn't work! xD
hahaha++
Yeah, totally, his videos are on another level. Plus explaining a concept like the gradient of a function shouldn't be done in 5 minutes imo.
Other channels "explain" this in 2 hours but fail to deliver; Siraj did a great job here
The best video for getting an overview of the backpropagation algorithm
Short video with lots of information + how it works = perfect video
Holy crap, this is amazing. I've been taking a course where the instructor just teaches us random crap and I had no idea how it all connected till I watched this video! Thank you, thank you
wow, that was fantastic! Didn't expect such a short video about this topic to be this clear and understandable.
I think backpropagation is best explained as a computation graph instead of neural network figures. That way you understand how computers really compute this stuff, and it generalizes to how you think about Deep Dream or style transfer.
And it's also easier to understand. With neural net figures, it's hard to visualize the process.
A derivative is the slope of a tangent line... Man, you explained something I never really understood in 10 years of academic life. I came here to learn backpropagation but ended up learning something I couldn't in my academic life. Now I can die peacefully.
I finally made something that works correctly!! God damn it, it's 2 AM and I've been banging my head against this for the last 12 hours. I made my first basic network that can solve XOR thanks to your example of backpropagation!
Thank you so much for this video !!
Just the amount of time that I needed for my presentation, thank you
Great work Siraj. The wacky profile makes it easier to learn, I don't know how. Thanks for making this subject easier.
thanks Ahmad
Siraj Raval dude check fb
Siraj, I am your fan. You really represent well. Just give credit if you get this work from someone else. You deserve credit for your awesome presentation.
best video I've found on back-propagation. Thank you so much
I am the 124,000th subscriber and you convinced me with 2 videos. Looking forward to more interesting and very nicely made videos from you.
Oh my GOD! This is a great summary of so much AI theory!!
What an awesome video! As someone entering college to be a data scientist who already has some understanding of calc, stats, and compsci, this video perfectly connected all three!
Thanks Suresh! Finally a clear explanation on Backpropagation!!!
Keep Going!!!!!
Siraj* thanks!
Hands down your best video so far! Keep up the good work!
thanks!
That is the FIRST TIME I have ever followed this and felt some inkling of understanding it. Thanks!! Also, to compensate for my lack of understanding, for many years I preferred a genetic algorithm for training weights instead. And I still think that might be the better approach.
Wow, thank you for this bird's-eye view of this complicated subject.
This is an awesome explanation of backprop, and I majored in the liberal arts. Thanks, Siraj.
When you have a data mining test tomorrow and Siraj drops this!
aditya verma haha pursuing from?
VIT Vellore... nuf said
Great video! There's so much variance in explanations for back propagation that it can be a bit difficult to grasp at first. Most of the time, you get people writing a blog on BP who aren't that well versed in deep learning in terms of the math.
good explanation. I like how you broke down the calculus bits
This is a really great video. I love how the theory is explained alongside a simple example. The best part is that the code is actually runnable!! Kudos!!
I stand corrected...this was a fucking great video. Doing some late night reading and this couldn't have popped up at a better time... really, you rock
thanks George, more to come~
Man, this is awesome! I was going through Andrew Ng's course and felt a bit lost while implementing backpropagation. This helped clear things up. Looking forward to the Deep Learning Nanodegree.
Siraj, this is terrific! Deep learning has become so easy to implement thanks to libraries like TensorFlow or PyTorch that I often forget what's really happening under the hood when I make such networks. This video was perfect for refreshing my knowledge and connecting the dots again. Thanks!
You should do one of those videos where you teach it in 5 stages: first to a child, then a teenager, then a college student, then a grad student, then an expert. There is a lot of foundational stuff that would be interesting to some, and the expert stuff could come out too. This is the perfect subject for that kind of format.
I actually grasped this! *Thank you!* Now on to understanding...
Nice, covered everything I’m learning about in my intro to Neural Networks class.
You should do more videos like this. I was actually able to keep up with everything first time around without reducing the speed to 0.5. Great work!
will do thanks Kyle
Back to basics, excellent!
At some point, I would LOVE it if you could do a video on how to build a network using LSTMs at a low level, like with NumPy and stuff. I use them all the time, but I can't wrap my head around how they do the unrolling trick or compute the gradient. Thanks!
definitely. great idea thanks
Thank you for talking slower in this video, it is very helpful. And although I enjoy the memes that usually randomly pop up, they can become very distracting while trying to understand the complex subjects discussed in your videos
It was an excellent video Siraj. Simple and objective!
thx Lucas!
Hey that was super- fast and duper- great !
4:23 - best sketch ever.
This is a good revision-type video for those who want to brush up their concepts. However, it's not wise to treat this as a learning video; the best sequence would be to first watch 3Blue1Brown's video (or any other backpropagation video that explains the concept at length), then come to Siraj's video to get a summary
Awesome explanation !
This is the only youtube video I actually slowed down the speed.
Absolute fire video please make more
wow, amazing, i think I understand it now
nice explanation bro. everything is clear now
awesome explanation!
your channel is very good
thank you for back-propagation video
very good explanation
Very nice your explanation, thanks a lot!
thx Cesar
Wow... at first I thought, what the hell did I click? Who is this crazy guy? But you are actually very good. That was an excellent explanation. Thank you sir... :) lolz...
Thanks that was very clear!
Thank you for your great effort ... luv all your videos.... thanks for the effort
Great concise presentation! I appreciate seeing the code as well. Thanks for the upload! 👍
great video! loved it!!
thanks Mick!
Great video :) full of energy!
This video is what you watch after already understanding back propagation. That kind of goes for pretty much most of Siraj's videos lol
Great explanation
thanks!
that's very clear, thank you
np
Awesome !! Thanks :)
these videos are awesome! thank you
Siraj, off-topic question: would you be interested in doing a series on the Google A.I. Experiments, specifically the Infinite Drum Machine? We would need to set up AudioNotebooks, get some sounds in .wav form (of which there are many), and use SamplesToFingerprints and t-SNE to map them, but for me it would be a lot of fun to play with sounds and very practical. I think this would yield the most learning about deep neural networks, because sounds are messy, so loads of tinkering would be required, and trial and error is how people learn. Are you interested, or is this more of an intermediate topic than an introduction? Would you be interested in presenting this to those that are interested? My guess is this sound project would be a blast to work through. Imagine how jazzed people would be to go out and record snips of sounds and run them through the neural network to see if it can guess the sound, or get close to something similar with some degree of accuracy. Backpropagate to update the net, and also attempt to alter the structure of the net to improve accuracy. I think people would love it, and that would get people interested in mucking around a lot.
Great video
@4:40 There is a bit of glossing over detail here, on a part of the subject where I see a number of confused people posting on Stack Overflow or Data Science Stack Exchange: namely, that you don't backpropagate the *error* value per se, but the gradient of the error with respect to a current parameter. This is made more confusing for many software devs implementing backpropagation because the usual design of neural nets cleverly combines the loss function and the output layer transform so that the derivative is numerically equal to the error (specifically only at the pre-transform stage of the output layer). It really matters to understand the difference, though, because in the general case it is not true, and there are developers "cargo culting" in apparently magic manipulations of the error because they don't understand this small difference.
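To make that distinction concrete, here's a small NumPy sketch (toy numbers of my own, not from the video) showing why the shortcut only holds for a specific loss/output pairing - with a sigmoid output and binary cross-entropy, the gradient at the pre-transform stage happens to equal (prediction - target):

import numpy as np

# Toy numbers of my own, illustrating the point above: what gets propagated is
# dL/dz, and only for the sigmoid + cross-entropy pairing does it equal (p - y).
z = np.array([0.3, -1.2, 2.0])            # pre-activations of the output layer
y = np.array([1.0, 0.0, 1.0])             # targets
p = 1.0 / (1.0 + np.exp(-z))              # sigmoid outputs

# General chain rule: dL/dz = dL/dp * dp/dz
dL_dp = (p - y) / (p * (1 - p))           # derivative of binary cross-entropy w.r.t. p
dp_dz = p * (1 - p)                       # derivative of the sigmoid w.r.t. z
grad_general = dL_dp * dp_dz

# The shortcut many implementations hard-code:
grad_shortcut = p - y

print(np.allclose(grad_general, grad_shortcut))   # True - but only for this pairing

Swap the loss for mean squared error and the two stop matching, which is exactly the trap described above.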
Again an awesome video. You should also do a video on Calc 2 and 3 and multivariable calculus. I've already been through tons of definite and indefinite integrals, maxima and minima, and limits and continuity in senior high school, but it still seems too vast for me; a concise video would really help me understand them.
thanks Akash, great points
Hi Siraj,
at 0:38, shouldn't there be 3 inputs, one for each of the features including the bias?
If I understand correctly, the number of input layer neurons is equal to the number of features in our data.
Am I correct?
Hey Aneesh, I was wondering the same. I do think this is an error. The number of neurons in the input layer needs to equal the number of features in the training data set. The fourth neuron in the input layer would probably be a bias unit.
Hi.
He has already set the 3rd feature in all the datapoints to 1 to act as the bias.
Also the diagram shows one datapoint going to one neuron.
You're right, there should be 3 inputs in the image
it's just an example image, but yeah, there should be 3
Hey Siraj. I asked on your live stream if you were going to implement an LSTM from scratch. What I meant was whether you were going to implement it without using any additional libraries in Python besides the most common ones. So, no TensorFlow or Keras.
great video!!!
Really helpful video btw just didn't understand the error part
You have earned a subscriber. I am watching Andrew Ng's course but I can't seem to get proper intuition. Your video helped me understand a bit. Thank you for that
This is awesome:)
thanks so much!
You are my hero. But gradient descent is still nightmare to me 😂
Nice video Siraj, it helped me, but I felt it was a bit fast..
If you're talented and you know it, clap your hands.
Hi Siraj, Can we say that narrow minded people are that way because they are overfit ?
indeed. they need more dropout. drugs can help
Hi Siraj, I just discovered how much I love ML and I love your videos; I watch at least one every day! Without wanting to sound rude, I wanted to ask you: did you learn all of this by yourself, or did you study this topic at university? I ask because your knowledge and teaching skills are awesome and I would love to understand the topic this well :)
thanks Gabriel! lots of online studying from sources like r/machinelearning and twitter
So you try to bring the slope to 0 in an effort to hit the local minimum with the lowest value of f(x). How do you know you don't end up in one local minimum while another is way lower, because you kind of roll down the slope and can't get up again?
You don't. That is one problem with this type of NN. However, if your program is simple enough, then this will work fine.
In case you want to optimize on more complex surfaces (which is almost always the case in real life), you can avoid getting stuck in a local minimum by using gradient descent optimization algorithms like "momentum" (see the sketch below)
I'm not an expert, but consider looking up "non-convex loss functions". They are exactly the problem you are addressing. Most loss functions are convex, though, which means they have only one global minimum. Being trapped in a local minimum can't happen in those cases.
Silas Alberti
So convex means the degree of the polynomial is less than 3?
And just to make sure, Gradient Descent is a loss function?
Olf Mombach The mathematical definition of convexity is a bit more complicated, but quadratic polynomials are indeed convex.
Gradient descent isn't a function though. It's a method used to minimize the loss function.
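For anyone wondering what the "momentum" mentioned above looks like in practice, here's a minimal 1-D sketch (the wavy function and all constants are my own, purely illustrative): a running velocity term accumulates past gradients, which can carry the update through shallow dips that would trap plain gradient descent.

import numpy as np

# Minimal 1-D sketch of gradient descent with momentum on a toy non-convex
# function (function and constants are illustrative, not from the video).
def f(x):
    return x**2 + 3.0 * np.sin(3.0 * x)       # a curve with several local minima

def df(x):
    return 2.0 * x + 9.0 * np.cos(3.0 * x)    # its derivative

x, velocity = 2.5, 0.0
lr, beta = 0.05, 0.9                          # learning rate and momentum coefficient
for _ in range(200):
    velocity = beta * velocity - lr * df(x)   # accumulate a running "push"
    x = x + velocity                          # the push can coast past shallow dips
print(x, f(x))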
Great video!
Small mistake at 3:30 - you said 'the derivative of f(g(x)) is equal to the derivative of f(x) times the derivative of g(x)', where you meant to say what is actually written on the slide - (f(g(x)))' = f'(g(x))*g'(x), and not f'(x)*g'(x)
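A quick numeric sanity check of that correction, in case it helps (the example functions are mine, not from the slide): with f(u) = u^2 and g(x) = 3x, the true derivative of f(g(x)) is 18x, which is f'(g(x))*g'(x), not f'(x)*g'(x) = 6x.

# Numeric check of the chain rule correction (example functions are my own).
f = lambda u: u**2        # f(u) = u^2
g = lambda x: 3.0 * x     # g(x) = 3x, so f(g(x)) = 9x^2 and its derivative is 18x
x, h = 1.7, 1e-6

numeric   = (f(g(x + h)) - f(g(x - h))) / (2 * h)   # central-difference estimate
correct   = 2 * g(x) * 3                            # f'(g(x)) * g'(x) = 18x
incorrect = 2 * x * 3                               # f'(x) * g'(x)   = 6x
print(numeric, correct, incorrect)                  # the estimate matches 18x, not 6x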
You get my undivided attention right from the time of ' HELLO WORLD IT'S SIRAJ ' :P
Siraj, could you do another video explaining batch normalization and the math behind that?
great work Siraj, you are the "Greek Freak" of AI tutorials!
Are they any plans for a longer and more in depth video for this algorithm?
Hello Siraj, it's world... Great video as always!😊 What do you think about tf-slim as interface for tensorflow?
(thanks for the content BTW 👍)
I think the way you've shown the input arrays or matrix at the beginning is confusing. We convert the matrix of pixels of an input image into a single column, then feed each number to each node in the input layer.
What I mean is that at a time we input one image, and one number goes to one node.
So, because I've never been so hot on calculus, just to clarify: the delta of a particular neuron (from which we get the adjustment for the weight of each input from the previous layer by multiplying this delta by each input value, right?) is found by multiplying the sum of the weighted deltas of the next layer by the gradient of this layer (which in the case of the sigmoid function would be x*(1-x), where x is the output of the current neuron). Is this right? I'm building a simple neural network in Max/MSP and trying to wrap my head around how multiple layers work.
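That's the standard recipe, yes (with the sigmoid gradient taken as out*(1-out) on the layer's output). For reference, here's a minimal NumPy sketch of one hidden layer's delta and weight adjustment; the shapes and names are made up, just to show how the pieces multiply together:

import numpy as np

# Minimal sketch of the delta rule described above (shapes and names made up).
# Sizes: 3 inputs -> 4 hidden units (sigmoid) -> 2 outputs.
rng = np.random.default_rng(0)
x          = rng.random(3)          # input to this layer (previous layer's output)
hidden_out = rng.random(4)          # this layer's sigmoid outputs
W_next     = rng.random((4, 2))     # weights from this layer to the next
delta_next = rng.random(2)          # deltas already computed for the next layer

# delta of this layer = (weighted sum of next-layer deltas) * sigmoid gradient
delta = W_next.dot(delta_next) * hidden_out * (1.0 - hidden_out)

# adjustment for the incoming weights (3x4): outer product of input and delta
lr = 0.1
dW = lr * np.outer(x, delta)
print(delta.shape, dW.shape)        # (4,) (3, 4)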
Hey man, really love your video. Just a simple question: at the end of this video you said you're going to calculate the derivative of life. Just wondering, have you done that yet? :)
thanks! Not yet haha
Siraj Raval lifehackq
Siraj bro, what can I do with these garbage partial derivative formulas? They look so complicated
awesome!
Great. I think I got it, but just in case, tell me the whole thing again. I wasn't listening.
I think I heard it. But I don't get it.
In the derivative example (at approx. 2:53), the slope is indeed 4, but the graph is incorrect (the black line representing the slope is wrongly plotted)
I'm a bit lost at the dot product example at 1:21
Why are we multiplying a row of inputs on the same input node by a column of different weights? Wouldn't the value of each node in the next layer be based on a column of inputs (the current value of each node) * the weight of each node's connection to the next one?
How many output nodes are there in this equation?
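In case it helps, here's how the shapes line up in a NumPy forward pass like the one in the video's example (the toy numbers are my own): each row of the input matrix is one training example, each column of the weight matrix is one output node, and the dot product pairs every example row with every weight column. With a single weight column there is a single output node.

import numpy as np

# Toy forward pass showing how the dot product pairs rows and columns
# (the numbers are mine, not the video's).
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])           # 4 examples (rows) x 3 input features (columns)
W = np.array([[ 0.5],
              [-0.3],
              [ 0.8]])              # 3 input features x 1 output node

z   = X.dot(W)                      # shape (4, 1): one value per example, per output node
out = 1.0 / (1.0 + np.exp(-z))      # sigmoid activation
print(z.shape)                      # (4, 1)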
Siraj, please teach us a bit slower. Why do you teach and explain it so fast?
will go slower thanks
because he has to go fast, like sonic.
also he was trying to do it in 5 minutes heahahha
So he can make eye catching titles and get more views?
I actually like the speed. If I don't quite get something instantly, I can pause or rewatch that part.
For a guy who is revising his knowledge, sure. But for someone who is learning step by step, this video is a nightmare.
Hello Siraj, around 3:26 you say that df/dx = (df/dx)*(dg/dx), which is wrong, but on the screen it is stated correctly.
Amazing videos bro!
I have a question. The weights are common to all data sets, I think. When they are adjusted for the first dataset, they will be readjusted for the next dataset if there is one. Doesn't it forget its previous adjustment?
For the final Backpropagation equation, shouldn't the current weights subtract, not add, the deltaWeights * learningRate? Since we want to move down the gradient instead of up.
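Both conventions show up, which is what makes this confusing. The gradient of the loss is indeed subtracted; but in code like the video's, the "error" is defined as (target - output), which already flips the sign, so the resulting delta gets added. A rough sketch of the two equivalent bookkeeping styles (variable names are mine):

import numpy as np

# Sketch of the two equivalent update conventions (variable names are mine).
rng  = np.random.default_rng(1)
X, y = rng.random((4, 3)), rng.random((4, 1))
W    = rng.random((3, 1))
lr   = 0.1

out = 1.0 / (1.0 + np.exp(-X.dot(W)))

# Convention A: error = target - output, and the delta is ADDED
error = y - out
delta = error * out * (1.0 - out)
W_a   = W + lr * X.T.dot(delta)

# Convention B: gradient of the squared loss w.r.t. W, which is SUBTRACTED
grad = X.T.dot((out - y) * out * (1.0 - out))
W_b  = W - lr * grad

print(np.allclose(W_a, W_b))        # True: same update, opposite sign bookkeeping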
Hi Siraj,
awesome videos. I've got a question about your function:

def nonlin(x, deriv=False):
    if deriv == True:
        return x*(1-x)
    return 1/(1+np.exp(-x))

Shouldn't the derivative be -e^x / (e^x + 1)^2 ??
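They're the same thing, just written in different variables: the analytic derivative of the sigmoid is e^x / (e^x + 1)^2 (note it's positive), and that equals x*(1-x) when x is the sigmoid's output rather than its input. The code relies on deriv=True only ever being called on values that have already been passed through the sigmoid. A quick numeric check (my own):

import numpy as np

# Numeric check (my own): x*(1-x) equals the analytic sigmoid derivative
# e^x / (e^x + 1)^2, provided x is the sigmoid's OUTPUT, not its input.
def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)                     # expects x to already be sigmoid(z)
    return 1.0 / (1.0 + np.exp(-x))

z = np.linspace(-4, 4, 9)
analytic = np.exp(z) / (np.exp(z) + 1) ** 2    # derivative in terms of the input z
via_code = nonlin(nonlin(z), deriv=True)       # sigmoid first, then x*(1-x)
print(np.allclose(analytic, via_code))         # True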
Thanks Raj. I think there is a problem with the first example (the binary operation example): there are 3 inputs but the graph shows 4.
Please make a video on how to do backpropagation in a Convolutional Neural Network.
Nice video! Can I ask what software(s) you used to create the animation at 2:00? (This looks like the same one that 3Blue1Brown uses to me)
With regards to deriving the meaning of life, if the answer to all questions is 42, what would the weights of each input neuron be so that every question we ask the network results in 42?
URGENT HELP NEEDED!!!!!
function [pred,t1,t2,t3,a1,a2,a3,b1,b2,b3] = grDnn(X,y,fX,f2,f3,K)
%neural network with 2 hidden layers
%t1,t2,t3 are thetas for every layer and b1,b2,b3 are biases
n = size(X,1);
Delta1 = zeros(fX,f2);
Db1 = zeros(1,f2);
Delta2 = zeros(f2,f3);
Db2 = zeros(1,f3);
Delta3 = zeros(f3,K);
Db3 = zeros(1,K);
t1 = rand(fX,f2)*(2*.01) - .01;
t2 = rand(f2,f3)*(2*.01) - .01;
t3 = rand(f3,K)*(2*.01) - .01;
pred = zeros(n,K);
b1 = ones(1,f2);
b2 = ones(1,f3);
b3 = ones(1,K);
%Forward Propagation
wb = waitbar(0,'Iterating...');
for o = 1:2
    for i = 1:n
        waitbar(i/n);
        a1 = X(i,:);
        z2 = a1*t1 + b1;
        a2 = (1 + exp(-z2)).^(-1);
        z3 = a2*t2 + b2;
        a3 = (1 + exp(-z3)).^(-1);
        z4 = a3*t3 + b3;
        pred(i,:) = (1 + exp(-z4)).^(-1);
        %Backward Propagation
        d4 = (pred(i,:) - y(i,:));
        d3 = ((d4)*(t3')).*(a3.*(1-a3));
        d2 = ((d3)*(t2')).*(a2.*(1-a2));
        Delta1 = Delta1 + (a1')*d2;
        Db1 = Db1 + d2;
        Delta2 = Delta2 + (a2')*d3;
        Db2 = Db2 + d3;
        Delta3 = Delta3 + (a3')*d4;
        Db3 = Db3 + d4;
        for l = 1:100
            t1 = t1*(.999) - .001*(Delta1/n);
            b1 = b1*(.999) - .001*(Db1/n);
            t2 = t2*(.999) - .001*(Delta2/n);
            b2 = b2*(.999) - .001*(Db2/n);
            t3 = t3*(.999) - .001*(Delta3/n);
            b3 = b3*(.999) - .001*(Db3/n);
        end
    end
end
delete(wb);
end
%I can't seem to understand the fault. Is it the matrix multiplication? The code does run successfully, but when I test t1, t2, t3 with some testing examples, the predictions for all examples are exactly the same and equal to the predicted vector for the last trained example.
% Please help, I've been stuck here for over a month now, thanks!!
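I can't run this here, but a likely culprit: the "for l = 1:100" update loop sits inside the per-example loop, and Delta1/Delta2/Delta3 (and Db1/Db2/Db3) are never reset to zero, so every example re-applies 100 decayed updates driven by whatever has been accumulated so far; by the end, the weights mostly reflect the last example. The usual structure is to accumulate gradients over all examples, apply one update per epoch, and then start fresh. Here is a rough NumPy sketch of that loop shape (sizes and names are made up for illustration, not a drop-in fix for the code above):

import numpy as np

# Rough sketch of the usual batch-update loop shape (illustrative sizes and
# names, not a drop-in replacement for the MATLAB code above).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X, Y = rng.random((50, 4)), rng.random((50, 2))          # toy data
W1 = rng.random((4, 6)) * 0.02 - 0.01                    # input -> hidden weights
W2 = rng.random((6, 2)) * 0.02 - 0.01                    # hidden -> output weights
lr, epochs = 0.5, 1000

for _ in range(epochs):
    # forward pass over the whole batch
    A1 = sigmoid(X.dot(W1))
    A2 = sigmoid(A1.dot(W2))
    # backward pass: gradients are computed fresh every epoch, nothing carried over
    d2 = (A2 - Y) * A2 * (1 - A2)
    d1 = d2.dot(W2.T) * A1 * (1 - A1)
    gW2 = A1.T.dot(d2) / len(X)
    gW1 = X.T.dot(d1) / len(X)
    # exactly one update per epoch
    W2 -= lr * gW2
    W1 -= lr * gW1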
Helpful video. Looks like Varun Dhawan. 😁
thx correction: he looks like me