I wish other NPTEL courses had you as the instructor.
seriously, his method of teaching is excellent
Loved the way the subject is explained! Hats off!!!
Great! Understood the whole concept in 10 mins
Mathematically, at 4:19, the w and x vectors have been written incorrectly. Basically, vectors are always written as columns and covectors are written as rows. And dot product can be understood as a covector eating a vector and spitting out a real number. Therefore, at 4:19, there should be a transpose on the row vectors if he wants to write it like that.
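For concreteness, here is the convention being described, sketched in LaTeX (generic symbols, not copied from the slide):

```latex
% Vectors as columns; the dot product as a covector (row) acting on a vector (column).
\[
w = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{pmatrix}, \qquad
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad
w^{\top} x
= \begin{pmatrix} w_0 & w_1 & \cdots & w_n \end{pmatrix}
  \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}
= \sum_{i=0}^{n} w_i x_i .
\]
```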
Hey, can you suggest some links to study the basics of vectors, so that there isn't any problem in understanding deep learning algorithms?
@@sachinsinghchauhan9861 3Blue1Brown. Search it up.
How interesting this is! Amazing communication skills.
Very nice explanation, Sir, especially the change of w, which you explained by considering each positive and negative point.
It seems the students are there just to revise the topics...
5:39: every point (vector) on the line is perpendicular to w. But what is the direction of such points on the line?
So many post-production staff, and they can't add a fade-out and fade-in to the video and audio?
The outro scares me when I'm on headphones. 13:02
Look man, the thing is, I don't know anything about computer science, but I want to learn neural networks. What all do I need to know to learn this?
@@shubhamide 12th-class/undergrad level: linear algebra, matrices, calculus, statistics and probability.
Some basic programming. Without these, your head will spin.
@@farooq8fox I know the maths; tell me a source for learning programming.
@@shubhamide Basic Python programming, and then do a machine learning course, preferably the one by Andrew Ng; then you will understand better.
Could somebody kindly share the prerequisites for this lecture that the professor mentions at 4:18?
Linear algebra,
@@ayushshukla9070 and a little respect for those who want to learn.
th-cam.com/video/LyGKycYT2v0/w-d-xo.html
The site below mentions the prerequisites for the entire course along with the relevant links to access the content
www.cse.iitm.ac.in/~miteshk/CS7015.html
Hope this helps!
@@pranavsawant1439 this guy is in IITM, but why does the course say IIT Ropar?
Why are we not considering a loss while updating the weights here? Why are we adding or subtracting the input vector from the weight vector?
The content and topics are far better than other courses', but everything goes fast and isn't covered in detail... so it's hard to understand.
Why are we doing w + x and w - x? We can add any vector to w in the direction of x, right? Why 'x' precisely?
To correctly set the value of w. This is done during training to find the optimum value of w when the inputs x live in (n+1)-dimensional space.
@@abhirupbhattacharya3373 Hey bro! Can you suggest some basic materials to study the concepts of vectors? I don't understand vectors, and that's why I always have trouble studying deep learning concepts.
At 8 minutes: it works not only by the explanation given but also follows from the per-epoch training formulas: (a) y_hat^(i) = step(w^t x^(i)), (b) err = y^(i) - y_hat^(i) (true minus prediction), (c) w = w + err * x^(i). Depending on the error, the update changes: if the error is -1, equation (c) gives w = w - x^(i). It basically means that if the predicted output y_hat is 0 and the target y is 1, we need w = w + x, which graphically means the line dividing the classes moves from class 1 toward class 0. The opposite can also happen: if the output is 1 and the target is 0, we need to subtract the input vector from the weights. It's a way of dealing with errors, and it isn't clearly understandable from the lecture alone.
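A minimal sketch of that error-driven update in Python, assuming labels y in {0, 1} and a step activation (the helper names and toy data are mine, not from the lecture):

```python
import numpy as np

def perceptron_epoch(X, y, w):
    """One pass of the error-driven update: w += (y - y_hat) * x.
    X: (m, n+1) inputs with x0 = 1 prepended; y: (m,) labels in {0, 1}."""
    for x_i, y_i in zip(X, y):
        y_hat = 1 if np.dot(w, x_i) >= 0 else 0  # step activation
        err = y_i - y_hat                        # +1, 0, or -1
        w = w + err * x_i                        # w+x, no change, or w-x
    return w

# Toy usage: two separable points in 2D, with the bias input x0 = 1 prepended.
X = np.array([[1.0, 2.0, 2.0],     # positive point
              [1.0, -1.0, -1.0]])  # negative point
y = np.array([1, 0])
w = np.zeros(3)
for _ in range(10):                # repeat epochs until convergence
    w = perceptron_epoch(X, y, w)
print(w)
```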
You are absolutely correct.
Thank God you mentioned that x^t x is a positive quantity; that's why cos(alpha) is < or > in the respective cases at 9:14. I was starting to get really worried.
Same problem. Indian institutions have this perception that everyone knows everything. He did not bother to explain many things.
This is a university-level course, and linear algebra and calculus are prerequisites for it. But yeah, he should've mentioned them for the sake of the online audience.
@@farooq8fox Bro, I know calculus and linear algebra, but what is this "while do" thing?
@@shubhamide It's a loop, in programming pseudocode. Google "while loops".
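A minimal illustration in Python of what the pseudocode's "while ... do" means (the variable names here are made up, not the lecture's):

```python
# "while <condition> do <body>" repeats the body as long as the condition holds.
misclassified = 3          # stand-in for the number of wrongly classified points
steps = 0
while misclassified > 0:   # condition is re-checked before every iteration
    misclassified -= 1     # pretend each pass fixes one point
    steps += 1
print("converged after", steps, "steps")
```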
Just wow!!
What is the error at 9:27?
Can anyone explain how the angles for the points p1, p2, p3, n1 are greater than 90 degrees?
The algorithm will surely converge, but it will still make errors on some of the training points, whether positive or negative, because we never get a 100% accuracy score even on training data; if we somehow achieved that, our model would be overfitted.
Can anyone explain this, please? How, at 10:10, is he assuming the angle between p1 and the initial w is greater than 90 degrees? I mean, how is the angle measured? I believe between w and p1, right? I can't figure out how it seems to be greater than 90 degrees from this figure. And this happens again after modifying w, between the new w and p2. Please explain.
lol you are so dumb
At 8:40:
cos(alpha_new) > cos(alpha)
=> alpha_new < alpha
Why are we doing w = w + x and w = w - x? Can anyone please explain how we got this?
See, this is done in order to find the line separating linearly separable data. Look, w is selected randomly. Now we select x, which is any one of the points; if x is a positive point, we compute w.x, and if w.x < 0 the point is misclassified, so we update w = w + x to bring w closer to x.
Take the cosine similarity formula, i.e., (w^t x) / (||w|| ||x||). If we add x to w (w = w + x), then w^t x increases, so the cosine similarity increases. When does the cosine similarity increase? Only when the angle θ between the two vectors decreases. So, to correct each error, we add x to w (when a +ve point is predicted as -ve) or subtract x from w (when a -ve point is predicted as +ve) accordingly. Weights and biases are randomly initialized in neural networks (using Glorot initializers, He initializers, etc.) and are the only trainable parameters in the whole DNN.
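A quick numerical check of this argument, with made-up vectors (none of these values are from the lecture):

```python
import numpy as np

def cosine(a, b):
    # Cosine of the angle between a and b: (a^T b) / (||a|| ||b||).
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

w = np.array([1.0, -2.0])  # arbitrary initial weight vector
x = np.array([1.0, 1.0])   # a positive point misclassified by w (w.x < 0)
print(cosine(w, x))        # about -0.32, i.e., angle > 90 degrees
w_new = w + x              # the update w = w + x
print(cosine(w_new, x))    # about +0.32, i.e., the angle has decreased
```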
Should the x vector be a unit vector? If we add x^t x to w^t x, shouldn't it be such that cos(alpha_new) does not exceed 1?
Not needed: the norm of the new weight vector w is in the denominator, so by the Cauchy-Schwarz inequality the cosine can never exceed 1.
Great
Can anybody please explain why we do w = w + x and w = w - x?
I'm sure you must have got the answer by now, but I'm answering in case anyone in the future has the same doubt.
Take the cosine similarity formula, i.e., (w^t x) / (||w|| ||x||). If we add x to w (w = w + x), then w^t x increases, so the cosine similarity increases. When does the cosine similarity increase? Only when the angle θ between the two vectors decreases. So, to correct each error, in this case a +ve point predicted as -ve, we add x to w, or subtract x from w when a -ve point is predicted as +ve.
@@arvind31459 Bro, what is this cosine similarity thing... and what does it have to do with this course?
I have one question:
Why are we considering x_0 = 1? Because we are writing the equation as the summation over i = 0 to n of w_i x_i.
See the previous video regarding McCulloch-Pitts (MP) neurons.
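In case the previous video isn't handy, here is the standard rewriting that motivates x_0 = 1, sketched in LaTeX (w_0 plays the role of the negative threshold):

```latex
% With x_0 = 1 and w_0 = -theta, the threshold folds into the weighted sum.
\[
\sum_{i=1}^{n} w_i x_i \ge \theta
\iff \sum_{i=1}^{n} w_i x_i - \theta \ge 0
\iff \sum_{i=0}^{n} w_i x_i \ge 0,
\qquad \text{where } w_0 = -\theta,\; x_0 = 1 .
\]
```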
Thank you!
Can we apply gradient descent?
Yeah, you can. But back when the perceptron paper was presented, it used only this update rule; the gradient descent method was related to it later on.
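For what it's worth, one standard way to connect the two (a textbook view, not something shown in this lecture): with labels y in {-1, +1}, the perceptron update can be read as stochastic gradient descent on the loss max(0, -y * w^t x) with learning rate 1. A minimal sketch, with names of my own choosing:

```python
import numpy as np

def sgd_perceptron_step(w, x, y, lr=1.0):
    """One SGD step on the perceptron loss L(w) = max(0, -y * w.x), y in {-1, +1}.
    If the point is misclassified (y * w.x <= 0), the subgradient is dL/dw = -y * x,
    so w -= lr * dL/dw reduces to the classic update w += y * x when lr = 1."""
    if y * np.dot(w, x) <= 0:    # misclassified (or on the boundary)
        w = w + lr * y * x       # gradient step; lr=1 gives w+x / w-x
    return w
```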
Bro, please tell me which Python libraries are used in the course: TensorFlow, PyTorch, or anything else? Then I'll start working on them.
Can anyone explain how cos(alpha_new) > cos(alpha) but alpha_new < alpha_old? 8:18
cos(alpha) is a decreasing function of alpha from 0 to 180 degrees.
Assume cos(A) = 0; then A = cos^-1(0) = 90 deg.
Now increase the value of cos(A), i.e., cos(A_new) = 0 + 0.1 = 0.1, so A_new = cos^-1(0.1) ≈ 84.26 deg.
When the value of cos(A) increases, the angle it represents decreases.
So from the above we can conclude that cos(A_new) > cos(A) implies A_new < A.
Hope it helps.
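A quick numerical check of this in plain Python (the cosine values are arbitrary, just for illustration):

```python
import math

# As cos(alpha) increases, alpha = acos(cos(alpha)) decreases.
for c in (0.0, 0.1, 0.5, 0.9):
    print(f"cos = {c:.1f} -> angle = {math.degrees(math.acos(c)):.2f} deg")
```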
At 1:40, what does he mean by positive and negative inputs? On the previous slide, he clearly mentioned that the inputs are binary, i.e., 0 or 1.
Really poor quality from NPTEL.
By positive input he means inputs having a positive output.
He clearly defines positive and negative inputs at 1:38. Positive inputs are the ones whose output (i.e., label) is 1, and negative inputs are the ones whose output is 0. Why just 0 and 1? Because it is a binary classification problem we are referring to. And if you Google it, you will see that the perceptron algorithm is a two-class (binary) classification machine learning algorithm.
E.g., we have two possible outputs in the example that he has considered, so the labels will be:
Liked the movie = 1
Did not like the movie = 0
Poor fellow... I've never seen such a wasteful lecture from NPTEL... he assumes everyone else knows everything.