"Some of you might already know where this is going.."
Me: Nope
Hahaha
What do u mean
When I get a real job, I will donate my bonus to Khan Academy. This has saved me so much time and you are so awesome.
Did you get a job yet?
Go on, tell us ❤❤
They are still waiting for your bonus, mate
This channel is a blessing. I've had some really bad professors and I've had some really good professors. But even the really good professor never made the concepts click with me as well as these videos do. Like, not only do I understand the math better, but just the little diagram you drew showing A's column space, and visibly showing how b is outside of A's column space yet could still be approximated using a vector v in A's column space, like idk how else to describe it but that just made it click for me.
Edit: I guess _one_ way to describe it and how it clicked for me:
So we use a line to approximate a bunch of data points on a graph, or plane. If these data points were in a straight line, the "approximation" would have no error. However, this is often not the case.
Now, think about the equation y=mx+b. Let's use c instead of b to avoid confusion in the next step. So we have y=mx+c. This is the equation used to represent our line. Suppose b=y-c. Then we have mx=b, which looks a lot like Ax=b. And it is! A is just a 1x1 matrix.
So the column space of A is just all the multiples of m, and our variable(s) (in this case, just x) can be changed to get b. Just basic algebra: if m=3 and b=6, then x=2. But say b is a 2D vector, e.g. b=(1, 2)^T, and A is a 2x1 matrix, so its column space is a line in the plane. Well now, no matter what x you use, you can't get b (unless b just happens to lie on the line). You can only get as close to b as the column space of A will allow you.
In the diagram drawn in the video, the column space of A is a plane, so the rank of A is 2. For simplicity, let's suppose A is a 3x2 matrix (a geometrical interpretation of this is that the column space of A is a 2D plane "floating" in a 3D space). b appears to be a 3D vector (so while the column space of A is only a 2D slice of the 3D space, b is a point that could be anywhere in the 3D space). So, just like before, we try to find the vector in the column space of A that gets as close to b as possible by changing our variables (in this case, x1 and x2).
Correct me if my understanding is wrong :)
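A small numerical sketch of the picture described in this comment, assuming NumPy. The 3x2 matrix A and the vector b below are made up for illustration and do not come from the video. It finds the point in A's column space closest to b and checks that the leftover error is perpendicular to that plane.

```python
import numpy as np

# Hypothetical 3x2 matrix: its two columns span a plane in R^3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
# Hypothetical point that does not lie on that plane.
b = np.array([1.0, 2.0, 0.0])

# Least squares solution: the x that minimizes ||Ax - b||.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

projection = A @ x_star      # closest point to b inside the column space
residual = b - projection    # the part of b the plane cannot reach

print("x* =", x_star)
print("projection of b =", projection)
print("A^T (b - Ax*) =", A.T @ residual)  # approximately zero
```

The last line is the geometric point of the video: the error vector b - Ax* is orthogonal to every column of A.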
This lesson is fantastic! I understood the problem in only 15 minutes! You're absolutely better than my numerical analysis teacher at university, who can't properly teach a topic in two hours! Thank you!
Comes in handy while studying machine learning.
yes, same
Very true. When I was studying ML, "normal equation", I really thought that I had seen it somewhere. Then I realized I studied it in Lin. algb.
I was doing an online machine learning course and got lost when the lecturer introduced the normal equation (which this is, with a different name). Needless to say, I'm finna binge-watch your linear algebra lectures now because I get insecure about using equations I don't understand. Thanks for the playlist, I really wanna put ML in my toolset so we're doing this!
Can you please mention the name/link of the course ?
@@khaledsherif7056 not sure which course he used for ML, but I'm studying Machine Learning by Andrew Ng on Coursera. When he was teaching us the normal equation as an alternative to gradient descent in Week 2 of the course, I realized I had seen this in linear algebra but with a different name, which is the title of this video.
You are like a billion times better than my professor... and my professor isn't even bad. On the contrary he's my favorite! You're just even better at explaining things.
Plus it's impossible for me to lose focus with the pretty colors and your beautiful handwriting. lol
I have my Linear Algebra final tomorrow (technically today) and I owe the A that I'm sure to get to you and all your helpful videos!
7 years later... did you get an A? :)
11 years later did you get that A?
12 years later did you get that A?
Very useful man you are doing an amazing job this literally saved me hours of searching and reading can't thank you enough :)
This was incredible, I started this video off being so confused about the least squares, and I just get it entirely now! Thank you so much :)
Best linear algebra playlist.
Indebted to Khan academy forever!
This is super useful in solving assignments. Thanks, Khan Academy.
This is a good preface before machine learning. The star notation (x*) always marks the optimal/best solution, and you can use gradient descent to minimize the squared error
Best approach to the problem. No gradient, no multivariable calculus. You're a master!
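A rough sketch of the gradient descent idea mentioned here, assuming NumPy. The data, the step size, and the iteration count are all made-up choices for illustration. It minimizes ||Ax - b||^2 by repeatedly stepping against the gradient 2 A^T (Ax - b), then compares the result with the least squares solution from the normal equation.

```python
import numpy as np

# Hypothetical data: fit b ≈ x0 + x1 * t at t = 1, 2, 3, 4.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

x_gd = np.zeros(2)
step = 0.01            # assumed step size, small enough for this A
for _ in range(5000):  # assumed iteration count
    gradient = 2.0 * A.T @ (A @ x_gd - b)  # gradient of ||Ax - b||^2
    x_gd = x_gd - step * gradient

# Closed-form answer from the normal equation A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print("gradient descent:", x_gd)
print("normal equation: ", x_normal)
```

Both routes should land on the same x*; the normal equation gets there in one solve, while gradient descent needs a step size that suits the problem.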
It would be great to have links when he says "I explained (whatever) in a different video" to access that explanation. In this case I wanted to know why the orthogonal complement of C(A) equals N(A transpose).
Thanks!
+Sergio Prada same thing here
www.khanacademy.org/math/linear-algebra/alternate-bases/othogonal-complements/v/linear-algebra-orthogonal-complements go through this to understand why the orthogonal complement of C(A) equals N(A transpose).
+1
Consider any vector x perpendicular to the column space of A, i.e. x belongs to C(A)_|_.
Then the dot product of each column of A with x is 0, i.e. (A^T)(x) = 0
Now let B = A^T, so clearly the above equation is Bx = 0, i.e. x lies in the null space of B
Thus x lies in the null space of A^T
Every step also works in reverse: if (A^T)(x) = 0, then x is perpendicular to every column of A, so x belongs to C(A)_|_,
thus C(A)_|_ = null(A^T)
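The same argument written out compactly, as a sketch. The symbols a_1, ..., a_n for the columns of A are notation introduced here, not taken from the video.

```latex
\begin{aligned}
x \in C(A)^{\perp}
  &\iff a_i^{T} x = 0 \quad \text{for every column } a_i \text{ of } A \\
  &\iff A^{T} x = \begin{pmatrix} a_1^{T} x \\ \vdots \\ a_n^{T} x \end{pmatrix} = 0 \\
  &\iff x \in N(A^{T}).
\end{aligned}
```

The first step uses the fact that a vector is perpendicular to a span exactly when it is perpendicular to each spanning vector.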
Excellent explanation of a valuable technique.
Thank you Salman Khan. I appreciate the opportunity to relearn the method here. You can never hear this stuff enough times.
god dang it I knew I should have chosen another bachelor thesis..
haha!!!!
just realizing this now as well
first semester stuff at my uni
@@rob6129 what uni u attending?
Helpful exploration of least square properties
Awesome explanation! Keep up the good work!
Your videos are just great !!! The concepts with geometrical examples make very good sense !!! Thanks a lot
Very useful! In my lecture slides I had this term Hx=z for the same problem and I couldn't make sense of how we could get to this as the best solution: x = (Ht*H)^-1 * Ht * z.
Now I understand:-)
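A minimal sketch of the formula x = (Ht*H)^-1 * Ht * z from this comment, assuming NumPy. H and z below are made-up numbers for illustration. It solves the normal equation Ht*H x = Ht*z directly (rather than forming the inverse) and compares the result with NumPy's built-in least squares routine.

```python
import numpy as np

# Hypothetical overdetermined system Hx = z (4 equations, 2 unknowns).
H = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
z = np.array([6.0, 5.0, 7.0, 10.0])

# Normal equation: (H^T H) x = H^T z. Solving is preferred to inverting.
x_normal = np.linalg.solve(H.T @ H, H.T @ z)

# Reference answer from the library's least squares solver.
x_lstsq, *_ = np.linalg.lstsq(H, z, rcond=None)

print("normal equation:", x_normal)
print("np.linalg.lstsq:", x_lstsq)
```

This only works as written when the columns of H are linearly independent, so that Ht*H is invertible.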
very helpful! Thanks a lot! you are doing great things! I also listened to your other videos, all very wonderful!
This is surprisingly easy
Nice derivation of the normal equation
Thank you so much. You just simplified long boring hours of confusing lecture
It seems I have seen the best video!
can you teach me cubic expressions and cubic equations :)
eg. solve the equation x(3X3X3) - 2x(2X2) - x + 2 = 0
by using the factor theorem formula :)
thank you very much sir
Thanks so much Khan...wonderful explanation in two videos that explains everything...great. You are wonderful
Thanks a lot, very comprehensive ! great job!
This guy is good...........
great geometric intuition of linear regression
can we please get a video for the maximum likelihood estimation
You are my savior. This is a truly clear lecture. Thank you!! 👍👍👍
thank you sir
Super clarity......
thanks
really helpful
I wish to know how to solve this: x has values of: -2 0 1 2 3 and y: 17 5 2 1 2, and I'm asked to use the least squares method, but I've been absent and I don't know exactly what my teacher meant by that or what that method consists of. Can anyone help me solve this?
Should have used n instead of k, it's usually mxn in R^n
thanks
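One way this could be set up, as a sketch assuming NumPy. The comment does not say which model the teacher wanted, so both a straight line y = c0 + c1*x and a parabola y = c0 + c1*x + c2*x^2 are tried here; the helper name least_squares_fit is made up for this example.

```python
import numpy as np

x = np.array([-2.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([17.0, 5.0, 2.0, 1.0, 2.0])

def least_squares_fit(A, y):
    """Return the coefficients minimizing ||A c - y|| and the error norm."""
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs, np.linalg.norm(A @ coeffs - y)

# Each row of A holds the model's terms evaluated at one data point.
A_line = np.column_stack([np.ones_like(x), x])        # columns: 1, x
A_quad = np.column_stack([np.ones_like(x), x, x**2])  # columns: 1, x, x^2

line_coeffs, line_err = least_squares_fit(A_line, y)
quad_coeffs, quad_err = least_squares_fit(A_quad, y)

print("line:     c0, c1     =", line_coeffs, " error =", line_err)
print("parabola: c0, c1, c2 =", quad_coeffs, " error =", quad_err)
```

For these five points the parabola fits with essentially zero error, because they lie exactly on y = x^2 - 4x + 5, while the best line still leaves a noticeable error.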
This is the first Khan Academy video I watch and don't understand...
For that you need to study orthogonal complements, and the concept of spanning sets, which leads to the concepts of column space, null space, etc.
Good video!!!! And nice work! Good luck with the KhanAcademy :)
Excellent video.
Thanks much :))))))))
Vahag
I have a question..
Does the least squares approximation always have a solution?
+Zulfiqar Ali not if you don't solve it.
+Conor Raypholtz it still has a universally reasonable solution
I'm pretty sure that is the idea of least squares: to provide a close answer when you can't give an exact one
It does always have one: if Ax = b has a solution, then b is a vector in the column space of A, and if not, we use the projection of b onto the column space of A.
There is always a solution to the least squares problem. Why? Ax* is in colspace(A) by definition of being the projection of b onto C(A), so there must be a set of weights x* that yields a linear combination of the columns of A equal to that projection.
What happens when AT*A is singular? How do we solve for the least squares solution?
love this guy
2018? Im alone :(
I'm here.
Onto 2019!
2024 here
I have one question: is the LSS always consistent? If yes, how can I prove it? Please answer.
Hi, not sure if you're still looking for the answer, but could you please describe what you mean by consistent?
It means whether we can always find a least squares solution of a system.
Nice vid, but why did you take the length squared? I understand that the length of the vector would be sqrt(b1^2 + b2^2 + ... + bn^2), but why did you square even that?
utte12
Because it’s easier to work with minimizing the sum of squares than minimizing the square root of a sum of squares. That’s my guess
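A short sketch of why squaring is harmless, written in the video's Ax and b notation. The row symbols a_i^T are notation introduced here.

```latex
\begin{aligned}
\|Ax - b\| &= \sqrt{\textstyle\sum_{i} \left(a_i^{T} x - b_i\right)^{2}},
\qquad
\|Ax - b\|^{2} = \textstyle\sum_{i} \left(a_i^{T} x - b_i\right)^{2}, \\
\arg\min_{x} \|Ax - b\| &= \arg\min_{x} \|Ax - b\|^{2}
\quad \text{because } t \mapsto t^{2} \text{ is increasing for } t \ge 0 .
\end{aligned}
```

So the same x* minimizes both, and the squared version is a plain sum of squares with no square root to carry around.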
I tried using this trick for the problem I'm facing, but it turns out that when I multiply AT by A, I get a matrix which isn't invertible, so I still can't solve it. LOL
This _still_ seems odd to me, because even if some element in the input matrix A was contributing 0 to the result b, it should _still_ be possible to get a point as close as possible to the result.
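A sketch of what can be done when AT*A is not invertible, assuming NumPy. The matrix below is made up so that its second column is twice the first. A closest point to b still exists; what is lost is uniqueness of x, and the pseudoinverse picks the shortest x among all the minimizers.

```python
import numpy as np

# Hypothetical rank-deficient matrix: column 2 = 2 * column 1.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])

print("rank of A^T A:", np.linalg.matrix_rank(A.T @ A))  # 1, so not invertible

# Minimum-norm least squares solution via the pseudoinverse.
x_min_norm = np.linalg.pinv(A) @ b
# lstsq handles rank-deficient systems and returns the same solution.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Adding a null-space vector of A gives a different x with the same Ax.
x_other = x_min_norm + np.array([2.0, -1.0])

print("x (pinv):     ", x_min_norm)
print("x (lstsq):    ", x_lstsq)
print("same Ax:      ", np.allclose(A @ x_min_norm, A @ x_other))
```

So the intuition in the comment holds: the closest point Ax* is still well defined, there are just infinitely many x that reach it.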
how did you know that it was a projection to the Col(A) and not anything else like the Range(A)?
Winnie Shi
Col(A) already is the range of A.
I am the 60th guy liking it !! :P :D
Great vid, thank you. :)
Big brajn
It's good
ICAM ! ICAM ! .... .. ...... !
❤
🤩
Sometimes I can't see what he's writing.
bro just do an example lol
n1
gorgeous