I highly recommend trying to prove this theorem in R2, R3, and R4 just to get a better intuition for how the variables cancel out. My professor gave us a take-home test where we had to prove Cauchy-Schwarz in R4, and it was honestly a very beautiful proof. The day after, in class, he was able to use the structure of the proof in R4 to generalize to Rn.
b/2a to minimize the function --- it's a quadratic function; it achieves its minimum at b/2a... "Let me pick any number, b/2a" -- we know that's not just any value :) You can't prove it without b/2a
You can also differentiate p(t):
From p(t) = (y.y)t^2 - 2(x.y)t + x.x
p'(t) = 2t||y||^2 - 2(x.y) *** x.x is a constant, so its derivative is 0
p(t) achieves its min when p'(t) = 0
=> 2t||y||^2 - 2(x.y) = 0
=> t||y||^2 = (x.y)
=> t = (x.y)/||y||^2
**Substitute t; the proof is complete.
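A quick numerical sanity check of the differentiation argument above, as a rough sketch. The helper names `dot`, `p`, and `minimizer` and the example vectors are made up for illustration; they are not from the video.

```python
def dot(u, v):
    """Standard dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def p(t, x, y):
    """P(t) = ||t*y - x||^2, the function used in the proof."""
    diff = [t * yi - xi for xi, yi in zip(x, y)]
    return dot(diff, diff)

def minimizer(x, y):
    """t* = (x.y) / ||y||^2, from setting p'(t) = 0. Requires y != 0."""
    return dot(x, y) / dot(y, y)

# Hypothetical example vectors in R4 (an assumption, chosen for easy arithmetic).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 0.0, -1.0, 1.0]

t_star = minimizer(x, y)
p_min = p(t_star, x, y)

# P at the minimizer should be no larger than P at nearby values of t.
print(p_min <= p(t_star - 0.1, x, y), p_min <= p(t_star + 0.1, x, y))

# P(t*) >= 0 rearranges to (x.y)^2 <= (x.x)(y.y), i.e. Cauchy-Schwarz.
print(dot(x, y) ** 2 <= dot(x, x) * dot(y, y))
```

Substituting t* back in gives P(t*) = x.x - (x.y)^2/(y.y), which is where the inequality comes from once you use P(t*) >= 0.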
at 8:17, the intuition for using b/2a is that t = b/2a is the minimum point of the curve. By completing the square, we get a(t - b/2a)^2 - (b^2)/(4a) + c. Since a is positive, the curve has its minimum at t = b/2a. So the reason for choosing b/2a is to evaluate the inequality at the point where P(t) is closest to 0. Can anyone confirm or deny?
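For anyone who wants the completing-the-square step spelled out, here is a sketch. It assumes the coefficients from the expansion of ||ty - x||^2, namely a = y.y, b = 2(x.y), c = x.x, and that y is not the zero vector so that a > 0.

```latex
\begin{align*}
P(t) &= a t^2 - b t + c \\
     &= a\left(t^2 - \frac{b}{a}\,t\right) + c \\
     &= a\left(t - \frac{b}{2a}\right)^2 - \frac{b^2}{4a} + c .
\end{align*}
% The squared term is >= 0 and vanishes only at t = b/(2a), so
\[
\min_t P(t) = P\!\left(\frac{b}{2a}\right) = c - \frac{b^2}{4a} \;\ge\; 0
\quad\Longrightarrow\quad b^2 \le 4ac .
\]
```

With a = y.y, b = 2(x.y), c = x.x, the last line reads 4(x.y)^2 <= 4(x.x)(y.y), which is the inequality.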
I think Sal failed to explain a key insight here. First let's break down P(t) = ||ty-x||^2. Notice that ty is a parameterization of a line, so ||ty-x||^2 is the squared distance between a point on that line and the vector x. Now what we want to do is find the minimum distance between the vector and the line. Since the function P(t) = at^2-bt+c attains its minimum at t = b/2a (notice the negative sign in the function), we substitute that in to get the minimum squared distance, and we get the inequality using the fact that P(t) >= 0 (squared lengths are never negative).
Note: If x lies on the line ty, the minimum distance is zero, which explains the equality case and why one must be a multiple of the other.
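The "distance from x to the line ty" picture can be played with numerically. This is only a sketch: the function names and the sample vectors are hypothetical, and it assumes y is not the zero vector.

```python
import math

def dot(u, v):
    """Standard dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def closest_point_on_line(x, y):
    """Point t*y on the line through the origin along y that is nearest to x.

    Uses t = (x.y)/(y.y), the minimizer of P(t) = ||t*y - x||^2.
    """
    t = dot(x, y) / dot(y, y)
    return [t * yi for yi in y]

def distance_to_line(x, y):
    """Minimum distance between x and the line {t*y}."""
    q = closest_point_on_line(x, y)
    return math.sqrt(sum((xi - qi) ** 2 for xi, qi in zip(x, q)))

# x off the line: the minimum distance is positive (strict inequality case).
print(distance_to_line([3.0, 4.0], [1.0, 0.0]))

# x = c*y with c = -2: x is on the line, so the distance is zero (equality case).
print(distance_to_line([-2.0, 0.0], [1.0, 0.0]))
```

The second call uses a negative multiple on purpose: the equality case only needs x to be on the line, on either side of the origin.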
thanks, cleared up a lot
Hello, can you please elaborate on the minimum distance part?
THANK YOU
@@anjalip7949 If x and y are linearly dependent, the minimum distance is 0, because x already lies on the line ty. If they are linearly independent, the minimum distance is strictly positive. Using ||ty+x||^2 instead of ||ty-x||^2 only flips the sign of the middle term, so the minimum moves to t = -b/2a and the inequality comes out the same.
how to calculate the minimum of at^2-bt+c?
you are my motivation. I think your videos are more complements to classes than actual lessons. They are so diverse and inclusive that even if you were to study a topic completely, you're still guaranteed to learn something new.
Awesome proof!
I think an argument with the discriminant would be a little more natural, rather than plugging in b/2a which takes a bit of foresight and might seem arbitrary.
For those asking, the Cauchy-Schwarz inequality is extremely useful in Linear Algebra, Analysis, and probability. A good amount of proofs in mathematics use this inequality (even in physics too, where CS is used to arrive at the Heisenberg uncertainty principle).
8:06 could use some motivation for t=b/2a
Thank you soo much for giving us free lessons!
Education is the most valuable thing ever.
how do I know what constant to put into P(t) to prove the theorem? It seemed like you knew the answer and you used the "answer" (the constant t, which was b/2a) to "prove" the theorem.
I think that it's more convenient to just use the identity that the dot product of any two vectors is their magnitudes multiplied together, times the cosine of the smallest angle between them. That proves the inequality, and the equality when they are parallel vectors, since theta (the smallest angle) is 0, so cos 0 is 1. For any other non-parallel vector pair, since cos theta has a range between -1 and 1, we divide both sides by the magnitudes to get 1 >= cos theta, which is true.
Very interesting, but I've got one problem. From where did you get this p(t) artificial function?
he started with an expression that has t in it, and he explained in the beginning why this expression is always greater than or equal to zero regardless of the value of t,
later he just started referring to that expression as a function of t.
The function P(t) describes the squared distance between a vector and a point on the parameterized line ty. Essentially, what we're trying to do is find the minimum distance between that line and that vector. Since P(t) = at^2-bt+c, its minimum is attained at t = b/2a; plugging that in, we get the desired inequality. Also notice that if the vector lies on the line, their minimum distance is zero, hence equality, and that is why one vector must be a scaled version of the other.
Is there some specific reason why you defined P(t)=||ty-x||^2? Can you define any other vector differently?
Also, later in the proof you evaluated p(t) at b/2a, but you did not mention the reason for that.
It was a nice explanation. I would be grateful if you explained the above questions.
Thank you
Why not just say A dot B = |A||B| cos(theta), and the largest absolute value cosine can contribute is 1, giving |A dot B| <= |A||B|. Equality needs |cos(theta)| = 1, which occurs at 0 and Pi radians (or 0 and 180 degrees), meaning A and B must be parallel or antiparallel. Either way, this means they are scalar multiples of one another.
Why choose the function p(t)? And then t=b/2a? It might be better if the purpose of choosing the function, and then that particular value of t, were explained.
Hello Salman Khan. I love your videos and have used them for calculus. You are truly an asset and valuable resource to any student.
The main purpose of this comment (if you see it) is to ask what you recommend I use with my Wacom Bamboo tablet whilst taking notes in class. In terms of functionality, my only preference is that I could have typewritten text together with my tablet writing, which would be diagrams.
Anyone who feels like answering this, comment here or message me
The equation at^2 - bt + c in the video is very similar to the equation for distance after t seconds under constant acceleration, (1/2)at^2 + bt + c.
On third viewing, it's making more sense. My problem was that even though I could understand each step, I wasn't getting any intuition from it. At 16:50 he says in future videos he'll give intuition as to WHY it makes sense. I will rest easy again now!
I am really grateful for this video! You have provided a really clear explanation about this inequality. Thanks! My mathematics exam is approaching. I am so nervous about it.
why t=b/2a ?
It's a max/min thing: if you're searching for the vertex of a parabola, b/2a gives you the t coordinate of the vertex, and plugging it in gives you the minimum value. (I think, it has been a while since I've really done that.)
Omg... this makes things sooo clear...
Thank you so much sir 🙌🏼
Where is the equation p(t) = ||ty-x||^2 from? How did you choose it?
awesome delivery there, at least I have understood.
Well explained sir, thanks for sharing
Thank youuuuuuu!!!
It could be any vector squared to begin with. He just picks the vector 'ty-x' because without breaking any rules of mathematics it can give him the result he wants. He uses that specific vector because the thing he's trying to get has x's and y's in it. And because any vector can be written as 'ty-x' (ie some scalar multiple of a vector minus some other vector) then what he proves for this example must hold true for all examples.
Khan, great proof and explanation of the Cauchy-Schwarz inequality. Could you explain the use of P(t) and why the function was chosen as presented, as a difference of two vectors? I'd like to re-run the proof using the sum of two vectors to see if anything is lost.
15:29 here you have | || y||^2 | not || y||^2. But they are equal.
Thank you, Sir.
7:50
Why didn't Khan just take the discriminant of P(t) as less than or equal to zero? Consequently (b/2)^2 is less than or equal to ac and Cauchy-Schwarz is proven.
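The discriminant route can be written out in a few lines. A sketch, assuming the same coefficients as in the video's expansion (a = y.y, b = 2(x.y), c = x.x) and y nonzero so that a > 0:

```latex
\[
P(t) = a t^2 - b t + c = \lVert t\,y - x \rVert^2 \;\ge\; 0 \quad \text{for all real } t .
\]
% A parabola opening upward that never goes below zero has at most one real root,
% so its discriminant cannot be positive:
\begin{align*}
(-b)^2 - 4ac \le 0
&\;\Longrightarrow\; \left(\frac{b}{2}\right)^2 \le ac \\
&\;\Longrightarrow\; (x \cdot y)^2 \le \lVert x \rVert^2 \, \lVert y \rVert^2 \\
&\;\Longrightarrow\; \lvert x \cdot y \rvert \le \lVert x \rVert \, \lVert y \rVert .
\end{align*}
```

This avoids having to guess the value t = b/2a in advance; the price is that you need the fact about discriminants of non-negative quadratics.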
This was very helpful
@bach1229 It's a teaching strategy to have you remember it better, sir. It's not just for mathematicians either, it can be used in any subject. I use it for language: for example, if I were to teach English and translate to Spanish, I would use red for English and blue for Spanish. There is lots of research behind it.
It works most of the time.
Maybe what you're asking is deeper than that... If you're asking how the first guys who did this knew what function to start with, then... that's a good question.... it possibly came out of exploring and playing around with squaring the magnitudes of vectors and seeing what happened. when this popped out they realised it made a definite statement about all vectors, a statement that could come in very handy in the future.
You make maths fun
Fantastic!
What's the intuition behind plugging in b/2a to p(t)?
Where does the P(t) function come from, what is the intuition for using it?
can you say something on ||>=?
Thank you Sal!
11:26 Ok I've seen it proven like this other times. I still don't get how b^2 - 4ac is allowed to be less than zero. Isn't this pretty much allowing there to be negative square roots if you were to use the quadratic equation on this theoretical equation???
I think it's much easier to prove that (x1y2-x2y1)^2 >= 0, where x1,x2 and y1,y2 are the coordinates of vectors x and y.
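To see why that one inequality is enough in R2, here is the expansion as a sketch (it only covers two dimensions, with x = (x1, x2) and y = (y1, y2)):

```latex
\begin{align*}
\lVert x \rVert^2 \lVert y \rVert^2 - (x \cdot y)^2
&= (x_1^2 + x_2^2)(y_1^2 + y_2^2) - (x_1 y_1 + x_2 y_2)^2 \\
&= x_1^2 y_2^2 + x_2^2 y_1^2 - 2\,x_1 x_2 y_1 y_2 \\
&= (x_1 y_2 - x_2 y_1)^2 \;\ge\; 0 .
\end{align*}
```

Equality holds exactly when x1y2 = x2y1, which in R2 says that x and y are linearly dependent.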
3:11 "It was 2 videos ago"--😂😂
Very helpful, thanks.
I'm confused, shouldn't the equality iff cy = x have another part to it, where we start off with just x & y and prove that they're scalar multiples...
@regingwapo: The "only if" part is the assumption. Assumptions are usually given as true and thus don't need to be proven.
sir thank u so much
8:03 why did we choose b/2a as our value? how did we get this? so confused!
For all you guys who want to know why b^2 - 4ac needs to be < or = to zero go and read my tip/comment on this video in khanacademy.
@regingwapo: The "only if" part is the assumption. Assumptions are usually given as true and thus don't need to be proven. For example, he/she is human only if he/she is a man or a woman. That "only if" part is given, that someone is either male or female, and is unnecessary to prove.
Since I don't really know the definitions, I can't prove them by myself. And I can't find a starting point for this. Is it necessary to know some linear algebra to understand topology?
Isn't this much much much easier to prove if you introduce the dot product and do an inequality equation with -1
13:48 what do you mean principal square root
Is this somehow related to the quadratic formula? While proving the CS inequality at one of the steps he got b^2-4ac is less than or equal to 0. Is that just because he started from a quadratic equation or is there another reason?
Sal -- are you dedicated full time to your academy?
what if you use complex numbers?
Why have we put t = b/2a?
Sir, please tell 🙏
good stuff! I got it!
Have to say I struggled a bit with this so I'm going to find another source.
what are Non Real vectors ?
it is a really nice proof!! :) thanks
complicate an easy thing.
Why does the video description say "Linear algebra describes things in two dimensions" which is just not true lol?
09:37
how can we multiply b^2/2a by 2 and leave the rest untouched?
I know this reply comes two months late, but I hope it can still be of some use. When the numerator and denominator of b^2/2a are both multiplied by two, the fraction is just being changed to an equivalent fraction, 2b^2/4a. An equivalent fraction has the same value, so the rest of the equation does not need to change. Just like if you add 1/2 or 2/4 to a number, you'd get the same result both times.
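Written out, the step in question looks roughly like this (a sketch, assuming P(t) = at^2 - bt + c as in the video):

```latex
\begin{align*}
P\!\left(\frac{b}{2a}\right)
&= a\,\frac{b^2}{4a^2} - b\,\frac{b}{2a} + c \\
&= \frac{b^2}{4a} - \frac{b^2}{2a} + c \\
&= \frac{b^2}{4a} - \frac{2b^2}{4a} + c
   \qquad \text{(same fraction, rewritten over } 4a\text{)} \\
&= -\frac{b^2}{4a} + c .
\end{align*}
```

Nothing else in the expression changes because b^2/2a and 2b^2/4a are the same number; the rewrite only makes the two fractions share a denominator.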
it becomes much easier when I use the other dot product formula X.Y = ||X|| ||Y|| cos(theta). But I know he supposes we don't know this formula yet.
ECE 485 UofA
wait, but the CS equality is an if and only if proof, you've only proved one way assuming that x = cy (where x & y are vectors and c is a scalar) and then plugged it in what about the other way around? Maybe I'm not looking at it right..
@frr Only if means that it can't be something else, so you have to prove that that something else doesn't produce the same conclusion.
awesome
Are magnitude and norm the same, or interchangeable?
magnitude, length and norm are the same (commenting in case others have the same question in the future)
Why evaluating precisely at that value? (Please more than just "it works".)
-b/2a is the turning point of a parabola.
+FreeziiS On Khan Academy's site there's a comment that explains what Sal is doing.
+Darko Bakula (Nightmare1066): I myself know the answer but it would be an improvement for your video. ;)
@FreeziiS If you know the answer, would you care to explain? I can't find the comment on the Khan Academy site that allegedly explains the choice of t=b/2a.
you didn't prove the "only if" part
Hey, in x = cy, does c have to be positive, or can it also be negative, for |x.y| = ||x|| ||y|| to be true?
I keep hearing "in the last video". What's the last video?
16:26 . You did not prove that statement. You just proved the reverse of that statement.
14 years ago !
a big hello
you exist, mashallah
How can I prove that the Cauchy-Schwarz inequality holds with equality only when the vectors are linearly dependent?
Thanks but how could I ever think of this from scratch on an exam?
If you learn the scalar product rule, you can remove the cos x, and the side that had cos x becomes bigger than or equal to the other side, since cos x only ranges from -1 to 1. This produces the same inequality.
You can quickly test on a piece of paper.
@@oneinabillion654 Prove without invoking triangle inequality or dot product geometric definition, that is, from scratch.
Ahh, I thought she wanted a fast way to know the expression, since from scratch it's long and troublesome. And I don't think any exam tells you to derive this literally from scratch, so...
Anyways, thanks for reminder
@glyn hodges O.O
a second degree polynomial with positive leading coefficient has at most one real solution (is non-negative for all real t) iff its discriminant is <= 0
If D < 0 it has 0 real solutions
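For the people asking how b^2 - 4ac can be negative: it just means the parabola never touches zero. A small sketch with made-up helper names (`p_coefficients`, `count_real_roots`) and made-up integer vectors, so the arithmetic is exact:

```python
def dot(u, v):
    """Standard dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def p_coefficients(x, y):
    """Coefficients (A, B, C) of P(t) = ||t*y - x||^2 = A*t^2 + B*t + C.

    A = y.y, B = -2(x.y), C = x.x. Note B carries the minus sign here,
    whereas the video writes P(t) = a*t^2 - b*t + c with b = 2(x.y).
    """
    return dot(y, y), -2 * dot(x, y), dot(x, x)

def discriminant(A, B, C):
    """Discriminant of A*t^2 + B*t + C."""
    return B * B - 4 * A * C

def count_real_roots(A, B, C):
    """Number of distinct real roots of A*t^2 + B*t + C (A nonzero)."""
    d = discriminant(A, B, C)
    if d > 0:
        return 2
    if d == 0:
        return 1
    return 0

# Linearly independent vectors: P(t) > 0 for every t, so no real roots.
independent = p_coefficients([1, 2], [3, 1])
print(discriminant(*independent), count_real_roots(*independent))

# x = 2*y: P(t) hits zero at exactly one t, so the discriminant is zero.
dependent = p_coefficients([2, 4], [1, 2])
print(discriminant(*dependent), count_real_roots(*dependent))
```

A negative discriminant does not mean anyone is taking the square root of a negative number; the quadratic formula is simply never applied, because there is no real t with P(t) = 0.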
VERY nice, I understand everything.
I am 10,
but I understand you too.
This video is very interesting.
🕳😀😀😀😀😀😀😀
Can't we just prove it by the definition of dot product,
a.b=|a||b|cos(theta), the maximum value of this is when theta=0 which is a.b=|a||b| cos0= |a||b|.
For any other value of theta, a.b < |a||b|.
Where's the denominator b?? 🤔
I guess it is just necessary for the proof.
all we need is 3-4 steps... talking about efficiency
Idol !!!
unfortunately you did not prove if and only if... basically: in a real inner product space where u and v are two vectors, given |u.v| = ||u|| ||v||, then u and v are linearly dependent
One little problem I have: please don't use this display, the yellow-green highlighter on black..! 🙏
it's Schwarz. If I said it in German the way you wrote it, I'd have a lisp :D
is it me or is there no audio??
It's you.
Man, this stuff is so hard, I don't know how to do my math hw ;_;
Did you finish your homework yet?
nice, but it's in English, who knows what he's saying
there is a Turkish subtitle option
Just... draw a triangle?
I like your videos, but I would find them even better if you didn't sound so bored.
too hard -_-
Instead of learning Ruby frameworks or multithreading devices, I'm here learning about this useless (to me) inequality that I will never see again.... College is dogshit sometimes. BRAVO PRACTICALITY!
Anyway, good video. Just the moron mathematicians that haven't worked a day in the software development field making me watch this shouldn't be in charge of course building...
Great video!
Cow she tho... :D You need to work on that pronunciation.
Why do a 16 min video when you can prove this in ONE line?
|x . y| = |cos(x, y)| |x| |y|, which is bounded above by |x| |y| and below by 0
DONE
...