When I remember that var(x) is the same as cov(x, x), the formulas in the covariance matrix seem more consistent and make more sense to me. In other words, the whole matrix can also be defined in terms of covariances alone.
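A quick way to see this numerically — a minimal sketch of my own with toy data (not the video's points): numpy's covariance matrix carries the variances on its diagonal.

```python
import numpy as np

# Toy data, not the video's points
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# Full covariance matrix; bias=True divides by n (population convention)
C = np.cov(x, y, bias=True)

# The diagonal entries cov(x, x) and cov(y, y) are exactly the variances
print(np.allclose(C[0, 0], np.var(x)))  # True: var(x) == cov(x, x)
print(np.allclose(C[1, 1], np.var(y)))  # True: var(y) == cov(y, y)
```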
seriously one of the best and most intuitive channels on this subject. I can show your videos to my child and he will understand
Great video, very intuitive. Thanks a lot! At 12:25, in the variance formula, we divide by the sum of all the weights and not the sum of the weights squared, and it is the same for the covariance, right?
Just want to leave a comment so that more people could learn from your amazing videos! Many thanks for the wonderful and fun creation!!!
Thank you for this; there were so many hidden tidbits of knowledge in it. Thank you for making these, and I appreciate the attention to explaining small details.
Extremely helpful and easy to understand as someone new to this topic. Thank you for your work and actually showing examples with numbers for how each part in the covariance matrix was calculated.
You know how people understand. Keep posting videos; they are very well elaborated.
After watching many videos on the subject, this one finally helped me understand. Thank you
Thank you for this great video. There is a slight inconsistency between the correction of 1/3 (in your comments) and the formula with alpha^2 (at 12:24). I think the formula is correct, and in the concrete example 1/3 should be changed to 1/9.
While I know the covariance matrix, it is always interesting to learn concepts from your perspective.
At 10:56, shouldn't it be divided by 10/3 instead of 4, since we have 3 and 1/3 data points?
Yikes, you’re right!!!! Thank you!
I’ll add a comment
@SerranoAcademy Thank you. Excellent video, BTW.
Thanks, Luis, for the awesome video about covariance. I would like to ask: at 10:56, shouldn't we divide by 1+1+1+(1/3)^2? The same happens when finding the covariance at 11:37; we should divide by 1+1+1+(1/3)^2 there too, shouldn't we? Please correct me if I'm wrong.
Yes, you are right. He added a comment for 10:56 but forgot about 11:37.
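For anyone following this thread, here is a minimal sketch of my own comparing the two candidate denominators, with three full-weight points and one point of weight 1/3 as in the discussion (the x-values are a hypothetical reconstruction from the stacking argument a few comments below). Dividing by the sum of the weights, 10/3, is what the correction uses.

```python
import numpy as np

# Hypothetical reconstruction: three points of weight 1, one of weight 1/3
x = np.array([-0.4, 1.6, -0.4, -2.4])
w = np.array([1.0, 1.0, 1.0, 1/3])

mean = np.sum(w * x) / np.sum(w)  # weighted mean (0 here)
dev2 = (x - mean) ** 2

# Divide by the sum of the weights (10/3), per the correction:
print(np.sum(w * dev2) / np.sum(w))       # 1.44

# Divide by the sum of the squared weights (28/9), for comparison:
print(np.sum(w * dev2) / np.sum(w ** 2))  # ~1.543
```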
Hi, at 11:23, why are you not subtracting the mean from the values before squaring them? The formula you showed earlier subtracts the mean.
In my opinion it would be useful to see the connection between the covariance matrix and matrix transformations. Could you make a video on that please?
Fantastic video. Made the covariance very intuitive. Thank you!
best explanation of covariance on youtube
this video deserves more views. Incredible work, thank you.
Hi Luis, fantastic and clear video, but a quick question: at 11:01, are we sure we need to square the 1/3 weight?
I don't follow the intuition of this, as I'd assume having a single point weighted 1/3 would be equivalent to a situation where that point has a weight of 1 and the remaining three points have weights of 3 (said another way, imagine the three points [(-0.4, 0.8), (1.6, 0.8), (-0.4, -1.2)] being duplicated three times and stacked upon one another). If this were the case, we'd calculate the variance of 10 points of equal weight, [-0.4, -0.4, -0.4, 1.6, 1.6, 1.6, -2.4, -0.4, -0.4, -0.4], with a value of 1.44.
Thank you in advance for any thoughts on this question, and for making such a great video! I just pre-ordered your upcoming ML book this morning and subscribed to your channel.
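The stacking argument checks out numerically. A minimal sketch of my own, with the x-values taken from the comment above:

```python
import numpy as np

# One point of weight 1/3 among full-weight points is the same as that
# point appearing once while the others are stacked three times each.
stacked = np.array([-0.4, -0.4, -0.4, 1.6, 1.6, 1.6, -2.4,
                    -0.4, -0.4, -0.4])
print(np.var(stacked))  # 1.44, as computed in the comment

# The weighted computation agrees when dividing by the sum of the weights:
x = np.array([-0.4, 1.6, -2.4, -0.4])
w = np.array([1.0, 1.0, 1/3, 1.0])
m = np.sum(w * x) / np.sum(w)
print(np.sum(w * (x - m) ** 2) / np.sum(w))  # also 1.44
```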
Thanks, Luis!
Thanks for the explanation! I'm confused about the portioned, or weighted, points: what does it mean for a data point to have a fractional weight? Also, at 8:58, isn't the covariance divided by (n-1)?
Yeah, I also think the variance and covariance should be the summation divided by (n-1).
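Both conventions exist: dividing by n gives the population variance/covariance (what the video uses), while dividing by n-1 gives the unbiased sample estimate. A minimal sketch of my own with toy data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# Population covariance: divide by n, as in the video
print(np.cov(x, y, bias=True)[0, 1])

# Sample covariance: divide by n - 1 (numpy's default)
print(np.cov(x, y, bias=False)[0, 1])
```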
This is a great visualization and a perspective that everyone should know. To see the magic behind the scenes, visualization is the best way, as always...
Very clearly explained, well done and many thanks!
Hi Luis, this is an excellent video, and I thank you for making it. At 3:42 you ask "Why square?", and your answer is so that the negative numbers do not cancel out the positive numbers. By that logic, why not use absolute values and achieve the same thing? It turns out that the technical explanation for using squares seems hard to come by, and is not at all obvious. Perhaps you could do some digging and make another video on this? I found the discussions on the Cross Validated forum to be helpful.
This guy has a gift to make the tough look easy!
At 12:30, if the weighted covariance is divided by the sum of the a_i squared, shouldn't it be (1/3)^2 + 1 + 1 + 1 = 28/9? Is it a typo?
Thanks a lot, sir. I am new to the BI world and had been struggling to understand these notions. Problem resolved today! Thanks a lot again.
All your videos are fun to watch. Please continue making such high-quality videos...👏
absolutely brilliantly explained. thank you
Thank you Luis for always putting sense before the equations. May I suggest a related topic to cover: Gaussian processes? So far the only YouTube video that I found intuitive is "Vincent Warmerdam: Gaussian Progress | PyData Berlin 2019". Even after watching it several times, I feel that I'm still missing something fundamental, like how the conditioning on data works, or how to predict using multivariate features. A walkthrough of a real problem solved with a Gaussian process would be really helpful!
The x value of the upper-right point in 11:04 turned magically from 1.6 to 2.6 in the next slide... isn't it a typo?
Thanks a lot! I was stuck on a concept in a research paper; this resolved many doubts.
Thanks dude, I couldn't understand any explanation of all that before I found your video.
Very clearly explained!! Thanks
Good video! Does a highly correlated feature in our dataset produce less error when regression is run? How do we determine whether a particular feature in our dataset should be considered or not?
Amazing video!! Never felt like I understood as well I do now - favorited
Thank you very much! ❤ Easily explained and everything included! :) 🎉
At 1:26 you say the variance in the y direction goes in the diagonal. Is that correct?
Thank you! I got a new view of variance and covariance.
Hello sir, I don't understand why the weights are squared in the weighted covariance formula, including in the numerator.
Very good explanation
you are simply awesome
Great ♥ But can anyone explain why we divide by (sum of a^2) instead of (sum of a) when we calculate the variance of weighted points?!
Thanks for your explanations, they're very helpful for me!
Excellent. I suppose it is easier to explain variance in a two-dimensional space than in 4 dimensions. Is there a variance for every point in space, time included? And if the space were expanding anisotropically (let's say due to a gravitational wave), could the variance have an extra term?
Mr Serrano have you considered opening a patreon or something like that?
It's nice when somebody who knows what he is talking about explains this... very nice, thank you.
And for all the professors and teachers out there who can't explain it like this dude... quit your job 😑
Thanks so much for doing this dude
Shouldn't the weight in the covariance calculation be 1/3 instead of (1/3)^2? Also, the coefficient shouldn't be 1/4 anymore in the weighted case.
If the data in the examples in the beginning weren't centered at 0, what would multiplying just the coordinates (without subtracting the mean) represent?
Thanks, very neat and clean
Luis, can you tell us your approach when learning a new thing or concept? Where do you head first: is it Google, YouTube, or some textbooks?
This is fantastic, Thank you!!
This video was very useful for me
Does var(x) = var(y) imply a covariance of 0?
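It doesn't; equal variances say nothing about how the two variables move together. A minimal counterexample sketch of my own: take y = x, so the variances match but the covariance equals that shared variance.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = x.copy()  # var(x) == var(y), yet x and y are perfectly correlated

C = np.cov(x, y, bias=True)
print(C[0, 0], C[1, 1])  # both 1.25: equal variances
print(C[0, 1])           # 1.25, not 0
```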
Thanks Luis! Could you explain the reason behind shifting the center of mass to the origin?
Thank you very much, this helped me so much 👏👍
Why is the denominator 4 in the example where alpha is not 1, when Σ(alpha^2) should be 0.33^2 + 1^2 + 1^2 + 1^2?
Thank you, what a good explanation
At 4:10, shouldn't it be 8/3 instead of 8/4?
Oh wait, there are different formulas for the population (all) and a sample (part of the whole).
Hello Luis!! Your videos are very nice for getting an intuitive grasp of what I am doing. I have watched your videos about covariance and PCA a few times now. Any plans for iterative closest point, relating two sets of points? Thank you very much for the work!
Thanks a lot, that's the best explanation I found. Keep up the good work 👍
I hope you can make new lessons about generative modeling algorithms; they are promising for many fields. I have checked your GAN lesson, but I'm looking for more from an amazing teacher.
"we should divide by 1+1+1+1/3, which is 10/3".
Is it not squared as well? 1/(10/3)^2
Awesome video ! Thank you for making it.
Thank you very much for this video
Thank you!
Pretty good Man
Awesome ! Thank you so much. This is a very lucid explanation.
Thank you for the video. I have heard of the covariance matrix in the context of processing velocity-map imaging. I am finding it a bit hard to tie the information you have shared back to that context (how does constructing a covariance matrix help construct covariance-map images?). I wonder if anyone can help me make the connection :)
Thanks. I am viewing your unlisted video! And I got the discount for the book; reading it on my Kindle before sleep is a blessing. I have a little query:
1. What is meant by half or 3/4 of a point? The size of the point is virtual, which confused me at first. Now I get it: it is the portion of the weight associated with each point. And now the classic question arises: how many clusters? How do we know?
2. It is similar to k-means, where we pick k cluster centres, and here we pick k Gaussian distributions. Can you please compare the different clustering methods in one video: their limitations, differences, and similarities, how to overcome the limitations, and which method should be used in which real-world situations?
Thanks Sourav, great questions!
1. In some algorithms, like Gaussian mixture models, you only need a fraction of the point there. I imagine it as: if every point weighed 1 kg, then points would be allowed to weigh any fraction of that. As for how many clusters, there are many heuristics, such as the elbow method, that work; you can find it here: th-cam.com/video/QXOkPvFM6NU/w-d-xo.html
2. Yes, very similar to k-means. In k-means we only update the mean, but in GMM we update the mean, variances, and covariances. Also, in k-means we have hard assignments (a point can belong to only one cluster), but in GMM (th-cam.com/video/q71Niz856KE/w-d-xo.html) we have soft assignments (this is why points can be split among several clusters, which goes back to question 1). In real life both are used, but there are times, for example in sound classification (telling voice apart from music and noise, etc.), when the clusters really intersect, and you need a soft clustering algorithm to do a better job.
Hope that helps! Let me know if there is anything else that needs clarification. :)
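To make the hard vs. soft assignment contrast concrete, here is a minimal sketch of my own using scikit-learn (not something shown in the video):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two overlapping blobs: the situation where soft assignments help
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(2, 1, (50, 2))])

# K-means: hard assignments, exactly one cluster label per point
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels[:3])  # e.g. [0 1 0]

# GMM: soft assignments, a probability per cluster for each point,
# so a point can be split between clusters
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X)[:3].round(2))  # e.g. [[0.93 0.07] ...]
```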
Please tell us why we build it, and why we build it the way we do.
Thanks for this video, it's just perfect!
What type of software do you use to make this animation?
I use Keynote for the animations and iMovie for editing
6:35 9:06
Good visuals, great teaching, best quality. Thank you! Your channel and StatQuest have been a huge help in understanding all this math. In this video, an extra example with a 3x3 or 4x4 covariance matrix would have been awesome, but I understand you might not have gone into it to keep things simple (since that would require 3D/4D visuals).
thank you
Logic and math are interesting when focusing on abstract viewpoints.
I think that in the second part of the video the calculation of var(x), var(y), and cov(x,y) is not done correctly. It should not be divided by 4; instead it should be divided by the summation of the squares of the weights.
You are Dope man, simply awesome
Awesome.
Curious why covariance could be related to information
so cool
Background music is annoying
🙏🙏🙏
Too many adverts. :-(
Thank you