I have studied Linear Algebra. Went through Gilbert Strang's book(s). This is the first time someone addressed intuition and related the matrices to real-world meaning. Congrats, now I am glad to know what Linear Algebra means to me and what it should mean to everyone else. You are 😇 a rare and gifted lecturer with deep insight.
I also almost completed his Linear Algebra course, but I'm still confused.
The lack of views on your videos surprises me. Ritvik, thank you for doing this.
You take away the anxiety of starting fresh.
Awesome video, but I think what makes you great is the simple examples you share in your videos. I was able to follow everything, but it would be even better if you showed us how to take the apple/banana example and create the closed form of the covariance matrix here. Just my personal feedback. Other than that, it was wonderful!
Really loving your videos! Impressive how you boil it down to the most essential stuff and give context too.
Hi Ritvik, I love your videos, but this one took me a while to understand. I actually had to figure out for myself how this formula makes sense. My suggestion: Start with the better known matrix formula for Covariance X'X and from there derive the closed form. Define X_k as [X_ik; X_jk] at the start of the derivation.
Hey I appreciate the valuable feedback! I'll keep that in mind for future videos.
Hi Ritvik, could you post a link to a more formal or detailed version of the proof of the closed form of covariance matrix formula?
There are several components that are unclear to me:
1. The x_bar_i on the LHS is the i-th component of average across vectors whereas the x_bar_i on the right is the average within the i-th vector. These are two different quantities.
2. When you want the i-th element on the LHS at 6:20, you subscript to get your "i-th element" using the column, instead of the row. I'd imagine that if you're trying to get the i-th element of the k-th column vector, it should be X_{i,k} instead of X_{k,i}.
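For anyone with the same questions, here is one way to write the definitions out in full. This is a sketch under assumed notation, not necessarily the video's: x_k is the k-th observation as a d×1 column vector, x_{k,i} is its i-th component, and the sum divides by N as in the video (use N - 1 for the unbiased sample estimate).

```latex
% Assumed notation: x_k \in \mathbb{R}^d is the k-th observation (column vector),
% x_{k,i} is its i-th component, with k = 1, \dots, N and i, j = 1, \dots, d.
\bar{x} = \frac{1}{N} \sum_{k=1}^{N} x_k,
\qquad
\bar{x}_i = \frac{1}{N} \sum_{k=1}^{N} x_{k,i}

S = \frac{1}{N} \sum_{k=1}^{N} (x_k - \bar{x})(x_k - \bar{x})^{\top}

S_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{k,i} - \bar{x}_i)(x_{k,j} - \bar{x}_j)
```

Under these definitions the i-th component of the mean vector is the same number as the mean of the i-th feature over all N observations, so the two readings of x_bar_i in point 1 coincide. Whether the observation index is written first or second (x_{k,i} or x_{i,k}) is only a convention, as long as it is used consistently.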
6:29 Should the transpose be there?
Good point! It should not be there since that's just a number.
Really amazing what you do. Thank you for the help!
Covariance matrix has the shape of (features, features). I think we should put the (Xi - mu).T before (Xi-mu). That means the matrix (features, samples) @ matrix (samples, features) = matrix (features, features)
And also, we need to divide by (samples - 1) instead of (samples) to avoid underestimating variance
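A minimal NumPy sketch of the shape argument above, assuming the data is stored as a (samples, features) array. The numbers in `X` are made up for illustration. It also checks that the sum of outer products of column vectors, which is how the formula looks when each x_i is d×1, gives the same matrix.

```python
import numpy as np

# Hypothetical data: 5 samples (rows) with 2 features (columns).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 3.0],
              [5.0, 4.0]])
n_samples, n_features = X.shape

# Center each feature by subtracting its mean over the samples.
mu = X.mean(axis=0)
Xc = X - mu

# (features, samples) @ (samples, features) -> (features, features)
S = Xc.T @ Xc / (n_samples - 1)

# Same result as a sum of outer products, one per observation.
S_outer = sum(np.outer(row, row) for row in Xc) / (n_samples - 1)

print(S.shape)
print(np.allclose(S, S_outer))
print(np.allclose(S, np.cov(X, rowvar=False)))
```

So the order of the transpose depends only on whether the observations are stored as rows or as columns; both forms produce a (features, features) matrix.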
Really nice, I like your new video style!
Thank you!
Why do we not use 1/(N - 1) instead of 1/N to account for sample bias?
This is very confusing because first it mentioned that x_i is a d×1 matrix, but later in the calculation we have x_ki and x_kj, and it is not clear what i and j are. Here i and j index the d components of the k-th observation, and k runs over the N observations. So the notation should define i and j to avoid confusion like this. The product on the second line was for S_ij, and it was erased later. (I was very confused, and it took me 30 minutes to figure out why I was confused.)
Please solve examples for us to understand better.
Hello sir, I request your help with pattern recognition and machine learning basics.
And this is the simplest answer I have gotten for a covariance matrix till now 🙏
Love those videos!
Definitely my favorite source of intuitive explanations for Data Science and Statistics (together with Josh from StatQuest)
However, I am slightly confused why we divide by N here. Aren't we actually computing the sample covariance here? That would mean that we have to divide by N-1, right?
As far as I can tell, the wiki page for sample covariance uses almost the same notation, while dividing by N - 1.
en.wikipedia.org/wiki/Sample_mean_and_covariance#:~:text=In%20terms%20of%20the%20observation%20vectors%2C%20the%20sample%20covariance%20is
Both are acceptable, but I suggest dividing by N - 1 to avoid underestimating the variance.
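To see the difference between the two divisors concretely, here is a small NumPy sketch. The data is random and purely illustrative; `np.cov` divides by N - 1 by default and by N when `bias=True`.

```python
import numpy as np

# Hypothetical data: N = 10 observations (rows) of d = 3 features (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
N = X.shape[0]

S_biased = np.cov(X, rowvar=False, bias=True)   # divides by N
S_unbiased = np.cov(X, rowvar=False)            # divides by N - 1 (default)

# The two estimates differ only by the constant factor N / (N - 1).
print(np.allclose(S_biased * N / (N - 1), S_unbiased))

# The divide-by-N variances (the diagonal) are always the smaller ones.
print(np.diag(S_biased) < np.diag(S_unbiased))
```

The gap shrinks as N grows, which is why the choice matters mostly for small samples.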