I read a whole PDF of many pages from school and still did not have a clue about this, but in only 7:13 you cleared up all of my questions. Thank you very much!
facts
I've really started to hate my school's educational system. All it takes is 2 minutes of this 7-minute video for me to grasp the idea, and here I am losing tons of hours staring at meaningless PDFs. Shame on the absolute zero effort put into teaching. I really appreciate your video! You saved my life, and my interest in statistics, by posting this video 11 years ago.
Literally in the same boat as you
Just finished watching 60-odd videos. They have helped me through your course a huge amount. Great job with the videos, and please keep it up!
Your videos are so clear and concise. I go back to them over and over to review and absorb statistical concepts. Your channel is the best on youtube for that. Thank you!
indeed
Once upon a time, I went to university to learn about leverage and influential points, and then it became history.
This time it has been stored perfectly in my brain.
You are awesome.
Better than a top-20 uni lecturer... ffs... why should I go to uni?
Facts
Excellent mini-course. Thank you for all your insights and hard work!
On point as always. One of my favorite and one of the best statistical instruction sources out there! Thanks a lot!
I'm glad to be of help!
I'm glad to be of help. Thanks for the kind words and best of luck on your exam!
Thank you for helping me understand a concept that was so woolly before. You are a great teacher!
You're very welcome! I'm glad it helped!
Phenomenal instruction. Thanks!
You are very welcome. Thanks for the compliment!
This is a wonderful video. Easy to follow, and the topic is very well explained. Thanks
I always follow your videos, and this one was phenomenal. Thank you so much, man, indeed. I really appreciate it.
Very clear and intuitive explanation! Thank you!
This is the answer I was looking for! Great video!
+merumomo Thanks!
Very clear explanation of leverage. Thank you :)
Thanks for the demonstration! Leverage is so much easier to understand now.
You are very welcome!
Excellent video! Great explanation.
Thank you, this is very well explained. My stats course has no lectures, only readings, so videos like this are really helpful!
I'm glad I could help.
All the videos are really well explained. I didn't understand most of it in lecture, but your videos made it very easy to understand.
+Varun m s I'm glad I could help!
Great, I finally get it, thanks!
Awesome video... 2 questions. 1) Can't eliminating such high-influence and high-leverage points lead to an overfitting problem? 2) If my management wants estimates for those eliminated values (observations), what suggestions can I make?
You have got one new subscriber. Thanks a lot !!!
You are very welcome!
Hat values, or leverage, gauge the influence of the observed value of the outcome variable on the predicted values. What does that mean??
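For a simple linear regression, one way to read this is sketched below in plain Python (the function names `hat_matrix` and `fitted` are hypothetical, chosen for illustration). Each fitted value is a weighted sum of the observed responses, ŷ = Hy, and the hat value h_ii is the weight that observation i's own response carries in its own fitted value, so if y_i moves by δ, ŷ_i moves by h_ii·δ. The data are the points (-1,-1), (0,0), (10,1) that come up in this thread.

```python
def hat_matrix(xs):
    """Hat matrix H for simple linear regression y = b0 + b1*x.

    For this model H[i][j] = 1/n + (x_i - xbar)(x_j - xbar)/Sxx,
    so each fitted value is yhat_i = sum_j H[i][j] * y_j.
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [[1 / n + (xi - xbar) * (xj - xbar) / sxx for xj in xs] for xi in xs]


def fitted(xs, ys):
    """Least-squares fitted values computed directly from slope and intercept."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [b0 + b1 * x for x in xs]


xs = [-1.0, 0.0, 10.0]
ys = [-1.0, 0.0, 1.0]
H = hat_matrix(xs)

# yhat = H y reproduces the ordinary least-squares fit.
yhat_via_H = [sum(h * y for h, y in zip(row, ys)) for row in H]

# Leverage h_ii: nudging y_i by delta moves yhat_i by h_ii * delta.
delta = 1.0
i = 2
ys_nudged = list(ys)
ys_nudged[i] += delta
shift = fitted(xs, ys_nudged)[i] - fitted(xs, ys)[i]
leverages = [H[k][k] for k in range(len(xs))]
print([round(h, 4) for h in leverages])
print(round(shift, 4))
```

Here the third point's hat value is 221/222 ≈ 0.9955, so its fitted value follows its own y almost one-for-one. The hat values sum to 2, the number of estimated coefficients.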
Thank you very much. This is very helpful. The whole playlist is perfect.
You are very welcome. I'm glad you found it helpful!
jbstatistics Do you have videos on multiple regression and building multiple regression models?
Not yet. I'm hoping to get to that at some point in the future. Cheers.
Thank you so much. Respect once and for all
Easy way of explaining... appreciated
highly informative. thanks a lot
You are very welcome.
Can I use leverage in X multiplied by leverage in Y to make a useful measure of Influence? Seems like a reasonable thing to do, no?
Thank you, excellent explanation
You really helped me out here. thank you!
Wowww... you are great
I'm glad to be of help QN00!
Excellent! Thank you!
You are very welcome!
Good visual explanation, but you forgot to mention that influence = leverage * discrepancy. Discrepancy is the distance of the Y value from the line (studentized residuals; Cook's distance). Anyway, keep up the good work!
Thanks for the feedback. I didn't forget to mention that -- I intentionally brought it up in only a loose way (describing it with a visual illustration), as at the point I cover this in an introductory statistics course students won't have been introduced to studentized residuals, Cook's distance, or measures of leverage. Formally describing Cook's distance, along with its expression as a function of the studentized residual and leverage, is far beyond the scope of this video. Cheers.
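As a rough sketch of the expression mentioned above, assuming ordinary least squares with an intercept and one predictor (p = 2), the plain-Python code below computes Cook's distance two ways: from its definition (how far all fitted values move when point i is deleted, scaled by p times the residual mean square) and as r_i²/p · h_i/(1 - h_i), where r_i is the internally studentized residual and h_i is the leverage. The toy data and the function names are made up for this illustration.

```python
import math


def fit_line(xs, ys):
    """Intercept and slope of the ordinary least-squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def mse(xs, ys, p=2):
    """Residual mean square of the full fit, SSE / (n - p)."""
    b0, b1 = fit_line(xs, ys)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (len(xs) - p)


def cooks_by_deletion(xs, ys, i, p=2):
    """Definition: squared shift of all fitted values when point i is dropped."""
    b0, b1 = fit_line(xs, ys)
    c0, c1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    shift = sum(((b0 + b1 * x) - (c0 + c1 * x)) ** 2 for x in xs)
    return shift / (p * mse(xs, ys, p))


def cooks_from_leverage(xs, ys, i, p=2):
    """Same quantity written as (studentized residual)^2 / p * h / (1 - h)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    h = 1 / n + (xs[i] - xbar) ** 2 / sxx          # leverage of point i
    b0, b1 = fit_line(xs, ys)
    e = ys[i] - (b0 + b1 * xs[i])                 # raw residual (discrepancy)
    r = e / math.sqrt(mse(xs, ys, p) * (1 - h))   # internally studentized residual
    return r ** 2 / p * h / (1 - h)


# Hypothetical toy data, made up for this illustration.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [1.2, 1.9, 3.2, 3.8, 4.0]
for i in range(len(xs)):
    print(i, round(cooks_by_deletion(xs, ys, i), 4), round(cooks_from_leverage(xs, ys, i), 4))
```

The two columns agree. The second form is a discrepancy term times a leverage term, which is one precise version of the "influence = leverage * discrepancy" idea raised in the comment above.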
thanks , great video
Saved my life.
Every textbook says that an outlier along X has more influence on the regression line, hence such a point is called an influential point. But the following example clearly shows that an outlier along Y has more influence than an outlier along X. The best-fit line through (-1,-1), (0,0), (1,1) is y = x. Including an outlier along Y, the best-fit line through (-1,-1), (0,0), (1,10) is y = 5.5x + 3. Including an outlier along X, the best-fit line through (-1,-1), (0,0), (10,1) is y = 0.1486x - 0.446. Am I missing something? Thanks in advance for your help.
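The fitted lines in this comment can be checked with a short sketch that uses exact rational arithmetic from Python's standard `fractions` module (`least_squares_line` is a hypothetical helper name). It returns the intercept and slope of the ordinary least-squares line. For the third data set these come out to -33/74 ≈ -0.446 and 11/74 ≈ 0.1486.

```python
from fractions import Fraction


def least_squares_line(points):
    """Exact least-squares (intercept, slope) for simple linear regression."""
    n = len(points)
    xbar = sum(Fraction(x) for x, _ in points) / n
    ybar = sum(Fraction(y) for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


for pts in ([(-1, -1), (0, 0), (1, 1)],     # original points
            [(-1, -1), (0, 0), (1, 10)],    # third point extreme in Y
            [(-1, -1), (0, 0), (10, 1)]):   # third point extreme in X
    b0, b1 = least_squares_line(pts)
    print(f"y = {float(b1):.4f}x {float(b0):+.4f}")
```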
There's quite a bit to this, and it's tough to give a complete answer to this in a comment. First, points that are extreme in X have high *leverage*, but are not necessarily influential. If a high leverage point falls very close to the overall pattern, then it will not have much influence. If a high leverage point has a Y value that falls far from the overall pattern, then the point will be very influential. As I describe in the video, influence depends on both leverage and how extreme the Y value is.
In the situation you describe, each of the changes results in a point that is extremely influential. If you plot it, you will see that (visually) the magnitude of their influence on the line is similar. However, if we calculate Cook's distance (a commonly used measure of influence), point 3 has a *much* higher value in the scenario where X is also extreme. (In the original line, none of those points has any influence at all -- the regression line would be the same if any single point were removed.) One reason for this is that the variability of the estimators depends on the variance in X. (Very loosely, if the X's are spread farther apart, with greater variance, then it would be harder to pivot the line.) So, once the much lower variability of the estimators is taken into account, that point is much more influential.
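A minimal sketch of that comparison, assuming a simple linear regression (p = 2) and the usual formula D_i = e_i² h_i / (p · MSE · (1 - h_i)²), computed exactly with Python's `fractions` module (the helper name `cooks_distance` is hypothetical). The original three points are left out on purpose: they lie exactly on y = x, so the mean squared error is zero and Cook's distance is not defined there.

```python
from fractions import Fraction


def cooks_distance(points, i, p=2):
    """Leverage and Cook's distance of point i in a simple linear regression,
    using D_i = e_i^2 * h_i / (p * MSE * (1 - h_i)^2), in exact arithmetic."""
    n = len(points)
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    h = Fraction(1, n) + (xs[i] - xbar) ** 2 / sxx   # leverage of point i
    return h, resid[i] ** 2 * h / (p * mse * (1 - h) ** 2)


extreme_in_y = [(-1, -1), (0, 0), (1, 10)]
extreme_in_x = [(-1, -1), (0, 0), (10, 1)]
for name, pts in [("extreme in Y", extreme_in_y), ("extreme in X", extreme_in_x)]:
    h, d = cooks_distance(pts, 2)   # index 2 is "point 3"
    print(f"{name}: leverage {float(h):.4f}, Cook's D {float(d):.2f}")
```

For point 3 this gives a leverage of 5/6 ≈ 0.83 and a Cook's distance of 2.5 when only Y is extreme, versus a leverage of 221/222 ≈ 0.9955 and a Cook's distance of 110.5 when X is extreme as well. The scaled residual is actually smaller in the second case, but the h/(1 - h)² factor (30 versus 49,062) more than makes up for it.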
@jbstatistics Thank you very much for your response. Can you please make a video on this question? Then the confusion will be totally cleared.
@jbstatistics What a great instructor you are when it comes to statistics!!! Well done, man!!! And keep on sharing these amazing videos!!!
Thanks for the compliment!
clear and awesome! thanks!
You are welcome!
So useful, thanks a lot!
You are very welcome Lillian!
Thank you.
Thank you so much!
You are very welcome!
Thank you
You are very welcome!
jbstatistics, bless your soul. Your videos are saving my ass in my stats class!!
Thanks for making these great videos!
You are very welcome!
Your voice is cool
Thanks!
Brilliant!
Thank you so much. It is really helpful to me :D
You are very welcome.
Thanks it really helps!!
thank you!
Very interesting
Better than Gujarati
You've saved my fucking life
Thank you ))))):
perfect
Thanks!
why go to uni when youtube is better?
You don't explain the formulae... apart from that, it's good.
This video is intended to be a light introduction to the topic, one appropriate for an introductory statistics class. The mention of the numeric measures of leverage and influence was intended to be just that, a mention. At some point, I'll try to get videos up explaining that part in greater detail.