I mention them in this video: th-cam.com/video/Vf7oJ6z2LCc/w-d-xo.html and in most of my machine learning videos. Tree based methods, support vector machines and Neural Networks are all non-linear. Here's a list of all of my videos: statquest.org/video-index/
Hi Josh! What do you call the process of comparing the R^2 of the simple to multi? And how is it different than simply comparing cross validated R^2 of simple vs multi?
But that means we still need to obtain some data of Tail length beforehand in order to calculate the SS(multiple). So will the following be more precise: To know whether it is worth spending *much* time or paying *more* effort to find *more* data?
Since you are talking about R2, I was also wondering about AIC (Akaike information criterion) and BIC (Bayesian information criterion). Are they related to it?
Hey Josh ! thanks for one more clearly explained concept ! Just one question: maybe it's too obvious but I don't really get how comparing multiple to the simple regression would help me avoid collecting data for the extra variable if I have already collected it for testing the model. Or is it possible that we measure some number of samples for tail length for rough estimation and decide if it's worth further doing so for a bigger sample size?
We can use the full dataset for the original models and testing of models. Once we decide which model and variables we want to use, future predictions can be made by only collecting the limited set of variables that we need.
How would you find the SS(mean) value when there are multiple variables? Are we supposed to add the SS(mean) values for each independent variable together?
In 2-D, the line y = mean(y-axis values) is a horizontal line at the mean of the y-axis values, and we do not need to specify the x-axis values. Likewise, we can specify a plane (or hyper-plane) with y = mean(y-axis values) without specifying the other variables.
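A minimal sketch of that point in Python, using made-up numbers: SS(mean) is computed from the y-axis values alone, so it is the same no matter how many independent variables the model has. The function name `ss_mean` and the data are assumptions for illustration.

```python
def ss_mean(ys):
    """Sum of squared differences between each y value and the mean of y."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


# Hypothetical y-axis values; no independent variables are needed.
body_length = [1.0, 2.0, 3.0]
result = ss_mean(body_length)
print(result)
```

Because no x-axis values appear anywhere in the function, there is nothing to add up across independent variables.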
Suppose the relationship between three variables Y, X1, and X2 is estimated using multiple linear regression as Y = 2.2 X1 + 3.1 X2 + 4.1. Given this information, which of the following statements is true?
Answer choices (select only one option):
1) For every unit change in X1, Y changes by 2.2 units
2) For every unit change in X2, Y changes by 3.1 units keeping X1 constant
3) For every unit change in X1, Y changes by 3.1 units
4) For every unit change in X2, Y changes by 3.1 units
I know you covered the formula for F already in the linear regression video, but would you mind doing one more just on this formula? Sadly it's still not intuitive for me :/ PS: Thanks for all your videos! They are a great help!
Hello Josh, thank you for the wonderful video series. I was wondering if the formula for F for simple vs multiple is correct. I believe the denominator should have SS(Simple) instead. Please confirm.
There are two ways to solve this problem. There is actually an analytical solution, meaning there's an equation that you can plug your data in and out comes a solution. However, I prefer using Gradient Descent, because it can be used in a much wider variety of situations. To learn more about Gradient Descent, see: th-cam.com/video/sDv4f4s2SB8/w-d-xo.html
If the difference in R2 values between simple and multiple regression is large and the p-value is small, why is adding tail length worth the trouble? Sorry, I didn't understand this part clearly.
stupid jokes and tone-deaf singing. I'm sure your girlfriend is cringing each time you start singing, or for that matter, start telling jokes 😞. Or if you don't have GF you might start looking for improvement in these areas. but great explanations.
This video DID NOT clearly explain multiple regression. You keep referring to another video that I didn't see so you lost me when you kept referring to it. Next time, just come up with something else. Keep it simple. You need to learn pedagogy because I'm sure there were others who were lost. SMH...
Correction:
2:49 I left off some of the parentheses for the equation for F. The numerator should be: (SS(mean) - SS(fit))/(pfit - pmean)
Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
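For reference, here is the corrected F from 2:49 written out with the full set of parentheses. This is a sketch in LaTeX; the denominator, SS(fit)/(n - pfit), is taken from the replies about "n" and "p-fit" further down this thread rather than from the correction itself.

```latex
F = \frac{\left( SS(\text{mean}) - SS(\text{fit}) \right) / \left( p_{\text{fit}} - p_{\text{mean}} \right)}
         {SS(\text{fit}) / \left( n - p_{\text{fit}} \right)}
```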
Hi Josh, I have a project on multi linear regression. I have to come up with my own question using all the data I can find can you please help me
Hi Josh,
Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so.
Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.
BAAAAAM. Comparing the simple & multiple regression efficiency through the difference of R². As simple as that. Love it. Thanks for making stats fun
Thank you! :)
Let me tell you this: I was crying in fetal position 4 hours ago; until I saw this video. So THANK YOU for your brilliant channel! 🙏🏻👐🏻
Glad it was helpful! :)
I’m pretty close
But I think this is what I need. Maybe not all of it but a lot
Thanks
HOORAY! Thank you so much for supporting StatQuest! It means a lot to me that you care enough to contribute. BAM! :)
I almost gave up on statistics until I saw your videos. Now I can confidently say I understand most concepts here.
BAM! :)
that intrO soNG SLAPS
Thank you! :)
My first thought lol
First, you are an awesome, caring and friendly teacher and your way of teaching is very effective! Love your songs too! Second, would you consider covering historical uses of multiple linear regression, statistics and linear algebra? For instance, I read in my linear algebra textbook about how a russian mathematician used linear algebra to try to help the soviets logistical war effort during WW2 and also how an American economist used linear algebra during the late 1940s. These are just two examples from linear algebra but I would also hope there are interesting historical examples using multiple linear regression and statistics too. With your teaching style it would be very informative and inspire more people to get into math at a deeper level. Thanks either way, your efforts are much appreciated! :)
Wow! I'll keep that in mind. However, the current plan is to make a series of videos on neural networks.
Thank you for this well explained video, it truly helped me to understand multiple regression. BAM!
Thank you! :)
most of the instructors of resources would always divide simple linear regression and multiple linear regression, not gonna lie that just makes everything more complicated!! so super appreciated how you put all of these concepts in the simplest way to digest, and they are really, not a big deal LOL
bam!
This visualisation of multiple regression with the green 3d line that you are using is really nicely done, these kind of visual aides really help us newbie students with the intuition of these concepts the first time they see them. I am using the Woolridge textbook on econometrics and this video is better done in about a fifth the amount of time. Just thought you would like to know that your videos are greatly appreciated :)
oh I almost forgot, BAM
Thanks! I'm glad you like my videos. :)
Oh this was so helpful!, If only there was a video about the assumptions of multiple regression and interaction effects, stats is slowly killing me
I didn't need to see the video to subscribe, you got me with that intro!
BAM! :)
Oh my god !!!!!!!!!!!!!!!!
Statistics seems too easy now after watching your videos 😲😲
I searched whole internet for such videos that explain concepts in simple and fuN wAy ......
FInally YouTube Algorithm after seeing my struggle 🤣 Suggested me your video
Thanks a lot
Now I'll see the whole Playlist of statquest 🤪😎😎😎🤪
now i only require this part for my exams !!! 😁
Once Again THank YOu 🙌🔥🙌🙌🙌🙌😝
Hooray! I'm glad the video was helpful! :)
cutest yt channel intro eveeerrr
:)
This is great. Can you expand on this by explaining why and how Multiple Regression, Hierarchical, and Step-wise Regression are different? I more or less know when to use one over the other, but your videos are really helping visualize and understand these principles.
Hey Josh! Great videos!! THANKS!!
My pleasure!
3:36 if we have more than 3 variables, how exactly would we fit a higher dimension on a graph? Is graphing relevant here?
When you have more variables, you can no longer draw the data and the graph, but the math still works out the same.
Josh! Building a model to evaluate PV system performance and came across your videos. Love it!
Thanks! :)
At around 4:20 of the video, you introduce this new F equation where we eliminate SS(mean) from the equation to compare the simple and multiple regressions directly to each other. At 4:50 of the video, you mention the difference in R^2 values between the simple and multiple regressions. The equation for R^2 = (SS(mean)-SS(fit))/SS(mean). Do we also have a new R^2 equation when comparing the simple and multiple regressions directly to each other? Or are we calculating 2 separate R^2 values and comparing them to each other? Thank you!
We are calculating 2 separate R^2 values and comparing them.
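A minimal sketch of what "2 separate R^2 values" could look like, in plain Python with made-up data. The helper names (`solve_linear_system`, `fit_least_squares`, `r_squared`) and the numbers are assumptions for illustration, and the fit uses the normal equations, which is just one way to get the least-squares solution.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, ys):
    """Return [intercept, slope1, ...] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, ys):
    """R^2 = (SS(mean) - SS(fit)) / SS(mean) for a least-squares fit."""
    beta = fit_least_squares(rows, ys)
    preds = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in rows]
    mean_y = sum(ys) / len(ys)
    ss_mean = sum((y - mean_y) ** 2 for y in ys)
    ss_fit = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return (ss_mean - ss_fit) / ss_mean


# Hypothetical measurements (not from the video).
weight = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tail_length = [0.5, 1.5, 1.0, 2.5, 2.0, 3.5]
body_length = [1.4, 2.9, 3.1, 5.2, 5.0, 7.3]

r2_simple = r_squared([[w] for w in weight], body_length)
r2_multiple = r_squared([[w, t] for w, t in zip(weight, tail_length)], body_length)
print(r2_simple, r2_multiple)
```

Each R^2 is computed against the same SS(mean), and the two values are then compared side by side.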
what is the value of n? (3:31) (number of data points?)
It is the number of data points.
At 4:50, there's the statement on analysing the difference in R-sq and the p-value to determine if adding a new feature is worth the trouble. I'm wondering how this interpretation can be done properly. If there are 3 predictors x1, x2, x3, does the order of adding them affect the R-sq and p-value? For example, adding in the order x1, x2, x3 vs x1, x3 vs x2, x3 gives 3 different scenarios where x3 is added, which should give 3 different incremental R-sq and p-values. How do we properly interpret whether x3 is helpful for predicting y if the currently included predictors affect the results from adding a new predictor? Or more generally, why do people use stepwise regression for feature selection when it is only locally optimal and the order of including variables affects the results of interest?
One possible solution is to use Lasso Regression or Elastic Net Regression to select which variables go in the final model. For details, see Ridge: th-cam.com/video/Q81RR3yKn30/w-d-xo.html Lasso: th-cam.com/video/NGf0voTMlcs/w-d-xo.html Ridge vs Lasso: th-cam.com/video/Xm2C_gTAl8c/w-d-xo.html and Elastic-Net: th-cam.com/video/1dKRdX9bfIo/w-d-xo.html
Hi Josh! Is there a theoretical possibility for the "multiple regression" model to be worse than "simple" model, i.e. adjusted "multiple" R2 less than "simple" R2 ? Or "multiple model" p-value is greater than "simple" p-value?
Good question. I believe the answer is "yes". So I would always be careful about what variables I added to a model.
Hi Josh,
At the 4:52 mark, when you say that it is worth the trouble of including the extra parameter into your model, wasn't that already done when I created the multi-regression best-fit equation? Seems kind of a moot point, wouldn't you agree? All it is saying is that: 'replace your old simple regression model with this new multi-regression model'.
However, you should only do that if the increase in R^2 is large enough. For example, if we have a big fancy model that has a bunch of variables, but gathering the data is very expensive compared to a simple model, then, if the simple model works almost as well as the fancy model, we might opt to just continue using the simple model (and no longer bother spending all of the money to continue to collect the extra data).
3:17 Does the "n" in the denominator mean the number of data points? And why does the SS(fit) in the denominator have to be divided by that value? Thanks
Yes, "n" is the number of observations in your dataset. And if you look at the equation, if 'n' is very large, meaning you have a lot of data, then, all else being equal, the denominator, SS(fit)/(n - pfit), will be very small, resulting in a large value for "F". This large value for "F", in turn, will correspond to a relatively small p-value (the larger the value for F, the smaller the corresponding p-value). Thus, the more data you have, the smaller the p-value and the more confidence you have in the "fit" being significant and not just the result of random chance. Does that make sense?
@@statquest yes thanks for the explanation
@Sam Dillard Among other things, this compensates for the fact that adding more variables (and parameters) to a model will improve the fit, even if those variables are not very helpful. In other words, if we add a lot of random variables to a model, some of them, by random chance, will correlate with the dependent variable (the thing we are predicting) and improve the fit. Thus, one of the reasons we subtract p-fit from 'n' is to compensate for this.
@Sam Dillard You are on the right track. If 'n' is huge relative to p-fit, then it will not make a big difference - and this is desired because it means we have tons of data relative to the number of parameters and thus, the data will dominate the result (and the best way to avoid overfitting your model is to have tons of data). However, for smaller 'n' ('n' closer to p-fit), we don't really have tons of data anymore and we need to worry about overfitting, and that's when subtracting p-fit from 'n' will make a difference.
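To make the role of 'n' and p-fit concrete, here is a small sketch in Python. The function name `f_statistic` and every number are hypothetical. Holding SS(mean) and SS(fit) fixed while changing 'n' is artificial, since both sums usually grow as data is added, but it isolates what the (n - pfit) term does.

```python
def f_statistic(ss_mean, ss_fit, p_mean, p_fit, n):
    """F = ((SS(mean) - SS(fit)) / (pfit - pmean)) / (SS(fit) / (n - pfit))."""
    numerator = (ss_mean - ss_fit) / (p_fit - p_mean)
    denominator = ss_fit / (n - p_fit)
    return numerator / denominator


# Same made-up sums of squares, different numbers of observations.
f_small_n = f_statistic(ss_mean=100.0, ss_fit=40.0, p_mean=1, p_fit=3, n=10)
f_large_n = f_statistic(ss_mean=100.0, ss_fit=40.0, p_mean=1, p_fit=3, n=1000)
print(f_small_n, f_large_n)
```

With 'n' close to p-fit, (n - pfit) is small, so the denominator is larger and F is smaller. That is the situation where the subtraction matters most.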
This thread is as crucial as the video. Thanks for the doubt and thanks Josh for answering every single question! You're an awesome guy! :)
Up there with Khan Academy and 3Blue1Brown. Best machine learning and stats videos on YouTube. Would love it if there were exam-style question and answer examples, like in Khan Academy, but it's still seriously good. Thank you very much for your videos
Thank you very much!!! :)
How do you formally present the results of a multiple regression model analysis in reports? I have read that you report the R2 value, the coefficients and the ANOVA results on the model, but can you elaborate or give an example of a model with both numeric and factor independent variables?
The answer to this varies, depending on the field. Some want more data than others. I would report the R^2, coefficients and their p-values. For details on how to do this, see: th-cam.com/video/hokALdIst8k/w-d-xo.html
I'm not sure I ever thought too much about comparing a simpler model to a more complex one like that! By the way, how are things going in Chapel Hill these days?
Great! Weather is perfect today.
@@statquest I'm glad to hear that! Are you currently teaching classes with stats and/or machine learning?
@@PunmasterSTP I've never taught a class before and stopped working at UNC 4 years ago to do youtube full time.
@@statquest That sounds cool, and I imagine you reach way more people on YouTube!
Hi Josh,
Thank you for all of your videos so far. It's truly a great help.
I would like to know if you have made any videos for multiple regression with CATEGORICAL independent variables? I have some confusion of how to interpret the coefficients of such regression. Or maybe I just give you a simple example and you may give me a brief explanation.
Thank you very much again.
Quynh.
The next video in this series, Part 2, shows multiple regression with a categorical independent variable. A specific case of this is ANOVA, and that's what I demonstrate: th-cam.com/video/NF5_btOaCig/w-d-xo.html Once you understand that, you should check out the Part 3, Design Matrices: th-cam.com/video/CqLGvwi-5Pc/w-d-xo.html , and Part 4, Design Matrix examples in R: th-cam.com/video/Hrr2anyK_5s/w-d-xo.html
This was explained pretty well. Thanks.
Thanks!
thanks for the super easy explanation. I appreciate your support.
Thank you! :)
It's telling me to pay to watch this video, but a week and a half ago I didn't have to pay. I'm confused, what's happened?
I am really sorry! I don't know what is going on. I've contacted YouTube and have not heard anything back. This is breaking my heart because I never wanted this to happen, but somehow it is. I am sorry and doing everything I can to fix this.
How do we compare the Simple Regression fit to the Multiple Regression fit (F-statistic) using software ? Comparison to the mean is done automatically by say Excel or R but how would we do this comparison for Simple Regression v/s Multiple Regression ?
In R, if you have two models, "simple" and "fancy", you can compare them with anova(simple, fancy).
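As a sketch of what that comparison computes: for two nested linear models, the F reported by anova(simple, fancy) should correspond to the "simple vs. multiple" equation discussed around 4:20 in the video. The symbols p_simple and p_multiple (the number of parameters in each model) are names assumed here for illustration.

```latex
F = \frac{\left( SS(\text{simple}) - SS(\text{multiple}) \right) / \left( p_{\text{multiple}} - p_{\text{simple}} \right)}
         {SS(\text{multiple}) / \left( n - p_{\text{multiple}} \right)}
```

Here SS(simple) and SS(multiple) are the sums of squared residuals around each model's fit, so SS(mean) no longer appears.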
Please do a statquest on dummy variables!
The next two videos in the series may answer your questions about dummy variables: th-cam.com/video/NF5_btOaCig/w-d-xo.html and th-cam.com/video/CqLGvwi-5Pc/w-d-xo.html
Do you plan on covering polynomial regression?
In my mind, Polynomial Regression is just a special case of Multiple Regression. For Polynomial Regression we square or cube (or whatever) the values associated with the independent variables prior to putting them into the Multiple Regression model.
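A minimal sketch of that preprocessing step in Python, with a hypothetical helper named `polynomial_features` and made-up inputs. Each original value is expanded into its powers, and the resulting columns are then treated like any other independent variables in a multiple regression.

```python
def polynomial_features(xs, degree):
    """Turn each x into the row [x, x^2, ..., x^degree]."""
    return [[x ** d for d in range(1, degree + 1)] for x in xs]


# Hypothetical values for a single independent variable.
rows = polynomial_features([1.0, 2.0, 3.0], degree=3)
print(rows)
```

The model is still linear in its parameters: the squared and cubed columns simply act as extra features.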
Your voice is superb and mesmerizing......
Thank you! :)
Dear Josh, Thank you very much for your video, which benefited me a lot. Could you do a tutorial on polynomial regression? If multiple linear regression speaks of direct effects from different explanatory variables to explain SSM, can polynomial regression be understood as considering not only direct effects but also cross-effects? Looking forward to your reply
Polynomial regression is just like multiple regression. en.wikipedia.org/wiki/Polynomial_regression
Thank you kindly ✍️
bam!
Hello Josh, I have a doubt about hypothesis testing. Since we have to comment on the population on the basis of sample only, we can only reject Ho if we get significant difference. But if we don't get so, it doesn't mean that there is no significant difference in the variables of the population under consideration. Then, why do people (including books) say we accept Ho when difference is not found in the sample???
You should never "accept the null" because it could be that we just don't have enough data to properly reject it.
@@statquest yes exactly, that's why, we must say failed to reject the null, Right? Thank you so much Josh. Thanks a lot!!!
@@spp626 Correct.
Mixed effects model, please? Thank you!
Noted!
Dr. Starmer, is there a term for the F value between simple and multiple regressions (the thing you said helps one determine whether collecting data for additional variables is meaningful) that distinguishes it from just plain F between linear or multiple regressions and the mean regression? I determined it for my data and would like to know how to refer to it.
To be honest, I'm absolutely horrible with terminology. I have no idea what it's called. Maybe "the nested F-statistic"? I'm not sure.
For the F_value of multiple regression, in the numerator we have SS(simple), and the thing that got me thinking is: for which feature is it supposed to be calculated? The path taken in the video is clear: we first collected points with weight and length, and then we consider adding a new feature to our model. But what if I got the data from someone with two features (weight and tail length) and one target (length)? How do I calculate the F_value now, as I am able to compute two values of SS(simple), one for the pair (weight, length) and one for the pair (tail length, length)? Should I calculate both F_values? What if I have N features and one target? Now I can calculate N SS(simple) values and thus N F_values. And what exactly does "simple" mean when I have, for example, N features and my linear model has N+1 (intercept) parameters? Does "simple" mean with two parameters or with (N+1) - 1?
In theory, simple can refer to any "simpler" model, or any model that contains a subset of the features found in the full model. So it really depends on what you are interested in testing. You don't have to do every single test possible, just the ones you are interested in. The standard tests are... 1) Compare the fancy model to only using the mean y-axis value to make predictions 2) comparing the fancy model to all the models that are missing one of the features. For details, see: th-cam.com/video/hokALdIst8k/w-d-xo.html
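To make the two standard tests concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data are randomly generated, the helper names `fit_ss` and `f_value` are hypothetical, and the F formula is the one discussed in this thread, with SS(multiple) and n - p(multiple) in the denominator. It only computes the F-values, not the p-values.

```python
import numpy as np

def fit_ss(features, y):
    """Least-squares fit with an intercept (hypothetical helper).

    Returns (sum of squared residuals, number of parameters).
    Pass features=None to predict with only the mean of y.
    """
    columns = [np.ones(len(y))]
    if features is not None:
        columns.append(features)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return float(residuals @ residuals), design.shape[1]

def f_value(ss_simple, p_simple, ss_fancy, p_fancy, n):
    """F = ((SS(simple) - SS(fancy)) / (p_fancy - p_simple)) / (SS(fancy) / (n - p_fancy))."""
    numerator = (ss_simple - ss_fancy) / (p_fancy - p_simple)
    denominator = ss_fancy / (n - p_fancy)
    return numerator / denominator

# Made-up data: two features (weight, tail length) and one target (length).
rng = np.random.default_rng(0)
n = 30
weight = rng.normal(20.0, 3.0, n)
tail = rng.normal(8.0, 1.0, n)
length = 1.0 + 0.3 * weight + 0.5 * tail + rng.normal(0.0, 0.5, n)

ss_mean, p_mean = fit_ss(None, length)          # mean only
ss_weight, p_weight = fit_ss(weight, length)    # missing tail length
ss_tail, p_tail = fit_ss(tail, length)          # missing weight
ss_full, p_full = fit_ss(np.column_stack([weight, tail]), length)

# Test 1: fancy model vs. the mean.
f_vs_mean = f_value(ss_mean, p_mean, ss_full, p_full, n)
# Test 2: fancy model vs. each model that is missing one feature.
f_drop_tail = f_value(ss_weight, p_weight, ss_full, p_full, n)
f_drop_weight = f_value(ss_tail, p_tail, ss_full, p_full, n)

print(f_vs_mean, f_drop_tail, f_drop_weight)
```

With N features, test 2 gives N F-values, one per dropped feature, and each "simple" model has (N+1) - 1 parameters.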
Can you kindly clarify that, by SS(Simple) you are referring to the SS(Fit) of the simple model, and SS(multiple) is the SS(Fit) of the multiple-regression model?
That is correct and is explained at 4:11
how would R^2 indicate if collecting data on a new feature is going to help if we have not already collected data on the feature to calculate R^2?
This could just be a pilot study, done before a larger, more expensive study. In this case, the results from the pilot study could inform how we carry out the larger study.
Hi josh, one request from me as a follower of yours...........
I searched sooo many videos to get clarification about "Multiple Regression" with "categorical data as an independent variable" (like gender), but I was not satisfied with their explanations: either they explain using R, or their data does not contain categorical data.
Could you come up with one video that explains the concept in numerical form (like most of your videos do) rather than theory?
Actually, I also watched your "ANOVA test" video to get clarification, but you have not shown how to implement it in regression.
Can you come up with one video that goes from taking data of different categories to implementing it in regression?
please🙏
See: th-cam.com/video/CqLGvwi-5Pc/w-d-xo.html
I really like your video
Can you make a full series on time series analysis?
I hope to do that one day.
Hey! Why do we have a single intercept in Linear Regression for multiple features?
Because there is a single thing we are trying to predict.
nice video. will you talk about non-linear models?
I mention them in this video: th-cam.com/video/Vf7oJ6z2LCc/w-d-xo.html and in most of my machine learning videos. Tree based methods, support vector machines and Neural Networks are all non-linear. Here's a list of all of my videos: statquest.org/video-index/
Hi Josh! What do you call the process of comparing the R^2 of the simple to multi? And how is it different than simply comparing cross validated R^2 of simple vs multi?
Oh, you answered my first question already, but my second question still stands!
Presumably you just average the values across the different folds, but that is just a guess.
But that means we still need to obtain some data of Tail length beforehand in order to calculate the SS(multiple).
So will the following be more precise: To know whether it is worth spending *much* time or paying *more* effort to find *more* data?
Yes
Cool. Could you clearly explain how stepwise regression works? I see it in several bioinformatics software packages. Also ridge regression and lasso? Thanks
Since you are talking about R^2, I was also wondering: what about AIC (Akaike information criterion) and BIC (Bayesian information criterion)? Are they related?
can you please do a series on multivariate data analysis
Can you give me ideas for topics other than this one?
Hi. Awesome video.
Can you do one on multivariate analysis, ANOVA, and covariates, please? I had a hard time in uni understanding them. Thanks
What's the 'n' of the 'n-p(multiple)' in the denominator?
How can we do a multicollinearity check when you have ordinal variables in the data?
that was awesome!
Thank you!
Hey Josh ! thanks for one more clearly explained concept ! Just one question: maybe it's too obvious but I don't really get how comparing multiple to the simple regression would help me avoid collecting data for the extra variable if I have already collected it for testing the model. Or is it possible that we measure some number of samples for tail length for rough estimation and decide if it's worth further doing so for a bigger sample size?
We can use the full dataset for the original models and testing of models. Once we decide which model and variables we want to use, future predictions can be made by only collecting the limited set of variables that we need.
Can you do multivariate regression next? (multiple outcome variables)
How would you find the SS(mean) value when there are multiple variables? Are we supposed to add the SS(mean) values for each independent variable together?
In 2-D, the line y = mean(y-axis value) is a horizontal line at the mean of the y-axis values, and we do not need to specify the x-axis values. Likewise, we can specify a plane (or hyper-plane) with y = mean(y-axis value) without specifying the other variables.
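As a small illustration of that point, here is a sketch with made-up body lengths. SS(mean) is computed from the y-axis values alone, so there is nothing to add up across the independent variables.

```python
import numpy as np

# Made-up y-axis values (the thing we are trying to predict).
length = np.array([1.4, 2.1, 2.9, 3.2, 4.0])

# SS(mean): squared distances from each point to the mean of the y-axis values.
# No x-axis variable appears here, no matter how many features the model has.
ss_mean = float(np.sum((length - length.mean()) ** 2))

print(ss_mean)
```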
What does the fit mean and where does it come from?
This is described in this video: th-cam.com/video/nk2CQITm_eo/w-d-xo.html
Suppose the relationship between three variables Y, X1, and X2 is estimated using the multiple linear regression as Y = 2.2 X1 + 3.1 X2 + 4.1. Given this information, which of the following statements is true?
Answer choices
Select only one option
For every unit change in X1, Y changes by 2.2 units
For every unit change in X2, Y changes by 3.1 units keeping X1 constant
For every unit change in X1, Y changes by 3.1 units
For every unit change in X2, Y changes by 3.1 units
Is this a homework question?
@@statquest yes
Which is the right option
I know you covered the formula for F already in the linear regression video, but would you mind doing one more just on this formula? Sadly, it's still not intuitive for me :/
PS: Thanks for all your videos! They are a great help!
I'll keep that in mind.
@@statquest Same with me Joshua. Thank you in advance. :)
Sir, please provide a lecture on multivariate analysis.
I'll keep that in mind.
Amazingly helpful video, could you do one on GEE's?
Thank You !
You're welcome!
Hello Josh, thank you for the wonderful video series. I was wondering if the formula for F for simple vs. multiple is correct. I believe the denominator should have SS(simple) instead. Please confirm.
The video is correct. The denominator has SS(multiple).
Thanks for these awesome videos... I have a small doubt: why is SS(mean) found for the tail length and not mouse weight?
Because we are trying to predict length, not weight.
Great one
Thanks!
ooh yeah Statquest
bam! :)
Do you know how to find the intercept b?
There are two ways to solve this problem. There is actually an analytical solution, meaning there's an equation that you can plug your data in and out comes a solution. However, I prefer using Gradient Descent, because it can be used in a much wider variety of situations. To learn more about Gradient Descent, see: th-cam.com/video/sDv4f4s2SB8/w-d-xo.html
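For the analytical route, here is a minimal sketch using NumPy's `np.linalg.lstsq`, which solves the same least-squares problem directly. The data are made up and noiseless (an assumption chosen so the known intercept is recovered), and the variable names are only for illustration.

```python
import numpy as np

# Made-up, noiseless data generated from y = 1.5 + 2.0*x1 - 0.5*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.5 + 2.0 * x1 - 0.5 * x2

# A column of ones lets the fit estimate the intercept alongside the slopes.
design = np.column_stack([np.ones(len(y)), x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slope_x1, slope_x2 = coef

print(intercept, slope_x1, slope_x2)
```

Gradient Descent should arrive at essentially the same values for this problem. It just gets there iteratively instead of in one step.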
Can you make some examples with Python instead of R, since Python is becoming more and more popular?
Yes! That is the plan.
@@statquest Great!
Can someone explain what R2 and P-value mean? I have seen the videos but can't figure out how they are different
To understand what R^2 and P-values are, you should start with simple linear regression: th-cam.com/video/nk2CQITm_eo/w-d-xo.html
thank you!
Thanks!
Shouldn’t F be (SS(mean) - SS(fit))/(pfit - pmean) on top?
Yes, I was sloppy with the parentheses.
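Written out in full, with the parentheses in place, the equation would look like the following. The numerator is the one from the correction. The denominator is assumed to be the standard one, SS(fit) divided by n - p_fit.

```latex
F = \frac{\left(SS(\text{mean}) - SS(\text{fit})\right) / \left(p_{\text{fit}} - p_{\text{mean}}\right)}
         {SS(\text{fit}) / \left(n - p_{\text{fit}}\right)}
```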
If the difference in R^2 values between simple and multiple regression is large and the p-value is small, why is adding tail length worth the trouble? Sorry, I didn't understand this part clearly.
What time point, minutes and seconds, are you asking about?
This channel is 🥰
Thanks!
stat quest!!! yay!!!
Yes! :)
But WHY? No one ever explains the geometrical meanings of the equations... :(
Despite the annoying intro, this video was very helpful.
You can just skip the first 30 seconds of all my videos.
Nice video
legend
:)
what is r^2?
To learn more about R^2, see: th-cam.com/video/2AQKmw14mHM/w-d-xo.html and th-cam.com/video/nk2CQITm_eo/w-d-xo.html
I feel so sad when that BAAM!!! doesn't work for me...
P.S. Just tired, can't concentrate. But usually it works just as it should!
Ok! I hope you can get some rest.
Excellent video!!!! Can you make one about beta regressions?
Bam! Peanut Butter and JAAAAAAAM!
:)
Pro tip: if you like each video you watch, you will not re-watch videos.
BAM!
:)
Nice intro! HAHA!
:)
bammmmm !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
:)
I love you
:)
Bam!
:)
Bam!!
:)
I saved the video but didn't watch it. Got it as a 15-mark answer.
Lol, thought you were sing 😁
:)
funny intro!!! lol
:)
Bamm!!
:)
Useful video but that intro almost killed me with cringe
Death by cringe is a terrible way to go. In the future, just skip the first 20 seconds or so and you’ll be spared.
Didn't understand any of this, to be perfectly honest.
Did you watch the first part first? (this is the 2nd part) th-cam.com/video/nk2CQITm_eo/w-d-xo.html
@@statquest Thanks for the link.
stupid jokes and tone-deaf singing. I'm sure your girlfriend is cringing each time you start singing, or for that matter, start telling jokes 😞. Or if you don't have GF you might start looking for improvement in these areas.
but great explanations.
Noted!
This video DID NOT clearly explain multiple regression. You keep referring to another video that I didn't see so you lost me when you kept referring to it. Next time, just come up with something else. Keep it simple. You need to learn pedagogy because I'm sure there were others who were lost. SMH...
Here are the links to the other videos:
th-cam.com/video/PaFPbb66DxQ/w-d-xo.html
th-cam.com/video/nk2CQITm_eo/w-d-xo.html