Man this is pure gold. No BS, just essence. Subscribed!
This is a great explanation. I love the visuals showing how they are all related. Thank you.
Can you create a whole playlist for GLMs? Please do consider doing this.
Great explanation Brian! I have a small question, though. If the response variable has to be normal (in a normal linear regression), why do you think most statistics articles insist that only the residuals have to be normal and not the variable? What tests do you think should be done before a GLM, besides residual plots?
Saying the response is normal and saying the residuals are normal basically mean the same thing. The response is normal around the mean for that X value (not normal overall, across all X values), which just means the response’s distance from that mean (the residual) is normal with mean 0. If we want to evaluate normality, it’s easier to look at a graph of the residuals, since they all have the same mean, so we can easily see whether they look normally distributed.
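To make that concrete, here is a minimal simulation sketch (not from the video). The line `2 + 3x`, the noise level, and the sample size are all made-up assumptions, and kurtosis is used only as a rough stand-in for "does this look normal" (a normal distribution has kurtosis of about 3).

```python
import random
import statistics

random.seed(0)

# Hypothetical data: x spread over a wide range, normal noise around a line.
n = 5000
x = [random.uniform(0, 10) for _ in range(n)]
y = [2 + 3 * xi + random.gauss(0, 1) for xi in x]

# Ordinary least squares fit by hand (slope and intercept).
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = response minus the fitted mean for that x value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def kurtosis(values):
    """Fourth standardized moment; about 3 for a normal distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 4 for v in values])


print("kurtosis of raw response:", round(kurtosis(y), 2))
print("kurtosis of residuals:   ", round(kurtosis(residuals), 2))
```

In this setup the raw response pooled over all x values should look clearly non-normal (it mostly inherits the flat spread of x), while the residuals should look close to normal. That is why the check is usually done on residuals.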
Great work, quick question! Why is it ok to use a normal distribution for response variables like weight if weight can't be negative, or zero? I see it a lot, but don't understand why it's so common.
There's pretty much nothing that *really* follows a normal distribution - it's all approximations. Take height for example (the same logic applies to weight) - and suppose height follows an approximately normal distribution with mean = 64 inches and sd = 4 inches. A normal distribution does put some probability on values less than 0 (which is impossible), but 0 is 16 standard deviations below the mean, so that probability is basically 0 anyway (less than 1 in a billion billion billion billion billion billion). So yes, you're totally right that it's impossible, but assuming it's normal makes things easy and the probability calculations are often pretty accurate!
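For anyone who wants to check that arithmetic, here is a minimal sketch using only the Python standard library. It assumes the same illustrative numbers as above (mean 64, sd 4) and uses the identity that the standard normal CDF at z equals 0.5 * erfc(-z / sqrt(2)).

```python
import math

# Illustrative numbers from the comment above: height ~ Normal(64, 4).
mean, sd = 64.0, 4.0

# How many standard deviations is 0 from the mean?
z = (0.0 - mean) / sd

# P(X < 0) under the normal model, via the complementary error function.
p_negative = 0.5 * math.erfc(-z / math.sqrt(2))

# "1 in a billion" six times over is 1e-9 ** 6 = 1e-54.
one_in_billion_to_the_sixth = 1e-9 ** 6

print("z-score of 0:", z)
print("P(height < 0):", p_negative)
print("smaller than 1e-54?", p_negative < one_in_billion_to_the_sixth)
```

The probability is positive but far too small to matter in practice, which is the sense in which the normal model is "wrong but fine" here.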
@statswithbrian Works for me, thank you!
Thanks!
Finally it came!!!
Thanks for the inspiration!
But you missed the best part, how we can engineer any combination we want to fit our data. We can model different types of trends, heteroscedasticity and of course, sample from either pdf or pmf. They are incredibly flexible.
By the way, ultimately what's the scope of this channel? Can we eventually expect videos on things like measure theoretic probability, stochastic processes and the like?
There might be one video on measure theory sometime, but no, I plan to stick more to the statistics and data science end. Any more probability videos would probably be similar to the Markov/Chebyshev's inequality videos.
In your final slide, you say that the link function maps from the original scale to "the parameter of the relevant probability distribution". You also say the parameter is personalised....
Is your final slide saying that in general, the link function maps to the parameter of the data's distribution? e.g. "p" in Bernoulli, "sigma" in Rayleigh?
Apologies if I haven't understood this correctly.
Yes, the link function (strictly speaking, its inverse) is just transforming a real number with no restrictions (negative infinity to infinity) to something with the correct possibilities for the parameter of interest.
In logistic regression, if we were predicting the probability of having diabetes based on weight, you and I would each get a personalized parameter p based on our weight. The heavier person might have p = 0.7, reflecting the fact that their weight makes it more likely that they have diabetes. The lighter person might have p = 0.3. But they will both be between 0 and 1 no matter what, because the link function transformed the scale to ensure that it’s between 0 and 1, which regular linear regression does not do.
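Here is a rough sketch of that idea in code. The intercept, slope, and weights are made-up numbers purely for illustration (they are not fitted to any data), and `inverse_logit` is just a helper name for the standard logistic function.

```python
import math


def inverse_logit(eta):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients for P(diabetes) as a function of weight in pounds.
intercept, slope = -5.0, 0.03

weights = [50, 150, 250, 400]

# The linear predictor is unrestricted: it can be negative or greater than 1.
linear_predictors = [intercept + slope * w for w in weights]

# Each person gets a personalized p, always inside (0, 1).
probabilities = [inverse_logit(eta) for eta in linear_predictors]

for w, eta, p in zip(weights, linear_predictors, probabilities):
    print(f"weight={w:>3}  linear predictor={eta:6.2f}  p={p:.3f}")
```

The linear predictor column is what regular linear regression would hand back directly, and it wanders outside 0 to 1. The p column is the same quantity after the transformation, so it is always a valid probability and still increases with weight.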