Hahahaha, good intro at the beginning between Trevor and Robert.
Notes:
Forward selection:
Start with the null model: intercept only (it predicts the mean of y)
Then add variables to the model one at a time, each time the one that lowers the RSS the most
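A minimal sketch of forward selection in Python with NumPy, assuming ordinary least squares and choosing, at each step, the variable that lowers the RSS the most. The helper names `rss` and `forward_selection` and the simulated data are illustrative, not taken from the lecture. It returns the whole nested path M0, M1, ..., Mp; deciding where to stop is a separate step (for example by cross-validation).

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_selection(X, y):
    """Return the nested models M0, M1, ..., Mp found by forward selection.

    M0 is the intercept-only model (prediction = mean of y); at each step
    the remaining variable that gives the lowest RSS when added is kept.
    """
    selected, path = [], [[]]
    remaining = list(range(X.shape[1]))
    while remaining:
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        path.append(list(selected))
    return path

# Hypothetical data: only columns 2 and 0 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 2] - 2 * X[:, 0] + rng.normal(size=100)
path = forward_selection(X, y)
print(path)
```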
Backward step-wise selection:
Start with all variables in the model and remove one at a time
Remove the variable with the least significant t-statistic (largest p-value)
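A matching sketch of backward elimination under the same assumptions (OLS, NumPy, simulated data; `t_statistics` and `backward_elimination` are hypothetical helper names). Because all slopes in one fitted model share the same residual degrees of freedom, dropping the smallest |t| is the same as dropping the largest p-value, so no t-distribution lookup is needed. The full model is fitted first, so this needs more observations than coefficients (n > p + 1).

```python
import numpy as np

def t_statistics(X, y, cols):
    """OLS t-statistics for the slopes of y ~ intercept + X[:, cols]."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof              # estimate of the error variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return beta[1:] / se[1:]                  # drop the intercept

def backward_elimination(X, y):
    """Start from the full model and repeatedly drop the variable with the
    smallest |t| (largest p-value). Returns every model visited."""
    cols = list(range(X.shape[1]))
    path = [list(cols)]
    while cols:
        t = t_statistics(X, y, cols)
        worst = cols[int(np.argmin(np.abs(t)))]
        cols.remove(worst)
        path.append(list(cols))
    return path

# Hypothetical data: only columns 2 and 0 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 2] - 2 * X[:, 0] + rng.normal(size=100)
path = backward_elimination(X, y)
print(path)
```

In practice one would stop as soon as every remaining p-value is below a chosen threshold; returning the whole path keeps the sketch short.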
Cross-validation model selection.
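Cross-validation model selection can be sketched as follows, assuming 5-fold CV, OLS fits, and a small p so that all 2^p subsets can be scored (32 models here). Every candidate uses the same fold split, so the comparison is like-for-like. `cv_mse` and the simulated data are illustrative assumptions.

```python
import itertools
import numpy as np

def cv_mse(X, y, cols, k=5, seed=0):
    """k-fold cross-validated MSE of OLS on an intercept plus X[:, cols].

    A fixed seed gives every candidate model the same folds."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    errors = []
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        errors.append(np.mean((y[test_idx] - A[test_idx] @ beta) ** 2))
    return float(np.mean(errors))

# Hypothetical data: only columns 2 and 0 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 2] - 2 * X[:, 0] + rng.normal(size=100)

p = X.shape[1]
candidates = [list(c) for r in range(p + 1)
              for c in itertools.combinations(range(p), r)]
best = min(candidates, key=lambda cols: cv_mse(X, y, cols))
print(best, cv_mse(X, y, best))
```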
Qualitative predictors: gender, student, marital status
- For a two-level predictor, create a dummy variable with values 0 or 1
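A small illustration of 0/1 dummy coding for a two-level predictor, using made-up balances (hypothetical numbers, not the Credit data from the lecture). With a single dummy, the fitted intercept is the mean response of the baseline level (dummy = 0) and the slope is the difference between the two group means.

```python
import numpy as np

# Made-up example data.
student = np.array(["Yes", "No", "No", "Yes", "No", "No"])
balance = np.array([900.0, 400.0, 500.0, 1100.0, 450.0, 550.0])

# Dummy: 1 = student, 0 = non-student (the baseline level).
x = (student == "Yes").astype(float)
A = np.column_stack([np.ones(len(x)), x])
(b0, b1), *_ = np.linalg.lstsq(A, balance, rcond=None)

print(b0)  # mean balance of non-students
print(b1)  # students' mean minus non-students' mean
```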
Completed 08.08.2024
Why don't we use the same number of dummy variables as the number of levels in a categorical variable? Why do we use k-1?
How is the model interpreted in the end?
Is it that being Asian or Caucasian doesn't impact the credit balance? What about African Americans?
1. With k dummy variables and an intercept, the dummies always add up to 1, which duplicates the intercept column, so the coefficients cannot be estimated uniquely. With k-1 dummies, the level left out becomes the baseline: it enters every equation through the constant (the intercept), and each dummy coefficient compares one level with that baseline. The p-value of each coefficient then says whether there is enough evidence of a difference between that level and the baseline.
2. Based on the p-values and the coefficients.
3. Compared with African Americans (the baseline), the Asian and Caucasian coefficients are not significant, so there is not enough evidence that credit balance is related to ethnicity.
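A quick NumPy check of why k-1 dummies are used: with an intercept, the k indicator columns sum to the intercept column, so the design matrix loses rank. The ethnicity values and balances below are made up (hypothetical numbers). With African American as the baseline, the fitted intercept is the baseline group's mean and each dummy coefficient is that group's difference from the baseline.

```python
import numpy as np

ethnicity = np.array(["African American", "Asian", "Caucasian",
                      "Asian", "Caucasian", "African American"])
balance = np.array([500.0, 600.0, 550.0, 620.0, 530.0, 520.0])  # made-up values
levels = ["African American", "Asian", "Caucasian"]

# One 0/1 indicator column per level (k = 3 columns); every row sums to 1.
D = np.column_stack([(ethnicity == lev).astype(float) for lev in levels])
ones = np.ones((len(ethnicity), 1))

X_all = np.hstack([ones, D])          # intercept + k dummies: 4 columns
X_base = np.hstack([ones, D[:, 1:]])  # intercept + (k - 1) dummies: 3 columns

rank_all = np.linalg.matrix_rank(X_all)    # fewer than 4: perfectly collinear
rank_base = np.linalg.matrix_rank(X_base)  # equals 3: full column rank

coef, *_ = np.linalg.lstsq(X_base, balance, rcond=None)
intercept, asian_vs_aa, caucasian_vs_aa = coef
print(rank_all, rank_base)
print(intercept, asian_vs_aa, caucasian_vs_aa)
```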
If I use a stepwise method, how can I know that I've found the correct variables?
Searching among 2^p candidate models, I am always going to find A MODEL, with no guarantee that I've found THE MODEL.
Additionally, if I'm not mistaken, the proposed criteria, or stop rules, are built to handle a small number of candidate predictors or a handful of candidate models? I think it would be helpful to address this right away, as well as, perhaps, to point out that AIC has no valid stop rule when searching through 2^p candidate models.
It is quite easy to show, using the bootstrap, that if you have 8 predictors, of which 4 are present in the "true" relationship and are also uncorrelated (the easiest case), the true relationship will be found about 10% of the time with a sample size of 100.
How is the proposed method valid?
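The experiment described in this question can be approximated with a plain Monte Carlo simulation (a fresh sample drawn from a known model each time, rather than the bootstrap). This sketch assumes uncorrelated standard-normal predictors, coefficients of 0.5 on the first four of eight predictors, unit noise, n = 100, and forward selection stopped at the lowest AIC along the path; these settings and the helper names are hypothetical. The fraction it prints depends heavily on the effect sizes and the criterion, so it is not a check of the 10% figure.

```python
import numpy as np

def fit_rss(A, y):
    """RSS of the least-squares fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def forward_aic(X, y):
    """Forward selection by RSS, keeping the step with the lowest AIC.

    Gaussian-OLS AIC up to a constant: n*log(RSS/n) + 2*d,
    where d counts the coefficients including the intercept."""
    n, p = X.shape

    def design(cols):
        return np.column_stack([np.ones(n)] + [X[:, j] for j in cols])

    selected, remaining = [], list(range(p))
    best_cols = []
    best_aic = n * np.log(fit_rss(design([]), y) / n) + 2 * 1
    while remaining:
        j = min(remaining, key=lambda c: fit_rss(design(selected + [c]), y))
        selected.append(j)
        remaining.remove(j)
        aic = n * np.log(fit_rss(design(selected), y) / n) + 2 * (len(selected) + 1)
        if aic < best_aic:
            best_cols, best_aic = list(selected), aic
    return set(best_cols)

rng = np.random.default_rng(42)
n, p, true_set = 100, 8, {0, 1, 2, 3}
beta = np.array([0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0])  # hypothetical effects
n_sims = 500
hits = 0
for _ in range(n_sims):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    hits += forward_aic(X, y) == true_set   # True counts as 1
freq = hits / n_sims
print(f"exact true model selected in {freq:.0%} of samples")
```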
1) You have no guarantee of finding the best model among the 2^p models using only forward/backward/stepwise algorithms (especially when p is large), but the selected models are often close. The only guarantee comes from fitting them all.
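2) A consistent criterion such as BIC will select the right model with probability approaching 1 if (i) the correct model is in your bag of models (which is not guaranteed with stepwise-like algorithms) and (ii) the sample size is large enough. AIC does not have this property: even in large samples it keeps a non-vanishing chance of adding unnecessary variables.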
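For reference, in a Gaussian linear model fitted by least squares with n observations and d coefficients (including the intercept), the two criteria can be written, up to additive constants that do not depend on which variables are in the model, as below. BIC's per-coefficient penalty log n exceeds AIC's 2 once n >= 8, which is why BIC favors smaller models; BIC, unlike AIC, is consistent, meaning it picks the true model with probability tending to 1 when that model is among the candidates.

```latex
\mathrm{AIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + 2d,
\qquad
\mathrm{BIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + d \log n .
```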
Is the intercept the same as the bias term?
+1
There needed to be a graphical representation with explanation... It should have been presented in a more digestible way... not very easy to follow. Very average job!
The instructor does not pronounce some words clearly enough, pauses for a moment, then speaks too fast. If I slow down the video, the pauses get longer and it becomes boring. I need to watch this for a course, but it is difficult to follow for a non-native English speaker.
How much did you pay for the course?