This was key to wrapping the section on validation and scoring for chapter 4.
Predicting enrollment at target university for a cohort of admitted students.
Thank you!
Hi there, may I know why there's a '.' shown at selected probability?
Dr. Baker, thanks. Good discussion. I tried several data sets and optimized them with Modeler and SPSS Binary. I find the logistic approach is sound and safe, so to speak. I have expanded the model to machine learning, where the real task is to find a good training data set with over 200 cases, ballpark. This is from 2013. You might consider suggesting some books or updating your lecture again with datasets from UCLA. Dr. Dey
Just a clarification from what I am finding in both SPSS and Watson: when you are in the scoring wizard, the XML (or whatever) model file must be referenced by its entire path, like C:\ML\Mymodel\Immigrant.xml.
Thanks. Why do some values not compute if all of the data is available to the scoring wizard?
That happens when there are missing values on one or more of the predictors for the cases we are trying to predict (it can't provide an estimate without complete info on the predictors).
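To make the point about missing predictors concrete, here is a minimal sketch in Python (not SPSS syntax). The intercept, coefficients, and predictor names are hypothetical, made up only for illustration. It shows why a case with any missing predictor gets no estimate, which SPSS displays as a '.' (system-missing).

```python
import math

# Hypothetical model for illustration only; not taken from any real analysis.
INTERCEPT = -1.5
COEFS = {"gpa": 0.8, "visited_campus": 1.2}

def score(case):
    """Return the predicted probability, or None if any predictor is missing."""
    logit = INTERCEPT
    for name, coef in COEFS.items():
        value = case.get(name)
        if value is None:
            # The linear predictor cannot be formed without this value.
            return None
        logit += coef * value
    # Logistic function turns the linear predictor into a probability.
    return 1.0 / (1.0 + math.exp(-logit))

complete = score({"gpa": 3.0, "visited_campus": 1})
incomplete = score({"gpa": 3.0, "visited_campus": None})
```

Here `complete` is a probability between 0 and 1, while `incomplete` is `None` because one predictor has no value.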
I was wanting to do something similar to this. I have customers in a data file, some that have changed their status from non-committed to committed and some that haven't. Very little attempt to convert these customers to committed has occurred in the past; it has just evolved that way. I want to work out which parameters have influenced their change in status. To do that, I need to compare those that have changed status against those that haven't. But what I want to do is predict whether a future direct attempt to change their status would work on the original set of non-committed customers, using the parameters that provide likelihood for prediction. Is it possible?
This method did not work for my dataset with multiple categorical variables. Instead, I created a new variable, randomly assigned 1 to 70% of the cases and 0 to the other 30% (you can use Data > Select Cases for this), and used the 'Selection Variable' option in the Logistic Regression dialog box. This computed the model from the 70% of cases and gave me the predictions for the 30% test data in the form of a confusion matrix.
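For anyone wondering what the confusion matrix on the 30% amounts to, here is a rough Python sketch (not SPSS output). The outcomes, probabilities, flags, and the 0.5 cutoff are all made-up assumptions for illustration; only cases flagged 0 (the held-out ones) are counted.

```python
def holdout_confusion(actual, probs, flags, cutoff=0.5):
    """Count TP/FP/TN/FN on held-out cases (flag == 0) only."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p, flag in zip(actual, probs, flags):
        if flag != 0:
            continue  # flag 1 = used to fit the model, so skip it here
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            counts["TP"] += 1
        elif predicted == 1 and y == 0:
            counts["FP"] += 1
        elif predicted == 0 and y == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

# Made-up example: six cases, the first two were in the training 70%.
actual = [1, 0, 1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7, 0.8, 0.1]
flags = [1, 1, 0, 0, 0, 0]
matrix = holdout_confusion(actual, probs, flags)
```

Counting only the flag-0 cases is what keeps this an honest check, since the model never saw them when it was fitted.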
How do I split the data 70/30? Please help.
There are lots of ways to randomly split a file 70/30 in SPSS. As one example, the SELECT CASES command (after adjusting a few settings) can split the file that way.
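If it helps to see the idea outside SPSS, a random 70/30 split is just a shuffled 0/1 flag per case. This is a minimal Python sketch under assumed settings (the function name, the seed, and the 200-case example are hypothetical), not the SELECT CASES procedure itself.

```python
import random

def assign_split(n_cases, train_fraction=0.7, seed=42):
    """Return a list of flags: 1 = training case, 0 = holdout case."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_train = round(n_cases * train_fraction)
    flags = [1] * n_train + [0] * (n_cases - n_train)
    rng.shuffle(flags)  # random order, exact 70/30 proportion
    return flags

flags = assign_split(200)
```

Building the flags first and then shuffling gives an exact 70/30 proportion, whereas drawing each case independently with probability 0.7 would only give roughly 70%.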