Absolutely great video. Thanks a lot, mate.
Extremely happy I could help!
Great video, you deserve more viewers :)
Thank you, very happy you could use it! The mighty TH-cam algorithm might do wonders one day ;)
Great video on feature scaling! Many thanks. Could you please clarify why you have advised that feature scaling should only be applied to the training set and not the entire dataset? Thanks.
Very happy you enjoyed it! The general principle: anything you learn must be learned from the model's training data. When you use a StandardScaler you "learn" the variance and the mean. Giving a full example with numbers would be a bit lengthy for a TH-cam comment; I would recommend the third answer here for more detail: datascience.stackexchange.com/questions/39932/feature-scaling-both-training-and-test-data
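In code the principle looks roughly like this (a minimal sketch, assuming scikit-learn's StandardScaler; the numbers are made up):

# Minimal sketch, assuming scikit-learn; numbers are made up.
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[10.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean and variance are learned from the training data only
X_test_scaled = scaler.transform(X_test)        # test data is scaled with the training statistics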
8:50 you said to only feature scale on the training data set and not the entire data set. Isn't the model generated based on this training data? Would the output prediction make sense if the validation and test data sets aren't scaled likewise?
Hey Yalslaus, thanks for the questions. You got it right: you have to scale the training, test, and validation data.
What I meant in the video is that you should "train your scaler" only on your training data (call the .fit_transform() method only on the training data and use .transform() on the validation and test sets). Reasoning: let's assume you have a MaxAbsScaler and train on X={age: 99}, then test one year later on Y={age: 100}. If you fit the scaler on all the data, you implicitly tell the model what the maximum value in the test set is, i.e. 100. Especially for things such as stock prices or demand forecasting, this can lead you to overestimate your performance.
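Rough sketch of what I mean (assuming scikit-learn; the "age" column and the numbers are just a made-up example):

# Minimal sketch, assuming scikit-learn; "age" is a made-up example column.
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X_train = np.array([[33.0], [67.0], [99.0]])  # maximum age seen during training is 99
X_test = np.array([[100.0]])                  # one year later someone is 100

scaler = MaxAbsScaler()
X_train_scaled = scaler.fit_transform(X_train)  # the scaler learns max=99 from the training data only
X_test_scaled = scaler.transform(X_test)        # 100/99 > 1.0, which is honest about unseen data

# Fitting on train + test instead would leak the test maximum (100) into the scaler
# and make the evaluation look better than production will be.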
@@datawithsandro2919 Thanks for the reply and clarification.