Thank you so much for this Aman. Please cover the wrapper and embedded categories too. I am currently a data scientist and these techniques are very helpful for my job. Thanks
Thank you so much for this, and yes sir, please cover the other two methods in detail.
Thanks a lot, Aman, for this detailed and simple explanation and for putting in so much effort for us.
Can you cover the following topics sometime in the future? They would be very helpful for people learning data science in fields like chemical engineering, pollution control in manufacturing, the power sector, etc.
1. How to handle test and validation sets that have different distributions in time series regression data.
2. How to approach time series data from batch-wise processes, e.g. batch-wise manufacturing of alcohol or any other chemical product, where we have to build a model for preventive maintenance or for increasing the run length and efficiency of upcoming batches.
Thanks Sudhanshu, sure.
Thanks for the video, please cover the other two methods as well.
Today's topic was very good, a clean and clear explanation. 👏👏 Please do the next videos also, Aman sir 👍 Waiting for the next series of feature selection videos 😎😎😎😎😎
Thank you.
Thank you so much, I always learn new things from your courses.
Thanks Aman, can you please create a separate playlist for this topic? It would be a great thing 🙂
Good suggestion, let me create one. There are some older videos that can be added to it as well.
Thanks Aman. Please cover the other two categories as well!
Thanks Vijay. Sure
Thanks Aman. Please cover the wrapper and embedded categories too.
Sir, the tutorial is really good, but I would like to know which feature selection method works best with the random forest algorithm.
Hi Aman, thanks again for this video.
I have a question.
Who suggests the threshold value for correlation? Is it the business expert?
For n categorical features, who decides how many top features to keep?
Overall, who decides the threshold value for each feature selection method? Can we choose it on our own, or is this decision taken by the business expert?
Hi Ajay, domain knowledge plus experience come into the picture, whoever makes the suggestion. Suppose there is medical data with 200 distinct features: if I want to bucket them, domain knowledge comes in handy. Suppose I have 500 columns and 1 million rows and run a correlation analysis: experience will help me judge a suitable threshold, though research suggests 0.85/0.90.
Sometimes we can take a lower value like 0.75, depending on what the features look like. Let's say I have different info in most features, for example
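For anyone who wants to try the correlation threshold idea from this reply, here is a minimal sketch. The data is synthetic, and `drop_correlated` is a hypothetical helper name, not something from the video. It flags the later column of every pair whose absolute Pearson correlation exceeds the threshold.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.85):
    """Return the columns to drop so that no remaining pair has |r| > threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is checked exactly once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is flagged if it is highly correlated with any earlier column.
    return [col for col in upper.columns if (upper[col] > threshold).any()]


# Synthetic data (an assumption for illustration): "b" is a noisy copy of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.1, size=200)
c = rng.normal(size=200)
df = pd.DataFrame({"a": a, "b": b, "c": c})

to_drop = drop_correlated(df, threshold=0.85)
reduced = df.drop(columns=to_drop)
```

Rerunning with 0.75 or 0.90 shows how many columns each threshold would cost, which is where the domain judgment comes in.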
great teaching sir g
Hello Aman, I have a doubt about the threshold used in VarianceThreshold: what does this threshold signify? And how do we decide which threshold value to choose, and when?
Thank you sir for your hard work and for this video.🙂
Welcome Shubham.
Thanks a lot Aman bhai
Welcome Mayank.
Hello sir, if there are so many techniques for feature selection, how do we know which technique to use when? The chi-square and ANOVA tests look similar; which should we prefer when? I have used Pearson's correlation to overcome multicollinearity at times... How do we perform feature selection when there are around 150 features?
Nice tutorial. Thanks for your hard work and efforts you put.
I have one question though. Which category does PCA belong to?
Thank you for your hard work!!!!
Hi Aman, we use the chi2 test on categorical variables. But here petal length and width are numerical variables. Can you please explain this?
Thank you so much, how will I be able to know which feature is more important than the other if both have high correlation?
We should choose the one with the higher correlation with the target variable.
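A small sketch of that rule, on synthetic data (the column names `x1`, `x2`, and `target` are made up for illustration): of two mutually correlated features, keep the one whose absolute correlation with the target is higher.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption): x2 is a noisier version of x1,
# so the two are correlated but x1 tracks the target more closely.
rng = np.random.default_rng(1)
target = rng.normal(size=300)
x1 = target + rng.normal(scale=0.5, size=300)
x2 = x1 + rng.normal(scale=0.5, size=300)
df = pd.DataFrame({"x1": x1, "x2": x2, "target": target})

# Absolute Pearson correlation of each candidate with the target.
target_corr = df[["x1", "x2"]].corrwith(df["target"]).abs()

# Keep the candidate that is more strongly correlated with the target.
keep = target_corr.idxmax()
```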
Aman, I have a question: can we directly call fit_transform on X with VarianceThreshold to get the reduced set of columns? Why do we have to first call .fit() and then get_support() if we can get the result directly from fit_transform()?
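A short sketch that may help with this question, using a toy DataFrame (an assumption, not the data from the video). With scikit-learn's default output settings, `fit_transform` returns a plain NumPy array, so the column names are lost. `get_support()` gives a boolean mask over the original columns, which is one way to recover which features survived.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy data (an assumption): one constant column and two that vary.
df = pd.DataFrame({
    "constant": [1, 1, 1, 1],
    "varies": [1, 2, 3, 4],
    "also_varies": [0, 1, 0, 1],
})

selector = VarianceThreshold(threshold=0.0)

# One step: fits and returns the reduced data, but as an unlabeled array.
reduced = selector.fit_transform(df)

# The mask maps the result back to the original column names.
mask = selector.get_support()
kept_columns = df.columns[mask].tolist()
reduced_df = pd.DataFrame(reduced, columns=kept_columns, index=df.index)
```

So both routes select the same columns; the mask is mainly useful when the names of the kept features are needed.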
Do we need separate feature selection for categorical variables
when the target variable is categorical?
Hello, thanks for the explanation. I have one question: does using the best features help reduce the amount of training data needed? Say I do not have a large dataset, but I can create an independent variable that is highly correlated with the dependent variable. Will that help me reduce my training dataset? Your response will be highly valuable.
Yes, but the model may not be suitable for practical purposes.
Can embedded Lasso and Ridge regularization be used in multiclass classification, like on the iris data? Please reply.
Hi Aman,
Great tutorial as usual but I have one question.
If chi-square is used for categorical variables, then why are you using the chi-square test for a continuous variable?
That variable may be categorical. Which part of the video?
@UnfoldDataScience In the Iris dataset, where you are using SelectKBest with chi-square (@13:50). The Iris dataset has continuous variables, if I am not wrong.
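A sketch that may be useful for this thread. scikit-learn's `chi2` scorer accepts any non-negative feature values, which is why it runs on the Iris measurements without an error; whether it is the right test for continuous features is a separate question. For continuous features with a categorical target, `f_classif` (the ANOVA F-test) is a commonly used alternative. The comparison below is only an illustration, not the code from the video.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif

# Iris: four continuous, non-negative measurements and a 3-class target.
X, y = load_iris(return_X_y=True, as_frame=True)

# chi2 runs because all feature values are non-negative.
chi2_selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
chi2_columns = X.columns[chi2_selector.get_support()].tolist()

# ANOVA F-test, intended for continuous features with a categorical target.
anova_selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
anova_columns = X.columns[anova_selector.get_support()].tolist()
```

On this dataset both scorers pick the two petal measurements, so the choice of test does not change the selected features here.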
Thanks. This was very useful. Please cover wrapper and embedded too if possible
thank you very much
Yes please create RFE and Wrapper video
Very good Aman
Thank you
Sir your videos are very very helpful
Thanks Mayank.
How do we identify the target variable and do feature selection when there are multiple numerical and categorical variables?
How do we select the optimal threshold value?
There can't be a one-size-fits-all value. It depends on how many features you are losing at a given threshold, how many you think you need based on domain understanding, and so on; multiple things come into the picture. So start from zero, move up a little at a time, and see how it turns out.
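The "start from zero and move up" advice can be sketched as a small sweep. The data here is synthetic and the threshold values are arbitrary assumptions; `features_kept` is a hypothetical helper that counts how many features survive each variance threshold.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Synthetic data (an assumption): variances of roughly 0, 0.01, 1 and 9.
rng = np.random.default_rng(0)
X = np.column_stack([
    np.ones(100),
    rng.normal(scale=0.1, size=100),
    rng.normal(scale=1.0, size=100),
    rng.normal(scale=3.0, size=100),
])


def features_kept(X, thresholds):
    """Map each threshold to the number of features whose variance exceeds it."""
    counts = {}
    for t in thresholds:
        selector = VarianceThreshold(threshold=t).fit(X)
        counts[t] = int(selector.get_support().sum())
    return counts


# Start at zero and move up a little at a time.
counts = features_kept(X, thresholds=[0.0, 0.05, 0.5, 2.0])
```

Note that `VarianceThreshold` raises an error if a threshold would remove every feature, so the sweep should stop below the largest feature variance.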
yes please cover the wrapper and embedded
Thanks Ankit. Sure.
Kindly share the Google Drive link for this code.
github.com/UnfoldDataScience
Sir, we want feature selection methods for machine learning using Python, and we want them fast.
filter category
Sure