Feature Selection techniques in Python | feature selection machine learning | machine learning tips

Unfold Data Science

Views: 31,219

Published on 20 Jan 2025

Comments • 54

@boejiden7093 2 years ago ⁺⁶
Thank you so much for this Aman. Please cover the wrapper and embedded categories too. I am currently a data scientist and these techniques are very helpful for my job. Thanks
@subhajitroy4869 2 years ago ⁺²
Thank you so much for this and yes sir, please cover the other two methods in detail.
@sudhanshusoni1524 2 years ago ⁺³
Thanks a lot Aman for this detailed and simple explanation and for putting in so much effort for us.
Could you cover the following topics sometime in the future? They would be very helpful for people learning data science in fields like chemical engineering, pollution control in manufacturing, the power sector, etc.
1. How to deal with test and validation sets that have different distributions in time series regression data.
2. How to approach batch-wise time series processes, e.g. batch-wise manufacturing of alcohol or any other chemical product, where we have to build a model for either preventive maintenance or for increasing the run length and efficiency of upcoming batches.
@UnfoldDataScience 2 years ago ⁺²
Thanks Sudhanshu, sure.
@surajjadhav2382 2 years ago ⁺²
Thanks for the video, please cover the other two methods as well.
@supreethalahari.sreeram5147 2 years ago ⁺²
Today's topic is too good, with a clean and clear explanation. 👏👏 Please do the next videos also, Aman sir 👍 Waiting for the next series of feature selection videos 😎😎😎😎😎
@UnfoldDataScience 2 years ago
Thank you.
@dollysiharath4205 1 year ago
Thank you so much, I always learn new things from all your courses.
@hemanthvokkaliga 2 years ago ⁺³
Thanks Aman, can you please create a separate playlist for this topic? It would be a great thing 🙂
@UnfoldDataScience 2 years ago ⁺²
Good suggestion, let me create one. There are some old videos that can also be added.
@vijayragavansk 2 years ago ⁺¹
Thanks Aman. Please cover the other two categories as well!
@UnfoldDataScience 2 years ago
Thanks Vijay. Sure
@rahulsingh-qs7lm 10 months ago
Thanks Aman. Please cover the wrapper and embedded categories too.
@aiswaryasprasad7319 2 years ago
Sir, the tutorial is such a good one... but I would like to know which method is better for feature selection with the random forest algorithm.
@ajaykushwaha-je6mw 2 years ago ⁺¹
Hi Aman, thanks again for this video.
I have a question.
Who suggests the threshold value for correlation? Is it the business expert?
For n categorical features, who decides the number of features to keep?
Overall, who decides the threshold value for each feature selection method? Can we choose it on our own, or is this decision taken by a business expert?
@UnfoldDataScience 2 years ago
Hi Ajay, domain knowledge plus experience comes into the picture, whoever makes the suggestion. Suppose there is medical data with 200 distinct features; if I want to create buckets, domain knowledge comes in handy. Suppose I have 500 columns and 1 million rows and run a correlation analysis; experience will help me judge a suitable threshold, though research suggests 0.85/0.90.
Sometimes we can take a lower value like 0.75, depending on what the features look like. Let's say I have different info in most features for example
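The correlation threshold discussed in this reply can be sketched as below. This is a minimal illustration and not the code from the video: the helper name `drop_correlated`, the synthetic columns `a`, `b`, `c`, and the 0.85 default are assumptions chosen for the example.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.85):
    """Keep a feature only if its absolute Pearson correlation with every
    feature kept so far is at or below the threshold (hypothetical helper)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        # Feature j survives only if it is not a near-duplicate of a kept one.
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.05, size=200)  # near-duplicate of a
c = rng.normal(size=200)                  # unrelated to a and b
X = np.column_stack([a, b, c])

kept = drop_correlated(X, ["a", "b", "c"], threshold=0.85)
print(kept)
```

Lowering `threshold` to something like 0.75 makes the filter stricter, so more features are dropped. Which of two correlated features survives here depends only on column order, which is one reason domain knowledge still matters.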
@umasharma6119 2 years ago
great teaching sir g
@Shubham14365 1 year ago
Hello Aman, I have a doubt regarding the threshold used in VarianceThreshold. What does this threshold signify? And how do we decide which threshold value to choose, and when?
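On what the threshold signifies: in scikit-learn, `VarianceThreshold` keeps a feature only if its variance over the training rows is greater than the threshold, so the default of 0 removes only constant columns. A small sketch with made-up numbers (the array `X` and the value 0.1 are assumptions for illustration):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Three toy features: constant, nearly constant, and widely spread.
X = np.array([
    [0.0, 1.0, 10.0],
    [0.0, 1.1, 20.0],
    [0.0, 0.9, 30.0],
    [0.0, 1.0, 40.0],
])

selector = VarianceThreshold(threshold=0.1)
selector.fit(X)

variances = selector.variances_  # per-column variance seen during fit
mask = selector.get_support()    # True where variance > threshold

print(variances)
print(mask)
```

Variance depends on the units of each column, so a single threshold is only comparable across features that are on a similar scale.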
@rawatshubham09 2 years ago ⁺¹
Thank you sir for your hard work and for this video.🙂
@UnfoldDataScience 2 years ago
Welcome Shubham.
@mayankbhatt1308 2 years ago ⁺¹
Thanks a lot Aman bhai
@UnfoldDataScience 2 years ago
Welcome Mayank.
@tharunnl7810 10 months ago
Hello sir, if there are so many techniques for feature selection, how do we know which technique to use when? The chi-square and ANOVA tests look similar; which should we prefer when? I have used Pearson's correlation to overcome multicollinearity at times... How do we perform feature selection when there are around 150 features?
@leamon9024 2 years ago ⁺¹
Nice tutorial. Thanks for your hard work and the effort you put in.
I have one question though. Which category does PCA belong to?
@andresilva9140 2 years ago
Thank you for your hard work!!!!
@souravbiswas6892 2 years ago
Hi Aman, we use the chi2 test on categorical variables. But here petal length and width are numerical variables. Can you please explain this?
@CodeJoeTheDuke 24 days ago
Thank you so much, how will I be able to know which feature is more important than the other if both have high correlation?
@UnfoldDataScience 23 days ago
We should choose the one with the higher correlation with the target variable.
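A minimal sketch of that rule, assuming Pearson correlation as the measure. The helper `pick_from_pair` and the synthetic data are hypothetical and only there to show the comparison:

```python
import numpy as np


def pick_from_pair(x1, x2, y):
    """Return 0 or 1: the index of the feature whose absolute Pearson
    correlation with the target y is larger (hypothetical helper)."""
    r1 = abs(np.corrcoef(x1, y)[0, 1])
    r2 = abs(np.corrcoef(x2, y)[0, 1])
    return 0 if r1 >= r2 else 1


rng = np.random.default_rng(42)
y = rng.normal(size=300)
x1 = y + rng.normal(scale=0.2, size=300)   # closely tied to the target
x2 = x1 + rng.normal(scale=0.5, size=300)  # correlated with x1, but noisier

chosen = pick_from_pair(x1, x2, y)
print("keep feature index:", chosen)
```

Here `x2` is just a noisier copy of `x1`, so `x1` carries more information about `y` and is the one to keep.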
@trashantrathore4995 2 years ago
Aman, I have a question: can we directly call fit_transform on X in VarianceThreshold to get the desired columns? Why do we first have to call .fit() and then get_support() if we can get the result directly from fit_transform()?
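Both routes give the same reduced array; the difference is what else you get back. In the sketch below (the toy array and the names `f0` to `f2` are made up for illustration), `fit_transform` returns only the selected columns as a NumPy array, while `fit` followed by `get_support()` also gives the boolean mask, which is handy for recovering the column names.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([
    [1.0, 0.0, 3.0],
    [2.0, 0.0, 1.0],
    [3.0, 0.0, 4.0],
    [4.0, 0.0, 1.0],
])
names = np.array(["f0", "f1", "f2"])  # hypothetical column names

# One step: only the reduced array comes back, without column names.
X_reduced = VarianceThreshold(threshold=0.0).fit_transform(X)

# Two steps: the fitted selector exposes which columns survived.
selector = VarianceThreshold(threshold=0.0).fit(X)
mask = selector.get_support()
kept_names = names[mask]

print(X_reduced.shape)
print(kept_names)
```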
@daneshk6395 2 years ago
Do we need a separate feature selection method for categorical variables when the target variable is also categorical?
@dineshjoshi4100 2 years ago
Hello, thanks for the explanation. I have one question: does using the best features help reduce the amount of training data needed? Say I do not have a large dataset, but I can create an independent variable that is highly correlated with the dependent variable. Will it help me reduce my training dataset? Your response will be highly valuable.
@UnfoldDataScience 2 years ago
Yes, but the model may not be suitable for practical purposes.
@beautyisinmind2163 2 years ago
Can embedded Lasso and Ridge regularization be used in multiclass classification, like on the iris data? Please reply.
@fahadnasir1605 2 years ago
Hi Aman,
Great tutorial as usual, but I have one question.
If chi-square is used for categorical variables, then why are you using the chi-square test for a continuous variable?
@UnfoldDataScience 2 years ago ⁺¹
That variable may be categorical only. Which part of the video?
@fahadnasir1605 2 years ago
@UnfoldDataScience In the iris dataset, where you are using SelectKBest with chi-square (@13:50). The iris dataset has continuous variables, if I am not wrong.
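For context on this thread: scikit-learn's `chi2` scorer accepts any non-negative feature values, so it runs on the iris measurements without raising an error, but the statistic is intended for counts or frequencies. For continuous features with a categorical target, the ANOVA F-test (`f_classif`) is the more conventional choice. A hedged sketch comparing the two on iris (this is not the code from the video):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif

iris = load_iris()
X, y = iris.data, iris.target  # four continuous, non-negative features

# Same selector, two different scoring functions.
chi2_sel = SelectKBest(score_func=chi2, k=2).fit(X, y)
anova_sel = SelectKBest(score_func=f_classif, k=2).fit(X, y)

chi2_features = [
    name for name, keep in zip(iris.feature_names, chi2_sel.get_support()) if keep
]
anova_features = [
    name for name, keep in zip(iris.feature_names, anova_sel.get_support()) if keep
]

print("chi2 picks:     ", chi2_features)
print("f_classif picks:", anova_features)
```

On iris both scorers end up choosing the two petal measurements, so the video's result is not misleading here, but that agreement is a property of this dataset and not a general guarantee.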
@PrincyAnnThomas2022 2 years ago
Thanks. This was very useful. Please cover wrapper and embedded too if possible
@aktherMHS 3 months ago
thank you very much
@miteshkumarsingh 1 year ago
Yes please create RFE and Wrapper video
@sadhnarai8757 2 years ago
Very good Aman
@UnfoldDataScience 2 years ago
Thank you
@mayankmehta8480 2 years ago
Sir your videos are very very helpful
@UnfoldDataScience 2 years ago
Thanks Mayank.
@priyankathakur1691 1 year ago
How do we find the target variable and do feature selection when there are multiple numerical and categorical variables?
@keshavsingh6208 2 years ago
How do we select the optimal threshold value?
@UnfoldDataScience 2 years ago
There can't be a one-size-fits-all value. It will depend on how many features you are losing at a given threshold and how many you think you need based on domain understanding; multiple things come into the picture. So start from zero, move up a little at a time, and see how it turns out.
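The "start from zero and move up" advice can be sketched as a small sweep. This assumes the question is about `VarianceThreshold`; the iris data and the particular threshold values are chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

X = load_iris().data

thresholds = [0.0, 0.2, 0.6, 1.0]  # illustrative values, not a recommendation
kept_counts = []
for t in thresholds:
    # Refit at each threshold and count how many features survive.
    mask = VarianceThreshold(threshold=t).fit(X).get_support()
    kept_counts.append(int(mask.sum()))
    print(f"threshold={t:.1f} -> {kept_counts[-1]} of {X.shape[1]} features kept")
```

Watching how quickly the count drops gives a feel for where the threshold starts removing features you would want to keep. Note that `VarianceThreshold` raises an error if the threshold is so high that no feature remains.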
@Ankitsharma-vo6sh 2 years ago
yes please cover the wrapper and embedded
@UnfoldDataScience 2 years ago
Thanks Ankit. Sure.
@saminaqadir3382 2 years ago
Kindly share the Google Drive link for this code.
@UnfoldDataScience 2 years ago
github.com/UnfoldDataScience
@nagamaninagamani7057 1 year ago
Sir, we want feature selection methods for machine learning using Python. We want it fast.
@onlinearchitecturalservice7993 2 years ago
filter category
@UnfoldDataScience 2 years ago
Sure
@dasgupts10 2 years ago ⁺¹
Hi Aman, we use the chi2 test on categorical variables. But here petal length and width are numerical variables. Can you please explain this?
