Kaggle's 30 Days Of ML (Competition Part-2): Feature Engineering (Categorical & Numerical Variables)

  • Published 4 Jan 2025

Comments • 47

  • @abhishekkrthakur  3 years ago +21

    Notebook is here: www.kaggle.com/abhishek/competition-part-2-feature-engineering
    like, subscribe and share to help me keep motivated to make more amazing videos like this one ;)

  • @linnhtet001  3 years ago +4

    Thank you for doing this. Your videos guide me and help me learn many new things. I was self-studying well before the 30 days challenge with Kaggle Learn but didn't know where to go or where to start, even after finishing the Kaggle micro-courses. I really appreciate all the great work you're doing for this community. Thank you, Sir.

  • @yogitad4136  3 years ago +3

    Thank you, Abhishek. This helps a lot. Highly appreciate the time and effort that you put into the creation of these videos.

  • @longqua69  3 years ago +1

    Thank you so much, this is really helpful. There aren't many practical machine learning tutorials like yours. I only wish you had started recording videos like this one sooner.

  • @kamalchapagain8965  3 years ago +1

    Great job Abhishek sir. Really fruitful.

  • @lucaspimentel1375  2 years ago

    Going through your book and these videos at the same time is next-level learning.

  • @malawad  3 years ago +1

    I needed this, I so very badly needed this. Thank you so very much, Abhishek ❤️

  • @abirhasanx  3 years ago

    I'm learning so much from these videos, thank you so much

  • @sujitmohapatra4978  3 years ago +6

    I feel the numerical features are already standardized.

  • @user-or7ji5hv8y  3 years ago

    This is like learning from the master of the craft.

  • @snitox  1 year ago

    You know how you get so attuned to DS that you can listen to these like podcasts and not even have to look at the notebook to know what's going on.

  • @seemasharma-mn7fk  3 years ago

    Your videos are very helpful and make learning new topics and concepts so much easier! Thank you!

  • @ninaddate3756  3 years ago

    First of all, thank you so much for all your videos related to the course topics, and now these ones providing additional understanding for the competition. I have one question: you said that along with feature engineering we need to do hyper-parameter tuning. Typically, do we need to tune the model differently for each technique we use, or can we apply the same tuning for all methods?

  • @purposeoriented6094  3 years ago

    Thank you so much for the lessons...

  • @Orchishman  3 years ago +2

    Before we concat the categorical cols back to the dataset after OHE, don't we need to drop those categorical cols from the DF first? Or does that not really affect the model predictions?

  • @theonlypicklericktheonlypi2963  3 years ago

    Learning from the best, as it should be done! I have a small query: are we free to create new features however we want? As long as our logic holds and it makes sense to the model, can we create new features independently without any restrictions, or should we follow some basic rules while creating them, without experimenting too much?

  • @code4u941  3 years ago

    Hi, great video. Learning a lot from this.
    One thing: interaction_only=True removes a**2 and b**2, so we are left with 1, a, b, ab.
    When we concat this with the original dataframe, doesn't it create duplicates of a and b, since a and b were already there?

  • @GAURAVSINGH-nu2cu  3 years ago

    Thanks a lot Sir, It was really helpful. 👍

  • @shashihnt  3 years ago

    Thank you, it was a really informative video. Do you think it's okay to generate features by using frequency encoding of categorical features?
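Frequency encoding as asked about here could be sketched like this (pandas assumed; the `cat0` column is hypothetical, and in a competition setting the frequencies should be computed on the training fold only and then mapped onto validation/test):

```python
import pandas as pd

df = pd.DataFrame({"cat0": ["A", "A", "B", "C", "A"]})

# Frequency encoding: map each category to its relative frequency.
# Compute this on the training fold only to avoid leakage, then
# reuse the same mapping for validation and test frames.
freq = df["cat0"].value_counts(normalize=True)
df["cat0_freq"] = df["cat0"].map(freq)
```

Categories unseen at mapping time come out as NaN, so a `fillna(0)` is a common follow-up.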

  • @abhaykshirsagar1166  3 years ago +1

    Hey, can I use feature compression for the cont columns using something like PCA?

    • @abhishekkrthakur  3 years ago

      yeah sure. feel free to use whatever works!
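A minimal sketch of the PCA compression the commenter asks about, assuming scikit-learn and synthetic stand-in data for the continuous columns:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 14))  # stand-in for the cont columns
X_test = rng.normal(size=(20, 14))

# Unlike polynomial features, PCA *is* learnt from the data:
# fit on the training fold only, then reuse it for validation/test.
pca = PCA(n_components=5)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
```

The compressed columns can be concatenated back alongside (or instead of) the originals; `pca.explained_variance_ratio_` helps pick `n_components`.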

  • @AtulSharma-gf3tt  3 years ago

    Thank you so much, sir, for this helpful content. Sir, can you please make a video on the Data Visualization day of the Kaggle competition? I am confused about the final project part of Data Visualization. Please help, sir.

  • @aykutcayir64  3 years ago +1

    Hi Abhishek,
    I think there is a logical mistake when you use the generated features coming from the groupby methods. You should have used the same groupby values obtained from the training set for the training, validation, and test sets, because the counts of A and B differ between the training and test sets.

    • @abhishekkrthakur  3 years ago +3

      yes you should! :)
      you should apply the groupby functions in the training loop on the training set and then use the same values for the validation and test sets! thanks!
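A leakage-safe version of such a groupby feature might look like this sketch (pandas assumed; column names are illustrative):

```python
import pandas as pd

train = pd.DataFrame({"cat0": ["A", "A", "B"], "cont0": [1.0, 3.0, 5.0]})
test = pd.DataFrame({"cat0": ["A", "B", "B"]})

# Compute the aggregate on the training fold only...
group_means = train.groupby("cat0")["cont0"].mean()

# ...then map those same values onto train, validation, and test.
train["cont0_mean_by_cat0"] = train["cat0"].map(group_means)
test["cont0_mean_by_cat0"] = test["cat0"].map(group_means)
```

Inside a cross-validation loop the aggregate would be recomputed per training fold, never on the fold being evaluated.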

  • @dharitsura1520  3 years ago

    @Abhishek Thakur, for calculating the RMSE I guess np.sqrt is missing? Am I missing something?
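For reference, RMSE from scikit-learn's MSE, with the `np.sqrt` the commenter mentions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

# mean_squared_error returns the MSE; take the square root for RMSE.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```

Without the square root the metric is still monotonic in the error, so model rankings are usually unchanged, but the reported number won't match the competition's RMSE leaderboard.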

  • @heyrobined  3 years ago

    Thanks

  • @alikayhanatay9080  3 years ago

    Does scaling really matter with tree-based algorithms? Logically, it shouldn't make a difference whether the data is scaled or not.
    Thank you for the videos :)
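A quick experiment supporting that intuition: tree splits are threshold comparisons, so rescaling the features rescales the thresholds and leaves predictions unchanged. This is a sketch on synthetic data, using a power-of-two factor so the scaling is exact in floating point:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Fit identical trees on raw and rescaled copies of the features.
tree_raw = DecisionTreeRegressor(random_state=0).fit(X, y)
tree_scaled = DecisionTreeRegressor(random_state=0).fit(X * 1024.0, y)

# Split thresholds rescale with the data, so predictions agree.
same = np.allclose(tree_raw.predict(X), tree_scaled.predict(X * 1024.0))
```

Scaling does matter for distance- or gradient-based models (linear models, SVMs, neural nets), which is why it still appears in general-purpose pipelines.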

  • @fmussari  3 years ago

    Great video, learning a lot, thanks!
    I think that in the case of polynomial encoding we need to drop the numerical columns before the concat. Same with one-hot encoding, we need to drop the categorical columns as you did in previous videos. Am I right? Thanks again.

    • @abhishekkrthakur  3 years ago +3

      thanks.
      for polynomial features, i use interaction only, so i don't drop the original features. you can choose what to drop and what to keep. it's totally up to you and the model. so, choose what fits and improves the model :)

    • @fmussari  3 years ago

      @@abhishekkrthakur Great! Understood, thanks a lot.

  • @pujaurfriend  3 years ago

    Thanks Abhishek :) it was wonderful teaching. I have 2 things to ask here: 1. In polynomial feature engineering we used fit_transform for the test data too; shouldn't it be only transform? 2. Also, we haven't transformed the validation data there; shouldn't that be done? If not, what is the reason? Please reply, thanks.

    • @abhishekkrthakur  3 years ago +1

      polynomial features are not "learnt", they are just arithmetic operations on columns, so you don't need to fit on train and transform test/valid. you can fit_transform everything in the case of polynomial features.

    • @AAGLeon  3 years ago

      @@abhishekkrthakur Shouldn't we concatenate our new poly-cols into the old df-s, like:
      df = df.drop(numerical_cols, axis=1)
      df_test = df_test.drop(numerical_cols, axis=1)
      df = pd.concat([df, df_poly], axis=1)
      df_test = pd.concat([df_test, df_test_poly], axis=1)

    • @abhishekkrthakur  3 years ago

      @@AAGLeon yes. did i miss it? 😱

    • @abhishekkrthakur  3 years ago

      @@AAGLeon it's at 22:17 :)

    • @pujaurfriend  3 years ago

      @@abhishekkrthakur Thank you very much for explaining

  • @md.al-imranabir2011  3 years ago

    `test_poly = poly.fit_transform(df_test[numerical_cols])` - won't the use of `fit` method here cause data leakage?

    • @abhishekkrthakur  3 years ago +1

      nope. polynomial features are simple arithmetic operations and are not "learnt"

    • @md.al-imranabir2011  3 years ago

      @@abhishekkrthakur Thanks.

  • @rajeshyalla9512  3 years ago

    Hello sir, I am new to Kaggle.
    When I tried your code I got this: "Your Notebook tried to allocate more memory than available. It has been restarted."

  • @shreyasat27  3 years ago

    I am getting an error: name 'gpu_predictor' is not defined

  • @RaushanKumar-qb3de  3 years ago

    👏🙌🤝

  • @GaelGendre  3 years ago

    It was worse with the normalizer.