Very nicely explained, in a very smooth manner. Thank you so much, sir.
Thanks Amol! Sir mat bolo 😅
This tutorial is awesome!!
well explained!!!
Please keep this work up!
I hope your channel grows rapidly.
Great video bro...
This is amazing..please keep making videos..don't stop !
Haha! Thanks!
I have seen a lot of YouTube channels which are very good and have a lot of content, but bro, your channel conquers them all. Please do more videos on other fields of Machine Learning and Deep Learning. Thanks, and my respect to you, bro.
Thanks man! Appreciate it :)
Just what I needed!
very good video
This video helped me so much. Keep up the awesome work!
Thanks! I'm glad it helped!
Nice course man, well done. Everything is well explained; thanks for such good content.
I got an immense and deep understanding of how I can make life easier with sklearn's ColumnTransformer. Thank you so much for the video.
Could you kindly comment on how to get back the column names of the original dataframe once encoding is done?
Hi, Dipanwita, I'm glad it helped!
Getting back the column names is a little tricky, but possible nonetheless. Each transformer's column info can be extracted from the "transformers_" attribute of our 'ct' ColumnTransformer object (in the video).
In 'ct', RobustScaler is the first transformer, hence the first 6 columns in the output belong to it. To extract those, we'd do something like:
a = ct.transformers_[0][2]
Next in 'ct' is the OneHotEncoder, so to get its columns, we'd do:
b = ct.transformers_[1][1].get_feature_names() # get_feature_names_out() in newer scikit-learn versions
We're dropping the remaining columns, hence "a + list(b)" should give us the full list of columns in the correct output order.
In our case, remainder was "drop", but if it were "passthrough", those columns would sit at the very end of the output df.
To get them, we'd do:
c = ct.transformers_[2][2]
This 'c' contains the index positions (in the original dataframe) of the columns which were passed through. In our case, it is index 3, hence df.columns[3] is the passthrough column, and we can append it to the "a + list(b)" list (only if remainder was "passthrough").
Hope it helps!
how lovely :-) thank you so very much
Great content!
Thanks Nikita, I'm glad it helped! :)
This is such a great video. I am just sad you did not end it by fitting and training a model after transforming, as that is where I have problems. Is there another video of yours where you did that? I would really appreciate it. Thank you.
Thanks! I do have a couple of end to end project videos where I've fitted models after transforming. Hope they help!
Thank you Rachit for sharing such great content. I am new to machine learning; can you do a video going from applying ColumnTransformer on categorical values all the way to using them for linear regression and other algorithms/models?
Hi, i have done a similar video here: th-cam.com/video/wXQRLpDF-ms/w-d-xo.html
hope it helps!
@@rachittoshniwal Will check out, thank you very much!
great tutorial!
Thanks Martin! Appreciate that!
@@rachittoshniwal You are welcome. Do you have something on "feature importance"? If not a tutorial maybe some web page that you could recommend? I'd appreciate that very much.
@@martinbielke8301 I'll definitely make one on feature importances. But for now, you can have a look at these excellent links:
mljar.com/blog/feature-importance-in-random-forest/
machinelearningmastery.com/calculate-feature-importance-with-python/
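Until that tutorial exists, here's a minimal sketch of the most common approach those links cover: tree-based impurity importances from a random forest. This uses the bundled iris dataset purely as a stand-in example:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Fit a forest, then read the impurity-based importances it computed
X, y = load_iris(return_X_y=True, as_frame=True)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# Importances are normalized to sum to 1; higher = more useful to the trees
for name, imp in sorted(zip(X.columns, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```

Note that impurity-based importances can be biased toward high-cardinality features; the linked articles also discuss permutation importance as a more robust alternative.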
Can we perform this feature engineering before the train-test split, or is it mandatory to do it after the split?
Sir, within ColumnTransformer can we use ordinal encoding, label encoding, and one-hot encoding? Can you please explain? Thank you.
Can't thank you enough for the knowledge imparted. Kudos!!! A question: I'm looking at a variable which needs imputation before one-hot encoding. Can I perform both steps in a single ColumnTransformer, or should there be multiple ColumnTransformers, which would later be combined using the Pipeline functionality? Please help.
Thanks man! Appreciate that!
I cover exactly this in this video:
th-cam.com/video/a6o9ies85eM/w-d-xo.html
Have a look , and hope it helps!
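For reference, the usual pattern (which the linked video covers) is to chain the two steps in a Pipeline and hand that Pipeline to the ColumnTransformer as a single transformer. A minimal sketch with a made-up dataframe:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data with a missing categorical value
df = pd.DataFrame({"color": ["red", np.nan, "blue"], "size": [1.0, 2.0, 3.0]})

# Impute first, then one-hot encode, as one unit applied to the 'color' column
cat_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("ohe", OneHotEncoder(handle_unknown="ignore")),
])

ct = ColumnTransformer([("cat", cat_pipe, ["color"])], remainder="passthrough")
out = ct.fit_transform(df)
print(out)
```

So: one ColumnTransformer is enough, with a Pipeline nested inside it per column group.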
A question: why are we not using the CT for hours_per_week?
I just wanted to demonstrate how we can exclude some columns from the transformations and pass them unfiltered. No other reason really.
Any way to reach you?
Sure, here! www.linkedin.com/in/rachit-toshniwal
How can I get the names of the columns back? :(
Please help!
Getting back the column names is a little tricky, but possible nonetheless. Each transformer's column info can be extracted from the "transformers_" attribute of our 'ct' ColumnTransformer object (in the video).
In 'ct', RobustScaler is the first transformer, hence the first 6 columns in the output belong to it. To extract those, we'd do something like:
a = ct.transformers_[0][2]
Next in 'ct' is the OneHotEncoder, so to get its columns, we'd do:
b = ct.transformers_[1][1].get_feature_names() # get_feature_names_out() in newer scikit-learn versions
We're dropping the remaining columns, hence "a + list(b)" should give us the full list of columns in the correct output order.
In our case, remainder was "drop", but if it were "passthrough", those columns would sit at the very end of the output df.
To get them, we'd do:
c = ct.transformers_[2][2]
This 'c' contains the index positions (in the original dataframe) of the columns which were passed through. In our case, it is index 3, hence df.columns[3] is the passthrough column, and we can append it to the "a + list(b)" list (only if remainder was "passthrough").
Hope it helps!
o = OneHotEncoder(drop='first') # this will drop one category from each feature
Yes it will
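To see the effect, here's a tiny sketch with made-up data: three categories go in, and because the first (alphabetically sorted) category is dropped, only two indicator columns come out.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([["red"], ["green"], ["blue"]])

# drop='first' removes the first category of each feature ("blue" here,
# since categories are sorted alphabetically) to avoid collinearity
ohe = OneHotEncoder(drop="first")
encoded = ohe.fit_transform(X).toarray()
print(encoded)   # 3 rows x 2 columns; the all-zeros row encodes "blue"
```

Dropping one category is mainly useful for linear models, where the full set of dummy columns would be perfectly collinear.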