To learn more about Pipeline, watch this video: th-cam.com/video/1Y6O9nCo0-I/w-d-xo.html
To learn more about ColumnTransformer, watch this video: th-cam.com/video/NGq8wnH5VSo/w-d-xo.html
Thanks for watching! 🙌
One more important difference: by default, make_column_transformer results in a NumPy array, and you need to manually convert the array back to a pandas DataFrame if the next transformation works only on a pandas DataFrame. However, if you use ColumnTransformer, you can set the output type with the "set_output" method, which is cool:
-----------------------
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# unordered_features is assumed to be a list of column names defined earlier
# create a OneHotEncoder that returns a dense array
unord_encoder = OneHotEncoder(sparse_output=False)
# create a column transformer
transformer_unordered = ColumnTransformer(
    transformers=[('unord', unord_encoder, unordered_features)],
    remainder='passthrough',
    verbose_feature_names_out=False)
# output should be a pandas dataframe
transformer_unordered.set_output(transform='pandas')
Right! That's available in newer versions of scikit-learn.
Amazing tip for reducing the coding effort, thanks!
Any suggestions on using XGBRegressor() for a regression problem with only categorical features (multi-category as well as binary) and no numerical features? There are around 400 features. Is there a better recommended model? I was feeding the model after a PCA transformation and getting low performance. Thanks in advance...
It's hard to say, I'm sorry!
I love the way you break things down so they are simple and easy to understand. Can you please start making videos on linear regression? That would be a great help. Please do reply, I'm waiting for a linear regression playlist :)
Thanks for your suggestion!
I think I've found the right teacher. Data School is the best channel for data science education. I'm subscribed. Thank you for sharing your knowledge. Cheers from Argentina
Wow, thank you so much for your kind words! I truly appreciate it!
We want more tips for scikit-learn :)
Yes! More tips should be coming soon... stay tuned!
Hello Kevin,
Is it recommended to code all the machine learning algorithms from scratch so that I can learn the math behind them, or should I just understand them and start coding?
Great question! It depends on your ultimate goals, but generally I recommend the latter. Hope that helps!
I appreciate the videos! But how should I proceed with your example if I want to see the output?
Next, you would fit the pipeline and then use the fitted pipeline to make predictions. See this video for more: th-cam.com/video/1Y6O9nCo0-I/w-d-xo.html
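As a rough illustration of "fit the pipeline, then predict", here is a minimal sketch. The tiny training set, the column names ("Sex", "Fare"), and the choice of LogisticRegression are all assumptions for the sake of the example and are not taken from the video.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data and new data; values are made up for illustration.
X_train = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})
y_train = pd.Series([0, 1, 1, 0])
X_new = pd.DataFrame({"Sex": ["female", "male"], "Fare": [10.0, 9.0]})

# Preprocessing: one-hot encode "Sex", pass "Fare" through unchanged.
preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["Sex"]),
    remainder="passthrough",
)

# Chain preprocessing and model into a single pipeline.
pipe = make_pipeline(preprocessor, LogisticRegression())

# Step 1: fit the whole pipeline on the training data.
pipe.fit(X_train, y_train)

# Step 2: the fitted pipeline applies the same preprocessing to new data
# before predicting, so X_new can be passed in as a raw DataFrame.
predictions = pipe.predict(X_new)
print(predictions)
```

Printing predictions (or calling pipe.predict_proba for class probabilities) is one way to see the output of the fitted pipeline.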
Good clarification... as this can be a little confusing in the beginning...
Glad it was helpful to you!
Hello Mr. Kevin. Great video as always!
I had a question. I constantly see the same examples with people passing StandardScaler() or PCA() into the Pipeline. Can you please explain how you would go about adding your own function? The perfect example would be the Titanic dataset. In my preprocessing pipeline I would like to add a function that creates a new column (called "Child") indicating whether that record was a child or not (i.e., whether he/she is under or over the age of 13).
Thank you!!
Great question! You can create a custom transformer from a function using FunctionTransformer, and then include that in your ColumnTransformer. That's how you can do feature engineering within a scikit-learn Pipeline. Hope that helps! I actually cover a number of examples of this in my course: gumroad.com/l/ML-course?variant=Live%20Course%20%2B%20Advanced%20Course
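A minimal sketch of that idea for the "Child" column might look like the following. The function name is_child, the toy data, and the choice to keep "Age" and "Fare" alongside the new flag are assumptions for illustration, and this is not the course's own example.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import FunctionTransformer

def is_child(age_frame):
    """Flag passengers under 13 as 1, everyone else as 0 (hypothetical helper)."""
    # Missing ages compare as False here, so they end up as 0.
    return (age_frame < 13).astype(int)

# Hypothetical Titanic-like data; values are made up for illustration.
df = pd.DataFrame({
    "Age": [4.0, 35.0, 12.0, 13.0],
    "Fare": [16.7, 8.05, 11.24, 7.23],
})

# Wrap the plain function so it behaves like a scikit-learn transformer.
child_flag = FunctionTransformer(is_child)

# First output column: the new Child flag built from "Age".
# Remaining columns: "Age" and "Fare" passed through unchanged.
ct = make_column_transformer(
    (child_flag, ["Age"]),
    ("passthrough", ["Age", "Fare"]),
)

features = ct.fit_transform(df)
print(features)
```

The resulting ColumnTransformer can then be used as the first step of a Pipeline, so the feature engineering is applied consistently when fitting and when predicting.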
Hi Kevin. Thanks for the great vids on ML and DS you've been giving us. Quality stuff, indeed. I'd like to ask you a question that's not closely related to the topic you're discussing, though. What do you think one has to do to excel as a data scientist or machine learning professional? Apart from learning from your courses, which is obvious, what other absolutely essential resources (that would not take forever to study) would you recommend? Sorry for this long and slightly off-topic question. I really like the way you teach, and it's immediately clear that you are a master of this trade.
Thanks for your kind words! As for your question, it's so hard to say because it depends on your background, experience, interests, goals, etc. Sorry I can't be of more help!
Good video
Thank you!