Thank you for the video on MLlib. I haven't watched much yet, but it looks promising.
The machine learning stuff starts at about 12:00; the beginning is a warm-up to PySpark. Chapters/timestamps would have been helpful for a 40 min video (with each chapter being a different stage in the process or function).
Great tutorial, Greg - really appreciate how you distilled such a comprehensive overview into a single video. Would you consider doing a video showing how to create a complete ML pipeline -- i.e., using output from Imputer(), StringIndexer(), OneHotEncoderEstimator(), VectorAssembler(), and VectorIndexer() -- for a dataset with multiple categorical and numerical features?
Thx Greg! It's a very good tutorial on PySpark! Comprehensive, with a lot of examples.
Glad to hear it :)
Thank you for this tutorial on PySpark!
You're very welcome 🙂
Good information Greg! Thanks for sharing.
Glad to hear it! You're very welcome
Thanks!
Greg! You're too nice hahaha
Thanks. That was pretty comprehensive.
Glad to hear it!
Fantastic tutorial.
Thank you
I just found one thing confusing:
you did the standard scaling AFTER merging the features into one column.
Shouldn't you have done it for each column before the merging?
I don't remember, sorry. But you're probably right.
@GregHogg Thank you for your reply.
I ran an experiment: to make the effect easy to observe, I used two features with a large difference in values
and 5 million rows.
It seems that even if we merge all the features before applying the scaling, it still calculates the parameters (mean and std dev) for each feature separately.
In summary, you did NOT make any mistake.
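For anyone else wondering about this, here is a minimal plain-Python sketch of the arithmetic behind that result. It is not Spark code and not the code from the video: the toy rows and the helper names `standardize` and `scale_vectors` are made up for illustration. It assumes the scaler works position by position inside the merged vector, which is what the experiment above suggests. Note also that this sketch centers the data; as far as I know, Spark's `StandardScaler` only divides by the std dev unless you set `withMean=True`.

```python
from statistics import mean, stdev

# Hypothetical toy data: two features on very different scales.
rows = [(1.0, 1000.0), (2.0, 3000.0), (3.0, 5000.0), (4.0, 7000.0)]


def standardize(values):
    """Center and scale one feature using its own mean and sample std dev."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


# Option A: scale each column on its own, then merge into feature vectors.
col_a = standardize([r[0] for r in rows])
col_b = standardize([r[1] for r in rows])
scale_then_merge = list(zip(col_a, col_b))


# Option B: merge first, then scale each position of the merged vector.
def scale_vectors(vectors):
    columns = list(zip(*vectors))  # one tuple per feature position
    scaled_columns = [standardize(list(c)) for c in columns]
    return list(zip(*scaled_columns))  # back to one tuple per row


merge_then_scale = scale_vectors(rows)

# Both orders give the same vectors, because the statistics are
# computed per feature in either case.
print(merge_then_scale == scale_then_merge)
```

So as long as the scaler keeps separate statistics for each slot of the vector, the order of merging and scaling should not matter.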
This is really helpful.
Thank U
Super glad to hear it, you're very welcome! Thanks so much for the support ❤️
Oh awesome thanks!
No prob 😊
I wish I had seen this when I took Econ 424 (ML) at UW 😂
lol well well, I think you didn't search enough