Thanks for checking out this video.
Join our Data Science Discord Here: discord.com/invite/F7dxbvHUhg
If you want to watch a full course on Machine Learning check out Datacamp: datacamp.pxf.io/XYD7Qg
Want to solve Python data interview questions: stratascratch.com/?via=ryan
I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
*Both Datacamp and Stratascratch are affiliate links.
Ignoring the bad video cropping, you are an awesome dude!
If it is relevant at all, I would recommend that when you zoom in on the screen, you move the zoom to the spot you are reading or talking about. Often in the video the zoom wasn't on the relevant part.
Hey guys I hope you enjoyed the video! If you did please subscribe to the channel!
Here is the Kaggle Notebook: www.kaggle.com/code/ryannolan1/kaggle-housing-youtube-video
I do plan on updating it + adding more notes/comments to it.
Also practically everything I covered in this project is on the channel. You can find the videos in this playlist: th-cam.com/video/SjOfbbfI2qY/w-d-xo.html&ab_channel=RyanNolanData
Up next I'm working on a Python Classes course and the start of a series on Deep Learning!
You are the best teacher. Keep it up. I once started on Kaggle but have not entered any competition, but this encourages me to consider it.
You got this!
A lot of effort was put into this video. Thanks! In future videos, make sure to keep the whole of your screen in the video.
I just finished it. Dope... Thanks so much.
Sweet
Ok, dude... I haven't even watched the video yet. I'm just here to say that on my way home from work today I was thinking about doing this EXACT project and I completely forgot about it. All of a sudden your video pops up on my feed... Yo, data science out here reading minds!
Haha awesome! Hope you enjoy it
This is fantastic and I've subscribed to your channel. I'm new to this, but people like you who spend their time creating videos like this are commendable. I hope to give back like this one day. Also, you mentioned someone on Kaggle that you got some tips from. Who was that? I'm fascinated to know who has more knowledge than someone like you who has heaps.
The video is great, you earned a new subscriber.
Maybe I wasn't focused or my understanding is limited, but can you please explain briefly why you did box plots for the categorical columns? It looked like you were only filling values with 'no' and '0'. Thank you
Thanks a lot for this project. I learnt a great deal (especially about stacked regressors and voting regressors), including how to filter out outliers and fill NA values using description.txt and a little worldly experience/knowledge.
Np make sure to join our discord!
First off, I want to say great video; this really helped me get down a good workflow for Kaggle competitions.
But also, doesn't doing train_test_split after all the preprocessing is done cause a risk of data leakage?
I recently finished the Intermediate Machine Learning course on Kaggle, and one of the sections really emphasized that unless you're passing your pipeline and model into cross validation, preprocessing should always be done AFTER train_test_split.
To my understanding, by preprocessing first and then splitting, this means that our model is being trained on scaled, imputed, and encoded values and the validation data is also preprocessed.
So the model will perform well during validation, but when it is exposed to the test_data, which does not have scaled or imputed values, it will perform poorly.
Am I missing something here?
I'm only 2 hours and 15 minutes into the video so if he addresses this later, sorry!
You are absolutely right, preprocessing should be done after splitting the dataset.
I am also learning for kaggle competitions, is the kaggle course good? Can you recommend any other sources?
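To make the leakage point above concrete, here is a minimal plain-Python sketch, not the code from the video or the notebook. It assumes a single numeric column with made-up lot areas; `fit_preprocessor` and `transform` are hypothetical helper names. The imputation and scaling statistics are learned from the training rows only, and the same fitted values are then applied to the held-out rows (and would be applied to the competition test set in the same way).

```python
from statistics import mean, pstdev


def fit_preprocessor(train_column):
    """Learn imputation and scaling statistics from the training rows only."""
    observed = [v for v in train_column if v is not None]
    return {"mean": mean(observed), "std": pstdev(observed) or 1.0}


def transform(column, params):
    """Fill missing values with the training mean, then standardize."""
    filled = [params["mean"] if v is None else v for v in column]
    return [(v - params["mean"]) / params["std"] for v in filled]


# Made-up lot areas; None marks a missing value (illustrative data only).
lot_area = [8450.0, 9600.0, None, 11250.0, 9550.0, 14260.0, None, 10084.0]

# Split first, so the validation rows never influence the fitted statistics.
train_raw, valid_raw = lot_area[:6], lot_area[6:]

params = fit_preprocessor(train_raw)
train_ready = transform(train_raw, params)
valid_ready = transform(valid_raw, params)  # reuses the train statistics
```

In scikit-learn the same idea is usually expressed by putting the preprocessing steps and the model in one `Pipeline` and fitting it on the training split, or passing it to cross validation.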
Thank you for creating this video. Can you expand more on why you did not include both Lasso and ElasticNet at the 2:25:10 mark? I'm curious if it made the Stacking Regressor worse at the very end in your original notebook.
If I remember correctly it made the results worse when submitting the results. I had kept a spreadsheet with all my attempts.
Shouldn't we use IQR/boxplots to check for outliers? Does "outliers" refer to outliers of the distribution or outliers of the relationship?
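For reference, the IQR rule mentioned in this question can be sketched as follows. This is a minimal example on made-up living-area values, with `iqr_outliers` as a hypothetical helper and the usual 1.5 multiplier assumed. It flags outliers of a single column's distribution; it says nothing about outliers of the relationship between a feature and the price, which are typically spotted on a scatter plot instead.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] for one column."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up living areas (illustrative data only).
living_area = [100, 110, 120, 130, 140, 150, 160, 170, 900]
flagged = iqr_outliers(living_area)
```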
Thank you so much
No problem
Hey Nolan Do you have separate tutorials for every machine learning model you used in this tutorial?
Yes
I'm a little bit over an hour in and good video so far! I think you could have saved a lot of time doing many things programmatically so far though.
I agree with you, it’s not the cleanest code
Your videos are great. I just love this channel. Just kindly try to focus the recording on the code when you are typing. 🙂
Thank you
i knew u looked familiar and then saw the vintage cards in the back Lol, im subscribed to ur card channel too
No way haha first dual subscriber
Thanks for the video, it was awesome.
I really appreciate it
I'm trying to build something similar, but instead of prediction they have asked me to explain house prices:
a data science model that explains how different factors (GDP, unemployment, interest rate, etc.) impacted home prices over the last 20 years.
Any suggestions on what type of model I should use for this problem?
I would look at implementing Principal Component Analysis to see what has the biggest impact
With some tuning i got 0.018
Can you make more such competition videos cause i love it.
Nice job! And someday. I want to finish building out my OpenAi/Langchain playlist and then work on a dbt one first
can i see your code for learning purpose
Can you please post your kaggle notebook?
Thanks man, it helped me a lot
Glad it helped
hey thank you for this amazing vid.
Glad you liked it!
You earned a subscriber :)
Thanks
I'm typing in the code and following along, however, I'm not sure what you do to retrieve all the data? What should I press after I've entered the code? Example: train_df.dtypes[train_df.dtypes != 'object'] and then? Thanks!
Just run the code
Shift + enter
Or alt + enter
If you are coming back to the notebook after a while, you need to click Run All.
sir why don't you just use r2 score instead of MSE?
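For readers comparing the two metrics in this question, here is a small plain-Python sketch of both. The functions are hand-written stand-ins with made-up numbers, not the video's code. MSE is an error in squared target units (lower is better), while R² is the fraction of variance explained relative to always predicting the mean, so it equals 1 for a perfect fit, 0 for the mean baseline, and goes negative when the model is worse than that baseline.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot, measured against the mean-only baseline."""
    avg = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - avg) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]      # made-up targets
baseline = [2.5, 2.5, 2.5, 2.5]    # always predict the mean
reversed_pred = [4.0, 3.0, 2.0, 1.0]  # worse than the baseline

baseline_mse = mean_squared_error(actual, baseline)
baseline_r2 = r2_score(actual, baseline)
bad_r2 = r2_score(actual, reversed_pred)
```

A hugely negative R² therefore means the predictions are far worse than the mean baseline, which usually points to a problem in preprocessing or in how the targets were transformed.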
How can I do this in an environment like VS Code? I need to submit it.
Doubt: in this problem we are already given distinct train and test data. Then why do you perform an additional split using train_test_split?
So I can get a better model for my train set
Think of train, test, validation
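The train / validation / test idea in these replies can be sketched as below. This is a plain-Python illustration under stated assumptions, not the notebook's code: the rows are made up, and `split_train_validation` is a hypothetical helper standing in for scikit-learn's `train_test_split`. The competition's test file has no target column, so a slice of the labelled training file is held back to score candidate models before submitting.

```python
import random


def split_train_validation(rows, validation_fraction=0.2, seed=42):
    """Shuffle labelled rows and hold out a fraction for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Made-up labelled rows standing in for the training file.
labelled_rows = [{"Id": i, "SalePrice": 100000 + 1000 * i} for i in range(1, 11)]

train_rows, valid_rows = split_train_validation(labelled_rows)
# Fit on train_rows, compare models on valid_rows, and keep the unlabelled
# competition test file only for the final submission.
```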
@RyanAndMattDataScience Shouldn't you do the imputation after you split your data into training and test sets to avoid data leakage?
r^2 score = -4.0019e+19. I think you went wrong somewhere.
Very instructive video but the zoom was a bit off
Have all the code in the description if anything is off
I try to run train_df.columns and test_df.columns but I get a NameError saying train and test are not defined. Please, what could be wrong?
You probably haven't initialised the variables. Read them from the CSV files provided using pd.read_csv().
Timestamp for personal purpose : 47:00
You got this!
Can you do a time series project next time?
Next year for sure! I’m currently studying Deep Learning
Bro this is going to seem very dumb...why do you plot any parameter against the price in the y axis? why not use id?
waaaaay too time consuming and too much time spent on data exploration
Feel free to fast forward