Data Science Beginner Project: Kaggle House Prices Regression Analysis (Full Walkthrough)

  • Published on Oct 18, 2024

Comments • 67

  • @RyanAndMattDataScience
    @RyanAndMattDataScience  2 months ago

    Thanks for checking out this video.
    Join our Data Science Discord Here: discord.com/invite/F7dxbvHUhg
    If you want to watch a full course on Machine Learning check out Datacamp: datacamp.pxf.io/XYD7Qg
    Want to solve Python data interview questions: stratascratch.com/?via=ryan
    I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
    *Both Datacamp and Stratascratch are affiliate links.

  • @satvik4225
    @satvik4225 1 month ago +4

    Ignoring the bad video cropping, you are an awesome dude!

  • @TheErick211_
    @TheErick211_ 6 months ago +12

    If it is relevant at all, I would recommend that when you zoom in on the screen, you move the zoom to the spot you are reading or talking about. Often in the video the zoom wasn't relevant.

  • @RyanAndMattDataScience
    @RyanAndMattDataScience  11 months ago +7

    Hey guys I hope you enjoyed the video! If you did please subscribe to the channel!
    Here is the Kaggle Notebook: www.kaggle.com/code/ryannolan1/kaggle-housing-youtube-video
    I do plan on updating it + adding more notes/comments to it.
    Also practically everything I covered in this project is on the channel. You can find the videos in this playlist: th-cam.com/video/SjOfbbfI2qY/w-d-xo.html&ab_channel=RyanNolanData
    Up next I'm working on a Python Classes course and the start of a series on Deep Learning!

  • @kwizeralambert1316
    @kwizeralambert1316 11 months ago +6

    You are the best teacher. Keep it up. I once started on Kaggle but have not entered any competition, and this encourages me to consider it.

  • @ilyosjonnishanov4533
    @ilyosjonnishanov4533 2 months ago +4

    A lot of effort was put into this video. Thanks! In future videos, make sure to keep the whole of your screen in the video.

  • @elfincredible9002
    @elfincredible9002 7 months ago +3

    I just finished it. Dope... Thanks so much.

  • @mattadata
    @mattadata 11 months ago +1

    Ok, dude... I haven't even watched the video yet. I'm just here to say that on my way home from work today I was thinking about doing this EXACT project and I completely forgot about it. All of a sudden your video pops up on my feed... Yo, data science out here reading minds!

  • @mgrahamization
    @mgrahamization 5 months ago +1

    This is fantastic and I've subscribed to your channel. I'm new to this, but people like you who spend their time creating videos like this are commendable. I hope to give back like this one day. Also, you mentioned someone on Kaggle that you got some tips from. Who was that? I'm fascinated to know who has more knowledge than someone like you, who has heaps.

  • @miftahuladib-n2v
    @miftahuladib-n2v 6 days ago

    The video is great, you earned a new subscriber.
    Maybe I wasn't focused or my understanding is limited. Can you please explain briefly why you made box plots for the categorical columns? It looked like you were only filling values with 'no' and '0'. Thank you.

  • @AdityaSharma-f3x
    @AdityaSharma-f3x 18 days ago

    Thanks a lot for this project. I learned a lot, especially about stacked regressors and voting regressors, how to filter out outliers, and how to fill NA values using description.txt and a little real-world knowledge.
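
A minimal sketch of the two ensembles mentioned above, assuming scikit-learn and synthetic data rather than the Kaggle housing set. The base models and their settings are illustrative choices, not the ones used in the video. A voting regressor averages the base models' predictions, while a stacking regressor fits a final estimator on their cross-validated predictions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing features (assumption: not the Kaggle data).
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative base models; both ensembles clone them before fitting.
base_models = [
    ("ridge", Ridge(alpha=1.0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
]

# Voting: the prediction is the plain average of the base predictions.
voting = VotingRegressor(estimators=base_models)

# Stacking: a final estimator learns how to combine the base predictions,
# using out-of-fold predictions produced by 5-fold cross-validation.
stacking = StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5)

voting.fit(X_train, y_train)
stacking.fit(X_train, y_train)

voting_r2 = voting.score(X_valid, y_valid)
stacking_r2 = stacking.score(X_valid, y_valid)
print(f"voting R^2:   {voting_r2:.3f}")
print(f"stacking R^2: {stacking_r2:.3f}")
```

Which ensemble scores better depends on the data and the base models, so it is worth checking both on a held-out set.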

  • @kr0ssov3r66
    @kr0ssov3r66 1 month ago +1

    First off, I want to say great video. This really helped me get a good workflow down for Kaggle competitions.
    But doesn't doing train_test_split after all the preprocessing is done create a risk of data leakage?
    I recently finished the Intermediate Machine Learning course on Kaggle, and one of the sections really emphasized that unless you're passing your pipeline and model into cross-validation, preprocessing should always be done AFTER train_test_split.
    To my understanding, preprocessing first and then splitting means that our model is trained on scaled, imputed, and encoded values, and the validation data is also preprocessed.
    So the model will perform well during validation, but when it is exposed to the test data, which does not have scaled or imputed values, it will perform poorly.
    Am I missing something here?
    I'm only 2 hours and 15 minutes into the video, so if he addresses this later, sorry!

    • @aumthakkar3737
      @aumthakkar3737 1 month ago

      You are absolutely right, preprocessing should be done after splitting the dataset.
      I am also learning for Kaggle competitions. Is the Kaggle course good? Can you recommend any other sources?
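
One common way to keep preprocessing out of the validation data is to wrap it in a scikit-learn `Pipeline`, so imputation, scaling, and encoding are fitted on the training rows only and then reapplied unchanged to validation or test rows. The sketch below is not the notebook from the video. The frame is made up, and the column names are borrowed from the housing data only for flavor.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data standing in for train.csv (assumption, not the real file).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n),
    "LotFrontage": rng.normal(70, 20, n),
    "Neighborhood": rng.choice(["A", "B", "C"], n),
})
df.loc[rng.choice(n, 30, replace=False), "LotFrontage"] = np.nan
y = 100 * df["GrLivArea"] + rng.normal(0, 5000, n)

numeric = ["GrLivArea", "LotFrontage"]
categorical = ["Neighborhood"]

# Imputation and scaling for numeric columns, one-hot encoding for categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("preprocess", preprocess), ("ridge", Ridge(alpha=1.0))])

# Split first; fit() then learns medians, scales, and categories from
# the training rows only.
X_train, X_valid, y_train, y_valid = train_test_split(
    df, y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
valid_r2 = model.score(X_valid, y_valid)

# With cross-validation the whole pipeline is refitted inside each fold.
cv_scores = cross_val_score(model, df, y, cv=5, scoring="r2")
print(f"hold-out R^2: {valid_r2:.3f}")
print(f"5-fold R^2:   {cv_scores.mean():.3f}")
```

Because the fitted pipeline carries its own transforms, calling `model.predict` on raw test rows applies the same imputation, scaling, and encoding that were learned from the training data.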

  • @mattysmirks
    @mattysmirks 7 months ago

    Thank you for creating this video. Can you expand more on why you did not include both Lasso and ElasticNet at the 2:25:10 mark? I'm curious if it made the Stacking Regressor worse at the very end in your original notebook.

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  7 months ago +1

      If I remember correctly it made the results worse when submitting the results. I had kept a spreadsheet with all my attempts.

  • @lyk2244
    @lyk2244 1 month ago

    Shouldn't we use IQR/boxplot to check for outliers? Does "outliers" refer to outliers of the distribution or outliers of the relationship?
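
For reference, the IQR rule behind a boxplot's whiskers can be sketched as follows. It flags outliers of a single column's distribution, so a point can pass this check and still sit far from the feature-versus-price trend. The helper name `iqr_outlier_mask` and the sample values are made up for illustration.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up living areas; only the last value is far from the rest.
areas = pd.Series(
    [900, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 5600], name="GrLivArea"
)
mask = iqr_outlier_mask(areas)
print(areas[mask])
```

Here Q1 is 1200 and Q3 is 1600, so the fences are 600 and 2200 and only 5600 is flagged.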

  • @rakshitshukla4205
    @rakshitshukla4205 4 days ago

    Thank you so much

  • @s.s.sdhyuthidhar2276
    @s.s.sdhyuthidhar2276 7 months ago

    Hey Nolan, do you have separate tutorials for every machine learning model you used in this tutorial?

  • @richardweston3554
    @richardweston3554 7 months ago

    I'm a little bit over an hour in and good video so far! I think you could have saved a lot of time doing many things programmatically so far though.

  • @shivamsapru2246
    @shivamsapru2246 11 months ago

    Your videos are great. I just love this channel. Just kindly try to focus the recording on the code when you are typing. 🙂

  • @vancouverrrr
    @vancouverrrr 9 months ago

    i knew u looked familiar and then saw the vintage cards in the back Lol, im subscribed to ur card channel too

  • @japyh4
    @japyh4 11 months ago +1

    Thanks for the video, it was awesome.

  • @OrangeTomato474
    @OrangeTomato474 11 months ago +1

    I'm trying to build something similar, but instead of prediction they have asked me to explain house prices:
    a data science model that explains how different factors (GDP, unemployment, interest rate, etc.) impacted home prices over the last 20 years.
    Any suggestions on what type of model I should use for this problem?

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  11 months ago

      I would look at implementing Principal Component Analysis to see what has the biggest impact

  • @itsmephougat
    @itsmephougat 7 months ago

    With some tuning I got 0.018.
    Can you make more competition videos like this? I love them.

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  7 months ago +1

      Nice job! And someday. I want to finish building out my OpenAi/Langchain playlist and then work on a dbt one first

    • @Prathamydvv
      @Prathamydvv 5 months ago

      Can I see your code for learning purposes?

    • @JunaidAnsari-my2cx
      @JunaidAnsari-my2cx 2 months ago

      Can you please post your kaggle notebook?

  • @wahyunanandika1679
    @wahyunanandika1679 3 months ago

    Thanks man, it helped me a lot

  • @prad2003
    @prad2003 4 months ago

    hey thank you for this amazing vid.

  • @olinabin2004
    @olinabin2004 9 months ago +1

    You earned a subscriber :)

  • @taronphoenix9439
    @taronphoenix9439 1 month ago

    I'm typing in the code and following along, however, I'm not sure what you do to retrieve all the data? What should I press after I've entered the code? Example: train_df.dtypes[train_df.dtypes != 'object'] and then? Thanks!

    • @Nameless-t9n
      @Nameless-t9n 1 month ago

      Just run the code

    • @afeefanwar1421
      @afeefanwar1421 1 month ago

      Shift + enter

    • @afeefanwar1421
      @afeefanwar1421 1 month ago

      Or alt + enter

    • @afeefanwar1421
      @afeefanwar1421 1 month ago

      If you are coming back to the notebook after a while, you need to click Run All.
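
A notebook cell displays the value of its last expression once it is run, which is why the line in the question shows output after Shift + Enter. Outside a notebook, for example in a plain Python script, the same expression needs an explicit `print`. A small sketch with a made-up frame standing in for the real `train_df`:

```python
import pandas as pd

# Made-up stand-in for the real training data (assumption).
# The text column is forced to object dtype so the filter below behaves
# the same way across pandas versions.
train_df = pd.DataFrame({
    "Id": [1, 2],
    "MSZoning": pd.Series(["RL", "RM"], dtype=object),
    "LotArea": [8450, 9600],
    "SalePrice": [208500.0, 181500.0],
})

# Keep only the columns whose dtype is not 'object' (the non-text columns).
numeric_dtypes = train_df.dtypes[train_df.dtypes != "object"]

# In a script nothing is shown unless it is printed.
print(numeric_dtypes)
```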

  • @pubgdoremongamer8823
    @pubgdoremongamer8823 3 months ago

    Sir, why don't you just use r2 score instead of MSE?
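
Both metrics are one call away in scikit-learn, so they can be reported side by side. As far as the competition page describes it, the leaderboard score is the RMSE between the logarithms of predicted and actual prices, which is one reason to track an error metric rather than only R². A sketch with made-up prices:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up sale prices and predictions (assumption, not real results).
y_true = np.array([200000.0, 150000.0, 320000.0, 95000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0, 100000.0])

# MSE is in squared dollars; R^2 is unitless (1.0 would be a perfect fit).
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# RMSE on log prices penalizes relative errors, so a $10k miss on a cheap
# house counts for more than the same miss on an expensive one.
rmse_log = np.sqrt(mean_squared_error(np.log(y_true), np.log(y_pred)))

print(f"MSE:      {mse:.0f}")
print(f"R^2:      {r2:.3f}")
print(f"RMSE log: {rmse_log:.4f}")
```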

  • @Jiyakathuria
    @Jiyakathuria 2 months ago

    How can I do it in an environment like VS Code? I need to submit it.

  • @senthilkumars1061
    @senthilkumars1061 10 months ago

    I have a doubt about this problem: we are already given separate train and test data, so why do you perform an additional split using train_test_split?

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  10 months ago

      So I can get a better model for my train set

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  10 months ago

      Think of train, test, validation

    • @SamLaseter
      @SamLaseter 3 months ago

      @RyanAndMattDataScience Shouldn't you do the imputation after you split your data into training and test sets to avoid data leakage?

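The point about imputing after the split can be made concrete with a tiny example. The Kaggle test file has no `SalePrice`, so a validation set has to be carved out of the labeled training data. If the imputer is fitted before that split, its fill value also reflects the validation rows. The sketch below uses a made-up single column and a mean imputer purely for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Made-up single feature with two missing values (assumption).
X = np.array([[60.0], [65.0], [np.nan], [70.0], [80.0], [np.nan], [75.0], [200.0]])
y = np.arange(len(X), dtype=float)

# Split first, so the validation rows stay unseen during fitting.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the imputer on the training rows only, then reuse it on validation.
imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train)
X_valid_filled = imputer.transform(X_valid)

# The leaky alternative: a fill value computed from every row.
leaky_mean = np.nanmean(X)
train_only_mean = imputer.statistics_[0]
print(f"mean over all rows:      {leaky_mean:.2f}")
print(f"mean over training rows: {train_only_mean:.2f}")
```

The two means may or may not differ for a given split, but only the training-rows version is guaranteed to be free of information from the validation set.
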
  • @ttien1612
    @ttien1612 6 months ago

    r^2 score = -4.0019e+19. I think you went wrong somewhere.

  • @coopernik
    @coopernik 9 months ago

    Very instructive video but the zoom was a bit off

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  9 months ago +1

      Have all the code in the description if anything is off

  • @SophiaUmaru
    @SophiaUmaru 3 months ago

    I try to run train_df.columns and test_df.columns, but I get a NameError saying train and test are not defined. Please, what could be wrong?

    • @forfiverr3873
      @forfiverr3873 3 months ago

      You probably haven't initialised the variables. Read them from the CSV files provided using pd.read_csv().
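
A sketch of that fix: `train_df` and `test_df` only exist after the cell that calls `pd.read_csv` has been run. To stay self-contained, the example writes two tiny placeholder files to a temporary folder first. In practice `data_dir` would point at wherever the competition's `train.csv` and `test.csv` were downloaded, which is an assumption about your setup.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder files so the example runs anywhere (made-up rows, assumption).
data_dir = Path(tempfile.mkdtemp())
(data_dir / "train.csv").write_text(
    "Id,LotArea,SalePrice\n1,8450,208500\n2,9600,181500\n"
)
(data_dir / "test.csv").write_text("Id,LotArea\n1461,11622\n1462,14267\n")

# Defining the variables: this must run before anything that uses them.
train_df = pd.read_csv(data_dir / "train.csv")
test_df = pd.read_csv(data_dir / "test.csv")

print(train_df.columns)
print(test_df.columns)
```

If the NameError appears after reopening a notebook, rerunning the cell that loads the data restores the variables.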

  • @olinabin2004
    @olinabin2004 9 months ago

    Timestamp for personal purpose : 47:00

  • @vishnukp6470
    @vishnukp6470 11 months ago

    Can you do a time series project next time?

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  11 months ago +2

      Next year for sure! I’m currently studying Deep Learning

  • @forfiverr3873
    @forfiverr3873 3 months ago

    Bro, this is going to seem very dumb... Why do you plot every parameter against the price on the y axis? Why not use id?

  • @andyn6053
    @andyn6053 3 months ago

    waaaaay too time consuming and too much time spent on data exploration