House Price Regression LIVESTREAM | Data Science | Kaggle | 4/27/21

  • Published on 18 Oct 2024
  • Hi guys! Today I'll be running through one of Kaggle's data science competitions from start to finish. We will go in depth into all the steps needed to optimize a model for high performance, as well as any preprocessing/data transformation steps that would help us make good predictions.
    Here is a link to the Kaggle competition:
    www.kaggle.com...
    And here is a link to the notebook from the video:
    www.kaggle.com...
    Thanks so much for watching! If you liked the stream, make sure to subscribe and hit the bell for more content!
    See you all later! :)
    ----------
    Patreon: / gcdatkin
    LinkedIn: / gcdatkin
    Twitter: / gcdatkin

Comments • 68

  • @jeffstevenson6419 3 years ago +10

    Thank you for taking the time to create this! Really great walkthrough that helps viewers to follow your thought process and logic.

    • @jeffstevenson6419 3 years ago

      Hello - what would be the best way to visualize how the knn_impute function works? Trying to understand better how the function targets missing numerical values within the column, then uses knn to determine nearest neighbor for value
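
One way to see what a KNN-based imputer does is to run it by hand on a tiny table. The sketch below is not the `knn_impute` function from the video; it is a minimal plain-Python illustration, assuming the idea described in the comments (use the complete numeric columns to find the nearest rows, then fill the gap from their values). The function name `knn_impute_column`, the toy rows, and the choice of `k=2` are all hypothetical.

```python
import math


def knn_impute_column(rows, target, k=2):
    """Fill missing (None) values of `target` with the mean of the k nearest rows.

    Distance is Euclidean over every other column, which is assumed complete.
    """
    features = [col for col in rows[0] if col != target]
    known = [row for row in rows if row[target] is not None]

    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append(dict(row))
            continue
        point = [row[col] for col in features]
        # Rank the complete rows by distance to the row with the gap.
        nearest = sorted(
            known,
            key=lambda other: math.dist(point, [other[col] for col in features]),
        )[:k]
        # The estimate is the plain average of the neighbours' target values.
        estimate = sum(n[target] for n in nearest) / len(nearest)
        filled.append({**row, target: estimate})
    return filled


# Toy data (made up): the last house is missing LotFrontage.
houses = [
    {"GrLivArea": 1000, "LotFrontage": 50.0},
    {"GrLivArea": 1100, "LotFrontage": 60.0},
    {"GrLivArea": 2000, "LotFrontage": 100.0},
    {"GrLivArea": 1050, "LotFrontage": None},
]
result = knn_impute_column(houses, "LotFrontage", k=2)
print(result[3])  # the two closest houses by GrLivArea supply the fill value
```

Printing the `nearest` list inside the loop shows exactly which rows were used for each filled value. With real data the feature columns would normally be scaled first, since Euclidean distance is dominated by whichever column has the largest range.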

  • @shashankbangera7753 2 years ago +2

    most underrated channel you deserve more than million subs!! what a great teacher man

  • @tonyjoffre2909 2 years ago

    Lmao "I hope this isn't too confusing". I've rewatched this knn_impute function explanation 4 times 😅

  • @muratsar9090 2 years ago +3

    Watching you makes it seem easy, but then again, it reminds me how far behind I am as a beginner. :)
    Anyway, keep up the good work man! Love the video, great tutorial for me! Already subscribed and liked the video.

    • @brhnglc 2 years ago

      How are things going after 3 months?

  • @esthernjuguna8926 5 months ago

    Just noticed I'm watching this exactly 3 years later.

  • @AkaExcel 3 years ago +2

    Thank you for the useful content you create, we learn from you. Gabriel, wish you all the best!

    • @gcdatkin 3 years ago

      Thanks, I'm glad you enjoyed it! :)

  • @iampeterokolie 2 years ago +1

    Thanks Gabriel. I totally loved following you on this journey. You rock !

  • @DanielTobi00 2 years ago +1

    Thank you so much for doing this.
    I am currently stuck trying to get the RMSE between final_predictions and my local y_test.
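
For reference, RMSE between two equal-length sequences can be computed as below. This is a generic sketch, not code from the notebook: the numbers are made up, and `y_test` is assumed to be a labelled hold-out split taken from the training data, since Kaggle's test file has no target column. The House Prices competition scores RMSE on the logarithm of the prices, so the log version is shown as well.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sale prices and predictions, both in dollars.
y_test = [200000.0, 150000.0, 320000.0]
final_predictions = [210000.0, 140000.0, 300000.0]

plain_rmse = rmse(y_test, final_predictions)

# Same metric on the log scale, which is how the competition is scored.
log_rmse = rmse(
    [math.log(v) for v in y_test],
    [math.log(v) for v in final_predictions],
)
print(plain_rmse, log_rmse)
```

If the model was trained on a log-transformed target, both sequences need to be on the same scale before comparing them; mixing log predictions with dollar labels is a common cause of a nonsensical score.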

  • @nihadaliyev6670 1 year ago

    I went to a 3-month course, and all I wanted but never got was a graph like the one at 6:10; the rest I could do with self-learning. So thank you so much.

  • @houdasalhi315 3 years ago +1

    Really, thank you. I wish your channel reaches a million views, because you deserve it. You are such a good teacher; you explain and make things easier to understand.

  • @madhukumaryadav7194 3 years ago +1

    Hi Gabriel plz do more sessions like this.

  • @alexandertsikhun7733 2 years ago

    Great video! I used it as a tutorial to get started with my own practical experience on Kaggle competitions and DS work. Thanks a lot!
    Pity that you don't release any videos :(

  • @mericgenc 2 years ago +1

    Gabriel, thank you so much! I learned a lot of stuff from this video. It improves my perspective. I'm new to Data Science but I'm very excited to learn new stuff. Every time I learn something, I realize how much I don't know 😄 I liked this video and subscribed to your channel. I'll definitely check your other videos. You're the best!

  • @shrutiiyer1585 2 years ago

    This was so good! Very organized workflow, great explanations - learnt a lot from this video. Looking forward to more such competition processes!

  • @moizkachwala2199 3 years ago

    Thank you so much for the wonderful video, I learned like never before. Please make such end to end videos.

  • @retenim28 2 years ago +1

    Hi, I'm looking at your work. It looks brilliant so far. Question about the initial part. You transform MSSubClass from int type to str type because it's a categorical data type. But 'OverallCond' and 'OverallQual', for instance, are categorical features too, because they represent categories. Why didn't you select them as categorical features as well?

    • @luisfernandodasilvamartine8453 2 years ago

      You have to think: does the value 2 of this feature represent double of 1?
      Does the number 6 represent something much greater than 1?
      Or are the numbers just 'codes' that don't represent numerical quantities?
      'OverallCond' and 'OverallQual' are ordered: the greater the number, the higher the rating.
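
The distinction in the reply above can be sketched in a few lines. This is an illustration under assumptions, not the notebook's code: the helper `one_hot` and the sample values are made up. `MSSubClass` codes are treated as labels and one-hot encoded, while `OverallQual` stays numeric because its order carries meaning.

```python
def one_hot(values, prefix):
    """One-hot encode a list of labels into a list of {column: 0/1} dicts."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{cat}": int(value == cat) for cat in categories}
        for value in values
    ]


ms_subclass = [20, 60, 120, 60]   # dwelling-type codes: 60 is not "3 x 20"
overall_qual = [5, 7, 9, 6]       # ordered rating: 9 really is better than 5

# Codes become strings first, so nothing downstream treats them as quantities.
encoded_subclass = one_hot([str(code) for code in ms_subclass], "MSSubClass")

# The ordinal rating is kept as a single numeric column.
rows = [
    {**dummies, "OverallQual": qual}
    for dummies, qual in zip(encoded_subclass, overall_qual)
]
print(rows[0])
```

Whether an ordinal column is better left numeric or one-hot encoded is still a modelling choice; leaving it numeric assumes the steps between ratings are roughly comparable.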

  • @sandeepm9313 2 years ago

    Excellent and lucid explanations Gabriel. I was able to learn a lot from your video and it improved my understanding of the concepts and my score as well. Subscribed! Keep up the good work.👍👍

  • @randb9378 2 years ago

    Please do more of these! Great video

  • @nadhembenhadjali9063 6 months ago

    Thank you so much for this video ! I learned a lot !!!

  • @traeht 1 year ago

    Very instructive, I learned a lot. Thank you!

  • @himashisbiswas1339 1 year ago

    thanks a lot ! very good explanation ! as a beginner i learnt a lot from this video

  • @Asdpodkjas 3 years ago

    thank you very much for this video, very interesting! I learned a lot from it - super happy about finding your channel.

    • @gcdatkin 3 years ago

      Glad to hear it! :)

  • @abc-co7fy 3 years ago

    Thanks Gab!! Congrats on your channel reaching ~1000 subscribers.
    Can you post the livestream schedule and topic in advance under the "Discussion" tab of YT so that we (viewers) can prepare for the session?

    • @gcdatkin 3 years ago

      Thank you! I will be sure to do that when I am planning the next stream.

  • @АльбусДамблодр 2 months ago

    Thank you bro for this extremely cool video❤

  • @beast_from_east 5 months ago

    Worth every minute, thx a lot❤

  • @wavyjones96 2 years ago

    You are the fucking boss man! i learnt so much doing this along with you!
    Want more!
    SUBSCRIBED!

  • @eranzecharia 1 year ago

    In hyperparameter optimization, why fit the model before calculating the cv_score? Doesn't cross_val_score fit the model on the folds itself?
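
On the question above: scikit-learn's `cross_val_score` clones the estimator and fits a fresh copy on each training fold, so a `fit` call beforehand does not affect the cross-validation scores. The loop below sketches that behaviour in plain Python. It is a simplified stand-in, not scikit-learn's implementation; `MeanModel`, `cross_val_rmse`, and the interleaved fold assignment are assumptions made for illustration.

```python
import math


class MeanModel:
    """Toy regressor that always predicts the mean of its training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def cross_val_rmse(make_model, X, y, k=3):
    """Return one RMSE per fold; a brand-new model is fitted for every fold."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(y), k))  # simple interleaved folds
        X_train = [x for i, x in enumerate(X) if i not in test_idx]
        y_train = [t for i, t in enumerate(y) if i not in test_idx]
        X_test = [x for i, x in enumerate(X) if i in test_idx]
        y_test = [t for i, t in enumerate(y) if i in test_idx]

        model = make_model().fit(X_train, y_train)  # fitted inside the loop
        preds = model.predict(X_test)
        mse = sum((t - p) ** 2 for t, p in zip(y_test, preds)) / len(y_test)
        scores.append(math.sqrt(mse))
    return scores


X = [[i] for i in range(6)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = cross_val_rmse(MeanModel, X, y, k=3)
print(scores)
```

A separate `fit` on the full training set is still needed afterwards if the tuned model is going to be used to predict the competition's test set.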

  • @ICrackSoftWares 3 years ago

    Thanks a lot for this great video. I had a question about knn_impute. Since you are using the same dataframe for every iteration of the loop, doesn't that make the order in which you pass the columns to the function matter? Essentially you are selecting the non-NA columns after each run, so each iteration there will be an extra column to use, and you might run the KNN on values that were actually filled by the KNN in a previous iteration. Wouldn't it be less biased to impute every column one by one and then insert each column individually into the final df? I don't know if I'm making any sense with my question, but I would really appreciate your response. Best regards

  • @randb9378 2 years ago

    Great video! I have a question on something that you stated.
    If a column has 0 (false) and 1 (true), should we convert it to categorical instead of integer? If yes, how would we calculate the correlation between that column and a numeric value? Thanks!
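
A 0/1 column can stay numeric for correlation purposes: the ordinary Pearson coefficient between a binary column and a continuous one is the point-biserial correlation. A minimal sketch follows; the column names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical binary flag and a numeric target (in thousands).
has_garage = [0, 0, 1, 1]
sale_price = [100.0, 120.0, 200.0, 220.0]

r = pearson(has_garage, sale_price)
print(round(r, 4))
```

For a column with more than two unordered categories this no longer applies, because the integer codes would impose an arbitrary order.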

  • @dswithanand 2 years ago

    Learned a lot from your video… please make more live project videos. Thank you.

  • @rifkoerlangga9371 2 years ago

    Why aren't you using the drop_first parameter when calling pd.get_dummies? The drop_first parameter is there to avoid possible multicollinearity in the dummy variables, right?
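
The effect being asked about can be seen on a tiny example. pandas' `pd.get_dummies` does accept a `drop_first` argument; the plain-Python helper below only mimics the idea and is not the pandas implementation. `one_hot_rows` and the sample values are hypothetical.

```python
def one_hot_rows(values, drop_first=False):
    """Return (categories, rows) where each row is a list of 0/1 indicators."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]  # the first category becomes the baseline
    rows = [[int(value == cat) for cat in categories] for value in values]
    return categories, rows


kitchen_qual = ["Gd", "TA", "Ex", "TA"]

full_cols, full = one_hot_rows(kitchen_qual)
reduced_cols, reduced = one_hot_rows(kitchen_qual, drop_first=True)

# With all dummies kept, every row sums to 1, so the columns together
# duplicate an intercept term (the "dummy variable trap").
row_sums = [sum(row) for row in full]
print(full_cols, row_sums)
print(reduced_cols, reduced)
```

That redundancy mainly matters for unregularized linear models. Regularized and tree-based models are usually less sensitive to it, which may be one reason tutorials skip `drop_first`, though the video's own reasoning is not stated in this thread.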

  • @boogersincoffee 2 years ago

    This is gold, thank you for this

  • @pythondev2631 2 years ago

    Thank you. This is a great video with some really nice insights about data analysis.

  • @ju1042 2 years ago

    Your videos are helping a lot!!! I have a doubt: I have latitude and longitude data in my dataframe. Would it make sense to transform the skewness of these features, or is it better to leave them unchanged? Thank you very much for your time!

  • @eldadperetz9262 2 years ago

    Hi, thanks for the video
    Just a question:
    what is the intuition to use the mode to fill missing values?
    It makes more sense to me to always set to "None".
    For example for a decision tree based model why does it help?
    Thanks!
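
The two strategies in the question above can be compared side by side. In the House Prices data description, NA in some columns (for example `PoolQC`) means the feature is absent, so a "None" label fits there; for a column where NA is a genuinely unknown value, the mode is a common default. The helper `fill_missing` and the sample values below are illustrative assumptions, not the notebook's code.

```python
from collections import Counter


def fill_missing(values, strategy):
    """Replace None entries using either a 'none' label or the column mode."""
    if strategy == "none":
        fill = "None"
    elif strategy == "mode":
        observed = [v for v in values if v is not None]
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# NA means "no pool" here, so it becomes its own category.
pool_qc = fill_missing([None, "Gd", None, None], strategy="none")

# NA is a rare unknown here, so the most frequent value is a reasonable guess.
electrical = fill_missing(["SBrkr", None, "SBrkr", "FuseA"], strategy="mode")

print(pool_qc, electrical)
```

For tree-based models either choice gives the tree something to split on; the "None" label additionally preserves the fact that the value was missing, which the mode fill hides.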

  • @lukebauer5495 2 years ago

    I do not understand why people are adding the test data to the training data (I have seen several tutorials that do so). This will most likely lead to overfitting, as you are leaking test data into the training set, no? It is better to keep the train and test separate, I would think. You can split the training set into its own set of training and test data, and can, from there, use the new training set to perform your cross validation.
    Just wondering what your thoughts are. I really enjoyed that you livestreamed this. I confess I skipped around, so if you addressed this somewhere, then I missed it, sorry.

    • @lukebauer5495 2 years ago

      Found the answer near 1:13:00. :) Also before that, you explain why you aren't making a pipeline for preprocessing, which is what I would have expected.
      Very cool video, got so much insight into how Kaggle is done slightly differently from actual production modeling. Thanks a lot.
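
The production-style alternative mentioned in this thread, keeping a hold-out split and learning preprocessing statistics from the training part only, can be sketched as follows. This is a generic illustration rather than anything shown in the video; `fit_scaler`, `transform`, and the numbers are made up.

```python
import statistics


def fit_scaler(values):
    """Learn standardization parameters from the training values only."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Apply previously learned parameters to any split."""
    return [(v - mean) / std for v in values]


train = [1.0, 2.0, 3.0]
holdout = [10.0]

# Parameters come from the training split alone...
mean, std = fit_scaler(train)

# ...and are then reused unchanged on the hold-out data.
scaled_train = transform(train, mean, std)
scaled_holdout = transform(holdout, mean, std)
print(mean, std, scaled_holdout)
```

Fitting the scaler on train and hold-out together would let information about the hold-out distribution leak into preprocessing, which is the concern raised in the parent comment.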

  • @mustofaahmed9153 2 years ago

    Thank you so much for this fruitful video!

  • @dprab123 2 years ago

    Thanks for your time, very insightful. I did quick and dirty feature engineering and used PyCaret tuning/bagging/blending to get to rank ~930. Why didn't you use PyCaret bagging? Also, you can assign the result of PyCaret's compare_models to a variable and choose the best 6 from that variable in code rather than hardcoding them.

  • @jamespaladin607 3 years ago

    Excellent presentation - now I know why I don't bother to compete!

  • @mohammadhegazy1285 2 years ago

    thank you very much that was valuable and informative

  • @taronphoenix9439 1 month ago

    How do you get train0 and test0 to show up as data in the notes? What are you pressing? lol

  • @satyamtripathi1732 3 years ago

    Congratulations on 1k subscribers 🔥🎉🎉🎊🎊🎊 You are an awesome, mind-blowing teacher, a combination of Andrew Ng and many more. Keep it up! May God give you everything, and may you bathe in billions of dollars every day ❤️❤️❤️❤️

    • @gcdatkin 3 years ago +1

      Wow, this has to be one of the nicest things anyone has said to me. Thank you so much, Satyam 🙏

  • @skyleong8497 3 years ago

    Amazing video and sharing !love it

  • @jamespaladin607 3 years ago

    If possible could you consider a presentation on deploying a production tensorflow model to a website.

    • @gcdatkin 3 years ago

      Sounds great!

  • @duongkstn 7 months ago

    Learnt a lot, thanks!

  • @TsheringYangdon-h2k 4 months ago

    Sir, I have an issue with PyCaret in a Kaggle notebook. How can I resolve this issue, please?

  • @lidory98 3 years ago

    The 'setup' function gives me this error:
    AttributeError: 'Simple_Imputer' object has no attribute 'fill_value_categorical'
    I searched the net for so long and nothing helped. Do you have any idea how I can resolve this problem?

  • @SngSengYi 1 year ago

    What is saved in target? My Colab says "NameError: name 'target' is not defined"

  • @osikoyaadeola2530 2 years ago

    Thank you so much.

  • @017farazbintariq9 2 years ago

    you are the best !!!

  • @mannyprotopapas4198 5 months ago

    you are incredible

  • @abdelkaderkaouane1944 1 year ago

    You haven't studied the correlation between features, nor their VIF.
    Why?
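
For readers unfamiliar with the term: the variance inflation factor of feature j is 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the other features. With only two features, R_j^2 equals the squared Pearson correlation between them, which keeps the sketch below short. It covers that two-feature case only, and the feature names and values are made up.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


def vif_two_features(x, y):
    """VIF of either feature when the model has exactly these two features."""
    return 1.0 / (1.0 - squared_correlation(x, y))


# Hypothetical, partly correlated features.
garage_cars = [1.0, 2.0, 3.0, 4.0]
garage_area = [1.0, 3.0, 2.0, 4.0]

vif = vif_two_features(garage_cars, garage_area)
print(round(vif, 3))
```

With more than two features, each R_j^2 requires a full regression of that feature on all the others, which is what library implementations of VIF compute.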

  • @c0mpl3xyz 1 year ago

    thank you very much

  • @017farazbintariq9 2 years ago

    thank you brother

  • @giacomosansone6024 1 year ago

    Why do you keep all the features instead of selecting the more important ones?

    • @ixkauzo5032 1 year ago

      Determining which are the important ones will probably need strong domain knowledge.

  • @throwaway999able 1 year ago

    15:45 how does he "double" type?

  • @SiddheshDhande-ef3yn 4 months ago

    48:23

  • @teddy911 2 years ago

    learned a lot, thanks! very informative!