Kaggle's 30 Days of ML (Day 12, Part 1): Handling Missing Values in Datasets (Imputing Missing Values)

  • Published on 4 Jan 2025

Comments • 36

  • @abhishekkrthakur · 3 years ago · +22

    If you like the videos, please do consider subscribing. It helps keep me motivated to make awesome videos like this one. :)

    • @grzegorzzawadzki8718 · 3 years ago · +1

      Thank you, Abhishek Thakur! I watched all 11 previous episodes yesterday. By combining the knowledge from this video and the previous one, I got into the top 5%!

  • @isaacyn8256 · 3 years ago · +11

    You are helping a lot of newbies in ML.

  • @yuviiiiiiiiiiiiiiiii · 3 years ago · +4

    At 44:13: where the magic happens!

  • @adriandiazNY · 1 year ago

    You are an absolutely great teacher. You've made this a lot easier for me to understand, given me a ton of tips, and answered a bunch of unrelated questions I had about pandas along the way! Thanks a ton!

  • @gmguimaraess · 3 years ago · +2

    Thank you, Abhishek!
    In the beginning, when you were explaining missing value imputation using the Titanic dataset as an example, I was wondering: hmm, couldn't we use the Pclass and Sex features to predict the missing values of Age?
    Then, in the last part, you actually said we could try this method! That got me excited and motivated to try this approach.
    Once again, thanks! Your videos are helping a lot!
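A minimal sketch of the idea in the comment above, using a simpler stand-in for a full model: fill each missing Age with the median Age of passengers who share the same Pclass and Sex. The rows below are hypothetical toy data, not the real Titanic file, and the group-median rule is one possible approach rather than the method shown in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical Titanic-style rows; in practice you would load the real training data.
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3, 2],
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Age": [38.0, np.nan, 22.0, 30.0, np.nan, 27.0],
})

# Median Age within each (Pclass, Sex) group, broadcast back to every row.
group_median = df.groupby(["Pclass", "Sex"])["Age"].transform("median")

# Fill only the missing entries; fall back to the overall median
# in case an entire group has no observed Age.
df["Age"] = df["Age"].fillna(group_median).fillna(df["Age"].median())
print(df)
```

Here the missing first-class female Age becomes 38.0 and the missing third-class male Age becomes 26.0 (the median of 22 and 30). A trained regressor on Pclass, Sex, and other columns is the more general version of the same idea.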

  • @deepakdas8884 · 3 years ago · +1

    Sir, thank you so much again.

  • @fmussari · 3 years ago · +2

    Why is the model fitted before submission only on the training data and not on all of the imputed X data? Am I missing something? Thanks a lot for the videos!
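One common practice relevant to this question (not necessarily what the video does) is to use the train/validation split only to measure performance and choose settings, then refit the same imputer-plus-model recipe on every labelled row before predicting the test set. The sketch below assumes synthetic data and a median `SimpleImputer` with a random forest purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic labelled data with ~10% of feature values knocked out (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.1] = np.nan
X_test = rng.normal(size=(20, 3))  # unlabelled rows to predict

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

def make_model():
    # Imputer and model travel together, so both are refit whenever the pipeline is.
    return make_pipeline(SimpleImputer(strategy="median"),
                         RandomForestRegressor(n_estimators=50, random_state=0))

# 1) Hold-out split: estimate performance / compare settings.
model = make_model().fit(X_train, y_train)
valid_mae = mean_absolute_error(y_valid, model.predict(X_valid))

# 2) Refit the chosen recipe on all labelled rows, then predict the test set.
final_model = make_model().fit(X, y)
test_preds = final_model.predict(X_test)
```

The trade-off: the refit model sees more data, but there is no longer a held-out score for that exact model, so you rely on the validation estimate from step 1.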

  • @anmolsmusings6370 · 3 years ago

    Thanks for the video. I was wondering whether the last part, where you use a model to predict the column with missing values, is somehow related to the Expectation-Maximization (EM) algorithm. I reckon the expectation step in EM effectively completes the missing data for each sample using the model itself. I was curious to know your take on it. Thanks again.

  • @jsklair · 3 years ago

    With the machine learning imputation method you discuss at the end, would you feed the predicted 'F4' values from model 1 (once it has run) into the 'X' of model 2, which is used to predict F6? Thanks for the videos.
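The two questions above (chaining model 1's F4 predictions into model 2 for F6, and the link to EM) are close to what scikit-learn's `IterativeImputer` automates: each column with gaps is regressed on the other columns, using the current filled-in values of those columns, in round-robin fashion for several rounds. That alternation is similar in spirit to EM but is not the EM algorithm itself. The sketch assumes synthetic columns named after the comment's F1/F4/F6 and the default-style `BayesianRidge` estimator.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Synthetic, linearly related columns standing in for F1, F4, F6 (assumption).
rng = np.random.default_rng(42)
n = 300
f1 = rng.normal(size=n)
f4 = 3.0 * f1 + rng.normal(scale=0.1, size=n)
f6 = f4 - f1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([f1, f4, f6])

# Knock out ~20% of F4 and ~20% of F6.
X_missing = X.copy()
X_missing[rng.random(n) < 0.2, 1] = np.nan
X_missing[rng.random(n) < 0.2, 2] = np.nan
mask = np.isnan(X_missing)

# Each round: regress F4 on (F1, current F6), then F6 on (F1, current F4), and repeat.
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X_missing)
```

Observed entries are left unchanged; only the NaN positions are filled. Doing the chain by hand (fit model 1 for F4, plug its predictions into model 2 for F6) is a single pass of the same loop.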

  • @md.al-imranabir2011 · 3 years ago · +1

    Is it possible to use different strategies for different columns? Say, mean for one column and constant for another column?
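Yes, this is possible. One way in scikit-learn is a `ColumnTransformer` that applies a separately configured `SimpleImputer` to each group of columns. The column names below (`f_mean`, `f_const`, `f_untouched`) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

X_train = pd.DataFrame({
    "f_mean": [1.0, np.nan, 3.0],
    "f_const": [np.nan, 5.0, 6.0],
    "f_untouched": [7, 8, 9],
})

imputer = ColumnTransformer(
    transformers=[
        # Mean imputation for one column...
        ("mean_imp", SimpleImputer(strategy="mean"), ["f_mean"]),
        # ...and a constant fill value for another.
        ("const_imp", SimpleImputer(strategy="constant", fill_value=0.0), ["f_const"]),
    ],
    remainder="passthrough",  # keep all other columns as they are
)
X_train_imp = imputer.fit_transform(X_train)
print(X_train_imp)
```

The output columns come back in transformer order followed by the passthrough columns, so `f_mean` becomes `[1, 2, 3]` and `f_const` becomes `[0, 5, 6]`. Call `imputer.transform(...)` on validation or test data so the learned means are reused.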

  • @sunilsurendrasingh7736 · 2 years ago

    In KFold cross-validation, should missing value imputation be done before CV, or during CV for each train/validation fold?
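A common answer is to impute inside each fold, so the imputation statistics come only from that fold's training part. Wrapping the imputer and the model in a `Pipeline` and passing it to `cross_val_score` does this automatically. The data, the mean strategy, and the linear model below are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data with ~15% of feature values missing (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)
X[rng.random(X.shape) < 0.15] = np.nan

# For every fold, the whole pipeline is cloned and refit on that fold's training rows,
# so the validation rows never influence the imputed means.
pipe = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_absolute_error")
print(scores.mean())
```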

  • @sauravkumar9454 · 3 years ago

    Hello Abhishek,
    Why don't we just impute the whole training dataset before splitting it for validation? That way we wouldn't have to transform X_valid. Please let me know. Thanks.
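The usual reason for fitting the imputer on the training split only is to keep validation values from leaking into the statistics used for imputation, so the validation score better reflects unseen test data. A tiny sketch with made-up numbers:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0], [3.0], [np.nan]])
X_valid = np.array([[np.nan], [10.0]])

imputer = SimpleImputer(strategy="mean")
X_train_imp = imputer.fit_transform(X_train)  # learns mean = 2.0 from training rows only
X_valid_imp = imputer.transform(X_valid)      # reuses 2.0; the 10.0 does not affect it
```

Had the imputer been fitted on all five rows before splitting, the fill value would have been the mean of 1, 3, and 10 (about 4.67), partly computed from validation data.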

  • @tubasiddiqui7345 · 3 years ago · +2

    When we already had X_test, why did we create X_valid?

    • @abhishekkrthakur · 3 years ago · +3

      Validation data is derived from the original training data and has target labels. X_test doesn't have any target labels!

    • @tubasiddiqui7345 · 3 years ago · +3

      @abhishekkrthakur Oh, I got it. We use validation data to compare our predicted values with the original ones, and we use test data to actually apply the model. Thank you!

    • @abhishekkrthakur · 3 years ago · +5

      @tubasiddiqui7345 Eggjactly 😀

    • @MrLycantree · 3 years ago

      @abhishekkrthakur How is X_valid different from the y data?

    • @tubasiddiqui7345 · 3 years ago

      @MrLycantree X holds the features, while y holds the target/label.
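To make the X / y / X_valid / X_test distinction from this thread concrete, here is a small sketch: the labelled training file is split into features X and labels y, part of it is held out as a validation set, and the test file has features only. The tiny DataFrames and the `target` column name are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labelled training data and unlabelled test data.
train = pd.DataFrame({"f1": [1, 2, 3, 4, 5], "f2": [5, 4, 3, 2, 1], "target": [0, 1, 0, 1, 0]})
test = pd.DataFrame({"f1": [6, 7], "f2": [0, -1]})  # no "target" column

y = train["target"]               # labels
X = train.drop(columns="target")  # features
X_test = test                     # features only, nothing to score against

# Validation rows come out of the labelled data, so their true labels (y_valid) are known.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=0)
```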

  • @thelazydeveloper · 3 years ago · +1

    mech iyad: does this course come with a certification?

  • @samirana8931 · 3 years ago

    Hello Abhishek!
    First of all, thank you very much for making this understandable.
    Secondly, I tried imputing missing values by building a model to predict them, and my score increased. Here is the Kaggle link to my notebook:
    www.kaggle.com/mlsami/exercise-missing-values
    I know my notebook is neither professionally written nor the best. I'd just like you to take a look and tell us how it could be improved.
    Apologies in advance if the code is badly written, but it's working for now, at least.
    Once again, thank you very much.

  • @swayamsingh4650 · 3 years ago

    Sir, in the last imputation method you discussed, we use a model to predict the column with missing values, right? So for that, I first need to train my model on the rows that don't have any missing values, and then pass the rows that do have missing values as a validation set for prediction. Am I right?

    • @abhishekkrthakur · 3 years ago

      Yes, as a test set, not a validation set :)

    • @swayamsingh4650 · 3 years ago · +1

      @abhishekkrthakur Yeah, sorry, wrong term.

    • @swayamsingh4650 · 3 years ago · +3

      @abhishekkrthakur I just tried the last approach of filling missing values with predictions, and guess what: my rank is 464 now :), a huge jump from 1556. Thanks again, sir!

    • @nischaypatel4 · 1 year ago

      Can you please share the solution for this? I tried the same thing, but my model did not improve by nearly as much as yours did.
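The recipe confirmed in this thread (train on the rows where the column is present, then predict it for the rows where it is missing, treating those rows as a test set) can be sketched as follows. The synthetic columns `f1`, `f2`, `f3` and the choice of a random forest are assumptions, not the solution described by the commenters.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: f3 depends on f1 and f2 and has ~20% missing values (assumption).
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({"f1": rng.normal(size=n), "f2": rng.normal(size=n)})
df["f3"] = 2.0 * df["f1"] - df["f2"] + rng.normal(scale=0.1, size=n)
df.loc[rng.random(n) < 0.2, "f3"] = np.nan

features = ["f1", "f2"]        # predictors must themselves be complete (or already imputed)
missing = df["f3"].isna()

# Rows where f3 is known play the role of the training set...
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(df.loc[~missing, features], df.loc[~missing, "f3"])

# ...and rows where f3 is missing play the role of the test set.
df.loc[missing, "f3"] = reg.predict(df.loc[missing, features])
```

Reuse the same fitted `reg` to fill the column in the competition's test file as well, and never include the competition target among the predictor columns.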

  • @GurpreetKaur-nn8bb · 3 years ago · +1

    Abhishek Sir, is it necessary to submit the assignments in order to get the certificate?
    Or are these assignments/exercises just for practice?
    Is Kaggle keeping a record of us going through the tutorials and doing the assignments?

    • @suriyaprakaashjl5642 · 3 years ago

      No, actually there are three courses you have to complete, and you will get 3 certificates.

    • @abhishekkrthakur · 3 years ago · +1

      You need to do the exercises to get the certificate :)

  • @alongbarbrahma484 · 3 years ago

    This was a lot to process