Handling Missing Data Easily Explained | Machine Learning

  • Published on 14 Jun 2019
  • Data can have missing values for a number of reasons such as observations that were not recorded and data corruption.
    Handling missing data is important as many machine learning algorithms do not support data with missing values.
    In this tutorial, you will discover how to handle missing data for machine learning with Python.
    Specifically, after completing this tutorial you will know:
    How to mark invalid or corrupt values as missing in your dataset.
    How to remove rows with missing data from your dataset.
    How to impute missing values with mean values in your dataset.
    Github link: github.com/krishnaik06/EDA1
    You can buy my book, in which I explain in detail how to use machine learning and deep learning in finance with Python.
    url: www.amazon.in/Hands-Python-Fi...
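
The three steps listed in the description (marking invalid values as missing, removing rows, and imputing with the mean) can be sketched roughly as follows. This is a minimal illustration and not the code from the video or the linked repository: the column names and values are made up, and it assumes that a value of 0 stands for "not recorded" in both columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 0 is assumed to be an implausible value in both
# columns, so it is treated as a placeholder for "not recorded".
df = pd.DataFrame({
    "glucose": [148, 0, 183, 89, 0],
    "bmi": [33.6, 26.6, 0.0, 28.1, 43.1],
})

# 1. Mark invalid values as missing.
marked = df.replace(0, np.nan)

# 2. Option A: remove every row that contains a missing value.
dropped = marked.dropna()

# 3. Option B: fill each missing value with the mean of its column.
imputed = marked.fillna(marked.mean())

print(marked.isna().sum())  # number of missing values per column
print(dropped.shape)        # rows left after dropping
print(imputed)
```

Dropping rows is simple but discards data, while mean imputation keeps every row at the cost of shrinking the column's variance, so the choice depends on how much data is missing and how the column will be used.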

Comments • 77

  • @himalayasinghsheoran1255 3 years ago +15

    Today I started working on the Titanic data. I tried to predict the missing age values but failed and was very tense. So I started watching your video, hoping for a way. When you opened the notebook I felt such relief: 'now it will surely work out'. Thank you for making this video.

  • @aimenbaig6201 3 years ago +1

    QUESTION: Why did you impute the age values with respect to Pclass and not with respect to male or female?

  • @rachanakotha6059

    Are these methods only for numerical data? Which methods can I use for characters/names or years? Please suggest. Thanks!
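
For categorical columns, two commonly used options are filling with the most frequent category or adding an explicit "Unknown" category. The sketch below is only an illustration of those two options, not a method shown in the video, and the column name and values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with one missing categorical value.
df = pd.DataFrame({"embarked": ["S", "C", None, "S", "Q"]})

# Option A: fill with the most frequent category (the mode).
most_common = df["embarked"].mode()[0]
filled_mode = df["embarked"].fillna(most_common)

# Option B: keep the missingness visible as its own category.
filled_flag = df["embarked"].fillna("Unknown")

print(filled_mode.tolist())
print(filled_flag.tolist())
```

Free-text fields such as names usually cannot be imputed in any meaningful way, so a placeholder category tends to be the safer of the two options there.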

  • @nikosterizakis 1 year ago

    Why write it out by hand rather than just make the text appear (as in: pre-typed, with a transition, so people can read it)?

  • @marsrover2754 4 years ago

    What is the recommended rule for deciding whether to use data imputation techniques or simply drop the rows that have missing values? The missing values can follow different patterns, such as missing at random, not missing at random, and so on. What should we do in that case?
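
There is no single rule for this, but a common first step is to measure how much of each column is missing and then decide per column. The sketch below assumes a hypothetical cut-off of 50% and made-up data; the threshold is an illustration only and should be chosen for the problem at hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data loosely modelled on Titanic-style columns.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# Share of missing values per column (0.0 to 1.0).
missing_share = df.isna().mean()

# Hypothetical cut-off: drop columns that are mostly empty, and keep the
# rest as candidates for imputation.
threshold = 0.5
to_drop = missing_share[missing_share > threshold].index.tolist()
reduced = df.drop(columns=to_drop)

print(missing_share)
print(to_drop)
```

A missing-share check says nothing about why values are missing. If the missingness itself depends on the unobserved value (not missing at random), both dropping and simple imputation can bias the result.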

  • @subhajitdutta1443 3 years ago

    Sir, I was unable to understand the programming part on Udemy. That is why I searched on YouTube, but here I can see that both of them are exactly the same. You should at least change the digits. With all due respect, you copied it from Udemy.

  • @radifantaufik8085 3 years ago +5

    I think there is a quantitative justification for filling the NaN values in 'Age' with the median grouped by 'Sex' and 'Pclass'. In the EDA step, we can print or visualize a heatmap of the correlations between the columns (dataset.corr().abs()). We can see that the 'Age' column has a relatively high correlation with the 'Sex' and 'Pclass' columns.
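
The grouped-median idea described in this comment could look roughly like the sketch below. The data is a made-up toy frame rather than the real Titanic dataset, and 'Sex' is mapped to 0/1 first because the correlation call only works on numeric columns; that encoding is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using Titanic-style column names.
df = pd.DataFrame({
    "Sex": ["male", "male", "female", "female", "male", "female"],
    "Pclass": [1, 1, 3, 3, 1, 3],
    "Age": [40.0, np.nan, 20.0, 24.0, 44.0, np.nan],
})

# Absolute correlations, with 'Sex' encoded numerically for this purpose.
encoded = df.assign(Sex=df["Sex"].map({"male": 0, "female": 1}))
corr = encoded[["Sex", "Pclass", "Age"]].corr().abs()

# Median age within each (Sex, Pclass) group, aligned to the original rows.
group_median = df.groupby(["Sex", "Pclass"])["Age"].transform("median")

# Fill only the missing ages with the median of their own group.
df["Age_filled"] = df["Age"].fillna(group_median)

print(corr)
print(df)
```

If a group has no observed ages at all, its median is itself NaN, so a fallback such as the overall median would still be needed for those rows.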

  • @vineethp8925 3 years ago

    Hi, I want to know why you used the box plot median to replace the missing values in the age column and not the mean or mode. Can you please tell me the reason?
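
One common reason for preferring the median, which may or may not be the reasoning used in the video, is that it is much less sensitive to outliers than the mean. A small made-up example shows the effect:

```python
import pandas as pd

# Hypothetical ages with a single outlier (80).
ages = pd.Series([22.0, 24.0, 25.0, 27.0, 80.0])

mean_age = ages.mean()      # 35.6, pulled upwards by the outlier
median_age = ages.median()  # 25.0, close to the typical value

print(mean_age, median_age)
```

For a skewed column, filling gaps with the mean would place the imputed passengers well above the typical age, whereas the median keeps them near the centre of the observed values.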

  • @gopi3e 4 years ago +2

    Thanks for the video. You said that option 2 (model-based imputation) is less preferred for huge datasets. Does that mean that, in general, it is better to go with statistical imputation over model-based imputation for real-world datasets, since we get a lot of data in the real world? I am working on Home-Credit-Default-Risk (a Kaggle competition dataset). Could you comment on which imputation method to use?

  • @RAJI11000 4 years ago

    Sir, please give a suggestion regarding the Cabin feature: if it has a low number of missing values, how do we deal with that type? It is a combination of categorical and numerical.

  • @stevechops3226 3 years ago +2

    Your channel is awesome, please keep going! Can't tell you how valuable your videos are when starting to learn!

  • @dishydez 3 years ago +2

    Honestly, I really love your videos: simple and easy to understand, and always answering my machine learning and data science questions! I do have one, though. I watched your video on standardisation and normalisation. I am trying to build a benchmark/index; would it be okay to standardise the data before creating it?

  • @equiwave80 3 years ago

    Thanks Krish. I can't think of an easier explanation of a tricky topic!!! Simply superb!!!👍

  • @raunasur9710 2 years ago

    Thank you Krish sir. I was following the kaggle learn course on machine learning but couldn't understand this topic even after so much of hard work - now it's all clear. Keep it up.

  • @hanman5195 4 years ago +1

    Your explanation is amazing and perfect, as usual.

  • @coolsun-lifestyle 5 years ago

    Thanks a lot for the detailed explanation. It really helps.

  • @strangereview2414 3 years ago

    Nice explanation. The conclusion depends on your end goal and on whether dropping a column or replacing values with the mean will affect your analysis. In his example he needed the age but did not need the cabin.

  • @aimenbaig6201 3 years ago +1

    Thank you for making life so much easier for us!

  • @finance_tamil 5 years ago +4

    I thought you would also implement a regression model for synthetic imputation. But the content is great!!
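
A regression-based imputation of the kind this comment mentions could be sketched as below. This is not taken from the video: it assumes, purely for illustration, that 'Age' can be predicted from a single numeric column 'Fare', and it uses a plain least-squares line from NumPy on made-up data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in which Age happens to follow 2 * Fare + 10.
df = pd.DataFrame({
    "Fare": [5.0, 10.0, 15.0, 20.0, 25.0],
    "Age": [20.0, 30.0, np.nan, 50.0, np.nan],
})

# Fit a straight line on the rows where Age is observed.
known = df[df["Age"].notna()]
slope, intercept = np.polyfit(known["Fare"], known["Age"], deg=1)

# Predict Age for the rows where it is missing and fill those rows in.
missing = df["Age"].isna()
df.loc[missing, "Age"] = slope * df.loc[missing, "Fare"] + intercept

print(df)
```

On real data the relationship is never this clean, so the predictor columns and the model would need to be validated before trusting the imputed values.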

  • @bhaktibailurkar1936 1 year ago

    Cleared all my doubts! Great..Thank you so much!!