Handling Missing Value with Mean Median and Mode Explanation | Data Cleaning Tutorial 7

  • Published on 4 Mar 2021
  • During the machine learning data cleaning process, you will often need to figure out whether you have missing values in the data set and, if so, how to deal with them. In this video, I demonstrate how to handle missing values with the statistical measures mean, median, and mode. This video covers the explanation only:
    1. We impute the missing data for a quantitative attribute with the mean or median, and for a qualitative attribute with the mode.
    2. Generalized imputation: we calculate the mean or median of all non-missing values of that variable, then replace the missing values with that mean or median.
    3. Similar case imputation: we calculate the mean separately within groups of non-missing values, then replace each missing value based on the group indicated by another variable.
    #DataScience #MachineLearning #missingvalue
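The three cases above can be sketched with pandas; the data and column names below are hypothetical, just to illustrate mean imputation for a quantitative attribute and mode imputation for a qualitative one:

```python
import pandas as pd

# Hypothetical dataset with missing values in a quantitative
# attribute (temperature) and a qualitative attribute (city).
df = pd.DataFrame({
    "temperature": [21.0, None, 25.0, 23.0, None],
    "city": ["Pune", "Delhi", None, "Pune", "Pune"],
})

# Quantitative attribute: impute with the mean (the median works the same way).
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())

# Qualitative attribute: impute with the mode (most frequent value).
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```

Using the median instead is a one-word change (`.median()`), which matters when the attribute has outliers that would pull the mean.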

Comments • 6

  • @abdulhakeem4715
    @abdulhakeem4715 several months ago

    cleanest explanation ever 👍

  • @sofiahhanisahmadhisham5531
    @sofiahhanisahmadhisham5531 2 years ago

    How do you determine whether to use the mean or the median for imputation?

  • @0SIGMA
    @0SIGMA 3 years ago +2

    So when should we use mean/median imputation compared with the interpolation method?

    • @AtulPatelds
      @AtulPatelds  3 years ago

      Thanks for watching my video.
      I hope the explanation below helps.
      We use both techniques, depending on the use case.
      Imputation with the mean: suppose you are given a smart city dataset with a feature called city temperature. If there are null values for this feature, you can replace them with the average value, i.e., imputation.
      Interpolation: suppose you are given a dataset of a company's share price. You know that the market is closed every Saturday and Sunday, so those days are missing values. These values can be filled with the average of the Friday value and the Monday value, i.e., interpolation.
      So you can choose the technique depending on the use case.
      In reality, we cannot get stuck on one approach. A well-tested imputation approach (mean) may work for the same type of feature (temperature) in some datasets, but it will not always be the case that the mean is the right way to impute a temperature feature in every dataset.
      It totally depends on the data set.
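The two cases in this reply can be sketched with pandas; the share prices below are made-up numbers, just to show how interpolation fills a weekend gap from its neighbours while mean imputation uses one global value:

```python
import pandas as pd

# Hypothetical daily share prices; Saturday and Sunday are missing.
prices = pd.Series(
    [100.0, None, None, 104.0],
    index=pd.to_datetime(["2021-03-05",   # Friday
                          "2021-03-06",   # Saturday (market closed)
                          "2021-03-07",   # Sunday (market closed)
                          "2021-03-08"])  # Monday
)

# Interpolation: each gap is filled from its neighbours, so the
# filled values lie between the Friday and Monday prices.
interpolated = prices.interpolate(method="linear")

# Mean imputation would instead put one global average in every gap.
mean_filled = prices.fillna(prices.mean())

print(interpolated.tolist())  # [100.0, 101.33..., 102.66..., 104.0]
print(mean_filled.tolist())   # [100.0, 102.0, 102.0, 104.0]
```

For a time series like share prices, interpolation preserves the local trend between Friday and Monday, which a single global mean cannot do.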

    • @0SIGMA
      @0SIGMA 3 years ago

      @@AtulPatelds That's great, love your detailed explanation ❤️ But I was wondering how we can decide which null-handling technique is best, especially for a tricky dataset. And it's not ideal to keep trying different methods, right?

    • @AtulPatelds
      @AtulPatelds  3 years ago

      In reality, if you are working on a machine learning problem and get the chance to impute the null values, we usually apply different approaches and check the model performance. If the dataset is very large and you have a very tight timeline, you can take small samples of the dataset, apply the different approaches, and check the model performance on those samples.
      But if you have a domain expert or business analyst with a good understanding of the domain, you can ask them for help; they will guide you on each feature and its importance in that dataset, and you can make the decision accordingly.
      But when you work at an enterprise level or in big product companies, they often have a separate data team that takes care of data selection, storage, and most of the data cleaning and quality improvement. So it always depends on the company, data types, domain, timeline, and many other aspects in the real world.
      So if anybody says you should use a particular approach, you should always tell them it might or might not work in your case.
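The "apply different approaches and check the model performance" workflow from this reply can be sketched with scikit-learn; the dataset here is synthetic, standing in for whatever data you actually have, and the choice of a linear model is an arbitrary assumption:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic dataset standing in for real data; knock out ~10% of the
# values at random to create missing entries.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
mask = rng.random(X.shape) < 0.10
X[mask] = np.nan

# Try several imputation strategies and compare cross-validated
# model performance, as the reply suggests.
for strategy in ["mean", "median", "most_frequent"]:
    model = make_pipeline(SimpleImputer(strategy=strategy), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{strategy:>13}: mean R^2 = {scores.mean():.3f}")
```

Putting the imputer inside the pipeline matters: it is re-fit on each training fold, so no information from the validation fold leaks into the imputed values.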