How to impute missing data in categorical features (using MICE)

แชร์
ฝัง
  • เผยแพร่เมื่อ 20 ต.ค. 2024

ความคิดเห็น • 14

  • @machinelearningplus
    @machinelearningplus  5 หลายเดือนก่อน

    I teach complete ML Mastery Roadmap (self paced courses) to master Data Science from scratch: edu.machinelearningplus.com/s/pages/ds-career-path

  • @thezenithanalysis7541
    @thezenithanalysis7541 2 หลายเดือนก่อน

    Thank you for this video. It helped.

  • @sabalunax
    @sabalunax 8 หลายเดือนก่อน

    You are a genius! Your video helped me a lot! :)

  • @ketanbutte3497
    @ketanbutte3497 ปีที่แล้ว

    great video..
    Some doubts-
    1. Whats the intuition behind the baysian intuition. How the genders were assigned for missing places?
    2. Can this be safely used for any categorical data which has missing values??
    Again great work...

  • @ekpenyongokpo4900
    @ekpenyongokpo4900 7 หลายเดือนก่อน

    Thank you, for the great tutorial

  • @machinelearningplus
    @machinelearningplus  ปีที่แล้ว

    1. It uses the MICE algorithm to impute missing data. Consuder checking out the previous video on MICE: th-cam.com/video/BjyUbk258o4/w-d-xo.html
    2. Mostly yes

  • @MaximoTartaglia
    @MaximoTartaglia 8 หลายเดือนก่อน

    Thank you, great video!! But shouldn't you use one hot enconding instead of label encoding because it is a nominal cat variable?

    • @machinelearningplus
      @machinelearningplus  8 หลายเดือนก่อน

      The idea is to convert it to numeric column, which can be done using labelencoder itself. Since, there are only 2 categories in gender, it shouldn't really matter if you want to use one hot encoding

    • @ssffyy9401
      @ssffyy9401 8 หลายเดือนก่อน

      ​@@machinelearningplus Thank you for the clarification. I have a question regarding the use of encoding techniques in machine learning. Typically, we opt for one-hot encoding over label encoding for nominal categorical variables to avoid implying any inherent order or hierarchy to the model during prediction tasks. However, in a scenario where the dataset includes a nominal categorical column with more than two classes and the purpose is not for ML prediction but for imputation to address missing values, would employing label encoding to prepare the dataset for the imputer potentially mislead the imputation process?

    • @machinelearningplus
      @machinelearningplus  8 หลายเดือนก่อน

      Thanks for the question. Yes, I would think so. Especially since more than 2 categories are involved

  • @aminvahdati9476
    @aminvahdati9476 ปีที่แล้ว

    what is the difference of this video with Mode imputation? it did the same thing with long codes

    • @shankars4384
      @shankars4384 ปีที่แล้ว +1

      simulation studies suggests that mean imputation is possibly the worst missing data handling method available. this is from research papers. MICE method is like a one size fits all approach and much better. mode imputation messes with bias and variance and screws up the model.

  • @saurabhsonawane7110
    @saurabhsonawane7110 7 หลายเดือนก่อน

    Life saver!