Handle Missing Values: Imputation using R ("mice") Explained

  • Published 22 Aug 2024

Comments • 159

  • @asukatakazawa9967
    @asukatakazawa9967 3 ปีที่แล้ว +4

    I am totally new to MICE imputation and searched for clues on the internet but failed. However, your video was PERFECT and now I totally understand how it works. Love the work you've done here 👍

  • @rajujha5225
    @rajujha5225 3 ปีที่แล้ว +4

    Channels don't need to have a thousand subscribers, good content like this is sufficient. Thanks!

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      Thank you my friend. Subscribers are important for making more content like this.. 😆

  • @annap9782
    @annap9782 3 ปีที่แล้ว +13

    The point of multiple imputation is to perform the analyses in each imputed dataset and then to pool the results. If you just want to work with one dataset, it would be better to use VIM or a similar package to perform single imputation.

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      Thanks for your comment. You can do that in mice as well, like below:
      1) mice() gives you a mids object.
      2) with() applies a statistical model to that mids object; the result is a mira object.
      3) pool(mira) combines the results together.
      Having said that, I still find this way simple and very effective... agreed that we can use VIM as well..
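
      A minimal sketch of that impute -> analyse -> pool workflow, using the nhanes data that ships with mice (the model formula and the m/maxit/seed settings here are illustrative, not the exact code from the video):
      library(mice)
      imp    <- mice(nhanes, m = 5, method = "pmm", maxit = 20, seed = 123)  # 1) mids object
      fits   <- with(imp, lm(chl ~ bmi + age))                               # 2) mira object: one fit per imputed set
      pooled <- pool(fits)                                                   # 3) pool the results (Rubin's rules)
      summary(pooled)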

  • @raihankhan4374
    @raihankhan4374 4 ปีที่แล้ว +3

    The best explanation of this I've ever come across..please make more R videos!

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว

      Thanks a lot.. more coming soon....

    • @raihankhan4374
      @raihankhan4374 4 ปีที่แล้ว +1

      @@dataexplained7305 would you happen to know two different ways to compute the upper quartile of the variable BMI? Like what is the specific syntax?

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว

      @@raihankhan4374 Try this...
      nhanes$bmi[which(is.na(nhanes$bmi))] = mean(nhanes$bmi, na.rm = TRUE)  # mean-impute the missing BMI values first
      Uppr_Quartile_A = summary(nhanes$bmi)[5]   # the "3rd Qu." element of summary()
      Uppr_Quartile_B = quantile(nhanes$bmi)[4]  # the 75% element of quantile()
      Hope this helps!
      Cheers.

    • @raihankhan4374
      @raihankhan4374 4 ปีที่แล้ว +1

      Thanks! I'll give it a go! I tried a different way earlier; it worked but feels super cheap lol, check it out:
      First Method:
      step1

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว +1

      Looks great..
      Cheers

  • @aish_waryaaa
    @aish_waryaaa 2 ปีที่แล้ว +1

    Thank you so much, sir, for a great explanation. I have my project and was worried about imputing the missing values, and this has really helped me a lot. God bless you.

  • @GGlessGo
    @GGlessGo 4 ปีที่แล้ว +1

    22:14 saved the experiment for my paper! I still do not understand how to fit all the imputed datasets in one model, but at least I got something! Much love.

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว +1

      Thanks so much for the support.
      You can choose the best dataset on statistical grounds, e.g. by regression or by comparing means across the imputed datasets; that gives you one complete dataset whose values replace the missing ones. Or, with a few lines of code, you can pick and choose values from all 5 imputations, whichever you think are closest to the missing values.
      Check this out..
      th-cam.com/video/_ymR-FFG44c/w-d-xo.html
      Stay tuned... more coming soon !!

  • @richardstj999
    @richardstj999 2 ปีที่แล้ว +5

    The variable age is also categorical, with categories 1="20-39", 2="40-59", and 3="60-99", which are already coded in the data frame nhanes2, also in the mice package. This does not detract from your very helpful explanations! I am really concerned about imputation of mixed datasets, with both continuous and categorical variables. There are many journal papers in medicine where the authors say they used multivariate normal imputation, and I always wonder how they could possibly handle missing data in categorical variables using that method. The point is that they cannot, and they did not.

  • @TheSeynana
    @TheSeynana 3 ปีที่แล้ว +2

    Very good video with in-depth explanations!

  • @willykitheka7618
    @willykitheka7618 3 ปีที่แล้ว +1

    You've done a super job explaining the content so well that I subscribed! Thanks for sharing!

  • @haraldurkarlsson1147
    @haraldurkarlsson1147 3 ปีที่แล้ว +1

    The information on the nhanes variables is readily available in the package section of RStudio. There it states that hyp = 1 is for 'no' and hyp = 2 is for 'yes'. Might have been better to convert them to 0 (for no) and 1 (for yes).
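
    For anyone who wants to follow that suggestion, a small sketch (working on a copy so the packaged data stays untouched; the labels follow the documentation quoted above):
    library(mice)
    dat <- nhanes
    dat$hyp <- factor(dat$hyp, levels = c(1, 2), labels = c("no", "yes"))  # or use dat$hyp - 1 for a 0/1 coding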

  • @aminusuleiman7924
    @aminusuleiman7924 4 หลายเดือนก่อน

    Thank you, very well explained. Appreciate it

  • @KiyoPenspinning
    @KiyoPenspinning 3 ปีที่แล้ว +2

    That is so valuable. Thank you for creating this video!

  • @josephogle6449
    @josephogle6449 ปีที่แล้ว +1

    Good explainer re: using mice. However, in addition to @annap9782's comment, I'd also flag that finding that your imputed data is distributed similarly to the observed data is not a meaningful test of imputation performance. If we had a (conditionally) ignorable missingness mechanism (i.e., MAR), it may even be expected that these distributions will differ, without this saying anything about the performance of a given imputation.

  • @rauceliovaldes1495
    @rauceliovaldes1495 3 ปีที่แล้ว +5

    Thank you very much.
    Even without speaking English, I understood!!!! Congratulations.

  • @erixyz
    @erixyz 3 ปีที่แล้ว +1

    this was so well explained and to the point. thank you for your knowledge.

  • @krishnaswamyc9034
    @krishnaswamyc9034 3 ปีที่แล้ว +2

    This is great content. Thanks for patiently explaining.

  • @v.m.3748
    @v.m.3748 4 ปีที่แล้ว +3

    Ohhh, now I got it. Thank you!!

  • @robertjones3727
    @robertjones3727 3 ปีที่แล้ว +2

    I really enjoyed your video! You selected the 5th column by comparing the original mean to the column of imputed values, which is practical for a small data set. In cases where there are 100s or 1000s of imputed values, what are the steps for calculating the mean of each column of imputed values, and comparing those results to the original mean, to then select the best column of imputed values?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      You can run summary() on each column of imputed values, e.g. summary(my_imp$imp$bmi[[1]]), and compare it with summary() of the observed column in the source data, then repeat for the second column, and so on.. or simply use the mean() function..

    • @robertjones3727
      @robertjones3727 3 ปีที่แล้ว +1

      @@dataexplained7305 Excellent - Thank you !!!
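
      A compact sketch of that comparison for larger data, assuming the mids object is called my_imp as in the video and bmi is the variable of interest:
      library(mice)
      my_imp    <- mice(nhanes, m = 5, method = "pmm", maxit = 20, seed = 1)
      obs_mean  <- mean(nhanes$bmi, na.rm = TRUE)        # mean of the observed bmi values
      imp_means <- sapply(my_imp$imp$bmi, mean)          # mean of each of the m columns of imputed values
      best      <- which.min(abs(imp_means - obs_mean))  # imputation whose mean is closest to the observed mean
      complete(my_imp, action = best)                    # the corresponding completed dataset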

  • @waelhussein4606
    @waelhussein4606 3 ปีที่แล้ว +3

    Nicely explained. However, you should not just pick one set after your multiple imputation. You run the analysis on all generated datasets and produce a pooled analysis.

    • @lesliezhen4256
      @lesliezhen4256 3 ปีที่แล้ว +2

      Agreed. Something like this:
      my_imp = mice(input_dt2, m = 5, method = c("", "pmm", "logreg", "pmm"), maxit = 20)  # create several sets of imputed data as seen in the video
      my_analysis_model

    • @paigecox347
      @paigecox347 2 ปีที่แล้ว

      @@lesliezhen4256 Hi, I wonder if you could expand on that second line of code there for me? The one which fits the model to each set of imputed data. What do I put in the brackets of model( ) ?

    • @lesliezhen4256
      @lesliezhen4256 2 ปีที่แล้ว +1

      @@paigecox347 The part of the 2nd line that reads model(...) is a placeholder for whatever model you are actually fitting. For example, if you are fitting a mixed-effects model, you would replace model(...) with something like lmer(outcome ~ 1 + variable1 * variable2 + (1|subject)); inside with(), each imputed dataset is supplied automatically, so you don't pass data= yourself.

    • @paigecox347
      @paigecox347 2 ปีที่แล้ว

      @@lesliezhen4256 Thanks Leslie. I think for my data I won't be able to do this, as it's only my dependent variables, held in a separate data set, that need to be imputed and then averaged.
      My independent variables are in a different format (so I can't build the model from the imputed data set in the traditional way).
      Thanks for the help though 👍

  • @vg7181
    @vg7181 4 ปีที่แล้ว +2

    Incredible brother!! Very well explained 👍

  • @eyadha1
    @eyadha1 ปีที่แล้ว +1

    great video. thank you very much.

  • @uvsiblings794
    @uvsiblings794 3 ปีที่แล้ว +2

    Really very good content, but I have a doubt: here we have taken the summary of bmi, and based on the mean value we have selected the 5th column, because its values are near the mean. Why are we considering only the bmi summary, and not the chl summary, to select the nearest column? Thanks in advance.

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว +1

      Sorry for the delay.. in case you still have this question: the baseline is that the NA values need to be replaced with the best possible values, based on our judgement... the distribution of the target variable can be compared with that of the imputed datasets, and, like you said, each column can be imputed using separate imputed datasets rather than one common dataset.. whichever variable you choose as the output for your analysis can be used to compare distributions with the final datasets. Hope this helps.
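
      mice also ships lattice-based diagnostics that make this observed-versus-imputed comparison visual rather than value-by-value; a brief sketch (my_imp is the mids object):
      library(mice)
      densityplot(my_imp)          # observed vs. imputed densities for each imputed variable
      stripplot(my_imp, pch = 20)  # observed vs. imputed points, one panel per variable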

    • @paigecox347
      @paigecox347 2 ปีที่แล้ว

      @@dataexplained7305 Hi, was wondering if you could help me with a similar query.
      I have 20 continuous variables in my dataset which I need to impute missing data for.
      In this case how do I impute each column using separate datasets? Do I need to do these steps separately for 20 different datasets? And how would I then combine all those columns at the end into one complete data set?
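
      A hypothetical sketch for the question above, with your_data standing in for the 20-variable data frame: mice imputes every column that has missing values in a single call, so separate runs per column are not needed, and complete() returns one filled-in data frame.
      library(mice)
      imp    <- mice(your_data, m = 5, method = "pmm", maxit = 20, seed = 1)  # pmm suits continuous variables
      filled <- complete(imp, action = 1)  # one completed dataset (or analyse all m sets and pool)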

  • @mattm1152
    @mattm1152 ปีที่แล้ว +1

    Great tutorial! Thank you!

  • @sahelmoein4377
    @sahelmoein4377 ปีที่แล้ว +1

    Thank you for this tutorial video. It was really helpful. Is there any specific rule for the number of imputed datasets (in your case, 5) we should choose?

    • @dataexplained7305
      @dataexplained7305  ปีที่แล้ว

      Thanks. I selected based on the correlation and closeness of the imputed data to the original set.. there are a couple of ways to do this. Some don't choose one dataset for substitution but just use all of them for the analysis…

  • @ilaydavelioglu3677
    @ilaydavelioglu3677 3 ปีที่แล้ว +1

    Thank you so much for your effort, love this explanation

  • @fabianoborges
    @fabianoborges 3 ปีที่แล้ว +1

    Thank you for this material!

  • @camillesantos4953
    @camillesantos4953 3 ปีที่แล้ว +2

    Thank you for the in depth explanation of the MICE. One question though, I understand the first step to MICE is a simple imputation but what is the point of doing so if in the Mice Imputation command the original data set (input_dt2=nhanes) was used, in addition to checking it against its mean?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      Sorry for the delay... I'm just making a copy of the nhanes dataset into another variable because I don't want to alter the original dataset, in which case I'd need to reinstall the library.. just being lazy.. lol

  • @heralvyas27
    @heralvyas27 3 ปีที่แล้ว +1

    Very well explained, Senthil!!
    I have a question though. When we do the mice imputation, we get 5 datasets, and in this case you chose the one closest to the mean value of BMI. Here the number of NAs was very small, so you could look through the data and decide which dataset to use.
    What happens in a real-life situation when the number of NAs to be replaced is high? How do we decide which dataset to use then? If the number of NAs is huge, manually going through the datasets and deciding which one to use would be a cumbersome task, right?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว +1

      Thanks Heral.. you can get the distribution of each of these datasets as well, like below..
      summary(my_imp$imp$bmi[[1]]) .. Let me know how it goes..

  • @datascientist2958
    @datascientist2958 3 ปีที่แล้ว +1

    Did you compare the imputed dataset with the mean of the raw data, or with the pooled data distribution?

  • @anushpetrosyan2535
    @anushpetrosyan2535 4 ปีที่แล้ว +1

    Great explanation, thank you!

  • @dielikerin
    @dielikerin ปีที่แล้ว +1

    Do you know which author this multiple imputation method is from? I have to cite the method used in my article.
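
    One way to find the reference is to ask R itself; citation("mice") prints the package's canonical reference:
    citation("mice")
    # van Buuren, S. & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation
    # by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67.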

  • @jimpul1001
    @jimpul1001 3 ปีที่แล้ว +1

    Thanks man, this was very useful

  • @anuradhasaikia9305
    @anuradhasaikia9305 2 ปีที่แล้ว +1

    The missing values in my dataset are not marked as NA but left blank. Will there be an issue?
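
    mice only recognises NA as missing, so blank cells have to be converted first; a minimal sketch (file and object names are placeholders):
    dat <- read.csv("mydata.csv", na.strings = c("", "NA"))  # treat blanks as NA while reading
    dat[dat == ""] <- NA                                     # or convert blanks in an already-loaded data frame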

  • @divyasukumar7324
    @divyasukumar7324 4 ปีที่แล้ว +1

    Thanks for the detailed explanation...

  • @Icecube88
    @Icecube88 2 ปีที่แล้ว +1

    Is there a better way to choose which column is closest to your mean instead of eyeballing it? Like, what if you had over 100 rows?

    • @dataexplained7305
      @dataexplained7305  2 ปีที่แล้ว +1

      Summarize your output set... that is the best way..

    • @Icecube88
      @Icecube88 2 ปีที่แล้ว +1

      @@dataexplained7305 thanks. that worked .

  • @enriquecguerra
    @enriquecguerra 2 ปีที่แล้ว +1

    Thank you very much!

  • @anuradhasaikia9305
    @anuradhasaikia9305 2 ปีที่แล้ว +1

    I am getting this error after entering the mice function: Error in str2lang(x) : :1:27: unexpected symbol
    1: Company ~ Year+Contingent liabilities

    • @dataexplained7305
      @dataexplained7305  2 ปีที่แล้ว

      Sorry, I missed it somehow.. Hope you have fixed the code by now.. if not, please send your code to dataandyou@gmail.com
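
      That str2lang error usually means a column name is not a syntactically valid R name (a space, as in "Contingent liabilities", or a leading digit), which breaks the formulas mice builds internally; a minimal sketch of the usual fix, with your_data as a placeholder:
      names(your_data) <- make.names(names(your_data))  # e.g. "Contingent liabilities" -> "Contingent.liabilities"
      imp <- mice(your_data, m = 5, maxit = 20)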

  • @anamoreno8406
    @anamoreno8406 2 หลายเดือนก่อน

    What if you have over 50,000 variables? When I try to use these functions, it's not possible to call out a specific variable.

  • @ShreyasMeher
    @ShreyasMeher 3 ปีที่แล้ว +1

    I am getting this error - Error: $ operator is invalid for atomic vectors. I am using a panel data set.
    Any idea on what I should do?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      Hi,
      Not sure if you have figured it out yet but i found something real quick for you..
      stackoverflow.com/questions/23299684/r-error-in-xed-operator-is-invalid-for-atomic-vectors

  • @manojthanu
    @manojthanu 3 ปีที่แล้ว +1

    nice explanation .. thank you

  • @seanmain5279
    @seanmain5279 3 ปีที่แล้ว +1

    This was great, thank you

  • @tsehayenegash8394
    @tsehayenegash8394 2 ปีที่แล้ว +1

    If you know of MATLAB code for MICE, please let me know.

  • @datascientist2958
    @datascientist2958 3 ปีที่แล้ว +1

    Sir, can we use pmm for nominal features which have 4 categories, or do we have to reduce the cardinality?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      Yes, pmm would be a good choice.. Make sure to convert them to a factor before you apply it..

  • @ranu9376
    @ranu9376 2 ปีที่แล้ว +1

    Amazing!

  • @Pancho96albo
    @Pancho96albo 2 ปีที่แล้ว +1

    thx mate

  • @kinleytshering1934
    @kinleytshering1934 3 ปีที่แล้ว +1

    Substituting the mean for NA values will accumulate error...... I still cannot agree... say I have data 1, 10, 10000, 50, 20, NA, NA, NA, 3, etc.... please explain, is it justified?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      Thanks for your comment... you are correct. Mean/mode substitution will only work in certain cases, when the distribution is tight rather than loose like the one you specified.. that's why we get estimates using mice.. watch the second half of the video as well and let me know what you think..

  • @lanredaodu945
    @lanredaodu945 หลายเดือนก่อน

    Please, how can I solve this error?
    iter imp variable
    1 1 df.IN_Ratio df.APTT_Ratio df.Clauss_Fibrnogen_level df.D_Dimer_Level df.Serum_Urea
    Error in solve.default(xtx + diag(pen)) :
    system is computationally singular: reciprocal condition number = 8.02788e-23
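
    A "computationally singular" system usually points to highly correlated (collinear) predictors in the imputation model; one common remedy is a sparser predictor matrix via quickpred(), sketched here with df standing in for the data frame from the error message:
    library(mice)
    pred <- quickpred(df, mincor = 0.2)                        # keep only predictors reasonably correlated with each target
    imp  <- mice(df, predictorMatrix = pred, m = 5, maxit = 20)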

  • @p3drito
    @p3drito 3 ปีที่แล้ว +1

    Is there a way to only impute certain variables with the "mice"-command? I'm looking for a way to specifically include certain predictor and auxiliary variables to my imputed model, since I am only working with a subset of variables of a bigger dataset. Thanks in advance.

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      Thanks for your comment.
      The answer for your question is YES.
      Go to the 14:50 mark of this video. Whichever column you don't want to impute, you simply leave its entry empty ("") in the method argument. For the others, you specify whatever imputation method you would like to use.
      Hope this helps

    • @christiaanscholten64
      @christiaanscholten64 3 ปีที่แล้ว +1

      Yes, you skip variables by choosing method "" for the feature you don't want to be imputed (like in the video, in the case of the age feature).
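
      A sketch of that per-column control, including the predictorMatrix, which also lets you pick which variables act as predictors without being imputed themselves (the column choices here are purely illustrative):
      library(mice)
      ini  <- mice(nhanes, maxit = 0)  # dry run: gives the default method vector and predictor matrix
      meth <- ini$method
      meth["hyp"] <- ""                # "" = do not impute this column
      pred <- ini$predictorMatrix
      pred[, "chl"] <- 0               # drop chl as a predictor for the other variables
      imp  <- mice(nhanes, method = meth, predictorMatrix = pred, m = 5, seed = 1)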

  • @camillesantos4953
    @camillesantos4953 3 ปีที่แล้ว +1

    also, what method do you suggest for ordinal variables??

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      My apologies for the delay.. Check this published paper, when you get a chance..
      www.researchgate.net/publication/326435546_Missing_Data_Imputation_for_Ordinal_Data
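
      Within mice itself, ordered factors are handled by the proportional-odds method "polr" (the default for ordered factors with more than two levels); a brief sketch with a hypothetical ordered column called severity:
      library(mice)
      dat$severity <- factor(dat$severity, ordered = TRUE)  # declare the variable as an ordered factor
      meth <- make.method(dat)                              # default methods; "polr" is picked for ordered factors
      imp  <- mice(dat, method = meth, m = 5, seed = 1)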

  • @MrAli2200
    @MrAli2200 3 ปีที่แล้ว +1

    Why did you choose the fifth column please ?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      If you look at it, the mean of the original set is closest to the distribution of the 5th output set.

  • @BenjaminLiraLuttges
    @BenjaminLiraLuttges 4 ปีที่แล้ว +1

    Up to what percentage of the data can be imputed? Any references?

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว

      Thanks for your comment. I suggest keeping the outcome variable at least 75% complete for better predictions. Other auxiliary variables can be a bit less complete, depending on whether they are continuous, categorical, ordinal, etc. Bottom line: if any variable is less than 60% complete, think about deletion, e.g. pairwise or listwise...

    • @BenjaminLiraLuttges
      @BenjaminLiraLuttges 4 ปีที่แล้ว +1

      @@dataexplained7305 Thanks!
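
      A quick way to check how complete each variable actually is before deciding (a sketch on the nhanes example data):
      library(mice)
      colMeans(is.na(nhanes))       # proportion of missing values per column
      mean(complete.cases(nhanes))  # proportion of fully observed rows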

  • @ZeeNoorTrip
    @ZeeNoorTrip 2 ปีที่แล้ว +1

    Hi. Can you please let me know how to impute the missing values if I have 172 variables and need to impute all of them?

    • @dataexplained7305
      @dataexplained7305  2 ปีที่แล้ว

      It depends on the percentage of missing values.. if it is high, it might not make sense to impute, due to the loss of the natural value in the data.

    • @ZeeNoorTrip
      @ZeeNoorTrip 2 ปีที่แล้ว +1

      @@dataexplained7305 Thanks.
      But the data I have is 4% missing, and in different areas, not including the first variable.

    • @dataexplained7305
      @dataexplained7305  2 ปีที่แล้ว

      Sounds possible.. 👍

  • @bic5004
    @bic5004 4 ปีที่แล้ว +1

    Hey! Thanks for the video. I am getting this error:
    error in str2lang(x) : :1:5: unexpected Symbol
    1: 5531Atrialappendectomy
    ^
    I don't know what to do; I tried to delete that column, but then it gives the same error for the header of the next column! Anyone know how to solve it?

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว

      Thanks for your comment. Looks like a simple one around the syntax (a column name starting with a number). Hope you have fixed it by now..

  • @brazilfootball
    @brazilfootball 3 ปีที่แล้ว

    Is it still ok to match the 5th choice with our original dataset's mean if we have non-normal data? Why?

  • @christiansniper2
    @christiansniper2 3 ปีที่แล้ว +1

    Can anyone explain why he chose these methods in the mice function?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว +1

      Depending on the variable type, there are a bunch of method options to choose from.
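
      Concretely, mice's built-in defaults are pmm for numeric variables, logreg for binary factors, polyreg for unordered factors, and polr for ordered factors; a dry run shows which method each column would get (a sketch on the packaged nhanes2 data, where age and hyp are factors):
      library(mice)
      ini <- mice(nhanes2, maxit = 0)
      ini$method  # e.g. pmm for bmi and chl, logreg for hyp, "" for age (no missing values)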

  • @datascientist2958
    @datascientist2958 3 ปีที่แล้ว +1

    How can we extract the pooled imputed dataset from RStudio?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      yes.. you can try..
      pool(with(imp_function, lm(chl ~ bmi + age)))
      and to understand the results..
      www.rdocumentation.org/packages/mice/versions/2.8/topics/pool

  • @zoiyaehtisham818
    @zoiyaehtisham818 3 ปีที่แล้ว +1

    Can you explain Hmisc in R? Or can we do imputation through mice when analysing categorical variables?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว +1

      Re: Hmisc.. I will upload a video.. but yes, you can impute categorical variables..

    • @zoiyaehtisham818
      @zoiyaehtisham818 3 ปีที่แล้ว +1

      @@dataexplained7305 What do we have to do for categorical variables, simply follow the same steps, or do we have to add another step?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว +1

      @@zoiyaehtisham818 Yeah.. you will just need to apply a categorical method to that feature/column.. if you watch the video, you will see I applied logreg to the hyp column... hope this helps..

    • @zoiyaehtisham818
      @zoiyaehtisham818 3 ปีที่แล้ว

      @@dataexplained7305 Thank you so much. I have data with 528 obs. of 165 variables, all of them binary (categorical) variables, imported from SPSS into R. I have to do the analysis after the imputation and I'm short on time; I've been learning R for 3 months and have still failed to impute the data. I applied the algorithm to my data but this error came up:
      Error in formula.character(object, env = baseenv()) :
      invalid formula "scale#1 ~ 0+gender+residence+age+religion+mothertounge+cultralbackground+academicbackground+occupation+income+ty( and so on)
      Can you advise me what I should do? When I was doing the imputation in SPSS, the MAXMODELPARAM error occurred, so I switched to R, and I am still unable to do it. I'll be very thankful if you have any idea and can tell me.

  • @GenAITutorials
    @GenAITutorials 3 ปีที่แล้ว +1

    Good Job.

  • @gatechmjm39
    @gatechmjm39 2 ปีที่แล้ว +1

    Do you have your syntax for this uploaded anywhere that I could just review your entire syntax for this video and compare it to my code?

    • @dataexplained7305
      @dataexplained7305  2 ปีที่แล้ว

      Can you check the video itself.. I don't have the code locally..

    • @gatechmjm39
      @gatechmjm39 2 ปีที่แล้ว

      @@dataexplained7305 Yes, thank you. I just didn't know if you have it exported on a notepad file or something.

  • @jaygomez2320
    @jaygomez2320 3 ปีที่แล้ว +1

    Hello! I have a dataset with 475 rows where the NAs are in categorical variables (factors with 4-13 levels); these variables have 2-3 missing values each. Can I use the mode (most frequent value) for imputation?
    Also, there are 2 variables with categorical data (factors with 12 levels) where the percentages of missing values are 18% and 25%. I think these variables are important; how can I fill them in? Thank you!

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      Hi Jay, thanks for your comment. My quick comments below.
      First case: either the mode or a categorical method like logreg will not make a big difference, so feel free to use either..
      Second: use polyreg for this, as the factors have more than 2 levels. It is also the default method for these kinds of variables.

    • @jaygomez2320
      @jaygomez2320 3 ปีที่แล้ว +1

      @@dataexplained7305 Hello sir, thank you for your response. I have just filled the missing categorical variables with their mode.
      On my second question: my dataset (475 observations) has 11 variables, and 3 of those have a lot of missing data (17%, 25%, 27%). When using mice (polyreg), should I include all 11 variables in one go to fill in all the missing data, or should I include only the variables that I think have an effect on the missing values?
      CODE: try

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      @@jaygomez2320 Not a problem. Impute only the ones that have missing values; leave the others as they are...

    • @jaygomez2320
      @jaygomez2320 3 ปีที่แล้ว +1

      @@dataexplained7305 so that code will do?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว

      @@jaygomez2320 Yes. That's right.

  • @GuisseppeVasquezV
    @GuisseppeVasquezV 4 ปีที่แล้ว +1

    Thanks for your video!! But I don't know if you could explain the other parameters of the mice package.. Greetings.

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว +1

      Anything specific that you are looking for ?

    • @GuisseppeVasquezV
      @GuisseppeVasquezV 4 ปีที่แล้ว +1

      Yes, the "seed" parameter: is it important to set a specific number, and if so, what number do I have to use?.. Also, you said I have to pick the imputed set that looks closest in mean, but is there another, more statistical method to choose the correct one?... And also, thank you so much. Greetings.

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว +1

      Thanks for the question. Highly appreciate it.
      SEED: I suggest you tune your iterations (ideally 20 to 40) and also your "m" value. The seed parameter is not required by mice(), as per the R documentation. Having said that, you can leave seed as NA for random number generation, or set a fixed number if you want the imputation to be reproducible.
      OUTPUT_PICK: the one I showed above is a mean-based pick. You can also regress (linear/logistic, for example) your variables for a closer pick. However, considering the length of the explanation involved, that should be a separate video, I guess.
      Cheers.

    • @GuisseppeVasquezV
      @GuisseppeVasquezV 4 ปีที่แล้ว

      @@dataexplained7305 Thanks so much for the explanation, and I hope you will bring us a new video with that explanation. Greetings.

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว +1

      Stay tuned ! I will do one soon..

  • @rohitnath5545
    @rohitnath5545 3 ปีที่แล้ว

    How can you just pick one imputation? It must be pooled.

  • @pabitrapradhan721
    @pabitrapradhan721 4 ปีที่แล้ว +1

    Bro, what about copula-based imputation?

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว +1

      Thanks for your comment. I will make a video on the CoImp() soon... stay tuned..

    • @pabitrapradhan721
      @pabitrapradhan721 4 ปีที่แล้ว

      Plz bro

  • @Jas-ti7hr
    @Jas-ti7hr 3 ปีที่แล้ว

    Hey, thank you for the video! It was very informative!
    Is it possible that the table (my_imp$imp$bmi) could show different results because of the random multiple iterations?

    • @dataexplained7305
      @dataexplained7305  3 ปีที่แล้ว +1

      This can show you the specified number of imputed datasets. E.g., if you selected 5 like in the video, you will see five different imputed sets. You can also check my_imp$chainMean, which shows the chained means computed over the imputation iterations that you specify... let me know if this helps..

    • @Jas-ti7hr
      @Jas-ti7hr 3 ปีที่แล้ว

      @@dataexplained7305 thank you for the prompt reply! If I’m not mistaken, the chained mean is identical!
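
      For checking that the chains have actually converged, rather than comparing single runs, a short sketch (my_imp is the mids object):
      my_imp$chainMean  # chained means per imputed variable, iteration, and chain
      plot(my_imp)      # trace plots; well-mixed, trend-free chains suggest convergence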

  • @Icecube88
    @Icecube88 2 ปีที่แล้ว

    What if you have categorical variables?

    • @Icecube88
      @Icecube88 2 ปีที่แล้ว +1

      Never mind, this handles categorical as well as numeric.

    • @dataexplained7305
      @dataexplained7305  2 ปีที่แล้ว +1

      Yes.. you just need to choose the appropriate functions to handle the categorical columns...

    • @Icecube88
      @Icecube88 2 ปีที่แล้ว

      @@dataexplained7305 Yeah, I didn't realize that in the first part of the video you weren't using mice, just a simple imputation; in the second part of the video you use mice. Thanks for the video.

  • @idealtube281
    @idealtube281 4 ปีที่แล้ว

    It's a great presentation, but it always says "my_imp" is not found.... how do I get past this step?
    I've put the error I faced below, thanks a lot:
    summary(input_dta3$faminc)
    Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
    4.0 50.0 72.0 134.4 150.0 3000.0 494
    > my_imp$imp$faminc
    Error: object 'my_imp' not found
    > my_imp=imp$faminc
    Error: object 'imp' not found
    > my_imp$imp$faminc
    Error: object 'my_imp' not found...... this is the problem with me..

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว

      Thanks for your comment.
      Two things here. 1) Make sure you have executed the line my_imp = mice(....)
      2) If this still doesn't help, paste the code below with the important details taken off.. just the code flow..

    • @idealtube281
      @idealtube281 4 ปีที่แล้ว +1

      @@dataexplained7305 I also executed it before, like below:
      my_imp=mice(input_dta3,m=5,method=c("","pmm","logreg","pmm"),maxit=20)
      Error: Length of method differs from number of blocks
      > summary(input_dta3$faminc)
      Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
      4.0 50.0 72.0 134.4 150.0 3000.0 494
      > my_imp$imp$faminc
      Error: object 'my_imp' not found
      > my_imp$imp$faminc
      Error: object 'my_imp' not found
      This is what I executed.

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว

      @@idealtube281 Check that the length of the method vector you supplied matches the number of columns in your data. Still not working? Then email me the code at dataandyou@gmail.com and I will check..

  • @user-qh6vz6cx8n
    @user-qh6vz6cx8n 3 หลายเดือนก่อน

    the jeet imputatoor

  • @thyowen
    @thyowen 27 วันที่ผ่านมา

    Why do none of these videos EVER explain how to appropriately handle missing values? I know how to find missing values, I know how to count them, I know how to impute them or remove them. What's NEVER explained is how to handle them APPROPRIATELY. It's so infuriating.

  • @djangoworldwide7925
    @djangoworldwide7925 3 ปีที่แล้ว +1

    22 minutes that could have been summed up in 5.

  • @Hari-888
    @Hari-888 4 ปีที่แล้ว +1

    stop saying 'right' all the time. I literally had to mute the audio and use CC captions because of that.

    • @dataexplained7305
      @dataexplained7305  4 ปีที่แล้ว

      Lol... will try to fix it next time. Stay tuned though..

  • @kaushikdujo
    @kaushikdujo 3 ปีที่แล้ว +1

    unnecessarily stretched

  • @kaushikdujo
    @kaushikdujo 3 ปีที่แล้ว

    very basic .. this video could have been made in 5 mins

  • @domivan2581
    @domivan2581 3 ปีที่แล้ว +1

    Well explained. Thanks for this!