Handling Missing Values using R

  • Published on 22 Aug 2024

Comments • 194

  • @samsontan1141
    @samsontan1141 4 years ago +2

    You are a life saver
    Dr. Bharatendra Rai. Thank you.

    • @bkrai
      @bkrai 4 years ago +1

      Thanks for comments!

  • @rajlatte9131
    @rajlatte9131 4 years ago +3

    This was the best explanation that I have heard since my DS journey, Now I can confidently deal with missing values in R.. Kudos to you Bharat Sir, much appreciated :)

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @flamboyantperson5936
    @flamboyantperson5936 6 years ago +11

    Such a nice explanation Sir. This was one of the most awaited lecture. Thank you so much for such a nice explanation.

    • @bkrai
      @bkrai 6 years ago

      Thanks for comments!

  • @dileep3549
    @dileep3549 6 years ago +3

    Thanks a ton sir , your videos are very helpful . You teach subject very nicely .

    • @bkrai
      @bkrai 6 years ago

      Thanks for comments!

  • @niteshranjan5033
    @niteshranjan5033 2 years ago +1

    Really sir It's very helpful to us. No one can explain these things like you

    • @bkrai
      @bkrai 2 years ago

      Thanks for comments!

  • @asterlookanalytics9853
    @asterlookanalytics9853 1 year ago +2

    You have taught me most of the things. Actually you introduced me to machine learning. Greatly fantastic videos. Be blessed

    • @bkrai
      @bkrai 1 year ago

      Great to hear!

  • @bassamal-kaaki3253
    @bassamal-kaaki3253 4 years ago +1

    You are such a wonderful prof. I love the way you handle things with ease and without confusions. You are the best.

    • @bkrai
      @bkrai 4 years ago

      Thank you! 😃

  • @abhishek894
    @abhishek894 2 years ago +1

    What an awesome video. You make everything so easy. Thank you once again Dr. Rai.

    • @bkrai
      @bkrai 2 years ago

      You are welcome!

  • @delt19
    @delt19 6 years ago +4

    I've been wondering how to impute data and as always you make it seem very easy. Would be interested in seeing a tutorial on how to handle outliers in a data set prior to training a model.

    • @bkrai
      @bkrai 6 years ago

      Thanks for feedback and suggestion. I've added it to my list.

    • @sebastianvarela2190
      @sebastianvarela2190 5 years ago

      @bkrai Where is your video on handling outliers? I can't find it in your list... Thanks in advance!

    • @atulsaurabh8
      @atulsaurabh8 4 years ago

      @bkrai I cannot find the tutorial on outlier treatment in R. Could you please share the link, Sir?

  • @vishalialahappan9069
    @vishalialahappan9069 5 years ago +2

    Thank you so much, Sir! Best tutorial channel for learning data science with R

    • @bkrai
      @bkrai 5 years ago

      Thanks for your comments!

  • @fatimasadjadpour4845
    @fatimasadjadpour4845 4 years ago +1

    Dear Professor Rai,
    You have super useful videos for every subject!
    Many Thanks

    • @bkrai
      @bkrai 4 years ago

      Glad to hear that!

  • @ravikumar-rz8uu
    @ravikumar-rz8uu 6 years ago +2

    Sir, You are explaining very well Data Science Concepts ,Thank You..

    • @bkrai
      @bkrai 6 years ago

      Thanks for comments!

  • @skfunnext
    @skfunnext 6 years ago +1

    Simple and easy explanation. Requesting you to please upload a video on different methods of imputation when the majority of predictors are categorical, if possible. Thanks, Sunil

    • @bkrai
      @bkrai 6 years ago

      Thanks for comments and suggestion. I've added it to my list.

    • @skfunnext
      @skfunnext 6 years ago

      Hi Sir - I have a dataset from a competition. If you allow, can I send it to you so you can make a video on imputation with categorical predictors? Please share your email id - sangasunil@gmail.com

    • @skfunnext
      @skfunnext 6 years ago

      Found your email ID and sent you data set - Thanks for help - Sunil

  • @sapirelmaliah9561
    @sapirelmaliah9561 5 years ago +1

    Thank you for a very clear and helpful explanation! I used your code on my data and it worked!!

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @MinecraftPhil72
    @MinecraftPhil72 3 years ago

    First of all - thank you for still answering questions two years after the release of this video!
    My question is - where is the original data file taken from, as I would like to use it in a paper and have to cite the original source.
    Thank you sir!

  • @seant7907
    @seant7907 4 years ago +1

    sir, you just explained the topic very well and understandable. I automatically pressed the subscribe button. Please do continue your work.

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @gauravyewale3986
    @gauravyewale3986 6 years ago +2

    It helped me a lot. Thanks for the video sir.Would like to see more such videos from you.

    • @bkrai
      @bkrai 6 years ago

      Thanks for the comments! Here are some links that you may find useful:
      Machine Learning videos: goo.gl/WHHqWP
      Introductory R Videos: goo.gl/NZ55SJ
      Deep Learning with TensorFlow: goo.gl/5VtSuC
      Image Analysis & Classification: goo.gl/Md3fMi
      Text mining: goo.gl/7FJGmd
      Data Visualization: goo.gl/Q7Q2A8

  • @pralhadkalkundre2651
    @pralhadkalkundre2651 5 years ago +1

    Nice Tutorial... Thoroughly understood.. Please make on outliers as well👍

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments and suggestion!

  • @sanjayursal5330
    @sanjayursal5330 4 years ago +1

    Very nicely explained and pretty in depth too.

    • @bkrai
      @bkrai 4 years ago

      Glad it was helpful!

  • @pitrodapiyush
    @pitrodapiyush 5 years ago +1

    Utmost respect for sharing the knowledge with a simple and effective presentation.

    • @bkrai
      @bkrai 5 years ago

      Thanks for your comments!

  • @sharanyamahapatra7563
    @sharanyamahapatra7563 3 years ago +1

    Beautifully explained sir, thank you for this video !!

    • @bkrai
      @bkrai 3 years ago +1

      Most welcome!

  • @maksim0933
    @maksim0933 3 years ago +1

    Such a nice music) Thank you for your lesson) Well done! Very appreciated!

    • @bkrai
      @bkrai 3 years ago

      Many thanks!

  • @elenafumagalli9044
    @elenafumagalli9044 3 years ago +1

    Very nicely explained, thank you. Can you suggest references that we could use in a paper to justify imputing NAs before running a mixed anova analysis rather than just using a lmer function that does listwise deletion? Our missing are 30% of the data and I think it is too much information to be lost...

    • @bkrai
      @bkrai 3 years ago +1

      You can go through the documentation of the package, it should provide some references.

  • @vandanaarya431
    @vandanaarya431 3 years ago

    It is wonderful sir..You have provided it for the best of the research. I am thankful to you.

    • @bkrai
      @bkrai 3 years ago

      You are welcome!

  • @haraldurkarlsson1147
    @haraldurkarlsson1147 3 years ago

    A very nice presentation of how to impute missing data. However, I was a bit disappointed in the data set you chose (vehicleMiss.csv). It lacked information. What was the source? How long a time span did it cover? What was the currency - $? Although these things seem clear, it helps to state them nevertheless. A brief introduction of the data and what it means would have been nice. Finally, with less than 1% NAs, few would bother spending a lot of time or effort imputing such data, since the effect is essentially null on any analysis outcome. Another dataset, even one of those already baked into some of the packages (such as naniar or mice), would have been more appropriate. Don't get me wrong, I appreciate the time and effort you put into this, and it is a very nice introduction to the mice package. Thanks.

  • @felipeparra2365
    @felipeparra2365 4 years ago +1

    Sir, thank you very much for that fantastic explanation, and thank you again for sharing your knowledge

    • @bkrai
      @bkrai 4 years ago

      Thanks for your comments!

  • @sureshm73
    @sureshm73 6 years ago +2

    Great explanation in an easier way, thank you so much Sir. Could you please also create a video on the best way to impute outliers?

    • @bkrai
      @bkrai 6 years ago

      Thanks for feedback and suggestion. I've added it to my list.

  • @danish9135
    @danish9135 2 years ago +1

    great explanation. Appreciation from Pakistan

    • @bkrai
      @bkrai 2 years ago

      Thanks for comments!

  • @JamesSmith-kk1yc
    @JamesSmith-kk1yc 4 years ago +1

    This is a very clear explanation and demonstration of the mice package. I will use this package from now on. thanks. What dataset did you use in your demonstration?

    • @bkrai
      @bkrai 4 years ago

      Thanks, and link to data is available in the description area.

  • @royodama7689
    @royodama7689 2 years ago +1

    Thank you so much, Prof.

    • @bkrai
      @bkrai 2 years ago

      You are very welcome!

  • @PM-st6vu
    @PM-st6vu 6 years ago +1

    Thanks for creating such intuitive video once again. Very helpful.
    Was keen to know, what is the best way to research these techniques and ending up writing such succinct codes?

    • @bkrai
      @bkrai 6 years ago

      There are several books and research papers available on each topic. Probably google itself is a good starting point to search relevant information.

  • @comredesigns6328
    @comredesigns6328 4 years ago +1

    Sir, thank you for all your videos. They have helped my learning of R on a scale beyond what words can explain. I am thankful to you at every step I take in learning these. Blessings!

    • @bkrai
      @bkrai 4 years ago

      Thanks for your feedback and comments!

  • @akd9977
    @akd9977 6 years ago +4

    Thank you. Can you please create one video to handle outlier in data

    • @bkrai
      @bkrai 6 years ago

      Thanks for feedback and suggestion. I've added it to my list.

  • @stefansms
    @stefansms 5 years ago +2

    That's such a wonderful explanation about missing data! I have bought several ML courses in Udemy, but none of them were so detailed as your video. Thank you!
    Please let me know if you have a donation method available!

    • @bkrai
      @bkrai 5 years ago +2

      Thanks for your comments! After your comment I've added donate button, however it is not necessary.

  • @abubakarmehran6052
    @abubakarmehran6052 4 years ago +1

    Can't explain u, how much i respect and love u sir! ❤

    • @bkrai
      @bkrai 4 years ago +1

      Thanks a ton!

    • @abubakarmehran6052
      @abubakarmehran6052 4 years ago

      @bkrai Sir, when I ran the md.pattern() code, my plot did not come out like yours! Please help, Sir.

  • @larbihouichi8942
    @larbihouichi8942 5 years ago +2

    Dear Bharatendra Rai,
    In multiple imputation, how do we decide which of the 3 or 5 proposed imputations is best?

    • @bkrai
      @bkrai 5 years ago

      When you do 3 imputations, you can separately try them with your prediction model and choose the one that works best.

  • @irmafatmawati2764
    @irmafatmawati2764 4 years ago +1

    great Sir. I like your explanation.

    • @bkrai
      @bkrai 4 years ago

      Thanks and welcome!

  • @NikhilKumar-hv6dv
    @NikhilKumar-hv6dv 6 years ago +1

    Explanation part is very good. I have a question, does this package perform swiftly when it comes to big data sets with multiple rows and lots of NA's? What are the other options?

    • @bkrai
      @bkrai 6 years ago

      It should work fine with bigger data sets. If your computer is faster with at least 16gb RAM, I don't foresee any issue. You can also save time with number of imputations where default is 5, but you can go lower too.

  • @melodicguitarist
    @melodicguitarist 6 years ago +2

    Very nicely explained, Sir. If I need to understand how the imputation is calculated, then I need to read the documentation to see what calculations are done in this function. Can you also name some companies working in data science and analysis with R, Python, etc.?

    • @bkrai
      @bkrai 6 years ago +1

      For each R package there is very detailed publicly available documentation that provides various functions, their details, and examples. All leading companies such as Google, Facebook, Apple, Microsoft, Twitter, etc., use data science and freely available packages such as R and Python.

    • @melodicguitarist
      @melodicguitarist 6 years ago

      Thank you Sir. Any startups you know in which a fresher can apply so that he can make his career in this stream.

    • @bkrai
      @bkrai 6 years ago +1

      You will have to find that out in your area. I live near Boston, and here there are a lot of such companies.

  • @abdulwaheedshaikh3745
    @abdulwaheedshaikh3745 5 years ago +4

    Sir, I am a great fan of you.

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @karyichia2366
    @karyichia2366 3 years ago +1

    thanks a lot for the explanation, sir.
    i have a question, p

    • @bkrai
      @bkrai 3 years ago

      Yes, it's the length of the whole data, and 100 is used to convert it into %.

    • @karyichia2366
      @karyichia2366 3 years ago +1

      Alright, i got it. Thank you sir!

    • @bkrai
      @bkrai 3 years ago

      Welcome!

  • @KayYesYouTuber
    @KayYesYouTuber 4 years ago +1

    This is beautiful. Thank you very much.

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @WhiteGhost13
    @WhiteGhost13 4 years ago +1

    Thank you so much for this video! I really appreciate it!

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @akshitbhalla874
    @akshitbhalla874 5 years ago +1

    With the same code, I got a different marginplot. It does not show 13 (but shows 8) and no numbers on axes. Also, you are a wonderful teacher.

    • @bkrai
      @bkrai 5 years ago

      You may not be seeing complete plot if the area of the 4th window is too small.

  • @p3drito
    @p3drito 3 years ago +1

    Is there a way to impute only specific columns. Say, I don't want to impute column 2-7 with the command [,2:7] but columns 2,4,8,10 etc. Can I specify these in the mice command?
    Thanks in advance!

    • @bkrai
      @bkrai 3 years ago

      You can use a subset of data before using mice. Once done, you can combine columns back.
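
The subset-then-combine approach in this reply might look like the sketch below. It is not the video's code: it uses the `nhanes` example data shipped with mice, and the column choice in `cols` is a hypothetical stand-in for "columns 2, 4, 8, 10".

```r
# Runs only if the mice package is installed
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  data <- nhanes                 # example data shipped with mice
  cols <- c("bmi", "chl")        # hypothetical columns chosen for imputation

  # Impute using only the selected columns
  imp <- mice(data[, cols], m = 3, seed = 123, printFlag = FALSE)

  # Copy the original data and write the imputed columns back
  filled <- data
  filled[, cols] <- complete(imp, 1)

  # Selected columns are filled; the column left out still has its NAs
  print(colSums(is.na(filled)))
}
```

One trade-off of this design is that columns left out of the subset are also unavailable as predictors for the imputation model.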

  • @poojabiradar600
    @poojabiradar600 5 years ago +1

    Thank you very much for creating videos... so helpful and easy to understand.

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @santosacosta4645
    @santosacosta4645 6 years ago +1

    Thank you. Could you please elaborate on how do you make your decision on which of the 3 imputation methods to use?

    • @bkrai
      @bkrai 6 years ago

      Any of the 3 imputations should be fine. Many methods do not allow you to proceed with model building unless missing data is addressed. You can also run a model with each of the 3 imputations and choose one that gives the best results.
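
A minimal sketch of running one model on each of the 3 imputations, as this reply suggests. The `nhanes` data, the linear model `chl ~ age + bmi`, and the use of R-squared as the comparison metric are all illustrative assumptions rather than anything shown in the video.

```r
# Runs only if the mice package is installed
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  imp <- mice(nhanes, m = 3, seed = 123, printFlag = FALSE)

  # Fit the same (hypothetical) model on each completed data set
  r2 <- sapply(1:3, function(i) {
    d <- complete(imp, i)                       # i-th completed data set
    summary(lm(chl ~ age + bmi, data = d))$r.squared
  })

  print(r2)  # one R-squared value per imputation
}
```

A different technique from the one in the reply, and the one mice is designed around, is to fit the model on all imputations with `with()` and combine the estimates with `pool()` instead of picking a single completed data set.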

  • @kinisuni
    @kinisuni 6 months ago +1

    When we run the summary of the data, State is shown as Char; in your case it is shown as Factor. Kindly help us with how to address this.

    • @bkrai
      @bkrai 5 months ago

      You can use this line to change it to factor:
      data$State
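
The line quoted in this reply appears to be cut off. One common way to convert a character column to a factor is sketched below on a hypothetical data frame; only the column name `State` comes from the thread, and the exact line the author intended is not recoverable from it.

```r
# Hypothetical stand-in for the video's data (values are made up)
data <- data.frame(State = c("TX", "CA", NA, "TX"), stringsAsFactors = FALSE)

str(data$State)                       # character vector before conversion

# Convert the character column to a factor
data$State <- as.factor(data$State)

str(data$State)                       # factor with 2 levels after conversion
```

Since R 4.0.0, `read.csv()` no longer converts strings to factors by default, which would explain why the column shows as character for recent viewers. Passing `stringsAsFactors = TRUE` to `read.csv()` is another option.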

  • @ahiqtidar
    @ahiqtidar 3 years ago

    Such a great explanation.
    I need help with my ML problem. I am a chemical engineer with 4 years of manufacturing background; I am new to DS and am teaching myself from YouTube and other sources. I am predicting the efficiency of a chemical reactor that is measured by the laboratory on 3 different days a week. This efficiency is indirectly related to some other variables whose values are continuous.
    In short, I have 7 predictors/input variables; each variable has one value per day, which means I have 750 values for each input variable (almost two years), but my outcome variable has only 230 values over the two years. I want to fill the missing values for my outcome variable. Should I use imputation?

  • @TinaHelen
    @TinaHelen 4 years ago +1

    Thank you for that great explanation. The thing I still don't get is: what criteria should I use to decide which imputation to use? And is it always good to choose one (e.g. the first imputation) for ALL variables?

    • @bkrai
      @bkrai 4 years ago

      If there are 3 and all of them provide a consistent outcome for the model, then any of the 3 can be chosen. But if, due to random chance, one of them behaves very differently in terms of model results, then having another option is always good.

  • @kinisuni
    @kinisuni 6 months ago +1

    Dear Sir, impute is not working for State; it still shows NA, whereas in your video it shows polyreg. Please help.

    • @bkrai
      @bkrai 5 months ago

      You can use this line to change it to factor:
      data$State

  • @ahmetklnc6347
    @ahmetklnc6347 4 years ago +1

    Thanks for the explanation! My question is: how can I apply the mice function to a large data set (for example, the data set I work on has 105000 observations and 226 variables)? I tried what you applied in the video, but first I got an error like "system is computationally singular: reciprocal condition number". After that I changed the method parameter to "cart" inside the mice function (I am not sure about this because I have both categorical and continuous features in my data); it did not give any error, but now it takes too much time and does not finish. So I could not do any imputation. Do you have any suggestions? Thank you.

    • @bkrai
      @bkrai 4 years ago

      In such situations I use appropriate sample from the original data to build models.

  • @patrickduhirwenzivugira4729
    @patrickduhirwenzivugira4729 4 years ago +1

    Thank you for this video. You are amazing!

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @Bupchiieee
    @Bupchiieee 3 years ago +1

    Unbeatable

    • @bkrai
      @bkrai 3 years ago

      Thanks for your comment!

  • @mukeshchoudhary2842
    @mukeshchoudhary2842 3 years ago +1

    Nice explanation. Clear and to the point. I have one query regarding multi-year data. I have data on maize hybrids belonging to three maturity levels (Early, Medium and late) and tested for three years. The problem is that data is unbalanced as the number of hybrids tested every year (and for each maturity) varies with some being common across all three years. Can you help me how to proceed? I applied the lme4 package for variance components estimation but it gives an error for model convergence.

    • @bkrai
      @bkrai 3 years ago

      For imbalance problem, you can try this:
      th-cam.com/video/Ho2Klvzjegg/w-d-xo.html

  • @aniruddhaghosh9823
    @aniruddhaghosh9823 1 year ago

    Sir, I have one query: if we need to replace a variable's missing values with its group mean, how do we do that? Also, most variables often have skewed data, i.e., we cannot use the mean and should use the median instead; how do we replace the missing values in that case? Please help, Sir!!

  • @DanielKanyata
    @DanielKanyata 3 years ago

    Thank you so so much for this very helpful video. I want to find out though. Is there a way of saving the complete data after imputation into time series (xts object) other than the data.frame? I am dealing with monthly returns

  • @MKmadhurima
    @MKmadhurima 3 years ago

    Can you please give the detailed explanation of the interpretation of md.pairs

  • @Bupchiieee
    @Bupchiieee 3 years ago +1

    Sir can I perform imputation after converting a whole data set which includes character values to numeric values.

    • @bkrai
      @bkrai 3 years ago

      It will depend on the type of variable. Some chr variables may not be meaningful for converting to numeric.

  • @parasrai145
    @parasrai145 6 years ago +1

    Nicely explained!

    • @bkrai
      @bkrai 6 years ago

      Thanks!

  • @ravindarmadishetty736
    @ravindarmadishetty736 6 years ago +1

    Nice video sir....Very reliable for missing data. What is the use of VIM package

    • @bkrai
      @bkrai 6 years ago

      Thanks for feedback! I used VIM for some of the plots.

  • @tanushreebubna2312
    @tanushreebubna2312 4 years ago +1

    Sir, how do we decide how many imputations we want, and which of the 3 imputations to choose?

    • @bkrai
      @bkrai 4 years ago

      Usually 3 is sufficient. You can choose one that gives better results.

  • @adeelahmadnadeem1265
    @adeelahmadnadeem1265 2 years ago

    Nice tutorial. I have a 1-year time series of MODIS vegetation indices such as NDVI, EVI, etc., with 16-day temporal resolution. I want to fill the time gaps in the datasets. How can I do this in RStudio? Any suggestions?

  • @abdulwaheedshaikh3745
    @abdulwaheedshaikh3745 5 years ago +1

    Sir, in the case of my dataset missing values using MICE, it shows this error: "Warning message:
    Number of logged events: 243." As my dataset is having 120 columns. How can change my code. My current code is: impute

    • @bkrai
      @bkrai 5 years ago

      Note that "warning message" is not error.
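
Although the warning does not stop the imputation, the logged events can be inspected on the object that `mice()` returns. The sketch below uses the `nhanes` example data plus a hypothetical constant column, `const`, added only because a constant column is expected to produce a logged event.

```r
# Runs only if the mice package is installed
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  d <- nhanes
  d$const <- 1   # hypothetical constant column, added to provoke a logged event

  imp <- mice(d, m = 2, seed = 123, printFlag = FALSE)

  # Data frame describing what mice logged (NULL if nothing was logged)
  print(imp$loggedEvents)
}
```

Typical entries concern constant or collinear variables that mice left out of the imputation models, so the table is worth a look on a data set with many columns.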

  • @MrSworob
    @MrSworob 4 years ago

    Thank you so much for the great video, it really helped me!!

    • @bkrai
      @bkrai 4 years ago +1

      Thanks for comments!

  • @dimplekashyap1
    @dimplekashyap1 6 years ago +1

    Great video

    • @bkrai
      @bkrai 6 years ago

      Thanks!

  • @annmariyageorge3346
    @annmariyageorge3346 4 years ago +1

    If a column only can have yes or no and some values are missing, how can i impute ?

    • @bkrai
      @bkrai 4 years ago +1

      You may go with the most frequent class as one option.
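
Filling a yes/no column with its most frequent class can be sketched in base R as follows. The vector `answer` and its values are hypothetical.

```r
# Hypothetical yes/no column with missing entries
answer <- factor(c("yes", "no", "yes", NA, "yes", NA))

# Most frequent observed level; table() leaves out NA by default
mode_level <- names(which.max(table(answer)))

# Replace the missing entries with that level
answer_filled <- answer
answer_filled[is.na(answer_filled)] <- mode_level

print(table(answer_filled))
```

If two levels are tied, `which.max()` returns the first one, so the choice is arbitrary in that case.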

  • @atifdai313
    @atifdai313 1 year ago

    How to fill the missing values in panel data?

  • @ozozan7895
    @ozozan7895 4 years ago +1

    Dear Prof Rai, I found this error in the package
    marginplot(data[,c('Mileage', 'lc')])
    Error in marginplot(data[, c("Mileage", "lc")]) :
    could not find function "marginplot"

    • @bkrai
      @bkrai 4 years ago

      Make sure to run the libraries at the beginning.

    • @ozozan7895
      @ozozan7895 4 years ago +1

      @bkrai Thank you for your advice, Sir. I solved the issue. However, another issue came up: after imputation, a warning message appears, "Warning message: number of logged events: XX."
      XX is a number. Do you have any clue about this issue?

    • @bkrai
      @bkrai 4 years ago

      In R warnings are ok. It's not an error.

    • @ozozan7895
      @ozozan7895 4 years ago

      @bkrai Ok. Would you recommend this MICE imputation method for gene expression analysis and mass spectrometer based data, since these data can have a large number of variables?

  • @haroonkhan4u
    @haroonkhan4u 5 years ago

    In the expression impute$imp$Mileage, what is "imp" and where did it come from? Great video on missing values. Is there any other way we can treat missing values?

  • @isaibassene1331
    @isaibassene1331 3 years ago +1

    Waouh! Thank you so much.

    • @bkrai
      @bkrai 3 years ago

      You're welcome!

  • @MAHENDRAKUMAR-ct8jl
    @MAHENDRAKUMAR-ct8jl 2 years ago +1

    Thanks a lot!!!

    • @bkrai
      @bkrai 2 years ago

      You're welcome!

  • @SanjayKNayak-to3nw
    @SanjayKNayak-to3nw 4 years ago +1

    Nice video, Sir. I tried this in a situation where there is only one column with missing values, and I am getting the error "Data should be a matrix or data frame". How do I handle it?

    • @bkrai
      @bkrai 4 years ago

      You can change your data format to data.frame using following:
      data
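
The code in this reply appears to be cut off. As a sketch of what the error usually points to: selecting a single column with `[ , j]` returns a plain vector rather than a data frame. The data frame `df` below is hypothetical.

```r
# Hypothetical data frame (values are made up)
df <- data.frame(id = 1:4, Mileage = c(10, NA, 30, 40))

one_col_vec <- df[, 2]                    # plain numeric vector
one_col_df  <- df[, 2, drop = FALSE]      # stays a one-column data frame
also_df     <- as.data.frame(one_col_vec) # converts the vector back

print(class(one_col_vec))
print(class(one_col_df))
```

An imputation model also needs at least one other variable to predict from, so passing more than one column to mice is likely to be necessary even once the data frame error is gone.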

    • @hardabrahmadave6173
      @hardabrahmadave6173 4 years ago

      @bkrai It still shows the same error in my case. Any hints as to what I could be doing wrong?

  • @thomaspgumpel8543
    @thomaspgumpel8543 4 years ago +1

    Thanks for an excellent video. As per your instructions:
    impute

    • @bkrai
      @bkrai 4 years ago

      There is no need to exclude categorical variables.

    • @thomaspgumpel8543
      @thomaspgumpel8543 4 years ago

      @bkrai Categorical data (such as gender) must be an integer.

  • @sk93359
    @sk93359 4 years ago +1

    Hi Sir,
    this code and technique are not working on address-type data, i.e. categorical data. Could you please make a video on categorical variables only, without any numerical variables?

    • @bkrai
      @bkrai 4 years ago +1

      In my data, 'state' is a categorical variable.

    • @sk93359
      @sk93359 4 years ago

      @bkrai Yes, I saw, but in my case all the data is categorical, with 500 obs.

    • @sk93359
      @sk93359 4 years ago

      @bkrai
      Could you give me your mail id so I can send the data?

  • @lokesh542
    @lokesh542 4 years ago +1

    Hello sir thanks for such beautiful explanation but while imputing missing values for both categorical and numerical values using mice, my categorical values are still NA:
    1 2 3
    68 NA NA NA
    I am using the same data file vehicleMiss can you please help why this is happening.
    Below is the code:
    p

    • @bkrai
      @bkrai 4 years ago +1

      Note that after running the last line that you mentioned, there is still no change in the original file. If there were missing values to start with, it still has missing values.
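
The point in this reply, that imputation leaves the original data untouched, can be illustrated with the sketch below. It uses the `nhanes` example data shipped with mice rather than vehicleMiss.csv, and `newdata` is a hypothetical name for the completed copy.

```r
# Runs only if the mice package is installed
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  data   <- nhanes
  impute <- mice(data, m = 3, seed = 123, printFlag = FALSE)

  # Imputed values only: one row per missing case, one column per imputation
  print(impute$imp$bmi)

  # The original data frame still contains its NAs
  print(sum(is.na(data$bmi)))

  # complete() returns a filled-in copy based on the chosen imputation
  newdata <- complete(impute, 1)
  print(sum(is.na(newdata$bmi)))
}
```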

  • @coruscated
    @coruscated 5 years ago +1

    Can you please mention which packages to install while running the code

    • @bkrai
      @bkrai 4 years ago

      Any package for which I used library, they need to be installed first.

  • @yousfoss4367
    @yousfoss4367 4 years ago +1

    Thanks for the video. Please, how can I get the dataset used in this video, so I can follow along properly?
    Thanks

    • @bkrai
      @bkrai 4 years ago

      Link to data file is in the description area below video.

  • @shivamkrathghara3340
    @shivamkrathghara3340 3 years ago +1

    👌👌👌 Thank you

    • @bkrai
      @bkrai 3 years ago

      You are welcome!

  • @Rehan1824
    @Rehan1824 3 years ago +1

    awesome...

    • @bkrai
      @bkrai 3 years ago

      Thanks!

  • @jayashriraghunath3210
    @jayashriraghunath3210 4 years ago +1

    Sir can we replace NA values in string type?

    • @bkrai
      @bkrai 4 years ago

      Yes

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    "observed & Imputed Values" 14:01

    • @bkrai
      @bkrai 4 years ago

      Thx

  • @anandacharya9919
    @anandacharya9919 4 years ago +1

    How to handle missing values in a categorical variable is not mentioned.

    • @bkrai
      @bkrai 4 years ago

      You may use one of the classification methods to predict missing category.
      th-cam.com/play/PL34t5iLfZddu8M0jd7pjSVUjvjBOBdYZ1.html

  • @pratibhabajpai377
    @pratibhabajpai377 4 years ago +1

    Thank you, Sir, for a wonderful video on imputation of missing values. Sir, I am working on a covid dataset which has about 33 columns. I am able to impute missing values for some columns, but for other columns like 'city' or 'date on which the patient was admitted to hospital' the NAs are not replaced by imputed values. I don't even get any error message. Kindly guide me.

    • @bkrai
      @bkrai 4 years ago

      For those types of columns you may have to find some other way.

  • @me3jab1
    @me3jab1 5 years ago +1

    Thank you boss

    • @bkrai
      @bkrai 5 years ago

      Welcome!

  • @Dr_Rod_Rizzo
    @Dr_Rod_Rizzo 5 years ago

    Thanks!! Very useful! Do you know why my R cannot find the" md.pattern" and "md.pairs"?

    • @prerittrajputt7351
      @prerittrajputt7351 4 years ago

      I had the same issue and I rectified it by using library(mice) and library(VIM)

  • @darshitsolanki7352
    @darshitsolanki7352 4 years ago +1

    Great sir 😍

    • @bkrai
      @bkrai 4 years ago

      Thanks!

  • @ArunKumar-sg6jf
    @ArunKumar-sg6jf 4 years ago +1

    Sir, why are you using length in the code? Please give me an explanation.

    • @bkrai
      @bkrai 4 years ago

      I didn't understand your question. Can you be more specific?

    • @ArunKumar-sg6jf
      @ArunKumar-sg6jf 4 years ago +1

      @bkrai In #missing data,
      Sir, you used length(x). What did you use it for?

    • @bkrai
      @bkrai 4 years ago

      That's to find the percentage of missing values. The numerator gives the number of missing values, and length(x) is the total number of values.
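
The calculation described in this reply can be sketched in base R as follows. The helper name `pct_missing` and the toy data frame are illustrative assumptions, not the video's code.

```r
# Percentage of missing values in a vector:
# number of NAs divided by the total number of values, times 100
pct_missing <- function(x) sum(is.na(x)) / length(x) * 100

# Toy data frame (hypothetical values, for illustration only)
toy <- data.frame(
  Mileage = c(10, NA, 30, 40),
  lc      = c(NA, NA, 5, 6)
)

# Apply the helper to every column
missing_by_col <- sapply(toy, pct_missing)
print(missing_by_col)
```

Here `Mileage` has 1 missing value out of 4 and `lc` has 2 out of 4, so the result is 25 and 50 percent.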

  • @swapnilpatil5882
    @swapnilpatil5882 5 years ago +1

    Hello Sir,
    nice video!!!
    Please help me.
    How do I impute the mode for NA values?
    Thank you!

    • @bkrai
      @bkrai 5 years ago +1

      This video has all the steps.

    • @swapnilpatil5882
      @swapnilpatil5882 5 years ago

      @bkrai Is mode for a categorical variable?

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    "Impute" 8:39

    • @bkrai
      @bkrai 4 years ago

      Thx

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    "complete Data Set" 12:53

    • @bkrai
      @bkrai 4 years ago

      Thx

  • @alaaalrawajfeh153
    @alaaalrawajfeh153 1 year ago +1

    When was mice discovered?

    • @bkrai
      @bkrai 1 year ago

      I didn't understand your question.