4.3. Handling Missing Values in Machine Learning | Imputation | Dropping

  • Published on 19 Oct 2024
  • Hi! I will be holding one-on-one discussions with all channel members. Check out the perks and join the membership if interested: / @siddhardhan
    In this video, I explain how to handle missing values in Machine Learning using Python, covering two approaches: imputation and dropping.
    All presentation files for the Machine Learning course as PDF for as low as ₹200 (INR): Drop a mail to siddhardhans2317@gmail.com
    Enroll at One Neuron to learn from 100 courses in one subscription with 5% discount: courses.ineuro...
    Hi guys! I am Siddhardhan. I work in the field of Data Science and Machine Learning. It all started with my curiosity to learn about Artificial Intelligence and the ability of AI to solve several Real Life Problems. I worked on several Machine Learning & Deep Learning projects involving Computer Vision.
    I am on this journey to empower as many students & working professionals as possible with the knowledge of Machine Learning and Artificial Intelligence.
    Hello everyone! I am setting up a donation campaign for my YouTube channel. If you like my videos and wish to support me financially, you can donate through the following means:
    From India 👉 UPI ID : siddhardhselvam2317@oksbi
    Outside of India? 👉 Paypal id: siddhardhselvam2317@gmail.com
    (No donation is small. Every penny counts)
    Thanks in advance!
    Let's build a Community of Machine Learning experts! Kindly Subscribe here👉 tinyurl.com/md...
    I am making a "Hands-on Machine Learning Course with Python" on YouTube. I'll be posting 3 videos per week: Monday evening, Wednesday evening, and Friday evening.
    Dataset File: drive.google.c...
    Colab File Link: colab.research...
    Download the Course Curriculum File from here: drive.google.c...
    LinkedIn: / siddhardhan-s-741652207
    Telegram Group: t.me/siddhardhan
    Facebook group: www.facebook.c... Instagram: / siddhardhan23

Comments • 48

  • @Kevin-gz7th 2 years ago +1

    Siddhardhan, this is a fantastic video. I just want to find out how to handle a dataset with fluctuating densities. Do you use the mean, median, or mode? Thank you so much.
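
A minimal sketch of one way to approach the question above, assuming "fluctuating densities" refers to a skewed numeric distribution. The salary figures are made up for illustration. A common rule of thumb is to prefer the median when a column is skewed, because the mean is pulled toward extreme values, and to reserve the mode for categorical columns.

```python
import pandas as pd

# Hypothetical salaries with one very large value (right-skewed).
salary = pd.Series([200000, 220000, 240000, 250000, 940000], dtype="float64")

print(salary.mean())    # pulled upward by the single large value
print(salary.median())  # stays near the bulk of the data
print(salary.skew())    # positive for a right-skewed column
```

If the mean and median are close and the skew is near zero, either value is usually a reasonable fill.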

  • @digigoliath 3 years ago +1

    Thanks. Very useful for a quick refresher of the techniques to deal with missing values.

    • @Siddhardhan 3 years ago +1

      Glad it was helpful!😇

  • @karishmasewraj6437 1 year ago

    Can you please make a video on handling missing values in categorical columns with lots of unique values? Also, how do you combine different datasets that share common columns and need to be joined together?

  • @jeezz4128 3 years ago

    I've watched all your ML videos... they were very helpful to me... damn clear explanation... keep going, sir

  • @ashulohar8948 1 year ago

    Nice explanation, but what should we do for a large dataset? Please make a video on that.

  • @studystuffs8239 5 days ago

    The dataset was not read correctly.
    The salary column depends directly on the status column.
    For students whose status is Not Placed, the missing salary should have been replaced with 0, not with the mean, median, or mode.
    As far as I understand, only outliers that satisfy all the dependencies should be replaced using one of the other methods.
    Please correct me if I'm wrong on this.

  • @criclal1787 2 years ago

    Hi, first of all, the video was really helpful. But I noticed in the head() output that the salary was NaN when the status was Not Placed. So I tried this: df[pd.isnull(df.salary)==True]['status'].unique(). The output showed that the salary was NaN only when the status was Not Placed; the only unique value was "Not Placed". So I guess it would be correct to fillna with 0 instead of the median, mode, or mean. I understand that you made this video just for illustration, but I wanted to share what I tried. Thanks for the video, very helpful!

    • @Siddhardhan 2 years ago +1

      Hi! I didn't notice that. Thanks for letting me know. I'll try to update this.

    • @criclal1787 2 years ago

      @Siddhardhan Oh, glad that it helped! Thanks again for the video.

    • @pushkarjoshi1400 1 year ago +1

      Bro, how do you know these concepts? Do we have to read the full documentation, or does it come with experience?

    • @criclal1787 1 year ago

      @pushkarjoshi1400 A bit of both. It's not much, actually.

    • @pushkarjoshi1400 1 year ago

      @criclal1787 Yeah, but how did you explore concepts and libraries in the beginning?
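
The check that @criclal1787 describes at the top of this thread can be sketched roughly as follows. The small DataFrame is a hypothetical stand-in for the placement dataset from the video. The idea is to confirm that every missing salary belongs to a "Not Placed" row before filling with 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the placement dataset used in the video.
df = pd.DataFrame({
    "status": ["Placed", "Not Placed", "Placed", "Not Placed"],
    "salary": [270000.0, np.nan, 250000.0, np.nan],
})

# Which status values occur on the rows where salary is missing?
missing_status = df.loc[df["salary"].isna(), "status"].unique()
print(missing_status)

# Fill with 0 only when every missing salary belongs to an unplaced student.
if set(missing_status) <= {"Not Placed"}:
    df["salary"] = df["salary"].fillna(0)

print(df)
```

If some missing salaries belonged to placed students, those rows would need a different treatment, such as the median fill shown in the video.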

  • @kelethabetseroberts7934 1 year ago +2

    You, Sir, are a very rare gem in these YouTube streets!🤍💫👏🏾

  • @hrishikeshh4936 2 years ago

    super well designed course

  • @halfbloodprince1788 3 years ago

    Awesome thank you for all videos

  • @truptimahadik9435 2 years ago +1

    Nice explanation. Keep it up.👍

  • @DeepLearning-k2s 3 months ago

    Well explained 👏🏻

  • @I_Anupam_Pandey 2 years ago

    Amazing explanation, sir. All my doubts are cleared.

  • @srikanthmidde6102 3 years ago +1

    Thanks a lot..Siddhardhan

  • @jeezz4128 2 years ago

    Bro, please make videos on outlier detection and removal techniques.

  • @venkatprabhu2057 3 years ago

    Siddhardh ji, can you please make a series of videos explaining each machine learning algorithm?

    • @Siddhardhan 3 years ago +3

      Hi! We have a separate module for that. Kindly check the course curriculum.

    • @venkatprabhu2057 3 years ago

      @Siddhardhan I only just saw your curriculum. Amazing! Please keep going, I'm eagerly waiting for that module 😇😇😇

  • @hishnew 8 months ago

    If someone is not placed, how does replacing their empty salary with the median salary make sense? Shouldn't it be replaced with zero?

  • @meenakaria8445 1 month ago

    I see the following warning: "FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
    The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy". Can you please show me a modified code snippet that works without the inplace method?
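
A minimal sketch of two ways to avoid the warning quoted above. The DataFrame is a hypothetical stand-in for the video's dataset. Both options drop `inplace=True` and operate on the DataFrame itself rather than on an intermediate column object.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the salary column from the video.
df = pd.DataFrame({"salary": [200000.0, np.nan, 300000.0, np.nan]})

# The pattern that triggers the warning (chained assignment with inplace):
# df["salary"].fillna(df["salary"].median(), inplace=True)

# Option 1: assign the filled column back to the DataFrame.
filled_by_assignment = df.copy()
filled_by_assignment["salary"] = filled_by_assignment["salary"].fillna(
    filled_by_assignment["salary"].median()
)

# Option 2: call fillna on the DataFrame with a {column: value} mapping.
filled_by_mapping = df.fillna({"salary": df["salary"].median()})

print(filled_by_assignment["salary"].tolist())
print(filled_by_mapping["salary"].tolist())
```

The mapping form is convenient when several columns each need a different fill value.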

  • @amitbudhiraja7498 3 years ago

    Hi sir,
    If we are given a regression problem and the data has missing values, can we use regression models like KNN Regressor or Random Forest Regressor to predict the missing values,
    and then solve our actual problem?
    Is this the right approach?
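
Model-based imputation of the kind asked about above is available in scikit-learn. The sketch below uses `KNNImputer`, which fills each missing value from the nearest rows. It is not a method shown in the video, and the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical numeric features with a few missing entries.
features = pd.DataFrame({
    "ssc_p": [67.0, 79.0, 65.0, 56.0, 85.0],
    "hsc_p": [91.0, 78.0, np.nan, 52.0, 74.0],
    "degree_p": [58.0, 77.0, 64.0, np.nan, 73.0],
})

# Each missing value is replaced by the average of that column
# over the 2 nearest rows (distance computed on the observed columns).
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(features), columns=features.columns)

print(imputed)
```

When the imputed data feeds a later model, it is safer to fit the imputer on the training split only and then apply it to the test split, so that test rows do not influence the fill values.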

  • @Tony-vo9ok 2 months ago

    amazing

  • @AliSher-kv3pd 3 years ago +1

    Bro, I don't understand which arguments we have to choose when we call a method of a library, how we find out which argument to choose, and what its purpose is. I have read the method documentation but I can't understand it. Please tell me. I am watching your course continuously and am now on the 4th module, the Preprocessing topic.

    • @Siddhardhan 3 years ago +3

      Hi! Go through the documentation. We don't need to learn it entirely. When you practice a particular function repeatedly across various projects, you will get a better understanding of that function. That's how we get better.

  • @sachinvithubone4278 3 years ago +1

    I have around 20,000 rows of data and only 23 missing values, so can I drop those rows? Is that fine, or do I need to fill the NaN values?

    • @Siddhardhan 3 years ago +2

      You can drop them, since you have a large dataset.
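
A minimal sketch of the dropping approach discussed in this thread, using a small hypothetical DataFrame. Measuring the share of affected rows first makes the decision explicit. In the case described above, 23 of 20,000 rows is about 0.1%.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a dataset with a few missing values.
df = pd.DataFrame({
    "ssc_p": [67.0, 79.0, np.nan, 56.0, 85.0],
    "salary": [270000.0, 200000.0, 250000.0, np.nan, 425000.0],
})

# Count the rows that contain at least one missing value.
rows_with_missing = df.isna().any(axis=1).sum()
fraction = rows_with_missing / len(df)
print(f"{rows_with_missing} of {len(df)} rows ({fraction:.1%}) contain a missing value")

# Drop every row that has any missing value; df itself is left unchanged.
cleaned = df.dropna(how="any")
print(cleaned)
```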

  • @debokysaha3101 3 years ago

    Well Explained!!

  • @mungamurisitaramnikhil7355 4 months ago

    What should I do when a date column has empty rows, and almost 90 percent of the dates are empty?

  • @PAVANSIDEAS 4 months ago

    Sir, but you assigned a salary to people who were not placed, which leads to wrong data.

  • @ankushgupta5270 3 years ago

    Don't we use pandas profiling for EDA?

  • @kaushikdwivedi1845 3 years ago

    What about handling missing values in categorical columns?

    • @Siddhardhan 3 years ago

      Hi! You can refer to the following article: medium.com/analytics-vidhya/ways-to-handle-categorical-column-missing-data-its-implementations-15dc4a56893
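
Two common options for categorical columns can be sketched as follows. This is not taken from the linked article, and the column name and values are hypothetical. One option fills with the most frequent category (the mode), the other marks missing entries with an explicit placeholder label.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with missing entries.
df = pd.DataFrame({"specialisation": ["Mkt&HR", "Mkt&Fin", np.nan, "Mkt&Fin", np.nan]})

# Option 1: fill with the most frequent category.
mode_value = df["specialisation"].mode()[0]
df["spec_mode"] = df["specialisation"].fillna(mode_value)

# Option 2: keep missingness visible with a placeholder category.
df["spec_unknown"] = df["specialisation"].fillna("Unknown")

print(df)
```

The placeholder keeps the information that a value was missing, which can matter when the missingness itself is meaningful.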

  • @sachinlakshitha7616 1 year ago

    Why is inplace set to True? What does it mean?

    • @NikhilKumar-ni2gz 4 months ago

      Because inplace=True makes the change permanent in the dataset itself; otherwise the change exists only in the returned result, and the original dataset still has its missing values.
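
The difference described in this reply can be seen in a small sketch with a hypothetical DataFrame. Without `inplace=True`, the method returns a new object and leaves the original untouched. With it, the original is modified and the call returns `None`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"salary": [270000.0, np.nan, 250000.0]})

# Without inplace, dropna returns a new DataFrame and leaves df untouched.
result = df.dropna()
print(len(df), len(result))  # df still has 3 rows, result has 2

# With inplace=True, df itself is modified and the call returns None.
returned = df.dropna(inplace=True)
print(returned, len(df))
```

Reassigning the result (`df = df.dropna()`) has the same effect as `inplace=True` and is often easier to follow.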

  • @debokysaha3101 3 years ago

    Is it a good idea to use imputer from Scikit-learn?

    • @Siddhardhan 3 years ago

      Yes, you can use that as well.

    • @debokysaha3101 3 years ago

      What about the outliers in the dataset?
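
For the scikit-learn imputer asked about at the top of this thread, a minimal sketch with `SimpleImputer` might look like this. The DataFrame is hypothetical, and the median strategy is chosen only to mirror the median fill from the video.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the salary column from the video.
df = pd.DataFrame({"salary": [200000.0, np.nan, 300000.0, 250000.0]})

# strategy can be "mean", "median", "most_frequent", or "constant".
imputer = SimpleImputer(strategy="median")

# fit_transform expects 2D input, hence the double brackets.
df[["salary"]] = imputer.fit_transform(df[["salary"]])

print(df)
```

One practical advantage over `fillna` is that a fitted imputer remembers the training median and can apply the same value to new data with `imputer.transform`.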

  • @dpmharry 3 years ago

    I tried to save the data filled with the median, mean, and mode in new variables, like this:
    dataset_median = dataset['salary'].fillna(dataset['salary'].median(), inplace = True)
    When I then ran the code below, it threw an error:
    dataset_median.isnull().sum()
    AttributeError: 'NoneType' object has no attribute 'isnull'

    • @Siddhardhan 3 years ago

      The error means that the data was not stored in the variable. First fill the missing data using the median, as I explained in the video. Then you can assign the salary column to a variable. Try this method; it should work.
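
The `AttributeError` in this thread arises because `fillna(..., inplace=True)` returns `None`, so `dataset_median` ends up holding `None`. A minimal sketch of the fix, with a hypothetical `dataset`, is to omit `inplace=True` so that each call returns a filled Series that can be stored in its own variable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset from the video.
dataset = pd.DataFrame({"salary": [200000.0, np.nan, 300000.0, 300000.0]})

# Without inplace=True, fillna returns the filled Series.
dataset_median = dataset["salary"].fillna(dataset["salary"].median())
dataset_mean = dataset["salary"].fillna(dataset["salary"].mean())
dataset_mode = dataset["salary"].fillna(dataset["salary"].mode()[0])

print(dataset_median.isnull().sum())
print(dataset["salary"].isnull().sum())  # the original column is unchanged
```

This keeps the original column intact, so the three fill strategies can be compared side by side.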