I've watched all your ML videos. They were very helpful to me. Damn clear explanation. Keep going, sir.
thanks a lot 😇
Thanks. Very useful for a quick refresher of the techniques to deal with missing values.
Glad it was helpful!😇
Amazing explanation, sir. All my doubts are cleared.
Nice explanation. Keep it up.👍
Well explained 👏🏻
super well designed course
Awesome thank you for all videos
You, sir, are a very rare gem in these YouTube streets!🤍💫👏🏾
Thanks a lot, Siddhardhan.
My pleasure 😇
Can you please make a video on handling missing values in categorical columns that have lots of unique values? Also, how to combine different datasets that need to be joined on the columns they have in common.
Hi, first of all, the video was really helpful. But I noticed in the head() output that the salary was NaN when the status was Not Placed. So I tried this: df[pd.isnull(df.salary)==True]['status'].unique(). The only unique value returned was "Not Placed", meaning the salary was NaN only when the student was not placed. So I guess it would be correct to fillna with 0 instead of the median, mode, or mean. I understand that you made this video just for illustration, but I wanted to share what I tried. Thanks for the video, very helpful!
Hi! I didn't notice it. Thanks for letting me know. I'll try to update this.
@Siddhardhan Oh, glad that it helped! Thanks again for the video.
Bro, how do you know these concepts? Do we have to read the full documentation, or does it come with experience?
@pushkarjoshi1400 A bit of both. It's not much actually.
@criclal1787 Yeah, but how did you explore concepts and libraries in the beginning?
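The check described in the comment above, and the zero-fill it suggests, can be sketched roughly as follows. This is only an illustration: the small DataFrame is made-up toy data standing in for the placement dataset, and only the column names status and salary come from the comment.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the placement dataset
df = pd.DataFrame({
    "status": ["Placed", "Not Placed", "Placed", "Not Placed"],
    "salary": [270000.0, np.nan, 250000.0, np.nan],
})

# Which status values occur in the rows where salary is missing?
missing_status = df.loc[df["salary"].isna(), "status"].unique()
print(missing_status)

# Fill with 0 only where the student was not placed;
# any other missing salary would stay NaN for separate handling
not_placed = df["status"] == "Not Placed"
df.loc[not_placed, "salary"] = df.loc[not_placed, "salary"].fillna(0)
```

Restricting the fill to the Not Placed rows keeps the zero from being applied to a placed student whose salary simply was not recorded.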
The dataset was not read correctly.
The salary column depends directly on the status column.
For students whose status is Not Placed, the salary should have been replaced with 0, not with the mean, median, or mode.
As far as I understand, only missing values that are not explained by such a dependency should be filled using one of the other methods.
Please correct me if I'm wrong on this.
Nice explanation, but what should we do for a large dataset? Please make a video on that.
What should I do when I have empty rows in date columns and almost 90 percent of the dates are empty?
If someone is not placed, how does replacing their empty salary with the median salary make sense? Shouldn't it be replaced with zero?
Bro, please make videos on outlier detection and removal techniques.
I see the following warning: "FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy" --> Can you please show me a modified code snippet that works without the inplace method?
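One way to avoid the warning is to assign the filled column back to the DataFrame instead of calling fillna with inplace=True on the selected column. A minimal sketch, assuming a DataFrame with a numeric salary column (the values here are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"salary": [200000.0, np.nan, 300000.0, np.nan]})

# This form triggers the FutureWarning (chained assignment with inplace):
# df["salary"].fillna(df["salary"].median(), inplace=True)

# Instead, compute the fill value and assign the result back to the column
median_salary = df["salary"].median()
df["salary"] = df["salary"].fillna(median_salary)

# Alternative that operates on the DataFrame itself, not on a selected column:
# df.fillna({"salary": median_salary}, inplace=True)
```

Both variants act on df directly, so there is no intermediate object for pandas to treat as a copy.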
Siddhardhan, This is a fantastic video. I just want to find out how to go about a dataset with fluctuating densities? How do you go about it? Do you use mean, median, or mode? Thank you so much
amazing
Well Explained!!
thanks 😇
I have around 20,000 rows and only 23 missing values, so can I drop those rows? Is that fine, or do I need to fill the NaN values?
you can drop them as you have a larger dataset.
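Dropping the incomplete rows can be sketched like this. The DataFrame and its column names are hypothetical toy data. Checking the fraction of affected rows first is a cheap sanity check: 23 out of 20,000 is roughly 0.1%.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few incomplete rows
df = pd.DataFrame({
    "feature": [1.0, 2.0, np.nan, 4.0, 5.0],
    "target": [10.0, 20.0, 30.0, np.nan, 50.0],
})

# Fraction of rows that contain at least one missing value
missing_fraction = df.isna().any(axis=1).mean()
print(f"{missing_fraction:.1%} of rows have a missing value")

# Drop those rows and renumber the index
cleaned = df.dropna().reset_index(drop=True)
```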
Bro, I don't understand which arguments to choose when calling a library method, how to know which argument to choose, and what each one is for. I have read the method documentation but I can't understand it. Please tell me. I am watching your course continuously and am now on the 4th module, the Preprocessing topic.
Hi! Go through the documentation. We don't need to learn it entirely. When you practice a particular function repeatedly in various projects, you get a better understanding of that function. That's how we get better.
Hi sir,
If we are given a regression problem and we have missing values in the data, can we use regression models like KNN Regressor or Random Forest Regressor to find the missing values,
and then solve our actual problem?
Is this the right approach?
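Model-based imputation of this kind is available in scikit-learn. The sketch below uses KNNImputer on made-up feature columns, so the column names and values are assumptions. scikit-learn also has an experimental IterativeImputer that accepts a regressor as its estimator. In practice the imputer is usually fitted on the training features only and then applied to the test features.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical feature matrix with gaps (features only, not the target)
X = pd.DataFrame({
    "age": [22.0, 25.0, np.nan, 31.0, 35.0],
    "experience": [1.0, 3.0, 4.0, np.nan, 12.0],
})

# Each missing value is filled with the average of that feature
# over the 2 nearest rows, measured on the features that are present
imputer = KNNImputer(n_neighbors=2)
X_filled = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)
print(X_filled)
```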
Siddhardh ji, can you please make a series of videos explaining each machine learning algorithm?
Hi! We have a separate module for it. Kindly check the course curriculum.
@Siddhardhan I only just saw your curriculum. Amazing! Please move forward, I'm waiting for that module with full eagerness 😇😇😇
Don't we use pandas profiling for EDA?
What about handling missing values in categorical columns?
Hi! You can refer to the following article: medium.com/analytics-vidhya/ways-to-handle-categorical-column-missing-data-its-implementations-15dc4a56893
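Two common options for a categorical column are filling with the most frequent value or with an explicit placeholder category. The sketch below is a generic illustration, not taken from the linked article. The column name and values are made up.

```python
import pandas as pd

# Hypothetical categorical column with gaps
df = pd.DataFrame(
    {"specialisation": ["Mkt&HR", None, "Mkt&Fin", "Mkt&Fin", None]}
)

# Option 1: fill with the most frequent category (the mode)
mode_value = df["specialisation"].mode()[0]
filled_mode = df["specialisation"].fillna(mode_value)

# Option 2: keep the missingness visible as its own category
filled_unknown = df["specialisation"].fillna("Unknown")
```

The placeholder option tends to be safer when many values are missing, since filling everything with the mode can distort the category frequencies.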
Sir, but you added a salary for people who are not placed, which leads to wrong data.
Why is inplace set to True? What does it mean?
Because inplace=True makes the change permanent in the dataset itself. Otherwise fillna returns a new object with the values filled, and the original dataset is left unchanged.
Is it a good idea to use imputer from Scikit-learn?
yes. you can use that as well.
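A minimal sketch of the scikit-learn route, assuming a numeric salary column (the values are made up). SimpleImputer also supports the "mean", "most_frequent", and "constant" strategies.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data
df = pd.DataFrame({"salary": [200000.0, np.nan, 300000.0, 400000.0]})

# SimpleImputer expects 2D input, hence the double brackets
imputer = SimpleImputer(strategy="median")
df[["salary"]] = imputer.fit_transform(df[["salary"]])
print(df)
```

One advantage over fillna is that the fitted imputer remembers the median it learned, so the same value can be applied to new data with imputer.transform.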
What about the outliers in the dataset?
I tried to save the data filled with the median, mean, and mode in new variables like this:
dataset_median = dataset['salary'].fillna(dataset['salary'].median(), inplace = True)
After that, checking it with the code below threw an error:
dataset_median.isnull().sum()
AttributeError: 'NoneType' object has no attribute 'isnull'
The error means that the variable holds None instead of the data: with inplace=True, fillna changed the column and returned None here. First fill the missing data using the median as I explained in the video, then load the salary column into a variable. Try this method; it should work.
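If the goal is to keep separately filled copies in new variables, one option is to leave out inplace so that fillna returns the filled Series. A sketch with made-up salary values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
dataset = pd.DataFrame({"salary": [200000.0, np.nan, 300000.0, 300000.0]})

# Without inplace, fillna returns a new Series and leaves dataset untouched
dataset_median = dataset["salary"].fillna(dataset["salary"].median())
dataset_mean = dataset["salary"].fillna(dataset["salary"].mean())
dataset_mode = dataset["salary"].fillna(dataset["salary"].mode()[0])

print(dataset_median.isnull().sum())
```

Each variable now holds a complete Series, while the original salary column still contains its NaN, so the three fills can be compared side by side.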