Small correction, viewers: I described the left-skewed and right-skewed distributions the wrong way round. To avoid errors when converting to log values, add 1 to the column first. I have updated the notebook on GitHub. Enjoy the rest of the video!!!
I have just started learning Machine Learning and I understood every bit you explained and done one project on my own similar to this .Really great explanation. I would like to know how to master Machine Learning. I am not student of CSE I am learning this on my own interest
Glad it was helpful!!! kudos to you learning with your own interest. Try to pick a mini project in some domain and solve it. That's a quick way to understand and learn...
Super bro while started learning ML I found your channel and started my learning and progress doing the project thanks for your interest and effort
Hope the videos are useful to you!!! Thanks for watching and please share it for better reach. Thank you!!!
This is awesome man, just went through your blog, the amount of efforts put is amazing. Thanks for the project explanation 🙌
Thanks for your kind words!!!
Bro u explained much much better than edureka I swear bro thanks!
Thanks for your kind words!!!
Bro, it looks like at 17:08 you applied the log to coapplicant income, but you plotted the graph for applicant income. For coapplicant income, the log function throws an error because the column contains zeros. Please advise on this issue.
you can add +1 to the data column, it will resolve the issue
At which level does 1 needs to be added?
@@mitali3j you can add 1 when you see some 0 values, or you can use it generally, there won't be much change in log values
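The "+1" fix discussed in this thread can be sketched as follows. This is only an illustration on made-up values, not the notebook's actual code; the column name `CoapplicantIncomeLog` is an assumption.

```python
import numpy as np
import pandas as pd

# Toy values; the zeros mimic applicants with no co-applicant income.
df = pd.DataFrame({"CoapplicantIncome": [0.0, 1508.0, 0.0, 2358.0]})

# np.log1p(x) computes log(1 + x), so zeros map to 0 instead of -inf.
df["CoapplicantIncomeLog"] = np.log1p(df["CoapplicantIncome"])

print(df)
```

For incomes in the thousands, the shift by 1 barely changes the log values, which matches the reply above.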
Thank you! Very insightful and thorough explanations.
Glad you liked it 😀
Great explanation of your model building. Thank you!
Glad you liked it!!!
Finally the final output is wt?
I mean loan eligible yes or no?
for the test data, we are predicting from the model and calculating the score of how well it's predicting
ValueError: could not convert string to float: 'Male' WHEN I AM USING THE COUNTPLOT IT KEEP SHOWING THIS
bro did you get solution, if yes please help me out
14:38 you are saying distribution is left skewed but its right skewed.
Sorry, I mispronounced the skewed data
I'm subscribed ur channel for this clear explanation 👍 it was so helpful
Thanks for your kind words!!!
Excellent video, found it very helpful!
Glad it was helpful!!!
In the exploratory data analysis section of the video, how can I use a for loop for sns.countplot()?
You can loop over the columns and draw each countplot on its own subplot to show multiple plots together.
I'm unable to apply a correlation matrix on categorical data before label encoding.
How did you do that ?
correlation matrix can be calculated with numbers only, not with strings.
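One way to act on this, shown on a made-up frame, is to keep only the numeric columns before calling `corr()`. The values below are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Toy frame mixing a string column with numeric ones.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "ApplicantIncome": [5849, 4583, 3000, 2583],
    "LoanAmount": [128.0, 128.0, 66.0, 120.0],
})

# Drop non-numeric columns, then compute pairwise correlations.
numeric_df = df.select_dtypes(include="number")
corr = numeric_df.corr()

print(corr)
```

Categorical columns only appear in the matrix after they have been encoded as numbers.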
for i in ['LoanAmount','Loan_Amount_Term','Credit_History']:
    tr_data[i] = tr_data[i].fillna(tr_data[i].mean())
We can use this instead of filling each column separately.
yes, we could do that!!!
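The same idea also works without an explicit loop. A minimal sketch on toy data, reusing the commenter's `tr_data` name:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column.
tr_data = pd.DataFrame({
    "LoanAmount": [128.0, np.nan, 66.0],
    "Loan_Amount_Term": [360.0, 360.0, np.nan],
    "Credit_History": [1.0, np.nan, 0.0],
})

num_cols = ["LoanAmount", "Loan_Amount_Term", "Credit_History"]

# fillna with a Series of column means fills each column with its own mean.
tr_data[num_cols] = tr_data[num_cols].fillna(tr_data[num_cols].mean())

print(tr_data)
```

Whether the mean is a sensible fill for a 0/1 column such as `Credit_History` is a separate modelling choice; the mode is a common alternative.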
Amazing explanation
Glad it was helpful!!!
U JUST EARNED THE SUB
Thanks man!!!
Very nice video. Best thing is your response to people's queries (unlike others). Great Job. I have 1 suggestion. If you could also cover how to deploy this model somewhere (with fresh data coming in and how model throws output). That would be amazing. Thanks.
Thank you very much. In this video, I have explained the process for deployment th-cam.com/video/2LqrfEzuIMk/w-d-xo.html
i have seen that you respond to comments so i would just like to ask you,
what changes do i have to make if my training and testing dataset are in different files already?
for example in a kaggle project where the training and testing data are in different files, what changes in the code will i have to make?
For training, don't split the data, train with the whole data. After that preprocess the test data similar to train and try to predict it. You can also see the video for how to predict test data in the playlist
@@HackersRealm thanks so much man, respect your timely response.
What I did is I skipped the split part, simply preprocessed the test data as well, and then used
y_test = model.predict(x_test)
for the prediction.
But in this case we can't check the score and all, right?
Since I didn't see the Loan_Status column in the test data.
@@abhiavasthi624 yes, that's right, you only get the output results
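The workflow described in this thread can be sketched as below. The frames are tiny stand-ins for the separate train and test files, and the feature names (`TotalIncomeLog` in particular) are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in for the already preprocessed training file (labels included).
train = pd.DataFrame({
    "Credit_History": [1, 1, 0, 0, 1, 0],
    "TotalIncomeLog": [8.7, 8.9, 8.1, 8.0, 9.1, 8.3],
    "Loan_Status": [1, 1, 0, 0, 1, 0],
})

# Stand-in for the test file: same preprocessing, but no Loan_Status column.
test = pd.DataFrame({
    "Credit_History": [1, 0],
    "TotalIncomeLog": [8.8, 8.2],
})

features = ["Credit_History", "TotalIncomeLog"]

# Train on the whole training set; no train_test_split here.
model = LogisticRegression()
model.fit(train[features], train["Loan_Status"])

# Only predictions are available; there are no labels to score against.
predictions = model.predict(test[features])
print(predictions)
```

To still get an accuracy estimate, cross-validation on the training file is a common option.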
At 40:36 dependents is already in numeric form why does it require label encoding?
yes, we don't need to include that
@@HackersRealm hey man, I tried that, but if we don't include Dependents it gives an error while classifying. It is the same error as in the video:
ValueError: could not convert string to float: '3+'. I don't understand this.
@@vedgadge8659 Oh yeah, I forgot that. The column is stored as a string, which is why I used the label encoder. But you can also remove the + and convert the string to an integer.
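A minimal sketch of the "remove the + and convert" suggestion, on toy values. It assumes missing values in the column have already been filled, since `astype(int)` fails on NaN.

```python
import pandas as pd

# Toy column; in the dataset Dependents is stored as strings such as '3+'.
df = pd.DataFrame({"Dependents": ["0", "1", "2", "3+"]})

# Strip the literal '+' (regex=False treats it as plain text), then cast.
df["Dependents"] = (
    df["Dependents"].str.replace("+", "", regex=False).astype(int)
)

print(df["Dependents"].tolist())
```

Note that this treats "3+" as exactly 3.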
@@HackersRealm okay sure I'll try thanks man
Hi.. well explained. i have one question ...... why you did not drop "ApplicantIncome" even though you combined with "CoapplicantIncome" and created "Totalincome"...??
Hi Ashwin. Could you please upload videos on model deployment with flask using heroku?
Hello, deployment of models, I will cover in later videos for sure, now just covering the basic concepts for better understanding!!!
Thanks a lot 😊
Hi Sir, Logistic regression gave the best score, then why chose Random forest for hypertuning?
for example purpose only
Sir, we normalised the applicant and coapplicant income data. Where does that have an impact on the analysis?
It has an impact on model training and testing, but that comparison is not covered in the video.
Very helpful
Glad it was helpful to you!!!
Please, come up with more projects
working on it
why are you using log transformation? you can normalise the data?
you can use any preprocessing approach. It's no issue, try to test & see how it works
Except for logistic regression, the accuracy and cross-validation scores of all the other models change if I run them more than once. Can you explain why?
You can set the random state in order to get the same results when rerunning.
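Setting the random state might look like the sketch below. It uses synthetic data from `make_classification` instead of the loan dataset, and `run_once` is a hypothetical helper written for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)


def run_once(seed):
    """Split, train, and score with every source of randomness seeded."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=20, random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


first = run_once(42)
second = run_once(42)
print(first == second)
```

Both the split and the model need a seed. Fixing only one of them can still leave run-to-run variation.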
Can u explain the credit history in data mentioned 0 and 1. Can u post video or tutorial link how cibil data are analysed to get credit history values
If the person has credit history, it's 1 or else its 0. I will try analysing cibil data if possible
@@HackersRealm I need to know there will be n no of customers. These customers cibil how to extract to single excel file. Then based on past repayment we can decide the probability of default.
Have you also covered hmeq dataset for loan default prediction
No not yet!!!
Where can we get the main dataset? The link leads only to the train and test datasets. Where can we get the first dataset that you loaded in your video?
that is the train data. you can use that
While plotting the countplot I keep getting "ValueError: could not convert string to float". Any idea why? The dataset was downloaded from your Kaggle link, with no changes (although it looks like the file names have now changed).
try to check the values you're plotting, that may be the issue.
Hi Sir, When I am plotting for Gender, why my x axis not giving the labels, as Male and Female. Instead it is displaying 0 and 1
If you have done some transformation on that column, it will show like that
@@HackersRealm Thanku, got it....
This video is a great explanation of this project. I have just one doubt. From where I took the data set, Test data has a separate file of around 350 observations. How do I make use of that ?
Glad you liked this video!!! You can use the test data to predict the output and submit it, if there is a competition. For practice, there won't be much use to it.
@@HackersRealm how to do it??
Is this project can be done for final year project is this good topic to do
yeah many people have done this as final year project
@@HackersRealm tq u
Like this itself we can present ryt
@@snehacookie4138 yes
@@HackersRealm bro is this project good for jobs when u put in resume is this good for getting selected in a company pls say bro
@@snehacookie4138 Well that completely depends on the recruiter, but students said they used for resume
What tech skills you learnt from the project
• Why did you pick that domain?
• Where can we use your tech skills / software’s learnt during project
• Reason for working on that project
Sir Please Help me for Interview preparation
How to increase accuracy?
using different models, hyperparameter tuning, etc., watch other projects of mine to learn more techniques
hi, thanks for the vids but i want ask: why u did use LabelEncoder to the input values (['Gender',"Married","Education",'Self_Employed',"Property_Area","Loan_Status","Dependents"])? thx
we have to convert string to numeric values so model can accept the input. label encoder is one of the technique
@@HackersRealm how to convert male in gender column to float
@@afserali450 In video, I used label encoder or one hot encoder to do that.. You can use whichever method that is feasible
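Both encoding options mentioned in this reply can be sketched on a toy frame. The output column names below are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
})

# Label encoding: classes are sorted, so Female -> 0 and Male -> 1.
le = LabelEncoder()
df["Gender_encoded"] = le.fit_transform(df["Gender"])

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["Property_Area"], prefix="Property_Area")

print(df)
print(one_hot)
```

Label encoding keeps a single column but implies an order between categories, while one-hot encoding avoids that at the cost of extra columns.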
bro how can i get accuracy more than 80.42
which algorithm should i use
It depends on every factor, not only algorithm, Check out other projects in the tutorial series, so you can get additional insights on increasing accuracy.
Sir, I did not get the conclusion of this project. After the heat map, how can we tell whether the loan is approved or not?
the model training and results, section you're asking?
Hello Sir
I followed your codes, arrival at section ' Exploratory Data'. I replaced the missing values ' df['Gender']=df['Gender'].fillna(df['Gender'].mode()[0])
the line of codes below
sns.countplot(df['Gender'])
the result
ValueError: could not convert string to float: 'Male'
could you please advise me, to correct the codes.
Thank you
try this, sns.countplot(x='Gender', data=df)... It's due to update in seaborn package.
Sir..will it possible to get the python code..of this and other videos
It's available in the github repo, link in the description
How you directly fill with mean in loan amount why not check outlier
To handle outliers, I used the log transformation.
Another question...why feature scaling is not working here?
we can use feature scaling too. There are various preprocessing methods to use and get insights.
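A minimal sketch of feature scaling with `StandardScaler`, on made-up values. One common mistake (an assumption about what goes wrong, not a diagnosis of any specific notebook) is assigning or passing the scaler object itself instead of the result of `fit_transform`.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "ApplicantIncome": [5849.0, 4583.0, 3000.0, 2583.0],
    "LoanAmount": [128.0, 128.0, 66.0, 120.0],
})
cols = ["ApplicantIncome", "LoanAmount"]

# Create the scaler, then store the transformed values, not the scaler.
scaler = StandardScaler()
df[cols] = scaler.fit_transform(df[cols])

print(df)
```

When a separate test set exists, the usual practice is to call `fit_transform` on the training data and only `transform` on the test data.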
Can I segregate and train the model instead of using log function? Or else It's necessary to use Log function in this whole project. And 1 more confusion as I'm new so what is the agenda of this whole project? I know it sounds like silly but please explain me.
We are trying to predict whether a person can get loan or not from the bank. And log transformation is not compulsory, you can use other methods
@@HackersRealm hmm so I used the same as previous then it's ok...another thing why feature scaling is not working here???
I'm getting error like this
"TypeError: float() argument must be a string or a number, not 'StandardScaler'"
How do I remove the -inf values in the total income and CoapplicantIncome columns? I tried but couldn't resolve it. Please help.
If you are using log transformation, try like this - np.log(1+df['name']), it will solve the problem
np.seterr(divide = 'ignore')
train['CoapplicantIncomeLog'] = np.where(train['CoapplicantIncome']>0, np.log(train['CoapplicantIncome']), 0)
this will solve your problem
But after adding 1 then in the graph generated, I can see 2 bell curves....
What does that mean?
sir, i am not able to add new column getting error as
my code: data['total_income']=data['ApplicantIncome']+['CoapplicantIncome']
it's data['CoapplicantIncome'], please check the syntax
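The corrected line in context, on toy values:

```python
import pandas as pd

data = pd.DataFrame({
    "ApplicantIncome": [5849, 4583],
    "CoapplicantIncome": [0.0, 1508.0],
})

# Both operands must be columns of the frame; a bare ['CoapplicantIncome']
# is just a Python list containing one string.
data["total_income"] = data["ApplicantIncome"] + data["CoapplicantIncome"]

print(data)
```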
Can u tell how to train LogisticRegression model??🙏
i think i have explained how to train logistic regression also, could you please check the video again.
@@HackersRealm sorry I mean to say that how to tune the LogisticRegression model
ok, i didn't cover hyperparameter tuning, it will take a complete video for that. I will try to post the videos for that in future
sns.distplot is working but not showing the graph properly ..could u tell me what to do??
try specifying the x, y values properly
@@HackersRealm How to specify them ??...Tell me If u can
@@kumarsanjibray9415 seaborn.pydata.org/generated/seaborn.distplot.html try this documentation
I didn't understand where it's shown how many people are approved for loan and already
In the dataset itself, it is clearly mentioned, please use head function to see the labels
What is the goal pls tell
@@niklausmikealson3115 based on the attributes of the person, we need to find whether they are eligible for loan
so helpful
Glad you liked it!!!
@@HackersRealm thankyou sir for responding
I am getting error on preprocessing labelencoder
Typeerror:not supported between instances of str and float
@@VickyKumar-sg3jc I think in one column you have float and string values, Please check the type of data
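A likely cause of a mixed str/float column (an assumption, since the commenter's data isn't shown) is unfilled missing values, because NaN is a float. A sketch of filling before encoding:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column: the NaN is a float sitting among strings.
df = pd.DataFrame({"Gender": ["Male", np.nan, "Female", "Male"]})

# Fill missing values with the most frequent category first.
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

# Now every value is a string, so the encoder can sort the classes.
df["Gender"] = LabelEncoder().fit_transform(df["Gender"])

print(df["Gender"].tolist())
```

Running `df.isnull().sum()` before encoding is a quick way to see which columns still have gaps.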
Hello,
Can use this project as my mini project.??
yes, you can
How u say that imputed with mean??
which part you're referring?
but in your result doesnt shown any where who are eligble or not
If you check the y label, it will be there
How to check y label
From where we can download the dataset can you provide link or dataset in zip format
links are in the description
Excellent tutorial but you mispronounced left-skewed and right-skewed data. Appreciate your effort.
Yes, you are right. I will correct it next time. Thanks for watching the video
can some one tell me what is the currency of applicant income and the other amount (currency) in this data set
It's in dollars
@@PravinKumar-zc2eq thank you !!
@@PravinKumar-zc2eq Is it in dollars after log transformation? because before log transformation for example in 1st row applicant income was 5489 then it became 8.67. What if i want income like it was in original dataset? im guessing it was in rupees before log. kindly help if u know praveen.
No, it's in dollars all the time, I have done some data preprocessing on that, that's why the values are small after that. That will be helpful in getting good results
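On getting the original income values back: a log transform only rescales the numbers and can be undone with the exponential. A minimal sketch on made-up incomes, with illustrative column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ApplicantIncome": [5849.0, 4583.0, 3000.0]})

# Forward transform: values in the thousands become values near 8 to 9.
df["ApplicantIncomeLog"] = np.log(df["ApplicantIncome"])

# Inverse transform: np.exp undoes np.log (use np.expm1 to undo np.log1p).
df["Recovered"] = np.exp(df["ApplicantIncomeLog"])

print(df)
```

Keeping the original column alongside the log column avoids having to invert anything later.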
is there any way of connecting with you via email etc?
you can reach me via linkedin or instagram, links are in the description
Is it possible if you can add subtitles
They may be generated automatically by YouTube.
@@HackersRealm it is not avalible for some reason
hi...needed some help for loan prediction workshop...could you please help
please reach me via insta or linkedin
@@HackersRealm texted on instagram...please have a look
At cell number 23 you haven't done sns.distplot for coapplicant income, so is that wrong?
I did it for coapplicant income, check the 16th minute of the video. But I mistakenly plotted applicant income, sorry for that.
Hi,
You are doing a good job....thanks for the video....
There is a mistake while plotting the distplot of 'CoapplicantIncome'.
Instead of 'CoapplicantIncome' you have chosen 'ApplicantIncome'.
And one more thing: we cannot apply the log function to 'CoapplicantIncome' directly, since it contains zero values.
If you are using log transformation, try like this - np.log(1+df['name']), it will solve the problem
Yes, my mistake. Sorry for the error
Thank you , I found helpful same
You're welcome!!!
Sir Plzz provide the data set
Check in the github link
Thank uuuuu boss
Very helpful
Glad it was helpful!!!
outliers detection
There is a separate video in ml concepts playlist, You can check that out!!!
check cell no. 23