I watched a lot of tutorials on TH-cam, but nothing compares to this.
Simple and straight to the point, with great editing and a teacher born with a natural talent for teaching.
Thanks for comments!
Outstanding explanation!! In my opinion, there's not even one second of waste in this class. Thank you, Prof! Greetings from Houston!!!
You are welcome!
You are a great teacher. The explanations of the sub-equations are very much appreciated.
Thanks for comments!
You have a way of simplifying things. I have personally learnt a lot from all your videos and am always eagerly waiting for new ones. I know you must have a very busy schedule, sir, but if you could upload videos more frequently, it would mean so much to all of us :)
Thanks for your comments and feedback!
Dear Dr. Bharatendra, this is amazing stuff again. Thank you so much for sharing your great knowledge!
Most welcome!
This channel is very underrated. Thanks, bro!
Thanks!
Now I am back with my next task. When I search on YouTube, I type something like "k-means clustering Bharatendra" :) to find the best learning material. Thanks again, Dr. Bharatendra!
You are very welcome!
Easy, descriptive, all in one package. Waiting to hear more from you, sir!
Thanks for your comments!
No words for the explanation. Perfect!
Thank you very much
You are most welcome!
This channel is better than any of the lectures at my uni. I would pay to learn from you if there were a chance.
Thanks for your feedback and comments!
As usual, awesome communication skills at work here. Keep it up!
Thanks for comments!
You explain things in a very clear manner.
Thanks for comments!
Thank you so much, sir. You keep bringing new ways to teach us and make the concepts easy to grasp. Please also make some end-to-end case studies to help us understand the whole pipeline 🙏. Thank you for the videos.
Thanks for the feedback and suggestion! I'm adding end-to-end case studies to my list.
Thank you very much from Brazil. Your videos are helping me a lot!
Thanks for comments!
Big thanks! From Kazakhstan;)
Thanks for comments!
Wow, sir, what great teaching. I have been waiting for your videos, and you have come back with a new topic; I was so happy to see the notification that you had uploaded a video. Thank you so much for the great content and immense knowledge. KNN was a much-needed topic from your side, and you have covered it. Thank you so much, sir :)
Thanks for your feedback and comments!
A million thanks, this is perfect training, going into all the details. You saved my life :D
Thanks for comments!
Important points are covered, sir. Thank you!
Thanks for feedback!
Sir, can you make one video on upsampling, downsampling, both combined, and SMOTE, with examples?
Thank you, sir, for all your valuable lectures.
I hope that in the near future you will discuss compositional data analysis (CoDa) and explain how to perform robust PCA and clustering.
Sir, I am working as a data analyst at an NBFC, building a predictive model for a customer's probability of default. I have used logistic regression and random forest. Which algorithm would be best for getting optimum results? Another request, sir: if you could make tutorial videos for all these ML algorithms in Python, I and people like me would be glad and would benefit. Thank you so much, sir, for sharing your valuable knowledge.
Have you tried XGBoost? It is usually known for getting good results. Here is the link:
th-cam.com/video/woVTNwRrFHE/w-d-xo.html
In addition, I'll also do some Python videos in the next few months.
PERFECT !!!!!! amazing teacher !!
Thank you! 😃
Thank you very much Dr. Rai
You're most welcome!
Hello sir, when can we expect your new videos? Thank you.
This month I'll work on a few more.
That would be great, sir. Eagerly waiting for the next video. @@bkrai
Thanks!
Thanks a lot for all your video.
Thanks for feedback!
Thank you, sir, very informative for finding the k value.
Why does k = 1 in KNN give the best accuracy?
It depends on the data.
I would request that you please make a separate video on model training that covers all the methods.
The video would become too long. I have this playlist with 10 must-know machine learning methods:
th-cam.com/play/PL34t5iLfZddsQ0NzMFszGduj3jE8UFm4O.html
Sir, if cross-validation is carried out on the training data, then what is the necessity of dividing the data into train and test datasets? I mean, what if we use the entire dataset for cross-validation; will that suffice? And one request: could you make a video on ensembling with the caret package, for different classifiers?
It is still better to use test data. During cross-validation, at some point or another, all training data points are used, whereas the test data is something the model has never seen. Also, thanks for the suggestion!
@@bkrai Thank you, sir. Your videos help me a lot in my research work.
You are very welcome!
After finding the optimal k value, i.e. k = 33 in the above example, if we want to use that k value and compute accuracy once again, where must we include it? I mean, loading library(class), getting the labels, and supplying the found k value. Does this work?
The model will automatically use the optimal k value for predictions.
Thank you, sir, for the lesson. I have one doubt: how do we decide the value of k for our model?
This algorithm gives you the best value of k.
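A minimal sketch of what that looks like with caret (the names training and admit are assumptions based on this thread, not the exact script from the video): cross-validation tries a range of k values and keeps the one with the best accuracy.

library(caret)

set.seed(222)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

fit <- train(admit ~ ., data = training, method = "knn",
             trControl = ctrl,
             preProcess = c("center", "scale"),
             tuneLength = 20)          # try 20 candidate k values

fit        # accuracy for each candidate k and the k finally selected
plot(fit)  # accuracy versus k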
Hello Mr. Rai! I would like to ask you a question: do we need to standardize in LDA as well if the independent variables have totally different magnitudes? (I am asking because the discriminant functions are already scaled, as you mentioned in the LDA video.)
If in doubt, always do it. Standardization has no negative impact.
Thanks for sharing! If we wanted to use the other method of normalizing the data, what would the R code be?
Let's say your data frame is called "data" and the variable you want to normalize is "V1". Then the code would be:
(data$V1 - min(data$V1)) / (max(data$V1) - min(data$V1))
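Building on the formula above, a small sketch (with "data" as a placeholder name) that wraps the min-max scaling in a function and applies it to every numeric column:

normalize <- function(x) (x - min(x)) / (max(x) - min(x))

num_cols       <- sapply(data, is.numeric)          # skip factor columns
data[num_cols] <- lapply(data[num_cols], normalize) # rescale in place
summary(data[num_cols])                             # each column now lies in [0, 1]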
Thank you so much! I tried that, but the fit doesn't work. I had replaced the preProcess step with normalize. @@bkrai
Keep up the good work.
Thx for comments!
Sir, I have a problem with the given data; please help me. I get:
Error in model.frame.default(form = data.f$Consanguinity ~ ., data = training, :
variable lengths differ (found for 'Age')
Check your data and make sure all the variables have the same length.
Dear Dr. Bharatendra, thank you a lot for your videos. I'm new to ML and I developed the following workflow using a for loop:
1. Split the data 80/20.
2. Train on the 80%.
3. Test on the 20%.
4. Get the confusion matrix and metrics like accuracy.
5. Repeat the same thing n times.
6. Average the accuracy over the n confusion matrices.
Is this correct? When I look at your tutorial or the caret package, to evaluate a model they take the accuracy from the training phase before any testing, and then they do the testing/prediction only once. Is my method correct? I feel I'm testing n times, and thus my model has seen all my data in the process, since in every iteration it trains on the 80%, tests on the 20%, and then repeats.
Should I use the accuracy from the confusion matrix to compare between algorithms or not?
Thank you a lot again!
Refer to this playlist for detailed coverage:
th-cam.com/video/s23CMIjfwHk/w-d-xo.html
If there are categorical variables among the independent variables, don't we have to create dummy variables, or will the package deal with it?
It's always better to convert categorical independent variables to dummy variables.
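One way to do that conversion, as a hedged sketch using caret's dummyVars (the names data and admit are assumptions from this thread):

library(caret)

dv    <- dummyVars(admit ~ ., data = data)   # encode every predictor, outcome excluded
x     <- predict(dv, newdata = data)         # numeric dummy columns
data2 <- data.frame(x, admit = data$admit)   # reattach the outcome for training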
It's informative and helpful, but how do we get the data for practice?
Here is the link that is now also added below the video.
Data file: goo.gl/D2Asm7
Awesome video, sir! Thanks for sharing.
I used to do KNN analysis using functions from the FNN library; however, I noticed you are using the caret library. How can I identify the best library for this ML method and others?
Also, what would you recommend for implementing the KNN method for product recommendation in an online store? Should I use R?
Thank you and best regards from Mexico.
Seeing this today. Caret is more versatile. For product recommendation, you can certainly use R.
Sir, can you please make one video on TOPSIS and the compromise programming method?
Thanks, I've added it to my list.
Thanks a lot, sir. It is very helpful.
Thanks for comments!
Hi,
how can I get the confusion matrix for the model with the highest accuracy (testing a range of values of k) if I perform LOOCV using the whole data instead of splitting it into test and training sets?
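There is no reply above, but one possible sketch (my assumption, not the author's code) is to keep the resampled predictions in caret and build the confusion matrix from them; data and admit are placeholder names:

library(caret)

ctrl <- trainControl(method = "LOOCV", savePredictions = "final")
fit  <- train(admit ~ ., data = data, method = "knn",
              preProcess = c("center", "scale"),
              trControl = ctrl, tuneLength = 20)

confusionMatrix(fit$pred$pred, fit$pred$obs)  # held-out predictions at the best k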
Good job! Thanks
Thanks for comments!
Hello sir. Thanks so much for your video; I am learning a lot of R methods. I am confused, though: apart from KNN, do other machine learning methods need standardization of the data? If so, which methods need it?
If in doubt, it is better to do it, because if you do not do it when it is needed, there could be a problem.
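For reference, a hedged sketch of doing it explicitly with caret's preProcess (assuming, just for illustration, that the outcome sits in the first column of training and test): the scaling learned from the training data is reused on the test data.

library(caret)

pp <- preProcess(training[, -1], method = c("center", "scale"))  # learn from training only
training_std <- predict(pp, training[, -1])
test_std     <- predict(pp, test[, -1])                          # same centering/scaling applied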
Thank you very much sir!
Thanks for comments!
Sir, your tutorial is outstanding compared to the others. But I face a problem here: when I prepare the KNN model, an error occurs at
fit
Check your data and see why the variable 'admit' was not found. Probably you are using different data that doesn't have the 'admit' variable.
nice explanation
thanks for comments!
When I have non-numeric variables in my dataset, can I still use K-Nearest Neighbors?
You need to find a way to convert them into numbers first.
@@bkrai Thanks
Simply superb explanation, and I'm eagerly waiting for the videos on ML and AI. If you don't mind, could you help with how to predict more of the positive cases in the above classification problem? In my job I'm facing a classification problem. I have even tried handling the class imbalance with the help of your video, using ROSE and undersampling methods, but I'm still getting poor results. Please help us tune for high sensitivity in classification problems.
Have you tried XGBoost? It is usually known for getting good results. Here is the link:
th-cam.com/video/woVTNwRrFHE/w-d-xo.html
@@bkrai Thank you so much for the reply, sir. I will try it today.
A small doubt: while working with XGBoost, do I need to correct the class imbalance, or can I work directly with my original data (my class proportions are 0.98938295 and 0.01061705)? Please advise.
And for the same reason, at 9:22 it is used as set.seed(222)?
It ensures repeatability of results. With a different number your result can be different compared to what I got.
Why did you not scale the data before running KNN?
Check at 10:30 point. That's where it was addressed.
Hello sir, can you explain the syntax set.seed(1234)? What does the 1234 inside the brackets mean?
It ensures repeatability of results. With a different number your result can be different compared to what I got.
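A tiny illustration of the point (the numbers here are just an example, not from the video): the same seed reproduces the same "random" draw, so the train/test split, and hence the results, can be repeated.

set.seed(1234); sample(1:10, 5)   # same seed, same draw
set.seed(1234); sample(1:10, 5)   # identical to the line above
sample(1:10, 5)                   # no seed reset, so this draw differs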
I am trying to use KNN for my survey data, but almost all of the variables are factors/categorical. Can I still use the knn method?
yes that should work fine.
@@bkrai I was able to get k = 29, but the issue I am having is that all of the values in the confusion matrix are 0 and the statistics are all NaN (since it is categorical data). Do you know of a way to get the accuracy in this case?
Also, I have another question: if I have 80% of my data as training and 20% as testing, how can I do prediction without getting a length error?
Sorry for all of the questions; I just want to understand how to do this correctly.
Why, when you try to compute the confusion matrix, do you get the error that `data` and `reference` should be factors with the same levels?
We need to compare apples to apples.
Dear sir, thanks for your great content. Can you help: how can we get the first 5 nearest neighbors of a row (data point) based on Euclidean distance with KNN in R?
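One possible sketch, not from the author, using the FNN package mentioned elsewhere in this thread: look up the 5 Euclidean nearest neighbours of one row against the remaining rows of a numeric matrix (data is a placeholder name).

library(FNN)

num  <- scale(data[sapply(data, is.numeric)])   # numeric predictors, standardized
q    <- num[1, , drop = FALSE]                  # the row of interest
rest <- num[-1, , drop = FALSE]                 # all other rows
nn   <- get.knnx(data = rest, query = q, k = 5) # Euclidean distance by default

nn$nn.index   # positions of the 5 nearest rows within 'rest'
nn$nn.dist    # their Euclidean distances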
Sir, the Boston housing dataset link is missing; can you add it, please?
The data used here is admission data. Link is available in the description area.
Thanks for sharing this video. I'm a new learner. I'm trying to replicate the same code for practice, and I'm not sure why I'm getting: Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument
Can we use the model for more than two classes,
like red, blue, purple, and black?
Should work fine.
Hello sir, thanks again for this algorithm video. At the variable importance step I am getting this error:
"Error in auc3_(actual, predicted, ranks) :
  Not compatible with requested type: [type=character; target=double]."
What should I do to solve this?
Thanks again.
Dr. Rai, why p = 0.75 for caret?
I tried ??caret in RStudio, where the package documentation states p = 0.75, and I am a little confused by this. Thanks in advance. Amninder
At what point in the video are you referring to?
Sir, two questions: chas is a factor variable in the original data; does it need to be converted to numeric for KNN? Secondly, there is high correlation between the independent variables (indus, nox, tax, dis); how do we handle that in the model?
KNN involves calculating distances between data points, so we must use numeric variables.
Hello Mr. Bharatendra. I am a student using K-Nearest Neighbors in RStudio for my thesis. I want to ask: how do you calculate variable importance manually, sir? And how do I display the error rate in a graph from the cross-validation results?
Sir, I am not able to understand the trainControl and train functions in R, though I have gone through the documentation. Is there any video explaining these functions?
Hi, trainControl is from a package named caret. It is a faster way to train models with additional parameters like repetitions and scaling (all done simultaneously instead of in separate steps). topepo.github.io/caret/ is the GitHub site of the caret package developer, Max Kuhn. Hope this helps.
Hi, I have a question about the K-Nearest Neighbor method with regression: can we get the estimated coefficients of the predictor variables as in regular regression? Also, if you don't mind, would you cover a new topic explaining how to use support vector machines for regression?
Hi, I really like your videos. Can you please do a video on stacking?
Thanks for comments and suggestion! I've added it to my list.
Thanks !!
Welcome!
How can I run specific k values like k = 1, 15, 40, 80, 120 in KNN regression with tuneGrid?
# For k = 1 in Python with scikit-learn (X and y are the feature matrix and labels;
# note this evaluates accuracy on the training data itself):
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
y_pred = knn.predict(X)
print(metrics.accuracy_score(y, y_pred))
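For the R/caret side of the question above, a hedged sketch (medv and training are assumed names, borrowed from the Boston housing example mentioned later in this thread): pass only the wanted k values through tuneGrid, since the tuning parameter for method = "knn" is named k.

library(caret)

grid <- expand.grid(k = c(1, 15, 40, 80, 120))
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

fit <- train(medv ~ ., data = training, method = "knn",
             preProcess = c("center", "scale"),
             trControl = ctrl, tuneGrid = grid)
fit   # RMSE reported only for the five requested k values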
First of all, thanks for the videos. Sir, what do we need to do for a class with three levels?
Three levels should work fine.
@@bkrai thank you
Yes, I tried it for 3 classes and got the results.
Is it okay with an accuracy of 70.6%?
I am getting an error with varImp(fit): the auc3_ error.
Why do we need to use the variable as a factor?
Probably this link can help answer:
th-cam.com/video/ftjNuPkPQB4/w-d-xo.html
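In short (a hedged one-liner, with admit as an assumed column name): caret treats a factor outcome as classification and a numeric outcome as regression, so the response is converted before training.

data$admit <- factor(data$admit, levels = c(0, 1), labels = c("No", "Yes"))  # classification outcome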
@@bkrai thank you
Do you have any video that uses 200 variables and thousands of observations?
Not yet.
Sorry, here's the code I'm using. I'm trying to replicate the same code for practice, and I'm not sure why I'm getting: Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument
fit
Thanks for the update!
In R you can ignore warnings. They are not errors.
Thank you
How do I implement KNN from scratch?
The implementation depends on the business background. Where are you planning to implement it?
17:59 Model Performance
Thx
Hello,
can I connect with you?
17:23 Boston housing Pricing Data Partition
Thx
Your content is VERY GOOD, but the sound quality is bad. Also, please do NOT experiment with visual effects in such videos.
Thanks for suggestions!
11:02 Model Performance
Thx
OMG this is so helpful
Thanks for comments!
"K nearest neighbour method" 8:31
Thx
"Data Partition" 7:28
Thx
"Classification" 5:54
Thx