Six Projects in and I'm so grateful that you committed to educate us in so much detail and repetitiveness, this is probably gonna change my life for the better and I might never even get to meet you. So Thank you Sidd!🙏
Wine appreciation through machine learning. Fantastic to know what makes a good quality wine. TQVM. Had great fun with this one!!
Glad you enjoyed it!😅 thanks 😇
32:55 this changes everything in the data set, great information.
Thank you for helping me so much with an ML project from scratch. I really appreciate your hard work. 🎉
Thank you for helping me a lot to learn about a real world application related ML model
Great explanation on Random forest brother..Thanks a lot..now I understood everything!
Thank you so much, this was a precise and fluid explanation; it helped me a lot.
Hi! I've 2 questions after watching this tutorial:
How can I do labelling when I need more than 2 quality measures?
How can I print the quality value of the output (ML generated) from my given input parameters?
dude u deserve a sub
thanks man
helped a lot
Glad I could help😇
I'm still confused about how to choose the correct algorithm for a dataset. Can you help me?
Even me
I think you should first check whether the target variable calls for regression or classification, and choose the subset of algorithms from that. Then train each candidate model, measure accuracy (for classification) or root mean squared error (for regression), and select the best one.
Bro, experiment with all of them.
@26:05: the correlation step is not working in Jupyter Notebook. Do you have a solution for this?
Another great video, congratulations.
There are two charts that I like to plot:
1) This one shows the distribution of each attribute (i.e. column):
import matplotlib.pyplot as plt
import seaborn as sns
for i, col in enumerate(wine_dataset.columns):
    plt.figure(i)
    # histplot replaces distplot, which is deprecated in recent seaborn versions
    sns.histplot(wine_dataset[col], kde=True)
2) And this one shows a comparison for every possible pair of attributes, coloured by wine quality. It is worth plotting; you can get some insights from it.
# pairplot creates its own figure, so no plt.figure() call is needed
sns.pairplot(wine_dataset, hue='quality')
plt.show()
Once again a perfect video. Hats off.
Thank you so much 😀
When splitting the data, can we use the parameter stratify=y to keep the class proportions of the target the same in train and test?
Yes
Thank you very much! Your teaching is really good.
Great video! Thank you so much!
thank you so much... it is very useful for new ideas & learning..
Very cool, but random state is always 42. Do not go against "The Hitchhiker's Guide to the Galaxy".
PS: Great job
haha!
Thank you so much, sir, this video helped me a lot. I can't put it into words. Thank you so much, sir.
great job ........keep it up ....and thanks a lot
Most welcome😇
Very good explanation..
Thanks 😇
How do we know which machine learning model is better for which data set? You have shown Logistic Regression in one model, SVM in another, and Random Forest in this model.
Simply create a project where you test all the models you have in mind and compare the results. Choose the one with the best results and then optimize it further.
Try this video:- th-cam.com/video/7uLzGRlXXDw/w-d-xo.html
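The "test all the models and compare" advice above could look roughly like the sketch below. It is not the method used in the video: the data is synthetic (`make_classification`), and the three candidate models and 5-fold cross-validation are assumptions chosen for illustration. Swap in your own `X` and `y`.

```python
# Sketch: compare several classifiers with 5-fold cross-validation.
# The data is synthetic; replace X and y with your own features and labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=11, random_state=0)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")

best_name = max(mean_scores, key=mean_scores.get)
print("best:", best_name)
```

Which model comes out on top depends on the data, so the ranking on the synthetic set says nothing about the wine dataset.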
Great video! Thank you so much!
I am getting "TypeError: missing 1 required positional argument: 'y'" while training the model. Can anyone explain?
Very good explanation
thanks 😇
Your videos are very great.... however, you don't fine tune the model....I have watched your hyperparameter tuning but you don't show it in projects. I sincerely love your videos
Good explanation, sir, but your approach has some serious problems.
1. There are a lot of outliers.
2. Accuracy is high but the other metrics are really bad. This is caused by the high imbalance of the dataset: nearly all test data are classified as bad quality wine, which is why the accuracy is so high.
Splitting the labels into bad quality [3,5] and good quality [6,8] would be a better approach for dealing with the imbalance problem. Treating the problem as regression might be an even better solution.
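To see why accuracy alone can mislead on imbalanced labels, here is a small sketch with made-up numbers (a hypothetical test set of 90 bad and 10 good wines, not the figures from the video). A model that always answers "bad" still scores 90% accuracy while finding none of the good wines.

```python
# Hypothetical test set: 90 "bad" wines (label 0) and 10 "good" wines (label 1).
y_true = [0] * 90 + [1] * 10
# A degenerate model that always predicts "bad".
y_pred = [0] * 100

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the "good" class: fraction of good wines the model actually found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall_good = true_positives / sum(t == 1 for t in y_true)

print(accuracy)     # 0.9
print(recall_good)  # 0.0
```

This is why it helps to check recall, precision, or a confusion matrix next to accuracy.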
How can the outliers problem be solved sir?
Thanks for valuable information
Thank you so much sir
Sir, this is a regression project. You changed the dependent values into classes using a lambda function and then applied RandomForestClassifier. How is this possible? I used regression and got an accuracy of 40 percent with random forest. I am not able to understand how you got this much. Also, I used R2 score since it was a regression model.
Hi! I took a classification approach. It depends on the problem statement and the outcome that we want. Also, R2 score is not a percentage accuracy; please read up on it. An R2 of 0.4 means the model explains about 40% of the variance in the target, which can be a reasonable result here. It does not mean 40 percent accuracy.
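A hand-rolled version of the R2 formula shows why it is not an accuracy percentage. The sketch below uses made-up quality scores and the standard definition, R2 = 1 - SS_res / SS_tot. The helper name `r2_score_manual` is hypothetical.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up quality scores for illustration.
y_true = [5, 6, 5, 7, 4, 6]
mean_value = sum(y_true) / len(y_true)

perfect = r2_score_manual(y_true, y_true)                      # exact predictions
baseline = r2_score_manual(y_true, [mean_value] * len(y_true)) # always predict the mean
worse = r2_score_manual(y_true, [7, 4, 7, 4, 7, 4])            # worse than the mean

print(perfect)   # 1.0
print(baseline)  # 0.0
print(worse < 0) # True
```

A score of 0 means "no better than always predicting the mean", and the score can go negative, which an accuracy percentage never does.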
Thank you so much!
This is very useful I want this project report
Thank you so much, it's exactly what I wanted.
Glad I could help!😇
Thank you for sharing
Siddhardhan sir, please help me differentiate the algorithms that are made specifically for classification and for regression, respectively.
What do the green, red, and violet colours represent? Are they different bottles of wine?
Very detailed, thanks
my pleasure 😇
Really helpful this project, thanks 😊
You're welcome 😊
What is the language used for the front end?
Hi Siddhardhan. Thanks for the video. I want to ask you something: I applied different models to my dataset and built a predictive system, but for every entry it says bad quality only. Can you tell me where I'm going wrong? I standardized my data; is that why I'm getting this?
I want to classify this into three types (medium, good, and bad) but I cannot figure it out. If you know what to do, please let me know.
In the wine quality prediction, when I enter different values it does not predict; I'm getting an error.
Can you explain why an outlier reduction method was not applied to this dataset?
Veryyyy good explanation 👍👍👍
Thank you 😊
48:20 Build a Predictive System
The Random Forest algorithm is not showing. How do I fix the error?
You might even have worked on outliers.
In the train-test split, when you did print(x.shape, x_train.shape, x_test.shape)...
It's showing only rows. If I am not wrong, it should show both the rows and the feature columns.
hi! here we are not printing x. we are printing only y. y contains only one column which represents the label. kindly check.
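The difference shows up directly in the shapes. In the sketch below the arrays are zero-filled placeholders, and the sizes (1599 rows, 11 feature columns) are an assumption based on the red wine quality dataset rather than something stated in this thread.

```python
import numpy as np

# Placeholder arrays; sizes assumed from the red wine quality dataset.
X = np.zeros((1599, 11))  # features: rows x columns
Y = np.zeros(1599)        # labels: one value per row

print(X.shape)  # (1599, 11)  rows and feature columns
print(Y.shape)  # (1599,)     rows only, since Y is one-dimensional
```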
@@Siddhardhan okay..
In the train-test split there is an error (not enough values to unpack (expected 5, got 4)). Please, can somebody help?
It would be a great help.
Thank you so much :)
You're welcome!😇
Thanks, teacher, but why is the accuracy on the training data 100%?
Hi Bro,
Your videos are great and I really appreciate your effort. I have a question as follows.
Is it mandatory to standardize the data whenever the independent variables have different ranges of values? I am asking because we standardized the data in project #2 but not in projects #5 and #6. I personally feel that standardizing the data with StandardScaler will certainly help the model improve its prediction accuracy. What do you think?
Regards,
Prakash
Hi! Standardization is an important step. It should only be applied to numerical columns, not categorical ones, so we may skip it when the dataset consists mostly of categorical columns. I skipped standardization in a few videos purely because of the length of the video. As for your doubt on whether it improves the model's performance: it often helps, though the gain is not obvious in every case. For certain datasets, you can get better accuracy and performance when you standardize the data.
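A minimal sketch of standardizing numerical features with `StandardScaler`, assuming synthetic data with two features on very different scales. The scaler is fitted on the training split only and then reused on the test split, so no test statistics leak into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic numeric features on very different scales.
X = np.column_stack([rng.normal(0.5, 0.1, 200), rng.normal(50, 20, 200)])
y = rng.integers(0, 2, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn mean and std from training data only
X_test_std = scaler.transform(X_test)        # reuse the training statistics

print(X_train_std.mean(axis=0).round(3))  # close to 0 for each column
print(X_train_std.std(axis=0).round(3))   # close to 1 for each column
```

Tree-based models such as random forest are largely insensitive to feature scale, while SVM and logistic regression usually benefit from it.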
@@Siddhardhan Thanks for your reply.
This is a classification problem, correct? And I think we have mostly studied classification problems.
yeah, we also have Projects in Regression & one clustering Project. there will be separate playlists on those. kindly check.
@@Siddhardhan sure, I will check it.. Please mention in the project title whether it's a classification, clustering, or regression problem so people can find it easily on TH-cam.. just a suggestion..😌
In which use case or dataset can we use RandomForestRegressor and RandomTreesEmbedding?
great. But at first, you should complete all ML algo. theory.
sure
What about this project report sir??
Hey, can someone tell me why we did not standardize the data?
Sir, how can we take n values as input and reshape them?
Hey!
What about the count of each value in the output variable y?
In the data analysis part you showed a graph of the quality variable where most of the values are between 4 and 6, and in label binarization you took 7 as the threshold, which means most of the quality values are converted to 0. There is a chance of an imbalanced dataset!
Correct me if I am wrong 🙌
hi! it's up to us. you can also take the values from 6 upward as label 1.
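The effect of the threshold on class balance is easy to check by counting labels. The quality scores below are hypothetical, shaped like the distribution described above (mostly 5 and 6), and `binarize` is an illustrative helper, not code from the video.

```python
from collections import Counter

# Hypothetical quality scores: mostly 5 and 6, few at the extremes (160 wines).
quality = [3] * 1 + [4] * 5 + [5] * 68 + [6] * 64 + [7] * 20 + [8] * 2

def binarize(scores, threshold):
    """Label 1 ("good") for scores at or above the threshold, else 0 ("bad")."""
    return [1 if q >= threshold else 0 for q in scores]

counts_7 = Counter(binarize(quality, 7))
counts_6 = Counter(binarize(quality, 6))

print(counts_7[0], counts_7[1])  # 138 22  (heavily imbalanced)
print(counts_6[0], counts_6[1])  # 74 86   (roughly balanced)
```

On a real dataset, `value_counts()` on the label column gives the same check.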
Okay thanks for the content.!
Very Good video
but I still don't understand how to choose which model suits which problem.
hi! watch the videos in 7th module. (intuition behind models)
Sir, how do I know which model is suitable for a particular problem?
Simply create a project where you test all the models you have in mind and compare the results. Choose the one with the best results and then optimize it further.
Try this video:- th-cam.com/video/7uLzGRlXXDw/w-d-xo.html
Hey, great tutorial. I have 2 questions. First, why didn't we standardize our data, or should we? Second, when we split our data we sometimes use the parameter 'stratify', but here we didn't use it. Could you explain why? Thank you
stratify is used so that the train and test sets keep almost the same proportion of each class as the full dataset, so the model is trained and evaluated on representative data.
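A small sketch of what `stratify=y` does in `train_test_split`, using synthetic labels with an 80/20 class ratio (an assumption for illustration). Note that it preserves the class ratio in both splits; it does not balance the classes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 160 of class 0 and 40 of class 1.
y = np.array([0] * 160 + [1] * 40)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=3
)

# Fraction of class 1 in the full data, the training split, and the test split.
print(y.mean(), y_train.mean(), y_test.mean())  # 0.2 0.2 0.2
```

Without `stratify`, the class ratio in a small test set can drift away from the ratio in the full data purely by chance.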
Can I get The PPT ;-;
by the way
Your Explanation was Awesome
I know this is supervised learning... but how do you choose random forest and not SVM? I am in doubt while choosing the model... can you guide me?
He tries all the models and finds this one the most helpful; that part is not shown in the video.
Helpful means high accuracy. Try applying all the models yourself and you will get your answer.
Is this classification or regression ?
classification
Sir, please share the dataset link here.
can we do the same in some IDE?
Yes, definitely.
Share resources where to learn ml for this project
hi! watch videos in my machine learning course playlist: th-cam.com/play/PLfFghEzKVmjsNtIRwErklMAN8nJmebB0I.html
you will be able to understand this project.
Can we create a confusion matrix?
hi! yes, you can create
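For example, scikit-learn's `confusion_matrix` takes the true labels and the predicted labels. The two short lists below are made up for illustration; in the project they would be the test labels and the model's predictions on the test features.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 0 = bad quality, 1 = good quality.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]
#  [1 2]]
```

Here the top-right cell counts bad wines predicted as good, and the bottom-left cell counts good wines predicted as bad.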
Very nice video!
I also tried SVM but it didn't seem to work properly; it always predicted bad quality even though I standardized the data after I reshaped it.
hi! try changing the model and do some optimizations... in my future videos, I'll cover topics on optimization
@@Siddhardhan Is it normal for the SVM to not work right, though?
How well a model works also depends on the nature of the dataset. You can search on Google for the pros and cons of SVM and other models. That information will help you choose a better model. There is no exact rule that holds all the time.
@@Siddhardhan it works if i label good quality for greater than 6. but i see what you mean. Keep up your great work! thank you very much!
Yes! You can definitely try
Is there any way to do this without binarization of the data?
hi! u can try multi class classification.
@@Siddhardhan thank you for the assist.
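One way to set up multi-class labels instead of a binary split is sketched below. The cut-offs (up to 4 is bad, 5 to 6 is medium, 7 and above is good) and the helper name `quality_to_label` are assumptions for illustration, not choices made in the video.

```python
def quality_to_label(quality):
    """Map a numeric quality score to one of three classes (assumed cut-offs)."""
    if quality <= 4:
        return "bad"
    if quality <= 6:
        return "medium"
    return "good"

scores = [3, 4, 5, 6, 7, 8]
labels = [quality_to_label(q) for q in scores]
print(labels)  # ['bad', 'bad', 'medium', 'medium', 'good', 'good']
```

With pandas, the same function can be passed to `wine_dataset['quality'].apply(quality_to_label)`, and scikit-learn classifiers such as `RandomForestClassifier` accept labels with more than two classes.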
Everything is good, but you didn't show anything about skewness and outliers.
hi! it's tough to explain all the things in a single video. I made separate videos on skewness and distributions in probability playlist.
hi
hi!
Why are you in a rush? Speak slowly.
Reduce the video speed.