Project 6. Wine Quality Prediction using Machine Learning with Python | Machine Learning Project

  • Published on 29 Dec 2024

Comments • 123

  • @thabor7402
    @thabor7402 5 months ago +3

    Six projects in, and I'm so grateful that you committed to educating us in so much detail and with so much repetition. This is probably going to change my life for the better, and I might never even get to meet you. So thank you, Sidd! 🙏

  • @digigoliath
    @digigoliath 3 years ago +10

    Wine appreciation through machine learning. Fantastic to know what makes a good quality wine. TQVM. Had great fun with this one!!

    • @Siddhardhan
      @Siddhardhan 3 years ago +2

      Glad you enjoyed it!😅 thanks 😇

  • @AkashSharma-my5hv
    @AkashSharma-my5hv 2 years ago +10

    32:55 this changes everything in the dataset. Great information.

  • @AR-gw2vo
    @AR-gw2vo 10 months ago +2

    Thank you for helping me so much with building an ML project from scratch. I really appreciate your hard work. 🎉

  • @thirniyaprabaharan5835
    @thirniyaprabaharan5835 6 months ago +1

    Thank you for helping me learn about an ML model applied to a real-world problem.

  • @abundanceontheway2511
    @abundanceontheway2511 2 months ago

    Great explanation of Random Forest, brother. Thanks a lot, now I understand everything!

  • @GhoshAnujit
    @GhoshAnujit 1 year ago +2

    Thank you so much, this was a precise and fluid explanation; it helped me a lot.

  • @cypher5873
    @cypher5873 2 years ago +8

    Hi! I've 2 questions after watching this tutorial:
    How can I do labelling when I need more than 2 quality measures?
    How can I print the quality value of the output (ML generated) from my given input parameters?
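One way to get more than two quality classes is to bin the numeric score into named ranges. The sketch below assumes pandas is available; the sample scores and the bin edges (up to 4 is bad, 5 to 6 is medium, 7 and above is good) are illustrative assumptions, not the thresholds used in the video.

```python
import pandas as pd

# Hypothetical quality scores standing in for the dataset's 'quality' column.
quality = pd.Series([3, 4, 5, 5, 6, 6, 7, 8])

# Assumed bin edges: (2, 4] -> bad, (4, 6] -> medium, (6, 8] -> good.
labels = pd.cut(quality, bins=[2, 4, 6, 8], labels=["bad", "medium", "good"])

print(labels.tolist())
```

A classifier trained on labels like these returns the label itself from `predict`, so printing the prediction prints the class. To get the original numeric score back, one option is to train on the raw `quality` column instead of the binned labels.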

  • @hellothere6983
    @hellothere6983 3 years ago +2

    dude u deserve a sub
    thanks man
    helped a lot

    • @Siddhardhan
      @Siddhardhan 3 years ago +1

      Glad I could help😇

  • @yashwanth3033
    @yashwanth3033 1 year ago +7

    I'm still confused about how to choose the correct algorithm for a dataset. Can you help me?

    • @RojaShankar1303
      @RojaShankar1303 9 months ago

      Me too

    • @kakashi-yu7ok
      @kakashi-yu7ok 7 months ago

      I think you should first check whether the target variable calls for regression or classification, and choose the subset of algorithms accordingly. Then train each candidate model, compare the accuracy (for classification) or the root mean squared error (for regression), and select the best one.
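The "train each candidate and compare" idea can be sketched with cross-validation. This assumes scikit-learn is installed; the data is synthetic (generated with `make_classification`, with 11 features only to mirror the number of wine measurements), and the three candidates are simply the models mentioned elsewhere in this thread.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a labelled dataset; not the wine data.
X, y = make_classification(n_samples=300, n_features=11, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Which model comes out ahead depends on the dataset, so the ranking printed here says nothing about the wine data itself.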

    • @sagarchaudhary97610
      @sagarchaudhary97610 3 months ago

      Bro, experiment with all of them.

  • @kanishkagour6356
    @kanishkagour6356 2 years ago +1

    @26:05 the correlation step is not working in Jupyter Notebook. Do you have a solution for this?
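The comment does not show the error, so the cause is a guess. One frequent reason `corr()` fails is a non-numeric column in the DataFrame (for example a text column in a combined red/white dataset). A minimal sketch of that case, with a made-up frame and column names:

```python
import pandas as pd

# Hypothetical frame with one text column mixed in with numeric ones.
df = pd.DataFrame(
    {
        "type": ["red", "white", "red", "white"],
        "alcohol": [9.4, 10.1, 11.2, 12.0],
        "quality": [5, 5, 6, 7],
    }
)

# Keep only numeric columns before computing the correlation matrix.
correlation = df.select_dtypes(include="number").corr()
print(correlation)
```

If every column is already numeric, the problem lies elsewhere, and the full traceback would be needed to say more.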

  • @renanboaventura9105
    @renanboaventura9105 3 years ago +4

    Another great video, congratulations.
    There are two charts that I like to plot:
    1) this one shows the distribution of each attribute (i.e., each column):
    import matplotlib.pyplot as plt
    import seaborn as sns
    for i, col in enumerate(wine_dataset.columns):
        plt.figure(i)
        sns.histplot(wine_dataset[col], kde=True)  # histplot replaces the deprecated distplot
    2) and this one below shows the comparison for each possible pair of attributes by wine quality. It is worth plotting; you can draw some insights from it.
    sns.pairplot(wine_dataset, hue='quality')  # pairplot creates its own figure
    plt.show()

  • @MuhammadKamran-ii4rh
    @MuhammadKamran-ii4rh 3 years ago

    Once again a perfect video. Hats off.

    • @Siddhardhan
      @Siddhardhan 3 years ago

      Thank you so much 😀

  • @karishmasewraj6437
    @karishmasewraj6437 2 years ago +5

    When splitting the data, can we use the parameter stratify=y to balance the target classes?
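For reference, `stratify=y` in scikit-learn's `train_test_split` does not equalize the classes; it keeps the class proportions the same in the train and test splits. A small sketch on synthetic labels (80 zeros, 20 ones; the `random_state` value is arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 80 samples of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Both splits keep the original 20% share of class 1.
print(y_train.mean(), y_test.mean())
```

The imbalance itself is still there after the split; stratification only makes sure the test set reflects it.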

  • @sanmatipol3201
    @sanmatipol3201 11 months ago

    Thank you very much! Your teaching is really good.

  • @AddisBelayneh
    @AddisBelayneh 7 months ago

    Great video! Thank you so much!

  • @nidhi2212
    @nidhi2212 7 months ago

    thank you so much... it is very useful for new ideas & learning..

  • @alejandrosierra3743
    @alejandrosierra3743 3 years ago +2

    Very cool, but random state is always 42. Do not go against "The Hitchhiker's Guide to the Galaxy".
    P.S. Great job

  • @ShortQuikies
    @ShortQuikies 10 months ago

    Thank you so much, sir, this video helped me a lot. I can't put it into words. Thank you so much, sir.

  • @suruchikumari2360
    @suruchikumari2360 3 years ago

    great job ........keep it up ....and thanks a lot

  • @premalathas623
    @premalathas623 3 years ago +1

    Very good explanation..

  • @LoneWolf-rj1px
    @LoneWolf-rj1px 2 years ago +1

    How do we know which machine learning model is better for which data set? You have shown Logistic Regression in one model, SVM in another, and Random Forest in this model.

    • @rohanshah8129
      @rohanshah8129 2 years ago

      Simply create a project where you test all the models you have in mind and compare the results. Choose the one with the best output and optimize it further.
      Try this video: th-cam.com/video/7uLzGRlXXDw/w-d-xo.html

  • @meghnarawat3820
    @meghnarawat3820 2 years ago

    Great video! Thank you so much!

  • @bharatm3195
    @bharatm3195 2 years ago +1

    I am getting "TypeError: missing 1 required positional argument: 'y'" while training the model. Can anyone explain?
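The comment does not include the code, so this is only a likely cause: `fit` was called without the labels. The sketch below reproduces that case with scikit-learn on a tiny made-up dataset and then shows the corrected call.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up training set: one feature, two classes.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
Y_train = np.array([0, 0, 1, 1])

model = RandomForestClassifier(n_estimators=10, random_state=0)

try:
    model.fit(X_train)  # labels forgotten: raises TypeError mentioning 'y'
except TypeError as error:
    message = str(error)
    print(message)

# Correct call: pass both the features and the labels.
model.fit(X_train, Y_train)
print(model.predict(np.array([[2.5]])))
```

Another common cause is forgetting the parentheses when creating the model (`RandomForestClassifier` instead of `RandomForestClassifier()`), which can produce a similar error depending on the scikit-learn version.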

  • @koushikguptabonthala2429
    @koushikguptabonthala2429 3 years ago

    Very good explanation

  • @oyeleyeolalekan4486
    @oyeleyeolalekan4486 1 year ago

    Your videos are great. However, you don't fine-tune the model. I have watched your hyperparameter tuning video, but you don't show tuning in the projects. I sincerely love your videos.

  • @leftmpl
    @leftmpl 3 years ago +9

    Good explanation, sir, but your approach has some serious problems.
    1. There are a lot of outliers.
    2. Accuracy is high but the other metrics are really bad. This is caused by the high imbalance of the dataset: nearly all test data are classified as bad quality wine, and this is why the accuracy is so high.
    Splitting bad and good quality into the ranges [3,5] and [6,8] would be a better approach for dealing with the imbalance problem. Treating the problem as a regression task might be an even better solution.
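The point about accuracy on imbalanced data can be illustrated with a baseline that always predicts "bad". The labels below are synthetic (10% good, 90% bad) and are not the actual class ratio of the wine dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels: 1 = good (10 samples), 0 = bad (90 samples).
y_true = np.array([1] * 10 + [0] * 90)

# A "model" that always predicts bad quality.
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)   # share of correct predictions
recall_good = recall_score(y_true, y_pred)  # share of good wines that were found

print(f"accuracy: {accuracy:.2f}, recall for good wines: {recall_good:.2f}")
```

High accuracy with zero recall on the minority class is the pattern described above, which is why precision, recall, or a confusion matrix are worth checking alongside accuracy.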

    • @afeezlawal5167
      @afeezlawal5167 2 years ago

      How can the outliers problem be solved sir?
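One common way to handle outliers is the interquartile range (IQR) rule: values more than 1.5 IQR beyond the quartiles are either dropped or clipped. This is a generic technique, not something shown in the video, and the numbers below are made up.

```python
import pandas as pd

# Made-up feature values with one obvious outlier.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: drop rows outside the fences.
filtered = values[(values >= lower) & (values <= upper)]

# Option 2: keep every row but clip extreme values to the fences.
clipped = values.clip(lower, upper)

print(len(filtered), clipped.max())
```

Whether to drop, clip, or keep outliers depends on whether they are measurement errors or genuine extreme wines, so it is worth inspecting them first.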

    • @JeevanEG
      @JeevanEG 11 months ago

      Thanks for valuable information

  • @nikitasinha8181
    @nikitasinha8181 2 years ago

    Thank you so much sir

  • @devanshujain4650
    @devanshujain4650 3 years ago +1

    Sir, this is a regression project. You changed the dependent values into classes using a lambda function and then applied RandomForestClassifier. How is this possible? I used regression and got an accuracy of 40 percent with a random forest. I am not able to understand how you got this much. Plus, I used the R2 score since it was a regression model.

    • @Siddhardhan
      @Siddhardhan 3 years ago +1

      Hi! I took a classification approach. It depends on our problem statement and the outcome that we want. Also, the R2 score is not a percentage value. You need to do some research on that. If you get an R2 value of 0.4, then it's actually a good model. It's not 40 percent.
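For context, R2 is defined as 1 minus the ratio of the residual sum of squares to the total sum of squares, so it measures explained variance rather than the share of correct predictions, and it can even be negative. A small check against scikit-learn's `r2_score`, using made-up targets and predictions:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true quality scores and regression predictions.
y_true = np.array([5.0, 6.0, 5.0, 7.0, 6.0])
y_pred = np.array([5.5, 5.5, 5.0, 6.5, 6.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(round(r2_manual, 3), round(r2_score(y_true, y_pred), 3))
```

Because the two approaches measure different things, an R2 from a regressor and an accuracy from a classifier on binarized labels are not directly comparable.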

  • @growingfire
    @growingfire 7 months ago

    Thank you so much!

  • @ashoka8929
    @ashoka8929 2 years ago

    This is very useful. I would like the project report.

  • @Namangen
    @Namangen 3 years ago

    thank you so much its exactly what I wanted.

    • @Siddhardhan
      @Siddhardhan 3 years ago +1

      Glad I could help!😇

  • @ToanvaKhoahocmaytinh
    @ToanvaKhoahocmaytinh 1 year ago

    Thank you for sharing

  • @techyreport7992
    @techyreport7992 3 years ago

    Siddhardhan sir, please help me differentiate the algorithms that are specifically made for classification from those made for regression.

  • @magical5051
    @magical5051 2 years ago

    What do the green, red, and violet colors represent? Are they different bottles of wine?

  • @joe_fu
    @joe_fu 3 years ago

    Very detailed, thanks

  • @sachinvithubone4278
    @sachinvithubone4278 3 years ago

    This project was really helpful, thanks 😊

    • @Siddhardhan
      @Siddhardhan 3 years ago +1

      You're welcome 😊

  • @myparadise6137
    @myparadise6137 7 months ago

    What language is used for the front end?

  • @tanvibamrotwar
    @tanvibamrotwar 1 year ago

    Hi Siddhardhan, thanks for the video. I want to ask you something. I applied different models to my dataset and built a predictive system, but for every entry it predicts bad quality only. Can you tell me where I'm going wrong? Is it because I standardized my data that I'm getting this result?

  • @MuhammadHamza-ki3ze
    @MuhammadHamza-ki3ze 2 years ago

    I want to classify this in three types medium good and bad but I cannot figure it out. If you know what to do please let me know.

  • @shwetharaju6496
    @shwetharaju6496 2 years ago

    In the wine quality prediction, when I use different input values it does not predict. I'm getting an error.

  • @srinukomarapuri7441
    @srinukomarapuri7441 2 years ago

    Can you explain why no outlier reduction method was applied to this dataset?

  • @ketanpatil4921
    @ketanpatil4921 3 years ago

    Veryyyy good explanation 👍👍👍

  • @nadhiyakandaswami
    @nadhiyakandaswami 2 years ago

    48:20 Build a Predictive System

  • @shwetharaju6496
    @shwetharaju6496 2 years ago

    The Random Forest algorithm is not showing. How do I fix the error?

  • @sameerabanu3115
    @sameerabanu3115 1 year ago

    You could also have worked on the outliers.

  • @sachinvithubone4278
    @sachinvithubone4278 3 years ago

    In the train test split, when you did print(x.shape, x_train.shape, x_test.shape),
    it shows only the rows. If I am not wrong, it should show both the rows and the feature columns.

    • @Siddhardhan
      @Siddhardhan 3 years ago +1

      hi! here we are not printing x. we are printing only y. y contains only one column which represents the label. kindly check.

    • @sachinvithubone4278
      @sachinvithubone4278 3 years ago

      @@Siddhardhan okay..

  • @d_62_sourabhvankudre41
    @d_62_sourabhvankudre41 1 year ago

    In the train test split there is an error: "not enough values to unpack (expected 5, got 4)". Please, can somebody help?
    It would be a great help.
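That message usually means there are five names on the left-hand side of the assignment, while `train_test_split` returns two pieces per input array, so four pieces for `X` and `Y`. A minimal sketch on made-up arrays (the variable names are assumptions about the commenter's code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up features (10 rows, 2 columns) and labels (10 values).
X = np.arange(20).reshape(10, 2)
Y = np.arange(10) % 2

parts = train_test_split(X, Y, test_size=0.2, random_state=3)
print(len(parts))  # two input arrays -> four output arrays

# Unpack into exactly four names, in this order.
X_train, X_test, Y_train, Y_test = parts

# Feature arrays are 2-D (rows, columns); label arrays are 1-D (rows,).
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```

The shapes also show why printing the label arrays displays only a row count: a 1-D label array has no column dimension.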

  • @sayanaajayan9471
    @sayanaajayan9471 3 years ago

    Thank you so much :)

  • @ahmedabid6799
    @ahmedabid6799 3 years ago

    Thanks, teacher, but why is the accuracy on the training data 100%?
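A random forest with fully grown trees can nearly memorize its training set, so training accuracy close to 100% is expected and says little about generalization; the test accuracy is the number to watch. A sketch on synthetic, deliberately noisy data (not the wine dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data; flip_y=0.2 adds label noise that cannot be learned.
X, y = make_classification(n_samples=400, n_features=11, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"train: {train_accuracy:.3f}  test: {test_accuracy:.3f}")
```

A large gap between the two numbers is a sign of overfitting; limiting `max_depth` or raising `min_samples_leaf` are the usual knobs to reduce it.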

  • @9941521791
    @9941521791 2 years ago +1

    Hi Bro,
    Your videos are great and I really appreciate your effort. I have a question as follows.
    Do we need to standardize the data mandatorily, whenever there is a different range of values in independent variables? I am asking because we did the data standardization in project#2 but not in project#5 & 6. I personally feel that the data standardization using standardscaler will certainly help the model to improve the prediction accuracy. What do you think?
    Regards,
    Prakash

    • @Siddhardhan
      @Siddhardhan 2 years ago

      Hi! Standardization is an important process. We don't have to do it if our dataset contains several categorical columns. Standardization should not be performed on categorical columns. I may not have done standardization in a few videos. That's purely because of the length of the video. As for your doubt about whether it will improve your model's performance, it definitely helps. It's not obvious in certain cases, but for certain datasets you can get better accuracy and performance when you standardize the data.
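As a general rule of thumb, scale-sensitive models such as SVM and logistic regression tend to benefit from standardization, while tree-based models such as random forests are largely unaffected by it. One way to apply it safely is a pipeline, so the scaler is fitted on the training data only. The sketch below uses synthetic data and an SVM purely as an illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a numeric dataset with 11 features.
X, y = make_classification(n_samples=300, n_features=11, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# The pipeline fits the scaler on X_train and reuses it for any later input.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

With a pipeline, new inputs passed to `model.predict` are scaled with the training statistics automatically, which avoids scaling them by hand.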

    • @9941521791
      @9941521791 2 years ago

      @@Siddhardhan Thanks for your reply.

  • @sachinvithubone4278
    @sachinvithubone4278 3 years ago +1

    This is a classification problem, correct? And I think we have mostly studied classification problems so far.

    • @Siddhardhan
      @Siddhardhan 3 years ago +1

      yeah, we also have Projects in Regression & one clustering Project. there will be separate playlists on those. kindly check.

    • @sachinvithubone4278
      @sachinvithubone4278 3 years ago

      @@Siddhardhan Sure, I will check it. Please mention in the project title whether it's a classification, clustering, or regression problem so people can find it easily on TH-cam. Just a suggestion 😌

    • @sachinvithubone4278
      @sachinvithubone4278 3 years ago

      In which use case or dataset can we use
      RandomForestRegressor
      and
      RandomTreesEmbedding?

  • @gauravfamily2209
    @gauravfamily2209 3 years ago

    Great. But first you should cover the theory of all the ML algorithms.

  • @ashoka8929
    @ashoka8929 2 years ago

    What about this project report sir??

  • @vanshikarathi2356
    @vanshikarathi2356 2 years ago

    Hey can someone tell me why we did not standardize the data

  • @rohitgaloth1547
    @rohitgaloth1547 3 years ago

    Sir how can we input n values as input and reshape it?
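scikit-learn models expect a 2-D array of shape (samples, features), so a single input is reshaped to one row and several inputs are stacked as rows. The feature values below are hypothetical; in practice they must follow the same column order as the training data.

```python
import numpy as np

# Hypothetical values for the 11 wine features, in training column order.
input_data = (7.5, 0.5, 0.36, 6.1, 0.071, 17.0, 102.0, 0.9978, 3.35, 0.8, 10.5)

# One sample: reshape the flat tuple into a single row.
single = np.asarray(input_data).reshape(1, -1)

# Several samples: stack them so that each row is one wine.
batch = np.asarray([input_data, input_data])

print(single.shape, batch.shape)
```

Calling `predict` on a fitted model with `batch` would then return one prediction per row.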

  • @nithinkumbam5525
    @nithinkumbam5525 3 years ago

    Hey!
    What about the count of each value of the output variable y?
    In the data analysis part you showed a graph of the quality variable where most of the values are between 4 and 6, and in the label binarization you took 7 as the cutoff, which means most of the quality values are converted to 0. There is a chance of an imbalanced dataset!
    Correct me if I am wrong 🙌

    • @Siddhardhan
      @Siddhardhan 3 years ago +2

      Hi! It's up to us. You can also label values from 6 upward as 1.
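The effect of the cutoff on class balance is easy to check by counting labels for each candidate threshold. The scores below are made up for illustration and do not reproduce the real distribution of the dataset.

```python
import pandas as pd

# Made-up quality scores, concentrated around 5 and 6.
quality = pd.Series([3, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8])

good_counts = {}
for threshold in (7, 6):
    # Label 1 (good) when quality is at or above the threshold, else 0 (bad).
    labels = (quality >= threshold).astype(int)
    good_counts[threshold] = int(labels.sum())
    print(f"threshold {threshold}: {good_counts[threshold]} good of {len(labels)}")
```

Running the same count on the real `quality` column shows how skewed each choice of cutoff leaves the two classes.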

    • @nithinkumbam5525
      @nithinkumbam5525 3 years ago

      Okay thanks for the content.!

  • @ashwinizende7923
    @ashwinizende7923 3 years ago

    Very good video,
    but I am still not able to understand how to choose which model fits which problem.

    • @Siddhardhan
      @Siddhardhan 3 years ago

      hi! watch the videos in 7th module. (intuition behind models)

  • @roshankshirsagar8665
    @roshankshirsagar8665 2 years ago

    Sir, how do I know which model is suitable for a particular problem?

    • @rohanshah8129
      @rohanshah8129 2 years ago

      Simply create a project where you test all the models you have in mind and compare the results. Choose the one with the best output and optimize it further.
      Try this video: th-cam.com/video/7uLzGRlXXDw/w-d-xo.html

  • @sezermezgil9304
    @sezermezgil9304 3 years ago

    Hey, great tutorial. I have two questions. First, why didn't we standardize our data, or should we? Second, when we split our data we sometimes use the parameter 'stratify', but here we didn't. Could you explain why? Thank you.

    • @techyreport7992
      @techyreport7992 3 years ago

      stratify keeps the class proportions the same in the train and test splits, so that both have almost the same label distribution and we can train and evaluate the model correctly.

  • @shyampraveen4203
    @shyampraveen4203 2 years ago

    Can I get The PPT ;-;
    by the way
    Your Explanation was Awesome

  • @swastikmohanty7370
    @swastikmohanty7370 2 years ago

    I know this is supervised learning, but how did you decide on random forest rather than SVM? I am unsure when choosing a model. Can you guide me?

    • @Doraemon67812
      @Doraemon67812 2 years ago

      He tries all the models and then finds that this one works best; that part is not shown in the video.

    • @Doraemon67812
      @Doraemon67812 2 years ago

      "Works best" means high accuracy. Try applying all the models yourself and you will get your answer.

  • @siddharthsharma5162
    @siddharthsharma5162 3 years ago

    Is this classification or regression ?

  • @riyashah2530
    @riyashah2530 2 years ago

    Sir, please give me the dataset link here.

  • @bhagyashreenarwade355
    @bhagyashreenarwade355 3 years ago

    can we do the same in some IDE?

  • @faizansaqeeb3390
    @faizansaqeeb3390 3 years ago

    Share resources where to learn ml for this project

    • @Siddhardhan
      @Siddhardhan 3 years ago

      hi! watch videos in my machine learning course playlist: th-cam.com/play/PLfFghEzKVmjsNtIRwErklMAN8nJmebB0I.html
      you will be able to understand this project.

  • @koushikguptabonthala2429
    @koushikguptabonthala2429 3 years ago

    Can we create a confusion matrix?

    • @Siddhardhan
      @Siddhardhan 3 years ago

      Hi! Yes, you can create one.
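A confusion matrix can be built with scikit-learn's `confusion_matrix`. The labels and predictions below are made up; in the project they would be the test labels and the model's predictions on the test set.

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions (1 = good, 0 = bad).
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

Reading the off-diagonal cells shows which kind of mistake the model makes more often, which accuracy alone hides.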

  • @dimitriskapsis6018
    @dimitriskapsis6018 3 years ago

    Very nice video!
    I also tried SVM but it didn't seem to work properly. It always predicted bad quality even though I standardized the data after I reshaped it.
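The comment does not show the code, so this is only one possible explanation: if a new scaler is fitted on the single reshaped input row, every feature becomes zero and the model sees the same input each time. The sketch below uses made-up numbers to contrast that pitfall with reusing the scaler fitted on the training data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training features and one new sample (three features for brevity).
X_train = np.array([[7.0, 0.5, 9.5], [8.0, 0.3, 11.0], [6.5, 0.7, 10.0]])
new_sample = np.array([[7.5, 0.4, 12.0]])

# Pitfall: fitting on a single row centers it on itself, giving all zeros.
wrong = StandardScaler().fit_transform(new_sample)

# Reuse the scaler fitted on the training data and only transform new input.
scaler = StandardScaler().fit(X_train)
right = scaler.transform(new_sample)

print(wrong)
print(right)
```

Class imbalance can also push an SVM toward always predicting the majority class, so the label threshold is worth checking as well.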

    • @Siddhardhan
      @Siddhardhan 3 years ago +1

      hi! try changing the model and do some optimizations... in my future videos, I'll cover topics on optimization

    • @dimitriskapsis6018
      @dimitriskapsis6018 3 years ago

      @@Siddhardhan Is it normal for the SVM not to work properly, though?

    • @Siddhardhan
      @Siddhardhan 3 years ago

      How well a model works also depends on the nature of the dataset. You can search on Google for the pros and cons of SVM and other models. That information will help you choose a better model. There is no exact rule that applies all the time.

    • @dimitriskapsis6018
      @dimitriskapsis6018 3 years ago

      @@Siddhardhan It works if I label quality greater than 6 as good, but I see what you mean. Keep up your great work! Thank you very much!

    • @Siddhardhan
      @Siddhardhan 3 years ago

      Yes! You can definitely try

  • @ballerr45
    @ballerr45 3 years ago

    Is there any way to do this without binarization of the data?

    • @Siddhardhan
      @Siddhardhan 3 years ago

      hi! u can try multi class classification.

    • @ballerr45
      @ballerr45 3 years ago

      @@Siddhardhan thank you for the assist.

  • @AnkitSingh-mr4og
    @AnkitSingh-mr4og 3 years ago

    Everything is good, but you didn't show anything about skewness and outliers.

    • @Siddhardhan
      @Siddhardhan 3 years ago

      hi! it's tough to explain all the things in a single video. I made separate videos on skewness and distributions in probability playlist.

  • @samarsingh7594
    @samarsingh7594 3 years ago

    hi

  • @parikshitshukla7355
    @parikshitshukla7355 1 year ago

    Why are you in a rush? Speak slowly.

    • @hades840
      @hades840 5 months ago

      Reduce the video playback speed.