Hyperparameter Tuning of Machine Learning Model in Python

  • Published Oct 19, 2024

Comments • 78

  • @AI_Boy99
    @AI_Boy99 10 months ago +1

    Wow, this was amazing. I'm working on machine learning models to diagnose early leakage of valves in piston diaphragm pumps. Thanks Chanin. Really love your videos.

  • @ajifyusuf7624
    @ajifyusuf7624 4 years ago +5

    This video, I think, is one of the best explanations of hyperparameter tuning.

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks for the kind words 😊

  • @aimenbaig6201
    @aimenbaig6201 3 years ago +2

    i love your calm teaching style! it's relaxing

  • @MarsLanding91
    @MarsLanding91 3 years ago +5

    Superb video. Very insightful. Question: how are you picking the numbers for the parameters? max_features_range = np.arange(1,6,1) - why did you decide to start at 1 and end at 6? Why are you incrementing by 1 and not by 2, for example? Would love to hear your thoughts on this.

  • @WaliSayed
    @WaliSayed 2 months ago

    Very clear, and the details are explained in a simple way. Thank you!

  • @Ghasforing2
    @Ghasforing2 4 years ago +2

    This was a lucid and complete discussion of hyperparameter tuning. Thanks for sharing, Professor.

    • @DataProfessor
      @DataProfessor  4 years ago

      Thank you for watching and glad it was helpful 😊

  • @aiuslocutius9758
    @aiuslocutius9758 2 years ago +1

    Thank you for explaining this concept in an easy-to-understand manner.

  • @JBhavani777
    @JBhavani777 2 years ago +1

    While it's a bit late for me to be watching this, it was worth it, sir. Thank you so much for the gem... keep teaching; a very elaborate explanation.

  • @CatBlack01
    @CatBlack01 3 years ago +2

    Clear explanation and presentation. Love the analogies and error fixing.

    • @DataProfessor
      @DataProfessor  3 years ago

      Much appreciated! Glad to hear!

  • @jorge1869
    @jorge1869 4 years ago +2

    Hello Dr.! I have read many of your works, because I have a line of research related to the development of tools based on machine learning, mainly the prediction of peptides with different activities. Currently I use Python to develop and, of course, publish my papers; I am also learning R because I have noticed this language has good libraries to calculate molecular descriptors, for instance "Protr". I would appreciate a video tutorial explaining key steps such as data separation, training, cross-validation, and testing with R using the "CARET" library, if possible. Greetings and success to this awesome YouTube channel!

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks JF for the comment and for reading my research work. How did you discover this YouTube channel? (so that I can use this information to better promote the channel) Yes, we also use the protr package in R for some of our peptide/protein QSAR work. In that case, I might make a video about calculating the descriptors of peptides/proteins or even compounds in future videos.
      In the meantime, please check out the following video "Machine Learning in R: Building a Classification Model" as well as 13 other R video tutorials explaining the machine learning model building process in a step-by-step manner.
      th-cam.com/video/dRqtLxZVRuw/w-d-xo.html

    • @jorge1869
      @jorge1869 4 years ago +1

      Dr., thank you so much for your reply. I discovered your channel here on YouTube while looking for machine learning tutorials in R. When you mentioned your name in one of your videos, where you give an excellent lecture on drug discovery, I quickly looked up your profile on ResearchGate, and that's how I realized it was you.

    • @DataProfessor
      @DataProfessor  4 years ago

      JF Thanks for the insights, it is very helpful.

  • @donrachelteo9451
    @donrachelteo9451 3 years ago +1

    Yes, indeed, this is one of the best explanations of hyperparameter tuning.
    Just needed clarification: how do we decide the range of values to run in grid search? Hope you can also help do one video on Manual Tuning vs Auto Grid Search Tuning. Thanks 👍

    • @DataProfessor
      @DataProfessor  3 years ago

      Thanks for the suggestion! I'll put it on my to do list.

    • @donrachelteo9451
      @donrachelteo9451 3 years ago +1

      @@DataProfessor thanks professor

  • @jgubash100
    @jgubash100 3 years ago +1

    Liked the contour plots, I'll have to try those too.

  • @dearcharlyn
    @dearcharlyn 3 years ago +1

    Another amazing tutorial, well explained and comprehensible! Thank you data professor! I am currently working on COVID-19 predictor models. :)

    • @DataProfessor
      @DataProfessor  3 years ago

      Thanks! Appreciate the kind words!

  • @GeraldTalton
    @GeraldTalton 1 year ago

    Great video, always helps to see the visualization

  • @madhawagunathilake8304
    @madhawagunathilake8304 1 year ago

    Thank you Prof. for your very insightful and helpful lecture!!

  • @infinitygeospatial1972
    @infinitygeospatial1972 2 years ago

    Great video. Very Explanatory. Thank you

  • @geoffreyanderson4719
    @geoffreyanderson4719 2 years ago +1

    Thank you for making good content; that is what attracted me to the channel, Data Professor. I say the following only with constructive purpose. There is no signal to find in a random dataset like that sampled by make_classification. Is this correct? Thus the RF is fitting itself to noise only. It's using completely spurious associations. You would prefer to avoid fitting to noise components in real life as much as possible. Fitting to noise is pure variance error.

  • @muskanmishra6625
    @muskanmishra6625 1 year ago

    very well explained thank you so much🙂

  • @amiralx88
    @amiralx88 3 years ago

    Really nice and clean code. I've learned a lot from your video about how to optimize mine. Thanks.

  • @joseluisbeltramone599
    @joseluisbeltramone599 2 years ago +1

    Fantastic explanation, Sir (as always). Thank you very much!

  • @gabrielcornejo2206
    @gabrielcornejo2206 2 years ago

    Great tutorial, thank you very much. I have a question: how could I know which are the best 3 features to use to build the best model with 140 n_estimators?

  • @sofluzik
    @sofluzik 4 years ago +1

    Lovely. How relevant are the confusion matrix, classification report, AUC score, and ROC to the score mentioned above?

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Hi Rajaram, this article does a good job of providing a detailed distinction between the various metrics for classification: neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc

    • @sofluzik
      @sofluzik 4 years ago

      @@DataProfessor thank you sir

  • @nibrad9712
    @nibrad9712 3 months ago

    Why did you choose the max features to be 5 while the n estimators are 200? More specifically, how do I choose these params?

  • @cahayasatu9201
    @cahayasatu9201 3 years ago +1

    Thanks for a great tutorial. May I know how to see/identify which 2 features produce the best accuracy?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Hi, if using random forest, the feature importance plot will allow us to see which features contributed the most to the prediction. The shap library also adds this capability to any ML algorithm.
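      The reply above can be sketched in scikit-learn as follows (a minimal example; the dataset here is a stand-in generated with make_classification, not the one from the video):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in dataset for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Impurity-based importances: one value per feature, summing to 1
ranking = np.argsort(model.feature_importances_)[::-1]
for idx in ranking[:3]:
    print(f"feature {idx}: importance {model.feature_importances_[idx]:.3f}")
```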

  • @budoorsalem1168
    @budoorsalem1168 3 years ago +1

    Thank you for your great video. Have you done hyperparameter tuning for different algorithms, like decision trees, ANN, GBR?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      The first step is to figure out which hyperparameters you want to optimize. You can do that by going to the API documentation, looking for the algorithm function that you want to use, seeing which hyperparameters there are, and adapting accordingly as shown in this video. For example, in Random Forest, the 2 hyperparameters that we chose for optimization are max_features and n_estimators. For an ANN, you may choose to optimize the learning rate, momentum, number of nodes in the hidden layer, etc.
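      For the Random Forest case described above, a minimal GridSearchCV sketch might look like this (the dataset and grid values are illustrative stand-ins, not the exact ones from the video):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in dataset for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Grid over the two hyperparameters named above
param_grid = {
    "max_features": np.arange(1, 6, 1),  # 1..5 features per split
    "n_estimators": [50, 100, 150],      # number of trees
}

grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```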

    • @budoorsalem1168
      @budoorsalem1168 3 years ago +1

      @@DataProfessor thank you so much, this really helped me

  • @dreamphoenix
    @dreamphoenix 2 years ago +1

    Thank you

  • @eyimofepinnick
    @eyimofepinnick 3 years ago

    Nice tutorial. Now that I've done all this, how can I apply the model, i.e., use what we've done to predict the X_test data, or predict data if we create an API?

  • @DM-py7pj
    @DM-py7pj 2 years ago

    Is it not important to know which features when GridSearch tells you the optimal number of features? And what then when, over different runs, you get different n_features?

  • @isaacvergara6792
    @isaacvergara6792 3 years ago

    Awesome video!

  • @geoffreyanderson4719
    @geoffreyanderson4719 2 years ago

    A thought experiment: If the generating process continued a lot longer and made far more than 200 examples, what would this do to the tuned final model's predictions? I am talking about the model that was developed on the 200 examples. That is, what happens when it is tried on that new data? Keep in mind that sklearn's make_classification() by design produces noise only, no signal.

  • @張稚辰
    @張稚辰 3 years ago +1

    Awesome video thanks

  • @bryanchambers1964
    @bryanchambers1964 2 years ago

    I have a very large dataset: 356 columns, which I reduced to 75 using PCA while retaining 99.8% of the variance. I built a clustering model and it works outstandingly well; I identified 3 clusters out of 8 in which potential customers belong. But my machine learning model is garbage: a ROC-AUC score of barely greater than 0.5. I am surprised, because if the clustering model works very well, shouldn't the machine learning model work well too? I was wondering if you had any suggestions?

    • @DanielRong795
      @DanielRong795 2 years ago

      May I ask what ROC-AUC is?

  • @SyedZion
    @SyedZion 3 years ago

    Can you please explain the same concept with RandomizedSearch?
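    For the question above, randomized search follows the same pattern as grid search but samples parameter combinations instead of enumerating them all. A minimal sketch (dataset and distributions are illustrative stand-ins, not from the video):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Stand-in dataset for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Sample 5 random combinations instead of trying every grid point
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        "max_features": randint(1, 6),     # draws integers 1..5
        "n_estimators": randint(50, 201),  # draws integers 50..200
    },
    n_iter=5,
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```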

  • @limzijian98
    @limzijian98 2 years ago

    Hi, just wanted to ask: how do you determine the number of n_estimators for a record size of 2 million?

  • @sudhakarsingha283
    @sudhakarsingha283 3 years ago +1

    This is a video with a detailed discussion of hyperparameter tuning.

  • @budoorsalem8378
    @budoorsalem8378 3 years ago +1

    Thank you so much, Professor, for this good information; it helped a lot. I was wondering if we can do hyperparameter tuning in random forest regression for continuous data.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Hi, by continuous data are you referring to the Y variable? If so, then the answer would be yes.

    • @budoorsalem1168
      @budoorsalem1168 3 years ago +1

      @@DataProfessor yes, the target dependent variable is not categorical; it is a range of numbers

    • @DataProfessor
      @DataProfessor  3 years ago +1

      @@budoorsalem1168 Hyperparameter tuning can be performed for both categorical and numerical Y variables (classification and regression, respectively).
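      The regression case in the reply above can be sketched like this (a minimal example with a stand-in dataset; note the scorer changes from accuracy to R²):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Continuous target -> regression instead of classification
X, y = make_regression(n_samples=200, n_features=10, noise=0.5, random_state=42)

grid = GridSearchCV(
    RandomForestRegressor(random_state=42),
    {"max_features": [2, 4, 6], "n_estimators": [50, 100]},
    cv=5,
    scoring="r2",  # R^2 replaces accuracy for a numerical Y
)
grid.fit(X, y)
print(grid.best_params_)
```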

    • @budoorsalem1168
      @budoorsalem1168 3 years ago

      @@DataProfessor ok thank you so much

  • @hejarshahabi114
    @hejarshahabi114 3 years ago

    Thanks for your video. I also have a question regarding max features that you mentioned at 11:48. By max features, what do you mean? Do you mean the maximum independent elements, like x1, x2, ..., xn?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Thanks for watching! Yes, max_features can be set as high as all of the features. The max_features parameter is what scikit-learn uses to determine how many features to consider when performing a node split. More details are provided here: scikit-learn.org/stable/modules/ensemble.html#random-forest-parameters

    • @hejarshahabi114
      @hejarshahabi114 3 years ago +1

      @@DataProfessor thank you very much for your quick response. Please keep making videos on such topics; you are doing great, and I've learnt many things from your channel. BIG LIKE

    • @DataProfessor
      @DataProfessor  3 years ago

      @@hejarshahabi114 Thanks, and greatly appreciate the support 😊

  • @dennislam1501
    @dennislam1501 1 year ago

    What is the minimum sample size for decent tuning? 10,000? 1,000? 100,000? Data rows, I mean.

  • @josiel.delgadillo
    @josiel.delgadillo 2 years ago

    How do you use gridsearchcv with a custom estimator? I can’t seem to make it work.

  • @AbhishekSingh-vl1dp
    @AbhishekSingh-vl1dp 1 year ago

    How will we decide how much of the data to split into the train set and the test set?

  • @kailee3491
    @kailee3491 2 years ago

    Where can I find the environment requirements?

  • @levithanprimal2410
    @levithanprimal2410 3 years ago +3

    How am I watching this for free? Thanks Professor!

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Glad it was helpful, and yes, we have free data science content here; would appreciate it if you share it with a friend or two 😆

  • @guoqiang7215
    @guoqiang7215 4 years ago +1

    I am working on a spam mail dataset and am now trying to do hyperparameter tuning on the model.

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks for sharing, sounds like an interesting project.

  • @MinhHua-zu2pl
    @MinhHua-zu2pl 5 months ago

    Please make the screen font bigger. Thank you!

  • @franklintello9702
    @franklintello9702 2 years ago

    I am still trying to find one with real data, because these automatically generated datasets are sometimes hard to apply.

  • @shivamkrathghara3340
    @shivamkrathghara3340 3 years ago +1

    Why 81k? It should be more than 810k.
    Thank you, professor.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Haha, thanks for the support!

  • @yingzisilver9085
    @yingzisilver9085 1 year ago

    Thank you