How to select the best model using cross-validation in Python

  • Published Feb 9, 2025

Comments • 80

  • @srinivasanganesan9294
    @srinivasanganesan9294 4 years ago +2

    Krish, a very simple explanation of how CV can be used in algorithm selection. Very well done.

  • @VVV-wx3ui
    @VVV-wx3ui 5 years ago +1

    I have done a course and also did a bit of prediction using ML and DL, including ANN, CNN and LSTM. However, now I understand which libraries to use for different cases. Thanks for coming up with such videos. Please do more; I am your subscriber.

  • @pranavakailash8751
    @pranavakailash8751 3 years ago +2

    That is clean AF! Thanks for this video, really appreciated.

  • @drvren030
    @drvren030 3 years ago

    Got an exam in a couple of hours, and this video cleared up a LOT of things! Thank you for going into the concept and using that to explain what's going on in your code. Kudos man, kudos.

  • @vaibhavkhobragade9773
    @vaibhavkhobragade9773 3 years ago

    This video helped me clear all my doubts regarding cross-validation and data leakage.

  • @asadkhan-kk2ru
    @asadkhan-kk2ru 1 year ago

    Excellent

  • @Nikhil-jj7xf
    @Nikhil-jj7xf 5 years ago +1

    Explained with simplicity.
    Thanks Krish.

  • @infidos
    @infidos 4 years ago +1

    Awesome and clean, simple explanation.

  • @turksonmichael1236
    @turksonmichael1236 several months ago

    What an explanation!!!
    Boss that

  • @tarabalam9962
    @tarabalam9962 1 year ago

    Great teaching

  • @muskangupta5873
    @muskangupta5873 4 years ago

    Best video 🙌
    Keep posting, sir, you are awesome.

  • @sivakumarprasadchebiyyam9444
    @sivakumarprasadchebiyyam9444 1 year ago

    Hi, it's a very good video. Could you please let me know if cross-validation is done on the train data or the total data?

  • @chinedumezeakacha1604
    @chinedumezeakacha1604 4 years ago

    Very apt and straight to the point. Thanks for sharing

  • @SBitachiyt0
    @SBitachiyt0 5 years ago +3

    Can you increase the font size of the editor? It's very small and eye-straining to read on mobile.

  • @akashpoudel571
    @akashpoudel571 5 years ago +1

    Thank you sir, for a lucid explanation...

  • @YavuzDurden
    @YavuzDurden 2 years ago

    Sir, how can I access the datasets that were validated? Why are we applying cross-validation if we can't select the highest-scoring data? Thank you.

  • @TJ-wo1xt
    @TJ-wo1xt 2 years ago

    Great explanation.

  • @salvador9431
    @salvador9431 5 years ago +3

    Is it OK to use your train_x and train_y data in your cross-validation? Or is it better to use your whole x and y variables?

    • @generationwolves
      @generationwolves 5 years ago +1

      The X and y variables. The whole point of using cross-validation techniques is to try various combinations of train and test sets from your original dataset, and find out how effective your algorithm is for any of these combinations.
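
A minimal sketch of what this reply describes, assuming scikit-learn and using the toy iris dataset as a stand-in for the video's data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# cross_val_score does the train/test splitting internally,
# so it is given the full X and y rather than a pre-split subset
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)  # one score per fold
print(scores.mean())
```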

  • @VenkatDinesh02
    @VenkatDinesh02 3 years ago

    Krish, for a logistic regression problem we should use the mode, right? You used the mean here.. why?

  • @divyanshuanand9990
    @divyanshuanand9990 5 years ago +1

    Excellent.
    Thanks for the video.

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 3 years ago +1

    I have a question. In cross-validation we perform multiple experiments based on the cv value. In K-fold we also do the same thing.
    What is the difference between these two?

  • @amankapri
    @amankapri 5 years ago

    Very Good Explanation

  • @SararithMAO
    @SararithMAO 2 years ago

    If I just want to apply K-fold cross-validation, I don't need to do a train-test split, right?

  • @Hellow_._
    @Hellow_._ 1 year ago

    How to use other CV techniques in code, like stratified CV, time-series CV and leave-one-out CV?
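
For what it's worth, scikit-learn lets any splitter object be passed as the `cv` argument of `cross_val_score`; here is a sketch of the three variants asked about, on a small synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (LeaveOneOut, StratifiedKFold,
                                     TimeSeriesSplit, cross_val_score)

X, y = make_classification(n_samples=120, random_state=0)
model = LogisticRegression(max_iter=1000)

# Stratified K-fold: keeps the class ratio the same in every fold
strat = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5))
# Time-series split: always trains on earlier rows, tests on later ones
tscv = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))
# Leave-one-out: one fold per row, i.e. n_samples model fits
loo = cross_val_score(model, X, y, cv=LeaveOneOut())
print(strat.mean(), tscv.mean(), loo.mean())
```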

  • @swethanandyala
    @swethanandyala 2 years ago

    Hi sir... When we use cv=10, it simply applies K-fold sampling. Can we import StratifiedKFold and pass cv=StratifiedKFold when working with a dataset that has class imbalance, since stratified sampling gives the same ratio of classes in the train and validation data?

  • @asadkhan-kk2ru
    @asadkhan-kk2ru 1 year ago

    Very good

  • @malkitsingh2654
    @malkitsingh2654 4 years ago

    Don of the data science community

  • @borngenius4810
    @borngenius4810 4 years ago

    Excellent explanation. So if I am using cross_val_score instead of train_test_split, I don't need to find and analyse metrics like precision, recall, F1, ROC? Just getting accuracy.mean() is good enough? P.S. I am new to DS, so I have probably mixed up a few things.

  • @laxminarasimhaduggaraju2671
    @laxminarasimhaduggaraju2671 5 years ago

    I am following your videos.
    The way you explain is simply awesome. Many happy thanks for sharing the information and knowledge about DS.

  • @denischo2133
    @denischo2133 3 years ago

    What to do if I want to apply MinMax or StandardScaler, fitting on the train folds and transforming only the test fold inside cross_val_score? The rule of thumb is to apply these techniques to train and test separately, so how can I perform this? cross_val_score doesn't have a specific argument for it.
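
One common answer (not from the video) is to wrap the scaler and the model in a scikit-learn `Pipeline`; `cross_val_score` then re-fits the scaler on each training fold only, so the validation fold never leaks into the scaling statistics:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# StandardScaler is fitted on each fold's training part only,
# then applied to that fold's held-out validation part
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```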

  • @ravikshdikola6089
    @ravikshdikola6089 4 years ago

    If the difference between the train scores and the cross-validation scores is a negative value, does that mean the model performs very well?

  • @animemodeactivated6404
    @animemodeactivated6404 4 years ago +1

    Hi Krish, after selecting the model, how to select the best chunk of data for training, as different splits of data will give different accuracy? Very helpful if you can post some video on the same.

    • @UsmanAhmedKhi
      @UsmanAhmedKhi 3 years ago

      Hi Anime,
      I think after selecting the best model with a good average accuracy, we don't need to split further again, i.e. now train on the whole dataset and make/save a model. What say?

  • @anubabu4187
    @anubabu4187 3 years ago

    Nice video, sir. How to find the cross-validation score of a non-standard metric, for example specificity?

  • @shaiksajid613.
    @shaiksajid613. 3 years ago

    What is that accuracy? Is that train accuracy or test?

  • @piyushaneja7168
    @piyushaneja7168 3 years ago

    Sir, I am confused by this: we select a part of the dataset for testing and the rest for training, but in the next iteration we select for testing a part that was already used for training in the previous iteration. If so, won't it give misleading accuracy, since the model has already seen the data? Or am I missing some point?

    • @zulfiquarshaikh3461
      @zulfiquarshaikh3461 3 years ago +2

      Bro, in the second iteration a new model is trained on a different selection of the data, and the fold held out for testing was not used to train that model. So each iteration's test data is unseen by the model being evaluated.

    • @piyushaneja7168
      @piyushaneja7168 3 years ago

      @@zulfiquarshaikh3461 Okay, thank you bro!

  • @SumitKumar-uq3dg
    @SumitKumar-uq3dg 5 years ago

    In cross-validation we are running different models and taking the mean of all the accuracies. So which model will be our final model?

  • @BiranchiNarayanNayak
    @BiranchiNarayanNayak 6 years ago

    When should we use train_test_split() and when cross_val_score() on the dataset? I have seen most programs use train_test_split with a 70/30 or 60/40 train/test split and fit the model. So which is the best approach?

    • @janvonschreibe3447
      @janvonschreibe3447 6 years ago

      There is not really a neat rule. A rule of thumb is to take the same ratio as your test/train set ratio.

  • @Raja-tt4ll
    @Raja-tt4ll 5 years ago

    Very nice video. Thank you.

  • @nagandranathvemishetti9247
    @nagandranathvemishetti9247 3 years ago

    Sir, will it work for a multi-class problem?

  • @Hiyori___
    @Hiyori___ 3 years ago

    Great tutorial

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks Krish

  • @shahadiqbal176
    @shahadiqbal176 5 years ago

    Have you done it using decision tree, random forest, naive Bayes?

  • @akpovoghoigherighe964
    @akpovoghoigherighe964 6 years ago

    This is very useful.

  • @MasterofPlay7
    @MasterofPlay7 4 years ago

    Can you output the model summary and the confusion matrix using cross_val_score?

  • @niteshsrivastava6504
    @niteshsrivastava6504 5 years ago

    Does the cross_val_score function use hyperparameters and stratified folds?

  • @markmorillo2954
    @markmorillo2954 3 years ago

    Great

  • @yogendrasaikiran4486
    @yogendrasaikiran4486 3 years ago

    I am unable to use that cross-validation function on my system.

  • @tiagosilvacorrea9004
    @tiagosilvacorrea9004 5 years ago

    Very Good! Thanks

  • @mekalamadhankumar3224
    @mekalamadhankumar3224 3 years ago

    It is difficult to calculate which model is suitable for the data, because we would have to run all the models to check which gives good accuracy. This leads to a big problem in the coding part.

  • @prashanthpandu2829
    @prashanthpandu2829 5 years ago

    I have a doubt: do you have to use cross_val_score on the train dataset or on the whole dataset?

    • @venkilfc
      @venkilfc 4 years ago

      @@generationwolves If you use cross-validation to tune your hyperparameters and improve your model, then you shouldn't apply cross-validation to the entire dataset but only to the training data. The test data must always stay independent; otherwise it will result in data leakage. If you just want an overall look at the scores of the splits, then you can apply it to the whole dataset.
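
The workflow this reply recommends might look like the following sketch (toy iris data, with a hypothetical 70/30 hold-out split):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a final test set first; cross-validate on the training
# portion only, so the test set stays independent until the end
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
final_score = model.fit(X_train, y_train).score(X_test, y_test)
print(cv_scores.mean(), final_score)
```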

  • @kareemel-tantawy8355
    @kareemel-tantawy8355 5 years ago

    Is k-fold cross-validation used to decide which model is best for regression and classification only,
    or can I use it to decide which model is best for clustering?

    • @chinedumezeakacha1604
      @chinedumezeakacha1604 4 years ago

      Just for classification. Not used for clustering, I think.

  • @sunnysavita9071
    @sunnysavita9071 5 years ago +2

    Sir, you didn't define the test_size in train_test_split().

    • @rohithn2056
      @rohithn2056 4 years ago

      If you don't define it, the train_test_split function automatically takes a 75:25 ratio.

  • @akashpoudel571
    @akashpoudel571 5 years ago

    Sir, does cross-validation come first and then tuning the parameters, always, in general?

    • @krishnaik06
      @krishnaik06  5 years ago

      First hyperparameter tuning, then cross-validation.

    • @akashpoudel571
      @akashpoudel571 5 years ago

      @@krishnaik06 OK sir..

    • @akashpoudel571
      @akashpoudel571 5 years ago

      @@krishnaik06 Could you upload some more algorithms with the meaning of their params... just a video on hyperparameters for the logistic algorithm and regression... if you have time, sir.

  • @arunkumaracharya9641
    @arunkumaracharya9641 4 years ago

    You said that if CV = 10 then ten experiments are conducted, but did not tell in what ratio the train and test samples are split. Also, there are different interpretations of random_state on the internet: if random_state=None then the sample changes, and if random_state is any integer then the sample does not change, irrespective of which integer you choose. But in your case the sample did change when an integer was used. Please clarify.

    • @maynorhernandez746
      @maynorhernandez746 ปีที่แล้ว +1

      In the case of CV= 10 the ratio is train=0.9, testing=0.1, thats because you split the "cake" in 10 pieces. Let´s see for instance CV= 5. The cake is split in 5 pieces so you have 4 pieces to training and 1 piece to test. So you will have train= 4/5= 0.8 and testing 1/5= 0.2.
      For a CV= 4. Training = 3/4= 0.75 and testing =1/4= 0.25.
      I hope this clarify
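
The arithmetic in this reply, as a quick check:

```python
# Each fold of a cv=k split holds 1/k of the rows for testing
# and the remaining (k-1)/k for training
for k in (10, 5, 4):
    train_frac, test_frac = (k - 1) / k, 1 / k
    print(f"cv={k}: train {train_frac:.2f}, test {test_frac:.2f}")
```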

  • @auroshisray9140
    @auroshisray9140 3 years ago

    Thank you sir

  • @rajusrkr5444
    @rajusrkr5444 5 years ago

    Excellent video!

  • @ANILKUMAR-qd8lx
    @ANILKUMAR-qd8lx 6 years ago

    Please can you explain feature selection in a model?

  • @markmorillo2954
    @markmorillo2954 3 years ago

    Nice video

  • @nguyenluu3082
    @nguyenluu3082 3 years ago

    Can you explain the effect of random_state on the accuracy?

    • @jackdaws7125
      @jackdaws7125 2 years ago +1

      Each random state is a different randomization of the train-test split of the data. So the reason the accuracy is changing is that in each case the split was done differently and led to different results, which is why a single split is quite unreliable and cross-validation helps us solve it.
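
This effect is easy to see by re-splitting the same data with different seeds (breast-cancer toy data as a stand-in for the video's dataset):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# The same model can score differently depending only on how the
# rows happen to be shuffled into the train and test sets
accs = []
for seed in (0, 1, 2, 3):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    accs.append(model.score(X_te, y_te))
print(accs)
```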

  • @Data_mata
    @Data_mata 1 year ago

    ❤❤

  • @AmitVerma-yg8pp
    @AmitVerma-yg8pp 4 years ago

    I am a little confused about cv folds and the number of values in the X and Y datasets.

    • @KrishnaMishra-fl6pu
      @KrishnaMishra-fl6pu 3 years ago

      If you take your k-fold value as 5, then CV will perform 5 experiments.
      Suppose there are 50 records and you took a k-fold value of 5.
      Then the size of the test data would be 50/5, i.e. 10:
      Exp1 ==> test data = df[0:10,:]
      Exp2 ==> test data = df[10:20,:]
      Exp3 ==> test data = df[20:30,:]
      Exp4 ==> test data = df[30:40,:]
      Exp5 ==> test data = df[40:50,:]
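
The five experiments described above can be reproduced with scikit-learn's `KFold` splitter, which, without shuffling, carves off consecutive blocks exactly as listed:

```python
import numpy as np
from sklearn.model_selection import KFold

records = np.arange(50)  # 50 records, as in the example above
for i, (train_idx, test_idx) in enumerate(
        KFold(n_splits=5).split(records), start=1):
    # Without shuffle=True, fold i tests on rows 10*(i-1) .. 10*i - 1
    print(f"Exp{i}: test data = rows {test_idx[0]}..{test_idx[-1]}")
```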

  • @samueleboh8965
    @samueleboh8965 5 years ago

    Thanks

  • @simplify1411
    @simplify1411 2 years ago

    Sir, what if the total number of observations is something like 107 or 191, or any prime number? How to split using k-fold CV?

  • @MasterofPlay7
    @MasterofPlay7 4 years ago

    Can you output the model summary and the confusion matrix using cross_val_score?

    • @chinedumezeakacha1604
      @chinedumezeakacha1604 4 years ago

      No, I wouldn't think so. I think cross-validation is a quick way of determining which ML algorithm is most suitable. When you use whichever one returns a high CV score, you can then do the model summary and confusion matrix using the confusion-matrix library.

    • @MasterofPlay7
      @MasterofPlay7 4 years ago +1

      @@chinedumezeakacha1604 Actually, I tried it and you can do it.
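
One way this can be done, sketched with scikit-learn's `cross_val_predict` (a related helper, not `cross_val_score` itself), which returns one out-of-fold prediction per row:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)

# Each row's prediction comes from the fold where it was held out,
# so the confusion matrix reflects out-of-sample behaviour
y_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
cm = confusion_matrix(y, y_pred)
print(cm)
```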