Selecting the best model in scikit-learn using cross-validation

  • Published on Sep 8, 2024

Comments • 598

  • @dataschool
    @dataschool  3 years ago +11

    Having problems with the code? I just finished updating the notebooks to use *scikit-learn 0.23* and *Python 3.9* 🎉! You can download the updated notebooks here: github.com/justmarkham/scikit-learn-videos

  • @thevivekmathema
    @thevivekmathema 6 years ago +34

    I almost gave up on Python until I met your channel. You are my savior.

    • @dataschool
      @dataschool  6 years ago +4

      Wow, thank you! Good luck with your Python education! :)

  • @giovannibruner8455
    @giovannibruner8455 8 years ago +112

    These videos are so well done, so clear and easy to follow, that they make ML look like a trick for kids. Congratulations, great teaching.

    • @dataschool
      @dataschool  8 years ago +13

      Thanks for your kind words!

    • @deneb6139
      @deneb6139 7 years ago +11

      Can't agree more! Best video resource on cross-validation on the internet.

  • @ngochua6679
    @ngochua6679 3 years ago +4

    Kevin, I appreciate the slow but thorough walkthrough; you and StatQuest are awesome people. Thank you.

    • @dataschool
      @dataschool  3 years ago +1

      Thank you so much!

  • @jiwachhetri4165
    @jiwachhetri4165 3 years ago +1

    This is the best sklearn tutorial I have come across.

  • @apachaves
    @apachaves 7 years ago +13

    Amazing video! Very instructive. And the presenter has a very clear voice and pace.

    • @dataschool
      @dataschool  7 years ago +1

      Thank you so much! I'm glad you liked it!

  • @jamesdalley2394
    @jamesdalley2394 1 year ago +1

    I watched a dozen videos on this topic. I was pretty certain I understood it, but I still had a few questions. Your video cleared those questions up amazingly! Thank you.

    • @dataschool
      @dataschool  1 year ago

      That's awesome to hear! 🙌

  • @aawinecoff
    @aawinecoff 7 years ago +1

    I really appreciate how easy this video series is to follow and that the notebooks are available so you can follow along. It would be excellent if the notebooks were updated to reflect that the cross_validation module is now deprecated.

    • @dataschool
      @dataschool  7 years ago +4

      Thanks for the suggestion! Right now I'm on paternity leave from Data School, but it's on my to-do list :)

    • @johnf9231
      @johnf9231 7 years ago

      Congratulations!

    • @jsx0328
      @jsx0328 6 years ago

      It's deprecated, but it still works for me in Jupyter Notebook... I literally did the exact same cross validation

    • @dataschool
      @dataschool  6 years ago

      I recently updated the code to use Python 3.6 and scikit-learn 0.19.1. The updated code can be found here: github.com/justmarkham/scikit-learn-videos
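For readers hitting the deprecation discussed in this thread: since scikit-learn 0.18 the cross-validation helpers live in `sklearn.model_selection`, and the old `sklearn.cross_validation` module was removed in 0.20. A minimal sketch of 10-fold cross-validation of KNN on the iris data using the newer import; `n_neighbors=5` is an assumption chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score  # formerly sklearn.cross_validation
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# n_neighbors=5 is an illustrative choice, not a tuned value
knn = KNeighborsClassifier(n_neighbors=5)

# cv=10 with a classifier uses stratified 10-fold cross-validation
scores = cross_val_score(knn, X, y, cv=10, scoring="accuracy")
print(scores.mean())
```

Only the import path changes; the `cross_val_score` call itself reads the same as in older notebooks.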

  • @djs749
    @djs749 3 years ago

    Liking something has always been dominated BY spontaneity, BUT it was never without reasons. I liked all your videos with much enthusiasm, and the simple reason is that they are just BRILLIANT!

    • @dataschool
      @dataschool  3 years ago

      Thank you so much!

  • @davidtemael1307
    @davidtemael1307 6 years ago

    I took the Coursera course twice but never got what was going on, and you, bro, walked me through it like I never expected! Thank you.

    • @dataschool
      @dataschool  6 years ago

      Awesome! You are very welcome!

  • @edbull4891
    @edbull4891 2 years ago +1

    Always eager to learn. You demystified the subject and even made it easy for a 75-year-old brain to comprehend ML methods. :) :) :)

  • @hamkam33521
    @hamkam33521 2 years ago

    I found lots and lots of videos about ML and cross-validation. I watched them all and tried to follow, but they were very hard to understand.
    But you make it easier. I was very confused by cross-validation, and now it's more than clear.
    Thank you very much for this video and for your channel.

    • @dataschool
      @dataschool  2 years ago

      You're very welcome! Glad it was helpful to you!

  • @datascienceds7965
    @datascienceds7965 6 years ago +2

    Whenever I need a reference, I always end up at your videos after a long search. That proves you are THE best teacher.

    • @dataschool
      @dataschool  6 years ago

      What a nice thing to say! Thank you! :)

  • @BrothersFreedive
    @BrothersFreedive 9 years ago +1

    Excellent series! This is the first time I've studied machine learning. You are doing an outstanding job of transforming it from a science fiction term into a tangible subject. I really appreciate these videos!

    • @dataschool
      @dataschool  9 years ago

      BrothersFreedive You're very welcome! I greatly appreciate your kind comments!

  • @akshaysingh1914
    @akshaysingh1914 5 years ago +4

    Sir, first I would like to thank you a lot, because you spent so much time making this video... it is really helpful for learners in the initial phase of ML. Keep it up, sir. I stopped this video midway to say thanks; you saved me many hours of trying to understand cross-validation...

    • @dataschool
      @dataschool  5 years ago

      That's awesome to hear! Thanks so much for letting me know! 🙌

  • @tush16ar
    @tush16ar 7 years ago

    Being a beginner in machine learning, I find this video lecture series of great help, providing a crystal-clear understanding of the concepts presented throughout the course.
    Dear sir, please keep up the good work.

    • @dataschool
      @dataschool  6 years ago

      That's great to hear! Good luck with your education!

  • @RHONSON100
    @RHONSON100 6 years ago +1

    Just like Andrew Ng, you are a genius... he gave a clear explanation of the theory, and you produced mesmerising implementation techniques in an inexplicably simple way... the most awesome machine learning video I have watched so far... you helped me more than you could ever imagine... thank you, sir.

    • @dataschool
      @dataschool  6 years ago +1

      Thanks very much for your kind words!

    • @RHONSON100
      @RHONSON100 6 years ago

      you are most welcome sir

  • @luismiguelcrespo9499
    @luismiguelcrespo9499 5 years ago

    Had to listen to it at 1.5x speed, but it is very clear and concise. Thanks.

  • @colmorourke4657
    @colmorourke4657 4 years ago +2

    Outstanding work once again Kevin. A treasure to newcomers in the area.

  • @cozylifemodular1863
    @cozylifemodular1863 2 years ago +1

    Just chiming in to thank you for the series; it really helps demystify the subject and fill in the gaps. Looking forward to working through it.

    • @dataschool
      @dataschool  2 years ago

      You're very welcome!

  • @lucamarcello9696
    @lucamarcello9696 3 years ago

    The best explanation of cross-validation on the internet. Thank you!

  • @manasa41087
    @manasa41087 8 years ago

    I am so glad I found you. I am an aspiring data scientist, and I find all your videos extremely useful and better than any documentation, since you explain the intricate details very well. Thanks!!

    • @dataschool
      @dataschool  8 years ago

      Wow, thanks so much for your kind comments! Good luck in your journey to become a data scientist!

  • @miguelamaro4900
    @miguelamaro4900 4 years ago +1

    I have watched several videos on this subject. This was the only one that met my expectations.

    • @dataschool
      @dataschool  4 years ago

      Thanks for your kind words!

  • @mueez.mp4
    @mueez.mp4 6 years ago +1

    HOW DOES THIS VIDEO NOT HAVE LIKE A MILLION VIEWS?!? So good. Thank you, man!

    • @dataschool
      @dataschool  6 years ago

      HA! Thank you :)

  • @danishbhatia5004
    @danishbhatia5004 7 years ago

    Sir, I really appreciate your work. I strongly think this video series is probably the best I have come across so far. I truly praise the way you teach; it makes everything utterly clear.
    Thank you very much for these videos, and I hope to see more of them.

    • @dataschool
      @dataschool  7 years ago

      I'm glad to hear my videos have been helpful to you!

  • @unsharma9229
    @unsharma9229 4 years ago +1

    I usually don't subscribe to any channel, but you earned this sub from me... keep going, lots of love

  • @richardpacholski2715
    @richardpacholski2715 9 years ago

    Thank you Kevin, this is fantastic material. You made it easy for a 60-year-old brain to comprehend ML methods. The cross-validation material was excellent. I will try to run it on my own data sets. Looking forward to the next one.
    Regards
    Richard

    • @dataschool
      @dataschool  9 years ago

      Richard Pacholski Awesome! Very glad to hear :) I'm looking forward to making the next one!

  • @XRobotexEditz
    @XRobotexEditz 7 years ago +5

    The best explanation I have ever seen.

    • @dataschool
      @dataschool  7 years ago +2

      Wow, thank you so much!

  • @DTPwr
    @DTPwr 5 years ago +1

    All about cross validation in one video , THAAAAAAAAAAAAAAAAAAAAAANK YOU

    • @dataschool
      @dataschool  5 years ago

      You're very welcome! :)

  • @hasheemb.danbatta4010
    @hasheemb.danbatta4010 6 years ago +1

    I have no words to use in order to show you my deep appreciation.
    May God continue to uplift your knowledge.

    • @dataschool
      @dataschool  6 years ago

      Thank you so much! Glad to hear I was helpful to you!

  • @nonyabeeswax
    @nonyabeeswax 6 years ago +1

    Natural born teacher. Bravo! Thank you!

  • @marwanalbadawii
    @marwanalbadawii 9 months ago

    Your explanations are very straightforward. Thanks a lot.

    • @dataschool
      @dataschool  8 months ago

      Thanks!

  • @flamboyantperson5936
    @flamboyantperson5936 6 years ago +4

    Great videos. Please keep up the good work. We really need your lectures. Thank you so much.

    • @dataschool
      @dataschool  6 years ago +1

      Thanks for your kind words! I will definitely release more videos! :)

    • @flamboyantperson5936
      @flamboyantperson5936 6 years ago +2

      Waiting eagerly for a new series, because I have finished watching all your videos. Thank you so much for teaching me Python. You have educated me; you are my teacher, and I respect you. Thank you so much.

    • @dataschool
      @dataschool  6 years ago +1

      Awesome! Thank you for watching and learning! :)

  • @atwinemugume
    @atwinemugume 5 years ago

    I love the simplicity in the videos. Thank you. I have learned some things that were confusing before. Especially cross validation

    • @dataschool
      @dataschool  5 years ago

      Awesome! That's great to hear.

  • @riderblack6401
    @riderblack6401 7 years ago

    I will always be your audience. Your teaching saves me. lol

    • @dataschool
      @dataschool  7 years ago

      Glad to hear! :)

  • @dawittekie3796
    @dawittekie3796 7 years ago +1

    I am so happy to follow this kind of lecture, because your teaching style is attractive and your language is very clear. I gained knowledge and input for my ML thesis, since I am working on a classification (prediction) problem.

    • @dataschool
      @dataschool  7 years ago

      Glad to hear that my videos are helpful to you! Good luck with your thesis!

  • @saisreenivas8875
    @saisreenivas8875 5 years ago +1

    You are awesome...you teach everything in a simple way....ask for the feedback....and make them much better.....And the best thing is you make everything (I REPEAT EVERYTHING) easy for us....So sweet of you :)

    • @dataschool
      @dataschool  5 years ago +1

      That is so kind of you to say! Thank you so much 😄

  • @wlancer8826
    @wlancer8826 5 years ago

    You're soooooo good at explaining confusing concepts!!! I had always wondered about the negative sign in the loss function until now!!! Thank you!!!

    • @dataschool
      @dataschool  5 years ago

      You're very welcome! You might also want to read this post for an update: www.dataschool.io/how-to-update-your-scikit-learn-code-for-2018/
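The negative sign comes from scikit-learn's convention that every scorer is maximized ("higher is better"), so mean squared error is reported negated; in current versions the scorer is named `neg_mean_squared_error`. A minimal sketch, assuming synthetic data from `make_regression` in place of the video's advertising dataset: negate the scores, then take square roots to get per-fold RMSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 200 samples, 3 features, some noise
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=1)

# The scorer returns -MSE for each fold, so every value is <= 0
neg_mse = cross_val_score(LinearRegression(), X, y, cv=10,
                          scoring="neg_mean_squared_error")

# Flip the sign back, then take the square root of each fold's MSE
rmse_per_fold = np.sqrt(-neg_mse)
print(rmse_per_fold.mean())
```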

  • @mightyhlungwane2639
    @mightyhlungwane2639 2 years ago +1

    Kevin, you are very good at explaining. I wish I had found you earlier. I just subscribed to receive all future videos. Thank you for all the explanations.

    • @dataschool
      @dataschool  2 years ago

      Thanks for your kind words! 🙏

  • @Bena_Gold
    @Bena_Gold 5 years ago

    This is the best explanation so far ... far better than my professor ... thumbs up ...

    • @dataschool
      @dataschool  5 years ago +1

      Great to hear! :)

  • @jsbros.
    @jsbros. 7 years ago

    It was confusing at first, but you made it so clear. I hope you will upload many more videos on ML. An ML application with Python would be great for everyone. Thank you. Love your way of teaching.

    • @dataschool
      @dataschool  7 years ago

      Thanks for your kind words! Here is my series on machine learning with Python: th-cam.com/play/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A.html

  • @michaelmotlh
    @michaelmotlh 4 years ago

    All your videos W.I.N - you’re the best

  • @raidtape123
    @raidtape123 7 years ago +10

    I really really appreciate your efforts..this series is so helpful in learning.

    • @dataschool
      @dataschool  7 years ago +2

      I'm glad to hear the series is helpful to you! :)

  • @aditidubey2826
    @aditidubey2826 5 years ago +1

    amazingly explained. Sufficiently Slow for a fresher in Machine learning.. Easily understandable. Keep it up.

    • @dataschool
      @dataschool  5 years ago

      Awesome, thank you! :)

  • @anilreddyk5
    @anilreddyk5 5 years ago +1

    Thanks for the Video. This is the best Video on KNN Cross validation that I have watched. Appreciate your effort...

  • @nikhilpandey2364
    @nikhilpandey2364 7 years ago

    Thanks. The notebook helped me a lot. I hope more topics get coverage like this one had. Please do a video on PCA.

    • @dataschool
      @dataschool  7 years ago

      Thanks for your suggestion! I'll consider it for the future.

  • @Lala-qh3wl
    @Lala-qh3wl 1 year ago +1

    Great ! Thank you very much for sharing such a clear explanation 🙌

    • @dataschool
      @dataschool  1 year ago

      You're very welcome!

  • @sevicore
    @sevicore 2 years ago

    Good video. Maybe it's my level of English, but I don't understand why at 24:00 we compare the accuracy of KNN with Linear Regression, when KNN is used for classification and Linear Regression is used for regression. I know cross-validation works for both, but the response variable should differ between the two cases, since for KNN it should be categorical and for LR it should be continuous.
    Great series, enjoying it so far. Thanks for the good content :)

    • @dataschool
      @dataschool  2 years ago +1

      I'm comparing KNN with Logistic Regression, which is used for classification. Hope that helps!
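A sketch of the comparison described in the reply: both models are classifiers, evaluated on the same data, the same 10 folds, and the same accuracy metric, so the scores are directly comparable. The iris dataset, `n_neighbors=20`, and `max_iter=1000` (so the default solver converges) are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

models = {
    "KNN (k=20)": KNeighborsClassifier(n_neighbors=20),
    # Logistic regression is a classifier despite its name;
    # max_iter is raised (assumption) so the default solver converges
    "Logistic regression": LogisticRegression(max_iter=1000),
}

# Integer cv with a classifier gives deterministic stratified folds,
# so both models are scored on exactly the same splits
results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: {results[name]:.3f}")
```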

  • @user-vt3uh2in7o
    @user-vt3uh2in7o 8 months ago +1

    best explanation, easy to understand. thank you so much

    • @dataschool
      @dataschool  7 months ago

      Thank you!

  • @salmanpatel5666
    @salmanpatel5666 3 years ago +1

    Thanks a ton, perfectly explained the concept and the code

  • @juancarlosesquivel7855
    @juancarlosesquivel7855 6 years ago

    Very clear and detailed explanations. Also the links after the videos are very helpful. Thanks.

  • @pierrelaurent8284
    @pierrelaurent8284 7 years ago

    Can't wait to see next lesson ! Bravo

  • @johnnovotny4286
    @johnnovotny4286 2 years ago +1

    Excellent. Thanks for sharing your expertise.

  • @andretenreiro
    @andretenreiro 6 years ago

    Great videos to learn about machine learning! Thanks Kevin for making this available.

  • @abusaleham
    @abusaleham 8 years ago +10

    Awesome explanation....!

  • @stjepan_8902
    @stjepan_8902 6 years ago +1

    thank you for introducing me to ML, and also for helping me understand Python through your great pandas videos!

    • @dataschool
      @dataschool  6 years ago

      You are very welcome!

  • @Kinsella-yt
    @Kinsella-yt 8 years ago

    Thank you v much for the series. Well done. You make the complex simple, with your clear explanations - a true mark of your understanding of your subject. Respect to you Chief. :) :)

    • @dataschool
      @dataschool  8 years ago

      +Andrew Kinsella Thank you so much! I have spent a lot of time figuring out how to explain this material clearly :)

  • @soumyareddy3695
    @soumyareddy3695 5 years ago +1

    Hi Kevin, you are such a great teacher. Love your videos. Can't thank you enough!!

    • @dataschool
      @dataschool  5 years ago

      Thanks very much for your kind words! You are very welcome :)

  • @johnathangonzalez5286
    @johnathangonzalez5286 6 years ago +2

    REALLY well done!! I found this video extremely helpful. Keep up the good work

    • @dataschool
      @dataschool  6 years ago

      Thanks for your kind comment! :)

  • @prathameshmahankal4180
    @prathameshmahankal4180 5 years ago +1

    I really love your videos. They are so simple and to the point! Thanks for making such videos. :)

    • @dataschool
      @dataschool  5 years ago

      Thanks very much for your kind words!

  • @KS-ko2zl
    @KS-ko2zl 4 years ago

    Thank you so much. Your tutorial and style of explaining are exceptional.

  • @ousmanelom6274
    @ousmanelom6274 4 years ago +1

    Best tutorial on YouTube

  • @kamran_desu
    @kamran_desu 8 years ago

    Excellent explanation of cross-validation and its wonders. Thanks for the improvement recommendations also.

    • @dataschool
      @dataschool  8 years ago

      Thanks for your kind comment, and you're very welcome!

    • @kamran_desu
      @kamran_desu 8 years ago

      I actually used your method on my GBM model, something like a 10-fold stratified cross-validation on 80% of the data for hyperparameter tuning e.g. max depth, min rows, etc. in a search grid, and then kept 20% as hold-out set, works quite consistently :)

    • @dataschool
      @dataschool  8 years ago

      Great to hear!
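The workflow described above (hold out 20%, tune hyperparameters with stratified 10-fold cross-validation on the other 80%, then score the hold-out once) can be sketched with scikit-learn's `GridSearchCV`. This is a sketch under stated assumptions, not the commenter's code: the iris data, the small grid, and `n_estimators=50` are illustrative, and `min_samples_leaf` is used as a rough analogue of the "min rows" setting mentioned above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True)

# Keep 20% as an untouched hold-out set, stratified by class
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Hypothetical grid; min_samples_leaf is a rough analogue of "min rows"
param_grid = {"max_depth": [2, 3], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=1),
    param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=1),
    scoring="accuracy",
)
# Stratified 10-fold CV on the 80%, then refit the best setting on all of it
search.fit(X_dev, y_dev)

print(search.best_params_, search.best_score_)
print("hold-out accuracy:", search.score(X_hold, y_hold))
```

The hold-out score is computed only once, after tuning, so it is not used to make any modelling choice.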

  • @dataschool
    @dataschool  6 years ago +34

    *Note:* This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: github.com/justmarkham/scikit-learn-videos

    • @mubeenkhan8210
      @mubeenkhan8210 5 years ago

      Updated link shows me this message :
      Sorry, something went wrong. Reload?

    • @MasterofPlay7
      @MasterofPlay7 4 years ago

      Is this still relevant in 2020?

  • @jnandikonda
    @jnandikonda 5 years ago

    Best explanation ever given by anyone in the machine learning community. Hats off to your great work and effort in teaching us. May God bless you. Yes, we need more scikit-learn concepts than pandas, but use pandas functionality when needed.

    • @dataschool
      @dataschool  5 years ago

      Thanks very much for your kind words!

  • @Kevin7896bn
    @Kevin7896bn 5 years ago +1

    One of the best explanations. Thanks

    • @dataschool
      @dataschool  5 years ago

      Thanks for your kind words!

  • @HARDYBOY290988
    @HARDYBOY290988 4 years ago

    First of all, let me thank you for the extraordinary work you are doing.
    You are explaining extraordinary things in a very ordinary way.
    I have a query regarding FIT with cross-validation.
    When we do linear regression with a SINGLE train/test split, we get a SINGLE FIT to predict on the test data.
    Whereas when we do cross-validation (e.g. cv=10) with linear regression, we have 10 training datasets.
    BUT DO WE ALSO HAVE 10 FIT MODELS, ONE FOR EACH TRAINING SET?
    OR
    DO WE HAVE AN AVERAGE OF ALL 10 FIT MODELS?
    ###########################################################################
    I am able to get the coefficients & intercept of the fit model from a single training dataset:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25,random_state=1,)
    lm=LinearRegression()
    fitting=lm.fit(X_train,y_train)
    fitting.coef_
    fitting.intercept_
    How do I get the intercept & coefficients of the fit model via cross-validation?
    ############################################################################
    What is the significance of cross_val_predict? Does it have any relation to my query?
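A hedged sketch of how this works in recent scikit-learn (0.20+), assuming synthetic regression data in place of the commenter's dataset. With cv=10 there are 10 separate fits (nothing is averaged into one model); `cross_val_score` discards them, while `cross_validate(..., return_estimator=True)` keeps them so their `coef_` can be inspected. Cross-validation estimates performance; the coefficients you would actually use typically come from one model refit on all the data. `cross_val_predict` returns out-of-fold predictions: each sample is predicted by the fold model that did not see it during training.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, cross_validate

# Synthetic stand-in data (assumption): 200 samples, 3 features
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=1)

# One LinearRegression is fitted per fold; return_estimator keeps all 10
cv_results = cross_validate(LinearRegression(), X, y, cv=10,
                            return_estimator=True)
fold_coefs = np.array([est.coef_ for est in cv_results["estimator"]])
fold_intercepts = np.array([est.intercept_ for est in cv_results["estimator"]])
print(fold_coefs)  # one row of coefficients per fold

# The model you would typically deploy: refit once on all the data
final_model = LinearRegression().fit(X, y)
print(final_model.coef_, final_model.intercept_)

# Out-of-fold predictions: each sample is predicted by the model
# trained on the other 9 folds
oof_pred = cross_val_predict(LinearRegression(), X, y, cv=10)
```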

  • @marvinjosephagor9493
    @marvinjosephagor9493 8 years ago

    These are all great material. Thank you very much for uploading these videos. Keep up the great work and know that they are all appreciated! :)

    • @dataschool
      @dataschool  8 years ago

      +MJoseph A Excellent! You are very welcome.

  • @sarabroadcasting9591
    @sarabroadcasting9591 6 years ago

    This is really a very good video. Easy to understand

  • @pranayrungta
    @pranayrungta 6 years ago

    Very well explained!!!! Explanation is very impressive...

  • @vishalaggarwal8783
    @vishalaggarwal8783 6 years ago

    Sir, your lectures are out of this world. Sir, please please please make a Seaborn tutorial series.

    • @dataschool
      @dataschool  6 years ago

      Thanks for your kind words, and your suggestion!

  • @eyalcarmi3984
    @eyalcarmi3984 6 years ago

    Your video tutorial is very good. Thank you for helping me understand these topics.

  • @rakeshkumarkuwar6053
    @rakeshkumarkuwar6053 5 years ago

    Thank you sir for such a detailed explanation. I was struggling with the topic, then luckily found your video, and every bit of it is full of knowledge.
    Thanks again for making such informative videos.

  • @tabnaka
    @tabnaka 9 years ago

    Looking forward to the next video on this!

    • @dataschool
      @dataschool  9 years ago

      tabnaka Great! It will come out in about two weeks.

  • @javonnii436
    @javonnii436 3 years ago

    Great video! The only line of code that I needed to update was reshaping the data to pass into the binarize function and then flattening the returned ndarray.
    '''y_pred_class_2 = binarize(y_pred_prob.reshape((192,1)), threshold=0.3).flatten() '''
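Recent scikit-learn versions require 2D input for `binarize`, which is why the fix above reshapes. A sketch of the same idea using `reshape(-1, 1)` so the length is not hard-coded (192 is specific to that notebook's test set), plus a plain NumPy comparison that gives the same classes. The probability values are made up for illustration. Note that `binarize` maps values strictly greater than the threshold to 1.

```python
import numpy as np
from sklearn.preprocessing import binarize

# Hypothetical predicted probabilities for the positive class
y_pred_prob = np.array([0.1, 0.3, 0.35, 0.8, 0.25])

# binarize expects 2D input: reshape to one column, then flatten back to 1D
y_pred_class = binarize(y_pred_prob.reshape(-1, 1), threshold=0.3).flatten()

# Equivalent NumPy comparison (values strictly above 0.3 become 1)
y_pred_class_np = (y_pred_prob > 0.3).astype(int)
print(y_pred_class)  # [0. 0. 1. 1. 0.]
```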

  • @mdinesk
    @mdinesk 9 years ago

    These videos have been very useful and excellent! Thanks!

    • @dataschool
      @dataschool  9 years ago

      +Dinesh Kumar Murali Great, thanks for your kind words!

  • @mdudius37
    @mdudius37 5 years ago +1

    In some places I've seen people run K-fold cross-validation on the entire dataset, and in other places only on the training set, with the test set evaluated separately. Is there any recommendation regarding which practice makes more sense? Great video!!

    • @dataschool
      @dataschool  5 years ago +1

      This is beyond the scope of what I can get into in a YouTube comment... sorry!
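The question above was left open in the thread. One common pattern, shown here as a sketch rather than as the video's method, is nested cross-validation: an inner `GridSearchCV` chooses `n_neighbors` using only each outer training split, and an outer `cross_val_score` scores that whole tuning procedure on outer test folds it never saw. The iris data, the k range of 1 to 30, and the 5-fold inner and outer splits are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Inner loop: choose n_neighbors using only the outer-training portion
inner = GridSearchCV(KNeighborsClassifier(),
                     {"n_neighbors": list(range(1, 31))},
                     cv=5, scoring="accuracy")

# Outer loop: each outer test fold is never seen by the tuning step
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
nested_scores = cross_val_score(inner, X, y, cv=outer_cv, scoring="accuracy")
print(nested_scores.mean())
```

The alternative raised in the question, cross-validating on the training set and then scoring a separate test set once, follows the same principle with a single held-out split instead of outer folds.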

  • @matinafragkogianni1376
    @matinafragkogianni1376 9 years ago +4

    Great video, thanks a lot!

    • @dataschool
      @dataschool  9 years ago +2

      +Matina Fragkogianni You're very welcome, I'm happy to help!

  • @ianmelanson9520
    @ianmelanson9520 8 years ago

    Great videos. Good approach combining thought process and tools.

    • @dataschool
      @dataschool  8 years ago

      +Ian Melanson Thanks!

  • @Renan-st1zb
    @Renan-st1zb 7 years ago

    Great videos! They are well explained; you understand why (and this is so important) you are using a given function or model. Besides, you also have great resource material (and it shows you have done it with excellence). You are an awesome professor!
    Congrats, from Brazil :)

    • @dataschool
      @dataschool  7 years ago

      Thanks so much for your kind words! I'm glad the videos have been helpful to you. Good luck with your machine learning education!

  • @Dexter01
    @Dexter01 4 years ago +1

    You are charismatic, thank you!

  • @manishthapliyal6372
    @manishthapliyal6372 5 years ago +2

    Beautifully explained

  • @mikezhu7852
    @mikezhu7852 4 years ago

    THE best tutorial ever! Thank u

  • @rabellomusic
    @rabellomusic 7 years ago

    you are amazing. Thank you for creating this course.

    • @dataschool
      @dataschool  7 years ago

      You're very welcome! I'm glad it's helpful to you!

  • @RayedWahed
    @RayedWahed 8 years ago

    Please consider more lessons on Data Visualization and representation as well. Thank you!

    • @dataschool
      @dataschool  8 years ago

      +Rayed Bin Wahed Thanks for the suggestion!

  • @ovoalways
    @ovoalways 9 years ago

    Great job Kevin. Your videos are really helpful.

    • @dataschool
      @dataschool  9 years ago

      Ogheneovo Dibie Glad you're enjoying them!

    • @ovoalways
      @ovoalways 9 years ago

      Thanks Kevin. Do you know of any resources that make it easy to import and transform my own datasets before using scikit-learn? My data samples contain numerical, categorical, and boolean features. Thanks Kevin

    • @dataschool
      @dataschool  9 years ago

      Ogheneovo Dibie I mostly use Pandas for data reading and transformation. I demonstrate Pandas in this video: th-cam.com/video/3ZWuPVWq7p4/w-d-xo.html
      Does that help?
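Following the reply's suggestion to use pandas, a minimal sketch of turning mixed-type columns into the all-numeric features scikit-learn expects: one-hot encode the categorical column with `pd.get_dummies` and cast the boolean column to 0/1. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical mixed-type data (column names invented for illustration)
df = pd.DataFrame({
    "age": [25, 32, 47],
    "plan": ["basic", "premium", "basic"],
    "is_active": [True, False, True],
})

# One-hot encode the categorical column; untouched columns come first,
# followed by one 0/1 column per category
X = pd.get_dummies(df, columns=["plan"], dtype=int)

# Cast the boolean column to 0/1 integers
X["is_active"] = X["is_active"].astype(int)
print(X.columns.tolist())  # ['age', 'is_active', 'plan_basic', 'plan_premium']
```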

  • @hppeng
    @hppeng 7 years ago

    THANK YOU SO MUCH. Wonderful video series. Well done. Thanks again.

    • @dataschool
      @dataschool  7 years ago

      You're very welcome! I'm glad the series is helpful to you!

  • @manalelai2598
    @manalelai2598 6 years ago

    What a great material ! Thanks a million

    • @dataschool
      @dataschool  6 years ago +1

      You are very welcome!

  • @abmsaroar2829
    @abmsaroar2829 7 years ago

    Very Informative and well presented. Thanks a lot for sharing

    • @dataschool
      @dataschool  7 years ago

      You're very welcome! Hope you enjoy the rest of the series: th-cam.com/play/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A.html

  • @jongcheulkim7284
    @jongcheulkim7284 2 years ago +1

    Thank you so much. This is very helpful^^

    • @dataschool
      @dataschool  2 years ago

      Glad it was helpful!

  • @reshaknarayan3944
    @reshaknarayan3944 6 years ago +1

    Saved my day! May god bless you.

  • @swagatmishra9350
    @swagatmishra9350 4 years ago

    Thank you very much for such a beautiful explanation!!!

  • @NipunRawat08
    @NipunRawat08 4 years ago

    The best tutorials ever!!

  • @uniqueraj518
    @uniqueraj518 9 years ago

    Nice to see your video after a long time. I have some confusion; I hope it will be clear after your response.
    1. Please correct me if I'm wrong: I understood that the demerit of train/test split is high variance, i.e. differences between the training and testing data will affect the testing accuracy.
    2. I really want to know what the random_state parameter does when you change it from 4 to 3, 2, 1, and 0.
    3. For classification, you mentioned stratified sampling to make the K folds. How does it affect the accuracy of the model (e.g. if, out of 5000 rows in my collected dataset, 80% are ham and 20% are spam, versus 50% ham and 50% spam)?
    4. Since you have numerical features only, you used accuracy as the metric to select the best features. What do you suggest if your dataset contains object datatypes like dates and strings such as text data?
    I am a new student of data science, so I am sorry for the long comment.

    • @dataschool
      @dataschool  9 years ago +1

      unique raj Great questions! My responses:
      1. The disadvantage of train/test split is that the resulting performance estimate (called "testing accuracy") is high variance, meaning that it may change a lot depending upon the random split of the data into training and testing sets.
      2. Try removing the random_state parameter, and running train_test_split multiple times. Every time you run it, you will get different splits of the data. Now use random_state=1, and run it multiple times. Every time you run it, you will get the same exact split of the data. Now change it to random_state=2, and run it multiple times. Every time you run it, you will get the same exact split of the data, though it will be different than the split resulting from random_state=1. Thus, the point of using random_state is to introduce reproducibility into your process. It doesn't actually matter whether you use random_state=1 or random_state=9999. What matters is that if you set a random_state, you can reproduce your results.
      3. In this context, stratified sampling relates to how the observations are assigned to the cross-validation folds. The reason to use stratified sampling in this context is that it will produce a more reliable estimate of out-of-sample accuracy. It doesn't actually have anything to do with making the model itself more accurate.
      4. My choice of classification accuracy as the evaluation metric is not actually related to the data types of the features. Your features in a scikit-learn model will always be numeric. If you have non-numeric values that you want to use as features, you have to transform them into numeric features (which I will cover in a future video).
      Hope that helps!
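Points 2 and 3 of the reply can be checked directly. A minimal sketch, assuming the iris data (50 samples per class): the same `random_state` reproduces the same split on every run while a different value gives a different split, and `StratifiedKFold` keeps the class proportions equal in every test fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True)

# Point 2: the same random_state reproduces the same split every time
_, X_test_a, _, _ = train_test_split(X, y, random_state=1)
_, X_test_b, _, _ = train_test_split(X, y, random_state=1)
_, X_test_c, _, _ = train_test_split(X, y, random_state=2)
same_split = np.array_equal(X_test_a, X_test_b)   # True
differs = not np.array_equal(X_test_a, X_test_c)  # expected True

# Point 3: stratified folds preserve class proportions in each test fold
skf = StratifiedKFold(n_splits=5)
fold_class_counts = [np.bincount(y[test_idx]) for _, test_idx in skf.split(X, y)]
print(fold_class_counts)  # each test fold holds 10 samples of each class
```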

  • @sridhaarrb
    @sridhaarrb 6 years ago

    Good video. It makes the concepts clear and easy to understand...

  • @anukumawatradha4899
    @anukumawatradha4899 5 years ago

    Too good, sir, it's really very helpful. Actually, I request that you make a video on "How to select models from the various available ones".

    • @dataschool
      @dataschool  4 years ago

      Thanks for your suggestion!

  • @brendensong8000
    @brendensong8000 3 years ago

    Another great class!!!!

  • @arunavasengupta160
    @arunavasengupta160 5 years ago +1

    Brilliant explanation. But can you or somebody please help me understand why you chose k=20? It could be 13 or 17...?

    • @dataschool
      @dataschool  5 years ago

      I can't remember exactly, but I probably chose it because it's the simplest model among those options with the best performance. For more details on what I mean by "simple", see here: scott.fortmann-roe.com/docs/BiasVariance.html
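The tie-breaking idea in the reply can be made explicit. A sketch, assuming the iris data and a k range of 1 to 30: compute the 10-fold cross-validated accuracy for every k, collect the values of k tied for the best score, and keep the largest one, since for KNN a larger k gives a smoother, lower-variance (simpler) model. The printed values are not claimed to match the video.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
k_range = range(1, 31)

# Mean 10-fold accuracy for each candidate k
k_scores = np.array([
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                    cv=10, scoring="accuracy").mean()
    for k in k_range
])

# Among the k values tied (within float tolerance) for the best score,
# pick the largest: in KNN, larger k means a simpler, smoother model
best = k_scores.max()
tied_ks = [k for k, s in zip(k_range, k_scores) if np.isclose(s, best)]
chosen_k = max(tied_ks)
print(tied_ks, chosen_k)
```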

  • @jlaroche0
    @jlaroche0 6 years ago

    The difference between the average RMSE with the Newspaper feature and without it (both using 10-fold cross-validation) seems, to me, quite negligible. Could you walk through the logic for deciding whether a difference is large enough to justify keeping a feature, or small enough that one should drop it? Thanks!
    And, BTW: your video series is amazing! Keep it up!

    • @dataschool
      @dataschool  6 years ago +1

      Thanks for your kind words!
      The simple answer is that you should always prefer a simpler model (fewer features), unless having more features provides a "meaningful" increase in performance. There's no strict definition of "meaningful"; it depends on context. Hope that helps!
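The rule in the reply can be applied by comparing cross-validated RMSE for the two feature sets. A minimal sketch, assuming synthetic data that reuses the column names of the video's advertising dataset (the coefficients, ranges, and noise level are invented, and Newspaper is deliberately pure noise here). If the two RMSE values are close, the reply's advice is to keep the model with fewer features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the advertising data (assumption): Sales depends
# on TV and Radio only; Newspaper carries no signal
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
data["Sales"] = 3 + 0.045 * data["TV"] + 0.19 * data["Radio"] + rng.normal(0, 1.5, n)

def cv_rmse(feature_cols):
    """Mean 10-fold RMSE of a linear regression on the given columns."""
    neg_mse = cross_val_score(LinearRegression(), data[feature_cols], data["Sales"],
                              cv=10, scoring="neg_mean_squared_error")
    return np.sqrt(-neg_mse).mean()

rmse_all = cv_rmse(["TV", "Radio", "Newspaper"])
rmse_without = cv_rmse(["TV", "Radio"])
print(f"with Newspaper: {rmse_all:.3f}, without: {rmse_without:.3f}")
```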

  • @tomasemilio
    @tomasemilio 7 years ago

    Dude, honestly, this is golden material. Where do you teach? I want to watch more of your tutorials. please send a link or something.

    • @dataschool
      @dataschool  7 years ago +1

      Thanks very much! Currently I teach online only. You can find more of my tutorials here, and also sign up for my newsletter: www.dataschool.io/
      I'll be announcing new tutorials, webcasts, and/or courses in the coming months. Stay tuned! :)