AdaBoost, Clearly Explained

  • Published May 28, 2024
  • AdaBoost is one of those machine learning methods that seems so much more confusing than it really is. It's really just a simple twist on decision trees and random forests.
    NOTE: This video assumes you already know about Decision Trees...
    • Decision and Classific...
    ...and Random Forests....
    • StatQuest: Random Fore...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    Sources:
    The original AdaBoost paper by Robert E. Schapire and Yoav Freund
    www.sciencedirect.com/science...
    And a follow-up by co-author Schapire:
    rob.schapire.net/papers/explai...
    The idea of using the weights to resample the original dataset comes from Boosting: Foundations and Algorithms, by Robert E. Schapire and Yoav Freund
    mitpress.mit.edu/books/boosting
    Lastly, Chris McCormick's tutorial was super helpful:
    mccormickml.com/2013/12/13/ada...
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    TH-cam Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    0:56 The three main ideas behind AdaBoost
    3:30 Review of the three main ideas
    3:58 Building a stump with the GINI index
    6:27 Determining the Amount of Say for a stump
    10:45 Updating sample weights
    14:47 Normalizing the sample weights
    15:32 Using the normalized weights to make the second stump
    19:06 Using stumps to make classifications
    19:51 Review of the three main ideas behind AdaBoost
    Correction:
    10:18. The Amount of Say for Chest Pain = (1/2)*log((1 - 3/8)/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) = 0.25, not 0.42.
    #statquest #adaboost

Comments • 1.7K

  • @statquest
    @statquest  4 ปีที่แล้ว +162

    Correction:
    10:18. The Amount of Say for Chest Pain = (1/2)*log((1 - 3/8)/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) = 0.25, not 0.42.
    NOTE 0: The StatQuest Study Guide is available: app.gumroad.com/statquest
    NOTE 2: Also note: In statistics, machine learning and most programming languages, the default log function is log base 'e', so that is the log that I'm using here. If you want to use a different log, like log base 10, that's fine, just be consistent.
    NOTE 3: A lot of people ask whether, once an observation is omitted from a bootstrap dataset, it is lost for good. The answer is "no". You just lose it for one stump. After that it goes back into the pool and can be selected for any of the other stumps.
    NOTE 4: A lot of people ask "Why is "Heart Disease = No" referred to as "Incorrect"?" This question is answered in the StatQuest on decision trees: th-cam.com/video/_L39rN6gz7Y/w-d-xo.html However, here's the short version: The leaves make classifications based on the majority of the samples that end up in them. So if most of the samples in a leaf did not have heart disease, all of the samples in the leaf are classified as not having heart disease, regardless of whether or not that is true. Thus, some of the classifications that a leaf makes are correct, and some are not correct.
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @parvezaiub
      @parvezaiub 4 ปีที่แล้ว +1

      Shouldn't it be 0.1109?

    • @statquest
      @statquest  4 ปีที่แล้ว +14

      @@parvezaiub That's what you get when you use log base 10. However, in statistics, machine learning and most programming languages, the default log function is log base 'e'.
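
To see the arithmetic behind the correction, and why log base 10 gives 0.1109 instead, here is a minimal Python sketch; the numbers are the ones from the correction above (the Chest Pain stump misclassifies 3 of the 8 equally weighted samples).

```python
import numpy as np

total_error = 3 / 8  # the Chest Pain stump misclassifies 3 of the 8 equally weighted samples

# Amount of Say = (1/2) * log((1 - Total Error) / Total Error)
say_natural_log = 0.5 * np.log((1 - total_error) / total_error)    # log base e (the default)
say_log_base_10 = 0.5 * np.log10((1 - total_error) / total_error)  # log base 10

print(round(say_natural_log, 4))  # 0.2554 -> the corrected value, ~0.25
print(round(say_log_base_10, 4))  # 0.1109 -> what you get with log base 10
```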

    • @thunderkat2911
      @thunderkat2911 3 ปีที่แล้ว +1

      you should pin this to the top

    • @sidisu
      @sidisu 3 ปีที่แล้ว +23

      Hi Josh - great videos, thank you! Question on your Note 3: How do omitted observations get "back into the pool"? It seems, around 16:16 in the video, that the subsequent stumps are made based on the performance of the previous stump (re-weighting observations from the previous stump)... if that's the case, when do you put "lost observations" back into the pool? How would you update the weights if the "lost observations" were not used to assess the performance of the newest stump?

    • @lejyonerxx
      @lejyonerxx 3 ปีที่แล้ว +4

      First, thank you for those great videos. I have the same question that Tim asked. How do omitted observations get "back into the pool"?

  • @indrab3091
    @indrab3091 3 ปีที่แล้ว +108

    Einstein says "if you can't explain it simply you don't understand it well enough" and i found this AdaBoost explanation bloody simple. Thank you, Sir.

    • @statquest
      @statquest  3 ปีที่แล้ว +3

      Thank you! :)

  • @codeinair627
    @codeinair627 2 ปีที่แล้ว +29

    Every day is a new stump in our life. We should give more weight to our weaknesses and work on them. Eventually, we will become strong like AdaBoost. Thanks Josh!

  • @iftrejom
    @iftrejom 2 ปีที่แล้ว +124

    Josh, this is just awesome. The simple and yet effective way you explain otherwise complicated Machine Learning topics is outstanding. You are a talented educator and such a blessing for ML / Data Science / Statistics learners all around the world.

    • @statquest
      @statquest  2 ปีที่แล้ว +1

      Awesome, thank you!

  • @shaunchee
    @shaunchee 3 ปีที่แล้ว +18

    Man right here just clarified my 2-hour lecture in 20 mins. Thank you.

  • @aleksey3231
    @aleksey3231 4 ปีที่แล้ว +309

    Please, Can anyone make 10 hours version 'dee doo dee doo boop'?

    • @statquest
      @statquest  4 ปีที่แล้ว +53

      You made me laugh out loud! :)

    • @50NTD
      @50NTD 3 ปีที่แล้ว +10

      sounds good, i want it too

    • @sketchbook8578
      @sketchbook8578 3 ปีที่แล้ว +5

      @@statquest I would seriously play it for my background music during work... Please make one lol.

    • @swaralipibose9731
      @swaralipibose9731 3 ปีที่แล้ว

      I also want some 'dee doo Dee doo boop '

    • @VLM234
      @VLM234 3 ปีที่แล้ว

      @StatQuest how to apply adaboost for regression?

  • @dreamhopper
    @dreamhopper 4 ปีที่แล้ว +43

    Wow. I cannot emphasize on how much I'm learning from your series on machine learning. Thank you so much! :D

    • @statquest
      @statquest  4 ปีที่แล้ว +2

      Hooray! I'm glad the videos are helpful. :)

  • @anishchhabra5313
    @anishchhabra5313 ปีที่แล้ว +7

    This video is just beyond excellent. Crystal clear explanation; no one could have done it better. Thank you, Josh.

  • @miesvanaar5468
    @miesvanaar5468 4 ปีที่แล้ว +7

    Dude... I really appreciate that you make these videos and put so much effort into making them clear. I am buying a t-shirt to do my small part in supporting this amazing channel.

    • @statquest
      @statquest  4 ปีที่แล้ว

      Hooray!! Thank you very much! :)

  • @catherineLC2094
    @catherineLC2094 3 ปีที่แล้ว +8

    Thank you for the study guides Josh! I did not know about them and I spent 5 HOURS making notes about your videos on decision trees and random forests. I think 3 USD is worth less than 5 hours of my time, so I purchased the study guide for AdaBoost and cannot wait for the rest of them (especially neural networks!)

    • @statquest
      @statquest  3 ปีที่แล้ว

      Hooray!!! I'm so happy you like them. As soon as I finish my videos on Neural Networks, I'll start making more study guides.

  • @AayushRampal
    @AayushRampal 4 ปีที่แล้ว +19

    You are THE BEST, can't tell how much i've got to learn from statquest!!!

    • @statquest
      @statquest  4 ปีที่แล้ว +1

      Awesome! Thank you so much! :)

  • @emadrio
    @emadrio 4 ปีที่แล้ว +13

    Thank you for this. These videos are concise and easy to understand. Also, your humor is 10/10

  • @grumpymeercat
    @grumpymeercat 5 ปีที่แล้ว +11

    I love this format, you're great.
    RiTeh machine learning mafia, where you at?

  • @olegzarva708
    @olegzarva708 4 ปีที่แล้ว +3

    You're my hero, Josh! This is so much more understandable than twisted formulas.

    • @statquest
      @statquest  4 ปีที่แล้ว

      Thank you! :)

  • @huyentrangnguyen8917
    @huyentrangnguyen8917 2 ปีที่แล้ว +1

    I am a beginner in ML and all of your videos help me a lot to understand these difficult things. I have nothing to say but thank you so so sooooooooo much.

  • @debabrotbhuyan4812
    @debabrotbhuyan4812 4 ปีที่แล้ว +2

    How come I missed this channel for so long? Absolutely brilliant.

    • @statquest
      @statquest  4 ปีที่แล้ว +1

      Thank you!

  • @schneeekind
    @schneeekind 5 ปีที่แล้ว +247

    HAHA love your calculation sound :D :D :D

  • @prethasur7376
    @prethasur7376 2 ปีที่แล้ว +4

    Your tutorials are simply awesome Josh! You are a great help!

    • @statquest
      @statquest  2 ปีที่แล้ว +1

      Glad you like them!

  • @RaviShankar-jm1qw
    @RaviShankar-jm1qw 3 ปีที่แล้ว +1

    I'm impressed by each of your videos, and in my free time I sometimes recap what you taught in them. Awesome Josh!!!

    • @statquest
      @statquest  3 ปีที่แล้ว

      Awesome! Thank you!

  • @kamranabbasi6757
    @kamranabbasi6757 2 ปีที่แล้ว +1

    Best video on AdaBoost on TH-cam, watched it two times to understand it fully.
    It's such a beautiful explanation...

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thank you! :)

  • @lisun7158
    @lisun7158 2 ปีที่แล้ว +6

    AdaBoost: Forest of Stumps
    1:30 stump: a tree with just 1 node and 2 leaves.
    3:30 AdaBoost: Forest of Stumps;
    Different stumps have different weight/say/voice;
    Each stump takes the previous stumps' mistakes into account. (AdaBoost is short for Adaptive Boosting)
    6:40 7:00 Total Error: the sum of the sample weights associated with the incorrectly classified samples
    7:15 Total Error ∈ [0,1] since all the sample weights of the training data add up to 1.
    (0 means a perfect stump; 1 means a horrible stump)
    --[class notes]
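
The class notes above summarize one AdaBoost round. As a rough illustration (not the video's exact numbers), here is a minimal Python sketch of one such round using a scikit-learn stump; the toy features and labels are made up, and in a full implementation this loop would repeat for each new stump.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up data: two binary features (think "chest pain" and "blocked arteries") and a label.
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 1, 0, 0, 1, 0, 0])

n = len(y)
weights = np.full(n, 1 / n)  # every sample starts with the same weight, 1/n

# A stump is a tree with one node and two leaves; sample_weight makes the split use weighted impurity.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X, y, sample_weight=weights)

wrong = stump.predict(X) != y
total_error = weights[wrong].sum()  # sum of the weights of the misclassified samples
amount_of_say = 0.5 * np.log((1 - total_error) / total_error)

# Emphasize the mistakes: raise the weights of misclassified samples, lower the rest, re-normalize.
weights[wrong] *= np.exp(amount_of_say)
weights[~wrong] *= np.exp(-amount_of_say)
weights /= weights.sum()

print(total_error, amount_of_say, weights)
```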

  • @prudvim3513
    @prudvim3513 5 ปีที่แล้ว +32

    I always love Josh's Videos. There is a minor calculation error while calculating amount of say for chest pain stump. (1-3/8)/(3/8) = 5/3, not 7/3

  • @marcelocoip7275
    @marcelocoip7275 2 ปีที่แล้ว +2

    Hi Josh, I'm very grateful for your videos, they really complement my ML Python programming studies. I really really (double really bam) appreciate that you take the time to answer our questions. I know you receive a lot of compliments about your explanation approach (it's spectacular), but this "after-sales" service (answering allll the comments) is even more valuable to me. I'm building myself up as a DS, and sometimes I feel "mentorless"; your answers are a kind of warm push towards my objective. I will gratefully buy a Triple Bam Mug (it's very cool!) with my first salary. Cheers from Argentina!

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thank you very much!!! I'm glad you like my videos and good luck with your studies!

  • @nicolasavendano9459
    @nicolasavendano9459 3 ปีที่แล้ว +8

    I can't believe how useful your channel has been these days man! I literally search up anything ML related on YouTube and there's your great video explaining it! The intro songs and BAMs make everything so much clearer dude, the only bad thing I could say about these videos is that they lack a conclusion song lol

  • @pabloruiz577
    @pabloruiz577 5 ปีที่แล้ว +77

    AdaBoost -> Gradient Boosting -> XGBoost series will be awesome! First step AdaBoost clearly explained : )

    • @statquest
      @statquest  5 ปีที่แล้ว +40

      I'm just putting the finishing touches on Gradient Descent, which will come out in a week or so, then Gradient Boosting and XGBoost.

    • @pabloruiz577
      @pabloruiz577 5 ปีที่แล้ว +2

      That sounds great@@statquest! I guess you are the Machine Teaching

    • @Criptoverso_oficial
      @Criptoverso_oficial 5 ปีที่แล้ว +2

      @@statquest I'm waiting this as well!

    • @maleianoie7774
      @maleianoie7774 5 ปีที่แล้ว +2

      @@statquest when will you post Gradient Boosting and XGBoost?

    • @shivaprasad1277
      @shivaprasad1277 5 ปีที่แล้ว +2

      @@statquest waiting for Gradient Boosting and XGBoost

  • @jatintayal1488
    @jatintayal1488 5 ปีที่แล้ว +61

    That opening scared me..😅😅

    • @OttoFazzl
      @OttoFazzl 4 ปีที่แล้ว +4

      You were scared to learn that ML is not so complicated? BAMM!

    • @abhishek007123
      @abhishek007123 4 ปีที่แล้ว +1

      Lolo

  • @anirudhgangadhar6158
    @anirudhgangadhar6158 2 หลายเดือนก่อน +1

    This is by far the best explanatory video on "AdaBoost" that I have come across.

    • @statquest
      @statquest  2 หลายเดือนก่อน

      Thanks!

  • @cslewisster
    @cslewisster 2 ปีที่แล้ว +1

    I should have checked here instead of everywhere else. Josh sings a song and explains things so clearly. Love the channel. Thanks again!

  • @roeiamos4491
    @roeiamos4491 2 ปีที่แล้ว +3

    The explanation is brilliant, thanks so much for keeping things so simple

  • @bhupensinha3767
    @bhupensinha3767 5 ปีที่แล้ว +12

    Hi Josh, excellent video. But I am not able to understand how the weighted Gini index is calculated after I have adjusted the sample weights... Can you please help?

    • @haobinyuan3260
      @haobinyuan3260 4 ปีที่แล้ว

      I am confused as well :(

    • @jiayiwu4101
      @jiayiwu4101 3 ปีที่แล้ว

      It is the same as the Gini Impurity in the Decision Tree video.

    • @DawFru
      @DawFru ปีที่แล้ว

      Take the example of Chest Pain
      Gini index = 1 - (3/5)^2 - (2/5)^2 = 0.48 for the Yes category
      Gini index = 1 - (2/3)^2 - (1/3)^2 = 0.44 for the No category
      Since each category has a different number of samples, we have to take the weighted average in order to get the overall (weighted) Gini index.
      Yes category weight = (3 + 2) / (3 + 2 + 2 + 1) = 5/8
      No category weight = (2 + 1) / (3 + 2 + 2 + 1) = 3/8
      Total Weighted Gini index = 0.48 * (5/8) + 0.44 * (3/8) = 0.47
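
A quick way to check those numbers is to compute the weighted Gini index in a few lines of Python; the leaf counts below are the ones from the comment above.

```python
def gini(counts):
    """Gini impurity of a leaf, given the class counts in that leaf, e.g. [3, 2]."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Chest Pain = Yes leaf: 3 with heart disease, 2 without; Chest Pain = No leaf: 2 with, 1 without.
leaves = [[3, 2], [2, 1]]
n_total = sum(sum(leaf) for leaf in leaves)  # 8 samples in total

weighted_gini = sum(gini(leaf) * sum(leaf) / n_total for leaf in leaves)
print(round(weighted_gini, 2))  # 0.47, matching the calculation above
```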

  • @MrMetralloCity
    @MrMetralloCity 4 ปีที่แล้ว +1

    i love your intros and outros, are simply awesome!!! i really enjoy learning with your channel!!!!

  • @anushagupta4944
    @anushagupta4944 5 ปีที่แล้ว

    Thanks a lot for this video Josh. So fun and easy to understand. Keep up the good work.

  • @dimitriskass1208
    @dimitriskass1208 4 ปีที่แล้ว +26

    The real question is: Is there a model which can predict the volume of "bam" sound ?

    • @statquest
      @statquest  4 ปีที่แล้ว +4

      Great Question! :)

    • @dimitriskass1208
      @dimitriskass1208 4 ปีที่แล้ว

      @@statquest 😆😆

    • @indrab3091
      @indrab3091 3 ปีที่แล้ว +3

      The Bam has total error 0, so the amount of say will freak out :)

  • @alexthunder3897
    @alexthunder3897 4 ปีที่แล้ว +90

    I wish math in real life happened as fast as 'dee doo dee doo boop' :D

  • @breakurbody
    @breakurbody 5 ปีที่แล้ว +2

    Thank you Statquest. Was eagerly waiting for Adaboost, Clearly Explained.

    • @statquest
      @statquest  5 ปีที่แล้ว

      Hooray!!! :)

  • @daesoolee1083
    @daesoolee1083 3 ปีที่แล้ว +1

    Wow, you explained the concept of bootstrapping so easily without even mentioning it! Impressive!

  • @rossburton1085
    @rossburton1085 5 ปีที่แล้ว +3

    I've just started a PhD in sepsis immunology and applied machine learning, and this channel has been a godsend.
    Josh, in the future would you have any interest in creating some videos about mixture models? It's something I'm struggling to get my head around at the moment, and I'm struggling to find good learning resources for it.

    • @statquest
      @statquest  5 ปีที่แล้ว +1

      I'm definitely planning on doing videos on mixture models. I have to finish a few more Machine Learning videos, then I want to do a handful of basic stats videos and then I'll dive into mixture models.

    • @ucanhvan4557
      @ucanhvan4557 7 หลายเดือนก่อน

      Hi Ross, I really hope that you get your PhD. I am also a new PhD student who is trying to apply ML to my mechanical research. Could you please guide me with some suggestions on where to begin? Thank you so much!

  • @gdinu0
    @gdinu0 4 ปีที่แล้ว +3

    You explained such a complex concept with ease. Awesome video!

    • @statquest
      @statquest  4 ปีที่แล้ว

      Thank you! :)

  • @tymothylim6550
    @tymothylim6550 3 ปีที่แล้ว +1

    Thank you very much for this video! It was a difficult topic but the step-by-step process helped me get familiar with it! Very helpful to go through the examples!

    • @statquest
      @statquest  3 ปีที่แล้ว

      Glad it helped!

  • @MayMay-dz4yb
    @MayMay-dz4yb 2 ปีที่แล้ว +1

    This is very enjoyable and yet understandable to watch, the best AdaBoost explanation. I've watched almost all the videos here.

    • @statquest
      @statquest  2 ปีที่แล้ว

      Wow, thanks!

  • @holloloh
    @holloloh 4 ปีที่แล้ว +4

    Could you elaborate on the weighted Gini function? Do you mean that for computing the probabilities we take weighted sums instead of just taking the ratio, or is it something else?

    • @mario1ua
      @mario1ua 5 หลายเดือนก่อน

      I understand he calculates the Gini index for every leaf, then multiplies by the number of samples in that leaf and divides by the total number of samples in both leaves (8), so the index is weighted by the size of that leaf. Then he sums the weighted indices from both leaves. At least I'm getting the same results when applying this formula.

  • @mashinov1
    @mashinov1 5 ปีที่แล้ว +31

    Josh, you're the best. Your explanations are easy to understand, plus your songs crack my girlfriend up.

    • @statquest
      @statquest  5 ปีที่แล้ว +2

      That's awesome!! :)

  • @endlessriddles2663
    @endlessriddles2663 4 ปีที่แล้ว +1

    Your videos have seriously been saving me! Thank you so much and keep them coming!

    • @statquest
      @statquest  4 ปีที่แล้ว

      Thank you very much! :)

  • @user-fo2cy4hq6z
    @user-fo2cy4hq6z ปีที่แล้ว +1

    Truly exceptional!! The work and effort put into making machine learning concepts (and stats in general) accessible is simply prodigious. A big thank you, Josh!! To each their own heroes; I've found one of mine!!! Keep up the good work.

  • @aakashjain5035
    @aakashjain5035 5 ปีที่แล้ว +5

    Hi Josh, you are doing a great job. Can you please make a video on XGBoost? That would be very helpful.

  • @harry5094
    @harry5094 5 ปีที่แล้ว +5

    Hi Josh,
    Love your videos from India,
    Can you please tell me how to calculate the amount of say in regression case and also the sample weights?
    Thanks

    • @saisuryakeshetty7589
      @saisuryakeshetty7589 ปีที่แล้ว

      Did you get your answer? If yes, could you please explain

  • @JT2751257
    @JT2751257 4 ปีที่แล้ว +3

    had to watch two times to fully grasp the concept.. Worth every minute :)

    • @statquest
      @statquest  4 ปีที่แล้ว +1

      Awesome! Thanks for giving the video a second chance. :)

  • @eddiethinhvuong1607
    @eddiethinhvuong1607 2 ปีที่แล้ว +1

    Your channel is the best one about Stats I found so far

    • @statquest
      @statquest  2 ปีที่แล้ว

      Wow, thanks!

  • @jinqiaoli8985
    @jinqiaoli8985 4 ปีที่แล้ว +3

    Hi Josh,
    I love your videos so much! You are awesome!!
    A quick question on total error, how could a tree give a total error greater than 0.5? In such a case, I guess the tree will simply flip the label?
    Is this because of the weight? The total error is calculated on the original sample, not the resampled sample? If so, even though a tree correctly classifies a sample that previous trees cannot, its vote may be reversed. How could it improve the overall accuracy?
    Thank you!

    • @statquest
      @statquest  4 ปีที่แล้ว

      A tree can have a total error of up to 1 if it totally gets everything wrong. In that case, we would just swap its outputs, by giving it a large, but negative, "amount of say" and then it would get everything right! And while it's very hard to imagine that this is possible using a tree as a "weak learner", you have to remember that AdaBoost was originally designed to work with any "weak learner", not just short trees/stumps, so by allowing total error to go over 0.5 it is flexible to the results of any "weak learner".
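
For intuition about that answer, here is a tiny sketch of how the Amount of Say behaves as Total Error moves past 0.5; the eps clamp is just a practical guard (my assumption, not something from the video) so the log stays finite at 0 and 1.

```python
import numpy as np

def amount_of_say(total_error, eps=1e-10):
    # Clamp so the log stays finite for a perfect stump (error 0) or a perfectly wrong one (error 1).
    total_error = np.clip(total_error, eps, 1 - eps)
    return 0.5 * np.log((1 - total_error) / total_error)

print(amount_of_say(0.25))  # positive: the stump's vote counts toward its own prediction
print(amount_of_say(0.50))  # zero: no better than a coin flip, so no say at all
print(amount_of_say(0.75))  # negative: the vote is effectively flipped to the opposite class
```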

    • @jinqiaoli8985
      @jinqiaoli8985 4 ปีที่แล้ว +1

      @@statquest Bam!!! Thanks for the quick reply. I think I got the point. Looking forward to episode 2 of XGBoost, Merry Christmas and Happy New Year! 😃😃

    • @statquest
      @statquest  4 ปีที่แล้ว

      @@jinqiaoli8985 I can't wait to release the next XGBoost video. I just have a few more slides to work on before it's ready.

  • @jonasvilks2506
    @jonasvilks2506 5 ปีที่แล้ว +3

    Hello. There is a little error in the arithmetic, but AdaBoost is clearly explained! Error at 10:18: Amount of Say for Chest Pain = (1/2)*log((1 - 3/8)/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) = 0.25, not 0.42.
    I also join others in asking to talk about Gradient Boosting next time.
    Thank you.

    • @statquest
      @statquest  5 ปีที่แล้ว +2

      Aaaaah. There's always one silly mistake. This was a copy/paste error. Oh well. Like you said, it's not a big deal and it doesn't interfere with the main ideas... but one day, I'll make a video without any silly errors. I can dream! And Gradient Boosting will be soon (in the next month or so).

    • @williamzheng5918
      @williamzheng5918 5 ปีที่แล้ว

      @@statquest Don't worry about small errors like these, your time is GOLD and shouldn't be consumed by these little mistakes, use it to create more 'BAM'! The audience will check the errors for you! All you need to do is pin that comment when appropriate so that other people will notice.
      PS, how to PIN a comment (I paste it here to save your precious time ^_^):
      - Sign in to TH-cam.
      - In the comments below a video, select the comment you would like to pin.
      - Click the menu icon > Pin. If you've already pinned a comment, this will replace it. ...
      - Click the blue button to confirm. On the pinned comment, you'll see a "Pinned by" icon.

  • @balajisubramanian6565
    @balajisubramanian6565 5 ปีที่แล้ว +1

    Thank you Josh for all of your great videos. You are a good Samaritan!

    • @statquest
      @statquest  5 ปีที่แล้ว

      Awesome! Thank you! :)

  • @dwaipayansaha4443
    @dwaipayansaha4443 2 ปีที่แล้ว +1

    Hi Josh Starmer ,
    A huge BAM for this video.
    The best explanation I have ever seen for Adaboost.
    Keep helping people.

    • @statquest
      @statquest  2 ปีที่แล้ว

      Glad it was helpful!

  • @jaivratsingh9966
    @jaivratsingh9966 4 ปีที่แล้ว +4

    Thanks, Josh, for this great video! Just to highlight, at 10:21 the calculation should be (1/2)*log((1 - 3/8)/(3/8)) = (1/2)*log(5/3).
    How did you conclude that the first stump should be built on Weight? Because of the minimum total error or the minimum total impurity among the three features? Total error and impurity might not rank the features the same way, though they happen to give the same ranking here.

    • @statquest
      @statquest  4 ปีที่แล้ว

      I've put a note about that error in the video's description. Unfortunately TH-cam will not let me edit videos once I post them. The stump was weighted using the formula given at 7:32

  • @tejpunjraju9718
    @tejpunjraju9718 3 ปีที่แล้ว +8

    "Devmaanush" hai ye banda!
    Translation: This dude has been sent by God!

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      Thank you very much! :)

  • @chinedunwasolu4913
    @chinedunwasolu4913 4 ปีที่แล้ว +1

    I just want to say, THANK YOU. Your video really helped me to understand the equations.

  • @Otonium
    @Otonium 5 ปีที่แล้ว +1

    Thank you for all those crystal clear explained videos. Really appreciated.

    • @statquest
      @statquest  5 ปีที่แล้ว

      Thank you! :)

    • @rahimimahdi
      @rahimimahdi 5 ปีที่แล้ว +1

      Thanks, I’m learning machine learning with your cool videos

    • @statquest
      @statquest  5 ปีที่แล้ว

      @@rahimimahdi Hooray! :)

  • @sezaneyuan3111
    @sezaneyuan3111 4 ปีที่แล้ว +4

    Love the opening music, it makes me laugh in my machine learning course. How odd!

  • @swadhindas5853
    @swadhindas5853 5 ปีที่แล้ว +6

    Amount of Say for chest pain: how is it 7/3? I think it should be 5/3.

  • @richardwatts62
    @richardwatts62 4 ปีที่แล้ว +1

    Incredible teaching. Thank you very much for creating all of this excellent content.

    • @statquest
      @statquest  4 ปีที่แล้ว

      Thank you! :)

  • @shashiranjanhere
    @shashiranjanhere 5 ปีที่แล้ว +1

    Looking forward to the Gradient Boosting video and an implementation example. Somehow I find it difficult to understand intuitively. Your way of explaining things goes straight into my head without much ado.

    • @statquest
      @statquest  5 ปีที่แล้ว

      Awesome! Gradient Boosting should be available soon.

    • @atinsingh164
      @atinsingh164 5 ปีที่แล้ว +1

      Thanks, that will be very helpful!

    • @statquest
      @statquest  5 ปีที่แล้ว

      @@atinsingh164 I'm working on it right now.

  • @radusdirect
    @radusdirect 4 ปีที่แล้ว +3

    3:22 "Errors made by the 2nd stump influences the making of the 3rd stump"; it is not accurate to say that the errors made by "i_th" stump influence "i+1_th" stump. The errors made by the "1 to i" additive classifiers collectively influence the construction of the "i+1_th" stump. But, otherwise, this is a wonderful presentation.

    • @statquest
      @statquest  4 ปีที่แล้ว +2

      You are correct - the mistakes are additive.

    • @gorgolyt
      @gorgolyt 4 ปีที่แล้ว

      @@statquest Please fix the video because that's the confusion I came here to rectify. That's a big mistake.

    • @gorgolyt
      @gorgolyt 4 ปีที่แล้ว

      Actually having read the original AdaBoost authors now, I don't think the training model is a sum of the previous models..?

  • @vedprakash-bw2ms
    @vedprakash-bw2ms 5 ปีที่แล้ว +3

    I love your Songs..
    Please make a video on XGBoost .

    • @statquest
      @statquest  5 ปีที่แล้ว

      Thanks! I'm working on Gradient Descent and then Gradient Boost. Those should be out soon.

  • @xizhilow6815
    @xizhilow6815 3 ปีที่แล้ว +2

    thank you for making these videos, they are really helping me with my ML class!

    • @statquest
      @statquest  3 ปีที่แล้ว

      Hooray! And good luck with your class.

  • @countryberry07
    @countryberry07 4 ปีที่แล้ว +1

    Excellent video, very clear explanation of a fairly complex predictive modeling technique.

    • @statquest
      @statquest  4 ปีที่แล้ว

      Thank you! :)

  • @devarshigoswami9149
    @devarshigoswami9149 3 ปีที่แล้ว +3

    It'd be really refreshing to hear an actual model make 'dee doo dee doo boop' sounds while training.

  • @UsmanKhan-lp2mg
    @UsmanKhan-lp2mg 4 ปีที่แล้ว +5

    Hey Josh, I'm back to study Machine Learning for Final Exams 😂😂😂

    • @statquest
      @statquest  4 ปีที่แล้ว +1

      Good luck and let me know how they go! :)

  • @rahulsihara8946
    @rahulsihara8946 หลายเดือนก่อน +1

    Such an amazing explanation; it intuitively shows how AdaBoost helps make the model better than a decision tree.

    • @statquest
      @statquest  หลายเดือนก่อน

      Thanks!

  • @vlogwithdevesh9914
    @vlogwithdevesh9914 3 ปีที่แล้ว +1

    You are a great teacher who makes learning a lot fun!!!

  • @ccuuttww
    @ccuuttww 5 ปีที่แล้ว +24

    10:15 Warning, wrong calculation alert: it is 5/8, not 7/8, since it is 1 - 3/8, so the remaining part of the calculation is off!

    • @salmenkveld5717
      @salmenkveld5717 4 ปีที่แล้ว

      There is also another error in the formula. The formula should be with ln instead of log!

  • @a_sn_hh7027
    @a_sn_hh7027 5 ปีที่แล้ว +3

    triple bam

  • @amalnasir9940
    @amalnasir9940 2 ปีที่แล้ว +1

    No wonder AdaBoost takes a looong time to run! Thank you for the nice explanation as always!

  • @mrcharm767
    @mrcharm767 ปีที่แล้ว +1

    You just can't imagine how great this is... it could not be learned better than from this video.

    • @statquest
      @statquest  ปีที่แล้ว +1

      Thanks again! :)

  • @AmitKumar-sj9gr
    @AmitKumar-sj9gr 4 ปีที่แล้ว +1

    I am really in love with you dude !!! Hearty congrats for amazing work. Please keep doing, Cheers.

    • @statquest
      @statquest  4 ปีที่แล้ว

      Thanks so much! :)

  • @gren287
    @gren287 5 ปีที่แล้ว +1

    Im so excited, thanks for another Statquest. :)

  • @harshtripathi465
    @harshtripathi465 5 ปีที่แล้ว +2

    Hello Sir,
    I really love the simple ways in which you explain such difficult concepts. It would be really helpful to me and probably a lot of others if you could make a series on Deep Learning, i.e., neural networks, gradient descent etc.
    Thanks!

    • @statquest
      @statquest  5 ปีที่แล้ว

      Thank you so much! I'm working on Gradient Descent right now. I hope it is ready in the next week or two.

  • @user-jw8fl4ru9i
    @user-jw8fl4ru9i 2 ปีที่แล้ว +1

    I am in love with this channel. I think the main reason is Josh's explanation style :D

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thanks! 😃

  • @calvin5371
    @calvin5371 5 ปีที่แล้ว +1

    Hey, Josh, great tutorial... just one question: for how long do you keep making new samples and classifying on the new dataset? Until a stump gets all the samples correctly classified because of the large penalty on misclassified samples?

    • @HL-iw1du
      @HL-iw1du 9 หลายเดือนก่อน

      I was wondering that too. Maybe when the weights start to stabilize.

  • @chyldstudios
    @chyldstudios 4 ปีที่แล้ว +2

    Another amazing data science video! This guy is crushing it.

    • @statquest
      @statquest  4 ปีที่แล้ว

      Thank you! :)

  • @user-xw9cp3fo2n
    @user-xw9cp3fo2n 2 ปีที่แล้ว +1

    Thanks, Josh, your explanation is amazing. Greetings from Egypt

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thank you! :)

  • @bayurukmanajati1224
    @bayurukmanajati1224 4 ปีที่แล้ว +1

    This is very informative for me. I skipped the Decision Tree but... but... I can understand it! Love your vids!

  • @birukabereambaw3425
    @birukabereambaw3425 3 ปีที่แล้ว +1

    Dude, you are brilliant, brilliant, brilliant. How did you come up with this kind of teaching style? Clearly Explained!!

    • @statquest
      @statquest  3 ปีที่แล้ว

      Thanks a ton!

  • @thunderkat2911
    @thunderkat2911 3 ปีที่แล้ว +1

    Your channel has saved me a looooooot of time. Thanks!

    • @statquest
      @statquest  3 ปีที่แล้ว

      Glad to hear it!

  • @challakartheek1
    @challakartheek1 5 ปีที่แล้ว

    Superb lecture... Best explanation for me so far.

  • @AfsanaKhan-dg5lf
    @AfsanaKhan-dg5lf 4 ปีที่แล้ว +1

    Your explanations are just awesome!!

    • @statquest
      @statquest  4 ปีที่แล้ว

      Thank you very much. :)

  • @gabrielcournelle3055
    @gabrielcournelle3055 3 ปีที่แล้ว +2

    Wow, that was great. I love how you make it sound so simple. Just a question about the construction of the third stump without weighted Gini. Suppose sample number 5 was not picked to build my new dataset to feed to stump number 2. Can sample number 5 be picked to feed to stump number 3, or do I lose it for the rest of the stumps?

    • @statquest
      @statquest  3 ปีที่แล้ว

      You just lose it for one stump. After that it goes back into the pool and can be selected for any of the other stumps.

    • @cwc2005
      @cwc2005 3 ปีที่แล้ว +1

      @@statquest Hi Josh, but wouldn't there be no sample weight assigned to sample number 5 if it is not included in the 2nd stump's dataset? Without a sample weight from stump 2, how would it be randomly selected when building the dataset for the 3rd stump?
      Does that mean the changes to the weights from stump 2 will only be applied to the samples used for stump 2, while the samples not selected retain their weights from stump 1, and the normalization of the weights is then done for all samples regardless of whether or not they were in stump 2?

    • @statquest
      @statquest  3 ปีที่แล้ว

      @@cwc2005 I believe the old weights are retained for samples not included in building the new stump.

    • @marcelocoip7275
      @marcelocoip7275 2 ปีที่แล้ว

      @@statquest But in that case, the elements that were not picked will be "more relevant" in the next stump? Seems like a weakness/inconsistency of the method.
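
To make the resampling mechanics discussed in this thread concrete, here is a minimal sketch of the weighted draw: the new dataset is drawn (with replacement) from all of the original samples using their current normalized weights, so nothing is permanently lost; the weight values below are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Current normalized sample weights for all 8 original samples (illustrative numbers only).
weights = np.array([0.07, 0.07, 0.07, 0.49, 0.07, 0.07, 0.07, 0.09])
weights = weights / weights.sum()

# Draw a same-sized dataset, with replacement, from the *original* samples.
# Every sample is eligible on every draw; heavily weighted (previously misclassified)
# samples simply tend to show up more than once.
indices = rng.choice(len(weights), size=len(weights), replace=True, p=weights)
print(indices)  # e.g. sample 3 usually appears several times
```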

  • @zimdogmail
    @zimdogmail 3 ปีที่แล้ว +2

    Hey Josh great video and I appreciate how active you are in the comments. I have a few questions that came to me when watching the video and reading some of the comments and trying to implement this myself that I couldn't find the answer in the video or in the stat quest for this video I purchased:
    1: Regarding your note 3 in the pinned comment: if we put the observations back in the dataset to be selected for other stumps, what weight is to be associated with the samples (both included and not included in the 'new dataset')?
    - And if we are using the original dataset for future stumps, (as far as I understand by your note, but maybe I am incorrect as you say "get rid of the original" at 18:08) what is the point of making new weights all initialized at 1/total for this new dataset if this dataset is not used for these future stumps?
    2: What are we to do if there are more options than just "yes" and "no" for a variable, say for example we added "sometimes" for chest pain, what would be the stump(s) created for this variable like at 5:30? Would it be CHEST PAIN that has branches to yes, no, and sometimes? Could you give an example of making a stump(s) for this variable with 3 different options?
    3: How possible is it for you to make a part 2 that includes a variable with more than 2 options and the third iteration of making stumps haha? I think this would help so we can see how this new dataset's weights effects our original dataset and shows an example of your note 3 of picking out of the original dataset a second time.
    Again great videos and thanks

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      1) This is a good question, and it's not something I'm 100% sure I know the correct answer to. So I'll recommend checking out the original manuscript cseweb.ucsd.edu/~yfreund/papers/IntroToBoosting.pdf as it might have the answers you are interested in.
      2) If you have more than yes/no options, you can one-hot-encode your data. Hopefully I'll do a video on one-hot-encoding sometime soon.

  • @quant-trader-010
    @quant-trader-010 2 ปีที่แล้ว +2

    Man, you are too good at explaining things!

  • @davidkaftan5563
    @davidkaftan5563 3 ปีที่แล้ว

    Man, you are really good at explaining these things.

    • @statquest
      @statquest  3 ปีที่แล้ว

      Thank you! :)

  • @burakkaya6149
    @burakkaya6149 3 หลายเดือนก่อน +1

    Even somebody who doesn't know English could understand machine learning from your videos

    • @statquest
      @statquest  3 หลายเดือนก่อน

      Thank you!

  • @fgfanta
    @fgfanta 5 ปีที่แล้ว +1

    Thanks! What about gradient boosting? Is it used for genomics? I am aware that it has been successfully used in Kaggle competitions, but I don't find applications to genomics, in spite of XGBoost and CatBoost support for R.

    • @statquest
      @statquest  5 ปีที่แล้ว +3

      You know what's really funny - I just wrote a genomics application that uses XGBoost, so I know it can work in that setting. I'm using it to predict cell type from single-cell RNA-seq data. It works better than AdaBoost or Random Forests. However, it turns out that Random Forests have some nice statistical properties that make me want to use them over gradient boost. I may pursue both methods.

  • @yabgdouglas6032
    @yabgdouglas6032 2 ปีที่แล้ว +1

    these are seriously SO good - thank you so much for all the help :)

    • @statquest
      @statquest  2 ปีที่แล้ว

      Glad you like them!

  • @ayenewyihune
    @ayenewyihune 2 ปีที่แล้ว +1

    I will recommend this channel to as many people as I can

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thank you very much! :)

  • @salih394
    @salih394 4 ปีที่แล้ว +1

    Looking forward to XGBoost @StatQuest with Josh Starmer!

  • @yulinliu850
    @yulinliu850 5 ปีที่แล้ว +1

    Another great video! Thanks a lot, Josh!

  • @anderarias
    @anderarias 5 ปีที่แล้ว +1

    Awesome explanation. Thank you so much!

    • @statquest
      @statquest  5 ปีที่แล้ว

      You're welcome :)

  • @tobyto4614
    @tobyto4614 2 ปีที่แล้ว

    Great video! I also have a question regarding the training process. For example, in the next iteration of training, the dataset would most likely consist of previously incorrectly predicted samples (because they have higher weights and are more likely to be drawn). Would this make the model shift towards better predictions on those data points, and become less accurate at predicting the ones that were predicted correctly in the first place (in prior iterations)?

    • @statquest
      @statquest  2 ปีที่แล้ว

      Probably, but remember, the goal is to use the whole ensemble to make predictions, so it's a good thing to have some stumps perform well on some samples and other stumps perform well on other samples.
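
As a rough sketch of how that ensemble vote works (the stump outputs and Amount of Say values below are made up, not the video's), the prediction is whichever class accumulates the larger total Amount of Say:

```python
# Each stump's prediction for one new patient, with its Amount of Say (illustrative numbers).
stump_votes = [("Has Heart Disease", 0.97),
               ("Has Heart Disease", 0.32),
               ("Does Not Have Heart Disease", 0.41),
               ("Has Heart Disease", 0.12)]

totals = {}
for label, say in stump_votes:
    totals[label] = totals.get(label, 0.0) + say  # add up the Amount of Say for each class

print(totals)                       # Has Heart Disease: ~1.41, Does Not Have Heart Disease: 0.41
print(max(totals, key=totals.get))  # the class with the larger total Amount of Say wins
```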

  • @kohei4828
    @kohei4828 5 ปีที่แล้ว +1

    Hi, another great video! I had a hard time understanding the boosting algorithm but this made it so much clearer! I have a little question though: can we think of a GBDT (LightGBM, XGBoost, etc.) as AdaBoost using decision trees as weak learners instead of stumps?
    Also, another video suggestion: Kernel PCA (or kernel methods in general).
    I know a lot of people have trouble understanding the concept, including me...lol

    • @statquest
      @statquest  5 ปีที่แล้ว +1

      I'm glad to hear that the video helped you understand boosting. Gradient Boosting is very similar to AdaBoost, and, technically, both AdaBoost and Gradient Boosting can use stumps, although stumps are more common for AdaBoost and larger trees are more common for Gradient Boosting. Another difference is that AdaBoost uses an exponential function to modify the Sample Weights (e^amount of say). In general terms, the exponential function is AdaBoost's "loss function". In contrast, Gradient Boosting can use any loss function (so, in a sense, AdaBoost is a subset of the stuff you can do with Gradient Boosting). Does that make sense?
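
Written in the usual +1/-1 label convention, that weight update is just the exponential function applied to the stump's vote; here is a minimal sketch of that single step (the 0.97 Amount of Say is only an example value).

```python
import numpy as np

def update_weight(w, y, h, amount_of_say):
    """One AdaBoost weight update: y is the true label, h the stump's prediction, both +1 or -1."""
    return w * np.exp(-amount_of_say * y * h)  # exp(+say) when wrong (y*h = -1), exp(-say) when right

print(update_weight(1 / 8, y=+1, h=-1, amount_of_say=0.97))  # misclassified: weight increases
print(update_weight(1 / 8, y=+1, h=+1, amount_of_say=0.97))  # correctly classified: weight decreases
```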

  • @sabalsubedi4114
    @sabalsubedi4114 4 ปีที่แล้ว +1

    Dude anything I try to learn related to machine learning or statistics, your video pops up at the top. Thanks a bunch for making all these fun videos! Using your video not only to understand stuff but also to explain it to other people!

    • @statquest
      @statquest  4 ปีที่แล้ว

      Awesome, thank you!

  • @nkauvmeaern10121989
    @nkauvmeaern10121989 2 ปีที่แล้ว +1

    Very nicely explained. I have never seen such a good explanation. Love U ♥♥♥

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thank you! :)

  • @johndoex94
    @johndoex94 2 ปีที่แล้ว +1

    Your videos are consistently great. An absolute godsend

    • @statquest
      @statquest  2 ปีที่แล้ว

      Wow, thank you!