AdaBoost, Clearly Explained

  • Published Jan 30, 2025

Comments • 1.7K

  • @statquest
    @statquest  5 years ago +181

    Correction:
    10:18. The Amount of Say for Chest Pain = (1/2)*log((1-(3/8))/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) = 0.25, not 0.42.
    NOTE 0: The StatQuest Study Guide is available: app.gumroad.com/statquest
    NOTE 2: Also note: In statistics, machine learning and most programming languages, the default log function is log base 'e', so that is the log that I'm using here. If you want to use a different log, like log base 10, that's fine, just be consistent.
    NOTE 3: A lot of people ask if, once an observation is omitted from a bootstrap dataset, is it lost for good? The answer is "no". You just lose it for one stump. After that it goes back in the pool and can be selected for any of the other stumps.
    NOTE 4: A lot of people ask "Why is 'Heart Disease = No' referred to as 'Incorrect'?" This question is answered in the StatQuest on decision trees: th-cam.com/video/_L39rN6gz7Y/w-d-xo.html However, here's the short version: The leaves make classifications based on the majority of the samples that end up in them. So if most of the samples in a leaf did not have heart disease, all of the samples in the leaf are classified as not having heart disease, regardless of whether or not that is true. Thus, some of the classifications that a leaf makes are correct, and some are not correct.
    Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @parvezaiub
      @parvezaiub 5 years ago +2

      Isn't it 0.1109?

    • @statquest
      @statquest  5 years ago +18

      @@parvezaiub That's what you get when you use log base 10. However, in statistics, machine learning and most programming languages, the default log function is log base 'e'.
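
A quick sanity check of the numbers in this thread (a minimal Python sketch, not code from the video): the same formula gives roughly 0.25 with the natural log and 0.1109 with log base 10.

```python
import math

total_error = 3/8  # Total Error for the Chest Pain stump in the pinned correction

# Amount of Say = (1/2) * log((1 - Total Error) / Total Error)
say_natural = 0.5 * math.log((1 - total_error) / total_error)    # log base e, the usual default
say_base10  = 0.5 * math.log10((1 - total_error) / total_error)  # log base 10

print(say_natural)  # 0.2554... -> the ~0.25 in the pinned correction
print(say_base10)   # 0.1109... -> the value asked about above
```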

    • @thunderkat2911
      @thunderkat2911 4 years ago +1

      you should pin this to the top

    • @sidisu
      @sidisu 4 years ago +22

      Hi Josh - great videos, thank you! Question on your Note 3: How do omitted observations get "back into the pool"? It seems in the video around 16:16 that the subsequent stumps are made based on the performance of the previous stump (re-weighting observations from the previous stump)... if that's the case, when do you put "lost observations" back into the pool? How would you update the weights if the "lost observations" were not used to assess the performance of the newest stump?

    • @lejyonerxx
      @lejyonerxx 4 years ago +3

      First, thank you for those great videos. I have the same question that Tim asked. How do omitted observations get "back into the pool"?

  • @indrab3091
    @indrab3091 4 years ago +165

    Einstein says "if you can't explain it simply you don't understand it well enough", and I found this AdaBoost explanation bloody simple. Thank you, Sir.

    • @statquest
      @statquest  4 years ago +6

      Thank you! :)

  • @iftrejom
    @iftrejom 3 years ago +135

    Josh, this is just awesome. The simple and yet effective way you explain otherwise complicated Machine Learning topics is outstanding. You are a talented educator and such a blessing for ML / Data Science / Statistics learners all around the world.

    • @statquest
      @statquest  3 years ago +1

      Awesome, thank you!

  • @codeinair627
    @codeinair627 3 years ago +68

    Everyday is a new stump in our life. We should give more weightage to our weakness and work on it. Eventually, we will become strong like Ada Boost. Thanks Josh!

    • @statquest
      @statquest  3 years ago +17

      bam!

    • @enicay7562
      @enicay7562 4 months ago +1

      TRIPLE BAM!!!

  • @shaunchee
    @shaunchee 4 years ago +27

    Man right here just clarified my 2-hour lecture in 20 mins. Thank you.

  • @dreamhopper
    @dreamhopper 4 years ago +47

    Wow. I cannot emphasize enough how much I'm learning from your series on machine learning. Thank you so much! :D

    • @statquest
      @statquest  4 years ago +3

      Hooray! I'm glad the videos are helpful. :)

  • @catherineLC2094
    @catherineLC2094 4 years ago +8

    Thank you for the study guides Josh! I did not know about them and I spent 5 HOURS making notes on your videos about decision trees and random forests. I think 3 USD is worth less than 5 hours of my time, so I purchased the study guide for AdaBoost and cannot wait for the rest of them (especially neural networks!)

    • @statquest
      @statquest  4 years ago

      Hooray!!! I'm so happy you like them. As soon as I finish my videos on Neural Networks, I'll start making more study guides.

  • @anishchhabra5313
    @anishchhabra5313 2 years ago +7

    This video is just beyond excellent. Crystal clear explanation; no one could have done it better. Thank you, Josh.

  • @kamranabbasi6757
    @kamranabbasi6757 3 years ago +3

    Best video on AdaBoost on TH-cam; watched it twice to understand it fully.
    It's such a beautiful explanation...

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @huyentrangnguyen8917
    @huyentrangnguyen8917 2 years ago +2

    I am a beginner in ML and all of your videos help me a lot to understand these difficult things. I have nothing to say but thank you so so sooooooooo much.

  • @miesvanaar5468
    @miesvanaar5468 4 years ago +7

    Dude... I really appreciate that you make these videos and put so much effort into making them clear. I am buying a t-shirt to do my small part in supporting this amazing channel.

    • @statquest
      @statquest  4 years ago

      Hooray!! Thank you very much! :)

  • @anirudhgangadhar6158
    @anirudhgangadhar6158 10 months ago +1

    This is by far the best explanatory video on "AdaBoost" that I have come across.

    • @statquest
      @statquest  10 months ago

      Thanks!

  • @nicolasavendano9459
    @nicolasavendano9459 4 years ago +8

    I can't believe how useful your channel has been these days, man! I literally search up anything ML related on youtube and there's your great video explaining it! The intro songs and BAMs make everything so much clearer, dude; the only bad thing I could say about these videos is that they lack a conclusion song lol

  • @AayushRampal
    @AayushRampal 4 years ago +19

    You are THE BEST, can't tell how much i've got to learn from statquest!!!

    • @statquest
      @statquest  4 years ago +2

      Awesome! Thank you so much! :)

  • @marcelocoip7275
    @marcelocoip7275 2 years ago +3

    Hi Josh, I'm very grateful for your videos; they really complement my ML Python programming studies. I really really (double really bam) appreciate that you take the time to answer our questions. I know that you receive a lot of compliments about your explanation approach (it's spectacular), but this "after-sales" service (answering alllll the comments) is even more valuable to me. I'm building myself as a DS, and sometimes I feel "mentorless"; your answers are a kind of warm push towards my objective. I will gratefully buy a Triple Bam Mug (it's very cool!) with my first salary. Cheers from Argentina!

    • @statquest
      @statquest  2 years ago

      Thank you very much!!! I'm glad you like my videos and good luck with your studies!

  • @cslewisster
    @cslewisster 3 years ago +1

    I should have checked here instead of everywhere else. Josh sings a song and explains things so clearly. Love the channel. Thanks again!

  • @lisun7158
    @lisun7158 3 years ago +8

    AdaBoost: Forest of Stumps
    1:30 Stump: a tree with just 1 node and 2 leaves.
    3:30 AdaBoost: a forest of stumps;
    Different stumps have different weight/say/voice;
    Each stump takes the previous stumps' mistakes into account. (AdaBoost is short for Adaptive Boosting.)
    6:40 7:00 Total Error: the sum of the sample weights associated with the incorrectly classified samples.
    7:15 Total Error ∈ [0,1] since all the sample weights of the training data sum to 1.
    (0 means a perfect stump; 1 means a horrible stump)
    --[class notes]
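
To make the class notes above concrete, here is a minimal sketch (the sample weights and the correct/incorrect pattern are assumptions, not data from the video) of how Total Error and the Amount of Say come out of the sample weights:

```python
import numpy as np

# Hypothetical setup: 8 samples that all start with equal weights summing to 1
sample_weights = np.full(8, 1/8)
# Assumed result of one stump: True = classified correctly, False = incorrectly
correct = np.array([True, True, True, True, True, True, True, False])

# Total Error = sum of the weights of the incorrectly classified samples, so it lies in [0, 1]
total_error = sample_weights[~correct].sum()

# Amount of Say = (1/2) * log((1 - Total Error) / Total Error)
amount_of_say = 0.5 * np.log((1 - total_error) / total_error)

print(total_error)    # 0.125 (one misclassified sample out of eight)
print(amount_of_say)  # ~0.97, the value the first stump gets in the video
```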

  • @emadrio
    @emadrio 5 years ago +14

    Thank you for this. These videos are concise and easy to understand. Also, your humor is 10/10

  • @olegzarva708
    @olegzarva708 5 years ago +3

    You're my hero, Josh! This is so much more understandable than twisted formulas.

    • @statquest
      @statquest  5 years ago

      Thank you! :)

  • @debabrotbhuyan4812
    @debabrotbhuyan4812 4 years ago +2

    How come I missed this channel for so long? Absolutely brilliant.

    • @statquest
      @statquest  4 years ago +1

      Thank you!

  • @prudvim3513
    @prudvim3513 5 years ago +34

    I always love Josh's videos. There is a minor calculation error while calculating the amount of say for the chest pain stump: (1-3/8)/(3/8) = 5/3, not 7/3.

  • @EricRajaspera
    @EricRajaspera 1 year ago +1

    Truly exceptional!! The work and effort to popularize machine learning concepts, not to mention stats in general, is simply prodigious. A big thank you, Josh!! Everyone has their heroes, and I've found mine!!! Keep up the good work.

  • @rossburton1085
    @rossburton1085 6 years ago +3

    I've just started a PhD in sepsis immunology and applied machine learning, and this channel has been a godsend.
    Josh, in the future would you have any interest in creating some videos about mixture models? It's something I'm struggling to get my head around at the moment, and I'm struggling to find good learning resources for it.

    • @statquest
      @statquest  6 years ago +1

      I'm definitely planning on doing videos on mixture models. I have to finish a few more Machine Learning videos, then I want to do a handful of basic stats videos and then I'll dive into mixture models.

    • @ucanhvan4557
      @ucanhvan4557 1 year ago

      Hi Ross, I really hope that you get your PhD. I am also a new PhD student who is trying to apply ML to my mechanical research. Could you please guide me with some suggestions on where to begin? Thank you so much!

  • @dwaipayansaha4443
    @dwaipayansaha4443 2 years ago +1

    Hi Josh Starmer ,
    A huge BAM for this video.
    The best explanation I have ever seen for Adaboost.
    Keep helping people.

    • @statquest
      @statquest  2 years ago

      Glad it was helpful!

  • @grumpymeercat
    @grumpymeercat 6 years ago +11

    I love this format, you're great.
    RiTeh machine learning mafia, where you at?

  • @daesoolee1083
    @daesoolee1083 4 years ago +1

    Wow, you explained the concept of bootstrapping so easily without even mentioning it! Impressive!

  • @pabloruiz577
    @pabloruiz577 6 years ago +79

    AdaBoost -> Gradient Boosting -> XGBoost series will be awesome! First step AdaBoost clearly explained : )

    • @statquest
      @statquest  6 years ago +41

      I'm just putting the finishing touches on Gradient Descent, which will come out in a week or so, then Gradient Boosting and XGBoost.

    • @pabloruiz577
      @pabloruiz577 6 years ago +2

      That sounds great, @@statquest! I guess you are the Machine Teaching

    • @Criptoverso_oficial
      @Criptoverso_oficial 6 years ago +2

      @@statquest I'm waiting this as well!

    • @maleianoie7774
      @maleianoie7774 5 years ago +2

      @@statquest when will you post Gradient Boosting and XGBoost?

    • @shivaprasad1277
      @shivaprasad1277 5 years ago +2

      @@statquest waiting for Gradient Boosting and XGBoost

  • @RaviShankar-jm1qw
    @RaviShankar-jm1qw 4 years ago +1

    I'm impressed by each of your videos... and in my free time I sometimes recap what you taught in them. Awesome, Josh!!!

    • @statquest
      @statquest  4 years ago

      Awesome! Thank you!

  • @haaritchavda4458
    @haaritchavda4458 5 months ago +3

    "Biiii dooo dooo bo di doo di boo diiiii doooo" This is exactly how my brain reacts when it sees mathematical calculations.

    • @statquest
      @statquest  5 months ago

      bam! :)

  • @birukabereambaw3425
    @birukabereambaw3425 4 years ago +1

    Dude, you are brilliant brilliant brilliant. How did you come up with this kind of teaching style? Clearly Explained!!

    • @statquest
      @statquest  4 years ago

      Thanks a ton!

  • @prethasur7376
    @prethasur7376 3 years ago +4

    Your tutorials are simply awesome Josh! You are a great help!

    • @statquest
      @statquest  3 years ago +1

      Glad you like them!

  • @eddievuong
    @eddievuong 3 years ago +1

    Your channel is the best one about stats I've found so far

    • @statquest
      @statquest  3 years ago

      Wow, thanks!

  • @aleksey3231
    @aleksey3231 4 years ago +333

    Please, can anyone make a 10-hour version of 'dee doo dee doo boop'?

    • @statquest
      @statquest  4 years ago +58

      You made me laugh out loud! :)

    • @50NTD
      @50NTD 4 years ago +11

      Sounds good, I want it too

    • @sketchbook8578
      @sketchbook8578 4 years ago +5

      @@statquest I would seriously play it for my background music during work... Please make one lol.

    • @swaralipibose9731
      @swaralipibose9731 4 years ago

      I also want some 'dee doo Dee doo boop '

    • @VLM234
      @VLM234 4 years ago

      @StatQuest how to apply adaboost for regression?

  • @burakkaya6149
    @burakkaya6149 11 months ago +1

    Even somebody who doesn't know English could understand machine learning with your videos

    • @statquest
      @statquest  11 months ago

      Thank you!

  • @jatintayal1488
    @jatintayal1488 5 years ago +60

    That opening scared me..😅😅

    • @OttoFazzl
      @OttoFazzl 5 years ago +4

      You were scared to learn that ML is not so complicated? BAMM!

    • @abhishek007123
      @abhishek007123 5 years ago +1

      Lolo

  • @MayMay-dz4yb
    @MayMay-dz4yb 3 years ago +1

    This is very enjoyable and yet easy to understand; the best AdaBoost explanation. I've watched almost all the videos here.

    • @statquest
      @statquest  3 years ago

      Wow, thanks!

  • @mashinov1
    @mashinov1 6 years ago +30

    Josh, you're the best. Your explanations are easy to understand, plus your songs crack my girlfriend up.

    • @statquest
      @statquest  6 years ago +2

      That's awesome!! :)

  • @sabalsubedi4114
    @sabalsubedi4114 4 years ago +1

    Dude anything I try to learn related to machine learning or statistics, your video pops up at the top. Thanks a bunch for making all these fun videos! Using your video not only to understand stuff but also to explain it to other people!

    • @statquest
      @statquest  4 years ago

      Awesome, thank you!

  • @schneeekind
    @schneeekind 5 years ago +247

    HAHA love your calculation sound :D :D :D

  • @ayenewyihune
    @ayenewyihune 3 years ago +1

    I will recommend this channel to as many people as I can

    • @statquest
      @statquest  3 years ago

      Thank you very much! :)

  • @dimitriskass1208
    @dimitriskass1208 4 years ago +26

    The real question is: is there a model which can predict the volume of the "bam" sound?

    • @statquest
      @statquest  4 years ago +4

      Great Question! :)

    • @dimitriskass1208
      @dimitriskass1208 4 years ago

      @@statquest 😆😆

    • @indrab3091
      @indrab3091 4 years ago +3

      The Bam has total error 0, so the amount of say will freak out :)

  • @breakurbody
    @breakurbody 6 years ago +2

    Thank you Statquest. Was eagerly waiting for Adaboost, Clearly Explained.

    • @statquest
      @statquest  6 years ago

      Hooray!!! :)

  • @alexthunder3897
    @alexthunder3897 5 years ago +90

    I wish math in real life happened as fast as 'dee doo dee doo boop' :D

  • @КатеринаПаршина-е8ф
    @КатеринаПаршина-е8ф 3 years ago +1

    I am in love with this channel. I think the main reason is Josh's explanation style :D

    • @statquest
      @statquest  3 years ago

      Thanks! 😃

  • @roeiamos4491
    @roeiamos4491 3 years ago +3

    The explanation is brilliant, thanks so much for keeping things so simple

  • @mrcharm767
    @mrcharm767 2 years ago +1

    You just can't imagine how great this is... it could not be explained better than in this video

    • @statquest
      @statquest  2 years ago +1

      Thanks again! :)

  • @bhupensinha3767
    @bhupensinha3767 5 years ago +12

    Hi Josh, excellent video. But I am not able to understand how the weighted gini index is calculated after I have adjusted the sample weights... Can you please help?

    • @haobinyuan3260
      @haobinyuan3260 4 years ago

      I am confused as well :(

    • @jiayiwu4101
      @jiayiwu4101 4 years ago

      It is the same as the Gini Impurity in the Decision Tree video.

    • @DawFru
      @DawFru 2 years ago +2

      Take the example of Chest Pain
      Gini index = 1 - (3/5)^2 - (2/5)^2 = 0.48 for the Yes category
      Gini index = 1 - (2/3)^2 - (1/3)^2 = 0.44 for the No category
      Since each category has a different number of samples, we have to take the weighted average in order to get the overall (weighted) Gini index.
      Yes category weight = (3 + 2) / (3 + 2 + 2 + 1) = 5/8
      No category weight = (2 + 1) / (3 + 2 + 2 + 1) = 3/8
      Total Weighted Gini index = 0.48 * (5/8) + 0.44 * (3/8) = 0.47

    • @priyanshgarg1292
      @priyanshgarg1292 2 months ago

      @@DawFru thanks buddy
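
Here is a small Python sketch of the weighted Gini calculation DawFru walks through above (the helper functions gini and weighted_gini are just for illustration):

```python
def gini(counts):
    """Gini impurity of one leaf, given the class counts in that leaf."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(leaves):
    """Average of the leaf impurities, weighted by the number of samples in each leaf."""
    total = sum(sum(counts) for counts in leaves)
    return sum(gini(counts) * sum(counts) / total for counts in leaves)

# Chest Pain stump from the example above:
# "Yes" leaf: 3 with heart disease, 2 without; "No" leaf: 2 with, 1 without
print(round(weighted_gini([(3, 2), (2, 1)]), 2))  # 0.47
```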

  • @amalnasir9940
    @amalnasir9940 3 years ago +1

    No wonder AdaBoost takes a looong time to run! Thank you for the nice explanation as always!

  • @jonasvilks2506
    @jonasvilks2506 6 years ago +3

    Hello. There is a little error in the arithmetic, but AdaBoost is clearly explained! Error at 10:18: Amount of Say for Chest Pain = (1/2)*log((1-(3/8))/(3/8)) = (1/2)*log((5/8)/(3/8)) = (1/2)*log(5/3) = 0.25, not 0.42.
    I also join the others in asking you to talk about Gradient Boosting next time.
    Thank you.

    • @statquest
      @statquest  6 years ago +3

      Aaaaah. There's always one silly mistake. This was a copy/paste error. Oh well. Like you said, it's not a big deal and it doesn't interfere with the main ideas... but one day, I'll make a video without any silly errors. I can dream! And Gradient Boosting will be soon (in the next month or so).

    • @williamzheng5918
      @williamzheng5918 5 years ago

      @@statquest Don't worry about small errors like these; your time is GOLD and shouldn't be consumed by these little mistakes, use it to create more 'BAM'! The audience will check the errors for you! All you need to do is pin that comment when appropriate so that other people will notice.
      PS, how to PIN a comment (I paste it here to save your precious time ^_^):
      - Sign in to TH-cam.
      - In the comments below a video, select the comment you would like to pin.
      - Click the menu icon > Pin. If you've already pinned a comment, this will replace it. ...
      - Click the blue button to confirm. On the pinned comment, you'll see a "Pinned by" icon.

  • @rahulsihara8946
    @rahulsihara8946 9 months ago +1

    Such an amazing explanation; it intuitively shows how AdaBoost makes the model better than a decision tree.

    • @statquest
      @statquest  9 months ago

      Thanks!

  • @gdinu0
    @gdinu0 5 years ago +3

    Such a complex concept and you explained it with ease... Awesome video

    • @statquest
      @statquest  5 years ago

      Thank you! :)

  • @JT2751257
    @JT2751257 4 years ago +3

    Had to watch it twice to fully grasp the concept... Worth every minute :)

    • @statquest
      @statquest  4 years ago +1

      Awesome! Thanks for giving the video a second chance. :)

  • @drsudar
    @drsudar 4 years ago +3

    3:22 "Errors made by the 2nd stump influence the making of the 3rd stump"; it is not accurate to say that the errors made by the i-th stump influence the (i+1)-th stump. The errors made by the 1st through i-th additive classifiers collectively influence the construction of the (i+1)-th stump. But, otherwise, this is a wonderful presentation.

    • @statquest
      @statquest  4 years ago +2

      You are correct - the mistakes are additive.

    • @gorgolyt
      @gorgolyt 4 years ago

      @@statquest Please fix the video because that's the confusion I came here to rectify. That's a big mistake.

    • @gorgolyt
      @gorgolyt 4 years ago

      Actually having read the original AdaBoost authors now, I don't think the training model is a sum of the previous models..?

  • @shashiranjanhere
    @shashiranjanhere 6 years ago +1

    Looking forward to Gradient Boosting Model and implementation example. Somehow I find it difficult to understand it intuitively. Your way of explaining the things goes straight into my head without much ado.

    • @statquest
      @statquest  6 years ago

      Awesome! Gradient Boosting should be available soon.

    • @atinsingh164
      @atinsingh164 6 years ago +1

      Thanks, that will be very helpful!

    • @statquest
      @statquest  6 years ago

      @@atinsingh164 I'm working on it right now.

  • @holloloh
    @holloloh 5 years ago +4

    Could you elaborate on the weighted gini function? Do you mean that for computing the probabilities we take weighted sums instead of just taking the ratio, or is it something else?

    • @mario1ua
      @mario1ua 1 year ago

      I understand he calculates the Gini for every leaf, then multiplies it by the number of predictions in that leaf and divides by the total number of predictions in both leaves (8), so the index is weighted by the size of that leaf. Then he sums the weighted indices from both leaves. At least I'm getting the same results when applying this formula.

  • @ahmedrejeb8575
    @ahmedrejeb8575 3 years ago +1

    I'll make a special dedication to this man on the day of my graduation.

  • @jaivratsingh9966
    @jaivratsingh9966 5 years ago +4

    Thanks, Josh, for this great video! Just to highlight, at 10:21 your calculation should be 1/2 * log((1-3/8)/(3/8)) = 1/2*log(5/3).
    How did you conclude that the first stump should be on Weight? Because of the minimum total error or the minimum total impurity among the three features? It might happen that total error and impurity do not rank the same for all features, though they happen to have the same rank here.

    • @statquest
      @statquest  5 years ago

      I've put a note about that error in the video's description. Unfortunately TH-cam will not let me edit videos once I post them. The stump was weighted using the formula given at 7:32

  • @kaicheng9766
    @kaicheng9766 2 years ago +1

    Hi Josh, great video as always!
    Questions:
    1. Given there are 3 attributes, and the iterative process picks 1 of the 3 attributes EACH TIME, I assume an attribute could be reused for more than 1 stump? And if so, when do we stop iterating?
    2. Given the resampling is done by random selection (based on the new weights, of course), I assume that means every time we re-run AdaBoost we may get a different forest of stumps?
    3. Where can we find more info on using the Weighted Gini Index? Will it yield the same model, or can it be very different?
    Thank you!

    • @statquest
      @statquest  2 years ago

      1) The same attribute can be used as many times as needed. Keep in mind that, due to the bootstrapping procedure, each iteration gives us a different dataset to work with.
      2) Yes (so, consider setting the seed for the random number function first).
      3) I wish I could tell you. If I had found a good source on the weighted gini, I would have covered it. Unfortunately, I couldn't find one.

  • @jinqiaoli8985
    @jinqiaoli8985 5 years ago +3

    Hi Josh,
    I love your videos so much! You are awesome!!
    A quick question on total error: how could a tree give a total error greater than 0.5? In such a case, I guess the tree would simply flip the label?
    Is this because of the weights? Is the total error calculated on the original sample, not the resampled sample? If so, even though a tree correctly classifies a sample that previous trees cannot, its vote may be reversed. How could that improve the overall accuracy?
    Thank you!

    • @statquest
      @statquest  5 years ago

      A tree can have a total error of up to 1 if it totally gets everything wrong. In that case, we would just swap its outputs, by giving it a large, but negative, "amount of say" and then it would get everything right! And while it's very hard to imagine that this is possible using a tree as a "weak learner", you have to remember that AdaBoost was originally designed to work with any "weak learner", not just short trees/stumps, so by allowing total error to go over 0.5 it is flexible to the results of any "weak learner".
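
A quick numeric illustration of that point (a sketch, not code from the video): plugging a Total Error above 0.5 into the same Amount of Say formula simply produces a negative amount of say, which flips the learner's vote.

```python
import math

def amount_of_say(total_error, eps=1e-10):
    # Clamp so the formula doesn't blow up at exactly 0 or 1
    total_error = min(max(total_error, eps), 1 - eps)
    return 0.5 * math.log((1 - total_error) / total_error)

for err in (0.1, 0.5, 0.75):
    print(err, round(amount_of_say(err), 3))
# 0.1  ->  1.099  (good learner: large positive say)
# 0.5  ->  0.0    (coin flip: no say)
# 0.75 -> -0.549  (worse than guessing: negative say, so its vote is flipped)
```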

    • @jinqiaoli8985
      @jinqiaoli8985 5 years ago +1

      @@statquest Bam!!! Thanks for the quick reply. I think I got the point. Looking forward to episode 2 of XGBoost, Merry Christmas and Happy New Year! 😃😃

    • @statquest
      @statquest  5 years ago

      @@jinqiaoli8985 I can't wait to release the next XGBoost video. I just have a few more slides to work on before it's ready.

  • @harshtripathi465
    @harshtripathi465 6 years ago +2

    Hello Sir,
    I really love the simple ways in which you explain such difficult concepts. It would be really helpful to me and probably a lot of others if you could make a series on Deep Learning, i.e., neural networks, gradient descent etc.
    Thanks!

    • @statquest
      @statquest  6 years ago

      Thank you so much! I'm working on Gradient Descent right now. I hope it is ready in the next week or two.

  • @aakashjain5035
    @aakashjain5035 5 years ago +5

    Hi Josh, you are doing a great job. Can you please make a video on XGBoost? That would be very helpful.

  • @wongkitlongmarcus9310
    @wongkitlongmarcus9310 3 years ago +1

    the most underrated channel

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @harry5094
    @harry5094 5 years ago +5

    Hi Josh,
    Love your videos; greetings from India.
    Can you please tell me how to calculate the amount of say in the regression case, and also the sample weights?
    Thanks

    • @saisuryakeshetty7589
      @saisuryakeshetty7589 1 year ago

      Did you get your answer? If yes, could you please explain?

  • @nashaeshire6534
    @nashaeshire6534 3 years ago

    Thanks a lot,
    I didn't understand my UCSD lecture but, thanks to you (and your teammate), it's now super clear!
    I've bought the AdaBoost and Classification Tree study guides - a great help!
    1/ Could you consider making a study guide for the 4 gradient boost videos?
    2/ Have you ever thought of writing a book with your team?
    Have a great day.

    • @statquest
      @statquest  3 years ago +1

      Thank you very much for supporting StatQuest!!! I'm writing a book right now. It should be out later this spring.

    • @nashaeshire6534
      @nashaeshire6534 3 years ago +1

      @@statquest I'll buy it as fast as a bam!

  • @tejpunjraju9718
    @tejpunjraju9718 4 years ago +8

    "Devmaanush" hai ye banda!
    Translation: This dude has been sent by God!

    • @statquest
      @statquest  4 years ago +1

      Thank you very much! :)

  • @thepresistence5935
    @thepresistence5935 3 years ago +1

    Understood clearly. It took 2 hours to complete this, but now I know it start to finish.

  • @swadhindas5853
    @swadhindas5853 5 years ago +6

    For the amount of say for chest pain, how is it 7/3? I think it should be 5/3.

  • @tudorpricop5434
    @tudorpricop5434 1 year ago

    I have some questions.
    1. When do we stop calculating new stumps and their amount of say? I mean, how do we know when we have enough stumps to work with?
    2. You mentioned in the comment below that "once an observation is omitted from a bootstrap dataset, it is not lost for good. You just lose it for one stump. After that it goes back in the pool."
    When do we come back to an earlier dataset in order to make use of an "omitted observation"? As far as I understood, we only work with the modified dataset, which keeps getting smaller and smaller (in terms of unique items).

    • @statquest
      @statquest  1 year ago

      1) You keep adding stumps until predictions no longer improve.
      2) After each stump is created, we return all of the omitted samples back to the dataset.

  • @sezaneyuan3111
    @sezaneyuan3111 4 years ago +4

    Love the opening music, it makes me laugh in my machine learning course. How odd!

  • @وذكرفإنالذكرىتنفعالمؤمنين-ق7ز

    Thanks, Josh, your explanation is amazing. Greetings from Egypt

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @ccuuttww
    @ccuuttww 5 years ago +25

    10:15 Warning: wrong calculation alert.
    It is 5/8, not 7/8, since it is 1 - 3/8,
    and the remaining part is thrown off!

    • @salmenkveld5717
      @salmenkveld5717 4 years ago

      There is also another error in the formula: it should use ln instead of log!

  • @beshosamir8978
    @beshosamir8978 2 years ago

    I swear this is the greatest channel about machine learning and statistics. Great job, Josh!
    I just have a quick question:
    What if we have a stump where both children vote yes (the left child gets 2 yes, 0 no, and the right child gets 2 yes and 1 no), and that is the best we could come up with - what should I do? I saw a different video where the left child was classified as yes and the right child as no, and it said the first stump makes 2 errors - but how?!
    We said that a node votes by majority, so both should say yes, and the right child gets 1 error, so the first stump made one error, right?

    • @statquest
      @statquest  2 years ago +1

      I believe they should both vote yes.

  • @devarshigoswami9149
    @devarshigoswami9149 4 years ago +3

    It'd be really refreshing to hear an actual model make 'dee doo dee doo boop' sounds while training.

  • @sanjanakatala4311
    @sanjanakatala4311 10 months ago +1

    Thank you so much, it was very helpful and easy to understand, much better than my college professor and the big blogs on the same topic available online. God bless you. When I see people like you, I feel that social media is in safe and wise hands that use it wisely, and trust me, my professors should take classes from you on how to make teaching simple, effective and interesting😇😭🥺😎🤓

    • @statquest
      @statquest  10 months ago +1

      Glad it was helpful!

  • @UsmanKhan-lp2mg
    @UsmanKhan-lp2mg 5 years ago +5

    Hey Josh, I'm back to study Machine Learning for Final Exams 😂😂😂

    • @statquest
      @statquest  5 years ago +1

      Good luck and let me know how they go! :)

  • @madaramarasinghe829
    @madaramarasinghe829 3 months ago

    Thanks for all these wonderful videos. They make ML easy for many new learners!
    I just have a few quick questions:
    1. The number of stumps depends on the number of features in the dataset, right? If there are only 5 features, only 5 stumps can be created, right (or fewer than that if we don't plan to use all of them)?
    2. Since "Amount of Say = 1/2*log[(1-Total Error)/Total Error]", we get positive or negative values for "Amount of Say" depending on the "Total Error". So when we calculate the "New sample weight", do we need to change the sign (+/-) of the exponent explicitly? Doesn't it already appear correctly in the "Amount of Say"?
    3. Can you kindly show how a new test sample gets evaluated in AdaBoost?
    I'd appreciate your thoughts on these. Thanks

    • @statquest
      @statquest  3 months ago

      1. Each time we create a stump, we test all of the features (so, if we have 5 features, we'll test 5 candidate stumps), but we can create as many stumps as we would like.
      2. What time point in the video, minutes and seconds, are you asking about?
      3. This is illustrated at the end of the video.

    • @madaramarasinghe829
      @madaramarasinghe829 3 months ago

      @@statquest
      Thanks for the quick responses.
      Re 2: At 12:43 you talk about e^(amount of say) and at 13:13 you talk about e^(-amount of say).

    • @statquest
      @statquest  3 months ago

      @@madaramarasinghe829 If the sample was correctly classified we use the negative version to decrease its weight. If the sample was incorrectly classified we use the positive version to increase its weight.
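
A minimal sketch of that weight update (update_weights is a hypothetical helper; the 8 equal starting weights and the 0.97 amount of say mirror the example discussed in the video, but this is not code from the video):

```python
import numpy as np

def update_weights(weights, correct, amount_of_say):
    """Scale misclassified samples by e^(+amount of say) and correctly classified
    samples by e^(-amount of say), then renormalize so the weights sum to 1."""
    factors = np.where(correct, np.exp(-amount_of_say), np.exp(amount_of_say))
    new_weights = weights * factors
    return new_weights / new_weights.sum()

weights = np.full(8, 1/8)                      # equal starting weights
correct = np.array([True] * 7 + [False])       # assume the stump misclassified one sample
print(update_weights(weights, correct, 0.97))  # the misclassified sample's weight rises to ~0.49
```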

  • @vedprakash-bw2ms
    @vedprakash-bw2ms 6 years ago +3

    I love your songs...
    Please make a video on XGBoost.

    • @statquest
      @statquest  6 years ago

      Thanks! I'm working on Gradient Descent and then Gradient Boost. Those should be out soon.

  • @fatemeh2222
    @fatemeh2222 7 months ago +1

    Thank you man. Appreciate such thorough but concise explanation.

    • @statquest
      @statquest  7 months ago +1

      Glad it was helpful!

  • @a_sn_hh7027
    @a_sn_hh7027 6 years ago +3

    triple bam

  • @quant-trader-010
    @quant-trader-010 3 years ago +2

    Man, you are too good at explaining things!

  • @lorenaferreiramarani126
    @lorenaferreiramarani126 2 years ago +1

    OMG! I'm completely delighted with your didactics! Awesome! Your students are very privileged! Gosh! Regards from a Brazilian fan :D

  • @antoniogiuseppefaietalasar6849
    @antoniogiuseppefaietalasar6849 1 year ago +1

    Josh and Kahn, the greatest teachers of all time. They should have their own University.

  • @RameshKumar-yk4kl
    @RameshKumar-yk4kl 2 years ago +1

    Thank you so much Josh... you filled josh in me (josh means happy in Hindi). Love from Hyderabad, INDIA ❤

    • @statquest
      @statquest  2 years ago

      Thank you very much! :)

  • @karannchew2534
    @karannchew2534 3 years ago

    For my future reference.
    11:36:
    If the prediction for a sample was wrong, then increase its weight for future correction.
    If the Amount of Say is high (the tree is good), increase the weight more.
    Wrong -> increase weight.
    Better tree -> more amount of say -> adjust the weight more.

  • @DED_Search
    @DED_Search 4 years ago +1

    May I ask a couple of questions please? Hopefully, you can help shed some light.
    1. What is the threshold for a prediction flip? Do we flip as long as the total error rate is larger than 0.5, i.e., the amount of say is negative? Or do we only flip if the total error rate is larger than 0.6 or 0.7? Or is it a hyper-parameter to be tuned?
    2. Can we say that the weighted Gini index and the sample weight updates achieve the same goal through different mechanisms? Recall that in order to mitigate class imbalance, we could apply either class weights or sample weights. I see the resemblance here.
    Thank you!

    • @statquest
      @statquest  4 years ago

      Presumably you flip if the error rate is larger than 0.5 because flipping will improve predictions.

    • @DED_Search
      @DED_Search 4 years ago +1

      @@statquest thank you. ❤️

    • @jiayiwu4101
      @jiayiwu4101 4 years ago

      @@statquest So the error rate should not be larger than 0.5. Is this statement correct?

    • @statquest
      @statquest  4 years ago

      @@jiayiwu4101 Presumably. In theory, AdaBoost is intended to be used with _any_ weak learner, and maybe there is some weak learner that is really, really bad. However, AdaBoost is almost always used with classification tree stumps. And with stumps, I don't think it is possible to have error > 0.5.

  • @ericcartman106
    @ericcartman106 2 years ago +1

    This guy's voice is so calming.

  • @balajisubramanian6565
    @balajisubramanian6565 6 years ago +1

    Thank you Josh for all of your great videos. You are a good Samaritan!

    • @statquest
      @statquest  6 years ago

      Awesome! Thank you! :)

  • @nikunj2554
    @nikunj2554 4 years ago

    Hello Josh. Thanks for an awesome video on AdaBoost. Can you make a video on the difference between the natural log and the common log, when we should use each of them, and their properties and real-life applications? I had a hard time figuring out why the amount of say was 0.97 at 9:22 in the video. I was calculating it using the log key on my calculator, whereas in reality it was the natural log. If I take 0.5*log(7) then my amount of say would be 0.422, whereas you have used 0.5*ln(7), which gives 0.97 as the amount of say. I believe not many people would realize that the log is natural, so I think it would be great if you used ln for the natural log henceforth to avoid confusion. That being said, I found all your videos super exciting, and thanks for sharing the knowledge.

    • @statquest
      @statquest  4 years ago +1

      Thanks for the note. In all of my recent videos I explain what I mean when I write "log()".

  • @lylashi1875
    @lylashi1875 2 years ago +1

    I have a question about building the forest of stumps (video time 5:57) - let's say for chest pain, if in both leaves there is more heart disease than no heart disease, how should we decide the output of the leaves? Should we make "yes heart disease" the output in both of the leaves? Or do we randomly make "no heart disease" the output in one of the leaves?

    • @statquest
      @statquest  2 years ago

      The output from the leaves is always the classification that gets the most votes. So this stump would classify everything the same way.

  • @MrMetralloCity
    @MrMetralloCity 5 years ago +1

    I love your intros and outros, they are simply awesome!!! I really enjoy learning with your channel!!!!

  • @nkauvmeaern10121989
    @nkauvmeaern10121989 3 years ago +1

    Very nicely explained. I have never seen such a good explanation. Love U ♥♥♥

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @LexPodgorny
    @LexPodgorny 1 year ago

    Pretty good explanations, given you survive the deadly level of condescension. I almost switched to some Indian dude explaining the same thing, but he exceeded my max limit of "basically"s, and I switched back just to get a grasp of the subject. While at it, I realised what this StatQuest tone reminds me of: the Fun with Flags show of Sheldon Cooper. But when he goes "Question!", it is definitely Dwight Schrute from The Office. Two of my favorite sitcom characters!

    • @statquest
      @statquest  1 year ago

      I think I have "resting condescension" voice. I swear, it's not something I'm trying to do to make people annoyed - it's just my normal voice.

  • @TheOraware
    @TheOraware 5 years ago +1

    Hi Josh, very nice video; I have not found such a nice explanation anywhere else. One piece of feedback on the calculation at 10:18: the numerator should be 5/8, not 7/8.

    • @statquest
      @statquest  5 years ago

      You are correct! However, this typo has been spotted before and is mentioned towards the top of the video's Description. Unfortunately, youtube makes it close to impossible to edit videos once they are posted, so for now, the best I can do is leave the note in the description.

  • @deepaksurya2078
    @deepaksurya2078 several months ago

    Hi, small mistake: 1 - 3/8 should be 5/8 at 10:31. The rest is all good, love your videos!

    • @statquest
      @statquest  several months ago

      Thanks! That error is noted in a pinned comment and in the video's description.

  • @bayurukmanajati1224
    @bayurukmanajati1224 5 years ago +1

    This is very informative for me. I skipped the Decision Tree but... but... I can understand it! Love your vids!

  • @AakarshNair
    @AakarshNair 2 years ago +1

    Please never stop making videos

    • @statquest
      @statquest  2 years ago

      Thank you! :)

  • @MariamEljamil
    @MariamEljamil 6 years ago +1

    The best! The simplest! The most informative!

    • @statquest
      @statquest  6 years ago

      Thank you! :)

  • @thunderkat2911
    @thunderkat2911 4 years ago +1

    Your channel has saved me a looooooot of time. Thanks!

    • @statquest
      @statquest  4 years ago

      Glad to hear it!

  • @PaulXiaofangLan
    @PaulXiaofangLan 4 years ago

    Dear Josh,
    I have a few more questions.
    For Random Forest, we use bagging to generate many trees, but you haven't introduced this idea for AdaBoost.
    1. Does that mean the `amount of say` of all stumps is determined by the whole data set (corpus)?
    2. How many AdaBoost stumps should we generate? What's the rule to determine the number?
    3. Could a stump use the same feature as other stumps?
    And, as you mentioned, the creation of the first stump will affect the second stump. My second set of questions is:
    1. How do we determine which feature we should use for the first stump? By entropy/Gini comparison between the features?
    2. If so, does the idea behind creating the first stump also apply to the second stump, and thus the third, fourth, etc.?
    Thanks.

    • @statquest
      @statquest  4 years ago +1

      1) The amount of say is determined using the current dataset, which can change for each stump, as explained at 15:32 using the sample weights.
      2) You should generate enough so that classification stabilizes. You can draw some metric you are interested in (for example, "accuracy") on the y-axis and the number of stumps on the x-axis. Create stumps until you see that predictions are no longer improving.
      3) You can use the same feature over and over.
      1) You can use whatever metric you want to you. I believe GINI is frequently the default setting.
      2) All stumps are made the same way, however the dataset can change for each stump due to changes in sample weights as explained at 15:32.

    • @PaulXiaofangLan
      @PaulXiaofangLan 4 years ago +1

      @@statquest Thank you very much, Josh. Now I understand the text description at 15:32.
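
Not something shown in the video, but here is a hedged scikit-learn sketch of the two practical points above, fixing the random seed and adding stumps until the metric stops improving. The dataset and parameter values are placeholders, and older scikit-learn versions call the `estimator` argument `base_estimator`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the heart-disease example
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # depth-1 trees, i.e. stumps
    n_estimators=200,                               # upper limit on the number of stumps
    random_state=0,                                 # fixed seed -> reproducible forest of stumps
)
model.fit(X_train, y_train)

# Test accuracy after each additional stump; stop adding stumps once this flattens out
for n_stumps, score in enumerate(model.staged_score(X_test, y_test), start=1):
    if n_stumps % 50 == 0:
        print(n_stumps, round(score, 3))
```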

  • @xizhilow6815
    @xizhilow6815 4 years ago +2

    thank you for making these videos, they are really helping me with my ML class!

    • @statquest
      @statquest  4 years ago

      Hooray! And good luck with your class.