XGBoost Part 1 (of 4): Regression

  • Published on Nov 18, 2024

Comments •

  • @statquest
    @statquest  4 years ago +63

    Corrections:
    16:50 I say "66", but I meant to say "62.48". However, either way, the conclusion is the same.
    22:03 In the original XGBoost documents they use the epsilon symbol to refer to the learning rate, but in the actual implementation, this is controlled via the "eta" parameter. So, I guess to be consistent with the original documentation, I made the same mistake! :)
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @blacklistnr1
      @blacklistnr1 4 years ago +3

      Terminology alert!! "eta" refers to the Greek letter Η (upper case) / η (lower case); it is one of Greek's many "ee" sounds (as in wheeeeee). It's definitely not epsilon.

    • @MrPopikeyshen
      @MrPopikeyshen 3 years ago +4

      A like just for this sound: 'bip-bip-pilulipup'

    • @servaastilkin7733
      @servaastilkin7733 1 year ago

      @@blacklistnr1 I came here to say the same thing.
      Maybe this helps:
      èta - η sounds somewhat like the vowels in "air"
      epsilon - ε sounds somewhat like the vowel in "get"

  • @pulkitkapoor4091
    @pulkitkapoor4091 3 years ago +222

    I got my first job in Data Science because of the content you prepare and share.
    Can't thank you enough Josh. God bless :)

  • @giannislazaridis6788
    @giannislazaridis6788 4 years ago +52

    I'm starting to write my Master's thesis and there were still some things I needed to make clear before using XGBoost for my classification problem. God bless you!

    • @statquest
      @statquest  4 years ago +3

      Thank you! :)

  • @nikilisacrow2339
    @nikilisacrow2339 3 years ago +53

    Can I just say I LOVE STATQUEST! Josh does the intuition of a complex algorithm and the math of it so well, and then making it into an engaging video that is so easy to watch is just amazing! I just LOVE this channel. You boosted the gradient of my learning on machine learning in an extreme way. Really appreciate these videos.

    • @statquest
      @statquest  3 years ago +5

      Wow! Thank you very much!!! I'm so glad you like the videos. :)

  • @Hardson
    @Hardson 4 years ago +395

    That's why I pay for my Internet.

    • @statquest
      @statquest  4 years ago +16

      Thanks! :)

  • @johnhutton5491
    @johnhutton5491 4 months ago +4

    This dude puts the STAR in Starmer. You are an international treasure.

    • @statquest
      @statquest  4 months ago +2

      Thank you! :)

  • @nitinvijayy
    @nitinvijayy 2 years ago +2

    Best channel for anyone working in the domain of data science and machine learning.

  • @guoshenli4193
    @guoshenli4193 4 years ago +2

    I am a graduate student at Duke. Since some of the materials are not covered in class, I always watch your videos to boost my knowledge. Your videos have helped me a lot in learning the concepts of these tree models!! Great thanks to you!!!!! You make a lot of great videos and contribute a lot to online learning!!!!

    • @statquest
      @statquest  4 years ago

      Thank you very much and good luck with your studies! :)

  • @DonDon-gs4nm
    @DonDon-gs4nm 4 years ago +7

    After watching your video, I understood the concept of 'understanding'.

  • @ChingFungChan-b4l
    @ChingFungChan-b4l 3 months ago +1

    Hi Josh,
    I just bought your illustrated guide as a PDF. This is the first time I've supported someone on social media. Your videos helped me a lot with my learning. I can't express how grateful I am for these learning materials. You broke down monster math concepts and equations into baby monsters that I can easily digest. I hope that by making this purchase, you get the most contribution out of my support.
    Thank you!

    • @statquest
      @statquest  3 months ago

      Thank you very much for supporting StatQuest! It means a lot to me that you care enough to contribute. BAM! :)

  • @PauloBuchsbaum
    @PauloBuchsbaum 4 years ago +2

    An incredible job of clear, concise and non-pedantic explanation. Absolutely brilliant!

    • @statquest
      @statquest  4 years ago

      Thank you very much!

  • @pavankumar6992
    @pavankumar6992 4 years ago +3

    Fantastic explanation for XGBoost. Josh Starmer, you are the best. Looking forward to your Neural Network tutorials.

    • @statquest
      @statquest  4 years ago +2

      Thanks! I hope to get to Neural Networks as soon as I finish this series on XGBoost (which will have at least 3 more videos).

  • @kamalamarepalli1165
    @kamalamarepalli1165 7 months ago +1

    I have never seen a data science video like this... so informative, very clear, a super explanation of the math, and wonderful animation with an energetic voice... I'm learning many things very easily... thank you so much!!

    • @statquest
      @statquest  7 months ago

      Thank you very much!

  • @andreitolkachev8295
    @andreitolkachev8295 3 years ago +2

    I wanted to watch this video last week, but you sent me on a magical journey through adaboost, logistic regression, logs, trees, forests, gradient boosting.... Good to be back

  • @jaikishank
    @jaikishank 4 years ago +2

    Thanks Josh for your explanation. An XGBoost explanation cannot be made simpler or more illustrative than this. I love your videos.

    • @statquest
      @statquest  4 years ago +1

      Thank you very much! :)

  • @tusharsub1000
    @tusharsub1000 3 years ago +1

    I had given up all hope of learning machine learning owing to its complexity. But because of you I am still giving it a shot... and so far I am enjoying it...

  • @RidWalker
    @RidWalker 1 year ago

    I've never had so much fun learning something new! Not since I stared at my living room wall for 20 minutes and realized it wasn't pearl, but eggshell white! Thanks for this!

    • @statquest
      @statquest  1 year ago +2

      Glad you got the wall color sorted out! Bam! :)

  • @mainhashimh5017
    @mainhashimh5017 2 years ago +3

    Man, the quality and passion put into this. As well as the sound effects! I'm laughing as much as I'm learning. DAAANG.
    You're the f'ing best!

    • @statquest
      @statquest  2 years ago +1

      Thank you very much! :)

  • @breopardo6691
    @breopardo6691 3 years ago +5

    In my heart, there is a place for you! Thank you Josh!

  • @moidhassan5552
    @moidhassan5552 4 years ago +9

    Wow, I am really interested in Bioinformatics and was learning Machine Learning techniques to apply to my problems. Out of curiosity, I checked your LinkedIn profile, and it turns out you are a Bioinformatician too. Cheers

  • @gawdman
    @gawdman 4 years ago +5

    Hey Josh! This is fantastic. As an aspiring data scientist with a couple of job interviews coming up, this really helped!

    • @statquest
      @statquest  4 years ago

      Awesome!!! Good luck with your interviews and let me know how they go. :)

  • @shhdeshp
    @shhdeshp 10 months ago +1

    I just LOVE your channel! Such a joy to learn some complex concepts. Also, I've been trying to find videos that explain XGBoost under the hood in detail and this is the best explanation I've come across. Thank you so much for the videos and also boosting them with an X factor of fun!

    • @statquest
      @statquest  10 months ago

      Awesome, thank you!

  • @glowish1993
    @glowish1993 4 years ago +4

    You make learning math and machine learning interesting and allow viewers to understand the essential points behind complicated algorithms, thank you for this amazing channel :)

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @jackytsui422
    @jackytsui422 4 years ago +1

    I am learning machine learning from scratch and your videos helped me a lot. Thank you very much!!!!!!!!!!!

    • @statquest
      @statquest  4 years ago

      Good luck! :)

  • @SaraSilva-zu7wn
    @SaraSilva-zu7wn 3 years ago +1

    Clear explanations, little songs and a bit of silliness. Please keep them all, they're your trademark. :-)

    • @statquest
      @statquest  3 years ago

      Thank you! BAM! :)

  • @hellochii1675
    @hellochii1675 4 years ago +19

    XGBoosting! This must be my Christmas 🎁 ~~ Happy holidays ~

    • @statquest
      @statquest  4 years ago +5

      Yes, this is sort of an early Christmas present. :)

  • @Azureandfabricmastery
    @Azureandfabricmastery 4 years ago +2

    Thank you! Super easy to understand one of the most important ML algorithms, XGBoost. The visual illustrations are the best part!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @prasanshasatpathy6664
    @prasanshasatpathy6664 2 years ago +4

    Nowadays I write a "bam note" for important notes for algorithms.

    • @statquest
      @statquest  2 years ago

      That's awesome! :)

  • @Jagentic
    @Jagentic 2 months ago +2

    It was at some point that I realized that AI/ML for data science (which I'm currently amid) is really just the ultimate expression of statistics, using machine learning to produce mind-boggling scale - and that calc and trig, linear algebra and Python, and some computer science are all just tools in the box of the statistician, which makes the data science. But just like someone with a toolbox full of hammers and saws... one needs to know how, when and why to use them to build a fine house. Holy Cow ¡BAM!

    • @statquest
      @statquest  2 months ago

      BAM! :)

  • @kennywang9929
    @kennywang9929 4 years ago +2

    Man, you do deserve all the thanks in the comments! Waiting for part 2! Happy New Year!

    • @statquest
      @statquest  4 years ago +1

      Thanks!!! I just recorded Part 2 yesterday, so it should be out soon.

  • @hanyang4321
    @hanyang4321 4 years ago +2

    I watched all of the videos on your channel and they're extremely awesome! Now I have a much deeper understanding of many algorithms. Thanks for your excellent work, and I'm looking forward to more lovely videos and your sweet songs!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @anupriy
    @anupriy 2 years ago +2

    Thanks for making such great videos, sir! You indeed get each concept CLEARLY EXPLAINED.

    • @statquest
      @statquest  2 years ago +1

      Thank you! :)

  • @jjlian1670
    @jjlian1670 4 years ago +4

    I have been waiting for your XGBoost video; hoping for LightGBM next!

  • @oldguydoesntmatter2872
    @oldguydoesntmatter2872 4 years ago +2

    I've been using Random Forests with various boosting techniques for a few years. My regression (not classification) database has 500,000 - 5,000,000 data points with 50-150 variables, many of them highly correlated with some of the others. I like to "brag" that I can overfit anything. That, of course, is a problem, but I've found a tweak that is simple and fast that I haven't seen elsewhere.
    The basic idea is that when selecting a split point, pick a small number of data vectors randomly from the training set. Pick the variable(s) to split on randomly. (Variables plural because I usually split on 2-4 variables into 2^n boosting regions - another useful tweak.) The thresholds are whatever the data values are for the selected vectors. Find the vector with the best "gain" and split with that. I typically use 5-100 tries per split and a learning rate of .5 or so. It's fast and mitigates the overfitting problem.
    Just thought someone might be interested...
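
    A rough sketch of the tweak as I read it (a single-variable version; the function name and the squared-error "gain" are my own, not the commenter's actual code):

        import numpy as np

        def random_split(X, residuals, n_tries=20, rng=np.random.default_rng(0)):
            """Pick a split by trying thresholds taken from randomly chosen data vectors.

            Instead of scanning every threshold of every variable, sample n_tries
            (row, variable) pairs and use each row's value of that variable as the
            candidate threshold; keep the candidate with the best gain.
            """
            best = None
            for _ in range(n_tries):
                row = rng.integers(len(X))      # a random data vector from the training set
                col = rng.integers(X.shape[1])  # a random variable
                t = X[row, col]                 # its value is the candidate threshold
                mask = X[:, col] < t
                if mask.all() or not mask.any():
                    continue                    # degenerate split, skip
                left, right = residuals[mask], residuals[~mask]
                # "gain" here: reduction in squared error around the leaf means
                sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
                gain = ((residuals - residuals.mean()) ** 2).sum() - sse
                if best is None or gain > best[0]:
                    best = (gain, col, t)
            return best  # (gain, variable index, threshold), or None if every try was degenerate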

    • @zhonghengzhang603
      @zhonghengzhang603 4 years ago

      Sounds awesome, would you like to share the code?

  • @antoniojunior-dados
    @antoniojunior-dados 2 years ago +1

    You are the best, Josh. Greetings from Brazil! We are looking forward to your video clearly explaining LightGBM!

    • @statquest
      @statquest  2 years ago +1

      I hope to have that video soon.

  • @machi992
    @machi992 4 years ago +1

    I actually started out looking for XGBoost, but every video assumes I already know something. I have ended up watching more than 8 videos just so I could understand this one without problems and fulfill the prerequisites, and I find them awesome.

    • @statquest
      @statquest  4 years ago +1

      Bam! Congratulations!

  • @liuxu7879
    @liuxu7879 2 years ago +1

    Hey Josh, I really love your content; you are the one who really explains the model details.

    • @statquest
      @statquest  2 years ago

      WOW! Thank you so much for supporting StatQuest!

  • @karannchew2534
    @karannchew2534 3 years ago +1

    For my future reference (a minimal sketch implementing these steps follows below):
    1) Initialize with a predicted value, e.g. 0.5.
    2) Get residuals: each sample vs. the initial predicted value.
    3) Build a mini tree, using the residual values of the samples.
    - Different values of a feature serve as cut-off points at the branches. Each value gives a set of Similarity and Gain scores:
    -- Similarity (uses lambda, the regularization parameter) - measures how close the residual values in a node are to each other.
    -- Gain (also affected by lambda).
    - Pick the feature value that gives the highest Gain - this determines how to split the data, which creates the branch (and leaves), which produces a mini tree.
    4) Prune the tree, using the gain threshold (aka complexity parameter) gamma:
    if Gain > gamma, keep the branch, else prune.
    5) Get the Output Value (OV) for each leaf. Mini tree done.
    OV = sum of residuals / (number of residuals + lambda)
    6) Predict a value for each sample using the newly created mini tree:
    run each sample through the mini tree.
    New predicted value = last predicted value + eta * OV
    7) Get a new set of residuals: new predicted value vs. actual value for each sample.
    8) Redo from step 3, creating more mini trees...
    - Each tree 'boosts' the prediction, improving the result.
    - Each tree creates new residuals as input for the next tree.
    ...until there is no more improvement or the max number of trees is reached.
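
    A minimal sketch of these steps in Python (my own simplification: a single feature and depth-1 trees, i.e. stumps, so "build a mini tree" is just one split):

        import numpy as np

        def similarity(r, lam):
            # Similarity = (sum of residuals)^2 / (number of residuals + lambda)
            return r.sum() ** 2 / (len(r) + lam)

        def fit_stump(x, r, lam, gamma):
            """Steps 3-5: find the split with the highest Gain, prune with gamma, compute Output Values."""
            best, root_sim = None, similarity(r, lam)
            xs = np.sort(np.unique(x))
            for t in (xs[:-1] + xs[1:]) / 2:              # candidate cut-off points
                left, right = r[x < t], r[x >= t]
                gain = similarity(left, lam) + similarity(right, lam) - root_sim
                if best is None or gain > best[0]:
                    best = (gain, t, left, right)
            if best is None or best[0] < gamma:           # step 4: prune when Gain < gamma
                return None
            _, t, left, right = best
            # step 5: OV = sum of residuals / (number of residuals + lambda)
            return t, left.sum() / (len(left) + lam), right.sum() / (len(right) + lam)

        def xgboost_regression(x, y, n_trees=10, eta=0.3, lam=1.0, gamma=0.0, base=0.5):
            pred = np.full(len(y), base)                  # step 1: initial prediction
            for _ in range(n_trees):                      # step 8: keep adding mini trees
                r = y - pred                              # steps 2 and 7: residuals
                stump = fit_stump(x, r, lam, gamma)
                if stump is None:
                    break                                 # nothing survives pruning: stop early
                t, ov_left, ov_right = stump
                pred = pred + eta * np.where(x < t, ov_left, ov_right)   # step 6
            return pred

        # toy data, loosely in the spirit of the video's dosage example
        x = np.array([10.0, 20.0, 25.0, 35.0])
        y = np.array([-10.0, 7.0, 8.0, -7.0])
        print(xgboost_regression(x, y))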

    • @statquest
      @statquest  3 years ago

      Noted

    • @carlpiaf4476
      @carlpiaf4476 1 year ago

      Could be improved by adding how the decision cut-off point is chosen.

  • @anggipermanaharianja6122
    @anggipermanaharianja6122 3 years ago +1

    Awesome... this vid should be mandatory in any school.

  • @modandtheganggaming3617
    @modandtheganggaming3617 4 years ago +5

    Thank you! I'd been waiting for an explanation of XGBoost for so long.

    • @statquest
      @statquest  4 years ago +2

      I'm recording part 2 today (or tomorrow) and it will be available for early access on Monday (and for everyone a week from Monday).

  • @mangli4669
    @mangli4669 4 years ago +3

    Hey Josh, first I wanted to say thank you for your awesome content. You are the number one reason I am graduating with my degree haha! I would love a behind-the-scenes video about how you make your videos: how you prepare a topic, how you make your animations and your fancy graphs! And some more singing, of course!

    • @statquest
      @statquest  4 years ago

      That would be awesome. Maybe I'll do something like this in 2020. :)

  • @gorilaz0n
    @gorilaz0n 2 years ago +2

    Gosh! I love your fellow-kids vibe!

  • @smarttradzt4933
    @smarttradzt4933 3 years ago +1

    Whenever I can't understand something, I always think of StatQuest... BAM!

  • @guillemperdigooliveras5351
    @guillemperdigooliveras5351 4 years ago +24

    As always, loved it! I can now wear my Double Bam t-shirt even more proudly :-)

    • @statquest
      @statquest  4 years ago +1

      Awesome!!!!!! :)

    • @anggipermanaharianja6122
      @anggipermanaharianja6122 3 years ago +1

      why not wear the Triple Bam?

    • @guillemperdigooliveras5351
      @guillemperdigooliveras5351 3 years ago +1

      @@anggipermanaharianja6122 For a second you gave me hope that new StatQuest t-shirts with a Triple Bam drawing were available!

  • @nilanjana1588
    @nilanjana1588 1 year ago +1

    You make it a little bit easier to understand, Josh. I am saved.

  • @vladimirmihajlovic1504
    @vladimirmihajlovic1504 7 months ago +1

    Love StatQuest. Please cover lightGBM and CatBoost!

    • @statquest
      @statquest  7 months ago

      I've got CatBoost; you can find it here: statquest.org/video-index/

  • @palvinderbhatia3941
    @palvinderbhatia3941 1 year ago +1

    Wow woww wowww!! How can you explain such complex concepts so easily? I wish I could learn this art from you. Big fan!! 🙌🙌

    • @statquest
      @statquest  1 year ago

      Thank you so much 😀

  • @王涛-d3y
    @王涛-d3y 4 years ago +1

    Awesome video!!! It's the best tutorial I have ever seen about XGBoost. Thank you very much!

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @andrewnguyen5881
    @andrewnguyen5881 4 years ago

    Thank you for all of your videos! Super helpful and educational. I did have some follow-up questions:
    - With gamma being so important in the pruning process, how do you select it? I ask because aren't there situations where you could select a gamma that would/wouldn't prune ALL branches, which would defeat the purpose of pruning, right?
    - Is lambda a parameter where:
    a. You have to test multiple values and tune your model to find the most suitable lambda (i.e. set your model to use one lambda), or
    b. You test multiple lambdas per tree, so different trees will have different lambdas?

    • @statquest
      @statquest  4 years ago

      If you want to know all about using XGBoost in practice, see: th-cam.com/video/GrJP9FLV3FE/w-d-xo.html

    • @andrewnguyen5881
      @andrewnguyen5881 4 years ago +1

      @@statquest Great! I was saving that video until I finished the other XGBoost videos.

    • @andrewnguyen5881
      @andrewnguyen5881 4 years ago

      @@statquest Will this video also cover Cover from the Classification video?

    • @statquest
      @statquest  4 years ago

      Not directly, since I simply limited the size of the trees rather than worry too much about the minimum number of observations per leaf.
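
      On the original gamma/lambda question, for what it's worth: in practice both are set once per model (option a, not b) and tuned with cross-validation, where a gamma that pruned away everything would simply score poorly and be rejected. A minimal sketch, assuming the xgboost package's scikit-learn wrapper and synthetic data for illustration:

          import numpy as np
          from sklearn.model_selection import GridSearchCV
          from xgboost import XGBRegressor

          # synthetic regression data, for illustration only
          rng = np.random.default_rng(0)
          X = rng.uniform(0, 40, size=(200, 1))
          y = np.sin(X[:, 0] / 6) * 10 + rng.normal(0, 2, size=200)

          # one lambda (reg_lambda) and one gamma per model, chosen by cross-validation
          grid = GridSearchCV(
              XGBRegressor(n_estimators=100, learning_rate=0.3),
              param_grid={"reg_lambda": [0, 1, 10], "gamma": [0, 1, 5]},
              cv=5,
              scoring="neg_root_mean_squared_error",
          )
          grid.fit(X, y)
          print(grid.best_params_)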

  • @shubhambhatia4968
    @shubhambhatia4968 4 years ago +1

    Woah woah woah woah!... Now I get the clear meaning of understanding after coming to your channel... As always, I loved the XGBoost series as well. Thank you brother ;)

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @nickbohl2555
    @nickbohl2555 4 years ago +1

    I have been super excited for this quest! Thanks as always Josh

  • @DrJohnnyStalker
    @DrJohnnyStalker 4 years ago +1

    Best XGBoost explanation i have ever seen! This is Andrew Ng Level!

    • @statquest
      @statquest  4 years ago +1

      Thank you very much! I just released part 4 in this series, so make sure you check them all out. :)

    • @DrJohnnyStalker
      @DrJohnnyStalker 4 years ago +1

      @@statquest
      I have binge-watched them all. All are great and by far the best intuitive explanation videos on XGBoost.
      A series on LightGBM and CatBoost would complete the pack of gradient boosting algorithms. Thx for this great channel.

    • @statquest
      @statquest  4 years ago

      @@DrJohnnyStalker Thanks! :)

  • @shivasaib9023
    @shivasaib9023 4 years ago +2

    I fell in love with XGBoost. While pruning every node I was like whatttt :p

  • @monkeydrushi
    @monkeydrushi 2 years ago +1

    God, thank you for your "beep boop" sounds. They just made my day!

    • @statquest
      @statquest  2 years ago +1

      Hooray! :)

  • @vijayyarabolu9067
    @vijayyarabolu9067 4 years ago +2

    8:45 checking my headphones - BAM; no problem with my headphones; 10:17 Double BAM; headphones are perfect

  • @urvishfree0314
    @urvishfree0314 3 years ago +1

    Thank you so much. I watched it 3-4 times already, but finally everything makes sense. Thank you so much!

  • @lxk19901
    @lxk19901 4 years ago +3

    This is really helpful, thanks for putting them together!

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @tobiasksr23
    @tobiasksr23 3 years ago +1

    I just found this channel and I think it's amazing.

    • @statquest
      @statquest  3 years ago

      Glad to hear it!

  • @eytansuchard8640
    @eytansuchard8640 1 year ago

    Thank you for this explanation. In Python there is another regularization parameter, alpha. Also, to the best of my knowledge, the role of eta is to scale down the error correction made by each subsequent tree, in order to avoid the sum exploding and to control how much of the residual error each tree corrects.

    • @statquest
      @statquest  1 year ago

      I believe that alpha controls the depth of the tree.

    • @eytansuchard8640
      @eytansuchard8640 1 year ago

      @@statquest The maximal depth is a different parameter. Maybe Alpha regulates how often the depth can grow if it did not reach the maximal depth.

    • @statquest
      @statquest  1 year ago +1

      @@eytansuchard8640 Ah, I should have been more clear - I believe alpha controls pruning. At least, that's what it does here: th-cam.com/video/D0efHEJsfHo/w-d-xo.html

    • @eytansuchard8640
      @eytansuchard8640 1 year ago

      @@statquest Thanks for the link. It will be watched.
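
      For anyone mapping this thread onto the Python API, a small sketch of where these knobs live (per my reading of the xgboost docs, eta is exposed as learning_rate and alpha as reg_alpha, an L1 penalty on the leaf output values; worth verifying against the docs for your version):

          from xgboost import XGBRegressor

          model = XGBRegressor(
              learning_rate=0.3,  # eta: scales each tree's output values before they are added in
              reg_lambda=1.0,     # lambda: L2 regularization (the denominator term in this video)
              reg_alpha=0.0,      # alpha: L1 regularization on the leaf output values
              gamma=0.0,          # minimum Gain a split must clear, i.e. the pruning threshold
          )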

  • @geminicify
    @geminicify 4 years ago +2

    Thank you for posting this! I have been waiting for it for so long!

  • @sajjadabdulmalik4265
    @sajjadabdulmalik4265 3 years ago +2

    You are always awesome; I've never seen a better explanation ❤️❤️ Big fan 🙂🙂... Triple bammm!!! Hope we have LightGBM coming soon.

    • @statquest
      @statquest  3 years ago +1

      I've recently posted some notes on LightGBM on my Twitter account. I hope to convert them into a video soon.

  • @sidbhatia4230
    @sidbhatia4230 4 years ago +1

    Thanks, it helped a lot!
    Looking forward to part 2, and if possible please make one on CatBoost as well!

  • @sachinrathi7814
    @sachinrathi7814 4 years ago +1

    I've been waiting for this video for a long time.

    • @statquest
      @statquest  4 years ago

      I hope it was worth the wait! :)

    • @sachinrathi7814
      @sachinrathi7814 4 years ago +1

      @@statquest Indeed. I have gone through many posts, but everyone just says it combines weak classifiers to make a strong classifier... the same description everywhere.
      The way of describing things is what sets Josh Starmer apart from the others.
      Merry Christmas 🤗

  • @stylianosiordanis9362
    @stylianosiordanis9362 4 years ago

    Please post slides; this is the best channel for ML. Thank you!

  • @fivehuang7557
    @fivehuang7557 4 years ago +1

    Happy holidays, man! Waiting for your next episode.

    • @statquest
      @statquest  4 years ago +1

      It should be out in the first week of 2020.

  • @gokulprakash8694
    @gokulprakash8694 3 years ago +1

    Stat quest is the bestttttt!!!
    love it love it love it!!!!!!

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @kn58657
    @kn58657 4 years ago +8

    I'm doing a club remix of the humming during calculations. Stay tuned!

    • @statquest
      @statquest  4 years ago +4

      Awesome!!!!! I can't wait to hear.

  • @iop09x09
    @iop09x09 4 years ago +1

    Wow! Very well explained, hats off.

  • @HANTAIKEJU
    @HANTAIKEJU 4 years ago +2

    Hi Josh, love your videos. Currently preparing for data science interviews based on your videos. I'd really love to see one about LGBM!

    • @statquest
      @statquest  4 years ago

      I'll keep that in mind.

  • @omkarjadhav13
    @omkarjadhav13 4 years ago +5

    You are just amazing, Josh. Xtreme Bam!!!
    You make our lives so easy.
    Waiting for the neural net video and further XGBoost parts.
    Please plan a meetup in Mumbai. #queston

    • @statquest
      @statquest  4 years ago +2

      Thanks so much!!! I hope to visit Mumbai in the next year.

    • @ksrajavel
      @ksrajavel 4 years ago +2

      @@statquest Happy New Year, Mr. Josh.
      The new year has arrived. We await you in India.

    • @statquest
      @statquest  4 years ago

      @@ksrajavel Thank you! Happy New Year!

  • @bernardmontgomery3859
    @bernardmontgomery3859 4 years ago +1

    XGBoosting! My Christmas gift!

  • @ramnareshraghuwanshi516
    @ramnareshraghuwanshi516 3 years ago

    Thanks for uploading this... I am your biggest fan!! I have noticed too many ads these days, which are really disturbing :)

    • @statquest
      @statquest  3 years ago

      Sorry about the ads. YouTube does that and I cannot control it.

  • @natashadavina7592
    @natashadavina7592 4 years ago +2

    Your videos have helped me a lot!! Thank you so much. I hope you keep on making these videos :)

  • @ahmedelhamy1845
    @ahmedelhamy1845 3 years ago +1

    Wonderful as usual Josh

  • @iBenutzername
    @iBenutzername 2 years ago

    Hey Josh, the series is fantastic! I'd like to ask you to consider two more aspects of tree-based methods: 1) SHAP values (e.g., feature importance, interactions) and 2) nested data (e.g., daily measurements --> nested sampling?). I am more than happy to pay for that :-) thanks!

    • @statquest
      @statquest  2 years ago +1

      I'm working on SHAP already and I'll keep the other topic in mind.

    • @iBenutzername
      @iBenutzername 2 years ago +1

      @@statquest That's great news, can't wait to see it in my sub box! Thanks a lot!

  • @irynap9262
    @irynap9262 4 years ago

    Fantastic explanation again!!! Thank you for your work 😊 The only things that were not mentioned, and that I can't figure out by myself, are:
    1. Does XGBoost use one variable at a time when it builds each tree?
    2. In the case of more than one predictor variable, how and why would XGBoost choose a certain variable to build the first tree 🌳 and other variables for the rest of the trees?

    • @statquest
      @statquest  4 years ago +1

      1 and 2) If you have more than one variable, then, at each branch, it checks all of the thresholds for all of the variables. The threshold/variable combination with the best Gain value is selected for the branch.

    • @irynap9262
      @irynap9262 4 years ago +1

      @@statquest can’t thank you enough 👍🏻👍🏻👍🏻 👏🏻👏🏻👏🏻 and really happy with the tree progress I am making watching your videos.
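
      To make the reply above concrete, a short sketch (names are my own) of "check all thresholds for all variables and keep the combination with the best Gain":

          import numpy as np

          def best_split(X, residuals, lam=1.0):
              """Return (gain, feature index, threshold) of the best split over all features."""
              def sim(r):  # similarity score
                  return r.sum() ** 2 / (len(r) + lam)
              root = sim(residuals)
              best = (-np.inf, None, None)
              for j in range(X.shape[1]):              # every variable...
                  xs = np.sort(np.unique(X[:, j]))
                  for t in (xs[:-1] + xs[1:]) / 2:     # ...and every threshold
                      mask = X[:, j] < t
                      gain = sim(residuals[mask]) + sim(residuals[~mask]) - root
                      if gain > best[0]:
                          best = (gain, j, t)
              return best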

  • @whenmathsmeetcoding1836
    @whenmathsmeetcoding1836 4 years ago

    The Gain in similarity score for the nodes can be considered a weighted reduction of the variance of the nodes. BTW, good attempt at making this digestible to all.
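
    For reference, the scores this remark is interpreting (regression case, as defined in the video), written in LaTeX:

        \mathrm{Similarity} = \frac{\left(\sum_i r_i\right)^2}{N + \lambda},
        \qquad
        \mathrm{Gain} = \mathrm{Sim}_{\mathrm{left}} + \mathrm{Sim}_{\mathrm{right}} - \mathrm{Sim}_{\mathrm{root}}

    where the r_i are the residuals in a node, N is how many there are, and a branch is pruned when Gain - gamma < 0.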

  • @ayenewyihune
    @ayenewyihune 2 years ago +1

    I'm enjoying your videos. I'd love if you can do one on Tabnet.

    • @statquest
      @statquest  2 years ago +1

      I'll keep that in mind!

  • @oriol-borismonjofarre6114
    @oriol-borismonjofarre6114 2 years ago +1

    Josh you are amazing!

  • @yulinliu850
    @yulinliu850 4 years ago +1

    Great Xmas present! Thanks Josh!

  • @vithaln7646
    @vithaln7646 4 years ago +2

    JOSH is the top data scientist in the world

    • @statquest
      @statquest  4 years ago

      Ha! Thank you very much! :)

  • @shaz-z506
    @shaz-z506 5 years ago +1

    Extreme Bam! Finally, XGBoost is here!

    • @statquest
      @statquest  5 years ago +1

      That's a good one! :)

  • @praveerparmar8157
    @praveerparmar8157 3 years ago +1

    That DANG was unexpected.....you should have given a DANG alert 😋

  • @metiseh
    @metiseh 3 years ago +1

    Bam!!! I am totally hypnotized

  •  4 years ago +1

    Thank you for sharing this amazing video!

    • @statquest
      @statquest  4 years ago +1

      Thank you! :)

  • @alex_zetsu
    @alex_zetsu 4 years ago +1

    So what if the input data contains multiple inputs, like "drug dosage, patient is adult, patient's nation of residence"? In the video example, you compared "Dosage < 22.5" and "Dosage < 30", and we decided "Dosage < 30" had the better Gain. So with more than one input, would we be considering "Dosage < 22.5", "Dosage < 30", "Patient is adult", "Patient lives in America", "Patient lives in Japan", "Patient lives in Germany"... and "Patient lives in none of the above" to find the most Gain? Also, I just realized that you'd want more samples than you have categories if you have categorical input, since if all the patients lived in separate countries, you'd be able to get high similarity scores even if the patients' residence was irrelevant to the output.

    • @statquest
      @statquest  4 years ago +1

      When you have more than one feature/variable that you are using to make a prediction, you calculate the gain for all of them and pick the one with the highest gain. And yes, if one feature has 100% predictive power, then that's not very helpful (unless it is actually related to what you want to predict).

    • @alex_zetsu
      @alex_zetsu 4 years ago +1

      Well, if we had a large sample and all 3,000 people who live in Japan had drug effectiveness less than 5 while people from other nations varied from 0 to 30 (even before counting drug dose), we'd be sure residence was relevant. If the sample had 4 people and we had 30 nations (plus none of the above) as inputs, the 100% predictive power of residence wouldn't be very helpful, since the nations would get high similarity scores regardless of whether residence was relevant or not.
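
      A tiny sketch of the setup the question describes (hypothetical column names; one-hot encoding turns each nation into its own yes/no column, and each of those columns then competes on Gain like any other variable):

          import pandas as pd
          from xgboost import XGBRegressor

          df = pd.DataFrame({
              "dosage":   [10, 20, 25, 35],
              "is_adult": [1, 0, 1, 1],
              "nation":   ["USA", "Japan", "Germany", "USA"],
          })
          X = pd.get_dummies(df, columns=["nation"])  # adds nation_USA, nation_Japan, nation_Germany
          y = [-10.0, 7.0, 8.0, -7.0]

          XGBRegressor(n_estimators=10).fit(X, y)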

  • @ashfaqueazad3897
    @ashfaqueazad3897 5 years ago +1

    Life saver. Was waiting for this.

  • @tc322
    @tc322 4 years ago +1

    Xtreme Christmas gift!! :) Thanks!!

  • @burstingsanta2710
    @burstingsanta2710 3 years ago +2

    that DANG!!! just brought my attention back😂

  • @alfatmiuzma
    @alfatmiuzma 2 years ago +1

    Can't thank you enough, MGB you 😊😊😊

  • @alimmr2008
    @alimmr2008 3 years ago +1

    Excellent Job!

  • @zachariahmarrero9358
    @zachariahmarrero9358 4 years ago

    You can change XGBoost's default score. Set 'base_score' equal to the mean of your target variable (if using regression) or to the ratio of the majority class over the sample size (if using classification). This will reduce the number of trees needed to fit the algorithm, and it will save a lot of time. If you don't set the base score, then the algorithm will, effectively, start by solving the problem of the mean. The reason is that the mean has the unique property of being a 'pretty good guess' in the absence of any other meaningful information in the dataset. As another intuition, you'll find, too, that if you apply regularization too strongly, XGBoost will "predict" that essentially every case is either the mean or very close to it.

    • @statquest
      @statquest  4 years ago

      I'm not sure I understand what you mean by saying that if you don't set "base_score" then the algorithm starts by solving the problem of the mean. At 2:42 I mention that you can set the default "base_score" to anything, but the default value is 0.5. At least in R that's the default, which I'm pretty sure is different from solving the problem of the mean. But I might be missing something.

    • @zachariahmarrero9358
      @zachariahmarrero9358 4 years ago +1

      @@statquest Oh I see, I misinterpreted what you meant where you said 'this prediction can be anything'. "The problem of the mean" is just an ad hoc expression to say that the algorithm will spend roughly the first 25% of its running time getting performance that is only as good as simply starting with the mean, when your eval metric is RMSE. It's not literally trying to determine what the mean is; it's just that your errors will pass 'through' the error achieved with a simple mean prediction. So rather than letting the algorithm do that, you can 'jump ahead' and have it start right at the mean. The end result is a model that relies on building fewer trees, which means your hyperparameter tuning will go faster. There's a GitHub comment/thread about the base_score default for regression, and I believe someone there has posted a more formal estimate of how much time is saved. I can say from personal experience that this one tweak has shaved days off my own analyses.

    • @statquest
      @statquest  4 years ago +2

      Ah! I see. And I saw that GitHub thread as well. I think it is interesting that "regular" gradient boost does exactly what you say: use the mean (for regression) or the odds of the data (for classification), rather than a fixed default. In fact, starting with the mean or the odds of the data is a fundamental part of Gradient Boosting, so, technically speaking, XGBoost is not a complete implementation of the algorithm, since it omits that step. Anyway, thanks for the practical/applied advice. It's very helpful.

    • @zachariahmarrero9358
      @zachariahmarrero9358 4 years ago +1

      @@statquest You're right, I hadn't realized that, but you even have it illustrated in your gradient boost video.
      BTW, I have probably seen a hundred XGBoost tutorials/explainers, and yours is head and shoulders above the rest. It's incredibly clear, accessible, and accurate!
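
      A minimal sketch of the tip, assuming the xgboost scikit-learn wrapper (base_score is a real parameter; the data here is synthetic, for illustration):

          import numpy as np
          from xgboost import XGBRegressor

          rng = np.random.default_rng(0)
          X = rng.normal(size=(500, 5))
          y = X @ rng.normal(size=5) + 50 + rng.normal(size=500)  # target mean is far from 0.5

          # start the ensemble at the mean of the target instead of the default 0.5
          model = XGBRegressor(n_estimators=100, base_score=float(y.mean()))
          model.fit(X, y)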

  • @SeitzAl1
    @SeitzAl1 4 years ago +1

    Amazing lesson as always. Thanks Josh!

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @aldo605
    @aldo605 2 years ago +1

    Thank you so much. You are the best

    • @statquest
      @statquest  2 years ago +1

      Thank you very much for supporting StatQuest! BAM! :)

  • @gutsa3389
    @gutsa3389 3 years ago +1

    Amazing explanation as usual!!! Josh, is it possible to make a StatQuest about LightGBM? I'm sure that it would help a lot of students like me.
    Thank you very much!

    • @statquest
      @statquest  3 years ago +2

      I am working on that one.

    • @gutsa3389
      @gutsa3389 3 years ago

      @@statquest Great !! We're waiting for that one. Thanks a lot

  • @jameswilliamson1726
    @jameswilliamson1726 1 year ago +1

    Well explained by animating a boring topic. Thx

  • @adityanimje843
    @adityanimje843 3 years ago +1

    Hey Josh, love your videos :)
    Any idea when you will make videos for CatBoost and LightGBM?

    • @statquest
      @statquest  3 years ago

      Maybe as early as July.

    • @adityanimje843
      @adityanimje843 3 years ago

      @@statquest Thank you :)
      One more question - I was reading the LightGBM documentation and it said LightGBM grows trees "leaf-wise" whereas most DT algorithms grow "level-wise", and that this is a major advantage of LightGBM.
      But in your videos (RF and the other DT algorithm ones), all of the trees are shown grown "leaf-wise".
      Am I misunderstanding something here?

    • @statquest
      @statquest  3 years ago

      @@adityanimje843 I won't know the answer to that until I start researching Light GBM in July

    • @adityanimje843
      @adityanimje843 3 years ago

      @@statquest Sure - thank you for the swift reply.
      Looking forward to your new videos in July :)

  • @kandiahchandrakumaran8521
    @kandiahchandrakumaran8521 10 months ago +1

    Wonderful tutorials - not only this video, but every video on StatQuest. Probably the best videos with good explanations available on YouTube. I was struggling with Python until I followed your videos. Now I am very confident in analysing big data.
    One question: I am looking at the recurrence of a disease following surgery, and I evaluate time to recurrence and probability with CPH. But in non-life-science settings, e.g. customer churn, default of payment, etc., there are no censored cases, and censoring is not considered in ML models such as XGBoost. Is it correct for me to use ML in a similar way for customer churn and customer default, despite the censored data? Please advise. Many thanks.
    I (and not only me - every budding data scientist) would very much appreciate it if you could create and upload a tutorial video on generating a nomogram for time-to-event data. This will help me (and others) analyse and publish in a peer-reviewed journal using the dataset I've collected on cancer recurrence.
    Best wishes.👍

    • @statquest
      @statquest  10 months ago +1

      Thank you! To be honest, I don't really know the details of how to work with censored data with ML. My advice is to simply try it out (leave the censored data as missing) and see how it does. And I'll keep those topics in mind.

  • @oldguydoesntmatter2872
    @oldguydoesntmatter2872 4 years ago +1

    Bravo! Excellent presentation. I've been through it a bunch of times trying to write my own code for my own specialized application. There's a lot of detail and nuance buried in a really short presentation (that's a compliment - congratulations!). Since you have nothing else to do (ha! ha!), would you consider writing a "StatQuest" book? I'll bid high for the first autographed copy!

    • @statquest
      @statquest  4 years ago +2

      Thank you very much!

  • @rishabhahuja2506
    @rishabhahuja2506 4 years ago

    Thanks Josh for this great video. Your explanations are damn good!! Waiting for CatBoost and LightGBM. Bammmm!!!!