Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
I appreciate the effort you put into making this video. You read and summarized the XGBoost paper! I think you explain XGBoost better than anyone in the world!!!
Thank you very much! It took a long time to figure out all of the details, but I think it was worth it. :)
It's funny how I always like StatQuest videos before watching them... and never regret it :D It's exactly the same with this comment, I'm so sure it's true ;)
Not just XGBoost but anything
OMG!!!! What an explanation! An extremely detailed explanation of extreme gradient boosting.
I think no one on this planet can explain this topic like you, Josh! You have literally performed an autopsy on this algorithm to get into the details :)
Thanks a ton for this amazing video !!!!
BAM! Glad you liked it!
I couldn't thank you enough for the efforts that you have put into this series of lectures. Thank You JOSH!
Thank you very much! :)
All I can say is:
BAM!!
Double BAM!!
Triple BAM!!
Mega BAM!!
And finally, EXTREME BAM!!!!!!
Thank You Josh for these wonderful lectures!
BAM! :)
Please accept my virtual standing ovation :) I finished the entire XGBoost tutorial and you made it sound super simple. Please also add LightGBM to the list. Thanks again for all your work.
Thank you! :)
I just wanted to say thanks. I really struggled in college because I always felt like the professors explained things as if we already knew everything they were teaching. The way that you break everything down is super helpful. I hope that you continue to make these sorts of videos even when you become a world-famous musician. Keep it up :)
Ha! Thank you very much! I appreciate it. :)
I read the paper a thousand times and it was never this clear, not even close. Thank you so much.
Glad it was helpful!
Great series on XGBoost! Thank you very much for making them. Now I have a clearer understanding of XGBoost, especially how it boosts trees and the computational advantages that make it fast. The animations are wonderful, making things much easier to follow. I love them so much.
Hope that you will make something on LightGBM and CatBoost.
I'm working on LightGBM and CatBoost.
There is only one channel whose videos can be liked even before watching for one second, and that's StatQuest! Bam!
Thank you very much! :)
So grateful for these videos. They have made understanding XGBoost so simple, even though the concepts are a little complicated. With my eyes closed I can now say how the trees are built, how they are optimised, how they are made faster, etc.
Hooray! I'm glad the videos are helpful. :)
Hi Josh, thank you very much for the videos on XGBoost. You have successfully explained the topic clearly even though it is complicated. Thanks a ton
Thanks! :)
Thank you so much Josh, I feel so lucky to have found your videos on YouTube. Everything is clearly explained and I just love it.
Thank you very much! :)
Your videos make Machine Learning - Human learnable
That's awesome! :)
This is the most lucid, most comprehensive one-stop shop for the tree-based algos that you are going to need in any job at all. The missing LightGBM video definitely leaves something to be desired. Shall we expect a LightGBM video anytime soon, Josh?
Maybe. I can't make any promises, but if I have time I'll try to squeeze it in.
The best explanation of XGBoost so far.
Thank you very much! :)
Josh aka Heisenberg, You're Awesome. Thanks for the series.
Thank you very much! :)
Now I can read the XGBoost paper much more easily ❤ Thanks a lot for these StatQuests 🙌
You're welcome 😊
Thank you, thank you for such a great series of XGBoost videos, so clearly explained while still in depth!
Glad you like them!
Simply an awesome playlist on Boosting, so AWESOME BAM!!
Glad you enjoyed it!
Hi Josh. I am not sure if you still check these comments, but I wanted to thank you for making these really amazing and informative videos. I am not sure I could pursue machine learning in my own time if I did not have these great resources to clearly explain the content to me.
Thanks for making the maths fun and showing all the cool details, it's great :)
Thank you very much! BAM! :)
Thank you very much for the videos. You give the best explanation.
I'm looking forward to the lightGBM video (hope there will be one, someday). That would be a HUGE BAAM!
Hi Josh, would you please do a series to clearly explain Bayesian & Genetic hyperparameter tuning algorithms?
I second this
I would also second this!
Super cool!!! It's the best explanation in the world.
Wow, thanks!
Best explanation for XGB available!
Thank you! :)
Thank you for the detailed but easy-to-understand video. I'm also interested in the LightGBM algorithm (it seems to be compared with XGBoost a lot), so I would be happy if you made one for LightGBM as well.
I hope to have videos on lightGBM and CatBoost soon.
@@statquest Looking forward to the videos!
So great, can't praise you guys enough! 👍
Thank you!
Great work. You deserve more than a million subs for your effort and dedication😀
Thank you! I dream of the day! :)
what an amazing song in the beginning of this video!!!
Bam! :)
Thanks for the highly accessible explanations of how XGBoost performs its calculations. What about doing a few videos on tuning the hyperparameters for improved model fits? For boosted trees and random forests this is not so complicated, but XGBoost has many, many parameters that can be tuned, making the optimization of the model quite challenging.
I plan on doing "xgboost in R" or "xgboost in python" videos, and those will cover the hyperparameter tuning.
Hi Josh, thanks for the awesome video. While you are preparing the videos in R and Python for XGBoost hyperparameter tuning, it would be great if you could point to some resources on XGBoost hyperparameters in the meantime.
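While waiting for those videos, here is a minimal sketch of tuning a few common XGBoost hyperparameters with the scikit-learn API. The parameter grid and data below are purely illustrative, not recommendations from the video.

```python
# A minimal sketch of tuning a few common XGBoost hyperparameters with
# scikit-learn's GridSearchCV. The grid below is illustrative, not a recommendation.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)

param_grid = {
    "max_depth": [3, 6],          # tree depth
    "learning_rate": [0.1, 0.3],  # eta, shrinkage applied to each tree's output
    "n_estimators": [100, 300],   # number of trees
    "reg_lambda": [1, 10],        # L2 regularization on the leaf output values
    "gamma": [0, 1],              # minimum gain required to keep a split
}

search = GridSearchCV(XGBRegressor(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```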
Informative video with detailed explanation.
Thanks! :)
Thanks for the great explanation!
I have two questions:
1. How much data do we need before we should use parallel learning?
2. In parallel learning, do we just make one tree so we can find the weighted quantile sketch?
1. I'm not sure. However, it is probably mentioned in the documentation.
2. Parallel learning makes it so we can find the best feature split faster for a given node in a tree.
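To illustrate what "finding the best feature split faster" means, here is a rough sketch of evaluating candidate thresholds in parallel. The feature values and residuals are made up, and XGBoost actually does this work in multithreaded C++; this is just the concept.

```python
# A rough sketch (not XGBoost's actual code) of parallel split finding:
# the gain of every candidate threshold can be computed independently,
# so different thresholds/features can be handled by different threads.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

LAM = 1.0  # lambda, the regularization parameter from the series

def similarity(residuals):
    # regression similarity score used in the StatQuest XGBoost series
    return residuals.sum() ** 2 / (len(residuals) + LAM)

def gain_for_threshold(feature, residuals, threshold):
    left, right = residuals[feature < threshold], residuals[feature >= threshold]
    return similarity(left) + similarity(right) - similarity(residuals)

feature = np.array([10., 20., 25., 35.])       # illustrative dosages
residuals = np.array([-10.5, 6.5, 7.5, -7.5])  # illustrative residuals in the node
thresholds = (feature[:-1] + feature[1:]) / 2  # candidate thresholds between values

with ThreadPoolExecutor() as pool:
    gains = list(pool.map(lambda t: gain_for_threshold(feature, residuals, t),
                          thresholds))
print(thresholds[int(np.argmax(gains))])  # best split for this node
```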
Your videos never disappoint me. I feel I can click "like" even before watching your videos.
bam! :)
This video is just too awesome
Thank you!
Hi Josh! Your XGBoost videos are great! By the way, do you have a tutorial about LightGBM?
Not yet.
Hi Josh, thanks for another great video. Would request you to make one on hyperparameter tuning as well.
OK. I'll do that soon.
@@statquest Thanks again :)
Thank you so much for doing this video ♥️. Would it be possible to do videos on ensemble methods?
Finally... graduated in tree-based algos from the StatQuest academy. What a feeling :) :)... One absolute last hiccup: what does "xgboost splits the data so that both drives get a unique set of data" mean? What is a unique set of data there? And why does it ensure that parallel reads can happen? Why can't parallel reads happen if the "unique set of data" isn't there on different drives?
The idea is that 1) we start out with a dataset that is too large to fit onto the same disk drive, so we have to split it up. We could split it so that there is some overlap between what goes on drive A and what goes on drive B, but there's no speed advantage to that. Instead, if each drive has a unique subset of the data, we can call each drive simultaneously (in parallel) to access records.
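To make the "unique set of data" idea concrete, here is a toy sketch (not XGBoost internals, and the file paths are hypothetical): each drive holds a non-overlapping block of rows, so both blocks can be read at the same time instead of waiting on a single disk.

```python
# A toy illustration of non-overlapping shards on separate drives:
# each drive holds a unique block of rows, so both can be read in parallel.
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

# hypothetical paths, one shard per physical drive
shard_paths = ["/mnt/driveA/data_part0.csv", "/mnt/driveB/data_part1.csv"]

def load_shard(path):
    return pd.read_csv(path)  # each call hits a different drive

with ThreadPoolExecutor(max_workers=len(shard_paths)) as pool:
    shards = list(pool.map(load_shard, shard_paths))

# because the shards don't overlap, nothing is double-counted when we combine them
data = pd.concat(shards, ignore_index=True)
```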
amazing thank you for the hard work!
My pleasure!
Hello Wonderful Josh,
Do you think it is time for xgboost video 5? Explaining the additions brought in by xgboost 2
Maybe!
Simple but powerful!!!
Bam! :)
This is pure class
Thanks! :)
Best explanation in the world. I think there is a typo at 19:12, instead of Dosage.
Yep, that's a typo.
You are a king
:)
Thank you so much
Thanks!
you are the GOAT!!
Thank you! :)
I meaannnn! Maybe you can look at my thesis project and see your name in the acknowledgments!!! Haha, thanks a lot!! Greetings from Medellín, Colombia
Hello! Good luck with your thesis! I hope it goes well. Thank you very much. :)
Triple BAM!
:)
This is dope.
Thanks!
Loved it! Thanks!
Thank you very much! :)
Fantastic!
Many thanks!
Thanks Josh for this amazing video. Your explanations are really great and helpful. It would be great if you could share your approach for understanding complex algorithms like XGBoost. Currently I am digging into CatBoost, and I'm just curious to know what resources or plan you follow when you want to understand a new algorithm, e.g. reading research papers, understanding the maths behind it, etc.
I read everything I can about a subject and then re-read it until it starts to make sense. Then I create a simple example and play with it.
Thanks Josh for the response!!
Thanks, Josh. One question about the greedy part (4th minute): in random forests (say, for regression), even though we use a subset of features for each tree and a bootstrapped subset of the data, we could still end up with many thresholds to examine, similar to the problem with XGBoost. Why not use quantiles when building the random trees to deal with the high number of thresholds?
Great question! I don't think there is a reason why not. All you have to do is implement it. I think a lot of the "big data" optimizations that XGBoost has could be used with all kinds of other ML algorithms.
Great, as always.
Thank you! :)
Hi, Josh. Thank you for the exciting explanation! How about making the same amazing series about LightGBM and CatBoost?
I'll keep that in mind.
Thanks!
Thank you very much!!! Thanks so much for supporting StatQuest!!! BAM! :)
Amazing!!!
Thanks!
Thank you so much for your videos! I have learnt so much from them. Could you do a video on LightGBM and catboost as well? :)
I'll keep those topics in mind.
Could you please make a course on Probability as well ?
One day I will.
The great ML channel. Josh, are you planning to give lectures on Convolutional Neural Networks and Capsule Networks for deep learning? I'm expecting those. Bam!
I'm working on neural networks right now.
@@statquest Waiting for this. Thanks a lot
Can you please do a video on expectation maximization algorithm 🙏🙏?
I hope to do that one day.
Nice explanation!
I want to ask:
When we use a large dataset for sparsity-aware split finding, must we also do parallel learning and the weighted quantile sketch to find the thresholds?
I'm not sure you must, but if you have a large dataset, it's probably a good idea. The whole idea is to be able to train the model quickly.
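For reference, that choice is exposed through XGBoost's real `tree_method` parameter ("exact" for the greedy search, "approx"/"hist" for quantile/histogram-based split finding). A minimal sketch of selecting it, with random data just for illustration:

```python
# Selecting between greedy and approximate split finding via tree_method
# (the parameter values below are real options in the xgboost library).
import numpy as np
import xgboost as xgb

X = np.random.rand(100_000, 10)
y = np.random.rand(100_000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "reg:squarederror",
    "tree_method": "hist",    # histogram/quantile-based split finding for big data
    # "tree_method": "exact", # greedy search over every possible threshold
}
model = xgb.train(params, dtrain, num_boost_round=10)
```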
I have yet another question on the histogram-building process:
Let's say I had 1M rows and 1 feature.
1. Build the histogram. Now there are approx 33 split-points.
2. Split-point 10 gives max gain.
3. Data from the first 10 bins go to the left child and data from the remaining bins go to the right child.
*4. To further split the left and right children on the same feature, (a) are histograms with ~33 split-points built again for the data that landed in the left and the right child? Or, (b) is it that now only the 10 already-computed split-points would be considered to compute gain for the left child and the 23 already-computed split-points for the right child?* I think it is (b), since I think that is the only option that can result in a speedup, IMO.
Based on the manuscript alone, I would say that it is (a), but it's possible that, in practice, they reuse the splits and use (b). Theory and practice aren't always the exact same.
@@statquest Thank you for the reply, Josh :). Since yesterday, I have read the comparison of histogram-based split finding vs. GOSS-based split finding that is given in the LightGBM paper. The two algorithms are juxtaposed side by side. Based on that, I am reasonably confident that it is (b).
There's another speed-up trick in which the histogram is computed for only one child and the histogram for the other child is merely parent_node_histogram - computed_child_histogram. This would only be possible if the histogram is computed once at the beginning of the tree and then reused throughout the tree. I got as much from the LightGBM manuscript. Looking forward to your thoughts on the same.
@@nitinsiwach1989 That makes sense because LightGBM builds trees "leaf-wise" - so it looks at the two leaves and selects the one that has less variation to add branches to.
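For anyone curious, here is a tiny numeric sketch of the histogram-subtraction trick being discussed (illustrative NumPy with made-up gradients; not code from XGBoost or LightGBM):

```python
# Histogram subtraction: once the parent's per-bin gradient sums are known and
# one child's histogram is built, the sibling's histogram is just parent - child.
import numpy as np

bins = np.array([0, 0, 1, 1, 2, 2, 2])                 # bin index of each sample
grad = np.array([-1.2, 0.4, 0.9, -0.3, 1.1, 0.2, -0.8])  # made-up gradients
goes_left = np.array([True, False, True, True, False, False, True])

n_bins = 3
parent_hist = np.bincount(bins, weights=grad, minlength=n_bins)
left_hist = np.bincount(bins[goes_left], weights=grad[goes_left], minlength=n_bins)
right_hist = parent_hist - left_hist                    # no second pass over the data

# check against building the right child's histogram from scratch
direct = np.bincount(bins[~goes_left], weights=grad[~goes_left], minlength=n_bins)
print(np.allclose(right_hist, direct))  # True
```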
Hi Josh, thanks for making these awesome videos to learn XGBoost in depth. But I want to ask: is the *weight* you mentioned here the same as the *cover* you mentioned in previous videos? Both of them have the same formula.
At 9:19 I say that the weights are derived from the Cover metric, and since I later say they have the exact same formula, then we can assume that they are the same.
@@statquest Wow, thanks for the patient explanation! It's nice to learn the algo in depth from you.
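To spell out the formulas being referenced, a small sketch with made-up probabilities: for squared-error regression the hessian of each sample is 1, and for log-loss classification it is p(1 - p), so summing the hessians in a leaf gives the Cover value.

```python
# The "weight" used by the weighted quantile sketch is the hessian of the loss,
# which is exactly what Cover sums up.
import numpy as np

# regression (squared error): the hessian is 1 for every sample,
# so Cover is just the number of residuals in the leaf
h_regression = np.ones(4)

# classification (log loss): the hessian is p * (1 - p) for each sample's
# previously predicted probability p (probabilities below are made up)
p = np.array([0.1, 0.5, 0.9])
h_classification = p * (1 - p)      # [0.09, 0.25, 0.09]

print(h_regression.sum(), h_classification.sum())  # the Cover values
```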
Hi Josh, Thank you for your great series.
Btw, do you happen to know how LightGBM finds its candidate splits in histogram-based split finding?
I have some notes on LightGBM, but I haven't made a video about it yet.
@@statquest Would you mind explaining it to me? Does LightGBM use the same concept as XGBoost for finding the candidate splits (e.g. quantization)?
@@deana00 My understanding is that the first handful of trees (I believe 10 is the default) are built like XGBoost, and then it uses a subset of the elements in a feature based on the size of the gradients.
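Since "a subset of the elements based on the size of the gradients" refers to GOSS (Gradient-based One-Side Sampling), here is a conceptual sketch of it as described in the LightGBM paper. This is illustrative only, not LightGBM's actual implementation.

```python
# GOSS sketch: keep the top a-fraction of samples by |gradient|, randomly sample
# a b-fraction of the rest, and re-weight the sampled small-gradient instances
# by (1 - a) / b so the information gain stays approximately unbiased.
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=np.random.default_rng(0)):
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))      # largest |gradient| first
    top = order[: int(a * n)]                   # always keep the big gradients
    rest = order[int(a * n):]
    sampled = rng.choice(rest, size=int(b * n), replace=False)
    weights = np.ones(n)
    weights[sampled] = (1 - a) / b              # re-weight the small-gradient sample
    keep = np.concatenate([top, sampled])
    return keep, weights[keep]

grads = np.random.default_rng(1).normal(size=1000)  # made-up gradients
idx, w = goss_sample(grads)
print(len(idx))  # 20% kept + 10% sampled = 300 of the 1000 samples
```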
Hey Josh, great job, looking forward to your next video!
Just out of curiosity, do you plan to make a video teaching CatBoost?
Not in the immediate future. I'm working on neural networks and deep learning right now, but I might swing back to catboost when that is done.
@@statquest Thank you very much! Keep up the excellent work you've been doing, you're helping me and a lot of others to learn so much!
@@marcoscosta829 Thanks! :)
Hello Josh. Many thanks for another super BAM video! I have a question about the missing value part. You explained how xgboost incorporates missing values in training and makes predictions for missing values in future data. What happens when there are no missing values in training, but there are in testing/future data?
Good question! Unfortunately I do not know the answer. :(
We can make our training data have missing values, so that if any future test data has missing values our model can easily handle it.
You are amazing!
Thanks! :)
Great video! But I don't understand why the *Hessian* is used as the *weights* for the quantile histogram. What is the underlying mathematical reason that the 2nd order derivative plays the role of a weight?
It helps us separate poorly classified samples that belong to different classes.
Hessian = p̂ * (1 - p̂), so you can understand/feel the Hessian as a kind of confidence measure for whether to branch out further or not.
If H = 0.9 * (1 - 0.9) = 0.09, H is very small, and the classification in that branch is good enough because the classes are well separated.
If H = 0.5 * (1 - 0.5) = 0.25, H is very large (the maximum), and the resulting classification in that branch is not yet good enough; the classes for those observations are not well separated. So perhaps you could use 0.15 as the threshold Hessian value... Now, making quantiles out of these Hessian values separates the correctly classified and the wrongly classified samples through their sums, exactly as shown at 14:30.
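Here is a toy version of that weighted-quantile idea (illustrative NumPy with made-up dosages and probabilities; not XGBoost's actual sketch algorithm). The quantile boundaries are placed by cumulative weight, so each quantile ends up with roughly the same total weight rather than the same number of samples; with these weights the sums come out 0.18, 0.18, 0.24, 0.24, as equal as the data allows.

```python
# Weighted quantile sketch (toy version): sort by the feature, accumulate the
# hessian weights, and cut where the cumulative weight crosses equal fractions
# of the total weight.
import numpy as np

dosage = np.array([5., 10., 15., 20., 25., 30.])   # made-up feature values
p = np.array([0.1, 0.1, 0.9, 0.9, 0.4, 0.6])        # made-up predicted probabilities
weights = p * (1 - p)                                # hessians: confident -> small weight

order = np.argsort(dosage)
cum = np.cumsum(weights[order])
n_quantiles = 4
boundaries = cum[-1] * np.arange(1, n_quantiles) / n_quantiles
quantile_of_sample = np.searchsorted(boundaries, cum)  # quantile index per sample

for q in range(n_quantiles):
    in_q = order[quantile_of_sample == q]
    print(q, dosage[in_q], round(weights[in_q].sum(), 2))
```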
Thank you for your really helpful series about XGBoost.
I've got a question.
When you talk about a huge database, what do you mean?
Also, can 67,000 rows x 9 columns be considered a huge database?
Thank you in advance for your answer.
BAM!
In this case "huge" = so big that we can not fit all of it in the available ram at the same time. On my computer 67,000 x 9 would not be huge because I can load that into memory all at once.
@@statquest Thank you once again. I get it now!
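As a back-of-the-envelope check of that answer, assuming the 67,000 x 9 table is stored as 8-byte floats:

```python
# Rough memory footprint of a 67,000 x 9 table of 8-byte floats.
rows, cols, bytes_per_value = 67_000, 9, 8
size_mb = rows * cols * bytes_per_value / 1024**2
print(f"{size_mb:.1f} MB")  # ~4.6 MB, tiny compared to gigabytes of RAM
```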
At 19:56, when choosing the leaf for missing values, you select the left branch as the default path. It makes sense because the missing-value residuals (-3.5 and -2.5) are negative, which is similar to the non-missing-value residuals (-5.5 and -7.5). I wonder if I could select the right branch as the default path for missing values if my residual were a large positive number, e.g. 10.5 instead of -3.5 and -2.5.
You always pick the leaf that gives the optimal gain value.
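A small sketch of how that gain comparison could play out with the commenter's hypothetical residuals (-5.5, -7.5 and 10.5; the other numbers are made up). The similarity and gain formulas follow the regression formulas from the series: try sending the missing-value residuals left, then right, and keep whichever option gives more gain.

```python
# Choosing the default direction for missing values by comparing gains
# (illustrative numbers, following the sparsity-aware idea from the video).
import numpy as np

LAM = 1.0

def similarity(res):
    return res.sum() ** 2 / (len(res) + LAM)

def gain(left, right):
    root = np.concatenate([left, right])
    return similarity(left) + similarity(right) - similarity(root)

left_res = np.array([-5.5, -7.5])   # non-missing rows that go left
right_res = np.array([6.5, 7.5])    # non-missing rows that go right (made up)
missing_res = np.array([10.5])      # residual of a row with a missing dosage

gain_left = gain(np.concatenate([left_res, missing_res]), right_res)
gain_right = gain(left_res, np.concatenate([right_res, missing_res]))
default = "left" if gain_left >= gain_right else "right"
# here the large positive residual pulls the default to the right branch
print(round(gain_left, 2), round(gain_right, 2), default)
```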
Thank you for the amazing video!! But I have some questions.
Does it mean that all the missing values for a feature (let's say feature A) need to be on the same side (all left or all right)?
If so, does it mean that XGBoost treats all the missing values in feature A the same?
Thank you Josh : D
Yes
MEGAAA BAMMMMM is back with BAMMMMMMM
Hooray!!! :)
Please make a video on time series models too.
I'll keep that in mind.
Machine Learning is more than just applied statistics - Josh Starmer
yep. This is a good example of it.
Hi Josh... do pairwise variable correlation and multicollinearity affect ML models???
It depends on the model.
TRIPLE BAM!!
YES! :)
BAMMMMMMM
I appreciate you
Thank you very much! :)
Thanks for the XGBoost series. If a feature like Dosage is in string format, how does XGBoost use it to build a tree?
For example, time series data
like Sunday, Monday, Tuesday, etc.
Thanks
Here's a tutorial on XGBoost and time series: www.kaggle.com/robikscube/tutorial-time-series-forecasting-with-xgboost
@@statquest Thanks a lot :)
Greetings from Bali
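For anyone who wants a starting point before reading that tutorial, here is a rough sketch (not from the video) of the usual approach: turn the day-of-week strings into numbers, add lagged copies of the target, and fit XGBoost on those numeric columns. The data below is a placeholder.

```python
# Day-of-week strings become integers and lagged targets become features,
# so XGBoost only ever sees numeric columns.
import pandas as pd
from xgboost import XGBRegressor

df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=60, freq="D"),
    "sales": range(60),  # placeholder target
})
df["day_of_week"] = df["date"].dt.dayofweek  # Monday=0 ... Sunday=6
df["lag_1"] = df["sales"].shift(1)           # yesterday's value as a feature
df["lag_7"] = df["sales"].shift(7)           # same weekday last week
df = df.dropna()

model = XGBRegressor(n_estimators=50)
model.fit(df[["day_of_week", "lag_1", "lag_7"]], df["sales"])
```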
At 14:25, what if we have 5 samples at 0.1 probability instead of 2, as well as another group of 5 samples at 0.9 probability instead of 2, in addition to the last two samples with low confidence? Will the first and second groups end up in two separate quantiles with a total weight of 0.45 each? If so, then the third quantile will contain the last two samples with opposite residuals, since their sum of weights is almost equal to that of the first two quantiles, i.e., 0.48.
To be honest, I don't know.
Please make a video on Generative Adversarial Networks (GANs)
Hi Josh!! Thanks for this ❤️. But can you explain how you found the residual values for the missing dosages using the initial predictions in Sparsity-Aware Split Finding? Or, if anyone else knows, can you please help me with this? Thanks in advance.
I'm not sure I understand your question. However, in order to calculate the residuals, we only need to know the drug effectiveness. In other words, we don't need to know the dosages to calculate the residuals. So we calculate the residuals and then figure out the default direction to go in the tree for the missing values (which we never actually have to fill in).
Yes yes, I completely overlooked that!!. Maximum Bam!!!❤️
Also we all really appreciate how you still reply and clear doubts from the old videos. Thanks Josh!
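To make that answer concrete, a tiny sketch with made-up effectiveness values (the 0.5 initial prediction is the default used in the series): the residuals depend only on the observed effectiveness and the current prediction, not on the dosage.

```python
# Residual = observed effectiveness - current prediction; the dosage
# (missing or not) never enters this calculation.
import numpy as np

effectiveness = np.array([-10.0, 7.0, 8.0, -7.0])  # observed values (dosage may be missing!)
initial_prediction = 0.5
residuals = effectiveness - initial_prediction
print(residuals)  # [-10.5   6.5   7.5  -7.5]
```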
Can you please create one for LightGBM too?
I'll definitely keep it in mind.
Can you please help us understand the LightGBM and CatBoost algorithms?
I'm working on those.
BAM BAM
:)
But what if one missing value's residual fits the left side of the tree and another fits the right side? Then how will you predict?
To be honest, I'm not 100% sure. However, I believe that the default, when everything else is the same, is to go to the left.
@@statquest thanks for the reply josh. double bammmm!
What do you use for making the diagrams??... PowerPoint or some other software??
I use Keynote, which is a free product that comes with Apple computers.
Can we use XGBoost for time series forecasts?
I believe it is possible, but I haven't tried it myself.
If I may ask politely, can someone give me a general yet simple explanation of how XGBoost deals with missing values in datasets? It's for my thesis, as my lecturer asked me to explain it more specifically.
Thank you :)
I explain how XGBoost deals with missing data at 16:13. Is there some part that doesn't make sense to you?
At 15:06, we have a sum of weights of 0.18 in one quantile, 0.18 in another, and then 0.24 in each of the last two. But as far as I understood, you explained that in weighted quantiles the sum of weights in all the quantiles is equal, yet here it's not equal across all 4??
It's not equal, but it's as close as it can be to being equal. Does that make sense? If equal isn't an option (and that is the case here), XGBoost gets as close to equal as possible.
@@statquest Okay so that means approximately equal. Thanks
❤❤❤❤
:)
What about categorical variables? Since quantiles won't be available, how would these variables be handled?
@@rahul-qo3fi XGBoost converts all categorical variables to numeric via One-hot-encoding ( th-cam.com/video/589nCGeWG1w/w-d-xo.html ). XGBoost can do this efficiently for variables with lots of options by using sparse matrices (which only keep track of the non-zero values).
@@statquest 😍
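A short sketch of that reply using real pandas/scipy/xgboost calls (the toy data is made up): one-hot encode the categorical column, store it as a sparse matrix so only the non-zero entries are kept, and hand that to XGBoost.

```python
# One-hot encoding a categorical column and passing it to XGBoost as a
# sparse matrix (only non-zero values are stored).
import pandas as pd
import xgboost as xgb
from scipy.sparse import csr_matrix

df = pd.DataFrame({"color": ["red", "blue", "green", "red"], "y": [1, 0, 1, 0]})
one_hot = pd.get_dummies(df["color"])                    # three 0/1 columns
X_sparse = csr_matrix(one_hot.values.astype("float32"))  # keep only non-zero values
dtrain = xgb.DMatrix(X_sparse, label=df["y"])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=5)
```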
I feel like 15:15 deserved a triple bam, but maybe that's just me
I think you might be right on that one.
I need to study operating systems...
noted
BAAAAAAAAAAAAAAAAM !!!!!!!!!!!!!!
:)
Triple combo! (like, coin, favorite)
Yes!
Your videos are excellent quality, but please stop the bad music at the start of every video... It is a humble request, because it distracts the mind before the learning process begins...
Noted!
Hi!