FIND THE BEST MODEL

yuzaR Data Science

Views: 13,843


Published on 2 Jan 2025

Comments • 113

@anadias7889 8 months ago ⁺³
What a great video! I looked for this information in many sources, but it was never explained so clearly!
You are an amazing professor!
@yuzaR-Data-Science 8 months ago
You're so welcome! Glad it was useful!
@zane.walker 2 years ago ⁺⁵
Very interesting (and useful) package. Thanks so much for bringing it to my attention and demonstrating it so effectively. Much appreciated!
@yuzaR-Data-Science 2 years ago
Glad it was helpful, Zane!
@EsusGali 1 year ago ⁺¹
This video is gold, this channel is gold, you are gold, mate
@yuzaR-Data-Science 1 year ago ⁺¹
Wow, thanks! Glad it's useful for more people than just me ;)
@shaunaheron4448 2 years ago ⁺⁵
This is great! Would love a vid covering how you would report your model building w/ this method!
@yuzaR-Data-Science 2 years ago ⁺³
Great suggestion! By the way, I am making a video right now about reporting. It's not particularly about reporting the {glmulti} results; for those I would just use the variable importance plot and maybe the 10 best models with their formulas. But the new video will be about how to report the results of some statistical tests and models. So, stay tuned and thanks for the feedback!
@wenxuanjiang-v9t 1 year ago ⁺¹
It is very informative! Much appreciated!
@yuzaR-Data-Science 1 year ago
I am really glad it is useful! Thanks 🙏 for your nice feedback!
@hendrikpehlke4973 6 months ago ⁺¹
Thank you for this video. I just have two questions. 1. You showed the plot of the variable importance ("plot('model_name', type = "s")). Is there a way to extract the names of the variables and/or interactions using a threshold (e.g. 75%)? I need them as a list, e.g. "variab_interactions_plus_75
@yuzaR-Data-Science 6 months ago
Hi Hendrik, not that I know of. When I needed this for a paper, I created the data frame manually and only put in the things I wanted. But I think it's not that much more work after the algorithm has done the whole heavy lifting ;) Cheers
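For readers with the same question, here is one possible manual route, sketched in base R. It assumes that the importance of a term is the summed weight of all candidate models containing that term, and that the model table has a `model` column (formula as text) and a `weights` column, which is how glmulti's `weightable()` output is usually laid out. The toy table, the helper name `term_importance`, and the 0.75 cut-off are illustrative assumptions, not part of the video.

```r
# Toy stand-in for a model table (assumed layout: formula text + weight).
wt <- data.frame(
  model   = c("y ~ 1 + a + b", "y ~ 1 + a + a:b", "y ~ 1 + a", "y ~ 1 + b"),
  weights = c(0.50, 0.30, 0.15, 0.05),
  stringsAsFactors = FALSE
)

# Hypothetical helper: importance of a term = summed weight of the
# models whose formula contains that term.
term_importance <- function(wt) {
  terms_per_model <- lapply(wt$model, function(f) {
    attr(terms(as.formula(f)), "term.labels")
  })
  all_terms <- unique(unlist(terms_per_model))
  imp <- sapply(all_terms, function(term) {
    in_model <- sapply(terms_per_model, function(tl) term %in% tl)
    sum(wt$weights[in_model])
  })
  sort(imp, decreasing = TRUE)
}

importance <- term_importance(wt)

# Keep only the terms at or above the chosen threshold (75% here).
variab_interactions_plus_75 <- names(importance)[importance >= 0.75]
```

With a real result, `wt` would be replaced by the table taken from the fitted object; interaction labels such as `a:b` keep the order in which they are written in the formula text.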
@muhammedhadedy4570 2 months ago
Excellent tutorial as usual, my dear professor.
Just one question, is there an R package that does the same job (model selection) for Cox regression models?
@yuzaR-Data-Science 2 months ago
Thanks Muhammed, I am not sure about that, since I never needed it. But I think it should work with Cox models too. I have got glmulti to work with most models. Please let me know whether this works. Cheers
@bianca-fr7dk 1 year ago ⁺¹
Thanks for the amazing tutorial! I am getting an error though when trying to include a random factor: Error in glmer(paste(deparse(formula), random), data = data, REML = F, :
unused argument (REML = F), and am not sure what the issue with this could be. Do you have any suggestions? Thanks!
@yuzaR-Data-Science 1 year ago
Can't check it at the moment since I am on holiday without a computer. But try without REML=F or with REML=T
@bianca-fr7dk 1 year ago
@@yuzaR-Data-Science Sorry for the late reply but thank you so much for your help! I am now having issues with interpreting the output of glmulti using the random factor wrapper function. It seems like the random factor is not included in any of the possible models. How do I understand whether the random factor is part of the best model or not? Thank you so much in advance if you have time to answer this!
@JT-ph3hk 2 years ago
Best tutorial ever!
@yuzaR-Data-Science 2 years ago
Thanks a lot! And thanks for watching!
@alijanbain2852 2 years ago ⁺¹
Thank you for this great video! It is very informative!
But what about the survival and Cox models? How do we find the best model?
@yuzaR-Data-Science 2 years ago
Good question! It says on the webpage of the package that it does support survival models: cran.r-project.org/web/packages/glmulti/index.html . But I didn't try it yet. So, if you do, please let me know what your experience was. Cheers
@rubyanneolbinado95 9 months ago
Hello, I've created the best model using the package; however, upon checking the collinearity of the explanatory variables, it suggested that they have high VIF. What should I do next? Thank you so much for your informative videos.
@yuzaR-Data-Science 9 months ago
:) you should remove the less useful of them and rerun glmulti
@rubyanneolbinado95 9 months ago
Hi, why is RStudio producing different results even though I am using the same call and data?
@yuzaR-Data-Science 9 months ago
I can't say why, but what I can say is that you are not using the same call and data. Do you do something before you model, or do you use sample_n() or similar? Or do you use any kind of simulated things? Then you have to set.seed() to some number and always run it together with the rest of the code.
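A minimal base R illustration of the set.seed() advice above: anything that draws random numbers (sampling rows, simulating data) repeats exactly only when the seed is set immediately before it, every time the code runs. The seed value 42 and the variable names are arbitrary choices for the sketch.

```r
# Same seed immediately before the random step gives the same draw.
set.seed(42)
first_draw <- sample(1:100, 5)

set.seed(42)
second_draw <- sample(1:100, 5)

same_result <- identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the next draw continues the random
# stream and will generally differ from the first one.
third_draw <- sample(1:100, 5)
```

The same pattern applies to a random subsample taken before modelling: the seed has to be set in the script itself, not once by hand in the console.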
@davidsonadrien6273 2 years ago
Hi!! Thanks a lot for your presentation on this package. Really interesting! But I encountered an error using the said package on one of my databases. A warning message saying that I have too many predictors. What do you think is the maximum number of predictors that we can use with the function glmulti()?
Thanks again for your great work!
@yuzaR-Data-Science 2 years ago ⁺¹
Hey, yeah, interesting question. I think it's important to reduce the number of predictors to only the useful ones, and then use glmulti or backwards selection. If you say "databases" or ask for a maximum number, I think you did not sort them out, correct? In this case you can have a lot of not-useful, collinear and otherwise contaminated variables, which would not produce anything useful anyway. But if all of them make sense, then only splitting them into some kind of subgroups and running several glmultis would do the trick. Cheers and thanks for watching!
@zane.walker 2 years ago
So I checked out your video on the performance model and found it also useful. However, I am a little puzzled with the collinearity test. Now that glmulti has made it easier to include interaction terms in the regression model, the collinearity test seems to flag the collinearity as a problem. How should the collinearity of interaction terms be dispositioned?
@yuzaR-Data-Science 2 years ago
The theory suggests the VIF for interactions is supposed to stay below 20. However, I found that hard to achieve when you model lots of interactions. And since the interactions contain the main effects, the collinearity will always be there. So, I only check the collinearity in predictors without interactions, meaning, I sort out most of the "contaminations" from the multivariable model, and when no collinearity is left, I ask glmulti to check all possible pairwise interactions to uncover which are important. I am NOT saying it's a correct or the best approach! It's just my opinion, and there are a lot of other opinions, so, if you find a better one with good reasoning, please let me know. Until then, thanks for watching!
@zane.walker 2 years ago
@@yuzaR-Data-Science Statistics, like most things in life, is not black and white and so a little common sense, as you suggested, is always useful. Your approach makes sense to me! Perhaps the main objective with the check is to ensure that the analyst is aware when collinearity exists so he/she can judge whether it is appropriate. Much appreciated!
@yuzaR-Data-Science 2 years ago
@@zane.walker you are welcome!
@yuchuxie8170 7 months ago
Very nice video! It helped me a lot.🤓
I have a question here: when I try to build a function for mixed effect models and continue to the next step of glmulti, it warns me that
"Error during wrapup: unused argument (REML = F)
Error: no more error handlers available (recursive errors?); invoking 'abort' restart".
Could you pls tell me how this might happen and how to solve this problem?
That would be very appreciated!😳
@yuzaR-Data-Science 7 months ago
hi, thanks for the feedback!
first, have you installed all the important packages (lme4, lmerTest ... etc.)? and
secondly, have you tried this?
glmer.glmulti
@yuchuxie8170 7 months ago
@@yuzaR-Data-Science Thanks for your reply! Yes, I followed these steps, and still got the error warning.
@yuzaR-Data-Science 6 months ago
then I guess there are too many predictors. if I try it with ca. 20, it collapses and I need to restart RStudio. so, reduce the number of predictors to the most sensible ones and then run glmulti
@yuchuxie8170 6 months ago
@@yuzaR-Data-Science I'll try this. Thank you soooo much for your kind suggestion!
@yuzaR-Data-Science 6 months ago
sure, let me know whether it worked. by the way, if you use "d" instead of "h", you can see how many models you are going to make, and if it goes over six figures, I would reduce the number of predictors first and then use glmulti
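To see why roughly 20 predictors is already too many for an exhaustive search, the size of the candidate set can be estimated by hand. The sketch below counts every subset of terms, so it is an upper bound under stated assumptions: the helper name `count_candidates` is made up for illustration, and the number glmulti itself reports with "d" can be lower, for example when interactions are only allowed together with their main effects.

```r
# Hypothetical helper: upper bound on the number of candidate models.
# Every term (main effect, optionally every pairwise interaction) is
# either in or out of a model, giving 2^(number of terms) subsets.
count_candidates <- function(k, pairwise = FALSE) {
  n_terms <- k + if (pairwise) choose(k, 2) else 0
  2^n_terms
}

main_only_10  <- count_candidates(10)                  # 2^10
main_only_20  <- count_candidates(20)                  # 2^20, over a million
with_pairs_6  <- count_candidates(6, pairwise = TRUE)  # 6 + 15 = 21 terms
```

Each extra predictor doubles the count, and allowing pairwise interactions makes even six predictors exceed two million candidate models in this bound.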
@galan8115 2 years ago
I will put my grain of sand here. I would like to ADD that there are 2 cool arguments in this package that set a constraint on candidate models: minsize and maxsize, the minimal/maximal number of TERMS (main effects or interactions) to be included in candidate models. Useful when you don't want to populate your model with lots of variables.
@yuzaR-Data-Science 2 years ago
Great idea, but I am not sure you can't do that already. I never needed it. Please check out all the arguments in the glmulti description and create a feature request on the GitHub of glmulti.
@yuzaR-Data-Science 2 years ago
And thanks for watching!
@Y45HV1N 2 years ago
hello! If I may, I have a small question for the wrapper function part for lme4 models!
If I had several random effects (e.g., random_effect_1, random_effect_2, random_effect_3), and if I used your function, should I put random = c("+(1|random_effect_1)","+(1|random_effect_2)","+(1|random_effect_3)") ??
thank you
@yuzaR-Data-Science 2 years ago ⁺¹
Hey, sure. I did not try it yet. But off the top of my head I would say: random = "(1|random_effect_1)+(1|random_effect_2)" . However, I would not overdo the random effects. I have had strange experiences with them, like the model collapsing or not converging. I try to limit the random effects to only one, and usually only an intercept, although a random slope is more useful. The data do not always allow using a random slope fully. By the way, if you don't know Ben Bolker, check out his insanely useful post on GLMM: bbolker.github.io/mixedmodels-misc/glmmFAQ.html
@Y45HV1N 2 years ago
@@yuzaR-Data-Science thanks a lot for the response and the recommendations!!
@yuzaR-Data-Science 2 years ago
you are welcome!
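Since the wrapper code is cut off in several replies on this page, here is a hedged sketch of what such a fit function can look like, based only on the call visible in the error messages quoted in the comments (`paste(deparse(formula), random)`). The helper `build_mixed_formula` is a made-up name. In this sketch the random-effects string is pasted verbatim after the fixed part, so it has to carry its own leading `+`, and it is identical in every candidate model. `REML = FALSE` is passed only to `lme4::lmer()`; the `glmer` wrapper leaves it out, in line with the "unused argument (REML = F)" errors reported above.

```r
# Hypothetical helper: glue the fixed-effects formula proposed by the
# search onto a fixed random-effects string.
build_mixed_formula <- function(formula, random = "") {
  paste(paste(deparse(formula), collapse = " "), random)
}

# Linear mixed model wrapper (no family); requires the lme4 package
# to be installed when the function is actually called.
lmer.glmulti <- function(formula, data, random = "", ...) {
  lme4::lmer(stats::as.formula(build_mixed_formula(formula, random)),
             data = data, REML = FALSE, ...)
}

# Generalized version: the family is passed on through `...`.
glmer.glmulti <- function(formula, data, random = "", ...) {
  lme4::glmer(stats::as.formula(build_mixed_formula(formula, random)),
              data = data, ...)
}

# Several random intercepts go into ONE string, not a vector:
example_formula <- build_mixed_formula(
  y ~ a + b,
  "+ (1 | random_effect_1) + (1 | random_effect_2)"
)
```

A wrapper like this is typically handed to glmulti through its `fitfunction` argument, with `random = "..."` supplied alongside it; treat that call pattern as an assumption to check against the package documentation for your installed version.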
@yecodjidansou4272 1 year ago
Thanks for this video. I have a question. Can we apply glmulti to multinomial regression?
@yuzaR-Data-Science 1 year ago ⁺¹
Yes, absolutely. I have an example on my blog: yuzar-blog.netlify.app/posts/2022-05-31-glmulti/#some-exotic-applications-glmer-or-multinom
@yecodjidansou4272 1 year ago
@@yuzaR-Data-Science Thanks for your answer.
@yuzaR-Data-Science 1 year ago
@@yecodjidansou4272 sure! ;)
@КостадинКостадинов-ц8е 5 months ago
Great video!
@yuzaR-Data-Science 5 months ago
Thanks 🙏
@wenxuanjiang-v9t 1 year ago ⁺¹
soooo!! great!! thank u a lot
@yuzaR-Data-Science 1 year ago
You are welcome! Thanks for watching 🙏
@aram5704 3 months ago
Performance package says Objects of class `glmulti` are not supported model objects.
@yuzaR-Data-Science 3 months ago
yes, sadly that is the case, but it's not a problem, because I look at the best model and simply rewrite it in glm or whatever parent-model framework I use. by the way, I don't do tidymodels videos for the same reason: they don't go well with my favorite R packages. cheers
@caduguimaraes 1 year ago
Another amazing vid. Thanks
@yuzaR-Data-Science 1 year ago
Another thanks 🙏 from me😉
@abhisheksingh-od4qo 1 year ago
I am trying to use glmulti in R, but it shows an error that the function was not found. What can I do? I am applying this to a linear mixed model
@yuzaR-Data-Science 1 year ago ⁺¹
There are two ways I found on the internet to make glmulti work with glmer. One is:
glmer.glmulti
@abhisheksingh-od4qo 1 year ago
@@yuzaR-Data-Science thank you sir, but is this applicable to a glm model or an lm model?
@abhisheksingh-od4qo 1 year ago
@@yuzaR-Data-Science I am using a linear mixed model
@yuzaR-Data-Science 1 year ago ⁺¹
both and more ... you can specify the family of models, like in the code above
@yuzaR-Data-Science 1 year ago ⁺¹
like the code above too, don't specify the family there and use lmer instead of glmer
@AdrianEcology 1 year ago
Thank you so much for this video! I tried to include a random effect using the script you used but it didn't work and I can't find anything in the package's documentation to help. Could I email you?
@yuzaR-Data-Science 1 year ago
No worries! I have already used mixed effects models with a random effect. It needs an extra function, which I actually found on the internet. But here is the general form:
glmer.glmulti
@galan8115 2 years ago
Thank you for sharing this awesome package and making this tutorial! Regarding the multinomial (actually binomial) model predicting the "jobclass", is there a way to extract the ROCs and AUCs (and plot them)? Because I think it would be really interesting for binomial/multinomial classifications (which tend to be the analyses people do in medical fields)
@galan8115 2 years ago
Just seen the post on the blog with the section for multinomial (but the ROC question stays). Thank you for all your work, sir.
@yuzaR-Data-Science 2 years ago ⁺¹
I use glmulti only for finding the best models, and then, when I know what the predictors are, I go and redo the model in native code, for example glm() or nnet::multinom(); then it is possible to extract the ROC. For the multinom you might try tidymodels, which, I think, can produce the ROC.
@bestcongressmoneycanbuy9704 2 years ago ⁺¹
Great presentation on glmulti. However, is there an option in glmulti that eliminates highly collinear terms during fitting? If not, it would be nice if that functionality were added so as to pick the best model that didn't exceed a user-specified VIF. For two-way interaction models, I typically use standardized main effects terms so as not to induce collinearity with added two-way interactions.
Also, glmulti doesn't appear to leave both main effect terms in the model for each significant two-way interaction term regardless of main effect significance, which is what's typically done in best practices. Is there a way to force glmulti to keep these main effect terms?
@yuzaR-Data-Science 2 years ago ⁺¹
Hey hey, thanks for the feedback. Sorry for the late reply, I was on vacation. First, no, glmulti does not have any collinearity check, as far as I know. I usually check that before variable selection, so that all the predictors are sane. VIF is a good option with all the predictors; then I usually remove the predictor with the highest VIF, remodel, etc. Secondly, I then trust the two-way interaction results from glmulti even if the VIF for interactions is high. Lastly, I use glmulti for variable selection, but then remodel the end result with a classic function, like glm or glmer or lm ... you get the idea. There I only write the interactions and R gets the main effects by default. Besides, the end result of glmulti is an object containing several models; you could get the best one, but it's not too intuitive to get the estimates and p-values or to visualize predictions, and that's why I remodel the best model and get all I want from it. Maybe not too convenient, but glmulti often gets a better result than backwards or forwards selection. Hope that helps! Cheers.
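The "remove the highest-VIF predictor, remodel, repeat" routine described in this reply can be sketched in base R. The sketch assumes numeric predictors only and computes each VIF as 1 / (1 - R²) from regressing one predictor on the others. The helper names, the `mtcars` example and the threshold of 5 are illustrative assumptions, not the author's settings.

```r
# VIF of each predictor: regress it on the remaining predictors and
# convert the resulting R-squared. Numeric predictors are assumed.
vif_values <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Drop the predictor with the highest VIF until all remaining ones
# fall below the threshold (or only one predictor is left).
prune_by_vif <- function(data, predictors, threshold = 5) {
  while (length(predictors) > 1) {
    v <- vif_values(data, predictors)
    if (max(v) < threshold) break
    predictors <- setdiff(predictors, names(which.max(v)))
  }
  predictors
}

kept <- prune_by_vif(mtcars, c("cyl", "disp", "hp", "wt", "drat", "qsec"))
```

The predictors left in `kept` would then be the ones handed to the model search. This purely mechanical version ignores subject-matter knowledge about which of two collinear variables is the more useful one to keep.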
@buraktiras93 2 years ago
This is great, thanks! But how can we name the best performing model? I mean, is it logistic regression, random forest or KNN etc.?
@yuzaR-Data-Science 2 years ago
It depends on the family you use. Family binomial means logistic regression, Gaussian means linear regression, etc. I used it with glmer, also with the binomial family. Poisson and other classic inference models should also work. I did not use it with ML models like KNN or RF. Please let me know if you find a way
@aravind460 2 years ago
Hi, thanks a lot for the video, it was very informative. One doubt: what if we use both the backward and forward selection methods together? That is something I have commonly seen, where we give the argument “both” rather than “backward” or “forward” in the stepAIC function. Will it be a better approach, like glmulti?
@yuzaR-Data-Science 2 years ago
Yes, you can use "both" for sure, but I stopped using it one day, because it mostly produced a result identical to the backwards selection. Sometimes to the forwards. And very rarely, if at all, something completely new. Anyway, the principle remains different from glmulti: glmulti evaluates all models, while stepwise selection does not. For instance, I will now hardly use stepwise selection since there is the brute-force approach, except in cases with too many predictors, or if some type of model is not supported by glmulti. Cheers
@aravind460 2 years ago
@@yuzaR-Data-Science Got it, Thank you for the response 👍.
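To check the "both usually equals backward" observation on your own data, the two directions can be run side by side. This sketch uses `step()` from base R's stats package rather than `MASS::stepAIC()`, which is a swap made here to avoid extra packages; the `mtcars` model is only an illustrative example.

```r
# Full model as the starting point for both searches.
full_model <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

# trace = 0 suppresses the step-by-step printout.
both_model     <- step(full_model, direction = "both", trace = 0)
backward_model <- step(full_model, direction = "backward", trace = 0)

# Compare which terms each search kept.
both_terms     <- attr(terms(both_model), "term.labels")
backward_terms <- attr(terms(backward_model), "term.labels")
same_selection <- setequal(both_terms, backward_terms)
```

Either way, a stepwise search only visits models along one path, whereas the exhaustive approach discussed in the video scores every candidate.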
@mayurwabhitkar2041 1 year ago
can we use the logistic regression model via glmulti?
@yuzaR-Data-Science 1 year ago
yes, it's definitely possible, you just have to specify the family and you are good to go
@mayurwabhitkar2041 1 year ago
@@yuzaR-Data-Science can you make a video on functional response analysis?
@yuzaR-Data-Science 1 year ago
@@mayurwabhitkar2041 hey man, we do not use FRA in medicine, so I cannot promise that I'll come up with it very soon. I put it on the list though, since I always look for new ideas. Thanks
@abhisheksingh-od4qo 1 year ago
Hello sir, can you suggest how to make a multivariate linear mixed model with multiple fixed and random effects? Can I make it using the lme package or any other package? Please suggest
@yuzaR-Data-Science 1 year ago ⁺¹
Sure, you have to use the "fit-function", here is the code example. But, please, don't just use it, but adapt it to your data and your needs, otherwise it won't work:
glmer.glmulti
@abhisheksingh-od4qo 1 year ago
@@yuzaR-Data-Science and Sir, in the case of multiple responses do we do the same?
@yuzaR-Data-Science 1 year ago ⁺¹
@@abhisheksingh-od4qo for multiple response variables I would do multiple models
@abhisheksingh-od4qo 1 year ago
@@yuzaR-Data-Science thank you sir, it's a bit hard to derive a multivariate linear mixed model with multiple responses and multiple fixed and random effects taken together. I have tried all the packages, but it creates problems. Can you suggest which package is the best?
@milliontesfaye 2 years ago
Thank you for this great video!
I tried to select the best model using the glmulti function, particularly using the genetic algorithm method, but the results are not reproducible. How can I fix that in the glmulti function?
Thank you in advance for your time
@yuzaR-Data-Science 2 years ago
Thanks for the feedback, mate! Sorry for the delay with the answer, I was on holiday.
Now, I personally do not use the genetic algorithm, because my datasets aren't that big. Having said this, I tried to stabilize the output with set.seed(1), but could not. It still sometimes produced a different result. However, the model would still be good, no matter which. So, if you can avoid using the genetic algorithm, do not use it, but if you are forced to do so, you have to kind of accept the result, or run it several times, compare, let's say, the 10 best models of every output, and take the model which came out in the most runs.
@yuzaR-Data-Science 2 years ago
by the way here is the code I used to compare two results of a genetic algorithm:
library(glmulti)
set.seed(1)
test_g
@milliontesfaye 2 years ago
Many thanks for your reply, I will try to use an exhaustive screening method even if it takes longer, but I think the results are reproducible.
@milliontesfaye 2 years ago
Thank you for your time, I will try to use it.
@yuzaR-Data-Science 2 years ago ⁺¹
@@milliontesfaye yes, with exhaustive screening it is reproducible for sure; still, don't forget to use set.seed()
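The "run it several times and take the model that came out in the most runs" advice can be tallied with a few lines of base R once the top formulas of each run are collected as text. The three hard-coded runs below are made-up placeholders; with real results, each vector would hold the best formulas taken from one genetic-algorithm run.

```r
# Placeholder data: top formulas from three separate runs (invented).
runs <- list(
  c("y ~ a + b", "y ~ a",     "y ~ a + b + c"),
  c("y ~ a + b", "y ~ a + c", "y ~ a"),
  c("y ~ a + b + c", "y ~ a + b", "y ~ b")
)

# Count in how many runs each formula appears (at most once per run).
appearances <- sort(table(unlist(lapply(runs, unique))), decreasing = TRUE)

# The formula that shows up in the most runs.
most_stable <- names(appearances)[1]
```

Formulas are compared as plain text here, so the same model written with its terms in a different order would be counted separately.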
@hashimalikhan775 1 month ago
Please make a video on GEE too.
@yuzaR-Data-Science 1 month ago ⁺¹
Great suggestion! It's already on my list! Please, stay tuned
@arseneadou9602 2 years ago
You saved me!!!
@yuzaR-Data-Science 2 years ago
You are welcome!
@cyruskavwele5304 2 years ago
Masterstroke for statisticians.
@yuzaR-Data-Science 2 years ago
Thanks for the feedback!
@hansmeiser6078 2 years ago
When I feed glmulti with the following formula...
Start: AIC=35.2
y ~ n1Cat_8Lag1_x_n1Cat_24Lag1 + n1Cat_5Lag1_x_n1Cat_6Lag1 +
n1Cat_3Lag1_x_n1Cat_6Lag1
It gives me:
*Best model: y~1*
...This is not the original formula. What does it mean?
@yuzaR-Data-Science 2 years ago
It means you only get the intercept value, which is really just the mean of the outcome.
y
@hansmeiser6078 2 years ago
@@yuzaR-Data-Science Thank you for the answer, Yuri. Could it also mean that there is a relationship between outcome and predictors, but it is not linear and therefore not relevant for glmulti?
@yuzaR-Data-Science 2 years ago
No, the opposite is true - there is absolutely no relationship, because there are no predictors in "y ~ 1", it's just a response. Linear or not is a completely different story. I never tried glmulti with non-linear models, like gam. If you do, please let me know whether it worked. Cheers
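The statement that "y ~ 1" returns just the mean of the outcome can be verified directly with a linear model in base R. The numbers below are arbitrary illustration data.

```r
# Arbitrary outcome values for illustration.
y <- c(3, 5, 8, 4, 10)

# Intercept-only linear model: no predictors on the right-hand side.
intercept_only <- lm(y ~ 1)

estimated_intercept <- coef(intercept_only)[["(Intercept)"]]
outcome_mean        <- mean(y)   # (3 + 5 + 8 + 4 + 10) / 5 = 6
```

For a Gaussian model the two values coincide; with other families the intercept sits on the link scale, for example the log-odds of the overall proportion in a logistic model.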
@hansmeiser6078 2 years ago
@@yuzaR-Data-Science The thing is, the predictors are actually the top variable importances of xgboost (tree-based, though with a different sample size). I will test it with a gam wrapper function too, and would guess one could use lasso/glmnet though.
@yuzaR-Data-Science 2 years ago
@@hansmeiser6078 sounds good, but I am still hesitant to use too much ML because I do inferential stats for a living. So, every time I have some cool ML result, I wonder how to interpret it, make it useful and publish it in a scientific journal.
