statsguidetree
Joined May 3, 2006
The aim is to provide tutorials, insights, and discussions on topics related to applied statistics.
Random Forest in R for Classification and Regression
☕If you would like to support, consider buying me a coffee ☕: buymeacoffee.com/statsguide8
For one-on-one tutoring/consultation services: guide-tree-statistics-consultation.square.site
I offer one-on-one tutoring/consultation services for many topics related to statistics/machine learning. You can also email me at statsguidetree@gmail.com
For R code and dataset: gist.github.com/musa5237
In this video I go over conducting a random forest analysis in R. I provide a brief description of what random forest is, the R packages/functions used, how to handle missing data, how to interpret the output, classification and regression examples, and also discuss strengths and weaknesses. I use the Pokemon dataset.
Views: 133
Videos
Propensity Score Analysis in R with Nearest Neighbor, Optimal Pair, and Optimal Full Matching
15K views · 3 years ago
What is P-Hacking & P-Values
631 views · 3 years ago
Data Splitting using Cross Validation and Bootstrap in R
2K views · 3 years ago
Multidimensional IRT and DIF in R with mirt
10K views · 3 years ago
Estimate Reliability in R with Alpha, Omega, and Kappa
6K views · 3 years ago
Reduce Test Length with IRT models in R
870 views · 3 years ago
Multilevel Modeling in R Predicting NYC Vaccination Rates
428 views · 3 years ago
Logistic Regression with Variable Selection and Categorical Data Analysis in R
14K views · 3 years ago
General Comparison of Psychometric Models IRT & CDM
1.2K views · 4 years ago
IRT GRM model and DIF for Ordinal Polytomous data in R
8K views · 4 years ago
Differential Item Functioning DIF in R with IRT & non-IRT (Detecting Bias Items)
4.7K views · 4 years ago
Tutorial on how to detect Differential Item Functioning (DIF; i.e., biased items) in R with the difR package. The DIF methods reviewed are the non-IRT MH method (Mantel & Haenszel, 1959) and the 2PL IRT Lord's chi-square method (Lord, 1980). The former can ...
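As a minimal sketch of the two methods named above — using difR's bundled verbal dataset, which is my choice of example data, not the one from the video:

```r
library(difR)

data(verbal)                 # example data bundled with difR
resp  <- verbal[, 1:24]      # 24 dichotomous item responses
group <- verbal$Gender       # grouping variable (0 = reference, 1 = focal)

# Non-IRT Mantel-Haenszel DIF detection
mh <- difMH(Data = resp, group = group, focal.name = 1)

# Lord's chi-square test under a 2PL IRT model
lord <- difLord(Data = resp, group = group, focal.name = 1, model = "2PL")

mh; lord   # each printout lists the items flagged for DIF
```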
Hierarchical Clustering in R for UFC Fighter Analytics
210 views · 4 years ago
IRT Models (Rasch, 2PL, & 3PL) in R with ltm package
10K views · 4 years ago
MC-DINA (CDM) model in R with GDINA
699 views · 4 years ago
DINO (CDM) model in R for Depression Symptoms
226 views · 4 years ago
What a wonderful well-explained video! I'm so thankful to you for creating this channel. I have a question. My scale is a four-point Likert scale, starting from 0 to 3. When I do the plotting I only get the monotonic function for the first-point=0 but not for the last=3. So I am wondering if I need to adjust my function (to tell the program it is a 4-point scale instead of 5) or if it is a plotting issue. Thanks for any advice!
Can you please tell me how to select outcome means, and on what basis?
By "select outcome means," are you referring to checking balance and the ranges I used — a standardized mean difference between -.1 and +.1, and variance ratios between .8 and 1.25? Those ranges are general recommendations. Zhang et al. (2019) suggested something similar.
Thank you so much for this amazing tutorial. Should I run GoF.rasch to assess the Rasch model?
Excuse the delayed response. The GoF.rasch() function can also be used to assess your Rasch model. I used the anova() function to assess the Rasch model; you would use anova() when you have two nested models and want to compare them with a likelihood ratio test, to see whether the more complex model improves fit (e.g., comparing a Rasch model to a 2PL model). However, if you only want to assess one model (e.g., a single Rasch model), GoF.rasch() will compare your observed data to a simulated dataset based on what you would expect to see if your Rasch model were true (i.e., expected values). On that basis, a significant p-value would suggest a poorly fitting model.
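A minimal sketch of both checks, using ltm's bundled LSAT data (the dataset choice is mine, for illustration):

```r
library(ltm)

data(LSAT)                   # 5 dichotomous items bundled with ltm

fit_rasch <- rasch(LSAT)     # Rasch model: common discrimination
fit_2pl   <- ltm(LSAT ~ z1)  # 2PL model: item-specific discriminations

# Likelihood ratio test for the nested pair: does the 2PL improve fit?
anova(fit_rasch, fit_2pl)

# Parametric bootstrap goodness-of-fit for the Rasch model alone;
# a significant p-value suggests poor fit
GoF.rasch(fit_rasch, B = 199)
```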
Incredibly helpful and clear.
Hi... I'm interested in the MC-DINA model. Is there an MC version for the DINO or Bug-DINO model?
Just to clarify, the first covariate is the same as your dv?
Very good guide! Thanks
The R code to load and clean the dataset and to run the analysis is available on my GitHub: gist.github.com/musa5237/2e41fa4ec8fe36b34d374e0879523754 Some additional updates to the video: 03:06 -- To load a csv into your environment, the code should be read.csv(file.choose(), sep=',', header=T) 03:31 -- Type.1 is the pokemon type (e.g., water, fire, etc.) not the name. I left the name out of the dataset with select(). 29:43 -- I should clarify that the mean decrease in accuracy and mean decrease in gini agreed only on which feature is most important. 41:57 -- The hash notes should read default arguments for regression, not classification.
excellent tutorial i watched 3x
thank you sm!!
Very informative. Can I use MIRT for an instrument with more than 2 dimensions? I am working on validating a research instrument that is polytomous with 4 dimensions but 1 latent trait.
Thank you very much for the video. I have a question regarding the ability scale/x-axis in the plots. It always ranges from -4 to 4. Is there a way to change or rescale it, e.g., to -3 to 3, so that all values (also the extremity parameters and the discrimination parameters) correspond to the new scale (-3 to 3)?
Hmm. I haven't tried adjusting the horizontal/x scale for this specific plot. But generally adding the xlim argument in the plot() function could work, e.g., plot(mod2, xlim=c(-3,3)). Hopefully that will work.
Thank you for your great video on DIF with ‘mirt’. I managed to do DIF analysis with your detailed instructions. I have a few questions, though. I’ll be very thankful if you answer them. 1. When we examine DIF, we are basically interested in the item difficulties across the two groups. I don’t follow when the manual says, “Determine whether it's a1 or d parameter causing DIF”. How can the slope parameter cause DIF? Isn’t DIF a property of the intercept? 2. What is the ‘RMSD_DIF’ function? How is it used and interpreted? 3. I’m working with large scale international educational data where around 50 countries take part. I need to examine country DIF. I know that in this context we don’t do pairwise comparisons, but we estimate a pooled international ICC (all countries combined) and then compare the country ICCs with the pooled international ICC. Can you please provide the codes for this? Thanks for your help in advance. Afshin
Would it be correct to say that while the distractors should have 0 in at least one of the three attributes, the key (correct option) should have mastery of all three attributes? I don't think that the key should have 0s in any attributes.
I've got nominal data (dependent variable outcome of 0/1/2); how do you run LOOCV on a multinom model? Any help is appreciated.
I noticed the [method = "glm" ] was used in the LOOCV method, but what if you have a nominal dependent variable [outcome of 0/1/2], how can we run LOOCV on that? Any help is appreciated.
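One way to do this with caret is to declare the outcome as a factor and use method = "multinom" (which wraps nnet::multinom); a sketch on simulated data, not the video's dataset:

```r
library(caret)

set.seed(1)
df <- data.frame(y  = factor(sample(0:2, 150, replace = TRUE)),  # 3-class outcome
                 x1 = rnorm(150),
                 x2 = rnorm(150))

fit <- train(y ~ x1 + x2, data = df,
             method    = "multinom",                # nnet::multinom under the hood
             trControl = trainControl(method = "LOOCV"),
             trace     = FALSE)                     # suppress nnet's iteration log
fit
```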
Very informative video. I am trying to train an RF model where I have 40+ independent variables. I am currently using k-fold CV with 3 repeats, and it is taking a lot of time. How can I reduce the model training time? I am afraid that if I use the bootstrap method, it may take even longer... 2-3 days! Any suggestions?
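A few things that usually help: fewer folds and no repeats, a smaller tuning grid, fewer trees, and parallel workers. A sketch with caret + doParallel — the outcome/data names and core count are placeholders, not from the video:

```r
library(caret)
library(doParallel)

cl <- makePSOCKcluster(4)          # set to the number of cores you can spare
registerDoParallel(cl)

ctrl <- trainControl(method = "cv", number = 5)   # drop the repeats
grid <- expand.grid(mtry = c(6, 13, 20))          # small, targeted grid

# fit <- train(outcome ~ ., data = your_data, method = "rf",
#              ntree = 250, trControl = ctrl, tuneGrid = grid)

stopCluster(cl)
```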
How about real data — how do you plug real data into the command as the Q-matrix?
Hi, thank you for the video. I loaded the coll dataset from the link you pinned and then ran the script from "identify field names" to "adjust units for continuous variables". After running, all values in coll become NULL and coll2 becomes 0 obs. of 6 variables. What should I do?
And also, at line 136 (#no psa, just regression), if I run mod_test1<- glm(selective ~ MEDIAN_HH_INC+ STEM+ PCTPELL+ UG25ABV, data = coll2, family="binomial"), it says Error in model.matrix.default(mt, mf, contrasts) : variable 1 has no levels
I suppose the problem is here at line 22: coll<-subset(coll, PREDDEG == 3 & MAIN == 1 & (CCUGPROF==5|CCUGPROF==6|CCUGPROF==7| CCUGPROF==8 |CCUGPROF==9 |CCUGPROF==10 | CCUGPROF==11 |CCUGPROF==12 | CCUGPROF==13 |CCUGPROF==14 |CCUGPROF==15), select=c(selective, MD_EARN_WNE_P10, MEDIAN_HH_INC, STEM, PCTPELL, UG25ABV)) After running this code, it gives null values for coll. Please, can you guide me on how this can be fixed?
this is such a terrific video. thanks. I wish that you would have more similar pieces. I have 2 questions: a) I am trying to examine DIF between 4 latent classes. Can the command: sex_a<-subset(sex_a,(group ==1 | group == 2| group == 3| group == 4), select = c(A1:A11,group)) be followed by: plot(genDIF, labels = c('1', '2', '3', '4')) to give me 4 groups plotted for each item. It seems to limit me to 2 groups (as in your example). b) genDIF <- lordif(sex_a[,1:11],sex_a[,12], criterion = 'Chisqr', alpha = 0.01): Error in collapse(resp.data[, selection[i]], group, minCell) : items must have at least two valid response categories with 5 or more cases. I know that there are more than 5 cases per group. Thanks again
What an amazing explanation!!! Hats off. You even provided the R-script. Super helpful! You saved my thesis, thank you so very much.
genDIF <- lordif(IPC[,1:7],IPC[,'Gender'], criterion ='Chisqr', alpha = 0.01) Iteration: 34, Log-Lik: -5580.797, Max-Change: 0.00009 (mirt) | Iteration: 1, 5 items flagged for DIF (1,2,4,6,7) Error in `vec_equal()`: ! Can't combine `..1` <character> and `..2` <double>. Run `rlang::last_trace()` to see where the error occurred. This happened after running this code. Can you help?
You may need to change the gender field to a different data type. Is the gender field in your dataset a character? If so try converting it to a factor or numeric.
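For example — assuming the commenter's data frame is called IPC, as in their code:

```r
str(IPC$Gender)                    # check how the grouping column is stored

# lordif expects a grouping vector it can treat as categorical;
# convert a character column to a factor (or to numeric codes)
IPC$Gender <- factor(IPC$Gender)
# or: IPC$Gender <- as.numeric(factor(IPC$Gender))
```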
Can you help me interpret the interaction term? logit(DEATH_EVENT)=−1.698+0.0385×age+0.8267×serum_creatinine−0.0006520×ejection_fraction×time
Generally the interaction term would be interpreted as: the effect ejection fraction has on death is conditional on values of time, controlling for the other variables in the model. When you include interactions, it is often also a good idea to include the main effect of each variable in the model. In addition, to make interpretation easier, you can center each variable before multiplying them together to form the interaction. Here is a good resource for working with interactions that goes into more detail: www.google.com/url?sa=t&source=web&rct=j&opi=89978449&url=www3.nd.edu/~rwilliam/stats2/l55.pdf&ved=2ahUKEwjGvaaX5vSCAxWgmIkEHah6AfcQFnoECCUQAQ&usg=AOvVaw3KaKU8apAO-VaPq4RXqmYS
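A sketch of centering before forming the interaction, assuming a data frame called heart with the variable names from the comment above:

```r
# center the variables that enter the interaction
heart$ef_c   <- heart$ejection_fraction - mean(heart$ejection_fraction)
heart$time_c <- heart$time              - mean(heart$time)

# '*' expands to both main effects plus the ef_c:time_c interaction
mod <- glm(DEATH_EVENT ~ age + serum_creatinine + ef_c * time_c,
           data = heart, family = binomial)
summary(mod)
```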
Please, how do you do step() with glmer? Is there a package?
I did the first step (design phase: selecting covariates) but only 3 out of 14 are significant. I want to know if it is considered balanced or not, and what to do.
Whether covariates are significant is not related to whether the values of those covariates are balanced across treatment conditions. To check balance, you have to look at standardized mean differences and/or variance ratios and see whether they fall within whatever threshold you decide to use.
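One convenient way to get both diagnostics at once is cobalt::bal.tab() on a matchit() result — m.out here is a placeholder for your matched object — with the thresholds mentioned in the thread:

```r
library(cobalt)

# m.out is the object returned by MatchIt::matchit()
bal.tab(m.out,
        stats      = c("mean.diffs", "variance.ratios"),
        thresholds = c(m = 0.1, v = 1.25))   # SMD within ±0.1, VR near 1
```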
Do you include both the quadratic and non-quadratic terms in your propensity match? For example, if my quadratic term had a lower SMD, should I remove the non-quadratic term and just include the quadratic one in my final model?
This depends on your data, the type of relationships you want to capture, and what makes sense for the data you are working with. If you have both a quadratic term and a linear term for your explanatory variable in the model, you are saying that the relationship between your response and the explanatory variable has both quadratic and linear components (i.e., your model captures both); keeping just the quadratic term says the relationship is purely quadratic. Generally, if you want to capture a wider scope of relationships you can leave both in, but be mindful this could lead to overfitting.
What about SEM? Can we do it in this package?
I am not too familiar with that package. There may not be a specific function for IRT models, but it may have a more roundabout way of estimating IRT parameters.
I needed to update the rcode to load and clean the dataset to get the data ready for the analyses. Please use the updated rcode here to follow along: gist.github.com/musa5237/78a694bd6663a92a82e45e684e616724
I've watched many tutorials explaining propensity score matching on YouTube, and I can tell that this video is the best I've ever seen. Well done, sir. You helped me a lot. ❤❤❤❤
It was super helpful! Truly appreciate your clear demonstration and explanation. I have a quick question: if there are missing data in responses, how does the CDM package impute these missing data? Looking forward to hearing from you!
The data doesn't work anymore!
My apologies for the delayed response; you can use the following code to load it into R: coll<-read.csv("gist.githubusercontent.com/musa5237/78a694bd6663a92a82e45e684e616724/raw/132430c291f72fc20a7df0ba951e9ce6a77e4902/Most%2520Recent%2520Cohorts%2520All%2520Data%2520Elements",sep=",", header=T)
when you type IRTpars = TRUE, simplify = TRUE in the coef() command, you will receive the actual threshold parameters b. The intercepts cannot be interpreted properly.
Thank you for the video. Please, I have a question about "factor.scores()": it doesn't work for me because I receive an error about a missing argument "f". Do you have any solution to this problem?
Thank you very much for a great video! What is the link between the MH chi-square statistic and the effect size (A-C)? In my dataset the item with the largest MH chi-square statistic (629.49) has an effect size in category "B", while another item has an MH chi-square statistic of 56.14 and is in effect size category "C". I hope you can shed light on this mystery for me :)
Amazing videos! Just a question: I have a questionnaire with a 5-point Likert scale. Shall I use the Rasch or GRM model?
For likert scale data you generally want to use a GRM.
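A minimal GRM fit with mirt — resp stands in for a data frame of Likert item responses, a placeholder name:

```r
library(mirt)

# resp: data frame of ordinal (e.g., 5-point Likert) item responses
fit_grm <- mirt(resp, model = 1, itemtype = "graded")

coef(fit_grm, IRTpars = TRUE, simplify = TRUE)   # discriminations and thresholds
plot(fit_grm, type = "trace")                    # category response curves
```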
Thank you very much for this clear explanation! I have a small question: would you use PSM to match patients to healthy controls in a cross-sectional case-control study? I want to look at the difference in physical activity expressed in minutes per day (dependent variable) between these two groups. Thank you!
Yes, PSM should always work when you have a control group.
Thank you for a lecture. I would like to ask some additional questions: 1. Model fit - do you recommend to use M2 statistics and global fit indices such as CFI, TLI, RMSEA etc. as a measure of model appropriateness? Or for a model comparison? 2. What measures do you recommend for the decision whether to use a bifactor or a two-dimensional model? 3. How can you in the bfactor function specify the other polytomous IRT models? For example, generalized partial credit model - does it allow the argument itemtype="gpcm" such as in the case of mirt function? Thank you very much!
"uh" (the video is good, you just need to train yourself to avoid repeating that 'word')
Great video!!! What would you advise I do if I have more treated than control units, and which matching approach should I use if treatment is not randomized — take, for example, a state legislation?
You can try k-to-1 matching with optimization, or you can try full matching. You can run both and compare which gives you better balance across your covariates.
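With MatchIt, both approaches are a one-argument change — treat, x1..x3, and dat are placeholder names, not from the video:

```r
library(MatchIt)

# 2:1 nearest-neighbor matching on the propensity score
m_k1 <- matchit(treat ~ x1 + x2 + x3, data = dat,
                method = "nearest", ratio = 2)

# optimal full matching (uses every unit, no fixed ratio;
# requires the optmatch package to be installed)
m_full <- matchit(treat ~ x1 + x2 + x3, data = dat, method = "full")

# compare covariate balance across the two solutions
summary(m_k1)
summary(m_full)
```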
Very helpful, especially the logistic regression section. Thank you.
You have no idea how much you helped me with mirt. I used to play with other irt packages but this one is much more complex than others and often I get error messages that I don't when using other packages. Great thanks to you again, and for including bifactor model example (which I'm studying too). Rewatching your video, I noticed something. You added more d parameters than I typically found in other examples using mirt. You have d1-d4, which I assume is because of your dependent var having 5 categories? On this topic too, I would like to confirm one thing. Because in one psychological paper, I have seen the author testing for DIF using only which.par('c=a1') but not using 'd'. It wasn't explained so I wonder what's the difference? I believe 'a1' tests for only non-uniform DIF, while 'd' tests for uniform DIF, and 'a1'+'d' tests for both types of DIFs. If that's the case, would it be best to test for both types of DIFs by having 'a1'+'d'? I generally see no reason why you would test for just one type of DIF? what's your take on this?
Why do we suppress it to 0.25 when looking at the factors?
It's more or less just a convention for the sake of clarity of results. Personally, I prefer the 0.3 cut-off. This can be interpreted to mean that a factor with a loading of < 0.25 on a particular item can be considered a "factor of negligible influence". However, remember that this is an exploratory analysis, so we use this to estimate what the possible structure might be. In the confirmatory case, we explicitly assume that (if the particular item is an indicator of just one factor) the factor loading of the other factor is zero.
Thank you for this! Still, I am unsure which of these omegas should be reported in the methods section of a paper.
Deciding whether to report omega or omega hierarchical should depend on your instrument. If the underlying factor structure is a bifactor model, then you could report omega hierarchical to get an overall estimate of reliability despite the multi-factor structure. If not, you can report omega total. I linked another source: journals.sagepub.com/doi/full/10.1177/2515245920951747
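Both estimates come out of a single psych::omega() call — items stands in for a data frame of scale items, a placeholder name:

```r
library(psych)

# items: data frame of the scale's item responses
om <- omega(items, nfactors = 3)   # fits a bifactor-style solution

om$omega_h     # omega hierarchical: reliability due to the general factor
om$omega.tot   # omega total: reliability due to all common factors
```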
While I'm installing "MatchIt" it shows "there is no package called MatchIt". How do I solve it?
Hello, just saw your post. Did you run library(MatchIt) first without running install.packages("MatchIt")? I did not install it again because I had already installed it before. I kept that line in the code but put the hash sign # first so it was there as a note. Try running it without the hash sign.
@@statsguidetree Yes that's solved. Thanks!
Awesome job! Thank you for doing it, it's very helpful.
Thank you so much for this video! I hope I can get your consultation as I work on my analysis
Many thanks for the very instructive video. I am following your lectures. At the end the plot(genDIF, labels, etc etc) seems to automatically plot in a new device, opens new window, however only the 'last' plot is available, as it seems to overwrite all previous ones. I am using Windows 10 and R Studio. Having had a look at stackoverflow I don't seem to be able to find the answer. It is suggested that the plot function is intended to plot automatically in a new device, but this is not clear. Unfortunately, unless fixed this renders the whole plotting useless. Any ideas/suggestions?
I just ran into the same issue and haven't found a fix. Have you had any luck with this?
Thank you for the informative video. I did full matching based on your video and ran comparisons after propensity matching. But the means, standard deviations, and p-values did not change at all compared to the unmatched data. How can I solve this problem?
That is a good question, I assume you are talking about p-values in your final model post matching -- if that is the case, ultimately with PS matching you are attempting to just balance the data between your treatment and control groups to make more reliable interpretations of your final model. It could be that after balancing your data you find no average treatment effect.
Thank you for the great video.
For how many features does logistic regression work well? I have over 300 features; does logistic regression work, or is another model suggested? Thank you.
I do not see much of a limit; it is just that your run time will be longer the more features you have. You may want to consider reviewing your data for alike features, i.e., are there clusters of features in your dataset that all provide the same information?