Quantile Regression as The Most Useful Alternative for Ordinary Linear Regression

  • Published on 29 Sep 2024

Comments • 100

  • @transportation-talk
    @transportation-talk 1 year ago +5

    Lots of ideas packed in a short video! Thank you for creating useful content and providing R code.

  • @LeSaboteur3981
    @LeSaboteur3981 7 months ago

    Such a great, easy-to-understand explanation! Way too few views for that video. Thanks a lot!

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 7 months ago

      Thank you so much! Glad you enjoyed it!

  • @marinal2705
    @marinal2705 1 year ago +6

    This is such a great channel, I just started watching; I love that you run through the code quickly and present everything so well. Will keep tuning in to learn more!! Very interesting model, will try to implement in my practice.

  • @andycharles5127
    @andycharles5127 9 months ago

    Question: What is the minimum sample size for conducting quantile regression? I suspect this might be even more important when analyzing lower and higher quantiles where it is more likely to have fewer data points. Thx.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 9 months ago +1

      I am not sure whether there is such a thing as a minimum sample size for QR. With too little data the model might collapse, fail to converge, or produce huge confidence intervals. If you don't need p-values, go with Bayesian QR (the brm function from the brms package; it's in my article, link in the video description), which copes better with small sample sizes. Cheers
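
A minimal sketch of the Bayesian median regression mentioned in this reply, assuming the brms package and a working Stan toolchain are installed. The built-in mtcars data, the formula mpg ~ hp, and the sampler settings are illustrative assumptions and are not taken from the video. Fixing the quantile through bf() together with the asym_laplace() family is how brms expresses a quantile regression.

```r
library(brms)

# Bayesian median regression (quantile = 0.5) via the asymmetric Laplace likelihood.
# Data set, formula and sampler settings are assumptions for illustration only.
fit_bqr <- brm(
  bf(mpg ~ hp, quantile = 0.5),
  data    = mtcars,
  family  = asym_laplace(),
  chains  = 2, iter = 1000, seed = 1, refresh = 0
)

# Posterior summaries with credible intervals instead of p-values
fixef(fit_bqr)
```

Changing quantile = 0.5 to, say, 0.1 or 0.9 would target the tails, where small samples typically give much wider credible intervals.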

  • @alexandregareau9120
    @alexandregareau9120 1 year ago +1

    Does quantile regression require a greater sample size so that those splits have enough statistical power? And can quantile regression be combined with an interaction with a third variable?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago +2

      Good question! Yes, definitely, the more data the better QR works, especially at the thin ends of the distribution. With smaller data sets, my approach is to use only the median regression (tau = 0.5), and even that usually does a better job than OLS. For highly skewed distributions, e.g. with a lot of values near 0, you can choose to study only the lower quantiles. That's the flexibility, and I really love it ... don't know how I, as a statistician, lived so long without QR :)

    • @alexandregareau9120
      @alexandregareau9120 1 year ago +1

      @yuzaR-Data-Science Yes, it is quite simple and can give much more nuanced results. Just this week, I found a positive linear relationship between AGE and QUALITY OF LIFE, but realized that the linear effect was quite different across age groups, and even negative for the younger group. I used a grouping variable to find this, but with QR I could be more precise about the distribution of AGE. I remember learning this in graduate school but did not understand its usefulness at the time. Thank you for the amazing technical dissemination.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Cool, I am glad it is useful for more than just me :) Thanks for watching, mate!
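
The points raised in this thread (median regression with tau = 0.5, studying only the lower quantiles, and adding an interaction with a third variable) can be sketched as follows. This assumes the quantreg package is installed; the built-in mtcars data and the chosen variables are illustrative assumptions, not the data from the video.

```r
library(quantreg)

# Median regression (tau = 0.5); mtcars and mpg ~ hp are illustrative assumptions
fit_median <- rq(mpg ~ hp, tau = 0.5, data = mtcars)

# Study only the lower quantiles of the outcome
fit_low <- rq(mpg ~ hp, tau = c(0.1, 0.25), data = mtcars)

# Interaction with a third variable (here: transmission type)
fit_int <- rq(mpg ~ hp * factor(am), tau = 0.5, data = mtcars)

coef(fit_median)  # intercept and slope at the median
coef(fit_low)     # one column of coefficients per tau
coef(fit_int)     # includes the hp:factor(am)1 interaction term
```

With only 32 rows the extreme quantiles are estimated from very few points, which is the sample-size concern discussed above.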

  • @alijanbain2852
    @alijanbain2852 1 year ago +3

    Before watching the video, I just want to say thanks a lot for your amazing work and fabulous videos.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Thanks, Ali! That means a lot to me! Hope you'll like the video also after watching it ;) Feel free to give any feedback. And thanks for watching!

  • @ahmedmohamed-i4d2w
    @ahmedmohamed-i4d2w 1 month ago

    Professor, please:
    What are the steps and conditions for applying time series quantile regression?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 month ago

      Oh, that would be a good topic for another video! I do not use time series much yet, but I want to dive into it in the future. So please stay tuned! Thanks!

  • @LightInside-id1fm
    @LightInside-id1fm 5 months ago

    Honestly, I had an awful, cringy feeling starting from the first minute. The utter fakeness of the voice kills, even slaughters, otherwise decent content. The fashionable softness of a radio-host voice is a trap. Be yourself, stop following the hype, have an authentic voice.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 5 months ago

      😂😂😂 bro, what are you talking about??? 😂😂😂 that's my voice and the only one I have ... thanks for calling my content "decent"! appreciate that! cheers!

  • @ahmedmohamed-i4d2w
    @ahmedmohamed-i4d2w 2 months ago

    Thank you for this useful information. Where can I find the R code, please?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 2 months ago +1

      Thank you for watching. I only send code to members of the channel. So, if you wish, hit the "join" button and ask for the code in the comments section of the video you want the code for. Cheers. But of course it's not a must: you can just pause the video and write down the code at any time for free!

  • @SabinaMonti
    @SabinaMonti 5 months ago

    Very easy to follow, thank you. If you could share the R code, that would be great! (Please note that the provided link to the code is not currently functioning.)

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 5 months ago

      Thank you, Sabina, for your positive feedback! Unfortunately my blog was blocked due to increased traffic. They want me to pay for it. I refuse since I do it for free. I’ll try to reopen it ASAP with a free alternative, but in the meantime please just rewatch the videos, because my blog is the script for them, so you won’t miss anything. However, if you want to get the R code now, consider joining my channel as a member, because I have already published the code for members in the community posts. Cheers

  • @ahmedmohamed-i4d2w
    @ahmedmohamed-i4d2w 1 month ago

    What is meant by simplex methods in quantile regression?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 month ago

      I don't have the word "Simplex" in my script. What do you mean?

  • @Charlotte_lpy
    @Charlotte_lpy 4 months ago

    QR seems similar to heterogeneity analysis of OLS, doesn't it?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 4 months ago +1

      Well, they both try to solve problems with unsatisfied assumptions, but I have not used heterogeneity analysis so far, while I love QR, which not only solves problems but also delivers lots of cool results ;)

  • @manny1manito2
    @manny1manito2 1 year ago +1

    Amazing, just what I was looking for. Thank you for this video!

  • @jalepezo
    @jalepezo 6 months ago +1

    Amazing explanation, bro! Best regards from Peru!

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 6 months ago

      Glad you liked it! Thank you for watching!

  • @zane.walker
    @zane.walker 1 year ago +1

    Well, that Nobel Prize is one step closer! Seriously, very impressive. I wonder why I haven't come across quantile regression in the past? Definitely something I will consider in the future.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Thanks, Zane! Yeah, totally, I used robust methods, bootstrapping, log-transforms and other "tricks" to survive in a statistical way ... and I was looking for a median regression for a long time, and somehow did not come across QR. Now I'll use it almost always instead of OLS, because OLS often does not satisfy its assumptions. Thanks for watching. Cheers

  • @dr.barunbiswas7132
    @dr.barunbiswas7132 5 months ago

    Love your videos. Great explanation. Btw, your blog post webpage is not working.😢 Please resolve.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 5 months ago

      Thank you, Dr., for your generous feedback! Unfortunately my blog was blocked due to increased traffic. They want me to pay for it. I refuse since I do it for free. I’ll try to reopen it ASAP with a free alternative, but in the meantime please just rewatch the videos, because my blog is the script for them, so you won’t miss anything. However, if you want to get the R code now, consider joining my channel as a member, because I send the code to members. Cheers

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 5 months ago

      Hi Dr.Barunbiswas, thanks for becoming a member! I posted the whole R code for the members in the community posts. Let me know whether you can access it. Kind regards!

  • @bartoszkedziora3256
    @bartoszkedziora3256 3 months ago

    Absolutely amazing

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 3 months ago

      Thanks so much 🙏 hope you enjoy other topics too!

  • @StregAnders
    @StregAnders 1 year ago +1

    I am absolutely going to try this out the next time I need to do some regression. Thanks a lot for all these amazing videos. I feel like I learn more and more about statistics every time I watch.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Glad my content is useful! Thanks for the nice feedback! And thanks for watching!

  • @gotobed7456
    @gotobed7456 1 year ago +1

    This is such a great video, thank you!!!

  • @eimienwanlanibhagui4859
    @eimienwanlanibhagui4859 21 days ago

    Thank you for your video. I have a question. What happens when visual/qualitative and quantitative/numeric inspections of regression assumptions disagree? The residuals, std. residuals, and sqrt(std. residuals) plots are all not horizontal (they show a pattern), yet the BP test (check_heteroscedasticity()) reports p > 0.05. Which one should you trust, given that it is suggested to perform both visual and numerical checks?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 16 days ago

      Good question! You'll rarely meet all the assumptions perfectly. Visual diagnostics are more important than tests with their p-values. Take the Shapiro-Wilk normality test: with lots of data it becomes significant even for tiny, practically irrelevant deviations from a normal distribution. You might then remove outliers, transform your data or even consider a different type of model, like QR or GAM.

    • @eimienwanlanibhagui4859
      @eimienwanlanibhagui4859 16 days ago

      @yuzaR-Data-Science Thank you. I read in the comments of this video, or of the QR one, that you don't transform data because of the difficulty in interpretation. But in the literature, almost everyone does a data transformation of some sort: log(x), log(x+1), Box-Cox, square root, 1/x, etc. I have seen only a few studies that back-transformed the data to its original scale and then applied a bias-correction technique. I think I will explore QR, GAM, and one I came across just recently: MARS, Multivariate Adaptive Regression Splines. It is quite interesting that a non-normal dataset subjected to regression can comply with the regression assumptions in quantitative tests yet fail the visual checks, especially the one for homogeneity of variance. Well, my sample size is < 30 (28 or 29), and I think a sample size of 30 is the minimum often recommended for adequate statistical power. The variables in the data are environmental and are known not to be normally distributed. Finally, these variables represent mean values.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 16 days ago

      Sure, most of the literature I know (and I by far do not know all of it!) says you can transform, but does not go on to explain that you need to transform back, or any other next step. MARS is interesting, I'll look it up, thanks for the tip. if you have

    • @eimienwanlanibhagui4859
      @eimienwanlanibhagui4859 16 days ago

      @yuzaR-Data-Science Thank you once again. The variables are not normally distributed. Histograms show that, and both the Shapiro-Wilk and Anderson-Darling tests confirmed it. Then why LM? You have to follow orders 😁. There are high and low values of the dependent variables in the data which cannot be removed. The data has been aggregated [mean] to a study location [N = 28]. That is why I want to explore QR, which I learnt about from your video. I had thought about GLM and GAM prior to your comment on the latter. By the way, let me thank you for your video on the R package that allows you to compare the performance of models, the one that enables you to do a spider plot of the model evaluation metrics. Great work 🙌.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 16 days ago

      Thanks again, mate! I greatly appreciate your feedback. Yeah, I know those stupid orders you have to follow :) One of my favorites: "we've always done it like that". Brrr, goosebumps. QR might help, but it's pretty data hungry. So don't give up on QR when it produces huge CIs or if some quantiles do not work. I am using it for my next paper, and the results it delivers are massive and insightful. Maybe try some simpler non-parametric methods (Mann-Whitney, or just median regression = the 0.5 quantile in QR) so that your order-givers are not overwhelmed.
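
The remark in this thread about normality tests and sample size can be explored with a small simulation. This is a sketch under stated assumptions: the data are simulated from a t-distribution with 10 degrees of freedom (a mild, hypothetical deviation from normality), and the sample sizes are chosen only for illustration. Only base R is used.

```r
set.seed(42)

# Mildly heavy-tailed data: t-distribution with 10 degrees of freedom (assumed example)
x_small <- rt(30, df = 10)
x_large <- rt(5000, df = 10)  # shapiro.test() accepts at most 5000 values

p_small <- shapiro.test(x_small)$p.value
p_large <- shapiro.test(x_large)$p.value

# The same mild deviation tends to be flagged far more readily in the large sample
c(small = p_small, large = p_large)
```

Following up with qqnorm(x_large); qqline(x_large) gives the visual check that the reply recommends weighing more heavily than the p-value.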

  • @ldsharma6546
    @ldsharma6546 1 month ago

    Sir, what is the difference between the cubist and quantile regression?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 month ago

      Cubist regression builds regression models by recursively partitioning the data into smaller subsets and fitting piecewise linear models within each subset. Thus it estimates averages (conditional means). QR estimates quantiles, for example the median as the 0.5 quantile.

    • @ldsharma6546
      @ldsharma6546 1 month ago

      @yuzaR-Data-Science Sir, could you kindly demonstrate the Cubist model the way you explained the QR model?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 month ago +1

      @ldsharma6546 That's a good idea, I'll put it on my list. Thanks!

    • @ldsharma6546
      @ldsharma6546 1 month ago

      @yuzaR-Data-Science Sir, eagerly waiting, thank you
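
The distinction drawn in this thread between estimating averages and estimating quantiles comes down to the loss being minimised: squared error leads to the mean, while the check (pinball) loss leads to a quantile. A minimal base-R sketch, using a simulated skewed sample as an assumption; the helper names check_loss and fit_q are hypothetical.

```r
# Check (pinball) loss for quantile level tau; hypothetical helper name
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(1)
y <- rexp(1000)  # skewed sample, simulated for illustration

# Find the constant q that minimises the average check loss
fit_q <- function(tau) {
  optimize(function(q) mean(check_loss(y - q, tau)), interval = range(y))$minimum
}

# Minimising squared error instead recovers the mean
fit_mean <- optimize(function(q) mean((y - q)^2), interval = range(y))$minimum

c(median_by_loss = fit_q(0.5), sample_median = median(y))
c(q90_by_loss = fit_q(0.9), sample_q90 = unname(quantile(y, 0.9)))
c(mean_by_loss = fit_mean, sample_mean = mean(y))
```

Quantile regression applies the same loss with a linear predictor in place of the constant q.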

  • @TinaTina-xn9on
    @TinaTina-xn9on 1 year ago

    Hello, dear smart Sir. Do you know how to perform any of the applicable methods of quantile panel regression with fixed effects (penalized quantile fixed effects, quantile correlated random effects, or the Canay 2011 method) in RStudio or Stata?
    My panel is short, with n (id) = 167 and t = 8, so n/t = 20.875. I am thinking of analyzing 9 quantiles {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, or whichever quantiles you think are appropriate.
    In RStudio, I tried the "rqpd" package. So far I have issues with all three models.
    1) The code for penalized fixed effects is as follows:
    rqpd(y ~ x1 + x2 + x3 ... x13 | id, panel(method = "pfe", taus = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), tauw = c(*), lambda = **, ztol = ***), data = data)
    I do not know how to choose the appropriate weights in tauw. I do not know how to choose lambda, the penalty value (several ways of choosing it are explained in articles, but no one shows how to code it in R), and I do not know how to choose ztol.
    2) The code for quantile correlated random effects is as follows:
    rqpd(y ~ x1 + x2 + x3 ... x13 | id | ??z??, panel(method = "cre", cre = "ad", taus = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)), data = data)
    The model forces me to include instrumental variables in ??z??. The problem is that I assume I do not have endogeneity because I do not use a treatment variable, and I do not know how to detect endogeneity, so I assumed I do not need instrumental variables.
    3) The Canay 2011 method is not implemented in R, although it is said to be the easiest of the three.
    If you know how to analyze my model with one of the three proposed methods, or if you know a good method that works with short panels and overcomes the incidental-parameters and asymptotic biases (except the "xtqreg" method in Stata, because it gives me bad results), let us talk, please.
    If you help me analyze it, I will be glad to leave a tip.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Unfortunately, I don’t know how to do that. If you find out, please, don’t hesitate to post it here for the community

  • @musicalive1782
    @musicalive1782 5 months ago

    Thank you

  • @alexdee9080
    @alexdee9080 1 year ago

    Great video! I got the code from GitHub and I ran into an issue with this line:
    cars <- Auto %>%
    select(mpg, cylinders, displacement, horsepower, acceleration, origin)
    "Auto" is not defined anywhere. I figured it's mtcars with the columns renamed, but I'm having issues with "origin"...

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago +1

      Thanks, Alex!
      Sure. Just load library(ISLR), since the "Auto" dataset comes from the ISLR package.
      Cheers
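
The fix described in this exchange can be sketched as follows, assuming the ISLR and dplyr packages are installed. The column selection mirrors the line quoted in the question; the object name cars is taken from that line.

```r
library(ISLR)   # provides the Auto data set
library(dplyr)  # provides %>% and select()

# Keep only the columns used in the quoted line
cars <- Auto %>%
  select(mpg, cylinders, displacement, horsepower, acceleration, origin)

str(cars)
```

Note that Auto is a different data set from the built-in mtcars, which is why renaming mtcars columns does not yield an origin variable.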

  • @mubangansofu7469
    @mubangansofu7469 1 year ago +1

    great insights...

  • @Luminun
    @Luminun 1 year ago +1

    What a great video!!

  • @cuysaurus
    @cuysaurus 1 year ago

    Dude. I

  • @elem2627
    @elem2627 1 year ago

    Thank you for the video. I have a question: if our variables have a unit root, i.e. they are I(1), should we use first differences in the model? In some studies I saw that the authors used the variables' first differences rather than the variables themselves, because the variables are I(1).

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Sorry for the late reply, mate! I was on holiday. Hmm, I have never used "differences" in a model. But if I understood your question correctly (did I?), then it does not matter what the predictor is, a difference of something or a power of something: the model checks the association between the predictors and the outcome. Another thing is, I never use powers or roots of predictors because the interpretation suffers a lot. So I try to stay close to the real data and sometimes maybe use the log of the outcome, almost never the log of predictors. Cheers

  • @datascience1274
    @datascience1274 1 year ago

    Hi, thank you for your videos!!! They are great! I would actually like to ask more of a general statistics question. I was wondering whether practical significance is likely to give good predictions. It's counterintuitive how statistical significance and predictions are often not related. Based on the MSE decomposition, a decent trade-off between bias and variance of the estimates should reduce the error. So, if they are both unbiased, predictions should be at least decent. Am I wrong on this?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago +1

      Hey man, thanks! I have not looked into this closely enough, but generally practical and statistical significance are two different concepts. Practical significance refers to the real-world importance or relevance of an observed relationship between variables, while statistical significance indicates that an observed relationship would be unlikely to arise by chance alone if there were no true relationship. Statistical significance does not necessarily imply that the observed relationship will be a good predictor of future outcomes, while having practical significance does not guarantee statistical significance. I think there is a trade-off between bias and variance in prediction error, and striking a balance between the two can improve the accuracy of predictions. However, other sources of error, such as measurement error or omitted variables, can also impact the accuracy of predictions. As I said before, I am not the expert on that. Thanks for watching!
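
The MSE decomposition referred to in this exchange can be written out explicitly. As a sketch, assume the outcome is generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}[\varepsilon] = \sigma^2$, and noise independent of the fitted model $\hat f$; these symbols are introduced here for illustration and do not appear in the video.

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big[\hat f(x)\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Setting the bias term to zero still leaves the variance and noise terms, so an unbiased model can predict poorly when its estimates are highly variable or the outcome is inherently noisy.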

  • @soylentpink7845
    @soylentpink7845 1 year ago

    Very good video - really liked how you motivated the topic and came back to the motivation in the end. Thank you! Really good.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Glad you liked it! Thanks for the feedback and for watching!

  • @ssardo
    @ssardo 1 year ago

    Thank you for a great video packed with interesting ideas and a realistic example and code!

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      You are welcome 🙏 hope the rest of the channel is useful too!

  • @luizclaudiolouzada8741
    @luizclaudiolouzada8741 1 year ago

    Another suggestion, if I may: two-way ANOVA with an emphasis on interactions.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago +1

      Two-way ANOVA with interactions is actually an ordinary least-squares model with an interaction. So I have already covered some of that in "visualize models part 1", where you will also find some bonus content, such as emmeans and contrasts. Have a look at the video if you like.

  • @luizclaudiolouzada8741
    @luizclaudiolouzada8741 1 year ago

    Brilliant! Congratulations on the material!!!!
    Could you consider putting together some material on diff-in-diff?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago +1

      Thank you for the good feedback and your great idea! I can't promise it will appear soon, but I'll put it on the list.

  • @Human2023v1
    @Human2023v1 8 months ago

    Very Nice video. Keep it up.

  • @JJGhostHunters
    @JJGhostHunters 1 year ago

    This is great! How can this be done in Python instead of R? Specifically, how can non-linear quantile regression be performed with Python?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      I don’t know yet. I am focusing on R for the time being. Python might come later.

    • @JJGhostHunters
      @JJGhostHunters 1 year ago

      @yuzaR-Data-Science Do you know how the following curves can be generated in R for a given data set? The figure is shown on Wikipedia:
      en.wikipedia.org/wiki/Quantile_regression

    • @yuzaR-Data-Science
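      @yuzaR-Data-Science 1 year ago

      sure:
      library(quantregGrowth)
      library(ISLR)
      set.seed(1)
      # the formula wage ~ ps(age) is an assumption; the start of this call is garbled in the source
      o <- gcrq(wage ~ ps(age), data = Wage %>% sample_n(1000), tau=seq(.25,.75,l=3))
      # par(mfrow=c(1,2)) # for several plots
      plot(o, legend=TRUE, conf.level = .95, shade=TRUE, lty = 1, lwd = 3, col = -1, res=TRUE)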

    • @JJGhostHunters
      @JJGhostHunters 1 year ago

      @yuzaR-Data-Science Thank you! I received the following error:
      Error in Wage %>% sample_n(1000) : could not find function "%>%"

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Load the "tidyverse" package (install it first if you don't have it yet). I do not know how to do that in Python. If you figure it out, please let me know.
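
The missing-pipe error in this exchange arises because %>% and sample_n() come from dplyr, which library(tidyverse) attaches. A minimal sketch of the sampling step on its own, assuming the dplyr and ISLR packages are installed; the object names wage_sample and wage_sample_base are hypothetical.

```r
library(dplyr)  # provides %>% and sample_n(); library(tidyverse) attaches it as well
library(ISLR)   # provides the Wage data set

set.seed(1)
wage_sample <- Wage %>% sample_n(1000)

# Base-R equivalent that needs no pipe at all
wage_sample_base <- Wage[sample(nrow(Wage), 1000), ]

dim(wage_sample)
dim(wage_sample_base)
```

Either data frame could then be passed as the data argument of the model call shown earlier in the thread.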

  • @MannyBernabe
    @MannyBernabe 1 year ago

    Excellent.

  • @sreelakshmis2095
    @sreelakshmis2095 1 year ago

    Great video! Please do the same on unconditional quantile regression

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      Thanks! There was only one package on CRAN for UQR, but it was removed. Besides, I have read about some limitations of UQR and have never read a paper using it, at least not in my field. I don't even see many papers using classic quantile regression, although more should. So, although your idea is good, instead of saying "cool, I'll put it on my to-do list", I want to be honest and say I don't think I will produce a video on UQR any time soon. Kind regards, Yury

    • @sreelakshmis2095
      @sreelakshmis2095 1 year ago

      @yuzaR-Data-Science Totally understand. Looking forward to more videos 👍👍

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      @sreelakshmis2095 being produced ;)

  • @КостадинКостадинов-ц8е
    @КостадинКостадинов-ц8е 1 year ago

    Love these videos 🎉

  • @lawjef
    @lawjef 1 year ago

    Or… you can divide your sample into low, medium, and high groups and construct different OLS models. Most of the benefit of quantile regression is that you don't need to think about your dataset before you start running regressions. But you need to think about your dataset at some point once you start running quantile regressions. Let’s be honest, its main use is for analysts who forgot to prep their data before running their models, as it allows them to perform after-the-fact adjustments which should have been done before the models were run.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 1 year ago

      :) What do you mean by "prepare the data" and "think about the dataset"? I actually have nothing against OLS, in fact I would love to use it all the time ... but it never satisfies all of its own assumptions with real-world data. Stratifying data rarely solves those problems, like heteroskedasticity, outliers, dodgy distributions etc. Transforming data reduces interpretability. But I am very keen on learning new things and open to suggestions. So, please, feel free to discuss!
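
The heteroskedasticity point in this reply can be illustrated with simulated data in which the spread of the outcome grows with the predictor. This is a sketch under stated assumptions: the data-generating process below is invented for illustration, and the quantreg package is assumed to be installed.

```r
library(quantreg)

set.seed(123)
n <- 500
x <- runif(n, 0, 10)
# Simulated outcome whose spread grows with x (heteroskedastic by construction)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + 0.5 * x)

ols_fit <- lm(y ~ x)                          # one slope for the conditional mean
qr_fit  <- rq(y ~ x, tau = c(0.1, 0.5, 0.9))  # one slope per quantile

coef(ols_fit)
coef(qr_fit)  # the slope for x increases from tau = 0.1 to tau = 0.9
```

A single OLS slope summarises only the mean trend, while the differing quantile slopes describe how the lower and upper parts of the distribution fan out, without having to choose cut points for stratification.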