R demo | Correlation | Pearson, Spearman, Robust, Bayesian | How to conduct, visualise and interpret

  • Published on 2 Jan 2022
  • Given two numeric variables, we often want to know whether they are correlated and how. One simple command can answer both questions by visualizing the data and conducting frequentist and Bayesian correlation analyses at the same time. So, let’s learn how to do that, how to interpret all these results and how to choose the right correlation method in the first place.
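    Here is some quick R code:
    install.packages("ggstatsplot")  # run once
    library(ggstatsplot)
    ggscatterstats(
      data = mtcars,
      x = mpg,
      y = hp,
      type = "p")  # "p" = parametric (Pearson); or "np" (Spearman), "r" (robust), "bayes" (Bayesian)
    ?ggscatterstats  # open the help page to see all options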
    If you want more code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' requests.
    Enjoy! 🥳
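
    A minimal base-R sketch of the classical tests behind the "p" and "np" options, assuming only the built-in stats package and the same mtcars columns as in the code above. It is not the internals of ggscatterstats(), just a way to see the two estimates without the plot; exact = FALSE is an addition here, used because the data contain tied ranks.

```r
# Pearson (parametric) correlation between fuel economy and horsepower
pearson <- cor.test(mtcars$mpg, mtcars$hp, method = "pearson")

# Spearman (non-parametric, rank-based) correlation on the same variables;
# exact = FALSE uses the asymptotic p-value, since the data contain ties
spearman <- cor.test(mtcars$mpg, mtcars$hp, method = "spearman", exact = FALSE)

# Compare the two estimates side by side
c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate))
```

    If the two estimates differ a lot, that is a hint to look at the scatterplot for non-linearity or outliers before choosing a method.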

Comments • 50

  • @Rumil_ 2 years ago +2

    Wow this is golden. I truly appreciate the awesome editing and the reasons and explanations behind the interpretations. Thank you and look forward to watching more!

    • @yuzaR-Data-Science 2 years ago

      You're very welcome, Rumil! :) I am glad it is useful not only to me :)

  • @oousmane 2 years ago

    Amazing Yury, always clear. Love your tuts!

    • @yuzaR-Data-Science 2 years ago +1

      Thanks, Ousmane! Glad you like them! 😊 More to come!

  • @so4ragb 2 years ago +2

    You always have the best and most clearly understandable tuts. Always eagerly waiting for the next one. 1000x thanks

    • @yuzaR-Data-Science 2 years ago +1

      1000x thanks for the feedback! 😊 More to come!

  • @hikeaway1596 a month ago

    top content, very concise and to the point! thanks!

    • @yuzaR-Data-Science a month ago

      Wow, thanks for such generous feedback!

  • @joeyoviedo5202 7 months ago

    Subscribed! I am very excited to explore your video playlists. Thank you!

    • @yuzaR-Data-Science 7 months ago

      Awesome, thank you! :) hope you like the rest! Cheers

  • @user-sm1se3sq5x 7 months ago

    REALLY AWESOME. Very clear tut.

    • @yuzaR-Data-Science 7 months ago +1

      Glad you think so! Thanks 🙏 You might also like the rest

  • @ErickAmkoa a year ago

    This is good. Thank you

  • @Dewisd2002 4 months ago

    Thank you soo much!!

  • @Dergicetea a month ago

    This video has been awesome to watch, Sir.
    I do have 2 small questions, though. Where can I find the preliminary step of running a Shapiro-Wilk or Kolmogorov-Smirnov test for normality? I'm new to R, by the way. And a small question about the aesthetic appearance of this correlation graph: is it possible to change the colours within this ggstatsplot function? I mean, for example, one simple colour but in different shades for the variables x and y. Is that possible?
    Thank you so much in advance for the answers, Sir.

    • @yuzaR-Data-Science a month ago +1

      Hi man, "normality" is the function that conducts Shapiro-Wilk. I do not recommend Kolmogorov-Smirnov. You can change colors: just write ?ggscatterstats in the RStudio console, hit enter and explore the possibilities. Cheers
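
      The reply does not say which package the "normality" function comes from, so the sketch below uses base R's shapiro.test() instead, which is an assumption about what is wanted rather than the author's exact workflow. It checks one variable from the built-in mtcars data and adds a Q-Q plot as a visual companion.

```r
# Shapiro-Wilk test of normality for one numeric variable
sw <- shapiro.test(mtcars$mpg)
sw$statistic  # W statistic
sw$p.value    # a small p-value suggests a departure from normality

# A Q-Q plot is a useful visual check alongside the test
qqnorm(mtcars$mpg)
qqline(mtcars$mpg)
```

      In a correlation setting the check would typically be run on both variables before deciding between Pearson and Spearman.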

  • @WilForDataScience 9 days ago

    Hey there! I'm wondering where you get the information about the conventional thresholds for interpretation (like for p-values, Bayes, etc). There are so many different versions from different authors out there, which one should we trust? I'm really struggling to make up my mind! I already know about the effectsize package, but should we trust their frames of reference? Thanks in advance sir.

    • @yuzaR-Data-Science 7 days ago +1

      Oh man, I totally get it! It also drove me crazy when there were two different interpretations of the same effect size. Where is the truth? Not even statisticians know, they can only defend their opinion. So I decided to take the one that makes the most sense to me, together with its reference. The reference is important, because then you have a source you trust, and others can reproduce and build on your knowledge. When you ask RStudio like this, "?interpret_eta_squared()", you'll get all the references you need. Hope that helps! Cheers

    • @WilForDataScience 7 days ago

      @yuzaR-Data-Science Thank you so much for the response. I know that feeling too, man. I am going to check that right away!

  • @paoloemiliobartolucci9844 3 months ago

    Wow, super clear explanation. If the relationship between my variables x and y is non-linear and I use Spearman's instead of Pearson's, how can I justify that graphically? I mean, using scatterstats I see a blue lm model line; how can I replace it with a monotonic curve that better describes my association?

    • @yuzaR-Data-Science 3 months ago +1

      Thanks. In case of non-linearity the best way is to fit a GAM model. But be careful with the interpretation of the coefficient, it's not a linear slope anymore. What you can also do, after you have seen the pattern, is split the predictor into several categories and run an ANOVA or Kruskal-Wallis test on them.
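
      A rough sketch of the idea, assuming simulated data (the data frame d and its columns are made up for illustration) and the mgcv package, which ships with standard R installations. It shows why Spearman suits a monotonic but curved relationship and how a GAM smooth can replace the straight lm line in a base-R plot; it does not change the line drawn by ggscatterstats() itself.

```r
library(mgcv)  # provides gam(); a recommended package bundled with R

# Simulated monotonic but strongly non-linear relationship (illustrative data)
d <- data.frame(x = seq(0, 5, length.out = 50))
d$y <- exp(d$x)

r_pearson    <- cor(d$x, d$y, method = "pearson")   # lowered by the curvature
rho_spearman <- cor(d$x, d$y, method = "spearman")  # ranks agree perfectly

# Smooth (non-linear) fit instead of a straight lm line
fit <- gam(y ~ s(x), data = d)

plot(d$x, d$y, xlab = "x", ylab = "y")
lines(d$x, predict(fit), col = "blue", lwd = 2)
```

      Because y increases whenever x does, the ranks match exactly and Spearman's rho equals 1, while Pearson's r stays below 1.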

    • @paoloemiliobartolucci9844 3 months ago

      @yuzaR-Data-Science Thanks for the explanation :)

    • @yuzaR-Data-Science 3 months ago

      you are very welcome!

  • @motomarx 9 months ago

    Can't get started after installing. I get 'no package called dplyr' when running line 2 and line 4. I installed it successfully, but I'm not sure if I missed you mentioning another package to install

    • @yuzaR-Data-Science 9 months ago

      Yes, it seems to be a package problem. Generally, keep R, RStudio and most of the packages up to date. Especially install or update {tidyverse}, {easystats} and {ggstatsplot}. If the error message says that some other package is missing, install that too. Hope that helps!
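
      One way to see which packages are missing before installing anything, sketched with a hypothetical helper (missing_packages is not part of any library). The install call is left commented out so the snippet has no side effects when run as-is.

```r
# Hypothetical helper: returns the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("dplyr", "tidyverse", "easystats", "ggstatsplot")
to_install <- missing_packages(needed)
to_install

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```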

  • @andredasilvapereira150 2 years ago

    cool!

  • @jtwest8 3 months ago

    Hi! I'm trying to replicate the analysis you showed but the package no longer exists. Can you share where this function can now be found?

    • @yuzaR-Data-Science 3 months ago

      It works perfectly on my PC. Have you installed and loaded the package?
      install.packages("ggstatsplot")
      library(ggstatsplot)
      ggscatterstats(mtcars, mpg, wt)

  • @samihahzura4735 2 years ago

    Hi, nice video. How about stats for 3 variables?

    • @yuzaR-Data-Science 2 years ago +3

      Thanks, Samihah! It really depends on these 3 variables. If you mean a correlation, you can check out my video on the correlation matrix. If not, you can check out my very first video (please don't expect good quality there), where I showed a small table of 4 variables and explained what kinds of analysis you can do with them, starting with a categorical goodness-of-fit test and finishing with linear and logistic regression.
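
      For the correlation case, a minimal base-R sketch of a correlation matrix for three numeric variables, assuming the built-in mtcars data. This is not the approach from the video mentioned in the reply, only the simplest way to get all pairwise coefficients at once.

```r
# Pairwise Pearson correlations among three numeric variables
vars <- mtcars[, c("mpg", "hp", "wt")]
cor_mat <- cor(vars, method = "pearson")
round(cor_mat, 2)

# Scatterplot matrix to inspect all pairs visually
pairs(vars)
```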

    • @samihahzura4735 2 years ago

      @yuzaR-Data-Science Thanks for the suggestions. I'll check it out!

  • @ekaterinanikitina1092 2 years ago

    Your videos are very easy for beginners to follow! Thank you! Could you recommend any online courses or a specialization in biostatistics?

    • @yuzaR-Data-Science 2 years ago +1

      Thank you! Nice to hear! The courses on Stepik helped me, especially the ones by Anatoliy Karpov. It looks like he already has his own site (CEO KarpovCourses). He explains things very well.

    • @ekaterinanikitina1092 2 years ago

      @yuzaR-Data-Science Yes, I took Anatoliy's statistics course. And where did you learn R?

    • @yuzaR-Data-Science 2 years ago +1

      Mostly on my own from books, many of which are online and free. But if I were starting over, I would advise myself to concentrate on one book: R4DS r4ds.had.co.nz/ . Besides that, you can have a look at my blog > yuzar-blog.netlify.app/ These two resources are more than enough to get started.

    • @ekaterinanikitina1092 2 years ago

      @yuzaR-Data-Science Thank you!

  • @mayurwabhitkar2041 11 months ago

    Can we do multiple correlation using this, sir?

    • @yuzaR-Data-Science 11 months ago

      Of course: use the "grouped_ggscatterstats" function. Moreover, I make a 4-minute video on the correlation matrix in R. I think it's exactly what you need.

    • @mayurwabhitkar2041 10 months ago

      @yuzaR-Data-Science Yes please, sir, I would really appreciate it. Also, I like your videos a lot.

    • @yuzaR-Data-Science 10 months ago

      @mayurwabhitkar2041 Sorry, I misspelled it. I wanted to say "I made", so the video has already been online for ages ;)

    • @mayurwabhitkar2041 10 months ago

      @yuzaR-Data-Science Which one is it, sir? Can you share the link? I am unable to find it

  • @WilForDataScience 6 days ago

    Hey sir, just for information, it seems like the package is under maintenance or revision because the feature no longer works. I tried several datasets and variables, even copied your example character by character, but it always shows the same error:
    `stat_xsidebin()` with `bins = 30`. Choose a better value with `binwidth`.
    `stat_ysidebin()` with `bins = 30`. Choose a better value with `binwidth`.
    Error in `plot_theme()`:
    ! The `ggside.axis.minor.ticks.length` theme element must be a object.
    I've tried to troubleshoot it but no success yet, and I know it's out of your control, but I just wanted to give you a heads up.
    P.S.: I noticed one drawback to this feature: it only has 4 types of correlations, and you cannot use e.g. Kendall's, Gaussian's or Shepherd's correlation, which is not bad in itself, but it would be great to test these other types of correlations as well.
    P.S. 2: I found a sort of alternative in the easystats correlation package (easystats.github.io/correlation/), which offers a large number of methods and a very similar plot (like plot(cor_test(iris, "Sepal.Width", "Sepal.Length"))), but it only shows the frequentist calculation at once (as far as I know). Would you consider doing a review of this package or even the other easystats packages (you have already done some 😉)?
    As always, thank you for your work and fast responses.
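
    For Kendall's correlation specifically, a minimal sketch in base R rather than in either package discussed above, assuming the same iris columns as in the comment. exact = FALSE is an addition here, used because these measurements contain ties.

```r
# Kendall's rank correlation between two iris measurements;
# exact = FALSE uses the normal approximation, since the data contain ties
kendall <- cor.test(iris$Sepal.Width, iris$Sepal.Length,
                    method = "kendall", exact = FALSE)

kendall$estimate  # Kendall's tau
kendall$p.value
```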

    • @yuzaR-Data-Science 6 days ago +1

      Hey man, thanks for the update. Interestingly, ggstatsplot works perfectly on my computer. So it might be some dependency that is not updated. Try to update all the packages you have (especially ggside) and your R version. Sure, I also wanted to suggest the "correlation" package as I was reading your message. I love the whole easystats environment, and was thinking about doing further package reviews, but decided to wait and do modelling first, which is what I am working on right now. I might do those packages eventually in the future :) cheers

    • @WilForDataScience 6 days ago

      @yuzaR-Data-Science It solved my problem: ggside was not up to date. Rookie mistake, hahahaha. Thank you so much. I am looking forward to seeing the modeling reviews. Tidymodels is a marvelous but overwhelming world. Thanks sir

    • @yuzaR-Data-Science 6 days ago +1

      @WilForDataScience I've been there ;) one update and all the troubles are gone.