Master Box-Violin Plots in {ggplot2} and Discover 10 Reasons Why They Are Useful

  • Published on 23 Nov 2023
  • Boxplots display a wealth of useful information about the dataset. In this video, we'll start with the most basic boxplot, build every part of this notched box-violin plot in {ggplot2} step by step, and understand why every detail matters 😉
    If you only want the code (or want to support me), consider joining the channel (the Join button is below every video); I provide the code to members on request.
    Enjoy! 🥳
    Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
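For readers who want a starting point, here is a minimal sketch of a notched box-violin plot in {ggplot2}. It is not the author's code from the video: the built-in `iris` dataset, the chosen widths, and the transparency values are all assumptions made for illustration.

```r
library(ggplot2)

# Built-in iris data stands in for the video's dataset (assumption).
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  # Violin layer: shows the shape of the distribution
  geom_violin(alpha = 0.3, trim = FALSE) +
  # Narrow notched boxplot on top; outliers hidden because the
  # jittered raw points below already show every observation
  geom_boxplot(width = 0.2, notch = TRUE, outlier.shape = NA) +
  # Raw data points with a little horizontal jitter
  geom_jitter(width = 0.05, alpha = 0.4) +
  theme_minimal() +
  theme(legend.position = "none")

# Call print(p) to draw the plot.
```

Layer order matters here: the violin is drawn first so the box and the points stay visible on top of it.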

Comments • 44

  • @WilOspinoC 7 months ago +3

    As usual, the content does not disappoint. You always keep expectations high and deliver. Dopamine and serotonin run through my body every time you upload a new video. Once again, Mr. Yury, thank you so much for your educational work.

    • @yuzaR-Data-Science 7 months ago

      Wow, thank you, Wil! That's by far the best feedback I have ever received! I'll try to make sure your dopamine and serotonin levels continue to rise 😉 Thanks for your support!

  • @akanequeen 3 months ago

    This is sooo great!!

    • @yuzaR-Data-Science 3 months ago

      Glad you liked it! Thanks for watching!

  • @eliapp 7 months ago

    I love the way you explain these concepts. It's almost as if you live inside the data ❤

    • @yuzaR-Data-Science 7 months ago

      Glad you enjoy my explanations 😊 I probably do live inside the data sometimes 🙈😂 Thank you for such nice feedback! Much love!

  • @shadyamigo 7 months ago +2

    Would you mind checking? In the first part you say the whiskers extend to the maximum and minimum, but I think geom_boxplot doesn't go all the way to the maximum and minimum, which is why there are outliers. From the documentation: “The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Data beyond the end of the whiskers are called "outlying" points and are plotted individually.”

    • @yuzaR-Data-Science 7 months ago

      Thanks for pointing it out, you are correct: the "maximum" should have been defined as the largest value no further than 1.5 * IQR from the hinge. I wanted to describe the box first and the outliers later, and this slow, step-by-step explanation comes at the cost of not being precise all the time. Being precise immediately would throw several concepts at the learner at once: box, outliers, IQR, hinge... I just hope that I compensated for it later in the video. Thanks again for being attentive!
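The whisker rule quoted in this thread can be checked by hand. The sketch below uses base R only and a small made-up vector (an assumption, not data from the video). The hinges are taken as the 25th and 75th percentiles from `quantile()`, which is how the ggplot2 documentation describes them.

```r
# Made-up data with one extreme value (assumption, for illustration only)
x <- c(2, 4, 5, 6, 7, 8, 9, 30)

# Hinges: first and third quartiles
hinges <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr    <- hinges[2] - hinges[1]

# Fences: 1.5 * IQR beyond each hinge
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

# Whiskers end at the most extreme observations that are still inside the fences
lower_whisker <- min(x[x >= lower_fence])
upper_whisker <- max(x[x <= upper_fence])

# Everything beyond the whiskers is drawn as an individual "outlying" point
outliers <- x[x < lower_fence | x > upper_fence]

upper_whisker  # 9, not max(x) = 30
outliers       # 30
```

Here the upper whisker stops at 9 while the true maximum, 30, is plotted as an outlier, which is exactly the distinction raised in the comment above.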

    • @shadyamigo 7 months ago

      @yuzaR-Data-Science It was all very clear. Thanks for providing this material.

    • @yuzaR-Data-Science 7 months ago

      @shadyamigo Glad you liked it! Cheers, mate.

  • @hikeaway1596 7 months ago

    I love your tutorials! They are so informative that I need to rewatch them in order not to miss any important detail :) Thanks for doing this, keep up the great work!

  • @statlab_stat.solution 7 months ago

    Great. Keep going

  • @moviezone8130 several months ago

    You absolutely set the bar, dear. I can't wait to watch it again and again. Can you share the code as a PDF or in some other way so that I can practice on my own? Thanks.

    • @yuzaR-Data-Science several months ago

      Thanks again for such great feedback! I am very happy it's useful! As I said in reply to your other comment, please feel free to rewatch and pause the video to write down the code yourself, since that is a good learning strategy, better than copy-pasting. But if you wish to have the whole code, consider joining the channel (it's the Join button below every video) and I'll send you the code. Kind regards!

  • @zane.walker 7 months ago

    A very informative (and well produced) 17-minute video. I picked up on your trick of wrapping a plot inside a ggplotly command a video or two ago and find it very useful (wish I had discovered that earlier)! Also, some nice tips on adding the mean, CI, etc. to the standard boxplots. I like using the ggbetweenstats command, which I started using after one of your earlier videos, on small sets of groups, but it doesn't always work that well with larger numbers of groups. Adding more information to standard boxplots seems like a good compromise. Very much appreciate your videos and thank you for sharing your insights!
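The two tricks mentioned in this comment, adding a mean with its confidence interval to a boxplot and wrapping the plot in `ggplotly()`, can be sketched as follows. This is an illustration under assumptions, not the video's code: the `iris` dataset and the helper name `mean_ci` are made up for the example, and the interactive step only runs if the {plotly} package is installed.

```r
library(ggplot2)

# Hypothetical helper: mean with a t-based confidence interval,
# returned in the y / ymin / ymax format that stat_summary() expects
mean_ci <- function(y, level = 0.95) {
  n    <- length(y)
  m    <- mean(y)
  se   <- sd(y) / sqrt(n)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * se
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(notch = TRUE) +
  # Red point = mean, red line = 95% CI of the mean
  stat_summary(fun.data = mean_ci, geom = "pointrange", colour = "red")

# Wrap the static plot to get an interactive version (hover, zoom)
if (requireNamespace("plotly", quietly = TRUE)) {
  p_interactive <- plotly::ggplotly(p)
}
```

Writing the interval by hand avoids extra dependencies; `ggplot2::mean_cl_normal` gives a similar summary but needs the {Hmisc} package.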

    • @yuzaR-Data-Science 7 months ago

      Thank you for such nice feedback! I very much enjoy creating content, and the fact that it's useful for more people than just me means a lot to me! I appreciate your support!

  • @suelook9562 5 months ago

    Very educative and simple to understand

  • @tarasst6887 7 months ago

    super high quality material presentation

    • @yuzaR-Data-Science 7 months ago

      Thanks a lot, Tarass! I also enjoy creating content!

  • @juliusirungu1363 7 months ago

    Great and very informative

    • @yuzaR-Data-Science 7 months ago

      Glad you liked it! Thanks for watching!

  • @Marcosls2015 several months ago

    Hi Yuri, really thanks for sharing this knowledge! This was fantastic to open the mind to the possibilities of this plot. Please, I wonder if you could share the code? Thanks

    • @yuzaR-Data-Science several months ago +1

      Hi Marcos, thanks a ton for joining the channel! Your support is much appreciated! Of course you can have the code. I just posted it on the community tab for members only. Please let me know whether you can see/find it. Kind regards! Yury

  • @Walker-nb9de 7 months ago

    Great. Thanks for the up.

    • @Walker-nb9de 7 months ago

      Please upload a tutorial on raster data manipulation in R. That would be really helpful.

    • @yuzaR-Data-Science 7 months ago

      thanks a lot!

    • @yuzaR-Data-Science 7 months ago

      Thanks for the idea. I am not familiar with raster data manipulation yet, but I'll have a look at it and put it on my list of planned tutorials. Thank you for watching!

    • @Walker-nb9de 7 months ago

      @yuzaR-Data-Science Thanks.

    • @yuzaR-Data-Science 7 months ago

      you are very welcome!

  • @MoritzSchorn 7 months ago

    Hi Youry,
    I really like your videos and they make me want to learn more of R and Data Science :)
    Do you have any recommendations for students who want to master both?
    I am looking forward to the next video!

    • @yuzaR-Data-Science 7 months ago +1

      Yes, definitely, Moritz. The best start, in my opinion, is the R4DS book. The best finish is the tidymodels book. Both are online and free. In between, you'd need to go through a few classic statistics books and learn to compute statistical tests and models. You'll find some of these topics on my channel. This will prepare you for machine learning. Thanks for such nice feedback! I am glad my content is useful!

    • @MoritzSchorn 6 months ago

      @yuzaR-Data-Science Thank you for the tips :)

    • @yuzaR-Data-Science 6 months ago

      @MoritzSchorn You are very welcome!

  • @sebbikankondi5546 6 months ago

    Excellent video as always, thank you so much for sharing this. One question, you mentioned replacing or removing incorrect representations of sample sizes on the x.axis that materialize as a result of further splitting the plots into smaller sub-plots. What approach would you use to still display sample sizes on your plot after splitting them into sub-plots i.e., replacing and not simply removing them?

    • @yuzaR-Data-Science 6 months ago +1

      Thanks for the excellent question! I knew I'd get this question, because I asked myself the same one :) To be honest, I don't have a quick solution, because there is already a function that calculates the sample sizes and puts them on the x-axis, so I never needed to figure it out. It only works with one additional grouping variable, though. Here is this function:
      library(ISLR)  # assumed source of the Wage dataset
      library(ggstatsplot)
      grouped_ggbetweenstats(data = Wage, x = education, y = wage, grouping.var = health_ins)

    • @sebbikankondi5546 6 months ago

      Thank you, grouped_ggbetweenstats() works really well and adds useful additional info. To simply add sample sizes to the already existing plot, stat_n_text() from the EnvStats package works really well too:
      p6 +
        facet_grid(jobclass ~ health_ins) +
        stat_n_text(y.pos = 5)
      But that displays the sample sizes on the plot and not on the x-axis.
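One possible way to get per-facet sample sizes into the x-axis labels themselves is to build the labels before plotting and let each facet keep its own x scale. The sketch below is an assumption-laden illustration, not a solution from the video: it uses the built-in `mtcars` data, and the column names `n` and `x_label` are made up for the example.

```r
library(ggplot2)

# Count observations per x-group (cyl) within each facet (am)
counts <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = length)
names(counts)[3] <- "n"

# Attach the counts to every row and build a two-line axis label
d <- merge(mtcars, counts, by = c("cyl", "am"))
d$x_label <- paste0(d$cyl, "\n(n = ", d$n, ")")

p <- ggplot(d, aes(x = x_label, y = mpg)) +
  geom_boxplot() +
  # free_x lets each facet show only its own labels
  facet_grid(. ~ am, scales = "free_x") +
  labs(x = "cyl")
```

With `facet_grid()` the x scale is shared within a column, so when faceting by two variables `facet_wrap(..., scales = "free_x")` is likely the better fit.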

    • @yuzaR-Data-Science 6 months ago

      You can also produce separate plots and put them together at any time, if that reduces the complexity of the programming. patchwork is an amazing package for that; I will release a review of it very soon.
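Combining separately built plots with {patchwork} can be sketched like this. The two plots and the `mtcars` data are placeholders (assumptions), and the combination step only runs if {patchwork} is installed.

```r
library(ggplot2)

# Two independent plots built from built-in data (placeholders)
p_left  <- ggplot(mtcars, aes(x = factor(cyl),  y = mpg)) + geom_boxplot()
p_right <- ggplot(mtcars, aes(x = factor(gear), y = mpg)) + geom_boxplot()

if (requireNamespace("patchwork", quietly = TRUE)) {
  library(patchwork)
  # "+" places the plots side by side; plot_annotation() adds a shared title
  combined <- p_left + p_right +
    plot_annotation(title = "mpg by cylinders and by gears")
}
```

Because each panel is an ordinary ggplot object, its axis labels (including any sample sizes) can be set independently before the panels are combined.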

  • @kennethgottfredsen767 7 months ago

    Hi Youzar,
    Great video, and I really like the random jokes thrown in here and there. Keep it up!
    / Kenneth

    • @yuzaR-Data-Science 7 months ago

      Thanks for the feedback, Kenneth! :) It's good to see that people get my jokes, because I am never sure whether they are funny to more people than just me 😁

    • @kennethgottfredsen767 7 months ago

      @yuzaR-Data-Science Do you have any videos on how to connect to a cloud or local SQL server in R?

    • @yuzaR-Data-Science 6 months ago

      Not yet. It might come in the distant future; until then I plan to cover some modelling and machine learning topics.