Master Box-Violin Plots in {ggplot2} and Discover 10 Reasons Why They Are Useful
- Published on 23 Nov 2023
- Boxplots display a wealth of useful information about the dataset. In this video, we'll start with the most basic boxplot, build every part of this notched box-violin plot in {ggplot2} step by step, and understand why every detail matters 😉
If you only want the code (or want to support me), consider joining the channel (the Join button below any of my videos), because I provide the code on members' request.
Enjoy! 🥳
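A minimal sketch of the kind of plot the description mentions, assuming {ggplot2} and using the built-in iris data as a stand-in, since the video's own data and code are not shown here. It layers a violin (the whole distribution), a narrow notched boxplot (median, quartiles, outliers) and the group mean as a point. The widths, colours and variables are illustrative choices, not the video's code.

```r
library(ggplot2)

# Stand-in data: sepal length for each of the three iris species (50 flowers each)
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  # Violin: kernel density estimate of the full distribution per group
  geom_violin(fill = "grey90") +
  # Narrow notched box inside the violin; the notches give a rough
  # 95% interval for comparing medians (1.58 * IQR / sqrt(n))
  geom_boxplot(width = 0.15, notch = TRUE, outlier.colour = "red") +
  # Group mean as a white diamond, so it can be compared with the median line
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "white") +
  labs(x = "Species", y = "Sepal length (cm)")
```

Printing `p` draws the plot; each layer can be removed or restyled independently, which is what makes building it step by step practical.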
Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
As usual, the content does not disappoint. You always keep expectations high and deliver. Dopamine and serotonin run through my body every time you upload a new video. Once again, Mr. Yury, thank you so much for your educational work.
Wow, thank you, Wil! That's by far the best feedback I have ever received! I'll try to make sure your dopamine and serotonin levels continue to rise 😉 Thanks for your support!
This is sooo great!!
Glad you liked it! Thanks for watching!
I love the way you explain these concepts. It's almost as if you live inside the data ❤
Glad you enjoy my explanations 😊 I probably do live inside the data sometimes 🙈😂 Thank you for such nice feedback! Much love!
Would you mind checking this? In the first part you say the whiskers extend to the maximum and minimum, but I think geom_boxplot doesn't go all the way to the maximum and minimum, which is why there are outliers. From the documentation: “The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Data beyond the end of the whiskers are called "outlying" points and are plotted individually.”
Thanks for pointing it out, you are correct: the maximum should have been defined as the largest value no further than 1.5 * IQR from the hinge. I guess I just wanted to describe the box first and the outliers later, and this slow, step-by-step explanation comes at the cost of not always being precise. Being precise right away would throw several concepts at the learner at once, like box, outliers, IQR, hinge... I just hope I made up for it later in the video. Thanks again for being attentive!
@yuzaR-Data-Science It was all very clear. Thanks for providing this material.
@shadyamigo Glad you liked it! Cheers, mate
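The corrected whisker definition can be checked numerically. Below is a small base-R sketch, written for this page rather than taken from the video, that applies the rule quoted from the geom_boxplot documentation: hinges at the 25th and 75th percentiles (computed with quantile()'s default method, which, as far as I know, is what stat_boxplot() uses for unweighted data), whiskers reaching the most extreme points within 1.5 * IQR of the hinges, and everything beyond treated as an outlier. The function name whisker_ends() is hypothetical.

```r
# Hypothetical helper: whisker ends and outliers following the geom_boxplot rule
whisker_ends <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  # Lower and upper hinges = 1st and 3rd quartiles (quantile type 7, R's default)
  hinges <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- diff(hinges)
  # Fences: points strictly beyond these are drawn individually as outliers
  lower_fence <- hinges[1] - coef * iqr
  upper_fence <- hinges[2] + coef * iqr
  inside <- x[x >= lower_fence & x <= upper_fence]
  list(
    lower_whisker = min(inside),  # smallest value within 1.5 * IQR of the lower hinge
    upper_whisker = max(inside),  # largest value within 1.5 * IQR of the upper hinge
    outliers = sort(x[x < lower_fence | x > upper_fence])
  )
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 40)
res <- whisker_ends(x)
```

Here the hinges are 3.25 and 7.75 (IQR = 4.5), so the upper fence is 7.75 + 1.5 * 4.5 = 14.5. The upper whisker therefore stops at 9, not at the maximum of 40, and 40 is plotted as an outlying point.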
I love your tutorials! They are soo informative that I need to rewatch them so as not to miss any important detail :) Thanks for doing this, keep up the great work!
Glad you like them!
Great. Keep going
Thank you, I will!
You absolutely set the bar, dear. I can't wait to watch it again and again. Can you share the code as a PDF or in some other way so that I can practice on my own? Thanks.
Thanks again for such great feedback! I am very happy it's useful! As I said in your other comment, please feel free to rewatch and pause the video to write down the code yourself, since it is a good learning strategy, better than copy-pasting. But if you wish to have the whole code, consider joining the channel (the Join button below every video) and I'll send you the code. Kind regards!
A very informative (and well-produced) 17-minute video. I picked up on your trick of wrapping a plot inside a ggplotly command a video or two ago and find it very useful (I wish I had discovered it earlier)! Also, some nice tips on adding the mean, CI, etc. to standard boxplots. I like using the ggbetweenstats command, which I started using after one of your earlier videos, on small sets of groups, but it doesn't always work that well with larger numbers of groups. Adding more information to standard boxplots seems like a good compromise. I very much appreciate your videos, and thank you for sharing your insights!
Thanks indeed for such nice feedback! I very much enjoy creating content, and the fact that it's useful to more people than just me means a lot to me! I appreciate your support!
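As a rough illustration of adding the mean and a confidence interval to a standard boxplot, here is a sketch assuming {ggplot2} and the built-in iris data; it is not the video's code. The helper mean_ci95() is hypothetical: it returns the mean with a t-based 95% CI in the y/ymin/ymax format that stat_summary(fun.data = ...) expects, and the nudge distance is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical helper: mean and t-based 95% confidence interval for one group
mean_ci95 <- function(y) {
  y <- y[!is.na(y)]
  m <- mean(y)
  half_width <- qt(0.975, df = length(y) - 1) * sd(y) / sqrt(length(y))
  data.frame(y = m, ymin = m - half_width, ymax = m + half_width)
}

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(width = 0.4) +
  # Mean with its 95% CI, nudged to the right of each box so both stay readable
  stat_summary(fun.data = mean_ci95, geom = "pointrange",
               colour = "red", position = position_nudge(x = 0.3))
```

If {Hmisc} is installed, ggplot2's built-in mean_cl_normal() computes the same t-based interval. Wrapping the finished plot as plotly::ggplotly(p), the trick mentioned in the comment above, makes it interactive with hover tooltips.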
Very educative and simple to understand
Thanks for your nice feedback!
super high quality material presentation
Thanks a lot, Tarass! I also enjoy creating content!
Great and very informative
Glad you liked it! Thanks for watching!
Hi Yuri, many thanks for sharing this knowledge! This was fantastic for opening my mind to the possibilities of this plot. I wonder if you could share the code? Thanks
Hi Marcos, thanks a ton for joining the channel! Your support is much appreciated! Of course you can have the code. I just posted it in the Community tab for members only. Please let me know whether you can see/find it. Kind regards! Yury
Great. Thanks for the up.
Please upload a tutorial on raster data manipulation in R. That would be really helpful.
thanks a lot!
Thanks for the idea! I did not know about raster data manipulation yet, but I'll have a look at it and put it on my list of planned tutorials. Thank you for watching!
@yuzaR-Data-Science Thanks.
you are very welcome!
Hi Youry,
I really like your videos, and they make me want to learn more R and Data Science :)
Do you have any recommendations for students who want to master both?
I am looking forward to the next video!
Yes, definitely, Moritz. The best start, in my opinion, is the R4DS book. The best finish is the tidymodels book. Both are online and free. In between, you'd need to go through a few classic statistics books and learn and compute statistical tests and models. You'll find some of these topics on my channel. This will prepare you for machine learning. Thanks for such nice feedback! I am glad my content is useful!
@yuzaR-Data-Science Thank you for the tips :)
You are very welcome, @MoritzSchorn
Excellent video as always, thank you so much for sharing this. One question: you mentioned replacing or removing the incorrect sample sizes on the x-axis that appear when the plot is further split into smaller sub-plots. What approach would you use to still display sample sizes on your plot after splitting it into sub-plots, i.e., replacing rather than simply removing them?
Thanks for the excellent question! I knew I'd get this question, because I asked myself the same one :) To be honest, I don't have a quick solution for it, because there is already a function that calculates the sample sizes and puts the values on the x-axis, so I never needed to figure it out. It only works with one additional grouping variable, though. Here is the function:
library(ggstatsplot)
library(ISLR)  # provides the Wage data set
grouped_ggbetweenstats(data = Wage, x = education, y = wage, grouping.var = health_ins)
Thank you, grouped_ggbetweenstats() works really well and adds useful additional information. To simply add sample sizes to the existing plot, stat_n_text() from the EnvStats package works really well too:
library(EnvStats)  # provides stat_n_text()
# p6 is the plot object built in the video
p6 +
  facet_grid(jobclass ~ health_ins) +
  stat_n_text(y.pos = 5)
But that displays the sample sizes on the plot and not on the x-axis.
You can also produce separate plots and put them together at any time, if that reduces the complexity of the code. patchwork is an amazing package for this; I will release a review of it very soon.
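One way to get correct per-panel sample sizes onto the x-axis itself, sketched here as an alternative and not as the author's method: count the observations per panel and group, paste the count into the x-axis label, and give every panel its own x scale with facet_wrap(scales = "free_x"). facet_grid() cannot do this, because panels in the same column share one x-axis. The sketch uses the built-in mtcars data so it runs without extra packages; with Wage, education would take the role of cyl, and health_ins and jobclass those of am and vs.

```r
library(ggplot2)

# Stand-in data: cyl plays the role of education; am and vs play the roles
# of health_ins and jobclass in the Wage example
mt <- transform(mtcars,
                cyl = factor(cyl),
                am  = factor(am, labels = c("automatic", "manual")),
                vs  = factor(vs, labels = c("V-shaped", "straight")))

# Count rows per panel (vs x am) and x-axis group (cyl) ...
mt$n <- ave(mt$mpg, mt$vs, mt$am, mt$cyl, FUN = length)
# ... and glue that count into the x-axis label, e.g. "4\nn=7"
mt$cyl_n <- paste0(mt$cyl, "\nn=", mt$n)

p <- ggplot(mt, aes(x = cyl_n, y = mpg)) +
  geom_boxplot() +
  # free_x gives each panel its own x-axis, so each panel lists only its own groups and counts
  facet_wrap(~ vs + am, scales = "free_x") +
  labs(x = "Cylinders (sample size per panel)", y = "Miles per gallon")
```

Because the labels are plain strings, they are ordered alphabetically. That matches the intended order here (and for Wage's numbered education levels), but for other data you may want to set the factor levels of the label column explicitly.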
Hi Youzar,
Great video, and I really like the random jokes thrown in here and there. Keep it up!
/ Kenneth
Thanks for the feedback, Kenneth! :) It's good to see that people get my jokes, because I am never sure whether they are funny to more people than just me 😁
@yuzaR-Data-Science Do you have any videos on how to connect to a cloud or local SQL server in R?
Not yet. It might come in the distant future; until then, I plan to cover some modelling and machine learning topics.