Boxplots in R with ggplot and geom_boxplot() [R- Graph Gallery Tutorial]

  • Published 8 Jul 2024
  • In this tutorial I show you how to create boxplots in R with geom_boxplot() and ggplot(). The examples are based on the R-Graph Gallery. I show how boxplots can be used to visualize several different distributions at once, and I walk you through many parameters and function arguments that let you customize your boxplots in many ways.
    ⏱ Time Stamps ⌚
    0:00 - Intro and video overview
    1:31 - Boxplot theory and outlier rule
    5:30 - Basic boxplots with geom_boxplot()
    6:40 - Function arguments and notching
    8:50 - Change the colors of boxplots
    9:57 - Highlight a single boxplot
    10:54 - Grouping boxplots
    12:08 - Adding the average with stat_summary()
    12:55 - Adding points with geom_jitter and geom_dotplot
    14:06 - Adding boxplots in the margins of a scatterplot
    14:36 - Final example and outro animation
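The outlier rule from the theory chapter can be checked numerically. A minimal sketch, assuming ggplot2 3.3.0 or later (needed for stat_summary(fun = ...)) and a made-up toy vector x: the quartiles use quantile()'s default type 7, which is also what stat_boxplot() uses, and any point more than 1.5 × IQR beyond the box is flagged as an outlier. Base R's boxplot() uses Tukey's hinges instead, so its fences can differ slightly on small samples.

```r
library(ggplot2)

# Hypothetical toy data: nine ordinary values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 40)

# Quartiles with quantile()'s default (type 7)
q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))
iqr <- q3 - q1

# Tukey's rule: points beyond 1.5 * IQR from the box are outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)  # [1] 40

# geom_boxplot() flags the same point; stat_summary() adds the mean as a diamond
df <- data.frame(group = "a", value = x)
p <- ggplot(df, aes(x = group, y = value)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "red")
built <- ggplot_build(p)
print(built$data[[1]]$outliers[[1]])
```

Here Q1 = 3.25 and Q3 = 7.75, so the fences sit at -3.5 and 14.5 and only 40 lies outside them.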
    External Links:
    www.r-graph-gallery.com/boxpl...
    www.data-to-viz.com/caveat/bo...
    Background Music:
    • PARASITE EVE 2 SOUNDTR...
    Outro Animation:
    AA-VFX Motion Backgrounds
    • 4K Relaxing Live Wallp...

Comments • 43

  • @TheDataDigest  2 years ago +7

    Below is the code I used for the thumbnail (overlay of boxplot over density plots):

    • @giulianabeltramone8383 2 years ago +1

      Thank you so much! I will try them right away! Thank you for sharing!!

    • @bernardrobenson5071 1 year ago +1

      Month SDSM CHIRPS GCMs
      Jan 4 0 16
      Feb 1 2.3 28
      Mar 16.8 17 13
      Apr 28 25 89
      May 57 55 98
      Jun 27 23 42
      Jul 17 15 74
      Aug 79 70 130
      Sep 24 20 39
      Oct 19 14 9.3
      Nov 5 0 21
      Dec 8 2 19.5
      This is my dataset. I want to group it the way you did so perfectly, but I could not manage it. I want to show the three variables for each month. I would be thankful if you have time to help.

    • @TheDataDigest  1 year ago +1

      @bernardrobenson5071 If you have that data as a data.frame (you might want to use str() to check that), I recommend making a bar chart and using facet_wrap() to show the three variables and how they change over time. Please try out the code below:
      library(tidyverse)
      # you might have to turn Month into a factor for proper order
      data$Month <- factor(data$Month, levels = month.abb)
      data %>% pivot_longer(-Month) %>%
        ggplot(aes(x = Month, y = value, fill = name)) +
        geom_col() +
        facet_wrap(~name, ncol = 1)
      # if you really need a boxplot, I can only see it as one box per variable with the months as dots; then please try this:
      data %>% pivot_longer(-Month) %>%
        ggplot(aes(x = name, y = value, fill = name)) +
        geom_boxplot() +
        geom_jitter()
      I will soon have an email address for subscriber questions. I will post it next week in case you have further questions.

    • @bernardrobenson5071 1 year ago +1

      @TheDataDigest I sincerely thank you for the quick answer and assistance. I want to build a boxplot. I tried several times, and the boxplots for each month look like a flat, tiny line. I was wondering why. In your tutorial you described grouped plots using variety, treatment, and value. I did almost the same, and the code ran okay, but I could not get the plot I wanted. Maybe I am making a mistake with the repeated values. Also, the boxplots only covered three months, and the result for those three months was repeated in the other months. Thanks again for your rapid response and for all the attempts you keep testing to help; you are a great scientist. You deserve all appreciation and respect.
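One likely reason for the flat, tiny boxes: in the dataset posted in this thread each month has exactly one value per method, so a box drawn per Month × Method has zero spread and collapses to a single line. The sketch below uses those twelve monthly values; the object names precip, precip_long and obs_per_box are illustrative, not from the thread. It first counts the observations behind each potential box, then draws one box per method (summarising the 12 months), similar to the second snippet in the reply above.

```r
library(tidyverse)

# Monthly values copied from the comment above: one value per month per method
precip <- tibble(
  Month  = factor(month.abb, levels = month.abb),
  SDSM   = c(4, 1, 16.8, 28, 57, 27, 17, 79, 24, 19, 5, 8),
  CHIRPS = c(0, 2.3, 17, 25, 55, 23, 15, 70, 20, 14, 0, 2),
  GCMs   = c(16, 28, 13, 89, 98, 42, 74, 130, 39, 9.3, 21, 19.5)
)

# Long format: one row per Month x Method combination
precip_long <- pivot_longer(precip, -Month, names_to = "Method", values_to = "Value")

# Every Month x Method group holds a single value, so a box per month is a flat line
obs_per_box <- count(precip_long, Month, Method)
print(unique(obs_per_box$n))

# One box per method, summarising its 12 monthly values; outlier markers are
# hidden so each point is drawn only once, by the jitter layer (no vertical jitter)
p <- ggplot(precip_long, aes(x = Method, y = Value, fill = Method)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0) +
  ylab("Precipitation (mm)")
print(p)
```

A box per month per method would only become meaningful with several observations per month, for example one value per year.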

    • @bernardrobenson5071 1 year ago

      @TheDataDigest Thanks for your help. I did it this way and it works. The code is shown below:
      library(ggplot2)
      ggplot(data_range_long, aes(x = Month, y = Value, fill = Method)) +
        geom_boxplot(width = box_width, alpha = alpha_value) +
        scale_fill_manual(values = c("CHIRPS" = "blue", "SDSM" = "green", "GCMs" = "red")) +
        scale_x_discrete(limits = month.abb) +
        ylab("Precipitation averages (mm)") +
        xlab("Month") + theme_bw() +
        theme(legend.position = "top") +
        theme(legend.title = element_blank()) +
        theme(axis.title.y = element_text(size = 16),
              axis.title.x = element_text(size = 16)) +
        theme(legend.text = element_text(size = 14)) +
        # no colour aesthetic is mapped above, so this scale has no visible effect
        scale_colour_manual(values = desired_order, labels = desired_labels) +
        theme(plot.margin = margin(0.3, 0.6, 0.5, 0.5, "cm")) +
        theme(axis.text.x = element_text(size = 10)) +
        theme(axis.text.y = element_text(size = 12))

  • @mocabeentrill 1 year ago +2

    WOW! What a comprehensive tutorial. Thank you very much.

    • @TheDataDigest  1 year ago

      I am glad that you liked it and left such a nice comment.

  • @katiea2487 2 years ago +5

    This has been the most helpful video while making figures for my dissertation!! Thank you

    • @TheDataDigest  2 years ago

      Glad to hear that, Katie :) Thanks for leaving such a kind comment. In which field are you writing your dissertation, if I may ask? My background is in biochemistry and evolutionary biology.

  • @nicolastovar8121 2 years ago +2

    Thanks man :3 I'm from Colombia and your videos are amazing!

    • @TheDataDigest  2 years ago

      Hi Nicolás, thank you for the comment! I am glad you like my content so far. ^_^
      I love how YouTube "brings together" people from all around the world.

  • @rachitsingh98 2 years ago +2

    Thank you very much, this was really helpful. A lot of useful information is packed into a short video and explained clearly as well. Wonderful 👌🏽

    • @TheDataDigest  2 years ago

      Thanks for the kind words Rachit. Glad you liked it and found it helpful. 😊

  • @pascal3327 2 years ago +2

    You are a genius. Thank you so much for this amazing lecture.

    • @TheDataDigest  2 years ago

      Thanks for the compliment. Glad you enjoyed the content!

  • @DmitryPonomareF 1 year ago

    wow, super! Thanks!

  • @binhomosta4593 2 years ago +1

    Great tutorial! Thanks a lot.

    • @TheDataDigest  2 years ago +1

      Thanks for the comment. Glad it was helpful for you.

  •  3 years ago +1

    Thanks 👏👏

  • @bkarim7349 2 years ago +1

    thanks, very very useful

    • @TheDataDigest  2 years ago

      Thanks for leaving a comment. That's the goal of these videos. I learn a lot about different ways to plot data and enjoy teaching others along the way.

  • @giulianabeltramone8383 2 years ago +3

    I wish I could have seen this video before presenting my thesis! Thank you very much!
    How do you add the density plots under the boxplots?

    • @TheDataDigest  2 years ago +1

      Congratulations on presenting/finishing your thesis. I bet it went well even without some of these plots. In R you can layer different plot types (geoms) on top of each other once the aes(x, y, color, fill) mapping has been set in ggplot(). Then you can add + geom_density() + geom_boxplot(). Let me post the code in a separate comment at the top.
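The layering idea above can be sketched as follows. This is not the code used for the thumbnail; it is a minimal example on the built-in iris data, assuming ggplot2 3.3.0 or later for the orientation argument. Each species gets a density curve, and a horizontal boxplot is placed just below the curves at a fixed, arbitrary y offset per species (-0.15, -0.30, -0.45).

```r
library(ggplot2)

# One density curve per species, filled and semi-transparent
p <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.4) +
  # Horizontal boxplots below the curves: y is a fixed offset per species,
  # orientation = "y" makes x the value axis, group keeps one box per species
  geom_boxplot(
    aes(y = -0.15 * as.numeric(Species), group = Species),
    width = 0.1,
    orientation = "y"
  )
print(p)
```

The offsets and box width are in density units, so they may need adjusting for data whose density curves are much taller or flatter.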

  • @erichideki4994 11 months ago +1

    But how can I avoid duplicated circles on outliers? For example, for a single data point we can see both a red circle and a black one. Thank you, such a useful video!

    • @TheDataDigest  11 months ago

      You can hide the outlier points drawn by the boxplot itself within the geom function (they still count toward the box and whiskers):
      geom_boxplot(outlier.shape = NA)
      Thanks for leaving a comment. Glad you like the videos.
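A minimal sketch of the fix, using the mpg data that ships with ggplot2 as a stand-in: outlier.shape = NA only suppresses the boxplot's own outlier markers, while the jittered layer draws every observation exactly once, so no point appears twice. Setting height = 0 keeps the jitter horizontal so the values are not shifted.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = hwy)) +
  # Hide the boxplot's outlier markers; quartiles and whiskers are unchanged
  geom_boxplot(outlier.shape = NA) +
  # Draw every observation once, coloured by class, jittered only horizontally
  geom_jitter(aes(colour = class), width = 0.2, height = 0, alpha = 0.6)
print(p)
```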

  • @martastaff9186 1 year ago +1

    Hi! I am struggling to produce one boxplot per row using ggplot. My data frame has 10k values in each row, and I need to visualise each row as an individual boxplot. Any suggestions?

    • @TheDataDigest  1 year ago

      Hi Marta, I think the fastest way to help you is if you send me a subset (or all) of your data together with the R code you have tried so far. An image of the kind of boxplot you want would also be useful. You could email that to: question@thedatadigest.email
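One common approach for "one box per row", sketched under the assumption that each row is one sample and every other column holds a value: give each row an identifier, reshape to long format with pivot_longer(), and map the identifier to x. The data here are simulated (3 rows × 100 columns, far smaller than the 10k values in the question), and the names wide, long and row_id are illustrative.

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide data: 3 rows (samples) x 100 value columns named V1..V100
set.seed(1)
wide <- as.data.frame(matrix(rnorm(300), nrow = 3))

# Label each row so it can become its own box
wide$row_id <- paste0("row_", seq_len(nrow(wide)))

# Long format: one row per (row_id, column, value)
long <- pivot_longer(wide, cols = -row_id, names_to = "column", values_to = "value")

# One box per original row
p <- ggplot(long, aes(x = row_id, y = value)) +
  geom_boxplot()
print(p)
```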

  • @bernardrobenson5071 1 year ago +1

    Thanks for this great tutorial. However, I have tried to follow your steps, but I am still having difficulty building and grouping a boxplot for three columns of data vs. months. If you can help, I would appreciate it.

    • @TheDataDigest  1 year ago +1

      The issue might come from the data structure. Do you have 3 columns (one for category, one for month and one for the actual data) with repeating months? Then you can stack or dodge the categories and have month on the x-axis. Feel free to post the code that gave you error messages.

    • @bernardrobenson5071 1 year ago +1

      @TheDataDigest Thanks for the reply. My data consist of a column of month names and three columns of numbers (integers). Each column represents different climate data: ground observations, CMIP5, and CMIP6. I appreciate your help.

    • @TheDataDigest  1 year ago +1

      @bernardrobenson5071 Can you give this code a try?
      library(tidyverse)
      # example: your data frame with a month column and three numeric columns
      example %>% pivot_longer(-month) %>%
        ggplot(aes(x = month, y = value, fill = name)) +
        geom_col(position = "dodge")
      Alternatively, you can use geom_col(position = "stack") at the end.
      The pivot_longer function is the crucial step to turn the data into long format. Then you have month, name and value that you can use within the aes() mapping in ggplot().
      Let me know if that helped.

    • @bernardrobenson5071 1 year ago

      @TheDataDigest Thanks for the quick answer. I will run the code and see the result. Thanks again.

    • @bernardrobenson5071 1 year ago +1

      @TheDataDigest I think this code is for a bar plot. I am looking for boxplot code, if you can help.

  • @zainabpirbhai1660 3 years ago +1

    Can you help me with R?

    • @TheDataDigest  3 years ago

      Hi, I hope these visualization tutorials with ggplot() are already a first start to help you with R.

  • @kyleevalencia1827 2 years ago +1

    Can you make a video about ggmatrix?

    • @TheDataDigest  2 years ago

      Hi Kylee, most definitely. I have planned to make a video about different ways to arrange plots in R (facet_wrap, gridExtra etc.) Thanks for pointing me towards ggmatrix!

  • @WahranRai 1 year ago +1

    Don't use the pipe %>%; keep your code easy to understand for everybody, even Python programmers etc.
    ggplot(data = data, aes(x = names...)) is better: all the needed info (data and aesthetics) is encapsulated inside the function.
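For comparison, here is a pipe-free sketch of the reshape-then-boxplot pattern used in the replies, on the built-in iris data (the names long_iris and measurement are illustrative). The reshaped data is stored in a variable and passed to ggplot() through its data and mapping arguments; the + that adds layers is part of ggplot2 itself and is still needed. Mapping fill = Species also gives grouped (dodged) boxplots.

```r
library(ggplot2)
library(tidyr)

# Step 1: reshape to long format, one measurement per row
long_iris <- pivot_longer(iris, cols = -Species,
                          names_to = "measurement", values_to = "value")

# Step 2: pass data and aesthetics explicitly, without %>%
p <- ggplot(data = long_iris,
            mapping = aes(x = measurement, y = value, fill = Species)) +
  geom_boxplot()
print(p)
```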