How to draw nested categorical boxplots in R using ggplot2? | Salaries | StatswithR | Arnab Hazra

  • Published on 23 Jul 2024
  • Here we explain how to generate presentation/publication-quality nested categorical boxplots in R/RStudio using ggplot2. The code for the steps explained in the video is as follows. Copy and paste it into R, run it line by line, and try to understand what each argument is doing.
    #datascience #datavisualization #visualization #ggplot2 #tidyverse #nestedboxplot #categoricalboxplot #boxplot #nested #categorical #Salaries #Salariesdataset #rstudio #rcoding
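    # Load the Salaries data, write three columns to Excel, and read them back
    library(carData)
    data("Salaries", package = "carData")
    library(writexl)
    # Columns 1, 2 and 6 are rank, discipline and salary
    write_xlsx(Salaries[ , c(1, 2, 6)], path = "Salaries.xlsx")
    library(readxl)
    # After the Excel round trip, rank and discipline are character columns, not factors
    data = read_excel("Salaries.xlsx")
    head(data)
    str(data)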
    library(ggplot2)
    # Plot 1: rank is still a character column, so the x-axis is in alphabetical order
    p = ggplot(data = data, aes(x=rank, y=salary / 1e3, fill=discipline)) + geom_boxplot()
    ggsave(p, filename = "ggplot_nestbox1.pdf", height = 8, width = 8)
    # Plot 2: put the ranks in career order and store salary in thousands of USD
    ranks = c("AsstProf", "AssocProf", "Prof")
    data$rank = factor(data$rank, levels = ranks)
    data$discipline = factor(data$discipline, levels = c("A", "B"))
    data$salary = data$salary / 1e3
    str(data)
    p = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) + geom_boxplot()
    ggsave(p, filename = "ggplot_nestbox2.pdf", height = 8, width = 8)
    # Plot 3: relabel the levels; new labels are assigned by position, so keep the same order
    levels(data$rank) = c("Assistant Prof.", "Associate Prof.", "Full Prof.")
    levels(data$discipline) = c("Theoretical", "Applied")
    p = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) + geom_boxplot()
    ggsave(p, filename = "ggplot_nestbox3.pdf", height = 8, width = 8)
    # Plot 4: add whisker end caps with an errorbar layer drawn beneath the boxes
    p = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) +
        stat_boxplot(geom = "errorbar") + geom_boxplot()
    ggsave(p, filename = "ggplot_nestbox4.pdf", height = 8, width = 8)
    # Plot 5: add a title and a y-axis label, and drop the x-axis label
    p = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) +
        stat_boxplot(geom = "errorbar") + geom_boxplot() +
        ggtitle("Salary comparison") + xlab(NULL) + ylab("Nine-month salary (in thousands USD)")
    ggsave(p, filename = "ggplot_nestbox5.pdf", height = 8, width = 8)
    # Plot 6: enlarge the axis text and titles, and center the plot title
    p0 = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) +
        stat_boxplot(geom = "errorbar") + geom_boxplot() +
        ggtitle("Salary comparison") + xlab(NULL) + ylab("Nine-month salary (in thousands USD)") +
        theme(axis.text=element_text(size=18),
              axis.title=element_text(size=18),
              plot.title = element_text(size=20, hjust = 0.5))
    ggsave(p0, filename = "ggplot_nestbox6.pdf", height = 8, width = 8)
    # Plot 7: enlarge the legend, rename its title, and place it inside the panel
    p = p0 + theme(legend.text=element_text(size=18),
                   legend.title = element_text(size=18, hjust = 0.5),
                   legend.key.height = unit(2,"cm"),
                   legend.key.width = unit(2,"cm"),
                   legend.position = c(0.2, 0.8)) +
        guides(fill=guide_legend(title="Discipline"))
    ggsave(p, filename = "ggplot_nestbox7.pdf", height = 8, width = 8)
    # Plot 8: set the fill colors by hex code (orange and sky blue), one per discipline level
    cols = c("#E69F00", "#56B4E9")
    p = p0 + theme(legend.text=element_text(size=18),
                   legend.title = element_text(size=18, hjust = 0.5),
                   legend.key.height = unit(2,"cm"),
                   legend.key.width = unit(2,"cm"),
                   legend.position = c(0.2, 0.8)) +
        guides(fill=guide_legend(title="Discipline")) +
        scale_fill_manual(values=cols)
    ggsave(p, filename = "ggplot_nestbox8.pdf", height = 8, width = 8)
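
A note on the legend placement used in the last two plots: from ggplot2 3.5.0 onward, passing a numeric vector to `legend.position` is deprecated in favor of `legend.position = "inside"` together with `legend.position.inside`. The following is a minimal sketch, not part of the video, of a version-aware helper. The name `inside_legend` is hypothetical, and the plot uses the Salaries data straight from carData rather than the Excel round trip.

```r
library(ggplot2)
library(carData)

# Hypothetical helper (not from the video): place the legend inside the panel
# at relative coordinates (x, y), using whichever syntax the installed
# ggplot2 version expects.
inside_legend <- function(x, y) {
  if (utils::packageVersion("ggplot2") >= "3.5.0") {
    theme(legend.position = "inside", legend.position.inside = c(x, y))
  } else {
    theme(legend.position = c(x, y))
  }
}

data("Salaries", package = "carData")

# Same nested boxplot as in the video, with the legend at (0.2, 0.8)
p <- ggplot(Salaries, aes(x = rank, y = salary / 1e3, fill = discipline)) +
  stat_boxplot(geom = "errorbar") +
  geom_boxplot() +
  inside_legend(0.2, 0.8)
```

Adding `inside_legend(0.2, 0.8)` in place of the `legend.position = c(0.2, 0.8)` line should give the same placement without the deprecation warning on newer versions.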

Comments • 20

  • @user-zm4td4wn8i 2 years ago

    This was so explicit and helpful. Thank you very much!!!

  • @valentinouedraogo5298 1 year ago

    Interesting tutorial. Thanks a lot

  • @rosebaulch2666 3 years ago +1

    Omg this video is so good and clear and easy to follow!!

  • @MrRahulukey 2 years ago

    Thank you so much.

  • @letsseethenature2423 2 years ago

    This is a nice video for understanding how to make boxplots. I need to know how to change the line width of the boxes and whiskers.

  • @aaronm9491 3 years ago

    thank you so much

  • @salmaasghar1674 1 year ago

    An answer to my every query and a solution to all my problems. Thanks!

  • @vincenzo4259 2 years ago

    Thanks

  • @antonynjogu4721 3 years ago +1

    Thank you for this tutorial.

    • @statswithr602 3 years ago

      Thank you for your comment Antony 😊

    • @antonynjogu4721 3 years ago

      @statswithr602 Is it possible to add a regression line (abline)? Kindly help.

  • @truongphu7407 2 years ago

    Thanks for your great tutorial. Can you tell us how to reorder the "discipline" levels after all the mentioned steps have been done?

  • @adeizamomoh2939 2 years ago

    Hi! Great and easy-to-follow lesson. I would like to use the standard deviation as error bars. How can I do this?

  • @vbhidalgo2508 2 years ago

    Hi. Instead of two boxes per rank, how can you do that for one discipline and plot it using facets?

  • @eaaepro 3 years ago

    Hello sir, what kind of color code is that in the graphics?

  • @letterinoscala8670 1 year ago

    It's very clear. How can you visualize more than 2 categorical factors?

  • @karenbrodersen3847 3 years ago +2

    Thank you so much for your tutorial, I can finally make it work properly!! Can I ask you, how can I add more values to my y-axis? In your y-axis it ends at '200' and in my own it ends at '1.2', but my highest value is around '1.4', how would I add this value at the top of my y-axis? Thank you again very much, you saved my day! :)

    • @statswithr602 3 years ago +1

      Thank you for your comment, Karen. You can add a layer ylim(l, u), where l is the lowest value you want on the y-axis and u is the highest value you want on the y-axis. Good luck!

    • @karenbrodersen3847 3 years ago

      @statswithr602 Thank you, that worked! :))
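
The `ylim(l, u)` suggestion in the reply above can be sketched as follows. This is an illustrative example, not code from the video: the variable names `salary_k`, `base_plot`, `upper`, `p_extended` and `p_zoomed` are made up here, and the Salaries data is loaded straight from carData. One caveat worth knowing: `ylim()` treats observations outside the limits as missing before the box statistics are computed, so it is safe for extending an axis but can change the boxes when used to zoom in. `coord_cartesian()` zooms without dropping any data.

```r
library(ggplot2)
library(carData)

data("Salaries", package = "carData")
Salaries$salary_k <- Salaries$salary / 1e3  # salary in thousands of USD

base_plot <- ggplot(Salaries, aes(x = rank, y = salary_k, fill = discipline)) +
  geom_boxplot()

# Extend the axis beyond the data: round the maximum up to the next
# multiple of 50 so that every observation stays inside the limits.
upper <- ceiling(max(Salaries$salary_k) / 50) * 50
p_extended <- base_plot + ylim(0, upper)

# Zoom in without dropping data: the boxes are still computed from all
# observations, and only the visible window changes.
p_zoomed <- base_plot + coord_cartesian(ylim = c(50, 150))
```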