How to draw nested categorical boxplots in R using ggplot2? | Salaries | StatswithR | Arnab Hazra
- Published on 23 Jul 2024
- Here we explain how to generate presentation/publication-quality nested categorical boxplots in R/RStudio using ggplot2. The code for the steps explained in the video is as follows. Copy and paste it into R, run it line by line, and try to understand what each argument is doing.
#datascience #datavisualization #visualization #ggplot2 #tidyverse #nestedboxplot #categoricalboxplot #boxplot #nested #categorical #Salaries #Salariesdataset #rstudio #rcoding
# Load the Salaries data set from the carData package
library(carData)
data("Salaries", package = "carData")
# Save rank, discipline and salary (columns 1, 2, 6) to Excel, then read them back
library(writexl)
write_xlsx(Salaries[ , c(1, 2, 6)], path = "Salaries.xlsx")
library(readxl)
data = read_excel("Salaries.xlsx")
head(data)
str(data)
# Plot 1: default nested boxplot, with salary shown in thousands of USD
library(ggplot2)
p = ggplot(data = data, aes(x=rank, y=salary / 1e3, fill=discipline)) + geom_boxplot()
ggsave(p, filename = "ggplot_nestbox1.pdf", height = 8, width = 8)
# Plot 2: convert rank and discipline to factors with an explicit level order
ranks = c("AsstProf", "AssocProf", "Prof")
data$rank = factor(data$rank, levels = ranks)
data$discipline = factor(data$discipline, levels = c("A", "B"))
data$salary = data$salary / 1e3
str(data)
p = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) + geom_boxplot()
ggsave(p, filename = "ggplot_nestbox2.pdf", height = 8, width = 8)
# Plot 3: rename the factor levels for display
levels(data$rank) = c("Assistant Prof.", "Associate Prof.", "Full Prof.")
levels(data$discipline) = c("Theoretical", "Applied")
p = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) + geom_boxplot()
ggsave(p, filename = "ggplot_nestbox3.pdf", height = 8, width = 8)
# Plot 4: add horizontal caps to the ends of the whiskers
p = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) +
  stat_boxplot(geom = "errorbar") + geom_boxplot()
ggsave(p, filename = "ggplot_nestbox4.pdf", height = 8, width = 8)
# Plot 5: add a title and axis labels
p = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) +
  stat_boxplot(geom = "errorbar") + geom_boxplot() +
  ggtitle("Salary comparison") + xlab(NULL) + ylab("Nine-month salary (in thousands USD)")
ggsave(p, filename = "ggplot_nestbox5.pdf", height = 8, width = 8)
# Plot 6: enlarge the axis text and center the title
p0 = ggplot(data = data, aes(x=rank, y=salary, fill=discipline)) +
  stat_boxplot(geom = "errorbar") + geom_boxplot() +
  ggtitle("Salary comparison") + xlab(NULL) + ylab("Nine-month salary (in thousands USD)") +
  theme(axis.text=element_text(size=18),
        axis.title=element_text(size=18),
        plot.title = element_text(size=20, hjust = 0.5))
ggsave(p0, filename = "ggplot_nestbox6.pdf", height = 8, width = 8)
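# Plot 7: style the legend and place it inside the panel
# Note: in ggplot2 3.5.0 and later, a numeric legend.position is deprecated;
# use legend.position = "inside" together with legend.position.inside = c(0.2, 0.8)
p = p0 + theme(legend.text=element_text(size=18),
               legend.title = element_text(size=18, hjust = 0.5),
               legend.key.height = unit(2,"cm"),
               legend.key.width = unit(2,"cm"),
               legend.position = c(0.2, 0.8)) +
  guides(fill=guide_legend(title="Discipline"))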
ggsave(p, filename = "ggplot_nestbox7.pdf", height = 8, width = 8)
# Plot 8: custom fill colors, one hex code per discipline
# (orange and sky blue from the colorblind-friendly Okabe-Ito palette)
cols = c("#E69F00", "#56B4E9")
p = p0 + theme(legend.text=element_text(size=18),
               legend.title = element_text(size=18, hjust = 0.5),
               legend.key.height = unit(2,"cm"),
               legend.key.width = unit(2,"cm"),
               legend.position = c(0.2, 0.8)) +
  guides(fill=guide_legend(title="Discipline")) +
  scale_fill_manual(values=cols)
ggsave(p, filename = "ggplot_nestbox8.pdf", height = 8, width = 8)
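The factor steps in the code above matter because read_excel() returns rank and discipline as plain character columns, and ggplot2 then orders the categories alphabetically. The following is a minimal base-R sketch of that behavior. The vector rank_chr is a made-up stand-in for the rank column, not the actual data.

```r
# Hypothetical stand-in for the rank column as read back from Excel (character)
rank_chr = c("Prof", "AsstProf", "AssocProf", "Prof", "AsstProf")

# Without explicit levels, factor() sorts the labels alphabetically,
# so "AssocProf" would come before "AsstProf" on the x axis
default_levels = levels(factor(rank_chr))
print(default_levels)

# With explicit levels, the order follows academic seniority
rank_fct = factor(rank_chr, levels = c("AsstProf", "AssocProf", "Prof"))
print(levels(rank_fct))

# Assigning to levels() renames the labels position by position,
# without changing which observation belongs to which level
levels(rank_fct) = c("Assistant Prof.", "Associate Prof.", "Full Prof.")
print(table(rank_fct))
```

Renaming through levels() relies on position, so the new labels have to be listed in the same order as the existing levels.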
This was so explicit and helpful. Thank you very much!!!
Interesting tutorial. Thanks a lot
Omg this video is so good and clear and easy to follow!!
Thank you, Rose!
Thank you so much.
This is a nice video for understanding how to make boxplots. I need to know how to change the line width of the box and whiskers.
thank you so much
An answer to my every query and a solution to all my problems. Thanks.
Thanks
Thank you for this tutorial.
Thank you for your comment Antony 😊
@statswithr602 Is it possible to add a regression line (abline)? Kindly help.
Thanks for your great tutorial. Can you tell us how to reorder "discipline" after all the mentioned steps have been done?
Hi! Great and easy to follow lesson. I would like to use standard deviation as error bars, how can I do this?
Hi. Instead of two plots per rank, how can you do that for one discipline and plot it using facets?
Hello sir, what kind of color code is that in the graphics?
It's very clear. How can you visualize more than two categorical factors?
Thank you so much for your tutorial, I can finally make it work properly!! Can I ask you, how can I add more values to my y-axis? In your y-axis it ends at '200' and in my own it ends at '1.2', but my highest value is around '1.4', how would I add this value at the top of my y-axis? Thank you again very much, you saved my day! :)
Thank you for your comment, Karen. You can add a layer ylim(l, u), where l is the lowest value you want on the y axis and u is the highest value you want on the y axis. Good luck!
@statswithr602 Thank you, that worked! :))
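The ylim(l, u) suggestion in the reply above can be sketched as follows. The data frame toy and its values are hypothetical, chosen only so the largest value is about 1.4, as in the question. One caveat: ylim() drops observations outside the limits before the boxplot statistics are computed, whereas coord_cartesian() only changes the visible window, so the second form is safer when the limits might cut into the data.

```r
library(ggplot2)

# Hypothetical toy data (not from the video), with a maximum of 1.4
toy = data.frame(
  group = rep(c("a", "b"), each = 5),
  value = c(0.2, 0.5, 0.7, 0.9, 1.2, 0.4, 0.6, 0.8, 1.1, 1.4)
)

p_toy = ggplot(data = toy, aes(x = group, y = value)) + geom_boxplot()

# Option 1: set the scale limits; points outside [0, 1.5] would be removed
p_ylim = p_toy + ylim(0, 1.5)

# Option 2: zoom the view only; all data still enter the boxplot statistics
p_zoom = p_toy + coord_cartesian(ylim = c(0, 1.5))
```

Here both options give the same boxes because all toy values lie inside the chosen limits.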