Always, at the begining we should start with loading the packages and setting up the defaults. As you see here, I have my preferences about the theme and legend position. We won’t be using the latter, but it’s a force of habit for me to add the following lines to every notebook I set up.
# load the precious libraries
library(tidyverse)
library(ggsci)
# set the defauts for plots
theme_set(theme_classic())
theme_update(legend.position = "bottom")Now, let’s create a simulated data to recreate the plots.
set.seed(7)
df <-
data.frame(
sample_name = c(rep("sample_A", 15), rep("sample_B", 5)),
gene_1 = c(rnorm(15, 1), rnorm(5, 3)),
gene_2 = c(rnorm(15, 10, 5), rnorm(5, 10, 2)),
gene_3 = c(rnorm(15, 10, 1), rnorm(5, 9, 3))
) %>%
pivot_longer(names_to = "gene_id",
values_to = "expression",
-sample_name)Firstly, let’s create a plot with all observations and their means.
# points with mean
df %>%
ggplot(aes(sample_name, expression, color = gene_id)) +
geom_point(position = position_jitterdodge(), size = 2) +
stat_summary(fun.y = mean, fun.ymin = mean, fun.ymax = mean,
geom = "crossbar", alpha = .7,
width = 0.5, color = "grey60") +
facet_wrap(~gene_id) +
scale_color_nejm() +
labs(title = "Example jitter with mean",
y = "Normalised expression") +
theme(text = element_text(size = 14), title = element_text(size = 18),
axis.title.x = element_blank(), legend.position = "none",
panel.border = element_rect(colour = "black", fill = NA))Now, the plot with the boxplot.
The default ggplot geom_boxplot() will show upper quartile, median and lower quartile in a rectangle. The lines from the upper quartile extend to largest observation less than or equal to upper quartile + 1.5 * IQR. Similarly, for the ymin will extend to smallest observation greater than or equal to lower quartile - 1.5 * IQR. Full documentation, and exmaples of how to modify the plot, is available here.
Additionally, since we are plotting the data under the boxplot, there is no need for us to show the outliers, so we are setting outlier.shape = NA.
# points with boxplot
df %>%
ggplot(aes(sample_name, expression, color = gene_id)) +
geom_point(position = position_jitterdodge(), size = 2) +
geom_boxplot(width = .2, alpha = .4, outlier.shape = NA,
color = "grey60") +
facet_wrap(~gene_id) +
scale_color_nejm() +
labs(title = "Example jitter with boxplot",
y = "Normalised expression") +
theme(text = element_text(size = 14), title = element_text(size = 18),
axis.title.x = element_blank(), legend.position = "none",
panel.border = element_rect(colour = "black", fill = NA))