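I generally wouldn't recommend bar charts with error bars. There are many other ways to plot your data that show the data and its structure much better.
Bars are a particularly poor choice if you only have a few cases. A good explanation can be found here: Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm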
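To make the point concrete, here is a small base-R sketch. The two samples are made up for illustration, and rescale_to() is a hypothetical helper. Both samples are forced to the same mean and SD, so a bar of the mean with ±1 SD error bars would look identical for them, although one is evenly spread and the other is four equal values plus one outlier.

```r
# Hypothetical helper: shift and scale x so it has mean m and standard deviation s
rescale_to <- function(x, m, s) (x - mean(x)) / sd(x) * s + m

# Two made-up samples of five values with very different shapes
even    <- rescale_to(c(1, 2, 3, 4, 5), m = 10, s = 2)  # evenly spread
outlier <- rescale_to(c(1, 1, 1, 1, 5), m = 10, s = 2)  # four equal values, one outlier

# Mean and SD: everything a bar plus SD error bar would show
summary_stats <- rbind(even    = c(mean = mean(even),    sd = sd(even)),
                       outlier = c(mean = mean(outlier), sd = sd(outlier)))
print(round(summary_stats, 10))

# The raw values, which the bar chart would hide
print(rbind(even = round(even, 2), outlier = round(outlier, 2)))
```

Plotting the raw points, as in the suggestions that follow, makes the difference between the two samples obvious.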
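It's hard to give you a good solution because I don't know your research question. Knowing what you actually want to show or emphasize would make things a lot easier.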
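I'll give you two suggestions, one for a small data set and one for a large one. All plots were created with ggplot2.
Instead of colouring by "element number", I coloured the data by origin ("data set 1/2"), because I found it easier to build a proper graph that way.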
Small data set
Use geom_jitter
to show every single case while avoiding overplotting.
# import hadleyverse
library(magrittr)
library(dplyr)
library(tidyr)
library(ggplot2)
# generate small amount of data
set.seed(1234)
df1 <- data.frame(v1 = rnorm(5, 4, 1),
                  v2 = rnorm(5, 5, 1),
                  v3 = rnorm(5, 6, 1),
                  origin = rep(factor("df1", levels = c("df1", "df2")), 5))
df2 <- data.frame(v1 = rnorm(5, 4.5, 1),
                  v2 = rnorm(5, 5.5, 1),
                  v3 = rnorm(5, 6.5, 1),
                  origin = rep(factor("df2", levels = c("df1", "df2")), 5))
# merge dataframes and gather in long format
pdata <- bind_rows(df1, df2) %>%
  gather(id, variable, -origin)
# plot data
ggplot(pdata, aes(x = id, y = variable, fill = origin, colour = origin)) +
  # plot mean as "-" (fun = mean; was fun.y before ggplot2 3.3.0)
  stat_summary(fun = mean, geom = "point", position = position_dodge(width = .5),
               size = 30, shape = "-", show.legend = FALSE, alpha = .7) +
  # jitter the raw points and dodge them by origin
  geom_jitter(position = position_jitterdodge(jitter.width = .3, jitter.height = .1,
                                              dodge.width = .5),
              size = 4, alpha = .85) +
  labs(x = "Variable", y = NULL) + # adjust axis labels
  theme_light() # nicer theme
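The "-" markers are the per-group means computed by stat_summary(fun = mean). If you want to check them as numbers, here is a base-R sketch that regenerates the same small data (same seed and draw order, with origin stored as a plain string instead of a factor) and aggregates it. stack() stands in for gather() here only so the block runs without the tidyverse.

```r
set.seed(1234)
df1 <- data.frame(v1 = rnorm(5, 4, 1), v2 = rnorm(5, 5, 1),
                  v3 = rnorm(5, 6, 1), origin = "df1")
df2 <- data.frame(v1 = rnorm(5, 4.5, 1), v2 = rnorm(5, 5.5, 1),
                  v3 = rnorm(5, 6.5, 1), origin = "df2")
wide <- rbind(df1, df2)

# Long format: stack() puts all v1 values first, then v2, then v3,
# so origin is repeated once per stacked column
long <- data.frame(origin = rep(wide$origin, times = 3),
                   stack(wide[c("v1", "v2", "v3")]))

# One mean per origin/variable combination, as drawn by the "-" markers
group_means <- aggregate(values ~ origin + ind, data = long, FUN = mean)
print(group_means)
```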
"Large" data set
If you have more data points, you can use geom_violin
to summarise them.
set.seed(12345)
df1 <- data.frame(v1 = rnorm(50, 4, 1),
                  v2 = rnorm(50, 5, 1),
                  v3 = rnorm(50, 6, 1),
                  origin = rep(factor("df1", levels = c("df1", "df2")), 50))
df2 <- data.frame(v1 = rnorm(50, 4.5, 1),
                  v2 = rnorm(50, 5.5, 1),
                  v3 = rnorm(50, 6.5, 1),
                  origin = rep(factor("df2", levels = c("df1", "df2")), 50))
# merge dataframes
pdata <- bind_rows(df1, df2) %>%
  gather(id, variable, -origin)
# plot with violin plot
ggplot(pdata, aes(x = id, y = variable, fill = origin)) +
  geom_violin(adjust = .6) + # narrower bandwidth than the default
  # plot mean as a cross (shape 4)
  stat_summary(fun = mean, geom = "point", position = position_dodge(width = .9),
               size = 6, shape = 4, show.legend = FALSE) +
  guides(fill = guide_legend(override.aes = list(colour = NULL))) +
  labs(x = "Variable", y = NULL) +
  theme_light()
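The adjust = .6 in geom_violin is a multiplier on the kernel-density bandwidth: as far as I know, stat_ydensity estimates the bandwidth with the "nrd0" rule by default and then scales it by adjust, the same convention base R's density() uses. This sketch shows what that means numerically for the first variable (the same first 50 draws as df1$v1 above).

```r
set.seed(12345)
x <- rnorm(50, 4, 1)

d_default <- density(x)               # bandwidth chosen by bw.nrd0(x)
d_narrow  <- density(x, adjust = 0.6) # 0.6 times that bandwidth

cat("default bandwidth: ", d_default$bw, "\n")
cat("adjusted bandwidth:", d_narrow$bw, "\n")
```

A smaller bandwidth gives a bumpier violin that follows the data more closely; values above 1 smooth it out.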
Version with means and standard deviations (bars with SD error bars)
If you insist, here is how it can be done.
# merge dataframes and compute mean and SD limits (one row per group)
pdata <- bind_rows(df1, df2) %>%
  gather(id, variable, -origin) %>%
  group_by(origin, id) %>% # group data for limit calculation
  summarise(mean_value = mean(variable),
            upper = mean_value + sd(variable), # upper limit for error bar
            lower = mean_value - sd(variable), # lower limit for error bar
            .groups = "drop")
# plot
ggplot(pdata, aes(x = id, y = mean_value, fill = origin)) +
  geom_col(position = position_dodge(width = .9)) + # bar height = group mean
  geom_errorbar(aes(ymin = lower, ymax = upper),
                width = .2, # width of the error bars
                position = position_dodge(.9))
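These error bars span mean ± 1 standard deviation, not a standard error or a confidence interval. A minimal base-R sketch of those limits for one hypothetical group of values (the numbers are made up), which also confirms that sd() divides by n - 1:

```r
x <- c(3.1, 4.5, 4.0, 5.2, 3.8)  # hypothetical values for one group

m <- mean(x)
s <- sd(x)                                         # sample SD
manual_s <- sqrt(sum((x - m)^2) / (length(x) - 1)) # same thing, written out

limits <- c(lower = m - s, mean = m, upper = m + s)
print(limits)
```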