2015-03-31 35 views

Split data into groups in R

My data frame looks like this:

plant distance 
one 0 
one 1 
one 2 
one 3 
one 4 
one 5 
one 6 
one 7 
one 8 
one 9 
one 9.9 
two 0 
two 1 
two 2 
two 3 
two 4 
two 5 
two 6 
two 7 
two 8 
two 9 
two 9.5 
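For anyone who wants to reproduce the problem, the table above can be rebuilt as follows. This is a minimal sketch; the object name `data` is taken from the code in the question, and the column types are assumptions.

```r
# Reconstruct the example data shown above (22 rows, two plants).
data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)

str(data)
```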

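For each level of plant, I want to split distance into groups by a fixed interval (for example, interval = 3) and calculate the percentage of each group. Finally, I want to plot the percentage of each group for each level, similar to this: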

enter image description here

My code:

library(ggplot2) 
library(dplyr) 
library(scales)  # provides percent(), used for the y-axis labels below 

dat <- data %>% 
    mutate(group = factor(cut(distance, seq(0, max(distance), 3), F))) %>% 
    group_by(plant, group) %>% 
    summarise(percentage = n()) %>% 
    mutate(percentage = percentage/sum(percentage)) 
p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) + 
    geom_bar(stat = "identity", position = "stack")+ 
    scale_y_continuous(labels=percent) 
p 

But my plot looks like this, with group 4 missing: enter image description here

I also found that dat is wrong: group 4 is NA.

enter image description here

The likely cause is that the range covered by group 4 is shorter than interval = 3, so my question is: how can I fix this? Thanks in advance!

Answer


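I have solved this problem. The cause is that cut(distance, seq(0, max(distance), 3), F) does not include the maximum and minimum values. The breaks are 0, 3, 6 and 9, and cut() by default builds right-closed intervals (0,3], (3,6], (6,9], so a distance of 0 and every distance above 9 fall outside all intervals and become NA.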
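The effect can be checked in isolation with base R. The sketch below uses a handful of illustrative distances (chosen here, not the full data set) and the same breaks as the original call.

```r
# A few illustrative distances, including both endpoints of the data.
distance <- c(0, 1, 3, 9, 9.5, 9.9)

# Same breaks as in the question: the sequence stops at 9, below max(distance).
breaks <- seq(0, max(distance), 3)
print(breaks)   # 0 3 6 9

# Default cut(): intervals are (0,3], (3,6], (6,9], so 0 and values > 9 get NA.
groups <- cut(distance, breaks, labels = FALSE)
print(groups)   # NA 1 1 3 NA NA
```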

Here is my solution:

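# Breaks start at min(distance) and step by 3. seq() rounds a fractional 
# length.out up, so for this data the last break lies beyond max(distance). 
# include.lowest = TRUE keeps the minimum value in the first interval. 
dat <- my_data %>% 
    mutate(group = factor(cut(distance, seq(from = min(distance), by = 3, length.out = n()/ 3 + 1), include.lowest = TRUE))) %>% 
    count(plant, group) %>% 
    group_by(plant) %>% 
    mutate(percentage = n/sum(n))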
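An alternative way to build the breaks, shown here only as a sketch, is to round the upper limit up to the next multiple of the interval instead of deriving the number of breaks from the row count. The names `interval`, `upper` and `breaks` are introduced for illustration, the data are rebuilt from the question, and the code assumes that no distance is negative. Base R's `table()` and `prop.table()` are used in place of the dplyr pipeline to keep the example self-contained.

```r
# Rebuild the example data from the question.
my_data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)

interval <- 3  # bin width, as in the question

# Round the upper limit up to a multiple of the interval so that the
# partial last bin (values above 9) is still covered by a break.
upper <- ceiling(max(my_data$distance) / interval) * interval
breaks <- seq(0, upper, by = interval)   # 0 3 6 9 12

# include.lowest = TRUE puts distance == 0 into the first bin.
my_data$group <- cut(my_data$distance, breaks,
                     labels = FALSE, include.lowest = TRUE)

# Share of each group within each plant (each row sums to 1).
percentage <- prop.table(table(my_data$plant, my_data$group), margin = 1)
print(round(percentage, 3))
```

With these breaks every row falls into one of four groups, and the last group holds the single value above 9 for each plant.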