2016-12-14

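Find the minimum and maximum values for each unique identifier (grouping element) and create these columns in R

I have the following dataset: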

MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5)) 
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5)) 
Dia <- c(870,"NA", 867.3, "NA", "NA", 890.3,"NA","NA",871.2,"NA",868.7,"NA",866.2, "NA", 
"NA",851,"NA","NA",842,"NA","NA",880,860,851.8,"NA",841) 

df <- data.frame(MC,ASN,Dia) 

df 

For each MC, I want to find the minimum and maximum Dia values and put them in a result table like this:

MC        Dia    Min_Dia  Max_Dia
OS000348  870    867.3    890.3
OS000361  871.2  842      871.2
OS000375  880    841      880

I tried using the dplyr package as follows:

result1 <- 
    df %>% 
    group_by(MC) %>% 
    arrange(MC) %>% 
    slice(c(1, n())) %>% 
    mutate(minmax = c("Min", "Max")) %>% 
    gather(var, val, Dia) %>% 
    unite(key, minmax, var) %>% 
    spread(key, val) 

But I'm not getting the table in the form I want (the table above).

Are there any alternatives?
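For comparison, slice(c(1, n())) in the attempt above takes the first and last rows of each MC in their original order, which are not necessarily the smallest and largest Dia. A sketch of the same slice-and-spread idea that drops missing values and sorts by Dia within each group first. It assumes Dia has been entered with real NA values and uses tidyr's spread() as the question does (superseded by pivot_wider() but still available); result1 simply mirrors the question's variable name.

```r
library(dplyr)
library(tidyr)

# Question's data, with missing values entered as NA rather than "NA"
MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)
df <- data.frame(MC, Dia)

result1 <- df %>%
  filter(!is.na(Dia)) %>%                  # keep only recorded diameters
  group_by(MC) %>%
  arrange(Dia, .by_group = TRUE) %>%       # smallest Dia first within each MC
  slice(c(1, n())) %>%                     # first row = minimum, last row = maximum
  mutate(minmax = c("Min_Dia", "Max_Dia")) %>%
  ungroup() %>%
  select(MC, minmax, Dia) %>%
  spread(minmax, Dia)                      # one row per MC, one column per label

print(result1)
```

Sorting before slicing is what turns "first and last row" into "minimum and maximum"; spread() then orders the new columns alphabetically (Max_Dia before Min_Dia).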


Don't enter missing values as "NA"; enter them as NA instead. Then aggregate works fine: 'aggregate(Dia ~ MC, data = df, FUN = function(x) c(head(x, 1), min(x, na.rm = T), max(x, na.rm = T)))' – bouncyball
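A sketch of the aggregate() approach from the comment above. It assumes the data were already loaded with "NA" strings and need converting rather than retyping. The names res and res_flat are illustrative, and naming the FUN results (first, min, max) is an addition so the output columns are labeled.

```r
# Question's data as originally typed, with "NA" strings
MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, "NA", 867.3, "NA", "NA", 890.3, "NA", "NA", 871.2, "NA", 868.7, "NA", 866.2, "NA",
         "NA", 851, "NA", "NA", 842, "NA", "NA", 880, 860, 851.8, "NA", 841)
df <- data.frame(MC, Dia)

# Dia is character (R >= 4.0) or a factor (earlier R). Going through as.character()
# matters: as.numeric() on a factor would return the level codes, not the values.
df$Dia <- as.numeric(as.character(df$Dia))

# The formula method drops NA rows by default (na.action = na.omit), so head(x, 1)
# is the first recorded Dia of each MC
res <- aggregate(Dia ~ MC, data = df,
                 FUN = function(x) c(first = head(x, 1),
                                     min   = min(x, na.rm = TRUE),
                                     max   = max(x, na.rm = TRUE)))

# res$Dia is a matrix column; flatten it into ordinary columns
res_flat <- do.call(data.frame, res)
print(res_flat)
```

The flattened columns come out as Dia.first, Dia.min and Dia.max, which can be renamed to Dia, Min_Dia and Max_Dia if needed.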

Answer


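First, you need to enter NA as NA rather than "NA". Otherwise R treats Dia as text (a character vector, which data.frame() turns into a factor under the pre-R 4.0 default stringsAsFactors = TRUE), and min() cannot compute a numeric minimum from it. This code adds the minimum and maximum Dia for each MC as new columns: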

MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5)) 
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5)) 
Dia <- c(870,NA, 867.3, NA, NA, 890.3,NA,NA,871.2,NA,868.7,NA,866.2, NA, 
     NA,851,NA,NA,842,NA,NA,880,860,851.8,NA,841) 

df <- data.frame(MC,ASN,Dia) 

library(dplyr) 

df <- df %>% 
    group_by(MC) %>% 
    mutate(minDia=min(Dia, na.rm=T), maxDia=max(Dia, na.rm=T)) 

If you only want to keep one observation per MC, use this:

df2 <- df %>% 
    group_by(MC) %>% 
    mutate(minDia=min(Dia, na.rm=T), maxDia=max(Dia, na.rm=T)) %>% 
    ungroup() %>% 
    distinct(MC, minDia, maxDia) 
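df2 above keeps MC, minDia and maxDia but not the Dia column from the desired table. A minimal sketch that builds that exact table with summarise(), assuming Dia there means each MC's first non-missing value (this matches 870, 871.2 and 880 in the data). The name result is illustrative.

```r
library(dplyr)

# Question's data, with missing values entered as NA rather than "NA"
MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)
df <- data.frame(MC, Dia)

result <- df %>%
  group_by(MC) %>%
  summarise(
    Min_Dia = min(Dia, na.rm = TRUE),
    Max_Dia = max(Dia, na.rm = TRUE),
    # Defined last: once Dia is overwritten here, later expressions would see the
    # single summary value instead of the group's column
    Dia = Dia[!is.na(Dia)][1]
  ) %>%
  select(MC, Dia, Min_Dia, Max_Dia)   # column order of the desired table

print(result)
```

summarise() returns one row per group directly, so the ungroup()/distinct() step is not needed.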

Thanks for such a quick response. It shows an error: 'min' not meaningful for factors – ZeekDSA


You won't get this error message if you follow @yoland's advice in the first sentence. – bouncyball


@bouncyball Hahaha. Thanks, I just noticed it... :) – ZeekDSA