Column names and column order

My data.frame:

df 
ID Time a b c d e 
WT A 28 56 50 60 15 
WT B 54 77 11 67 34 
WT C 53 8 87 62 55 
WT D 30 73 47 82 1 
KO A 24 83 14 17 36 
KO B 91 83 72 41 4 
KO C 79 17 76 21 54 
KO D 41 40 77 49 92

I subset df and compute the group means:

use_col=3:ncol(df) 
mymean<-aggregate(df[, use_col],by= list(df$ID, df$Time),FUN = function(X)mean(X,na.rm=T)) 

Group.1 Group.2 a b c d e 
    WT  A 51 52 49 29 47 
    KO  A  8 17 78 64 96 
    WT  B 79 5 45 83 56 
    KO  B 53 47 37 99 17 
    WT  C 72 38 56 63 40 
    KO  C 51 3 30 47 52 
    WT  D  3 30 75 53 73 
    KO  D 13 72 16 52 13

Why do I lose the names of the first two columns, and how can I keep them? Also, I tried to reorder the mymean data.frame with a factor:

mymean$Group.1=factor(mymean$Group.1, c("WT","KO"))

but it does not work. Thanks for your help.

Actually, the mean should not be grouped by Time; the call should be:

mymean<-aggregate(df[, use_col],by= list(ID = df$ID),FUN = function(X)mean(X,na.rm=T))

But the output is:

ID  a  b  c  d e 
KO 75.75 44.25 61.75 52.50 39.0 
WT 56.00 57.00 84.25 58.75 39.5

But it should be the other way round, like this:

ID  a  b  c  d e 
WT 56.00 57.00 84.25 58.75 39.5 
KO 75.75 44.25 61.75 52.50 39.0

Source

2015-08-15 Al14

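Use `by = list(ID = df$ID, Time = df$Time)` and the first two column names should be taken care of.

Thank you, but the factor still does not work – Al14

Then try `mymean$ID <- as.factor(mymean$ID)` after the aggregation.

In your aggregate() call, change the by argument to a list with named elements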

by = list(ID = df$ID, Time = df$Time)
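To make the effect of the named list concrete, here is a minimal sketch that rebuilds the data from the question with read.table() (the `text =` input is just a convenient way to reproduce the post's table) and compares the unnamed and named forms of `by`. The object names `unnamed` and `named` are only for illustration.

```r
# Rebuild the data frame shown in the question.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Time a b c d e
WT A 28 56 50 60 15
WT B 54 77 11 67 34
WT C 53 8 87 62 55
WT D 30 73 47 82 1
KO A 24 83 14 17 36
KO B 91 83 72 41 4
KO C 79 17 76 21 54
KO D 41 40 77 49 92
")
use_col <- 3:ncol(df)

# Unnamed list elements: aggregate() falls back to Group.1, Group.2.
unnamed <- aggregate(df[, use_col], by = list(df$ID, df$Time),
                     FUN = mean, na.rm = TRUE)

# Named list elements: the names are carried into the result.
named <- aggregate(df[, use_col], by = list(ID = df$ID, Time = df$Time),
                   FUN = mean, na.rm = TRUE)

names(unnamed)[1:2]  # "Group.1" "Group.2"
names(named)[1:2]    # "ID" "Time"
```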

Then, for your updated question, you can use

use_col = 3:ncol(df) 
mymean <- aggregate(df[, use_col], by = list(ID = df$ID), mean, na.rm=TRUE) 
mymean[order(mymean$ID, decreasing = TRUE), ] 
# ID  a  b  c  d  e 
# 2 WT 41.25 53.50 48.75 67.75 26.25 
# 1 KO 58.75 55.75 59.75 32.00 46.50

Presumably these values differ from yours because you are using a different dataset.

You could also use data.table
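As for why the factor() call in the question seemed to do nothing: assigning new factor levels to a column changes the level order only, not the row order of the data.frame. A sketch of two ways around this, assuming the same data as in the question: either sort the rows by the releveled factor, or set the levels before calling aggregate(), which returns its groups in level order. The names `after` and `before` are illustrative.

```r
# Rebuild the data frame shown in the question.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Time a b c d e
WT A 28 56 50 60 15
WT B 54 77 11 67 34
WT C 53 8 87 62 55
WT D 30 73 47 82 1
KO A 24 83 14 17 36
KO B 91 83 72 41 4
KO C 79 17 76 21 54
KO D 41 40 77 49 92
")
use_col <- 3:ncol(df)

# Option 1: relevel after aggregating, then sort the rows explicitly.
after <- aggregate(df[, use_col], by = list(ID = df$ID), FUN = mean, na.rm = TRUE)
after$ID <- factor(after$ID, levels = c("WT", "KO"))  # rows are still KO, WT
after <- after[order(after$ID), ]                     # now WT, KO

# Option 2: set the level order first; aggregate() follows it.
df$ID <- factor(df$ID, levels = c("WT", "KO"))
before <- aggregate(df[, use_col], by = list(ID = df$ID), FUN = mean, na.rm = TRUE)

as.character(after$ID)   # "WT" "KO"
as.character(before$ID)  # "WT" "KO"
```

Option 2 also keeps the desired order in anything downstream that respects factor levels, such as plots or further aggregation.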

library(data.table) 
## convert to data table 
dt <- as.data.table(df) 
## order by decreasing ID 
setorderv(dt, "ID", -1L) 
## remove the Time column then find the mean of all columns by ID 
dt[, lapply(.SD, mean, na.rm = TRUE), by = ID, .SDcols = use_col] 
# ID  a  b  c  d  e 
# 1: WT 41.25 53.50 48.75 67.75 26.25 
# 2: KO 58.75 55.75 59.75 32.00 46.50

Source

2015-08-15 23:10:05

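Staying with aggregate, another option is the formula method, where `.` on the LHS of `~` stands for all non-grouping columns and the grouping column goes on the RHS. Since we don't need the 'Time' column in the mean, we can take a subset of the dataset, get the mean, pass the additional arguments na.rm=TRUE and na.action=NULL (to avoid removing an entire row if it contains NA elements), and order the output by the 'ID' column.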

res <- aggregate(. ~ ID, subset(df, select = -Time),
                 FUN = mean, na.rm = TRUE, na.action = NULL)
res[order(res$ID, decreasing = TRUE), ]
#   ID     a     b     c     d     e
# 2 WT 41.25 53.50 48.75 67.75 26.25
# 1 KO 58.75 55.75 59.75 32.00 46.50
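The data in the question has no missing values, so the role of na.action is not visible there. The following sketch uses a small made-up data frame (`toy`, not from the post) to show the difference; it passes `na.action = na.pass`, which, like `NULL`, leaves rows with NA in place so that `na.rm = TRUE` can handle them column by column.

```r
# Hypothetical toy data: one NA in column b for the WT group.
toy <- data.frame(ID = c("WT", "WT", "KO", "KO"),
                  a  = c(1, 3, 5, 7),
                  b  = c(2, NA, 6, 8))

# Default na.action = na.omit: the whole second row is dropped first,
# so the WT mean of a is computed from the value 1 alone.
dropped <- aggregate(. ~ ID, toy, FUN = mean, na.rm = TRUE)

# na.action = na.pass: the row is kept, and na.rm = TRUE skips only the NA in b.
kept <- aggregate(. ~ ID, toy, FUN = mean, na.rm = TRUE, na.action = na.pass)

dropped$a[dropped$ID == "WT"]  # 1
kept$a[kept$ID == "WT"]        # 2
```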

As another option, we can use summarise_each from dplyr: we group by the 'ID' column, get the mean of all other columns except 'Time', and arrange the output by ID.

library(dplyr) 
df %>% 
    group_by(ID) %>% 
    summarise_each(funs(mean=mean(., na.rm=TRUE)), -Time) %>% 
     arrange(desc(ID)) 
# ID  a  b  c  d  e 
#1 WT 41.25 53.50 48.75 67.75 26.25 
#2 KO 58.75 55.75 59.75 32.00 46.50
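In more recent dplyr releases, summarise_each() and funs() are deprecated. A sketch of the same computation with across(), assuming dplyr 1.0.0 or later is installed and using the data from the question:

```r
library(dplyr)

# Rebuild the data frame shown in the question.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Time a b c d e
WT A 28 56 50 60 15
WT B 54 77 11 67 34
WT C 53 8 87 62 55
WT D 30 73 47 82 1
KO A 24 83 14 17 36
KO B 91 83 72 41 4
KO C 79 17 76 21 54
KO D 41 40 77 49 92
")

res <- df %>%
  group_by(ID) %>%
  # across(-Time) selects every non-grouping column except Time.
  summarise(across(-Time, ~ mean(.x, na.rm = TRUE))) %>%
  arrange(desc(ID))

res$ID  # "WT" "KO"
```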

Source

2015-08-16 02:17:08 akrun
