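Average of all other columns based on a single column in R

I have a data frame with more than 40,000 columns, and I came across a similar question: "Sum by distinct column value in R".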

shop <- data.frame( 
    'shop_id' = c('Shop A', 'Shop A', 'Shop A', 'Shop B', 'Shop C', 'Shop C'), 
    'Assets' = c(2, 15, 7, 5, 8, 3), 
    'Liabilities' = c(5, 3, 8, 9, 12, 8), 
    'sale' = c(12, 5, 9, 15, 10, 18), 
    'profit' = c(3, 1, 3, 6, 5, 9))

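I have a large data frame in which the shop_id column is repeated many times. Other values are associated with each shop_id, such as assets, liabilities, profit, loss, and so on. I now want to average all the variables that share the same shop_id, i.e. I want one row per unique shop_id holding the mean of every other column for that shop_id. Because there are thousands of variables (columns), working with each column separately is very tedious. The result I want is: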

shop_id Assets Liabilities  sale profit  
Shop A 8.0  5.333333 8.666667 2.333333 
Shop B 5.0  9.000000 15.000000 6.000000 
Shop C 5.5 10.000000 14.000000 7.000000

I am currently using the nested for loops below. As versatile as R is, I think there should be a faster way to do this:

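# Row indices of shop, grouped by shop_id
idx <- split(seq_len(nrow(shop)), shop$shop_id) 

newdata <- data.frame() 

for (i in seq_along(idx)) { 
    # First column: the shop_id label of group i
    newdata[i, 1] <- names(idx)[i] 
    # Remaining columns: mean of column j over the rows of group i
    for (j in 2:ncol(shop)) { 
        newdata[i, j] <- mean(shop[idx[[i]], j]) 
    } 
}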

Source

2015-05-26 discipulus

Try data.table:

library(data.table) 
setDT(shop)[, lapply(.SD, mean), shop_id] 
# shop_id Assets Liabilities  sale profit 
#1: Shop A 8.0 5.333333 8.666667 2.333333 
#2: Shop B 5.0 9.000000 15.000000 6.000000 
#3: Shop C 5.5 10.000000 14.000000 7.000000

Or

library(dplyr) 
# summarise_each() and funs() are deprecated; across() needs dplyr >= 1.0.0
shop %>% 
    group_by(shop_id) %>% 
    summarise(across(everything(), mean)) 
# shop_id Assets Liabilities  sale profit 
#1 Shop A 8.0 5.333333 8.666667 2.333333 
#2 Shop B 5.0 9.000000 15.000000 6.000000 
#3 Shop C 5.5 10.000000 14.000000 7.000000

Or

aggregate(.~shop_id, shop, FUN=mean) 
# shop_id Assets Liabilities  sale profit 
#1 Shop A 8.0 5.333333 8.666667 2.333333 
#2 Shop B 5.0 9.000000 15.000000 6.000000 
#3 Shop C 5.5 10.000000 14.000000 7.000000

With 40,000 columns, I would use data.table or possibly dplyr.

Source

2015-05-26 07:04:49 akrun

Use the ddply function from the plyr package:

library(plyr) 
ddply(shop, ~shop_id, summarise, Assets=mean(Assets), 
     Liabilities=mean(Liabilities), sale=mean(sale), profit=mean(profit)) 

    shop_id Assets Liabilities  sale profit 
1 Shop A 8.0 5.333333 8.666667 2.333333 
2 Shop B 5.0 9.000000 15.000000 6.000000 
3 Shop C 5.5 10.000000 14.000000 7.000000
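
Listing every column by hand does not scale to the thousands of columns mentioned in the question. A minimal sketch, assuming plyr's numcolwise helper is acceptable and that every non-grouping column is numeric (the variable name res is only for illustration):

```r
library(plyr)

# Same example data as in the question
shop <- data.frame(
    shop_id = c('Shop A', 'Shop A', 'Shop A', 'Shop B', 'Shop C', 'Shop C'),
    Assets = c(2, 15, 7, 5, 8, 3),
    Liabilities = c(5, 3, 8, 9, 12, 8),
    sale = c(12, 5, 9, 15, 10, 18),
    profit = c(3, 1, 3, 6, 5, 9))

# numcolwise(mean) builds a function that applies mean to every numeric
# column of a data frame; ddply calls it once per shop_id group
res <- ddply(shop, ~shop_id, numcolwise(mean))
res
```

This should give the same table as the explicit version, without naming any column other than shop_id.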

Source

2015-05-26 07:04:44

Try dplyr:

library("dplyr") 
# summarise_each() and funs() are deprecated; across() needs dplyr >= 1.0.0
shop %>% group_by(shop_id) %>% summarise(across(everything(), mean)) 

# shop_id Assets Liabilities  sale profit 
# 1 Shop A 8.0 5.333333 8.666667 2.333333 
# 2 Shop B 5.0 9.000000 15.000000 6.000000 
# 3 Shop C 5.5 10.000000 14.000000 7.000000

Source

2015-05-26 07:05:04 Victorp

rowsum may be helpful here:

rowsum(shop[-1], shop[[1]])/table(shop[[1]]) 
#  Assets Liabilities  sale profit 
#Shop A 8.0 5.333333 8.666667 2.333333 
#Shop B 5.0 9.000000 15.000000 6.000000 
#Shop C 5.5 10.000000 14.000000 7.000000
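
The one-liner works because a group mean is the group sum divided by the group size. A step-by-step sketch of the same idea, assuming all non-id columns are numeric; the names sums, counts and means are introduced here only for illustration:

```r
# Same example data as in the question
shop <- data.frame(
    shop_id = c('Shop A', 'Shop A', 'Shop A', 'Shop B', 'Shop C', 'Shop C'),
    Assets = c(2, 15, 7, 5, 8, 3),
    Liabilities = c(5, 3, 8, 9, 12, 8),
    sale = c(12, 5, 9, 15, 10, 18),
    profit = c(3, 1, 3, 6, 5, 9))

# Per-group column sums; rows come back ordered by sorted group label
sums <- rowsum(as.matrix(shop[-1]), shop$shop_id)

# Number of rows per group; table() also sorts the labels, so the order
# matches sums for these labels
counts <- as.vector(table(shop$shop_id))

# counts is recycled down each column, so row i is divided by counts[i]
means <- sums / counts
means
```

Converting to a matrix first keeps the division a plain matrix-by-vector operation, which makes the recycling rule easy to reason about.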

Source

2015-05-26 07:35:41

That's an innovative idea – akrun
