Calculating subtotals (sum, stdev, average, etc.)

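I have been searching for this for a while but have not been able to find a clear answer. I have probably been searching with the wrong terms, but maybe somebody here can quickly help me out. The question is a basic one.

A sample data set: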

set <- structure(list(VarName = structure(c(1L, 5L, 4L, 2L, 3L), 
.Label = c("Apple/Blue/Nice", 
"Apple/Blue/Ugly", "Apple/Pink/Ugly", "Kiwi/Blue/Ugly", "Pear/Blue/Ugly" 
), class = "factor"), Color = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("Blue", 
"Pink"), class = "factor"), Qty = c(45L, 34L, 46L, 21L, 38L)), .Names = c("VarName", 
"Color", "Qty"), class = "data.frame", row.names = c(NA, -5L))

This gives a data set like:

set

          VarName Color Qty
1 Apple/Blue/Nice  Blue  45
2  Pear/Blue/Ugly  Blue  34
3  Kiwi/Blue/Ugly  Blue  46
4 Apple/Blue/Ugly  Blue  21
5 Apple/Pink/Ugly  Pink  38

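What I want to do is fairly straightforward. I would like to sum (or average, or take the stdev of) the Qty column. But I would also like to do the same operation under the following conditions:

VarName includes "Apple"
VarName includes "Ugly"
Color equals "Blue"

Can anyone give me a quick introduction to how to perform this kind of calculation?

I am aware that some of it can be done with the aggregate() function, e.g.: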

aggregate(set[3], FUN=sum, by=set[2])[1,2]

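However, I believe there must be a more straightforward way of doing this. Is there some kind of filter that can be added to functions like sum()?

Source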

2012-09-27 Jochem

Is this what you're looking for?

# sum for those including 'Apple' 
apple <- set[grep('Apple', set[, 'VarName']), ] 
aggregate(apple[3], FUN=sum, by=apple[2]) 
    Color Qty 
1 Blue 66 
2 Pink 38 

# sum for those including 'Ugly' 
ugly <- set[grep('Ugly', set[, 'VarName']), ] 
aggregate(ugly[3], FUN=sum, by=ugly[2]) 
    Color Qty 
1 Blue 101 
2 Pink 38 

# sum for Color==Blue 
sum(set[set[, 'Color']=='Blue', 3]) 
[1] 146

The last sum can also be obtained by using subset:

sum(subset(set, Color=='Blue')[,3])
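On the question of adding a "filter" to sum(): one possible sketch is to build a logical vector per condition with grepl() and use it to index Qty directly. The data frame is rebuilt here with plain character columns so the block runs on its own, and the names is_apple, is_ugly and is_blue are made up for this illustration.

```r
# Rebuild the sample data (character columns instead of factors, for brevity)
set <- data.frame(
  VarName = c("Apple/Blue/Nice", "Pear/Blue/Ugly", "Kiwi/Blue/Ugly",
              "Apple/Blue/Ugly", "Apple/Pink/Ugly"),
  Color   = c("Blue", "Blue", "Blue", "Blue", "Pink"),
  Qty     = c(45L, 34L, 46L, 21L, 38L),
  stringsAsFactors = FALSE
)

# One logical vector per condition (hypothetical helper names)
is_apple <- grepl("Apple", set$VarName, fixed = TRUE)
is_ugly  <- grepl("Ugly",  set$VarName, fixed = TRUE)
is_blue  <- set$Color == "Blue"

# Any summary function can then be applied to the filtered Qty values
sum_apple <- sum(set$Qty[is_apple])    # 45 + 21 + 38
sum_ugly  <- sum(set$Qty[is_ugly])     # 34 + 46 + 21 + 38
sum_blue  <- sum(set$Qty[is_blue])     # 45 + 34 + 46 + 21
mean_blue <- mean(set$Qty[is_blue])
sd_blue   <- sd(set$Qty[is_blue])
```

Because the conditions are ordinary logical vectors, they can be combined with & and |, for example set$Qty[is_apple & is_blue].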

Source

2012-09-27 10:09:53

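The easiest approach is to split up the VarName column; subsetting then becomes very easy. So, let's create an object in which VarName has been separated: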

##There must(?) be a better way than this. Anyone? 
new_set = t(as.data.frame(sapply(as.character(set$VarName), strsplit, "/")))

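Brief explanation:

We use as.character because set$VarName is a factor
sapply takes each value in turn and applies strsplit
The strsplit function splits up the elements
We convert to a data frame
Transpose to get the correct orientation

Next,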
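As for the "better way" asked about in the code comment, one commonly used alternative is to call strsplit on the whole vector and bind the pieces row-wise with do.call(rbind, ...). This is only a sketch, not necessarily what the answer's author had in mind; the object name parts and the column names Fruit, Color and Quality are invented here.

```r
# Sample VarName values, as in the question
var_name <- factor(c("Apple/Blue/Nice", "Pear/Blue/Ugly", "Kiwi/Blue/Ugly",
                     "Apple/Blue/Ugly", "Apple/Pink/Ugly"))

# strsplit is vectorised: it returns a list with one character vector per element
pieces <- strsplit(as.character(var_name), "/", fixed = TRUE)

# Bind the pieces row-wise into a 5 x 3 character matrix, then convert
parts <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)

# Hypothetical column names, chosen for readability
names(parts) <- c("Fruit", "Color", "Quality")
```

This assumes every VarName has exactly three "/"-separated parts; with ragged input, rbind would recycle values and should not be trusted.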

##Convert to a data frame 
new_set = as.data.frame(new_set) 
##Make nice rownames - not actually needed 
rownames(new_set) = 1:nrow(new_set) 
##Add in the Qty column 
new_set$Qty = set$Qty

This gives

R> new_set
     V1   V2   V3 Qty
1 Apple Blue Nice  45
2  Pear Blue Ugly  34
3  Kiwi Blue Ugly  46
4 Apple Blue Ugly  21
5 Apple Pink Ugly  38

Now all the operations work as standard. For example,

##Add up all blue Qtys 
sum(new_set[new_set$V2 == "Blue",]$Qty) 
[1] 146 

##Average of Blue and Ugly Qtys 
mean(new_set[new_set$V2 == "Blue" & new_set$V3 == "Ugly",]$Qty) 
[1] 33.67

Once the data is in the correct form, you can use ddply to get everything you want (and more):

library(plyr) 
##Split the data frame up by V1 and take the mean of Qty 
ddply(new_set, .(V1), summarise, m = mean(Qty)) 

##Split the data frame up by V1 & V2 and take the mean of Qty 
ddply(new_set, .(V1, V2), summarise, m = mean(Qty))
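If plyr is not available, a similar per-group table can be sketched in base R with split() and lapply(). This is an alternative technique rather than what ddply does internally; the object name stats and the column labels are invented for the example, and new_set is rebuilt so the block runs on its own.

```r
# Rebuild the split-up data frame from the answer
new_set <- data.frame(
  V1  = c("Apple", "Pear", "Kiwi", "Apple", "Apple"),
  V2  = c("Blue", "Blue", "Blue", "Blue", "Pink"),
  V3  = c("Nice", "Ugly", "Ugly", "Ugly", "Ugly"),
  Qty = c(45L, 34L, 46L, 21L, 38L),
  stringsAsFactors = FALSE
)

# Split Qty by V1, compute several statistics per group, bind into a matrix
by_fruit <- split(new_set$Qty, new_set$V1)
stats <- do.call(rbind, lapply(by_fruit, function(q) {
  c(n = length(q), sum = sum(q), mean = mean(q), sd = sd(q))
}))

print(stats)
```

Groups with a single row (Kiwi and Pear here) get NA for sd, since a standard deviation needs at least two values.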

Source

2012-09-27 10:08:05 csgillespie

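Nice explanation, +1 for completeness. –

Thanks for the explanation. While studying it, I noticed something. This seems to give a NaN answer: "mean(new_set[new_set$V2 == "Blue" && new_set$V3 == "Ugly",]$Qty)". Not sure why that happens. – Jochem

@Jochem Oops, I had '&&' instead of '&'. '&&' doesn't work with vectors. – csgillespie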
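A small sketch of why '&&' produced NaN: in older R versions '&&' looked only at the first element of each side, so the condition collapsed to a single FALSE (row 1 is "Nice"), no rows were selected, and the mean of an empty vector is NaN. As far as I know, R 4.3.0 and later raise an error instead when '&&' is given vectors longer than one. The vectors below mirror new_set, and the old behaviour is simulated explicitly with [1] so the block runs the same on any version.

```r
V2  <- c("Blue", "Blue", "Blue", "Blue", "Pink")
V3  <- c("Nice", "Ugly", "Ugly", "Ugly", "Ugly")
Qty <- c(45L, 34L, 46L, 21L, 38L)

# Elementwise AND: one TRUE/FALSE per row, which is what subsetting needs
keep <- V2 == "Blue" & V3 == "Ugly"
mean_ok <- mean(Qty[keep])          # mean of 34, 46, 21

# Simulate the old '&&' behaviour: only the first elements are compared
first_only <- (V2 == "Blue")[1] && (V3 == "Ugly")[1]   # TRUE && FALSE
mean_bad <- mean(Qty[first_only])   # Qty[FALSE] is empty, so the mean is NaN
```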
