Using reshape + cast to aggregate over multiple columns

In R, I have a data frame with columns for Seat (factor), Party (factor) and Votes (numeric). I want to create a summary data frame with columns for Seat, Winning party, and Vote share. For example, from the data frame

df <- data.frame(party=rep(c('Lab','C','LD'),times=4), 
       votes=c(1,12,2,11,3,10,4,9,5,8,6,15), 
       seat=rep(c('A','B','C','D'),each=3))

I would like to get the output

  seat winner voteshare
1    A      C 0.8000000
2    B    Lab 0.4583333
3    C      C 0.5000000
4    D     LD 0.5172414

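I can work out how to achieve this. But I'm sure there must be a better approach, probably a cunning one-liner using Hadley Wickham's reshape package. Any suggestions?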

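For what it's worth, my solution uses a function from my package, djwutils_2.10.zip, and is invoked as follows. But there are all sorts of special cases it doesn't deal with, so I'd rather rely on someone else's code.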

aggregateList(df, by=list(seat=seat), 
       FUN=list(winner=function(x) x$party[which.max(x$votes)], 
         voteshare=function(x) max(x$votes)/sum(x$votes)))

2010-05-06 DamonJW

A very popular data set! – 2010-05-06 14:16:05

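Indeed! I have generated plots showing the results of the last election, along with the latest betfair.com odds, and I want to be prepared for tonight. The plots are at http://www.cs.ucl.ac.uk/staff/d.wischik/Interests/Stats/Election/uk2010.html – DamonJW 2010-05-06 14:47:36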

Hadley's plyr package can help you:

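library(plyr)  # provides ddply() and .()
# for each seat, build a one-row data frame with the winner and its vote share
ddply(df, .(seat), function(x) data.frame(winner=x[which.max(x$votes),]$party, voteshare=max(x$votes)/sum(x$votes)))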

2010-05-06 14:52:45 kohske

Thanks. This is just what I wanted. – DamonJW 2010-05-06 17:18:28

Or more succinctly (and soon to be faster): 'ddply(df, .(seat), summarise, winner = party[which.max(votes)], voteshare = max(votes) / sum(votes))' – hadley 2010-05-08 03:50:31
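
For readers who do not have plyr installed, the split-apply-combine idea behind the answers above can be sketched in base R. This is only an illustration under the question's example data, not the method either commenter proposed; the names `pieces` and `summary_df` are made up for this sketch.

```r
# Same example data as in the question
df <- data.frame(party = rep(c('Lab', 'C', 'LD'), times = 4),
                 votes = c(1, 12, 2, 11, 3, 10, 4, 9, 5, 8, 6, 15),
                 seat  = rep(c('A', 'B', 'C', 'D'), each = 3))

# Split the rows by seat, summarise each piece, then bind the pieces together
pieces <- lapply(split(df, df$seat), function(x) {
  data.frame(seat      = as.character(x$seat[1]),
             winner    = as.character(x$party[which.max(x$votes)]),
             voteshare = max(x$votes) / sum(x$votes))
})
summary_df <- do.call(rbind, pieces)
rownames(summary_df) <- NULL  # drop the seat names that rbind() adds as row names
print(summary_df)
```

Note that `which.max()` returns the first maximum, so a tied seat silently reports only one of the tied parties.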

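You're probably right that there's a cunning one-liner. I tend to favour the view that understandable is better than clever, especially when you're looking at something for the first time. Here's the more verbose alternative.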

library(reshape)  # provides cast()
votes_by_seat_and_party <- as.matrix(cast(df, seat ~ party, value="votes"))

   C Lab LD
A 12   1  2
B  3  11 10
C  9   4  5
D  6   8 15

seats <- rownames(votes_by_seat_and_party) 
parties <- colnames(votes_by_seat_and_party) 

winner_col <- apply(votes_by_seat_and_party, 1, which.max) 
winners <- parties[winner_col] 
voteshare_of_winner_by_seat <- apply(votes_by_seat_and_party, 1, function(x) max(x)/sum(x)) 

results <- data.frame(seat = seats, winner = winners, voteshare = voteshare_of_winner_by_seat) 

  seat winner voteshare
1    A      C 0.8000000
2    B    Lab 0.4583333
3    C      C 0.5000000
4    D     LD 0.5172414

# Full voteshare matrix, if you're interested 
total_votes_by_seat <- rowSums(votes_by_seat_and_party) 
voteshare_by_seat_and_party <- votes_by_seat_and_party/total_votes_by_seat
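
As a hedged alternative for the matrix step, base R's `xtabs()` can build the same seat-by-party table without the reshape package, and `prop.table()` gives the row-wise shares. This is a sketch under the question's example data; `votes_mat`, `share_mat` and `winners` are names chosen for illustration.

```r
# Same example data as in the question
df <- data.frame(party = rep(c('Lab', 'C', 'LD'), times = 4),
                 votes = c(1, 12, 2, 11, 3, 10, 4, 9, 5, 8, 6, 15),
                 seat  = rep(c('A', 'B', 'C', 'D'), each = 3))

# Seat-by-party table of votes (sums votes within each seat/party cell)
votes_mat <- xtabs(votes ~ seat + party, data = df)

# Divide each row by its total, so every row sums to 1
share_mat <- prop.table(votes_mat, margin = 1)

# Winning party per seat: column name of the largest entry in each row
winners <- colnames(votes_mat)[apply(votes_mat, 1, which.max)]

print(share_mat)
print(winners)
```

One design point to keep in mind: `xtabs()` fills seat/party combinations that have no row with 0, which matters if a missing candidate should instead be recorded as NA.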

2010-05-06 15:36:42

You can treat missing values (a particular party having no candidate in a particular seat) as either 0 or NA. – 2010-05-06 16:00:04
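
The difference between the two choices can be seen in a small sketch. The seat below is hypothetical (it is not in the question's data), and the variable names are made up for illustration.

```r
# Hypothetical seat where party C fielded no candidate
votes_na   <- c(Lab = 11, C = NA, LD = 10)  # missing recorded as NA
votes_zero <- c(Lab = 11, C = 0,  LD = 10)  # missing recorded as 0

# which.max() discards NA values, so the winner is found either way
winner_na   <- names(which.max(votes_na))
winner_zero <- names(which.max(votes_zero))

# max() and sum() propagate NA unless na.rm = TRUE is given
share_na_naive <- max(votes_na) / sum(votes_na)
share_na_rm    <- max(votes_na, na.rm = TRUE) / sum(votes_na, na.rm = TRUE)
share_zero     <- max(votes_zero) / sum(votes_zero)

print(c(naive = share_na_naive, na_rm = share_na_rm, zero = share_zero))
```

With NA, the vote share is NA unless `na.rm = TRUE` is passed; with 0, the plain expressions from the answers work unchanged.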

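OK, 3 solutions... here is another, more compact one using raw R. It is 4 sparse lines of code. I'm assuming missing values are 0, or simply absent, because it doesn't matter. My guess is that this would be the fastest code for a large set of data.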

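# get the total votes per seat, for dividing
s <- aggregate(df$votes, list(seat = df$seat), sum)
# get the maximum votes per seat
temp <- aggregate(df$votes, list(seat = df$seat), max)
# keep the rows whose votes equal the maximum of their own seat
res <- df[df$votes == temp$x[match(df$seat, temp$seat)],]
# divide by the seat totals; assumes exactly one winning row per seat, in seat order
res$votes <- res$votes/s$x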

Rename the columns, if you want...

names(res) <- c('winner', 'voteshare', 'seat')

(This will return an error in the event of a tie... you will be able to see it in the temp data frame.)
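
Ties can also be detected up front. The following is a minimal sketch, assuming a hypothetical data frame `df_tie` in which seat A is tied; it is not part of the original answer, and the names `is_max`, `n_winners` and `tied_seats` are made up for illustration.

```r
# Hypothetical data: Lab and C tie in seat A, C wins seat B outright
df_tie <- data.frame(party = c('Lab', 'C', 'LD', 'Lab', 'C', 'LD'),
                     votes = c(5, 5, 2, 1, 7, 3),
                     seat  = rep(c('A', 'B'), each = 3))

# Flag every row whose votes equal the maximum within its own seat
is_max <- df_tie$votes == ave(df_tie$votes, df_tie$seat, FUN = max)

# Count the flagged rows per seat; more than one means a tie
n_winners  <- tapply(is_max, df_tie$seat, sum)
tied_seats <- names(n_winners)[n_winners > 1]

print(tied_seats)
```

How a tie should then be resolved (first party listed, all tied parties, or no winner) is a decision the thread does not settle.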

2010-05-06 17:16:32 John
