如何从一个data.frame列获得最大价值，并得到所有记录

我有data.frame，我希望得到一个包含给定列Total如何从一个data.frame列获得最大价值，并得到所有记录

Txn_date Cust_no Acct_no cust_type Credit Debit Total 
09DEC2013 17382 601298644   I 1500  0 1500 
16DEC2013 17382 601298644   I 500  0 500 
17DEC2013 17382 601298644   I  0 60 60 
18DEC2013 17382 601298644   I  0 200 200 
19DEC2013 17382 601298644   I 1500  0 1500 
20DEC2013 17382 601298644   I  0 60 60 
20DEC2013 17382 601298644   I  0 103 103 
30DEC2013 17382 601298644   I 500  0 500

最大值的行所以，我写了一个简单的SQL查询，使用sqldf()如下解析：

s1<-paste("SELECT Txn_date, Cust_no,Credit,Debit,Total,max(Total) as 'MaxTxnAmt' FROM sample GROUP BY Cust_no") 
sample_t1<-sqldf(s1)

这一点让我

Txn_date Cust_no Acct_no cust_type Credit Debit Total 
09DEC2013 17382 601298644   I 1500  0 1500

我得到如上图所示的精确输出，如果我使用base - R功能：

sample_t1<-do.call(rbind, 
     lapply(split(sample,sample$Cust_no), 
       function(data) data[which.max(data$Total),]))

，我想知道，我怎么得到所有这一切都为列Total最大值表sample行。

所需的输出：

Txn_date Cust_no Acct_no cust_type Credit Debit Total 
09DEC2013 17382 601298644   I 1500  0 1500 
19DEC2013 17382 601298644   I 1500  0 1500

的样本数据：

sample <- structure(list(Txn_date = c("09DEC2013", "16DEC2013", "17DEC2013", 
"18DEC2013", "19DEC2013", "20DEC2013", "20DEC2013", "30DEC2013" 
), Cust_no = c(17382L, 17382L, 17382L, 17382L, 17382L, 17382L, 
17382L, 17382L), Acct_no = c("601298644", "601298644", "601298644", 
"601298644", "601298644", "601298644", "601298644", "601298644" 
), cust_type = c("I", "I", "I", "I", "I", "I", "I", "I"), Credit = c(1500, 
500, 0, 0, 1500, 0, 0, 500), Debit = c(0, 0, 60, 200, 0, 60, 
103, 0), Total = c(1500, 500, 60, 200, 1500, 60, 103, 500)), .Names = c("Txn_date", 
"Cust_no", "Acct_no", "cust_type", "Credit", "Debit", "Total" 
), row.names = c(16303L, 29153L, 31174L, 33179L, 35388L, 38750L, 
38751L, 53052L), class = "data.frame")

来源

2015-06-09 sunitprasad1

尝试

library(dplyr) 
sample %>% 
    group_by(Cust_no) %>% 
    filter(Total==max(Total)) 
# Txn_date Cust_no Acct_no cust_type Credit Debit Total 
#1 09DEC2013 17382 601298644   I 1500  0 1500 
#2 19DEC2013 17382 601298644   I 1500  0 1500

或者

library(data.table) 
setDT(sample)[, .SD[Total==max(Total)] ,Cust_no] 
# Cust_no Txn_date Acct_no cust_type Credit Debit Total 
#1: 17382 09DEC2013 601298644   I 1500  0 1500 
#2: 17382 19DEC2013 601298644   I 1500  0 1500

或者

setkey(setDT(sample), Total)[J(max(Total)), .SD,Cust_no] 
# Cust_no Txn_date Acct_no cust_type Credit Debit Total 
#1: 17382 09DEC2013 601298644   I 1500  0 1500 
#2: 17382 19DEC2013 601298644   I 1500  0 1500

或者使用base R

sample[with(sample, ave(Total, Cust_no, FUN=max)==Total),] 
# Txn_date Cust_no Acct_no cust_type Credit Debit Total 
#1 09DEC2013 17382 601298644   I 1500  0 1500 
#5 19DEC2013 17382 601298644   I 1500  0 1500

来源

2015-06-09 09:52:45 akrun

1）相关子查询试试这个：

sqldf("select * 
     from sample a 
     where Total = (select max(Total) 
         from sample b 
         where b.Cust_no = a.Cust_no)")

，并提供：

Txn_date Cust_no Acct_no cust_type Credit Debit Total 
1 09DEC2013 17382 601298644   I 1500  0 1500 
2 19DEC2013 17382 601298644   I 1500  0 1500

2）与子查询或此加入：

sqldf("select * 
     from sample 
     join (select Cust_no, max(Total) as 'MaxTxnAmt' 
      from sample 
      group by Cust_no) 
     using(Cust_no) 
     where Total = MaxTxnAmt")

，并提供：

Txn_date Cust_no Acct_no cust_type Credit Debit Total MaxTxnAmt 
1 09DEC2013 17382 601298644   I 1500  0 1500  1500 
2 19DEC2013 17382 601298644   I 1500  0 1500  1500

来源

2015-06-09 13:43:58

如何从一个data.frame列获得最大价值，并得到所有记录

回答

相关问题