2017-01-17 54 views

Nth smallest value of each column of a data.frame in R

I want to find the nth smallest value of each column in a data.frame.

In the example below, I used dplyr's nth() function to specify the second smallest value. Can someone help me write the function?

library(vegclust)  # provides the wetland data set
library(vegan)     # provides decostand()
library(dplyr)     # provides nth()
data(wetland)
dfnorm = decostand(wetland, "normalize")
dfchord = dist(dfnorm, method = "euclidean")  # Euclidean distance on normalized rows = chord distance
dfchord = data.frame(as.matrix(dfchord))
number_function = function(x) nth(x, 2)  # can change 2 to any number

answer_vector = apply(dfchord, 2, number_function)  # here, 2 means apply over columns

The expected answer looks like this:

ans = c(0.5689322,0.579568297,0.315017693,0.315017693,0.632246369, 0.868563003, 0.704638684, 0.35827587, 0.725220337, 0.516397779) # length of 1:38 
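The question's number_function takes whatever value sits at position 2 of each column, because nth() does not sort unless it is given an order. A minimal base-R sketch of the difference, on a made-up vector `x` (not the wetland data):

```r
# Made-up column, not from the wetland data
x <- c(3.2, 0.5, 1.7, 0.9)

# Position 2 of the unsorted vector: what nth(x, 2) returns without order_by
second_position <- x[2]
print(second_position)  # 0.5

# Sorting first gives the second smallest value
second_smallest <- sort(x)[2]
print(second_smallest)  # 0.9
```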

Sounds like a slightly unusual thing to do. To make it easier for you and your colleagues to read in the future, you may want to [melt](http://seananderson.ca/2013/10/19/reshape.html) and then [split-apply-combine](http://stackoverflow.com/questions/26664644/use-dplyrs-group-by-per-perform-split-apply-combine) – citynorman
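A base-R sketch of the long-format plus split-apply-combine idea, using stack() and tapply() as stand-ins for the melt and group_by() tools from the links (this swaps in base functions; it is not the linked code). The data frame `df` is a made-up example:

```r
n <- 2  # which order statistic to take (2 = second smallest)

# Made-up wide data frame standing in for dfchord
df <- data.frame(a = c(4, 1, 3), b = c(0.2, 0.9, 0.5))

# Reshape to long format: one row per value, with its source column in `ind`
long <- stack(df)

# Split the values by source column, take the nth smallest of each, combine
res <- tapply(long$values, long$ind, function(v) sort(v)[n])
print(res)
```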

Answers


Here is my example:

library(dplyr)  # provides nth()
num_func <- function(x, n) nth(sort(x), n)  # sort first, then take the nth value
sapply(dfchord, num_func, n = 2)  # edited (thanks to @thelatemail's comment)
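Once the column is sorted, nth() only picks a position, so plain indexing does the same job without dplyr. A sketch with a hypothetical helper `nth_smallest()` and a made-up data frame, using vapply() so each column is checked to return a single number:

```r
# Hypothetical helper: nth smallest value of one column.
# sort() drops NA values by default, so NAs are ignored.
nth_smallest <- function(x, n) sort(x)[n]

# Made-up data frame standing in for dfchord
df <- data.frame(a = c(5, 2, 8, 1), b = c(0.3, 0.1, 0.7, 0.4))

# vapply() works like sapply() but checks that each result is one number;
# the extra argument n = 2 is passed on to nth_smallest()
res <- vapply(df, nth_smallest, numeric(1), n = 2)
print(res)
```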

@thelatemail: you're right, thanks. – cuttlefish44


So here is an answer that gets the nth value of every column of any data.frame; you only need to change x and the index in y[2].

x = dfchord

ans = list(small = numeric(ncol(x)))  # start from an empty result list
for (i in seq_len(ncol(x))) {
    y = sort(x[, i], decreasing = FALSE)
    ans$small[i] = y[2]  # this is the second smallest number; replace 2 with whatever you want
}
ans$rel = colnames(x)  # one name per column of x

answer = data.frame('nth' = ans$small, 'rel' = ans$rel)
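Because the loop sorts with decreasing = FALSE, y[2] picks the second smallest value; sorting with decreasing = TRUE would pick the second largest instead. A sketch of a hypothetical helper `nth_value()` that covers both directions and replaces the loop with vapply(), on made-up data:

```r
# Hypothetical helper: nth smallest value by default, nth largest if largest = TRUE
nth_value <- function(x, n, largest = FALSE) {
  sort(x, decreasing = largest)[n]
}

# Made-up data frame standing in for dfchord
df <- data.frame(a = c(5, 2, 8, 1), b = c(0.3, 0.1, 0.7, 0.4))

smallest2 <- vapply(df, nth_value, numeric(1), n = 2)                  # a = 2, b = 0.3
largest2  <- vapply(df, nth_value, numeric(1), n = 2, largest = TRUE)  # a = 5, b = 0.4

# One row per column of df, like the loop's result
answer <- data.frame(nth = smallest2, rel = names(df))
print(answer)
```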

Since you already like dplyr, here's what I'm doing these days with purrr:

purrr::map_dbl(mtcars, ~nth(., 2, order_by = .)) 
   mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb 
10.400  4.000 75.700 62.000  2.760  1.615 14.600  0.000  0.000  3.000  1.000 

Or just dplyr, since it already provides nth():

summarise_all(mtcars, ~ nth(., 2, order_by = .))  # funs() is deprecated in newer dplyr
   mpg cyl disp hp drat    wt qsec vs am gear carb
1 10.4   4 75.7 62 2.76 1.615 14.6  0  0    3    1

Without any packages: `mtcars[sapply(mtcars, rank, ties.method = "first") == 2]` – thelatemail
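thelatemail's one-liner relies on ties.method = "first". A small base-R sketch on a made-up vector shows why: with the default "average" method, tied values can share a fractional rank and no element gets rank 2. Applied column-wise with sapply(), the logical matrix has exactly one TRUE per column, so indexing the data frame with it returns one value per column, in column order.

```r
# Made-up column with a tie for the smallest value
x <- c(7, 2, 2, 9)

# Default ties.method = "average": the two 2s share rank 1.5,
# so no element has rank exactly 2
r_avg <- rank(x)
print(r_avg)            # 3.0 1.5 1.5 4.0
print(any(r_avg == 2))  # FALSE

# ties.method = "first": ties are broken by position, giving ranks 1 and 2
r_first <- rank(x, ties.method = "first")
print(r_first)          # 3 1 2 4

# Exactly one element has rank 2, and it equals the second sorted value
print(x[r_first == 2])  # 2
print(sort(x)[2])       # 2
```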


Just a warning: if you don't give dplyr's nth() an order (order_by), it doesn't actually sort anything.

For example:

> sapply(mtcars, dplyr::nth, 2) 
    mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear    carb 
 21.000   6.000 160.000 110.000   3.900   2.875  17.020   0.000   1.000   4.000   4.000 

This is really just the second row of the data:

> mtcars[2, ] 
              mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

Whereas:

> sapply(mtcars, Rfast::nth, 2) 
   mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb 
10.400  4.000 75.700 62.000  2.760  1.615 14.600  0.000  0.000  3.000  1.000 

Rfast's nth() sorts by default. If you're sensitive to performance, the Rfast version is well written: it uses partial sorting, which solutions based on sort/order/rank (including dplyr::nth) don't.
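Base R can also partially sort through the `partial` argument of sort(), which only guarantees that the requested position holds its correctly sorted value. A sketch of that technique (base R's own partial sort, not Rfast's code); the helper name `nth_partial()` and the random data are made up for illustration:

```r
# Partial sort: sort(x, partial = n) only guarantees that element n is in
# its final sorted position (smaller values before it, larger after it)
nth_partial <- function(x, n) sort(x, partial = n)[n]

set.seed(42)
x <- runif(1e5)  # made-up data for illustration

partial_result <- nth_partial(x, 2)
full_result <- sort(x)[2]
print(identical(partial_result, full_result))  # TRUE

# Per column of a made-up data frame
df <- data.frame(a = c(5, 2, 8, 1), b = c(0.3, 0.1, 0.7, 0.4))
print(vapply(df, nth_partial, numeric(1), n = 2))
```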


With dplyr::summarize_each:

n <- 2 
# summarize_each() and funs() are deprecated in newer dplyr versions
dfchord %>% summarize_each(funs(nth(sort(.), n))) 
#          X5        X8       X13        X4       X17       X3        X9       X21       X16       X14        X2       X15        X1        X7
# 1 0.5689322 0.5795683 0.3150177 0.3150177 0.6322464 0.868563 0.7046387 0.3582759 0.7252203 0.5163978 0.3651484 0.5163978 0.3582759 0.4222794
#         X10      X40       X23       X25       X22      X20        X6      X18      X12       X39       X19       X11       X30       X34
# 1 0.4222794 0.507107 0.6206017 0.4536844 0.4536844 0.654303 0.5126421 0.338204 0.338204 0.5126421 0.5393651 0.5804794 0.7270723 0.5242481
#        X28       X31       X26       X29       X33       X24       X36      X37       X41       X27       X32       X35       X38
# 1 0.735765 0.5242481 0.7270723 0.8749704 0.5715592 0.4933355 0.4933355 0.574123 0.7443697 0.6333863 0.6333863 0.7296583 0.6709442