How do I retrieve the k highest values in a pandas DataFrame?

For example, given the DataFrame:

               b         d         e
Utah    1.624345 -0.611756 -0.528172
Ohio   -1.072969  0.865408 -2.301539
Texas   1.744812 -0.761207  0.319039
Oregon -0.249370  1.462108 -2.060141

generated with:

import numpy as np 
import pandas as pd 
np.random.seed(1) 
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), 
        index=['Utah', 'Ohio', 'Texas', 'Oregon']) 
print(frame)

the 3 highest values in the DataFrame are:

1.744812
1.624345
1.462108

Source

2017-08-16 Franck Dernoncourt

You can use pandas.DataFrame.stack + pandas.Series.nlargest, for example:

In [183]: frame.stack().nlargest(3) 
Out[183]: 
Texas   b    1.744812
Utah    b    1.624345
Oregon  d    1.462108
dtype: float64

Or, to keep only the values and drop the (row, column) labels:

In [184]: frame.stack().nlargest(3).reset_index(drop=True) 
Out[184]: 
0    1.744812
1    1.624345
2    1.462108
dtype: float64
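
The same idea can be wrapped in a small helper. This is only a minimal sketch: the function name top_k is hypothetical and not part of pandas; it relies on nothing beyond the DataFrame.stack and Series.nlargest calls shown above.

```python
import numpy as np
import pandas as pd


def top_k(df: pd.DataFrame, k: int) -> pd.Series:
    """Return the k largest values of df, labelled by (row, column).

    Hypothetical helper for illustration; not part of pandas.
    """
    # stack() turns the frame into a Series indexed by (row, column) pairs;
    # nlargest(k) then keeps the k largest entries, largest first.
    return df.stack().nlargest(k)


np.random.seed(1)
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

result = top_k(frame, 3)
print(result)
# The index labels show where each value came from.
print(list(result.index))
```

Keeping the (row, column) index is a design choice: call reset_index(drop=True) on the result if only the values are needed.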

Source

2017-08-16 15:54:09 MaxU

Thanks, I had missed pandas.DataFrame.stack (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.stack.html) –

@FranckDernoncourt, glad I could help :) – MaxU

numpy

df = frame  # the DataFrame from the question
np.partition(df.values.ravel(), df.size - 3)[-1:-4:-1]

array([ 1.744812, 1.624345, 1.462108])

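Breakdown

1. np.partition(a, kth) rearranges a 1-D array so that the element at index kth sits where it would in a sorted array, with all smaller values before it and all equal or larger values after it. Neither side is sorted internally.
2. df.values.ravel() supplies the values of df (the DataFrame, called frame in the question) as a 1-D array.
3. Here n = df.size is the total number of values and k is 3, so partitioning at index n - k leaves the 3 largest values in the last 3 positions.
4. [-1:-4:-1] means: start at -1 and step by -1 down to, but not including, -4. In other words, take the last 3 elements, starting with the last one. Because np.partition does not sort those 3 elements, their order is not guaranteed to be descending in general.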

#    1              2               3          4
#    |              |               |          |
#    v              v               v          v
np.partition(df.values.ravel(), df.size - 3)[-1:-4:-1]
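
To make the role of df.size - 3 concrete, here is a minimal sketch under the same setup as the question. The variable names (flat, part, top_unordered, top_desc) are illustrative. It also sorts the selected values afterwards, since np.partition alone does not promise any order among them.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])

k = 3
flat = df.values.ravel()   # all n = df.size values as a 1-D array
n = flat.size

# Partition at index n - k: everything from that index on is >= everything
# before it, so the last k slots hold the k largest values (in no
# guaranteed order). np.partition returns a new array.
part = np.partition(flat, n - k)
top_unordered = part[n - k:]

# Sort only those k values, then reverse to get descending order.
top_desc = np.sort(top_unordered)[::-1]
print(top_desc)
```

Sorting just k values after the partition keeps the cost low when k is much smaller than n, which is the usual reason to prefer np.partition over a full sort.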

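Source

2017-08-16 15:59:40 piRSquared

Wow, that looks really nice! It will take me some time to understand it... – MaxU

Did I just make things worse? (-: – piRSquared

There is one thing I don't understand: why 'df.size - 3'? Could you please explain? – MaxU

Another way: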

a = frame.values.flatten() 
a.sort() 
a[-3:]
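
Note that a[-3:] returns the three largest values in ascending order. The sketch below, which assumes the frame from the question, reverses them to match the order asked for. It also checks that frame itself is untouched, since flatten() returns a copy and the in-place sort only affects that copy.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
original = frame.copy()

a = frame.values.flatten()   # flatten() always returns a copy
a.sort()                     # in-place ascending sort of the copy
top3_desc = a[-3:][::-1]     # last three values, largest first

print(top3_desc)
print(frame.equals(original))  # the DataFrame was not modified
```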

Source

2017-08-16 16:11:26

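In addition to the other good solutions, this also works. Note that argsort() sorts in ascending order, so the positions are reversed before the first 3 are taken:

>>> df_values = frame.values.ravel()
>>> df_values[df_values.argsort()[::-1][:3]]
array([1.74481176, 1.62434536, 1.46210794])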
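
Because argsort returns positions in the flattened array, those positions can also be mapped back to row and column labels. A minimal sketch, assuming the frame from the question; np.unravel_index is standard NumPy, while the variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

values = frame.values
flat = values.ravel()

# argsort() is ascending, so take the last three positions and reverse them
# to get the positions of the three highest values, largest first.
top_pos = flat.argsort()[-3:][::-1]

# Convert flat positions into (row, column) positions in the 2-D array.
rows, cols = np.unravel_index(top_pos, values.shape)

for r, c in zip(rows, cols):
    print(frame.index[r], frame.columns[c], values[r, c])
```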

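Source

2017-08-16 16:13:26 MedAli

You can sort all the items in the frame and select the last 3.

Finally, flip the order of the array so that the largest value comes first.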

np.flipud(
    np.sort(frame, axis=None)[-3:])

Source

2017-08-16 16:43:04

You can also use operator and functools:

import functools, operator  # both are in the standard library
sorted(functools.reduce(operator.concat, frame.values.tolist()), reverse=True)[0:3]
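
If staying in the standard library is the goal, heapq.nlargest is another option that avoids sorting the full list. This is a sketch under the question's setup; using itertools.chain.from_iterable to flatten the rows is my substitution for functools.reduce, not something from the answer above.

```python
import heapq
import itertools

import numpy as np
import pandas as pd

np.random.seed(1)
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# Flatten the list of rows lazily into one stream of values.
all_values = itertools.chain.from_iterable(frame.values.tolist())

# nlargest returns the k largest items, largest first.
top3 = heapq.nlargest(3, all_values)
print(top3)
```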

Source

2017-08-16 19:27:21 Wen
