2016-11-09

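Filter max number by ID

I have an array of arrays (a list of lists) and I want to get the maximum value for each ID. In the example below, the 2nd column is the id and the 4th column is the value. For id = 1 the maximum is 308.45; for id = 2 it is 310.508474.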

Input:

[['X', '1', '0', '303.016666'], 
['X1', '1', '1', '305.516666'], 
['X2', '1', '2', '308.45'], 
['X3', '2', '0', '309.409836'], 
['X4', '2', '1', '310.508474'], 
['X5', '2', '2', '308.728813']] 

Output:

[['X2', '1', '2', '308.45'], 
['X4', '2', '1', '310.508474']] 

How can I do this?

Answers


Using pandas

import pandas as pd 

df = pd.DataFrame([ 
     ['X', 1, 0, 303.016666], 
     ['X1', 1, 1, 305.516666], 
     ['X2', 1, 2, 308.45], 
     ['X3', 2, 0, 309.409836], 
     ['X4', 2, 1, 310.508474], 
     ['X5', 2, 2, 308.728813]] 
) 

# idxmax returns index labels; with the default RangeIndex they equal row positions
print(df.values[df.groupby(1)[3].idxmax()]) 

[['X2' 1 2 308.45] 
['X4' 2 1 310.508474]] 
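For comparison, here is a standard-library sketch of the same group-then-argmax idea, working directly on the question's string data. The helper name `max_row_per_id` and its column parameters are hypothetical, chosen for illustration:

```python
from itertools import groupby
from operator import itemgetter

rows = [['X', '1', '0', '303.016666'],
        ['X1', '1', '1', '305.516666'],
        ['X2', '1', '2', '308.45'],
        ['X3', '2', '0', '309.409836'],
        ['X4', '2', '1', '310.508474'],
        ['X5', '2', '2', '308.728813']]

def max_row_per_id(rows, id_col=1, value_col=3):
    """Return, for each id, the row with the largest numeric value (sketch)."""
    key = itemgetter(id_col)
    result = []
    # groupby only merges *adjacent* rows, so sort by id first
    for _, group in groupby(sorted(rows, key=key), key=key):
        # compare the value column as a float, not as a string
        result.append(max(group, key=lambda r: float(r[value_col])))
    return result

print(max_row_per_id(rows))
# [['X2', '1', '2', '308.45'], ['X4', '2', '1', '310.508474']]
```

Unlike the pandas version, this keeps the values as strings and converts them to `float` only for the comparison. Groups come out in sorted id order, which is string order here (so `'10'` would sort before `'2'`).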

Using numpy

import numpy as np

a = np.array([ 
     ['X', 1, 0, 303.016666], 
     ['X1', 1, 1, 305.516666], 
     ['X2', 1, 2, 308.45], 
     ['X3', 2, 0, 309.409836], 
     ['X4', 2, 1, 310.508474], 
     ['X5', 2, 2, 308.728813] 
    ], dtype=object) 

ids = np.unique(a[:, 1]) 
grp = np.where(ids == a[:, [1]], 1, np.nan) 
expanded_value_column = grp * a[:, [3]].astype(float) 
max_positions = np.nanargmax(expanded_value_column, axis=0) 

print(a[max_positions]) 

[['X2' 1 2 308.45] 
['X4' 2 1 310.508474]] 
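To make the NumPy trick easier to follow, here is a plain-Python sketch of its intermediate steps (no NumPy required; the helper `nanargmax_column` is hypothetical). Each data row becomes a row of a masked matrix with one column per id, holding the value where the id matches and NaN elsewhere, so the largest non-NaN entry in each column marks that id's maximum:

```python
import math

rows = [['X', '1', '0', '303.016666'],
        ['X1', '1', '1', '305.516666'],
        ['X2', '1', '2', '308.45'],
        ['X3', '2', '0', '309.409836'],
        ['X4', '2', '1', '310.508474'],
        ['X5', '2', '2', '308.728813']]

ids = sorted({row[1] for row in rows})  # plays the role of np.unique(a[:, 1])

# One matrix row per data row, one column per id: value if the id matches, NaN otherwise
masked = [[float(row[3]) if row[1] == k else math.nan for k in ids] for row in rows]

def nanargmax_column(matrix, col):
    """Index of the largest non-NaN entry in one column (first one wins on ties)."""
    best = None
    for i, r in enumerate(matrix):
        v = r[col]
        if not math.isnan(v) and (best is None or v > matrix[best][col]):
            best = i
    return best

max_positions = [nanargmax_column(masked, j) for j in range(len(ids))]
print(max_positions)  # [2, 4]
print([rows[i] for i in max_positions])
```

This is for illustration only: the masked matrix has one cell per (row, id) pair, so it grows as n × k, which is why a single-pass dictionary tends to be preferable when there are many distinct ids.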

Timing: [benchmark chart not available]


The simplest, most intuitive solution I can think of:

>>> l = [['X', '1', '0', '303.016666'], 
... ['X1', '1', '1', '305.516666'], 
... ['X2', '1', '2', '308.45'], 
... ['X3', '2', '0', '309.409836'], 
... ['X4', '2', '1', '310.508474'], 
... ['X5', '2', '2', '308.728813']] 
>>> result = {} 
>>> for a, b, c, d in l: 
...  if b not in result or float(d) > float(result[b][2]): 
...   result[b] = (a, c, d) 
... 
>>> result 
{'1': ('X2', '2', '308.45'), '2': ('X4', '1', '310.508474')} 
>>> result = [(a, b, c, d) for b, (a, c, d) in result.items()] 
>>> result 
[('X2', '1', '2', '308.45'), ('X4', '2', '1', '310.508474')] 

Simple and effective; it only needs a slight tweak to return what the OP asked for. – Saksow
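Along the lines of that comment, a minimal variant (assuming the same input list) stores the entire row in the dictionary, so the result is already in the OP's list-of-lists format and no unpacking and re-packing is needed. `max_rows_by_id` is a hypothetical name:

```python
rows = [['X', '1', '0', '303.016666'],
        ['X1', '1', '1', '305.516666'],
        ['X2', '1', '2', '308.45'],
        ['X3', '2', '0', '309.409836'],
        ['X4', '2', '1', '310.508474'],
        ['X5', '2', '2', '308.728813']]

def max_rows_by_id(rows):
    """Keep the whole row with the largest value (column 3) for each id (column 1)."""
    best = {}  # id -> best row so far; dicts keep first-seen id order (Python 3.7+)
    for row in rows:
        row_id, value = row[1], float(row[3])
        if row_id not in best or value > float(best[row_id][3]):
            best[row_id] = row
    return list(best.values())

print(max_rows_by_id(rows))
# [['X2', '1', '2', '308.45'], ['X4', '2', '1', '310.508474']]
```

Because the comparison is a strict `>`, the first row wins when two rows of the same id share the maximum value.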


You can write a dict comprehension together with a set() that stores the unique ids:

my_data = [ 
    ['X', '1', '0', '303.016666'], 
    ['X1', '1', '1', '305.516666'], 
    ['X2', '1', '2', '308.45'], 
    ['X3', '2', '0', '309.409836'], 
    ['X4', '2', '1', '310.508474'], 
    ['X5', '2', '2', '308.728813']] 

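# Unique ids
my_ids = {data[1] for data in my_data}

# Compare as floats: plain string comparison would rank '99.0' above '100.0'
my_max = {id_: max((val for _, i, _, val in my_data if i == id_), key=float)
          for id_ in my_ids}
# Content of 'my_max' (key order may vary): {'1': '308.45', '2': '310.508474'}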
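The comprehension above returns only the maximum values, while the question asks for the full rows. A minimal extension, assuming the same `my_data`, passes `key=` to `max()` so that it compares rows by their numeric value and returns the winning row itself (`my_max_rows` is a hypothetical name):

```python
my_data = [
    ['X', '1', '0', '303.016666'],
    ['X1', '1', '1', '305.516666'],
    ['X2', '1', '2', '308.45'],
    ['X3', '2', '0', '309.409836'],
    ['X4', '2', '1', '310.508474'],
    ['X5', '2', '2', '308.728813']]

my_ids = {row[1] for row in my_data}

# For each id, pick the whole row whose value (column 3) is largest as a float
my_max_rows = {
    id_: max((row for row in my_data if row[1] == id_), key=lambda row: float(row[3]))
    for id_ in my_ids
}
print(sorted(my_max_rows.values(), key=lambda row: row[1]))
# [['X2', '1', '2', '308.45'], ['X4', '2', '1', '310.508474']]
```

This scans `my_data` once per distinct id, so it is O(n·k); with many ids, the single-pass dictionary loop from the earlier answer scales better.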