Python pandas - apply a function to a grouped DataFrame

I have a DataFrame as follows:

     A         B         C
0  foo  1.496337 -0.604264
1  bar -0.025106  0.257354
2  foo  0.958001  0.933328
3  foo -1.126581  0.570908
4  bar -0.428304  0.881995
5  foo -0.955252  1.408930
6  bar  0.504582  0.455287
7  bar -1.076096  0.536741
8  bar  0.351544 -1.146554
9  foo  0.430260 -0.348472

I want to get the maximum of column B for each group (grouping by A) and add it to column C. Here is my attempt:

Group by A:

df = df.groupby(by='A')

Get the maximum of column B for each group, then try to add it to column C:

for name in ['foo','bar']: 
    maxi = df.get_group(name)['B'].max() 
    df.get_group(name)['C'] = df.get_group(name)['C']+maxi

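At this point pandas suggests "Try using .loc[row_indexer,col_indexer] = value instead". Does this mean I have to loop over the rows, test column A with an if, and modify the values of C one by one? That does not seem very pandas-like, and I feel I am missing something. What is a better way to work with this grouped DataFrame?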

Source

2016-02-26 Prikers

Operations like this are done with either transform or aggregate. In your case, you need transform:

# group by 'A'
grouped = df.groupby('A')

# transform B so that every row holds the maximum of its group
max_B = grouped['B'].transform('max')

# add the group maximum to C and store the result as a new column of the original df
df['D'] = df['C'] + max_B

Or in one line:

In [2]: df['D'] = df.groupby('A')['B'].transform('max') + df['C'] 

In [3]: df 
Out[3]: 
     A         B         C         D
0  foo  1.496337 -0.604264  0.892073
1  bar -0.025106  0.257354  0.761936
2  foo  0.958001  0.933328  2.429665
3  foo -1.126581  0.570908  2.067245
4  bar -0.428304  0.881995  1.386577
5  foo -0.955252  1.408930  2.905267
6  bar  0.504582  0.455287  0.959869
7  bar -1.076096  0.536741  1.041323
8  bar  0.351544 -1.146554 -0.641972
9  foo  0.430260 -0.348472  1.147865
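
To make the difference between aggregating and transforming concrete, here is a minimal, self-contained sketch that rebuilds the DataFrame from the question. The variable names group_max, row_max and mapped_max are only illustrative. An aggregation returns one value per group, while transform returns one value per original row, which is why its result can be added to C directly.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "foo", "bar", "foo", "bar", "bar", "bar", "foo"],
    "B": [1.496337, -0.025106, 0.958001, -1.126581, -0.428304,
          -0.955252, 0.504582, -1.076096, 0.351544, 0.430260],
    "C": [-0.604264, 0.257354, 0.933328, 0.570908, 0.881995,
          1.408930, 0.455287, 0.536741, -1.146554, -0.348472],
})

# Aggregation: one value per group, indexed by the group key ('bar', 'foo').
group_max = df.groupby("A")["B"].max()

# Transformation: one value per original row, aligned with df's index.
row_max = df.groupby("A")["B"].transform("max")

# Mapping the aggregated result back through column A yields the same
# per-row values as transform.
mapped_max = df["A"].map(group_max)

# Because row_max shares df's index, it can be added to C row by row.
df["D"] = df["C"] + row_max
```

Here group_max has two entries and row_max has ten, so only the latter lines up with the rows of df without any extra step.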

For more information, see http://pandas.pydata.org/pandas-docs/stable/groupby.html
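
As for the warning itself: get_group returns a new DataFrame, so assigning to it does not reliably write back into the original. If you do want to keep an explicit loop over the groups, a sketch along the following lines should work, because it writes through .loc on the original frame. The small DataFrame here is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Small made-up frame, for illustration only.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": [1.0, 2.0, 3.0, 0.5],
    "C": [10.0, 20.0, 30.0, 40.0],
})

# .groups maps each group name to the index labels of its rows.
for name, idx in df.groupby("A").groups.items():
    # Write to the original df through .loc, not to the result of get_group.
    df.loc[idx, "C"] += df.loc[idx, "B"].max()
```

This modifies C in place instead of creating a new column. The transform version is still shorter and avoids the Python-level loop.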

Source

2016-02-26 09:05:28 MaxNoe
