Pandas max formula across different grouped rows

I have a dataframe that looks like this:

Auction_id bid_price min_bid rank 
123   5   3  1 
123   4   3  2 
124   3   2  1 
124   1   2  2

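I want to create another column that returns MAX(rank 1 min_bid, rank 2 bid_price). I don't care what value ends up in the rank 2 rows. I would like the result to look like this: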

Auction_id bid_price min_bid rank custom_column 
123   5   3  1  4 
123   4   3  2  NaN/Don't care 
124   3   2  1  2 
124   1   2  2  NaN/Don't care

Should I iterate over the grouped auction_ids? Can someone point me to the topics I need to be familiar with to tackle this type of problem?
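For anyone who wants to reproduce the examples, here is a minimal sketch that builds the sample frame from the table above and works out the desired value for a single auction by hand. The variable names (one, rank1_min_bid, rank2_bid_price) are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

# Desired value for one auction, written out step by step
one = df[df["Auction_id"] == 123]
rank1_min_bid = one.loc[one["rank"] == 1, "min_bid"].iloc[0]
rank2_bid_price = one.loc[one["rank"] == 2, "bid_price"].iloc[0]
print(max(rank1_min_bid, rank2_bid_price))  # 4, matching custom_column for auction 123
```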

Source

2015-05-24 Christopher Jenkins

Does each auction ID always have exactly two rows, ranked 1 and 2?

Yes. I will eventually drop the rank 2 rows as well.

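First, set the index to Auction_id. Then you can use loc to select the appropriate values for each Auction_id and take the max over them. Finally, reset the index to get back to your initial state.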

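import pandas as pd

# Index by Auction_id so the two selections line up per auction
df.set_index('Auction_id', inplace=True)
# Rank 1 min_bid next to rank 2 bid_price, then the row-wise max
df['custom_column'] = pd.concat([df.loc[df['rank'] == 1, 'min_bid'],
                                 df.loc[df['rank'] == 2, 'bid_price']],
                                axis=1).max(axis=1)
df.reset_index(inplace=True)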
>>> df 
    Auction_id bid_price min_bid rank custom_column 
0   123   5  3  1    4 
1   123   4  3  2    4 
2   124   3  2  1    2 
3   124   1  2  2    2

Source

2015-05-24 20:23:24 Alexander

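This approach is especially useful on larger datasets (I was getting memory errors with the function-based approach). One more question: is it possible to run a cascade depending on whether the rank 2 bid is lower than the rank 1 bid price? If rank 2 bid_price > rank 1 bid_price, I want to return the rank 1 min_bid.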
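One possible reading of the cascade asked about in this comment, sketched on top of the same set_index idea. This is an interpretation, not something the answer's author confirmed: keep the usual max unless the rank 2 bid_price exceeds the rank 1 bid_price, in which case fall back to the rank 1 min_bid. Auction 125 is made-up data added only to exercise the fallback, and the names r1, r2, usual and per_auction are illustrative.

```python
import pandas as pd

# Sample data from the question plus a hypothetical auction 125
# whose rank 2 bid is higher than its rank 1 bid
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124, 125, 125],
    "bid_price": [5, 4, 3, 1, 2, 6],
    "min_bid": [3, 3, 2, 2, 1, 1],
    "rank": [1, 2, 1, 2, 1, 2],
})

# One row per auction for each rank, indexed by Auction_id
r1 = df[df["rank"] == 1].set_index("Auction_id")
r2 = df[df["rank"] == 2].set_index("Auction_id")

# Usual rule: max(rank 1 min_bid, rank 2 bid_price)
usual = pd.concat([r1["min_bid"], r2["bid_price"]], axis=1).max(axis=1)

# Keep the usual value where rank 2 bid <= rank 1 bid,
# otherwise fall back to the rank 1 min_bid
keep_usual = r2["bid_price"].le(r1["bid_price"])
per_auction = usual.where(keep_usual, r1["min_bid"])

# Broadcast the per-auction value back onto every row
df["custom_column"] = df["Auction_id"].map(per_auction)
print(df)
```

Because everything is computed on per-auction Series and mapped back, no Python-level function runs per group, so it should scale in much the same way as the answer above.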

Here is a simple way to do it.

Create a maxminbid() function that computes val = max(rank 1 min_bid, rank 2 bid_price), assigns it to grp['custom_column'], and stores NaN for the rank == 2 row.

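import numpy as np

def maxminbid(grp):
    # Larger of the rank 1 min_bid and the rank 2 bid_price for this auction
    val = max(grp.loc[grp['rank'] == 1, 'min_bid'].iloc[0],
              grp.loc[grp['rank'] == 2, 'bid_price'].iloc[0])
    grp['custom_column'] = float(val)  # float so the column can hold NaN
    # The rank 2 value is not needed, so blank it out
    grp.loc[grp['rank'] == 2, 'custom_column'] = np.nan
    return grp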

Then apply the maxminbid function to the object grouped by Auction_id:

# group_keys=False keeps the original 0..3 index in newer pandas versions
df.groupby('Auction_id', group_keys=False).apply(maxminbid)


    Auction_id bid_price min_bid rank custom_column 
0   123   5  3  1    4 
1   123   4  3  2   NaN 
2   124   3  2  1    2 
3   124   1  2  2   NaN

However, I suspect there must be a more elegant solution than this.
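One candidate for a tidier version, offered as a sketch rather than a definitive answer: pick the relevant value per row with where, then take the group-wise max with transform. It relies on the assumption confirmed in the question's comments, namely that every auction has exactly one rank 1 row and one rank 2 row. The names is_rank1, candidate and group_max are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

is_rank1 = df["rank"] == 1

# min_bid on rank 1 rows, bid_price on all other (rank 2) rows
candidate = df["min_bid"].where(is_rank1, df["bid_price"])

# Max of those candidates within each auction, aligned to the original rows
group_max = candidate.groupby(df["Auction_id"]).transform("max")

# Keep the value on rank 1 rows only; rank 2 rows become NaN
df["custom_column"] = group_max.where(is_rank1)
print(df)
```

This avoids calling a Python function once per group, which may matter on large frames.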

Source

2015-05-24 19:49:13 Zero

I don't think this is crude at all.

Here is an approach that does some reshaping with pivot():

Auction_id bid_price min_bid rank 
0   123   5  3  1    
1   123   4  3  2   
2   124   3  2  1    
3   124   1  2  2

Then reshape your frame (df):

pv = df.pivot(index="Auction_id", columns="rank")
pv 
        bid_price min_bid 
rank    1 2  1 2 
Auction_id       
123    5 4  3 3 
124    3 1  2 2

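Add a column to pv containing the max. I'm using iloc to take a slice of the pv dataframe: position 1 is the rank 2 bid_price and position 2 is the rank 1 min_bid.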

pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1) 
pv

       bid_price min_bid custom_column 
rank    1 2  1 2    
Auction_id          
123    5 4  3 3    4 
124    3 1  2 2    2
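As a variation on the iloc step, the same two columns can be picked by label instead of by position, which does not depend on the column order of the pivoted frame. This is a sketch, not part of the original answer; by_label and by_position are illustrative names.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

# Columns become a MultiIndex of (value column, rank)
pv = df.pivot(index="Auction_id", columns="rank")

# Select rank 2 bid_price and rank 1 min_bid by their (column, rank) labels
by_label = pd.concat(
    [pv[("bid_price", 2)], pv[("min_bid", 1)]], axis=1
).max(axis=1)

# The positional version used in the answer, for comparison
by_position = pv.iloc[:, [1, 2]].max(axis=1)

print(by_label)
print(by_position)
```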

Then add the max to the original frame (df) by mapping Auction_id against our pv frame:

df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"]) 
df 

    Auction_id bid_price min_bid rank custom_column 
0   123   5  3  1    4 
1   123   4  3  2   NaN 
2   124   3  2  1    2 
3   124   1  2  2   NaN

All the steps combined:

pv = df.pivot(index="Auction_id", columns="rank")
pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1) 
df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"]) 
df 

    Auction_id bid_price min_bid rank custom_column 
0   123   5  3  1    4 
1   123   4  3  2   NaN 
2   124   3  2  1    2 
3   124   1  2  2   NaN

Source

2015-05-24 20:36:49
