2015-05-15 149 views

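Subtract a group-specific value from rows in pandas

In pandas, I have a DataFrame made up of two groups, each containing several samples. Each group has an internal reference value that I want to subtract from all sample values within that group.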

import io
import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
# split columns on runs of whitespace
df = pd.read_csv(io.StringIO(s), sep=r'\s+')
df = df.set_index(['Group', 'sample'])
df

Out[82]:
               value
Group  sample
group1 ref1     18.1
       smp1      NaN
       smp2     20.3
       smp3     30.0
group2 ref2     16.1
       smp4     29.2
       smp5     19.9
       smp6     28.9

What I want to do is add a new column in which each group's reference (ref) is subtracted from every sample (smp) in that group. Like this:

               value  deltaValue
Group  sample
group1 ref1     18.1         0.0
       smp1      NaN         NaN
       smp2     20.3         2.2
       smp3     30.0        11.9
group2 ref2     16.1         0.0
       smp4     29.2        13.1
       smp5     19.9         3.8
       smp6     28.9        12.8

Does anyone know how to do this? Thanks!

Answers

0

Group your DataFrame by the Group column. Then loop over the groups, look up each group's ref sample value, and subtract it from that group's whole value column.

> df = pd.read_csv(io.StringIO(s), sep=r'\s+')
> df['diff'] = 0.0
> df_group = df.groupby('Group')
> for name, group in df_group:
      # the reference row is named 'ref' + the group number, e.g. 'ref1' for 'group1'
      ref = group.loc[group['sample'] == 'ref' + name.split('group')[1], 'value'].values[0]
      # write sample minus reference back into the rows of this group
      df.loc[group.index, 'diff'] = group['value'] - ref
> print(df)
    Group sample  value  diff
0  group1   ref1   18.1   0.0
1  group1   smp1    NaN   NaN
2  group1   smp2   20.3   2.2
3  group1   smp3   30.0  11.9
4  group2   ref2   16.1   0.0
5  group2   smp4   29.2  13.1
6  group2   smp5   19.9   3.8
7  group2   smp6   28.9  12.8
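
The same per-group lookup can probably be written without an explicit loop. The following is only a sketch, assuming each group has exactly one row whose sample name starts with 'ref'; the names is_ref and ref_by_group are made up for illustration. It computes sample minus reference, as the question asks.

```python
import io
import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+")

# Assumption: exactly one reference row per group, named 'ref...'.
is_ref = df["sample"].str.startswith("ref")

# Series indexed by group name, holding that group's reference value.
ref_by_group = df.loc[is_ref].set_index("Group")["value"]

# Map every row to its group's reference and subtract in one vectorized step.
df["diff"] = df["value"] - df["Group"].map(ref_by_group)
print(df)
```

Rows with a missing value (smp1 here) simply stay NaN in the new column.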
0

Here is one way to do this without a loop.

First, create a function func that finds the sample whose name starts with ref, then computes the delta values.

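In [33]: def func(grp):
             # value of the row(s) whose sample name starts with 'ref'
             ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
             grp = grp.copy()  # avoid modifying the group passed in by apply
             grp['delta'] = grp['value'] - ref.values[0]
             return grp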

Then apply this func to dff.groupby('Group'):

In [34]: dff.groupby('Group').apply(func)
Out[34]:
    Group sample  value  delta
0  group1   ref1   18.1    0.0
1  group1   smp1    NaN    NaN
2  group1   smp2   20.3    2.2
3  group1   smp3   30.0   11.9
4  group2   ref2   16.1    0.0
5  group2   smp4   29.2   13.1
6  group2   smp5   19.9    3.8
7  group2   smp6   28.9   12.8
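
If you would rather not depend on how groupby.apply returns its result (I believe the resulting index and the handling of the grouping column have varied across pandas versions), a transform-based variant might look like this. It is a sketch, again assuming one 'ref...' row per group with a non-missing value; is_ref and ref are illustrative names.

```python
import pandas as pd

dff = pd.DataFrame(
    {
        "Group": ["group1"] * 4 + ["group2"] * 4,
        "sample": ["ref1", "smp1", "smp2", "smp3", "ref2", "smp4", "smp5", "smp6"],
        "value": [18.1, float("nan"), 20.3, 30.0, 16.1, 29.2, 19.9, 28.9],
    }
)

# True only on the reference rows.
is_ref = dff["sample"].str.startswith("ref")

# Keep the value on reference rows only (NaN elsewhere), then broadcast
# the first non-null value of each group to every row of that group.
ref = dff["value"].where(is_ref).groupby(dff["Group"]).transform("first")

dff["delta"] = dff["value"] - ref
print(dff)
```

Because transform returns a Series aligned with the original rows, the result can be assigned straight to a new column.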

Your dff should look like this to start with; you can get it with dff = df.reset_index()

In [35]: dff
Out[35]:
    Group sample  value
0  group1   ref1   18.1
1  group1   smp1    NaN
2  group1   smp2   20.3
3  group1   smp3   30.0
4  group2   ref2   16.1
5  group2   smp4   29.2
6  group2   smp5   19.9
7  group2   smp6   28.9
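
If you want to keep the (Group, sample) MultiIndex from the question instead of calling reset_index(), something along these lines should work. This is a sketch under the same assumption of one 'ref...' sample per group; is_ref, ref_by_group and groups are illustrative names, and deltaValue is the column name used in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Group": ["group1"] * 4 + ["group2"] * 4,
        "sample": ["ref1", "smp1", "smp2", "smp3", "ref2", "smp4", "smp5", "smp6"],
        "value": [18.1, float("nan"), 20.3, 30.0, 16.1, 29.2, 19.9, 28.9],
    }
).set_index(["Group", "sample"])

# Boolean mask over the 'sample' index level marking the reference rows.
is_ref = df.index.get_level_values("sample").str.startswith("ref")

# Reference values indexed by group only (the 'sample' level is dropped).
ref_by_group = df.loc[is_ref, "value"].droplevel("sample")

# Look up each row's group reference and subtract it positionally.
groups = df.index.get_level_values("Group")
df["deltaValue"] = df["value"].to_numpy() - groups.map(ref_by_group).to_numpy()
print(df)
```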