2017-10-05 47 views

I have a problem with a calculation and division based on multiple criteria in pandas. I have data (~300K rows):

Date Column 1 Column 2 Column 3 Value 
1/1/2017 A  Jonas  Station1 8 
1/1/2017 A  Greg  Station1 5 
1/1/2017 A  Anton  Station1 1 
1/1/2017 B  Jonas  Station1 4 
1/1/2017 B  Greg  Station1 4 
1/1/2017 B  Mick  Station2 8 
1/1/2017 B  Anton  Station3 7 
1/1/2017 C  Jonas  Station4 2 
1/1/2017 C  Greg  Station1 7 

I need to calculate the following, for each Date, by Column 2 and Column 3. For example, for 'Greg' I have:

Date Column 1 Column 2 Column 3 Value 
1/1/2017 A  Greg  Station1 5 
1/1/2017 B  Greg  Station1 4 
1/1/2017 C  Greg  Station1 7 

Sum of all C / count(A, B); for this example 7/2 = 3.5. Add the value 3.5 to each A and B row:

Date Column 1 Column 2 Column 3 Value 
1/1/2017 A  Greg  Station1 5+3.5 = 8.5 
1/1/2017 B  Greg  Station1 4+3.5 = 7.5 
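
The arithmetic for the 'Greg' / Station1 group can be checked with a small sketch. The DataFrame below is typed in by hand from the sample rows above; the variable names (`greg`, `c_total`, `ab_count`, `share`) are only illustrative and not part of the question.

```python
import pandas as pd

# Sample rows copied by hand from the question
df = pd.DataFrame({
    'Date': ['1/1/2017'] * 9,
    'Column 1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
    'Column 2': ['Jonas', 'Greg', 'Anton', 'Jonas', 'Greg',
                 'Mick', 'Anton', 'Jonas', 'Greg'],
    'Column 3': ['Station1', 'Station1', 'Station1', 'Station1', 'Station1',
                 'Station2', 'Station3', 'Station4', 'Station1'],
    'Value': [8, 5, 1, 4, 4, 8, 7, 2, 7],
})

# One group: same Date, Column 2 and Column 3
greg = df[(df['Date'] == '1/1/2017')
          & (df['Column 2'] == 'Greg')
          & (df['Column 3'] == 'Station1')]

# Sum of the C values in the group
c_total = greg.loc[greg['Column 1'] == 'C', 'Value'].sum()
# Number of A and B rows in the group
ab_count = greg['Column 1'].isin(['A', 'B']).sum()
# Amount to add to each A and B row
share = c_total / ab_count
print(share)
```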

How can I do this in pandas/Python? Update: items like 'Greg': ~100; days: ~500; Column 1, Column 2: ~1K.

Can you add different dates to the DataFrame and show the expected output? – Dark

Answer

Use:

import pandas as pd

# filter rows where Column 1 is A or B
df1 = df[df['Column 1'].isin(['A', 'B'])]
# count A/B rows per group; reindex to the original index to keep the original size
b = (df1.groupby(['Date', 'Column 2', 'Column 3'])['Column 1']
        .transform('size')
        .reindex(df.index))

# filter rows where Column 1 is C and rename Value for the merge
# (the chained call needs the surrounding parentheses to be valid Python)
df2 = (df.loc[df['Column 1'] == 'C', ['Date', 'Column 2', 'Column 3', 'Value']]
         .rename(columns={'Value': 'a'}))

# left-merge onto the original and select only the new column a
# (assumes df has a default RangeIndex and at most one C row per group)
a = pd.merge(df, df2, how='left')['a']
# divide and add to column Value; where the division gives NaN, keep the original values
df['Value'] = a.div(b).add(df['Value'], fill_value=0)
print(df)
     Date Column 1 Column 2 Column 3 Value 
0 1/1/2017  A Jonas Station1 8.0 
1 1/1/2017  A  Greg Station1 8.5 
2 1/1/2017  A Anton Station1 1.0 
3 1/1/2017  B Jonas Station1 4.0 
4 1/1/2017  B  Greg Station1 7.5 
5 1/1/2017  B  Mick Station2 8.0 
6 1/1/2017  B Anton Station3 7.0 
7 1/1/2017  C Jonas Station4 2.0 
8 1/1/2017  C  Greg Station1 7.0 
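
The question says "sum of all C", so a group could in principle contain more than one C row, in which case a merge would duplicate rows. A minimal sketch of a variant that sums the C values per group with `groupby(...).transform('sum')` instead of merging is shown below. It is an alternative written for illustration, not the approach above; the sample frame is typed in from the question, and the names `keys`, `c_value`, `ab_flag`, `totals`, `share` and `out` are assumptions.

```python
import pandas as pd

# Sample rows copied by hand from the question
df = pd.DataFrame({
    'Date': ['1/1/2017'] * 9,
    'Column 1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
    'Column 2': ['Jonas', 'Greg', 'Anton', 'Jonas', 'Greg',
                 'Mick', 'Anton', 'Jonas', 'Greg'],
    'Column 3': ['Station1', 'Station1', 'Station1', 'Station1', 'Station1',
                 'Station2', 'Station3', 'Station4', 'Station1'],
    'Value': [8, 5, 1, 4, 4, 8, 7, 2, 7],
})

keys = ['Date', 'Column 2', 'Column 3']
is_c = df['Column 1'].eq('C')
is_ab = df['Column 1'].isin(['A', 'B'])

# Helper columns: the C values (0 elsewhere) and a 0/1 flag for A/B rows
tmp = df.assign(
    c_value=df['Value'].where(is_c, 0),
    ab_flag=is_ab.astype(int),
)

# Per-group totals, broadcast back to every row of the group
totals = tmp.groupby(keys)[['c_value', 'ab_flag']].transform('sum')

# Share per A/B row; rows that are not A/B get 0, which also hides the
# division by zero in groups that contain no A/B rows
share = (totals['c_value'] / totals['ab_flag']).where(is_ab, 0)

out = df.assign(Value=df['Value'] + share)
print(out)
```

Because everything is aligned on the original index, this sketch does not depend on `df` having a default RangeIndex.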

Using a custom function should be slower:

def f(x):
    a = x.loc[x['Column 1'] == 'C', 'Value']
    # mask of all A, B values per group
    m = x['Column 1'].isin(['A', 'B'])
    c = a / m.sum()
    # if-else, because a group without any C row gives an empty Series
    c = 0 if c.empty else c.item()
    # add the share to the A/B rows only (mask m)
    x.loc[m, 'Value'] += c
    return x


# group_keys=False keeps the original index on pandas >= 2.0
df = df.groupby(['Date', 'Column 2', 'Column 3'], group_keys=False).apply(f)
print(df)
     Date Column 1 Column 2 Column 3 Value 
0 1/1/2017  A Jonas Station1 8.0 
1 1/1/2017  A  Greg Station1 8.5 
2 1/1/2017  A Anton Station1 1.0 
3 1/1/2017  B Jonas Station1 4.0 
4 1/1/2017  B  Greg Station1 7.5 
5 1/1/2017  B  Mick Station2 8.0 
6 1/1/2017  B Anton Station3 7.0 
7 1/1/2017  C Jonas Station4 2.0 
8 1/1/2017  C  Greg Station1 7.0 

Yes, this function is very slow. –