2017-10-18 44 views
0

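Calculating percentages in a high-dimensional crosstab

I made a crosstab on 3 variables (position, offer, group). How can I calculate percentages by totalling over one variable, offer, rather than over the margins (i.e. normalizing by columns)?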

df = pd.crosstab(df.group, [df.position, df.offer], margins = True) 

df:

pid offer position group 
1 accept left  group1 
1 accept left  group1 
1 accept right  group2 
1 reject right  group2 
1 reject right  group1 
2 reject right  group1 
2 reject left  group2 
2 accept left  group3 
3 accept right  group3 
3 reject right  group1 
3 reject right  group2 
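
For anyone wanting to reproduce this, the sample above can be rebuilt roughly as follows. This is a minimal sketch: the column names come from the table, while the variable names `rows` and `counts` are only illustrative.

```python
import pandas as pd

# Rows retyped from the table above: (pid, offer, position, group).
rows = [
    (1, "accept", "left", "group1"),
    (1, "accept", "left", "group1"),
    (1, "accept", "right", "group2"),
    (1, "reject", "right", "group2"),
    (1, "reject", "right", "group1"),
    (2, "reject", "right", "group1"),
    (2, "reject", "left", "group2"),
    (2, "accept", "left", "group3"),
    (3, "accept", "right", "group3"),
    (3, "reject", "right", "group1"),
    (3, "reject", "right", "group2"),
]
df = pd.DataFrame(rows, columns=["pid", "offer", "position", "group"])

# Raw counts per group, with (position, offer) as two-level columns.
counts = pd.crosstab(df["group"], [df["position"], df["offer"]])
print(counts)
```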

My current crosstab:

position   left          right          All
offer      accept reject accept reject
group1          2      0      0      3    5
group2          0      1      1      2    4
group3          1      0      1      0    2
All             3      1      2      5   11

Expected result:

position   left          right
offer      accept reject accept reject
group1          1      0      0      1
group2          0      1   0.33   0.66
group3          1      0      1      0

Thanks!


What does 'df' look like? –

Answers

1

Take it one step further: group along level 0 of the columns (position) and divide by the sum.

c = pd.crosstab(df.group, [df.position, df.offer]) 
df = c/c.groupby(level=0, axis=1).sum() 
print(df) 

position   left           right
offer    accept reject    accept    reject
group
group1      1.0    0.0  0.000000  1.000000
group2      0.0    1.0  0.333333  0.666667
group3      1.0    0.0  1.000000  0.000000
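
On recent pandas releases (2.1 and later, as far as I know) passing `axis=1` to `DataFrame.groupby` raises a deprecation warning. Below is a sketch of the same normalization that avoids it: transpose, sum within each position using `transform`, and transpose back. The names `totals` and `pct` are illustrative, and the data is retyped from the question.

```python
import pandas as pd

# Sample data from the question (pid is left out; it is not needed for counting).
df = pd.DataFrame({
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

# Counts per group, with (position, offer) as two-level columns; no margins.
c = pd.crosstab(df["group"], [df["position"], df["offer"]])

# For every cell, the total over all offers within the same position.
# transform("sum") keeps the original shape, so the division aligns cell by cell.
totals = c.T.groupby(level="position").transform("sum").T

# Share of each offer within its position, per group.
pct = c / totals
print(pct.round(2))
```

A group with no rows at all for some position would produce 0/0 and therefore NaN in that block; the sample data has no such case.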

If you are as much of a perfectionist as I am, you may want whole numbers displayed as integers, which can be done like this:

df = c.div(c.groupby(level=0, axis=1).sum()).astype(object) 
print(df) 

position   left           right
offer    accept reject    accept    reject
group
group1        1      0         0         1
group2        0      1  0.333333  0.666667
group3        1      0         1         0

@COLDSPEED, how can I group by multiple levels? 'groupby([level = 0, level = 1], axis = 1)' doesn't seem to work. Thanks! – Kay


@Kay 'groupby(level=[0, 1], axis=1)' –

0

You can use

In [4013]: dfa = df.groupby(['group', 'position', 'offer']).size().unstack(fill_value=0) 

In [4014]: dfa.div(dfa.sum(axis=1), axis=0).unstack() 
Out[4014]: 
offer    accept           reject
position   left     right   left     right
group
group1      1.0  0.000000    0.0  1.000000
group2      0.0  0.333333    1.0  0.666667
group3      1.0  1.000000    0.0  0.000000

You can also get the counts from pivot_table.

df.pivot_table(index=['group', 'position'], columns='offer', aggfunc=len)['pid']
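
The pivot_table call above returns raw counts, with NaN where a combination never occurs. Here is a sketch of how it might be carried through to the shares, assuming `aggfunc="count"` with `fill_value=0` is an acceptable stand-in for `aggfunc=len`. The names `counts`, `shares` and `result` are illustrative, and the data is retyped from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

# Count rows per (group, position) and offer; missing combinations become 0.
counts = df.pivot_table(index=["group", "position"], columns="offer",
                        values="pid", aggfunc="count", fill_value=0)

# Divide each row by its total so accept + reject sum to 1 within a position.
shares = counts.div(counts.sum(axis=1), axis=0)

# Move position into the columns to get one row per group.
result = shares.unstack("position")
print(result)
```

The columns come out ordered as (offer, position), the same layout as the groupby version above.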