Applying a function to a groupby in Python pandas

I have a DataFrame that looks like this:

id  salary  days_employed  category  salary_percentile
1   200000  400            1         14

where category 0 means the person quit early and 1 means they stayed longer.

My code is as follows:

df1['salary_percentile'] = pd.qcut(df1['salary'], 50, labels=['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50'])
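As a side note, the 50 labels do not have to be typed out by hand. The following is a minimal sketch, assuming a hypothetical `salary` column with enough distinct values for `pd.qcut` to form 50 bins; the data here is made up for illustration and is not the asker's.

```python
import pandas as pd

# Hypothetical salaries (200 distinct values), purely for illustration.
df1 = pd.DataFrame({'salary': range(1000, 201000, 1000)})

# Build the string labels '1' .. '50' instead of listing them manually.
labels = [str(i) for i in range(1, 51)]

# Same call as in the question, with the generated label list.
df1['salary_percentile'] = pd.qcut(df1['salary'], 50, labels=labels)

print(df1.head())
```

The result is a categorical column with the same string labels as the hand-written list.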

After cutting into 50 buckets and looking at the rows that fall into the 37th salary_percentile, I apply the following to my DataFrame:

def f(x): 
    early_quitter = x.loc[(x.category== '0')] 
    non = x.loc[(x.category == '1')] 
    proportion_early_quitters = early_quitter.shape[0]/x.shape[0] 
    return pd.Series({'prop_early_quitters': proportion_early_quitters}) 

bypercentile = df1.groupby('salary_percentile').apply(f) 
bypercentile = bypercentile.reset_index(level='None') 
bypercentile

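I want my function to return a DataFrame containing the proportion of early quitters in each group. That is, within each group I want to compute len(early_quitter) / len(group). When I use this function, it returns a DataFrame with a proportion of 0 for every group.

Can someone help me?

As a side note, I created the salary_percentile column using the code shown above.

Thanks!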

Source

2016-12-22 Gingerbread

Are you getting this with Python 2? If so, try putting 'from __future__ import division' at the beginning of your code. – BrenBarn

Thank you so much! It worked for me! I am indeed using Python 2! Thanks again!! – Gingerbread
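To illustrate the behaviour discussed in these comments, here is a small sketch written for Python 3, where the `//` operator reproduces what `/` did for two integers in Python 2. The counts are hypothetical. In Python 2 itself, `from __future__ import division` at the top of the file makes `/` behave like the true division shown here.

```python
# Hypothetical counts: 3 early quitters in a group of 10.
n_early, n_group = 3, 10

# Floor division: what int / int returned in Python 2.
floor_result = n_early // n_group

# True division: what int / int returns in Python 3.
true_result = n_early / n_group

# Casting one operand to float gives true division in both versions.
cast_result = float(n_early) / n_group

print(floor_result, true_result, cast_result)  # 0 0.3 0.3
```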

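First, the reason you get zero is that len returns an integer, and in Python 2 dividing an integer by an integer gives an integer: the result of the division with its fractional part dropped. So "some positive number less than n" / n equals zero. You can fix this with

float(len(early_quitter))/len(group)

However, if early quitters are all labelled 0 and everyone else 1, the proportion of early quitters is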

float(len(early_quitters))/len(group)

or

1 - float(len(not_early_quitters))/len(group)

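or, because these values are all 1, so that len produces the same value as sum,

1 - float(sum(not_early_quitters))/len(group)

However, that is the definition of the mean of not_early_quitters within the group. So

1 - mean(not_early_quitters)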

You should be able to get this from your DataFrame with

1 - df1.groupby('salary_percentile').category.mean()
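The one-liner above can be tried on a small example. This is a sketch under assumptions: the data is invented, and category is assumed to be numeric 0/1 (if it was read in as the strings '0' and '1', as the comparison in the question's function suggests, it would first need converting, for example with `astype(int)`). The column name `prop_early_quitters` is borrowed from the question.

```python
import pandas as pd

# Hypothetical data: category 0 = early quitter, 1 = stayed longer.
df1 = pd.DataFrame({
    'salary_percentile': ['1', '1', '1', '1', '2', '2'],
    'category':          [0,   0,   0,   1,   0,   1],
})

# Ensure category is numeric so that mean() is the share of 1s.
df1['category'] = df1['category'].astype(int)

# Proportion of early quitters = 1 - share of those who stayed.
prop_early = 1 - df1.groupby('salary_percentile')['category'].mean()
prop_early = prop_early.rename('prop_early_quitters').reset_index()

# Equivalent formulation: mean of the boolean "is an early quitter" flag.
alt = (df1['category'] == 0).groupby(df1['salary_percentile']).mean()

print(prop_early)
```

Because the mean of a boolean or 0/1 column is a float, neither version depends on the Python 2 integer division behaviour.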

Source

2016-12-22 21:20:50 piRSquared

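I don't think this is what I'm looking for. I've edited my question a bit. Could you help me with the edited version? – Gingerbread

Which column do you want to compute the proportion of 0s for? – piRSquared

Do you have sample data I can use? You provided one row and told us you are cutting it into 50 buckets. – piRSquared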
