2017-04-13

Finding the percentage of each group that falls into bins using pandas

This is my dataframe:

city trips_in_first_30_days bins 
0 King's Landing 4 (3, 125] 
1 Astapor 0 NaN 
2 Astapor 3 (2, 3] 
3 King's Landing 9 (3, 125] 
4 Winterfell 14 (3, 125] 
5 Winterfell 2 (1, 2] 
6 Astapor 1 (0, 1] 
7 Winterfell 2 (1, 2] 
8 Winterfell 2 (1, 2] 
9 Winterfell 1 (0, 1] 
10 Winterfell 1 (0, 1] 
11 Winterfell 3 (2, 3] 
12 Winterfell 1 (0, 1] 
13 King's Landing 0 NaN 
14 Astapor 1 (0, 1] 
15 Winterfell 1 (0, 1] 
16 King's Landing 1 (0, 1] 
17 King's Landing 0 NaN 
18 King's Landing 6 (3, 125] 
19 King's Landing 0 NaN 
20 Winterfell 1 (0, 1] 
21 Astapor 1 (0, 1] 
22 Winterfell 0 NaN 
23 King's Landing 0 NaN 
24 Astapor 4 (3, 125] 
25 Winterfell 1 (0, 1] 
26 Astapor 1 (0, 1] 
27 Winterfell 3 (2, 3] 
28 Winterfell 0 NaN 
29 Astapor 1 (0, 1] 
... ... ... ... 
49970 Winterfell 2 (1, 2] 
49971 King's Landing 0 NaN 
49972 Winterfell 1 (0, 1] 
49973 Astapor 2 (1, 2] 
49974 Winterfell 1 (0, 1] 
49975 Winterfell 11 (3, 125] 
49976 King's Landing 0 NaN 
49977 Astapor 4 (3, 125] 
49978 Winterfell 1 (0, 1] 
49979 Winterfell 0 NaN 
49980 Astapor 1 (0, 1] 
49981 Astapor 0 NaN 
49982 King's Landing 0 NaN 
49983 Winterfell 1 (0, 1] 
49984 Winterfell 1 (0, 1] 
49985 Astapor 1 (0, 1] 
49986 Winterfell 0 NaN 
49987 Winterfell 3 (2, 3] 
49988 King's Landing 1 (0, 1] 
49989 Winterfell 1 (0, 1] 
49990 Astapor 1 (0, 1] 
49991 Winterfell 0 NaN 
49992 King's Landing 1 (0, 1] 
49993 Astapor 3 (2, 3] 
49994 Astapor 1 (0, 1] 
49995 King's Landing 0 NaN 
49996 Astapor 1 (0, 1] 
49997 Winterfell 0 NaN 
49998 Astapor 2 (1, 2] 
49999 Astapor 0 NaN 

This is a small snapshot. df['bins'] is categorical: I created it by using pd.cut to split trips_in_first_30_days into different intervals.
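A minimal sketch of how such a column can be built. It assumes the bin edges were [0, 1, 2, 3, 125], which is inferred from the intervals shown and not stated in the question, and it uses only the first ten rows of the snapshot. Because pd.cut builds right-closed intervals by default, a value of 0 is not inside (0, 1] and becomes NaN, which matches the NaN rows above.

```python
import pandas as pd

# First ten rows of the snapshot above
rows = [("King's Landing", 4), ("Astapor", 0), ("Astapor", 3), ("King's Landing", 9),
        ("Winterfell", 14), ("Winterfell", 2), ("Astapor", 1), ("Winterfell", 2),
        ("Winterfell", 2), ("Winterfell", 1)]
df = pd.DataFrame(rows, columns=["city", "trips_in_first_30_days"])

# Assumed edges; intervals are right-closed, so 0 falls outside (0, 1] and becomes NaN
edges = [0, 1, 2, 3, 125]
df["bins"] = pd.cut(df["trips_in_first_30_days"], bins=edges)

print(df["bins"].dtype)         # category
print(df["bins"].isna().sum())  # 1 (the Astapor row with 0 trips)
```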

Now I want to find out, after grouping by city, what percentage of trips_in_first_30_days falls into each bin.

For example, for the city Astapor, what percentage of trips_in_first_30_days falls into (0, 1]? How many fall into (1, 2], and so on?

Is it possible to do this even though the dtype is category and operations can't be performed on it? And if so, how?
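The category dtype does not prevent grouping or counting. A minimal sketch, assuming bin edges [0, 1, 2, 3, 125] and using the first ten rows of the snapshot: pd.crosstab with normalize="index" divides each count by its row (city) total. Rows whose bin is NaN are left out of the cross-tabulation, so these percentages are relative to the binned rows only.

```python
import pandas as pd

# First ten rows of the snapshot above
rows = [("King's Landing", 4), ("Astapor", 0), ("Astapor", 3), ("King's Landing", 9),
        ("Winterfell", 14), ("Winterfell", 2), ("Astapor", 1), ("Winterfell", 2),
        ("Winterfell", 2), ("Winterfell", 1)]
df = pd.DataFrame(rows, columns=["city", "trips_in_first_30_days"])
df["bins"] = pd.cut(df["trips_in_first_30_days"], bins=[0, 1, 2, 3, 125])  # assumed edges

# One row per city, one column per bin; each row is normalized to sum to 1
pct = pd.crosstab(df["city"], df["bins"], normalize="index") * 100
print(pct)
```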

EDIT:

After trying the suggested solution:

def calc_bin_percentage(group_df):
    bins_count = group_df.groupby("bins")["trips_in_first_30_days"].count()
    return 100 * bins_count / len(group_df)

new_df.groupby("city").apply(calc_bin_percentage)

The output is as follows:

bins (0, 1] (1, 2] (2, 3] (3, 125] 
city     
Astapor 31.105601 14.787710 6.973509 14.878432 
King's Landing 22.408687 14.471866 7.541955 20.710760 
Winterfell 28.689578 14.959719 8.017655 20.371957 

The percentages for each city should sum to 100, but they don't.
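A likely explanation, given the data shown: rows with 0 trips have NaN in bins, because pd.cut's intervals are right-closed and 0 is not in (0, 1]. groupby("bins") drops those rows from the counts, but len(group_df) still includes them, so each city's row sums to 100 minus its share of zero-trip riders (the Astapor row above adds up to about 67.7). The sketch below reproduces the same arithmetic without apply, on the first ten rows of the snapshot and assuming bin edges [0, 1, 2, 3, 125].

```python
import pandas as pd

# First ten rows of the snapshot above
rows = [("King's Landing", 4), ("Astapor", 0), ("Astapor", 3), ("King's Landing", 9),
        ("Winterfell", 14), ("Winterfell", 2), ("Astapor", 1), ("Winterfell", 2),
        ("Winterfell", 2), ("Winterfell", 1)]
df = pd.DataFrame(rows, columns=["city", "trips_in_first_30_days"])
df["bins"] = pd.cut(df["trips_in_first_30_days"], bins=[0, 1, 2, 3, 125])  # assumed edges

# Numerator: grouping drops rows whose bin is NaN (the riders with 0 trips)
binned = df.groupby(["city", "bins"], observed=True).size()
# Denominator: like len(group_df), this counts every row, NaN bins included
totals = df.groupby("city").size()
pct = 100 * binned.div(totals, level="city")

per_city_sum = pct.groupby(level="city").sum()
nan_share = 100 * df["bins"].isna().groupby(df["city"]).mean()
# Binned percentages plus the NaN share give 100 per city (up to float rounding)
print(per_city_sum + nan_share)
```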

Can you show us what your expected result looks like? – piRSquared

Hi, please check now. –

Answer

For this, keep in mind that the function passed to apply on a groupby object may return a pd.Series object (the pandas documentation calls this "flexible apply").
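A minimal illustration of flexible apply, using the first ten rows of the snapshot and assumed bin edges [0, 1, 2, 3, 125]. When every call returns a Series with the same labels, pandas combines them into a DataFrame with one row per group and one column per label. The function summarize and the column names riders and pct_without_bin are hypothetical, chosen only for this example.

```python
import pandas as pd

# First ten rows of the snapshot above
rows = [("King's Landing", 4), ("Astapor", 0), ("Astapor", 3), ("King's Landing", 9),
        ("Winterfell", 14), ("Winterfell", 2), ("Astapor", 1), ("Winterfell", 2),
        ("Winterfell", 2), ("Winterfell", 1)]
df = pd.DataFrame(rows, columns=["city", "trips_in_first_30_days"])
df["bins"] = pd.cut(df["trips_in_first_30_days"], bins=[0, 1, 2, 3, 125])  # assumed edges

def summarize(group_df):
    # Hypothetical helper: the Series labels become columns of the combined result
    return pd.Series({
        "riders": len(group_df),
        "pct_without_bin": 100 * group_df["bins"].isna().mean(),
    })

# Selecting columns explicitly keeps the grouping column out of group_df
summary = df.groupby("city")[["trips_in_first_30_days", "bins"]].apply(summarize)
print(summary)
```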

Try the following code:

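def calc_bin_percentage(group_df):
    # Total trips per bin within this city
    bins_count = group_df.groupby("bins")["trips_in_first_30_days"].sum()
    # Divide by this city's total trips (group_df.sum() would sum every column instead)
    return 100 * bins_count / group_df["trips_in_first_30_days"].sum()

df.groupby("city").apply(calc_bin_percentage).unstack().fillna(0)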

It works in two steps: first it splits the data by city, then, for each city, it computes the percentage for each bin.

The result should be a table with the cities as rows and the bins as columns.
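The same two steps (split by city, then compute each bin's share) can be sketched without apply. This is an alternative under stated assumptions, not the answer's method: it adds an extra edge at -1 so that riders with 0 trips land in a (-1, 0] bin instead of becoming NaN, which makes every city's row sum to 100. It uses the first ten rows of the snapshot.

```python
import pandas as pd

# First ten rows of the snapshot above
rows = [("King's Landing", 4), ("Astapor", 0), ("Astapor", 3), ("King's Landing", 9),
        ("Winterfell", 14), ("Winterfell", 2), ("Astapor", 1), ("Winterfell", 2),
        ("Winterfell", 2), ("Winterfell", 1)]
df = pd.DataFrame(rows, columns=["city", "trips_in_first_30_days"])

# Assumption: an extra (-1, 0] bin so zero-trip riders are counted rather than NaN
df["bins"] = pd.cut(df["trips_in_first_30_days"], bins=[-1, 0, 1, 2, 3, 125])

# Step 1: count riders per (city, bin)
counts = df.groupby(["city", "bins"], observed=True).size()
# Step 2: divide each count by its city's total number of riders
pct = 100 * counts / counts.groupby(level="city").transform("sum")

# Cities as rows, bins as columns; bins missing for a city become 0
table = pct.unstack(fill_value=0)
print(table)
```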

Hi, for some reason, when I try this on my data, the percentages for each group do not sum to 100. –

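This is probably due to integer division. I'm guessing you're using Python 2 (are all the result values integers?). Try multiplying by '100.0' instead of '100', which forces floating-point division. (In Python 3, float division is the default.) – tmrlvi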

I'm using Python 3, and the result values are floats. –