2014-02-25

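Join or merge values computed on a grouped pandas DataFrame

I have a DataFrame containing density values. I'd like to group by the 'hours' value, bin the densities, and then add a new column to my original df containing the bin number. However, this fails: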

import numpy as np
import pandas as pd
import pysal

df = pd.DataFrame({
    'hours': np.random.randint(0, 24, 10000),
    'density': np.random.sample(10000)})

def func(df):
    """Calculates equal intervals of a series or array."""
    intervals = pysal.esda.mapclassify.Equal_Interval(df.density, 5)
    # yb is an ndarray containing the bin indices, 0 - 4 in this case
    return intervals.yb

df['bins'] = df.groupby(df.hours).transform(func)

This gives: AssertionError: length of join_axes must not be equal to 0

If I just group the object and apply the interval function, it looks like this:

grp = df.groupby(df.hours).apply(func) 
grp 

Out[106]: 
hours 
0  [2, 4, 3, 4, 0, 4, 2, 2, 0, 1, 0, 0, 2, 2, 0, ... 
1  [4, 1, 0, 4, 0, 2, 2, 3, 2, 3, 0, 3, 4, 3, 2, ... 
2  [4, 1, 0, 2, 3, 4, 1, 1, 0, 3, 4, 4, 2, 4, 0, ... 
3  [3, 0, 0, 4, 0, 0, 0, 1, 2, 2, 0, 2, 2, 2, 1, ... 
4  [0, 1, 1, 2, 1, 3, 1, 3, 2, 2, 1, 4, 0, 4, 2, ... 
5  [2, 0, 2, 1, 3, 1, 1, 0, 4, 4, 2, 1, 4, 1, 2, ... 
6  [1, 2, 3, 3, 3, 2, 4, 1, 2, 1, 2, 0, 3, 2, 0, ... 
7  [3, 0, 3, 1, 3, 1, 2, 1, 4, 2, 1, 2, 1, 1, 1, ... 
8  [0, 1, 4, 3, 0, 1, 0, 0, 1, 0, 2, 1, 0, 1, 1, ... 
9  [4, 2, 0, 4, 1, 3, 2, 3, 4, 1, 1, 4, 4, 4, 4, ... 
10  [4, 4, 3, 3, 1, 2, 3, 0, 2, 4, 2, 4, 0, 2, 2, ... 
11  [0, 1, 3, 0, 1, 1, 1, 1, 2, 1, 2, 0, 3, 3, 4, ... 
12  [3, 1, 1, 0, 4, 4, 3, 0, 1, 2, 1, 1, 4, 2, 0, ... 
13  [1, 1, 0, 2, 0, 1, 4, 1, 2, 2, 3, 1, 2, 0, 3, ... 
14  [2, 4, 0, 2, 1, 2, 0, 4, 4, 2, 3, 4, 2, 1, 1, ... 
15  [2, 4, 3, 4, 1, 0, 3, 1, 2, 0, 3, 4, 2, 2, 3, ... 
16  [0, 4, 2, 3, 3, 4, 0, 3, 2, 0, 1, 0, 0, 2, 0, ... 
17  [3, 1, 4, 4, 0, 4, 1, 0, 4, 3, 3, 2, 3, 1, 4, ... 
18  [4, 3, 0, 2, 4, 2, 2, 0, 2, 2, 1, 2, 1, 0, 1, ... 
19  [3, 0, 3, 1, 1, 0, 1, 1, 3, 3, 2, 3, 4, 0, 0, ... 
20  [3, 0, 1, 4, 0, 0, 4, 2, 4, 2, 2, 0, 4, 0, 0, ... 
21  [4, 2, 3, 3, 1, 2, 0, 4, 2, 0, 2, 2, 1, 2, 2, ... 
22  [0, 4, 1, 1, 3, 1, 4, 1, 3, 4, 4, 0, 4, 4, 4, ... 
23  [4, 1, 2, 0, 2, 0, 0, 0, 2, 3, 1, 1, 3, 0, 1, ... 
dtype: object 

Is there a standard way to join or merge values computed from a grouped object, or should I be using transform differently?
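A minimal sketch of the "join" route, under two assumptions: pysal is not installed, so a hypothetical helper, equal_interval_bins, approximates Equal_Interval(...).yb with plain NumPy (its bin edges may differ slightly from pysal's), and each per-group result is returned as a Series carrying the group's original row labels rather than as a bare ndarray. Because the labels survive the groupby, the combined result can be joined back onto df by index.

```python
import numpy as np
import pandas as pd

def equal_interval_bins(s, k=5):
    """Hypothetical helper approximating pysal's Equal_Interval(s, k).yb:
    assign each value to one of k equal-width bins spanning [min, max]."""
    lo, hi = s.min(), s.max()
    width = (hi - lo) / k
    if width == 0:
        # All values in the group are identical: put them all in bin 0
        return pd.Series(0, index=s.index)
    codes = np.floor((s - lo) / width).astype(int)
    # The group maximum falls exactly on the top edge; fold it into the last bin
    return codes.clip(upper=k - 1)

df = pd.DataFrame({
    'hours': np.random.randint(0, 24, 10000),
    'density': np.random.sample(10000)})

# Each call returns a Series indexed by the group's original row labels;
# group_keys=False stops pandas from prepending the 'hours' key to that index
binned = df.groupby('hours', group_keys=False)['density'].apply(equal_interval_bins)

# Join the per-group results back onto the original frame by index
df = df.join(binned.rename('bins'))
```

Returning a like-indexed Series, rather than the ndarray from the question, is what makes the join possible: an ndarray loses the row labels, so pandas has nothing to align it on.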


I don't have 'pysal', but you should be able to return a 'pd.Series' and have better luck: 'return pd.Series(intervals.yb)'. – Justin


@Justin That gives me 'ValueError: could not broadcast input array from shape (431) into shape (431,2)' (431 is the number of values in the '0' group). – urschrei


Try doing the transform on the column, like this: df['bins'] = df.groupby(df.hours).density.transform(func) – user1827356

Answer


Try doing the transform on the column, like this:

df['bins'] = df.groupby(df.hours).density.transform(func) 

Note: func needs to be changed to receive a Series as its argument.
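A sketch of what that change might look like. It assumes pysal is unavailable and substitutes pd.cut with an integer bins argument, which builds equal-width bins over each group's range; treat it as a stand-in for Equal_Interval(s, 5).yb, not an exact match of pysal's bin edges. The key point from the answer is that func now takes the group's density values as a Series and returns something of the same length, which is what transform expects.

```python
import numpy as np
import pandas as pd

def func(s):
    """Receives one group's 'density' values as a Series and returns the
    bin index (0-4) of each value, aligned to that Series' index.

    pd.cut with an integer `bins` builds equal-width bins over the group's
    range; it stands in here for pysal's Equal_Interval(s, 5).yb.
    """
    return pd.cut(s, bins=5, labels=False)

df = pd.DataFrame({
    'hours': np.random.randint(0, 24, 10000),
    'density': np.random.sample(10000)})

# Selecting the column first means func is called once per group with a Series,
# and transform returns a result with the same length and index as df
df['bins'] = df.groupby(df.hours).density.transform(func)
```

With pysal installed, the body of func could presumably wrap the original call instead, for example returning pd.Series(Equal_Interval(s, 5).yb, index=s.index), though that variant is untested here.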