2013-01-13

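Pandas groupby example error

I am trying to reproduce an example from Wes McKinney's book on pandas. The code is below (it assumes that all the names data files are in a folder called names):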

# -*- coding: utf-8 -*- 
import numpy as np 
import pandas as pd 

years = range(1880, 2011) 
pieces = [] 
columns = ['name', 'sex', 'births'] 
for year in years: 
    path = 'names/yob%d.txt' % year 
    frame = pd.read_csv(path, names=columns) 
    frame['year'] = year 
    pieces.append(frame) 

names = pd.concat(pieces, ignore_index=True) 
names 

def get_tops(group):  
    return group.sort_index(by='births', ascending=False)[:1000] 

grouped = names.groupby(['year','sex']) 
grouped.apply(get_tops) 

I am using pandas 0.10 with Python 2.7. The error I see is this:

Traceback (most recent call last): 
    File "names.py", line 21, in <module> 
    grouped.apply(get_tops) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 321, in apply 
    return self._python_apply_general(f) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 324, in _python_apply_general 
    keys, values, mutated = self.grouper.apply(f, self.obj, self.axis) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 585, in apply 
    values, mutated = splitter.fast_apply(f, group_keys) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 2127, in fast_apply 
    results, mutated = lib.apply_frame_axis0(sdata, f, names, starts, ends) 
    File "reduce.pyx", line 421, in pandas.lib.apply_frame_axis0 (pandas/lib.c:24934) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2028, in __setattr__ 
    self[name] = value 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2043, in __setitem__ 
    self._set_item(key, value) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2078, in _set_item 
    value = self._sanitize_column(key, value) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2112, in _sanitize_column 
    raise AssertionError('Length of values does not match ' 
AssertionError: Length of values does not match length of index 

Any ideas?

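Sorry about this: I'm very annoyed with myself for introducing this bug in 0.10. It has been fixed in the git repo, and I will add "test all the book code" to the pandas release process.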

Answers

I think this is a bug introduced in 0.10, namely issue #2605, "AssertionError when using apply after GroupBy". It has since been fixed.

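You can wait for the 0.10.1 release, which shouldn't be too long from now, or you can upgrade to the development version (either via git or simply by downloading a zip of master).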
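
For anyone reading this on a newer pandas, here is a minimal sketch of the same top-N-per-group step. It rests on a few assumptions that are not in the question: the small `names` frame is a hand-written stand-in for the concatenated yob files, `n` is a hypothetical parameter added so the tiny sample shows something, and `sort_values` is the current method for sorting by a column (on 0.10 you would keep `sort_index(by=...)` as in the question). It iterates over the groups and concatenates the pieces instead of calling `apply`, which is the call the traceback goes through; I have not tested whether this avoids the error on 0.10 itself.

```python
import pandas as pd

# Hand-written stand-in for the concatenated names/yob*.txt data (illustrative only).
names = pd.DataFrame({
    'name':   ['Mary', 'Anna', 'Emma', 'John', 'William', 'James'],
    'sex':    ['F', 'F', 'F', 'M', 'M', 'M'],
    'births': [7065, 2604, 2003, 9655, 9532, 5927],
    'year':   [1880] * 6,
})

def get_tops(group, n=2):
    # n is a hypothetical parameter; the question hard-codes 1000.
    # sort_values is the current way to sort by a column.
    return group.sort_values(by='births', ascending=False)[:n]

# Iterate over the (year, sex) groups and stitch the per-group results together,
# rather than going through grouped.apply(get_tops).
tops = pd.concat(
    [get_tops(group) for _, group in names.groupby(['year', 'sex'])],
    ignore_index=True,
)
print(tops)
```

With the real data you would call `get_tops(group, n=1000)` to match the book's example.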

Is there any workaround for 0.10? In some cases I can get `apply` to work after `groupby`, and in other cases I can't. – smci

Actually, this still happens in 0.10.1 - I am using 0.10.1. But the issue is marked as closed. Strange. – smci

...and fixed in 0.11 – smci