2015-12-16 107 views

Groupby and pivot a pandas table

This should be quick, but none of the pivot/groupby approaches I've tried has produced what I need.

I have a table like this:

 Letter Period Amount 
YrMnth 
2014-12  B  6  0 
2014-12  C  8  1 
2014-12  C  9  2 
2014-12  C  10  3 
2014-12  C  6  4 
2014-12  C  12  5 
2014-12  C  7  6 
2014-12  C  11  7 
2014-12  D  9  8 
2014-12  D  10  9 
2014-12  D  1  10 
2014-12  D  8  11 
2014-12  D  6  12 
2014-12  D  12  13 
2014-12  D  7  14 
2014-12  D  11  15 
2014-12  D  4  16 
2014-12  D  3  17 
2015-01  B  7  18 
2015-01  B  8  19 
2015-01  B  1  20 
2015-01  B  10  21 
2015-01  B  11  22 
2015-01  B  6  23 
2015-01  B  9  24 
2015-01  B  3  25 
2015-01  B  5  26 
2015-01  C  10  27 

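I want to pivot it so that the index is essentially YrMnth and Letter, the Period values become the columns, and Amount supplies the values.

I understand pivoting in general, but I get an error when I try to do it with multiple index columns. I turned the index into a regular column and tried this: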

In [76]: df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period') 

But I got this error:

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-76-fc2a4c5f244d> in <module>() 
----> 1 df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period') 

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in pivot(self, index, columns, values) 
    3761   """ 
    3762   from pandas.core.reshape import pivot 
-> 3763   return pivot(self, index=index, columns=columns, values=values) 
    3764 
    3765  def stack(self, level=-1, dropna=True): 

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in pivot(self, index, columns, values) 
    331   indexed = Series(self[values].values, 
    332       index=MultiIndex.from_arrays([index, 
--> 333              self[columns]])) 
    334   return indexed.unstack(columns) 
    335 

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath) 
    225          raise_cast_failure=True) 
    226 
--> 227     data = SingleBlockManager(data, index, fastpath=True) 
    228 
    229   generic.NDFrame.__init__(self, data, fastpath=True) 

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath) 
    3734    block = make_block(block, 
    3735        placement=slice(0, len(axis)), 
-> 3736        ndim=1, fastpath=True) 
    3737 
    3738   self.blocks = [block] 

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath) 
    2452 
    2453  return klass(values, ndim=ndim, fastpath=fastpath, 
-> 2454     placement=placement) 
    2455 
    2456 

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath) 
    85    raise ValueError('Wrong number of items passed %d,' 
    86        ' placement implies %d' % (
---> 87         len(self.values), len(self.mgr_locs))) 
    88 
    89  @property 

ValueError: Wrong number of items passed 138, placement implies 2 

Well, the index is really the first two columns (YrMnth and Letter), so if you group that way there shouldn't be any duplicates. I just don't know the method – user1610719
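
The no-duplicates assumption in the comment above can be checked directly, and if it holds, one possible way to reshape without any aggregation is `set_index` followed by `unstack`. This is only a sketch: the DataFrame below is a hypothetical five-row subset of the question's table, and the names `keys`, `has_duplicates`, and `wide` are made up for illustration. As far as the pandas documentation states, `DataFrame.pivot` itself accepts a list for `index` from version 1.1.0 onward, whereas the traceback above comes from an older Python 2.7 install.

```python
import pandas as pd

# Hypothetical subset of the table from the question, with YrMnth as a column
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 2, 18, 27],
})

keys = ["YrMnth", "Letter", "Period"]

# True if any (YrMnth, Letter, Period) combination occurs more than once
has_duplicates = bool(df.duplicated(subset=keys).any())

# With unique keys, move all three into the index and spread Period
# across the columns; missing combinations become NaN
wide = df.set_index(keys)["Amount"].unstack("Period")
print(wide)
```

If `has_duplicates` were true, `unstack` would raise an error instead of silently combining rows, which makes this route a reasonable sanity check on the data.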

Answers


If I understand correctly, pivot_table may be closer to what you need:

df = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount") 

It gives you:

Period   1 3 4 5 6 7 8 9 10 11 12 
YrMnth Letter            
2014-12 B  NaN NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN 
     C  NaN NaN NaN NaN 4 6 1 2 3 7 5 
     D  10 17 16 NaN 12 14 11 8 9 15 13 
2015-01 B  20 25 NaN 26 23 18 19 24 21 22 NaN 
     C  NaN NaN NaN NaN NaN NaN NaN NaN 27 NaN NaN 
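
One thing worth keeping in mind with `pivot_table` is that it aggregates: the default `aggfunc` is the mean, so repeated index/column combinations are averaged rather than rejected. The sketch below illustrates that on hypothetical data, a few rows from the question plus one deliberately duplicated key that is not in the original table.

```python
import pandas as pd

# Hypothetical data: rows from the question plus one deliberately
# duplicated (YrMnth, Letter, Period) key to expose the aggregation
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "B", "C", "B", "C"],
    "Period": [6, 6, 8, 7, 10],
    "Amount": [0, 4, 1, 18, 27],
})

table = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")

# The two rows for ("2014-12", "B", 6) are averaged into a single cell
duplicate_cell = table.loc[("2014-12", "B"), 6]
print(table)
```

If the keys are known to be unique, as the asker expects, the mean of a single value is that value, so the result matches a plain pivot apart from the float dtype.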

As suggested in the comments:

df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount") 


Period   1 3 4 5 6 7 8 9 10 11 12 
YrMnth Letter            
2014-12 B  NaN NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN 
     C  NaN NaN NaN NaN 4 6 1 2 3 7 5 
     D  10 17 16 NaN 12 14 11 8 9 15 13 
2015-01 B  20 25 NaN 26 23 18 19 24 21 22 NaN 
     C  NaN NaN NaN NaN NaN NaN NaN NaN 27 NaN NaN 

also produces the same result. If anyone wants to explain how the former could fail, that would be great.
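
For what it's worth, both spellings are documented pandas entry points, and a quick comparison suggests they return identical results. A minimal sketch on a hypothetical subset of the question's data (the names `kwargs`, `via_method`, `via_function`, and `same` are illustrative only):

```python
import pandas as pd

# Hypothetical subset of the table from the question
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 2, 18, 27],
})

kwargs = dict(index=["YrMnth", "Letter"], columns="Period", values="Amount")

via_method = df.pivot_table(**kwargs)         # DataFrame method
via_function = pd.pivot_table(df, **kwargs)   # top-level function

# DataFrame.equals treats NaNs in matching positions as equal
same = via_method.equals(via_function)
print(same)
```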


The correct syntax for 'pivot_table' should be: 'df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")' –


@Fabio, what's the difference? –


The correct syntax is 'pandas.pivot_table()', not 'df.pivot_table()'. –