2017-03-16 91 views
Python: stacked to unstacked format, also known as long to wide format

I have the following:

ID1 ID2 POS1 POS2 TYPE    TYPEVAL
--- --- ---- ---- ------- -------
A   001 1    5    COLOR   RED
A   001 1    5    WEIGHT  50KG
A   001 1    5    HEIGHT  160CM
A   002 6    19   FUTURE  YES
A   002 6    19   PRESENT NO
B   001 26   34   COLOUR  BLUE
B   001 26   34   WEIGHT  85KG
B   001 26   34   HEIGHT  120CM
C   001 10   13   MOBILE  NOKIA
C   001 10   13   TABLET  ASUS

and I would like to cast the TYPE column into one new column per unique value, i.e.:

ID1 ID2 POS1 POS2 COLOR WEIGHT HEIGHT FUTURE PRESENT MOBILE TABLET
A   001 1    5    RED   50KG   160CM  NA     NA      NA     NA
A   002 6    19   NA    NA     NA     YES    NO      NA     NA
B   001 26   34   BLUE  85KG   120CM  NA     NA      NA     NA
C   001 10   13   NA    NA     NA     NA     NA      NOKIA  ASUS

I have tried to do this with the following:

pd.pivot_table(df, index=["ID1","ID2"], columns=["BEGIN","END","TYPE"], values=["TYPEVAL"])

But I get:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/usr/local/lib/python2.7/dist-packages/pandas/tools/pivot.py", line 127, in pivot_table 
    agged = grouped.agg(aggfunc) 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 3690, in aggregate 
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 3179, in aggregate 
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/base.py", line 432, in _aggregate 
    return getattr(self, arg)(*args, **kwargs), None 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1009, in mean 
    return self._cython_agg_general('mean') 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 3113, in _cython_agg_general 
    how, numeric_only=numeric_only) 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 3159, in _cython_agg_blocks 
    raise DataError('No numeric types to aggregate') 

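which prompts me to pass some numeric function (i.e. mean or sum). However, I do not want to do anything like that; I just want to transpose the TYPE column without any aggregation.

Any suggestions would be much appreciated!

Answers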

You can set the index to every column except 'TYPEVAL' and then unstack:

df.set_index(
    df.columns.difference(['TYPEVAL']).tolist() 
).TYPEVAL.unstack('TYPE').reset_index().rename_axis(None, axis=1) 
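
As a minimal, self-contained sketch of this approach, the snippet below rebuilds the question's table and applies the same set_index / unstack chain. It assumes ID2 is stored as a string (to keep the leading zeros); the names `rows`, `key_cols` and `wide` are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data copied from the question. Assumption: ID2 is a string column.
rows = [
    ("A", "001", 1, 5, "COLOR", "RED"),
    ("A", "001", 1, 5, "WEIGHT", "50KG"),
    ("A", "001", 1, 5, "HEIGHT", "160CM"),
    ("A", "002", 6, 19, "FUTURE", "YES"),
    ("A", "002", 6, 19, "PRESENT", "NO"),
    ("B", "001", 26, 34, "COLOUR", "BLUE"),
    ("B", "001", 26, 34, "WEIGHT", "85KG"),
    ("B", "001", 26, 34, "HEIGHT", "120CM"),
    ("C", "001", 10, 13, "MOBILE", "NOKIA"),
    ("C", "001", 10, 13, "TABLET", "ASUS"),
]
df = pd.DataFrame(rows, columns=["ID1", "ID2", "POS1", "POS2", "TYPE", "TYPEVAL"])

# Every column except TYPEVAL becomes an index level (TYPE included).
key_cols = df.columns.difference(["TYPEVAL"]).tolist()

wide = (
    df.set_index(key_cols)["TYPEVAL"]  # Series of values with a 5-level index
      .unstack("TYPE")                 # move the TYPE level into the columns
      .reset_index()                   # turn the remaining levels back into columns
      .rename_axis(None, axis=1)       # drop the leftover "TYPE" columns name
)
print(wide)
```

No aggregation happens here, so each combination of ID1, ID2, POS1, POS2 and TYPE has to be unique; with duplicates, unstack raises a ValueError. Note also that COLOR and COLOUR are different strings in the sample data, so they end up in two separate columns.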

Thanks, this works. – brucezepplin

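I think you need pivot_table with the aggregate function first, or join or sum if there are multiple values, because the default aggregate function is mean, which only works with numeric data: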

df1 = (pd.pivot_table(df,
                      index=["ID1", "ID2", "POS1", "POS2"],
                      columns="TYPE",
                      values="TYPEVAL",
                      aggfunc='first')
         .reset_index()
         .rename_axis(None, axis=1))

print(df1)
   ID1  ID2  POS1  POS2  COLOR  COLOUR  FUTURE  HEIGHT  MOBILE  PRESENT  TABLET  WEIGHT
0    A    1     1     5    RED    None    None   160CM    None     None    None    50KG
1    A    2     6    19   None    None     YES    None    None       NO    None    None
2    B    1    26    34   None    BLUE    None   120CM    None     None    None    85KG
3    C    1    10    13   None    None    None    None   NOKIA     None    ASUS    None

df1 = (pd.pivot_table(df,
                      index=["ID1", "ID2", "POS1", "POS2"],
                      columns="TYPE",
                      values="TYPEVAL",
                      aggfunc=','.join)
         .reset_index()
         .rename_axis(None, axis=1))

print(df1)
   ID1  ID2  POS1  POS2  COLOR  COLOUR  FUTURE  HEIGHT  MOBILE  PRESENT  TABLET  WEIGHT
0    A    1     1     5    RED    None    None   160CM    None     None    None    50KG
1    A    2     6    19   None    None     YES    None    None       NO    None    None
2    B    1    26    34   None    BLUE    None   120CM    None     None    None    85KG
3    C    1    10    13   None    None    None    None   NOKIA     None    ASUS    None
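
Both calls print the same table because the sample data has no repeated TYPE within a group. The sketch below shows where the two aggregate functions differ. It adds one hypothetical duplicate row (a second COLOR value, not in the question), and the helper `widen` is an illustrative name only.

```python
import pandas as pd

rows = [
    ("A", "001", 1, 5, "COLOR", "RED"),
    ("A", "001", 1, 5, "COLOR", "GREEN"),  # hypothetical duplicate, not in the question
    ("A", "001", 1, 5, "WEIGHT", "50KG"),
    ("B", "001", 26, 34, "COLOUR", "BLUE"),
]
df = pd.DataFrame(rows, columns=["ID1", "ID2", "POS1", "POS2", "TYPE", "TYPEVAL"])
keys = ["ID1", "ID2", "POS1", "POS2"]


def widen(frame, aggfunc):
    """Pivot TYPE into columns, resolving duplicates with the given aggfunc."""
    return (
        pd.pivot_table(frame, index=keys, columns="TYPE",
                       values="TYPEVAL", aggfunc=aggfunc)
          .reset_index()
          .rename_axis(None, axis=1)
    )


first = widen(df, "first")    # keeps only the first value seen per cell
joined = widen(df, ",".join)  # concatenates all values per cell

print(first.loc[0, "COLOR"])   # RED
print(joined.loc[0, "COLOR"])  # RED,GREEN
```

With 'first', the extra value is silently dropped, whereas ','.join keeps every value in the cell, so the choice depends on whether duplicates are expected in the real data.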