2016-03-03 345 views

Row values to column array in a pandas DataFrame

I have a pandas DataFrame that looks like this:

+---+--------+-------------+------------------+
|   | ItemID | Description | Feedback         |
+---+--------+-------------+------------------+
| 0 | 8988   | Tall Chair  | I hated it       |
+---+--------+-------------+------------------+
| 1 | 8988   | Tall Chair  | Best chair ever  |
+---+--------+-------------+------------------+
| 2 | 6547   | Big Pillow  | Soft and amazing |
+---+--------+-------------+------------------+
| 3 | 6547   | Big Pillow  | Horrific color   |
+---+--------+-------------+------------------+

For rows with the same ItemID, I'd like to concatenate the values from the "Feedback" column into a new, comma-separated column, like this:

+---+--------+-------------+----------------------------------+
|   | ItemID | Description | NewColumn                        |
+---+--------+-------------+----------------------------------+
| 0 | 8988   | Tall Chair  | I hated it, Best chair ever      |
+---+--------+-------------+----------------------------------+
| 1 | 6547   | Big Pillow  | Soft and amazing, Horrific color |
+---+--------+-------------+----------------------------------+

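I've tried several variations of pivot, merge, stack, etc., and I'm stuck.
I think NewColumn will end up being an array, but I'm fairly new to Python, so I'm not sure.
Eventually, I'm going to try to use this for text classification (generating some "Feedback" labels for a new "Description", a multiclass problem).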

Answer


I think you can use groupby on ItemID and Description, apply join to the grouped Feedback values, and finally call reset_index:

print(df.groupby(['ItemID', 'Description'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID  Description                         NewColumn
0    6547   Big Pillow  Soft and amazing, Horrific color
1    8988   Tall Chair       I hated it, Best chair ever
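For a fully reproducible run, here is a minimal sketch that rebuilds the sample frame from the question and applies the same groupby, join and reset_index chain. One assumed tweak: it passes sort=False to groupby, which keeps groups in order of first appearance (8988 before 6547) to match the desired output in the question; by default groupby sorts the keys, which is why 6547 comes first above.

```python
import pandas as pd

# Rebuild the sample frame shown in the question.
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever", "Soft and amazing", "Horrific color"],
})

# Group rows sharing ItemID and Description, join each group's Feedback
# strings with ", ", then turn the group keys back into ordinary columns.
# sort=False keeps groups in first-appearance order instead of sorting keys.
result = (
    df.groupby(["ItemID", "Description"], sort=False)["Feedback"]
      .apply(", ".join)
      .reset_index(name="NewColumn")
)
print(result)
```

Within each group the rows keep their original order, so "I hated it" still precedes "Best chair ever".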

If you don't need the Description column:

print(df.groupby(['ItemID'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID                         NewColumn
0    6547  Soft and amazing, Horrific color
1    8988       I hated it, Best chair ever
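The question expected NewColumn might end up as an array. If that is what you want, a possible variant (not part of the answer above) is to pass list instead of ', '.join, so each ItemID maps to a Python list of its feedback strings rather than one joined string. This is only a sketch on the sample data; whether lists or joined strings suit a later text-classification step depends on your preprocessing.

```python
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever", "Soft and amazing", "Horrific color"],
})

# apply(list) collects each group's Feedback values into a Python list
# instead of concatenating them; sort=False keeps first-appearance order.
lists = (
    df.groupby("ItemID", sort=False)["Feedback"]
      .apply(list)
      .reset_index(name="NewColumn")
)
print(lists)
```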