Row values to column array in pandas DataFrame

I have a pandas DataFrame that looks like this:

+---+--------+-------------+------------------+
|   | ItemID | Description | Feedback         |
+---+--------+-------------+------------------+
| 0 | 8988   | Tall Chair  | I hated it       |
+---+--------+-------------+------------------+
| 1 | 8988   | Tall Chair  | Best chair ever  |
+---+--------+-------------+------------------+
| 2 | 6547   | Big Pillow  | Soft and amazing |
+---+--------+-------------+------------------+
| 3 | 6547   | Big Pillow  | Horrific color   |
+---+--------+-------------+------------------+

I want to concatenate the values in the "Feedback" column into a new column, separated by commas, wherever the ItemID matches. Like this:

+---+--------+-------------+----------------------------------+
|   | ItemID | Description | NewColumn                        |
+---+--------+-------------+----------------------------------+
| 0 | 8988   | Tall Chair  | I hated it, Best chair ever      |
+---+--------+-------------+----------------------------------+
| 1 | 6547   | Big Pillow  | Soft and amazing, Horrific color |
+---+--------+-------------+----------------------------------+

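I've tried several variations of pivot, merge, stack, etc., and I'm stuck.
I think NewColumn will end up being an array, but I'm fairly new to Python, so I'm not sure.
Also, eventually I'm going to try to use this for text classification (given a new "Description", generate some "Feedback" labels [a multiclass problem]).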

Source

2016-03-03 nacc

Call .groupby('ItemID') on your DataFrame, then join the Feedback column:

df.groupby('ItemID')['Feedback'].apply(lambda x: ', '.join(x))

See "Pandas groupby: How to get a union of strings".
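
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the variable name `joined` is only illustrative. Note that the result is a Series indexed by ItemID, not a DataFrame, and that groupby sorts the keys by default.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever",
                 "Soft and amazing", "Horrific color"],
})

# Group rows by ItemID and join each group's Feedback strings with ", ".
# The result is a Series whose index is ItemID (sorted ascending by default).
joined = df.groupby("ItemID")["Feedback"].apply(", ".join)

print(joined)
```

Passing `', '.join` directly behaves the same as the lambda above, since apply hands each group's Feedback values to the function as a Series of strings.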

Source

2016-03-03 14:38:15 dbc

I think you can groupby the columns ItemID and Description, apply join, and finally reset_index:

print(df.groupby(['ItemID', 'Description'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID Description                         NewColumn
0    6547  Big Pillow  Soft and amazing, Horrific color
1    8988  Tall Chair       I hated it, Best chair ever

If you don't need the Description column:

print(df.groupby(['ItemID'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID                         NewColumn
0    6547  Soft and amazing, Horrific color
1    8988       I hated it, Best chair ever
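
The question also mentions that NewColumn might end up as an array. As a rough sketch of that variant (not something either answer proposes), the same groupby can collect the feedback into Python lists instead of one joined string. The sample DataFrame is rebuilt from the question, `as_lists` is a hypothetical name, and `sort=False` is an optional choice here to keep the groups in their original order of appearance.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever",
                 "Soft and amazing", "Horrific color"],
})

# Collect each group's Feedback values into a list rather than a string.
# sort=False keeps groups in order of first appearance (8988, then 6547).
as_lists = (
    df.groupby(["ItemID", "Description"], sort=False)["Feedback"]
      .agg(list)
      .reset_index(name="NewColumn")
)

print(as_lists)
```

A list per row may be easier to feed into a later text-classification step than a comma-joined string, because feedback that itself contains commas would not need to be split apart again.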

Source

2016-03-03 14:44:47 jezrael
