2015-04-28 38 views

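Creating new rows and columns based on row data in pandas

I have a DataFrame that contains a row for each user who joined my site and made a purchase.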

+---+-----+--------------------+---------+--------+-----+
|   | uid | msg                | _time   | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 | 1   | confirmed_settings | 1/29/15 | M      | 37  |
| 1 | 1   | sale               | 4/13/15 | M      | 37  |
| 2 | 3   | confirmed_settings | 4/19/15 | M      | 35  |
| 3 | 4   | confirmed_settings | 2/21/15 | M      | 21  |
| 4 | 5   | confirmed_settings | 3/28/15 | M      | 18  |
| 5 | 4   | sale               | 3/15/15 | M      | 21  |
+---+-----+--------------------+---------+--------+-----+

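I want to reshape the DataFrame so that each row is a unique uid, with columns named sale and confirmed_settings that hold the timestamps of those actions. Note that not every user has a sale, but every user has a confirmed_settings. The result should look like this: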

+---+-----+--------------------+---------+---------+--------+-----+
|   | uid | confirmed_settings | sale    | _time   | gender | age |
+---+-----+--------------------+---------+---------+--------+-----+
| 0 | 1   | 1/29/15            | 4/13/15 | 1/29/15 | M      | 37  |
| 1 | 3   | 4/19/15            | null    | 4/19/15 | M      | 35  |
| 2 | 4   | 2/21/15            | 3/15/15 | 2/21/15 | M      | 21  |
| 3 | 5   | 3/28/15            | null    | 3/28/15 | M      | 18  |
+---+-----+--------------------+---------+---------+--------+-----+

What is the best pandas idiom or function to accomplish this?

Answer

I'm not sure whether this is the most efficient solution, but it should work:

In [1]: df
Out[1]:
   uid                 msg    _time gender  age
0    1  confirmed_settings  1/29/15      M   37
1    1                sale  4/13/15      M   37
2    3  confirmed_settings  4/19/15      M   35
3    4  confirmed_settings  2/21/15      M   21
4    5  confirmed_settings  3/28/15      M   18
5    4                sale  3/15/15      M   21

In [2]: df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index() 
In [3]: df1 = df1.merge(df[['uid', 'gender', 'age']].drop_duplicates(), on='uid') 

In [4]: df1
Out[4]:
   uid confirmed_settings     sale gender  age
0    1            1/29/15  4/13/15      M   37
2    3            4/19/15      NaN      M   35
3    4            2/21/15  3/15/15      M   21
5    5            3/28/15      NaN      M   18
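
For anyone who wants to run this outside an IPython session, here is a self-contained sketch of the same pivot-then-merge approach. The DataFrame is retyped by hand from the question's sample data, and the names `times`, `users`, and `result` are only illustrative.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "uid": [1, 1, 3, 4, 5, 4],
    "msg": ["confirmed_settings", "sale", "confirmed_settings",
            "confirmed_settings", "confirmed_settings", "sale"],
    "_time": ["1/29/15", "4/13/15", "4/19/15", "2/21/15", "3/28/15", "3/15/15"],
    "gender": ["M"] * 6,
    "age": [37, 37, 35, 21, 18, 21],
})

# One row per uid, one column per msg value; each cell holds the timestamp.
times = df.pivot(index="uid", columns="msg", values="_time").reset_index()
times.columns.name = None  # drop the leftover "msg" label on the column axis

# Re-attach the per-user attributes, reduced to one row per uid.
users = df[["uid", "gender", "age"]].drop_duplicates()
result = times.merge(users, on="uid")
print(result)
```

Users without a sale row end up with NaN in the `sale` column, which matches the `null` cells in the desired output.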

I'm actually getting this error from `df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index()`: `ValueError: Index contains duplicate entries, cannot reshape` – metersk
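
That ValueError is what `pivot` raises when some (uid, msg) pair appears more than once, so the full data set probably contains a user with repeated `confirmed_settings` or `sale` rows, which the six-row sample does not. Below is a rough sketch of one possible workaround. It assumes that keeping the earliest timestamp per pair is acceptable, which is my assumption and not something stated in the question, and the duplicated `sale` row is made up for illustration.

```python
import pandas as pd

# Hypothetical data: uid 1 has two "sale" rows, which would make df.pivot fail.
df = pd.DataFrame({
    "uid": [1, 1, 1, 3],
    "msg": ["confirmed_settings", "sale", "sale", "confirmed_settings"],
    "_time": ["1/29/15", "4/13/15", "4/20/15", "4/19/15"],
    "gender": ["M"] * 4,
    "age": [37, 37, 37, 35],
})

# Show which (uid, msg) pairs occur more than once.
dupes = df[df.duplicated(subset=["uid", "msg"], keep=False)]
print(dupes)

# Parse the dates so that "earliest" is chronological rather than alphabetical.
df["_time"] = pd.to_datetime(df["_time"], format="%m/%d/%y")

# Keep the earliest timestamp per (uid, msg), then reshape to one row per uid.
times = (
    df.groupby(["uid", "msg"])["_time"].min()
      .unstack("msg")
      .reset_index()
)
times.columns.name = None

# Re-attach the per-user attributes as in the answer.
result = times.merge(df[["uid", "gender", "age"]].drop_duplicates(), on="uid")
print(result)
```

Inspecting `dupes` first is worthwhile, since the duplicates may point to a data problem rather than something to aggregate away.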

This works fine for me. What is your pandas version (`pd.__version__`)? – Zero

My version is `0.15.2` – metersk
