2017-08-25

Convert DataFrame to tuples

I have a DataFrame that looks like the following:

                                       user                             item  rating
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson       1
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia       2
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West       1
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson       1
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters       1

I want to turn it into a dictionary of lists with the following structure:

dict -> list of tuples
user -> (item, rating)

b80344d063b5ccb3212f76538f3d9e43d87dca9e -> list((The Cove - Jack Johnson, 1), ... ,)
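To make the target structure concrete, here is a minimal sketch (not from the original post) that builds it in a single pass with collections.defaultdict and DataFrame.itertuples. The DataFrame is an assumption: it simply re-enters the five sample rows shown above.

```python
from collections import defaultdict

import pandas as pd

# Assumed sample data: the five rows from the question, typed in by hand.
user_id = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"
df = pd.DataFrame({
    "user": [user_id] * 5,
    "item": [
        "The Cove - Jack Johnson",
        "Entre Dos Aguas - Paco De Lucia",
        "Stronger - Kanye West",
        "Constellations - Jack Johnson",
        "Learn To Fly - Foo Fighters",
    ],
    "rating": [1, 2, 1, 1, 1],
})

# user -> list of (item, rating) tuples; defaultdict creates each list on first use.
user_items = defaultdict(list)
for row in df.itertuples(index=False):
    user_items[row.user].append((row.item, row.rating))

result = dict(user_items)  # convert back to a plain dict
print(result)
```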

I can do this:

item_set = dict((user, set(items)) for user, items in
                data.groupby('user')['item'])

But this only gets me halfway. How do I get the corresponding 'rating' values from the groupby?
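A minimal sketch of one way to extend the comprehension above, assuming the data sit in a DataFrame named data with the columns shown: group the whole DataFrame rather than only the 'item' column, then zip each group's item and rating columns into tuples. This is an illustration, not the answer given in the thread, and the sample rows are re-typed from the question.

```python
import pandas as pd

# Assumed sample data: the question's five rows.
user_id = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"
data = pd.DataFrame({
    "user": [user_id] * 5,
    "item": [
        "The Cove - Jack Johnson",
        "Entre Dos Aguas - Paco De Lucia",
        "Stronger - Kanye West",
        "Constellations - Jack Johnson",
        "Learn To Fly - Foo Fighters",
    ],
    "rating": [1, 2, 1, 1, 1],
})

# Iterating over data.groupby('user') yields (user, sub-DataFrame) pairs, so
# both columns are available and can be paired row by row with zip.
item_ratings = {
    user: list(zip(group["item"], group["rating"]))
    for user, group in data.groupby("user")
}
print(item_ratings)
```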

Answer


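Set user as the index, convert each row to a tuple with df.apply, group by the index with groupby(level=0), collect each group into a list with GroupBy.agg, and convert the result to a dictionary with to_dict: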

In [1417]: df
Out[1417]:
                                       user                             item  rating
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson       1
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia       2
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West       2
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson       2
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters       2

In [1418]: (df.set_index('user')
      ...:    .apply(tuple, axis=1)
      ...:    .groupby(level=0)
      ...:    .agg(lambda x: list(x.values))
      ...:    .to_dict())
Out[1418]: 
{'b80344d063b5ccb3212f76538f3d9e43d87dca9e': [('The Cove - Jack Johnson', 1), 
    ('Entre Dos Aguas - Paco De Lucia', 2), 
    ('Stronger - Kanye West', 2), 
    ('Constellations - Jack Johnson', 2), 
    ('Learn To Fly - Foo Fighters', 2)]} 
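A self-contained version of the pipeline above, runnable as a plain script. It rebuilds the answer's sample df by hand (an assumption based on the printed output, with ratings 1, 2, 2, 2, 2) and swaps the answer's lambda for agg(list), which likewise collects each group's tuples into a list.

```python
import pandas as pd

# Assumed sample data: re-typed from the answer's printed df.
user_id = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"
df = pd.DataFrame({
    "user": [user_id] * 5,
    "item": [
        "The Cove - Jack Johnson",
        "Entre Dos Aguas - Paco De Lucia",
        "Stronger - Kanye West",
        "Constellations - Jack Johnson",
        "Learn To Fly - Foo Fighters",
    ],
    "rating": [1, 2, 2, 2, 2],
})

result = (
    df.set_index("user")       # user becomes the row index
      .apply(tuple, axis=1)    # each remaining row -> one (item, rating) tuple
      .groupby(level=0)        # group the Series of tuples by user
      .agg(list)               # gather each user's tuples into a list
      .to_dict()               # {user: [(item, rating), ...]}
)
print(result)
```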

Exactly what I wanted. Thanks!


@OktayGardener No problem. In a few minutes, if you like, you can [mark my answer as accepted](https://stackoverflow.com/help/someone-answers).