Pandas: using groupby when the column values are dictionaries

I have a dataframe:

category  dictionary
moto      {'motocycle':10, 'buy':8, 'motocompetition':7}
shopping  {'buy':200, 'order':20, 'sale':30}
IT        {'iphone':214, 'phone':1053, 'computer':809}
shopping  {'zara':23, 'sale':18, 'sell':20}
IT        {'lenovo':200, 'iphone':300, 'mac':200}

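I need to group by category, combine the dictionaries within each group (summing the values of shared keys), and select the 3 keys with the largest values. The result should be a dataframe where the category column holds the unique categories and the data column holds the list of those keys.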

I know I can use Counter to combine dictionaries, but I don't know how to do that per category. Desired output:

category  data
moto      ['motocycle', 'buy', 'motocompetition']
shopping  ['buy', 'sale', 'zara']
IT        ['phone', 'computer', 'iphone']

Asked 2016-10-12 by Petr Petrov

Can you give a working example? What have you tried so far? – JMat

If I only had dictionaries and no dataframe, I could use: a = {1: 2, 2: 5, 6: 9, u'cat': 2}; b = {1: 4, 4: 2, 6: 1, u'dog': 11, u'cat': 8}; c = {5: 2, 7: 1, u'dog': 19}; a = Counter(a); b = Counter(b); c = Counter(c); d = a + b + c; result = dict(d.most_common(3)); list = result.keys() @JMat –
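The Counter idea from the comment above can be written as a small runnable snippet. This is only a sketch: the helper name top_keys is made up for illustration, and it accumulates with Counter.update instead of adding Counters pairwise, which gives the same totals for positive values.

```python
from collections import Counter

def top_keys(dicts, n=3):
    """Sum several dicts key by key and return the n keys with the largest totals."""
    total = Counter()
    for d in dicts:
        total.update(d)  # update() adds to existing counts instead of replacing them
    return [key for key, _ in total.most_common(n)]

a = {1: 2, 2: 5, 6: 9, 'cat': 2}
b = {1: 4, 4: 2, 6: 1, 'dog': 11, 'cat': 8}
c = {5: 2, 7: 1, 'dog': 19}

keys = top_keys([a, b, c])
print(keys)  # 'dog' comes first (11 + 19 = 30); 6 and 'cat' tie at 10
```

Collecting the keys straight from most_common keeps them in descending order, whereas going through dict(...) and .keys() as in the comment does not guarantee that order on older Python versions.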

You can use nlargest and Index.tolist in a custom function with groupby:

import pandas as pd

df = pd.DataFrame({
    'category': ['moto', 'shopping', 'IT', 'shopping', 'IT'],
    'dictionary': [
        {'motocycle': 10, 'buy': 8, 'motocompetition': 7},
        {'buy': 200, 'order': 20, 'sale': 30},
        {'iphone': 214, 'phone': 1053, 'computer': 809},
        {'zara': 23, 'sale': 18, 'sell': 20},
        {'lenovo': 200, 'iphone': 300, 'mac': 200}]})

print(df)
   category                                         dictionary
0      moto  {'motocycle': 10, 'buy': 8, 'motocompetition': 7}
1  shopping              {'sale': 30, 'buy': 200, 'order': 20}
2        IT    {'phone': 1053, 'computer': 809, 'iphone': 214}
3  shopping               {'sell': 20, 'zara': 23, 'sale': 18}
4        IT         {'lenovo': 200, 'mac': 200, 'iphone': 300}


import collections
import functools
import operator

def f(x):
    # Sum the dicts of one group: turn each into a Counter and add them up
    # (one possible approach, see http://stackoverflow.com/a/3491086/2901002)
    counts = functools.reduce(operator.add, map(collections.Counter, x))
    # Keep the 3 keys with the largest summed values
    return pd.Series(counts).nlargest(3).index.tolist()

print(df.groupby('category')['dictionary'].apply(f).reset_index())
   category                         dictionary
0        IT          [phone, computer, iphone]
1      moto  [motocycle, buy, motocompetition]
2  shopping                  [buy, sale, zara]
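As a variation on the function above, the per-group summing can also be done with Counter.most_common alone, and the result column can be named data as in the question. This is a sketch under those assumptions, not part of the original answer; top_keys is a hypothetical helper name.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'category': ['moto', 'shopping', 'IT', 'shopping', 'IT'],
    'dictionary': [
        {'motocycle': 10, 'buy': 8, 'motocompetition': 7},
        {'buy': 200, 'order': 20, 'sale': 30},
        {'iphone': 214, 'phone': 1053, 'computer': 809},
        {'zara': 23, 'sale': 18, 'sell': 20},
        {'lenovo': 200, 'iphone': 300, 'mac': 200}]})

def top_keys(dicts, n=3):
    # Accumulate the values of every dict in the group, key by key
    total = Counter()
    for d in dicts:
        total.update(d)
    # most_common() returns (key, total) pairs sorted by total, descending
    return [key for key, _ in total.most_common(n)]

# reset_index(name='data') turns the grouped Series into a two-column frame
result = (df.groupby('category')['dictionary']
            .apply(top_keys)
            .reset_index(name='data'))
print(result)
```

This avoids building a pandas Series per group, at the cost of leaving tie-breaking to Counter.most_common rather than nlargest.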

Answered 2016-10-12 08:56:28 by jezrael

I sent an email to your 'gmail' address, please check. – jezrael

import pandas as pd

df = pd.DataFrame(dict(
    category=['moto', 'shopping', 'IT', 'shopping', 'IT'],
    dictionary=[
        dict(motorcycle=10, buy=8, motocompetition=7),
        dict(buy=200, order=20, sale=30),
        dict(iphone=214, phone=1053, computer=809),
        dict(zara=23, sale=18, sell=20),
        dict(lenovo=200, iphone=300, mac=200),
    ]))

def top3(x): 
    return x.dropna().sort_values().tail(3)[::-1].index.tolist() 

df.dictionary.apply(pd.Series).groupby(df.category).sum().apply(top3, axis=1) 

category
IT                   [phone, computer, iphone]
moto        [motorcycle, buy, motocompetition]
shopping                     [buy, sale, zara]
dtype: object
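The one-liner above works in three steps: expand each dictionary into columns, sum those columns per category, then pick the top 3 column labels in each row. The sketch below spells the steps out. It makes two substitutions that are assumptions rather than the original answer: the wide table is built with pd.DataFrame(list_of_dicts) instead of apply(pd.Series), and sum(min_count=1) keeps keys that never occur in a category as NaN so that nlargest skips them.

```python
import pandas as pd

df = pd.DataFrame(dict(
    category=['moto', 'shopping', 'IT', 'shopping', 'IT'],
    dictionary=[
        dict(motorcycle=10, buy=8, motocompetition=7),
        dict(buy=200, order=20, sale=30),
        dict(iphone=214, phone=1053, computer=809),
        dict(zara=23, sale=18, sell=20),
        dict(lenovo=200, iphone=300, mac=200),
    ]))

# Step 1: one column per dictionary key, NaN where a row lacks that key
wide = pd.DataFrame(df['dictionary'].tolist(), index=df.index)

# Step 2: sum each key per category; min_count=1 leaves all-NaN sums as NaN
summed = wide.groupby(df['category']).sum(min_count=1)

# Step 3: for every category (row), take the labels of the 3 largest totals
top3 = summed.apply(lambda row: row.nlargest(3).index.tolist(), axis=1)
print(top3)
```

Printing wide and summed separately is a quick way to see where the NaN values come from before the final selection.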

Answered 2016-10-12 08:57:27 by piRSquared
