2016-10-12 28 views
2

Pandas: using groupby when the column values are dictionaries

I have a DataFrame:

category dictionary 
moto {'motocycle':10, 'buy':8, 'motocompetition':7}
shopping {'buy':200, 'order':20, 'sale':30} 
IT {'iphone':214, 'phone':1053, 'computer':809} 
shopping {'zara':23, 'sale':18, 'sell':20} 
IT {'lenovo':200, 'iphone':300, 'mac':200} 

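I need to group by category, combine the dictionaries within each group (summing the values of repeated keys), and select the 3 keys with the largest values. The result should be a DataFrame in which the column category holds the unique categories and the column data holds the list of those keys.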

I know I can use Counter to combine dictionaries, but I don't know how to do that per category. Desired output:

category data 
moto ['motocycle', 'buy', 'motocompetition'] 
shopping ['buy', 'sale', 'zara'] 
IT ['phone', 'computer', 'iphone'] 
+0

Can you give a working example? What have you tried so far? – JMat

+0

If I only had the dictionaries, without a DataFrame, I could use 'a = {1:2, 2:5, 6:9, u'cat':2}; b = {1:4, 4:2, 6:1, u'dog':11, u'cat':8}; c = {5:2, 7:1, u'dog':19}; a = Counter(a); b = Counter(b); c = Counter(c); d = a + b + c; result = dict(d.most_common(3)); list = result.keys()' @JMat –
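The one-line snippet in the comment above is hard to read, so here it is laid out as a small runnable sketch. It uses the same three dictionaries; the names total and top_keys are illustrative only, and the keys are taken directly from most_common instead of going through an intermediate dict.

```python
from collections import Counter

a = {1: 2, 2: 5, 6: 9, 'cat': 2}
b = {1: 4, 4: 2, 6: 1, 'dog': 11, 'cat': 8}
c = {5: 2, 7: 1, 'dog': 19}

# Adding Counters sums the values of matching keys.
total = Counter(a) + Counter(b) + Counter(c)

# most_common(3) returns the 3 (key, value) pairs with the largest values;
# keep only the keys.
top_keys = [key for key, _ in total.most_common(3)]
print(top_keys)
```

Note that adding Counters discards keys whose summed value is zero or negative, which does not matter for the positive counts used here.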

Answers

3

You can use groupby with a custom function that calls nlargest and Index.tolist:

import pandas as pd

df = pd.DataFrame({
'category':['moto','shopping','IT','shopping','IT'], 
'dictionary': 
[{'motocycle':10, 'buy':8, 'motocompetition':7}, 
{'buy':200, 'order':20, 'sale':30}, 
{'iphone':214, 'phone':1053, 'computer':809}, 
{'zara':23, 'sale':18, 'sell':20}, 
{'lenovo':200, 'iphone':300, 'mac':200}]}) 

print (df) 
   category                                         dictionary
0      moto  {'motocycle': 10, 'buy': 8, 'motocompetition': 7}
1  shopping              {'sale': 30, 'buy': 200, 'order': 20}
2        IT    {'phone': 1053, 'computer': 809, 'iphone': 214}
3  shopping               {'sell': 20, 'zara': 23, 'sale': 18}
4        IT         {'lenovo': 200, 'mac': 200, 'iphone': 300}


import collections
import functools
import operator

def f(x):
    # sum the values of the dicts in the group by adding Counters
    # http://stackoverflow.com/a/3491086/2901002
    summed = functools.reduce(operator.add, map(collections.Counter, x))
    # keep the 3 keys with the largest summed values, as a list
    return pd.Series(summed).nlargest(3).index.tolist()

print (df.groupby('category')['dictionary'].apply(f).reset_index()) 
   category                         dictionary
0        IT          [phone, computer, iphone]
1      moto  [motocycle, buy, motocompetition]
2  shopping                  [buy, sale, zara]
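The result above keeps the column name dictionary, while the question asks for a column called data. A possible variant, sketched here under the assumption that plain Counter.most_common is acceptable in place of nlargest, is shown below; top_keys and result are hypothetical names that do not appear in the answer.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'category': ['moto', 'shopping', 'IT', 'shopping', 'IT'],
    'dictionary': [
        {'motocycle': 10, 'buy': 8, 'motocompetition': 7},
        {'buy': 200, 'order': 20, 'sale': 30},
        {'iphone': 214, 'phone': 1053, 'computer': 809},
        {'zara': 23, 'sale': 18, 'sell': 20},
        {'lenovo': 200, 'iphone': 300, 'mac': 200},
    ],
})

def top_keys(dicts, n=3):
    # Hypothetical helper: sum all dicts of one group, return the n top keys.
    total = Counter()
    for d in dicts:
        total.update(d)  # Counter.update adds counts instead of replacing them
    return [key for key, _ in total.most_common(n)]

# reset_index(name='data') names the column of key lists 'data'.
result = (df.groupby('category')['dictionary']
            .apply(top_keys)
            .reset_index(name='data'))
print(result)
```

Passing n as a parameter keeps the "top 3" choice in one place if a different number of keys is needed later.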
+0

I sent an email to your 'gmail' address, please check it. – jezrael

1
df = pd.DataFrame(dict(category=['moto', 'shopping', 'IT', 'shopping', 'IT'], 
         dictionary=[ 
          dict(motorcycle=10, buy=8, motocompetition=7), 
          dict(buy=200, order=20, sale=30), 
          dict(iphone=214, phone=1053, computer=809), 
          dict(zara=23, sale=18, sell=20), 
          dict(lenovo=200, iphone=300, mac=200), 
         ])) 

def top3(x): 
    return x.dropna().sort_values().tail(3)[::-1].index.tolist() 

df.dictionary.apply(pd.Series).groupby(df.category).sum().apply(top3, axis=1) 

category
IT                   [phone, computer, iphone]
moto        [motorcycle, buy, motocompetition]
shopping                     [buy, sale, zara]
dtype: object