2017-05-03 44 views
0

I'm using this code to read data from a JSON file. How can I count the words in the resulting list?

import json

json_file = 'report.json'

json_data=open(json_file) 
data = json.load(json_data) 

t0 = [] 
t1 = [] 
tn = [] 

#counts = Counter(data['behavior']['processes'][3]['calls']) 
print (type(data['behavior']['processes'][3]['calls'])) 

for i in data['behavior']['processes'][3]['calls']: 

    t0 = i['arguments'] 
    print(t0) 

json_data.close() 

It prints data like this:

<class 'list'> 
aa 
bb 
aa 
cc 
bb 
cc 
aa 

I want to count the frequency of each word. The result should be aa = 3, bb = 2, cc = 2.

If I uncomment the Counter(data['behavior']['processes'][3]['calls']) line, it raises an error:

TypeError: unhashable type: 'dict' 

How can I count the words in the list?

Can you show us your sample data?

Answers

0

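Counter needs a list (or any iterable) of hashable items as input. Each element of your 'calls' list is a dict, which is not hashable, so extract the value you want to count first.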

from collections import Counter 

#create a list from your data 
mylist = [i['arguments'] for i in data['behavior']['processes'][3]['calls']] 

#make a dict of counts 
counter_dict = Counter(mylist) 

#print out counts per item 
for val in counter_dict:
    # val is a string such as 'aa', so format it with %s rather than %i
    print('%s has %i occurrences' % (val, counter_dict[val]))

(Untested code.)
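To see why the original call fails and the extracted list works, here is a minimal sketch. The `calls` list is made-up sample data that only mimics the shape of data['behavior']['processes'][3]['calls'], and it assumes each 'arguments' value is a plain string, as the printed output in the question suggests.

```python
from collections import Counter

# Hypothetical sample data standing in for
# data['behavior']['processes'][3]['calls'].
# Assumption: every 'arguments' value is a plain string.
calls = [
    {'arguments': 'aa'},
    {'arguments': 'bb'},
    {'arguments': 'aa'},
    {'arguments': 'cc'},
    {'arguments': 'bb'},
    {'arguments': 'cc'},
    {'arguments': 'aa'},
]

# Counting the dicts themselves fails: Counter uses its items as
# dictionary keys, and dicts are not hashable.
try:
    Counter(calls)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Counting the extracted strings works.
counts = Counter(call['arguments'] for call in calls)

print(error_message)
print(counts)
```

If 'arguments' in your real report turns out to be a dict or a list rather than a string, you would need to pick a hashable piece of it (or convert it, for example to a tuple) before counting.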

0

I haven't tested this because I don't have the data you are using, but I think it will work.

import json

json_file = 'report.json'

json_data=open(json_file) 
data = json.load(json_data) 

t0 = [] 
t1 = [] 
tn = [] 

#counts = Counter(data['behavior']['processes'][3]['calls']) 
print (type(data['behavior']['processes'][3]['calls'])) 

data_count = {} 
for i in data['behavior']['processes'][3]['calls']: 

    t0 = i['arguments'] 
    count = data_count.get(t0) 
    if count is None: 
        data_count[t0] = 1 
    else: 
        data_count[t0] = count + 1 

    print(t0) 

json_data.close() 
print(data_count) 
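As a variation on the same idea (not what the answer above uses), collections.defaultdict can stand in for the explicit None check. This is only a sketch: the `values` list is hypothetical sample data in place of the 'arguments' strings read from the report.

```python
from collections import defaultdict

# Hypothetical stand-in for the 'arguments' strings from the report.
values = ['aa', 'bb', 'aa', 'cc', 'bb', 'cc', 'aa']

# defaultdict(int) creates a missing key with the value 0 on first
# access, so no "is None" check is needed before incrementing.
data_count = defaultdict(int)
for value in values:
    data_count[value] += 1

print(dict(data_count))
```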
1

You can do:

Counter(map(lambda x: x['arguments'], data['behavior']['processes'][3]['calls'])) 
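A small sketch of the same one-liner on made-up data, using operator.itemgetter in place of the lambda and Counter.most_common to list the results by frequency. The `calls` list here is hypothetical and assumes string values under 'arguments'.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample data shaped like the 'calls' list in the question.
calls = [{'arguments': v} for v in ['aa', 'bb', 'aa', 'cc', 'bb', 'cc', 'aa']]

# itemgetter('arguments') does the same job as lambda x: x['arguments'].
counts = Counter(map(itemgetter('arguments'), calls))

# most_common() yields (value, count) pairs, highest count first.
for value, n in counts.most_common():
    print('%s = %d' % (value, n))
```

Since Counter is a dict subclass, counts['aa'] also gives you a single count directly.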
1

import json

counterDict = {} # <== 
json_file='report.json' 
json_data=open(json_file) 
data = json.load(json_data) 

t0 = [] 
t1 = [] 
tn = [] 

#counts = Counter(data['behavior']['processes'][3]['calls']) 
print (type(data['behavior']['processes'][3]['calls'])) 

for i in data['behavior']['processes'][3]['calls']: 

    t0 = i['arguments'] 
    counterDict[t0] = counterDict.get(t0,0)+1 # <=== 

json_data.close() 

print(counterDict)
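If you want this as something reusable, here is a rough sketch that wraps the same dict.get counting in a function and opens the file in a with block so it is closed automatically. The function name `count_arguments` and its `process_index` parameter are made up for illustration, and the temporary file only stands in for report.json with invented data.

```python
import json
import os
import tempfile


def count_arguments(path, process_index=3):
    """Count 'arguments' values for one process in a report file.

    Hypothetical helper: the name and the process_index parameter are
    illustrative, not part of any library.
    """
    # The with block closes the file even if json.load raises.
    with open(path) as json_data:
        data = json.load(json_data)

    counter_dict = {}
    for call in data['behavior']['processes'][process_index]['calls']:
        value = call['arguments']
        counter_dict[value] = counter_dict.get(value, 0) + 1
    return counter_dict


# Tiny stand-in report with invented data (one process only).
sample = {'behavior': {'processes': [
    {'calls': [{'arguments': v}
               for v in ['aa', 'bb', 'aa', 'cc', 'bb', 'cc', 'aa']]}
]}}

with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump(sample, tmp)

try:
    result = count_arguments(tmp.name, process_index=0)
finally:
    os.remove(tmp.name)

print(result)
```

With your real file you would call count_arguments('report.json') and keep the default index of 3 from the question.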