2016-02-01

Sorry, but I'm still trying to finish this. Manipulating JSON, lists, and dictionaries in Python

I am trying to take the following data (only a small part of a larger JSON file with the same structure):

{ 
    "count": 394, 
    "status": "ok", 
    "data": [ 
     { 
      "md5": "cd042ba78d0810d86755136609793d6d", 
      "threatscore": 90, 
      "threatlevel": 0, 
      "avdetect": 0, 
      "vxfamily": "", 
      "domains": [ 
       "dynamicflakesdemo.com", 
       "www.bountifulbreast.co.uk" 
      ], 
      "hosts": [ 
       "66.33.214.180", 
       "64.130.23.5" 
      ], 
      "environmentId": "1" 
     }, 
     { 
      "md5": "4f3a560c8deba19c5efd48e9b6826adb", 
      "threatscore": 65, 
      "threatlevel": 0, 
      "avdetect": 0, 
      "vxfamily": "", 
      "domains": [ 
       "px.adhigh.net" 
      ], 
      "hosts": [ 
       "130.211.155.133", 
       "65.52.108.163", 
       "172.225.246.16" 
      ], 
      "environmentId": "1" 
     } 
    ] 
} 

If "threatscore" is over 70, I want to add the entry to the JSON structure below. For example, "data": { "md5": "cd042ba78d0810d86755136609793d6d", "threatscore": 90, ... should become:

{ 
"Event": 
    {"date":"2015-11-25", 
    "threat_level_id":"1", 
    "info":"HybridAnalysis", 
    "analysis":"0", 
    "distribution":"0", 
    "orgc":"SOC", 
    "Attribute": [ 
     {"type":"ip-dst", 
     "category":"Network activity", 
     "to_ids":true, 
     "distribution":"3", 
     "value":"66.33.214.180"}, 
     {"type":"ip-dst", 
     "category":"Network activity", 
     "to_ids":true, 
     "distribution":"3", 
     "value":"64.130.23.5"}, 
     {"type":"domain", 
     "category":"Network activity", 
     "to_ids":true, 
     "distribution":"3", 
     "value":"dynamicflakesdemo.com"}, 
     {"type":"domain", 
     "category":"Network activity", 
     "to_ids":true, 
     "distribution":"3", 
     "value":"www.bountifulbreast.co.uk"}, 
     {"type":"md5", 
     "category":"Payload delivery", 
     "to_ids":true, 
     "distribution":"3", 
     "value":"cd042ba78d0810d86755136609793d6d"}] 
} 
} 

Here is my code:

from datetime import datetime 
import os 
import json 
from pprint import pprint 

now = datetime.now() 

testFile = open("feed.json") 
feed = json.load(testFile) 


for x in feed['data']: 
    if x['threatscore'] > 90: 
     data = {} 
     data['Event']={} 
     data['Event']["date"] = now.strftime("%Y-%m-%d") 
     data['Event']["threat_level_id"] = "1" 
     data['Event']["info"] = "HybridAnalysis" 
     data['Event']["analysis"] = 0 
     data['Event']["distribution"] = 3 
     data['Event']["orgc"] = "Malware" 
     data['Event']["Attribute"] = [] 
     if 'hosts' in x: 
      data['Event']["Attribute"].append({'type': "ip-dst"}) 
      data['Event']["Attribute"][0]["category"] = "Network activity" 
      data['Event']["Attribute"][0]["to-ids"] = True 
      data['Event']["Attribute"][0]["distribution"] = "3" 
      data["Event"]["Attribute"][0]["value"] =x['hosts'] 
     if 'md5' in x: 
      data['Event']["Attribute"].append({'type': "md5"}) 
      data['Event']["Attribute"][1]["category"] = "Payload delivery" 
      data['Event']["Attribute"][1]["to-ids"] = True 
      data['Event']["Attribute"][1]["distribution"] = "3" 
      data['Event']["Attribute"][1]['value'] = x['md5'] 
     if 'domains' in x: 
      data['Event']["Attribute"].append({'type': "domain"}) 
      data['Event']["Attribute"][2]["category"] = "Network activity" 
      data['Event']["Attribute"][2]["to-ids"] = True 
      data['Event']["Attribute"][2]["distribution"] = "3" 
      data['Event']["Attribute"][2]["value"] = x['domains'] 
     attributes = data["Event"]["Attribute"] 
     data["Event"]["Attribute"] = [] 
     for attribute in attributes: 
      for value in attribute["value"]: 
        if value == " ": 
         pass 
        else: 
         new_attr = attribute.copy() 
         new_attr["value"] = value 
         data["Event"]["Attribute"].append(new_attr) 
     pprint(data) 

with open('output.txt', 'w') as outfile: 
    json.dump(data, outfile) 

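The output now looks clean, except that the md5 value is split on every letter. I think it is as L3viathan said earlier: I keep overwriting the first element in the dictionary... but I'm not sure how to make it append instead?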

{'Event': {'Attribute': [{'category': 'Network activity', 
          'distribution': '3', 
          'to-ids': True, 
          'type': 'ip-dst', 
          'value': u'216.115.96.174'}, 
         {'category': 'Network activity', 
          'distribution': '3', 
          'to-ids': True, 
          'type': 'ip-dst', 
          'value': u'64.4.54.167'}, 
         {'category': 'Network activity', 
          'distribution': '3', 
          'to-ids': True, 
          'type': 'ip-dst', 
          'value': u'63.250.200.37'}, 
         {'category': 'Payload delivery', 
          'distribution': '3', 
          'to-ids': True, 
          'type': 'md5', 
          'value': u'7'}, 
         {'category': 'Payload delivery', 
          'distribution': '3', 
          'to-ids': True, 
          'type': 'md5', 
          'value': u'1'}, 

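I also end up with the following error:

Traceback (most recent call last):
  File "hybridanalysis.py", line 34, in <module>
    data['Event']["Attribute"][1]["category"] = "Payload delivery"
IndexError: list index out of range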

The end goal is to set this up so that I can post the events to MISP, but they have to go in one at a time.

On second thought, I think your actual problem is that you read some data and then overwrite it every time 'threatscore' is over 70. – L3viathan

Hey L3viathan! I think it is exactly as you said, but how can I avoid overwriting it and have it append another "Event" as the for loop keeps running? – Dpitt1968

Change data to a list that you define outside the loop. Inside the loop, just append a new event dictionary. – L3viathan

Answers

1

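I think this should solve your problem. I add each attribute dictionary in one go and collect the events in a list (which is more appropriate), but you may want to remove the extra list wrapping the events.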

from datetime import datetime 
import os 
import json 
from pprint import pprint 

now = datetime.now() 

testFile = open("feed.json") 
feed = json.load(testFile) 

data_list = [] 

for x in feed['data']: 
    if x['threatscore'] > 90: 
     data = {} 
     data['Event']={} 
     data['Event']["date"] = now.strftime("%Y-%m-%d") 
     data['Event']["threat_level_id"] = "1" 
     data['Event']["info"] = "HybridAnalysis" 
     data['Event']["analysis"] = 0 
     data['Event']["distribution"] = 3 
     data['Event']["orgc"] = "Malware" 
     data['Event']["Attribute"] = [] 
     if 'hosts' in x: 
      data['Event']["Attribute"].append({ 
       'type': 'ip-dst', 
       'category': 'Network activity', 
       'to-ids': True, 
       'distribution': '3', 
       'value': x['hosts']}) 
     if 'md5' in x: 
      data['Event']["Attribute"].append({ 
       'type': 'md5', 
       'category': 'Payload delivery', 
       'to-ids': True, 
       'distribution': '3', 
       'value': x['md5']}) 
     if 'domains' in x: 
      data['Event']["Attribute"].append({ 
       'type': 'domain', 
       'category': 'Network activity', 
       'to-ids': True, 
       'distribution': '3', 
       'value': x['domains']}) 
     attributes = data["Event"]["Attribute"] 
     data["Event"]["Attribute"] = [] 
     for attribute in attributes: 
      for value in attribute["value"]: 
        if value == " ": 
         pass 
        else: 
         new_attr = attribute.copy() 
         new_attr["value"] = value 
         data["Event"]["Attribute"].append(new_attr) 
     data_list.append(data) 

with open('output.txt', 'w') as outfile: 
    json.dump(data_list, outfile) 
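
If each event has to be handled on its own rather than as one big list, the wrapping list can be walked item by item. The following is only a sketch under assumptions: the two events in data_list are shortened placeholder data, and payloads is a hypothetical name for the per-event JSON strings. How each string is actually sent to MISP is not shown in this thread, so it is left out.

```python
import json

# Placeholder events standing in for the data_list built in the loop above.
data_list = [
    {"Event": {"info": "HybridAnalysis", "Attribute": []}},
    {"Event": {"info": "HybridAnalysis", "Attribute": []}},
]

# Serialize every event separately instead of dumping the whole list at once.
payloads = [json.dumps(event) for event in data_list]

for payload in payloads:
    # Each payload is a complete JSON document for exactly one event.
    print(payload)
```

Each string in payloads parses back into a single {"Event": ...} dictionary, so the events can be written or submitted one after another.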

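L3viathan Grazie!!!! So data = {} is just the name of a 'scratch' dictionary, which doesn't matter because that name won't appear in the final JSON structure. Inside data you create data['Event'] = {}, which is where the JSON content goes. It is basically a dictionary, so you just assign keys in it, e.g. data['Event']["info"] = "HybridAnalysis", which matches the structure needed for posting. For Attribute you use .append because it is a list, and we append dictionaries as items of that list. What I don't understand is why the MD5 is still being split into individual characters? – Dpitt1968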

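In other words, why does it split 'md5' into individual characters instead of treating it exactly like 'hosts' and 'domains'? It splits those lists correctly at the commas. – Dpitt1968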

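@mathurin68 Ah, I see. It is because the code iterates over the value. If you iterate over a list (like domains or hosts), you get strings. If you iterate over a string, you get single-character strings. – L3viathan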
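
The difference described in this comment can be seen in a few lines. This is a minimal sketch assuming Python 3 (the thread's output shows Python 2 unicode strings, where the isinstance check would need adjusting). as_list is a hypothetical helper, not something from the original code.

```python
hosts = ["66.33.214.180", "64.130.23.5"]
md5 = "cd042ba78d0810d86755136609793d6d"

# Iterating a list yields its items: two whole strings.
host_items = [item for item in hosts]

# Iterating a string yields one-character strings.
md5_items = [item for item in md5]


def as_list(value):
    """Hypothetical helper: wrap a bare string so it can be looped over like a list."""
    if isinstance(value, str):
        return [value]
    return list(value)


# With the helper, both kinds of value produce whole strings when iterated.
for value in as_list(md5):
    print(value)
```

Passing every attribute value through a helper like as_list before the inner loop would let hosts, domains, and md5 share the same code path.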

1

In the JSON, "Attribute" holds a list that contains a single item, a dictionary, as shown below.

{'Event': {'Attribute': [{'category': 'Network activity', 
         'distribution': '3', 
         'to-ids': True, 
         'type': 'ip-dst', 
         'value': [u'54.94.221.70']}] 
... 

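When you evaluate data['Event']["Attribute"][1]["category"], you are asking for the second item (index 1) in the Attribute list, but the list only has one item, which is why you get the error.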
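
A small reproduction may make this clearer. It is a sketch with made-up data: entry is a hypothetical feed item that has no 'hosts' key, so the list is shorter than the hard-coded index expects. Indexing with -1, or building the whole dictionary before appending, avoids depending on how many items were added earlier.

```python
# Hypothetical feed entry without a 'hosts' key.
entry = {"md5": "4f3a560c8deba19c5efd48e9b6826adb"}
attributes = []

if "hosts" in entry:
    attributes.append({"type": "ip-dst", "value": entry["hosts"]})

if "md5" in entry:
    attributes.append({"type": "md5"})
    try:
        # Index 1 assumes an ip-dst attribute was appended first.
        attributes[1]["category"] = "Payload delivery"
    except IndexError as exc:
        error_message = str(exc)
    # Index -1 always refers to the item that was just appended.
    attributes[-1]["category"] = "Payload delivery"

print(error_message)
print(attributes)
```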

0

Thanks L3viathan! Here is how I adjusted it so that it doesn't iterate over the MD5.

# This fragment runs inside the "for x in feed['data']" loop.
attributes = data["Event"]["Attribute"]
data["Event"]["Attribute"] = []
for attribute in attributes:
    if attribute['type'] == 'md5':
        # The md5 is a single string, so copy it whole instead of iterating it.
        new_attr = attribute.copy()
        new_attr["value"] = str(x['md5'])
        data["Event"]["Attribute"].append(new_attr)
    else:
        # hosts and domains are lists: emit one attribute per item.
        for value in attribute["value"]:
            new_attr = attribute.copy()
            new_attr["value"] = value
            data["Event"]["Attribute"].append(new_attr)
data_list.append(data)
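
For comparison, the events can also be built in a single pass, without collecting list-valued attributes first and splitting them afterwards. This is only a sketch under assumptions: make_attribute and build_events are hypothetical names, the fixed event fields and the to_ids spelling follow the target structure in the question rather than the posted code, and the threshold of 70 follows the question text (the posted code compares against 90).

```python
from datetime import datetime


def make_attribute(attr_type, category, value):
    """Build one attribute dict holding a single scalar value."""
    return {
        "type": attr_type,
        "category": category,
        "to_ids": True,
        "distribution": "3",
        "value": value,
    }


def build_events(entries, min_score=70):
    """Return one event dict per entry whose threatscore exceeds min_score."""
    today = datetime.now().strftime("%Y-%m-%d")
    events = []
    for entry in entries:
        if entry.get("threatscore", 0) <= min_score:
            continue
        attributes = []
        # hosts and domains are lists, so loop over their items.
        for host in entry.get("hosts", []):
            attributes.append(make_attribute("ip-dst", "Network activity", host))
        for domain in entry.get("domains", []):
            attributes.append(make_attribute("domain", "Network activity", domain))
        # md5 is a single string, so it is added once and never iterated.
        if entry.get("md5"):
            attributes.append(make_attribute("md5", "Payload delivery", entry["md5"]))
        events.append({
            "Event": {
                "date": today,
                "threat_level_id": "1",
                "info": "HybridAnalysis",
                "analysis": "0",
                "distribution": "0",
                "orgc": "SOC",
                "Attribute": attributes,
            }
        })
    return events


# The two entries from the question, reduced to the fields used here.
sample = [
    {"md5": "cd042ba78d0810d86755136609793d6d", "threatscore": 90,
     "domains": ["dynamicflakesdemo.com", "www.bountifulbreast.co.uk"],
     "hosts": ["66.33.214.180", "64.130.23.5"]},
    {"md5": "4f3a560c8deba19c5efd48e9b6826adb", "threatscore": 65,
     "domains": ["px.adhigh.net"],
     "hosts": ["130.211.155.133", "65.52.108.163", "172.225.246.16"]},
]
events = build_events(sample)
```

With the sample above, only the first entry passes the threshold, and its event carries five attributes: two ip-dst, two domain, and one md5.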

Manipulating JSON seems to be a good way to learn lists and dictionaries.