2017-01-09 129 views

Python CSV to JSON array of objects, with each combination of multiple values in a CSV field as a separate JSON object

I have this CSV:

Domain,IP,Server,PoweredBy,MetaGenerator,Email 
http://www.example1.com,1.1.1.1,,,, 
http://www.example2.com,2.2.2.2,Apache,PHP/5.5.9-1ubuntu4.20,, 
http://www.example3.com,3.3.3.3,Apache,PHP/5.5.9-1ubuntu4.20,Easy Digital Downloads v2.4.9;Powered by Visual Composer - drag and drop page builder for WordPress.,[email protected];[email protected] 

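I'm trying to build a JSON array of objects in which each object is one unique combination of the CSV values, for the fields that hold several (";"-separated) values.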

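As we can see, www.example3.com has several MetaGenerators and Emails. For this case, the JSON array of objects should look like this, with each combination as a separate JSON object in the array: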

[{'Domain': 'http://www.example1.com', 
    'Email': '', 
    'IP': '1.1.1.1', 
    'MetaGenerator': '', 
    'PoweredBy': '', 
    'Server': ''}, 
{'Domain': 'http://www.example2.com', 
    'Email': '', 
    'IP': '2.2.2.2', 
    'MetaGenerator': '', 
    'PoweredBy': 'PHP/5.5.9-1ubuntu4.20', 
    'Server': 'Apache'}, 
{'Domain': 'http://www.example3.com',
    'Email': '[email protected]',
    'IP': '3.3.3.3',
    'MetaGenerator': 'Easy Digital Downloads v2.4.9',
    'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
    'Server': 'Apache'},
{'Domain': 'http://www.example3.com',
    'Email': '[email protected]',
    'IP': '3.3.3.3',
    'MetaGenerator': 'Powered by Visual Composer - drag and drop page builder for WordPress.',
    'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
    'Server': 'Apache'},
{'Domain': 'http://www.example3.com',
    'Email': '[email protected]',
    'IP': '3.3.3.3',
    'MetaGenerator': 'Easy Digital Downloads v2.4.9',
    'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
    'Server': 'Apache'},
{'Domain': 'http://www.example3.com',
    'Email': '[email protected]',
    'IP': '3.3.3.3',
    'MetaGenerator': 'Powered by Visual Composer - drag and drop page builder for WordPress.',
    'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
    'Server': 'Apache'}]
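The expected output is a Cartesian product: for www.example3.com the two MetaGenerator values and the two Email values combine into 2 × 2 = 4 objects. A minimal sketch of that arithmetic for a single row, using the standard-library `itertools.product`; the row is trimmed to three columns and the email addresses are hypothetical placeholders:

```python
import itertools

# One row from the sample CSV, trimmed to three columns.
# The Email values are hypothetical placeholders.
row = {
    "Domain": "http://www.example3.com",
    "MetaGenerator": "Easy Digital Downloads v2.4.9;Powered by Visual Composer - drag and drop page builder for WordPress.",
    "Email": "alice@example.com;bob@example.com",
}

# Split every field on ";". A field without ";" (or an empty field, since
# "".split(";") == [""]) becomes a one-element list and contributes a factor of 1.
choices = [value.split(";") for value in row.values()]

combos = list(itertools.product(*choices))
print(len(combos))  # 1 * 2 * 2 = 4
```

Because empty fields still yield `[""]`, rows like www.example1.com produce exactly one object rather than disappearing.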

I have this Python code:

import csv
import pprint
import json

with open("results.csv", 'r') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    out=[]
    d=dict()
    for row in reader:
        if ';' in row['Email']:
            val = row['Email'].split(';')
            for v in val:
                d['Email']=v
                out.append(d)
        if ';' in row['MetaGenerator']:
            val = row['MetaGenerator'].split(';')
            for v in val:
                d['MetaGenerator']=v
                out.append(d)
        else:
            d=row
            out.append(d)

pprint.pprint(out)

But it doesn't work correctly.
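A likely cause of the failure, not spelled out in the thread: `d` is created once and appended over and over, so `out` ends up holding several references to the same dict, and every entry shows whatever was written to it last. The Email and MetaGenerator loops also run one after the other instead of pairing every email with every generator, and `d` never receives the Domain, IP and other columns. A minimal demonstration of the aliasing part, with placeholder values:

```python
# Appending the same dict object repeatedly stores references, not copies.
d = {}
out = []
for v in ["a@example.com", "b@example.com"]:
    d["Email"] = v
    out.append(d)  # the same object is appended on every iteration
print(out)  # [{'Email': 'b@example.com'}, {'Email': 'b@example.com'}]

# Building a new dict per value avoids the aliasing.
fixed = []
for v in ["a@example.com", "b@example.com"]:
    fixed.append({"Email": v})
print(fixed)  # [{'Email': 'a@example.com'}, {'Email': 'b@example.com'}]
```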

How can I achieve this? Pseudocode is fine too. The order doesn't matter. Which modules should I use?

Thanks,

Answer


Try this (check the itertools docs):

import csv
import pprint
import json
import itertools

out = []
with open("results.csv", 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:
        # Split every field on ";" (a field without ";" becomes a one-element list)
        Domains = row['Domain'].split(";")
        Ips = row['IP'].split(";")
        Servers = row['Server'].split(";")
        Emails = row['Email'].split(";")
        MetaGenerators = row['MetaGenerator'].split(";")
        PoweredBy = row['PoweredBy'].split(";")

        # One output object per combination of the split values
        for comb in itertools.product(Domains, Ips, Servers, Emails, MetaGenerators, PoweredBy):
            (cDomain, cIp, cServer, cEmail, cMeta, cPowered) = comb

            out.append({
                'Domain': cDomain,
                'IP': cIp,
                'Server': cServer,
                'Email': cEmail,
                'MetaGenerator': cMeta,
                'PoweredBy': cPowered
            })

pprint.pprint(out)
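The question asks for JSON, but `pprint` prints Python's repr (single-quoted strings), which is not valid JSON. Since the code already imports `json`, here is a sketch of serializing the result instead; the one-element `out` list and the `results.json` filename are hypothetical stand-ins:

```python
import json

# Hypothetical stand-in for the list built by the loop above.
out = [
    {"Domain": "http://www.example1.com", "IP": "1.1.1.1", "Server": "",
     "Email": "", "MetaGenerator": "", "PoweredBy": ""},
]

# json.dumps produces real JSON (double-quoted keys and strings).
text = json.dumps(out, indent=2)
print(text)

# To write a file instead (hypothetical filename):
# with open("results.json", "w") as f:
#     json.dump(out, f, indent=2)
```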

Or check this less readable but cleverer solution, which doesn't hard-code the CSV field names:

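import csv
import itertools
import pprint

out = []
with open("results.csv", 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    headers = reader.fieldnames

    for row in reader:
        # Read the values in header order so each split list lines up with its header
        fields = [row[h].split(";") for h in headers]
        # zip pairs each header with the value at the same position in the combination
        out += [dict(zip(headers, comb)) for comb in itertools.product(*fields)]

pprint.pprint(out)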
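To try the approach without a results.csv file, here is a self-contained sketch that feeds the question's sample data through `csv.DictReader` via `io.StringIO` and builds each object with `dict(zip(headers, comb))`. The generator name `expand_rows` is hypothetical, and the two email addresses are placeholders:

```python
import csv
import io
import itertools

# Sample data from the question; the two Email values are hypothetical placeholders.
SAMPLE = """Domain,IP,Server,PoweredBy,MetaGenerator,Email
http://www.example1.com,1.1.1.1,,,,
http://www.example2.com,2.2.2.2,Apache,PHP/5.5.9-1ubuntu4.20,,
http://www.example3.com,3.3.3.3,Apache,PHP/5.5.9-1ubuntu4.20,Easy Digital Downloads v2.4.9;Powered by Visual Composer - drag and drop page builder for WordPress.,alice@example.com;bob@example.com
"""


def expand_rows(reader):
    """Yield one dict per combination of ';'-separated values in each row."""
    headers = reader.fieldnames
    for row in reader:
        # Read fields in header order; a missing trailing value (None) becomes "".
        fields = [(row[h] or "").split(";") for h in headers]
        for comb in itertools.product(*fields):
            yield dict(zip(headers, comb))


out = list(expand_rows(csv.DictReader(io.StringIO(SAMPLE))))
print(len(out))  # 1 + 1 + 2*2 = 6
```

Indexing `row` by `headers` rather than iterating over the row's items keeps the pairing correct regardless of how the row dict happens to be ordered.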

Works perfectly! Thanks. I wouldn't have thought of using itertools...
