2017-08-10 123 views
0

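Assigning a huge list of elements to a list of dicts

Let's say I have a huge list containing phone numbers, emails, and websites that belong to particular organizations, companies, or people. The number of phone numbers, emails, or websites per contact varies, and some contacts may have no phone numbers or no emails at all. I would like to group the entries into a list of dictionaries, one per contact.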

a_list = [
    "+99112233",
    "+39383",
    "www.johndoe.com",
    "[email protected]",
    "+9933933",
    "+99883399",
    "www.someother.com",
    "www.tt.com",
    "[email protected]",
    "[email protected]",
]

# Desired result
contacts = [
    {
        'phones': ["+99112233", "+39383"],
        'websites': ["www.johndoe.com"],
        'emails': ['[email protected]'],
    },
    {
        'phones': ["+9933933", "+99883399"],
        'websites': ['www.someother.com'],
        'emails': [],
    },
    {
        'phones': [],
        'websites': ['www.tt.com'],
        'emails': ['[email protected]', '[email protected]'],
    },
]

This is my code so far:

push_flag = False
contacts = []
phones = []
emails = []
webs = []
for contact in a_list:
    text = contact
    if text[0] == "+":
        if push_flag:
            contacts.append({
                'phones': phones,
                'webs': webs,
                'emails': emails,
            })
            phones = []
            webs = []
            emails = []
            push_flag = False
        phones.append(text)
    elif text[0:3] == "www":
        push_flag = True
        webs.append(text)
    elif "@" in text:
        push_flag = True
        emails.append(text)

contacts.append({
    'phones': phones,
    'webs': webs,
    'emails': emails,
})
+1

What's wrong with your code? – Harsha

+0

Why does the resulting 'list' contain 3 'dicts'? Can't it be a single dict? – voidpro

+0

The 3 dicts mean the entries belong to three different companies/persons/organizations – Wasi

Answers

1

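There are a few things that might help you simplify the logic here. First, I would use a list of (regex, category) pairs to identify whether each element is a phone number, a website, or an email address. This approach is nice because it lets you add other kinds of data easily without disturbing the structure of the parsing code. Second, defaultdict(list) is a very suitable structure for each contact.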

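import re
from collections import defaultdict
from more_itertools import peekable

# Each pair maps a pattern to the contact field it fills. The order of the
# pairs is the order in which fields are expected within one contact.
category_pairs = [
    (re.compile(r'^\+[0-9]+$'), 'phones'),
    (re.compile(r'^www\..*?\.[A-Za-z]+$'), 'websites'),
    # The email pattern is obscured in the source; this is an assumed stand-in.
    (re.compile(r'^.+@.+\.[A-Za-z]+$'), 'emails'),
]

contacts = []
iterator = peekable(a_list)

while iterator:  # a peekable is truthy while items remain
    current = defaultdict(list)
    for regex, category in category_pairs:
        # Consume the run of consecutive entries that match this category.
        while iterator and regex.match(iterator.peek()):
            current[category].append(next(iterator))
    if current:
        contacts.append(current)
    else:
        next(iterator)  # skip an entry that matches no pattern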

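This code makes one assumption: that phone numbers, websites, and email addresses appear in that order within each contact, and are grouped accordingly.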
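To make that assumption concrete, here is a minimal standard-library sketch of the same idea (not the answerer's code). Each entry is classified with the regex pairs, and a new contact starts whenever the category order goes backwards, for example from an email back to a phone number. The names `classify` and `group_contacts` are hypothetical, the email pattern is an assumption, and the `example.com` addresses are placeholders because the original addresses are obscured on this page.

```python
import re
from collections import defaultdict

# Order matters: a contact is assumed to list phones, then websites, then emails.
CATEGORY_PAIRS = [
    (re.compile(r'^\+[0-9]+$'), 'phones'),
    (re.compile(r'^www\..*?\.[A-Za-z]+$'), 'websites'),
    (re.compile(r'^[^@\s]+@[^@\s]+\.[A-Za-z]+$'), 'emails'),  # assumed email pattern
]


def classify(entry):
    """Return (rank, category) for the first matching pattern, or None."""
    for rank, (regex, category) in enumerate(CATEGORY_PAIRS):
        if regex.match(entry):
            return rank, category
    return None


def group_contacts(entries):
    """Start a new contact whenever the category rank drops (e.g. email -> phone)."""
    contacts = []
    current = defaultdict(list)
    last_rank = -1
    for entry in entries:
        match = classify(entry)
        if match is None:
            continue  # unrecognized entries are skipped in this sketch
        rank, category = match
        if rank < last_rank:
            contacts.append(dict(current))
            current = defaultdict(list)
        current[category].append(entry)
        last_rank = rank
    if current:
        contacts.append(dict(current))
    return contacts


# Placeholder addresses; the real ones are not recoverable from the page.
sample = [
    "+99112233", "+39383", "www.johndoe.com", "john@example.com",
    "+9933933", "+99883399", "www.someother.com", "www.tt.com",
    "info@example.com", "sales@example.com",
]
contacts = group_contacts(sample)
```

On this sample the sketch yields two contacts rather than three, because nothing in the order of the entries separates `www.someother.com` from `www.tt.com`. That is the practical consequence of the ordering assumption.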

+0

Neat! I love using [more-itertools](https://pypi.python.org/pypi/more-itertools/) – Wasi

1

You can try this:

a_list = [
    "+99112233",
    "+39383",
    "www.johndoe.com",
    "[email protected]",
    "+9933933",
    "+99883399",
    "www.someother.com",
    "www.tt.com",
    "[email protected]",
    "[email protected]",
]

contacts = []
contact_dict = {}

len_of_list = len(a_list)
for index, contact in enumerate(a_list):
    # The first entry is assumed to be a phone number.
    if index == 0:
        contact_dict["phones"] = [contact]
        continue

    previous = a_list[index - 1]

    if contact[0] == "+":
        if previous[0] == "+":
            contact_dict["phones"].append(contact)
        else:
            # A phone number after a non-phone entry starts a new contact.
            contacts.append(contact_dict)
            contact_dict = {}
            contact_dict["phones"] = [contact]

    if contact[0:3] == "www":
        if previous[0:3] == "www":
            # Two websites in a row: close the current contact (it has no
            # emails) and start a new one that has no phone numbers.
            contact_dict["email"] = []
            contacts.append(contact_dict)
            contact_dict = {}
            contact_dict["website"] = [contact]
            contact_dict["phones"] = []
        else:
            contact_dict["website"] = [contact]

    if "@" in contact:
        if "@" in previous:
            contact_dict["email"].append(contact)
        else:
            contact_dict["email"] = [contact]

    # Flush the last contact once the end of the list is reached.
    if index == len_of_list - 1:
        contacts.append(contact_dict)

print(contacts)

Output:

[{
    'website': ['www.johndoe.com'],
    'phones': ['+99112233', '+39383'],
    'email': ['[email protected]']
}, {
    'website': ['www.someother.com'],
    'phones': ['+9933933', '+99883399'],
    'email': []
}, {
    'website': ['www.tt.com'],
    'phones': [],
    'email': ['[email protected]', '[email protected]']
}]
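
The splitting rule can also be sketched as a small function that gives every contact the same three keys. This is an illustrative sketch rather than the answerer's code: `kind` and `split_contacts` are hypothetical names, the `example.com` addresses stand in for the obscured originals, and it assumes that each contact has at most one website and lists its phone numbers first. Under those assumptions, a phone number or website starts a new contact whenever the current contact already holds a website or an email.

```python
def kind(entry):
    """Classify an entry with the same simple checks the question uses."""
    if entry.startswith("+"):
        return "phones"
    if entry.startswith("www"):
        return "websites"
    if "@" in entry:
        return "emails"
    return None


def split_contacts(entries):
    """Group entries into contacts, assuming at most one website per contact."""
    contacts = []
    current = None
    for entry in entries:
        category = kind(entry)
        if category is None:
            continue  # unrecognized entries are skipped in this sketch
        # True once the current contact already holds a website or an email.
        has_later = current is not None and bool(
            current["websites"] or current["emails"]
        )
        if current is None or (category != "emails" and has_later):
            current = {"phones": [], "websites": [], "emails": []}
            contacts.append(current)
        current[category].append(entry)
    return contacts


# Placeholder addresses; the real ones are not recoverable from the page.
sample = [
    "+99112233", "+39383", "www.johndoe.com", "john@example.com",
    "+9933933", "+99883399", "www.someother.com", "www.tt.com",
    "info@example.com", "sales@example.com",
]
contacts = split_contacts(sample)
```

With the sample list this gives three contacts, the last of which has an empty `phones` list. Because every dict is created with all three keys, callers do not need to guard against a missing key.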