2010-10-08

Pythonic way to optimize filter logic / extract data from a list

I have a list like the one below:

['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', 
'3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', 
'5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
'7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
'9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS())', 
'22 (UID 3479 FLAGS())', '23 (UID 3480 FLAGS())', '24 (UID 3481 FLAGS())'] 

From this list I want three different lists as the result, and I want to get them with a single iteration over the list:

  1. A list of all the UIDs, i.e. [3234,3235,3236,3237,3241 ......]
  2. Seen UIDs: a list, i.e. [3234,3235 ...] <- UIDs of items that have the \Seen flag
  3. Deleted UIDs: a list, i.e. [3236,3253] <- UIDs of items that have the \Deleted flag

What is the significance of "+FLAGS" and "-FLAGS"? – PaulMcG 2010-10-08 09:23:06

What does "FLAGS (seen \\Seen)" mean (in entry #1)? – PaulMcG 2010-10-08 09:27:35

What do you have so far? – SilentGhost 2010-10-08 09:50:23

Answers

3

The best thing to do is to put your data into a dict that maps UIDs to FLAGS; searching it is then easy. The data would look like this:

{'3254': '', '3304': '', '3236': '\\Deleted', '3237': '-FLAGS \\Seen +FLAGS', '3234': 'seen \\Seen', '3235': '\\Seen', '3430': '\\Seen', '3431': '', '3252': '\\Seen', '3253':'\\Deleted', '3478': '', '3479': '', '3256': '\\Seen', '3481': '', '3480': '', '3318': '\\Seen', '3434': '\\Seen', '3243': '\\Seen', '3242': '\\Seen', '3241': '-FLAGS \\Seen +FLAGS', '3247': '\\Seen', '3245': '\\Seen', '3244': '\\Seen', '3447': '-FLAGS \\Seen +FLAGS'} 

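You can do this by using a regular expression to match each entry in the list. If the regular expression returns two groups per match, we can easily build the dict.

So we end up with something like this: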

items = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', '13 (UID 3254 FLAGS())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS())', '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', '18 (UID 3431 FLAGS())', '19 (UID 3434 FLAGS (\\Seen))', '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS())', '22 (UID 3479 FLAGS())', '23 (UID 3480 FLAGS())', '24 (UID 3481 FLAGS())'] 

import re 
# "FLAGS ?" makes the space optional, so entries like '13 (UID 3254 FLAGS())' also match
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS ?\(([^)]*)\)\)") 
values = dict(pattern.match(item).groups() for item in items) 

Then we can easily query the entries in values to get what you want:

print("All UIDs:", list(values.keys())) 
print("Seen UIDs:", [uid for uid, flags in values.items() if r"\Seen" in flags]) 
print("Deleted UIDs:", [uid for uid, flags in values.items() if r"\Deleted" in flags]) 
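As the comments point out, building values and then querying it goes over the data more than once. If a single pass matters, here is a minimal Python 3 sketch that keeps the same regex idea and fills all three lists in one loop. split_uids and PATTERN are hypothetical names, the pattern assumes the entry format shown in the question (with an optional space before the flag parentheses), and UIDs stay strings as in the dict above.

```python
import re

# Assumption: each entry looks like '<n> (UID <uid> FLAGS (<flags>))', and the
# space before the flag parentheses is optional, as in '13 (UID 3254 FLAGS())'.
PATTERN = re.compile(r"\d+ \(UID (\d+) FLAGS ?\(([^)]*)\)\)")

def split_uids(items):
    """Return (all_uids, seen_uids, deleted_uids) after a single pass over items."""
    all_uids, seen_uids, deleted_uids = [], [], []
    for item in items:
        match = PATTERN.match(item)
        if match is None:
            continue  # skip entries that do not fit the assumed format
        uid, flags = match.groups()
        all_uids.append(uid)
        if r"\Seen" in flags:
            seen_uids.append(uid)
        if r"\Deleted" in flags:
            deleted_uids.append(uid)
    return all_uids, seen_uids, deleted_uids

sample = ['1 (UID 3234 FLAGS (seen \\Seen))',
          '3 (UID 3236 FLAGS (\\Deleted))',
          '13 (UID 3254 FLAGS())']
print(split_uids(sample))  # (['3234', '3236', '3254'], ['3234'], ['3236'])
```

Entries that do not match are skipped silently here; raising an error instead may be preferable if malformed input should never occur.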

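Aren't you iterating over the items list several times in your solution to get "seen" and "deleted"? – 2010-10-08 09:11:34

@Noufal Ibrahim - Yes. I assumed the list isn't very long, so I valued readability over performance. – 2010-10-08 13:40:42

I completely agree with your approach. The asker requested a single iteration; that's why I brought it up. – 2010-10-08 14:37:06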

1

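I'm not sure about list comprehensions, since they usually map one list to another (by filtering or mapping); I haven't seen them used to split a list. However, you can do this in a single iteration with a combination of a generator expression and a loop. I've expanded it a bit for clarity.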

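import re 
# "FLAGS ?" also matches entries without a space, e.g. '13 (UID 3254 FLAGS())'
grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS ?(?P<flags>\(.*\))\)') 

t = [...]  # your list 

# generator expression: each entry is matched lazily, inside the loop below
items = (grepper.search(m).groupdict() for m in t) 

all_uids = [] 
seen = [] 
deleted = [] 
for i in items: 
    if "Seen" in i["flags"]: 
        seen.append(i["uid"]) 
    if "Deleted" in i["flags"]: 
        deleted.append(i["uid"]) 
    all_uids.append(i["uid"]) 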

Now you should have 3 lists.
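This answer observes that list comprehensions map one list to another rather than splitting it. A different single-pass way to split, offered as a hedged sketch rather than the answer's method, is to group UIDs by flag token in a collections.defaultdict(list) and keep the list of all UIDs separately. It assumes the entry format from the question and that flags are whitespace-separated tokens; group_by_flag and ENTRY are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed entry format: '<n> (UID <uid> FLAGS (<flags>))', space before '(' optional.
ENTRY = re.compile(r"\d+ \(UID (?P<uid>\d+) FLAGS ?\((?P<flags>[^)]*)\)\)")

def group_by_flag(entries):
    """One pass over entries: return (all_uids, {flag_token: [uid, ...]})."""
    all_uids = []
    by_flag = defaultdict(list)
    for entry in entries:
        m = ENTRY.match(entry)
        if m is None:
            continue  # entry does not fit the assumed format
        uid = int(m.group("uid"))
        all_uids.append(uid)
        # set() so a token repeated inside one entry adds the UID only once
        for token in set(m.group("flags").split()):
            by_flag[token].append(uid)
    return all_uids, by_flag

all_uids, by_flag = group_by_flag([
    '2 (UID 3235 FLAGS (\\Seen))',
    '12 (UID 3253 FLAGS (\\Deleted))',
    '18 (UID 3431 FLAGS())',
])
seen = by_flag["\\Seen"]        # [3235]
deleted = by_flag["\\Deleted"]  # [3253]
```

Looking up a flag that never occurred returns an empty list, because defaultdict(list) creates one on first access.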

You're iterating over the list twice :( – slezica 2010-10-08 09:21:30

Where? [15 chars...] – 2010-10-08 09:59:39

Technically, grepper.search and then the for i in items. – 2010-10-08 11:41:58

1
# note: only the last space-separated field of each entry is checked for a flag
all, deleted, seen = [list(filter(None, a)) for a in
    zip(*map(lambda a: (a[2], r'\Deleted' in a[-1] and a[2], r'\Seen' in a[-1] and a[2]), map(lambda a: a.split(' '), items)))]

Which is faster, with re or without re? You need to check with timeit!
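To act on the suggestion to check with timeit, a minimal harness might time a regex parser against a str.split parser on the same entries. Both functions (parse_regex, parse_split) are hypothetical stand-ins that extract only the UID and the flag text; they are not the one-liner above. No timings are claimed here, since results depend on the machine and the Python version.

```python
import re
import timeit

PATTERN = re.compile(r"\d+ \(UID (\d+) FLAGS ?\(([^)]*)\)\)")

def parse_regex(item):
    """(uid, flags_text) via a compiled regex."""
    return PATTERN.match(item).groups()

def parse_split(item):
    """(uid, flags_text) via str.split; assumes the 'n (UID uid FLAGS (...))' layout."""
    _, _, uid, rest = item.split(None, 3)  # rest is 'FLAGS (...))' or 'FLAGS())'
    return uid, rest[len("FLAGS"):].strip()[1:-2]

items = [
    '1 (UID 3234 FLAGS (seen \\Seen))',
    '3 (UID 3236 FLAGS (\\Deleted))',
    '13 (UID 3254 FLAGS())',
    '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))',
]

# Make sure both parsers agree before comparing their speed.
assert [parse_regex(i) for i in items] == [parse_split(i) for i in items]

for parse in (parse_regex, parse_split):
    seconds = timeit.timeit(lambda: [parse(i) for i in items], number=10_000)
    print(f"{parse.__name__}: {seconds:.3f} s for 10,000 runs")
```

Checking that both parsers return the same tuples first avoids comparing the speed of two functions that do different things.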

Oh boy, I'm not sure I'd want to see that in production code. :) – 2010-10-08 15:43:01

ohhh too many lambda filter map zip..... :-) – shahjapan 2010-10-09 04:33:08

0
# alist is your list of entries
all = []
seen = []
deleted = []
for item in alist:
    s = item.split(" ", 4)
    all.append(s[2])
    if "seen" in s[-1].lower():
        seen.append(s[2])
    elif "delete" in s[-1].lower():
        deleted.append(s[2])
0

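The only way I can think of to produce the three lists you asked for in a single iteration is to iterate manually. There's no Python magic I can come up with.

If you know the details of the format and how it is generated, this can easily be improved. For example, I don't know why +FLAGS and -FLAGS appear in some items, and I don't know when to expect parentheses, so I had to use find(). Also, I just split() the string on whitespace, but then again, I don't know what the flag format means, ...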

def parseList(l):
    lall = []
    lseen = []
    ldeleted = []

    for item in l:
        spl = item.split()

        uid = int(spl[2])

        lall.append(uid)

        # every whitespace-separated word after "FLAGS"
        for word in spl[4:]:
            if word.find(r"\Seen") != -1:
                lseen.append(uid)

            elif word.find(r"\Deleted") != -1:
                ldeleted.append(uid)

    return lall, lseen, ldeleted
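The answer above falls back on find() because it is unclear where the parentheses can appear. One hedged alternative, assuming the flags after the first FLAGS keyword are whitespace-separated tokens, is to strip parentheses from each token and test whole tokens in a set. Unlike a substring search, this keeps a hypothetical flag such as \SeenBefore from counting as \Seen. flag_tokens and parse_list are illustrative names, not part of the answer.

```python
def flag_tokens(item):
    """Return the set of flag tokens after the first 'FLAGS' keyword in one entry."""
    # Assumption: 'FLAGS' is followed by a parenthesised, space-separated list.
    _, _, tail = item.partition("FLAGS")
    return {token.strip("()") for token in tail.split()} - {""}

def parse_list(entries):
    """Single pass: (all, seen, deleted) UIDs as ints, matching whole flag tokens."""
    all_uids, seen, deleted = [], [], []
    for entry in entries:
        uid = int(entry.split()[2])  # third field, e.g. '3234'
        flags = flag_tokens(entry)
        all_uids.append(uid)
        if "\\Seen" in flags:
            seen.append(uid)
        if "\\Deleted" in flags:
            deleted.append(uid)
    return all_uids, seen, deleted

print(parse_list(['1 (UID 3234 FLAGS (seen \\Seen))',
                  '3 (UID 3236 FLAGS (\\Deleted))',
                  '13 (UID 3254 FLAGS())']))
```

Because the whole token is compared, the lowercase seen in entry #1 is kept as its own token and does not affect the \Seen check.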
2
import re 

data = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', 
'3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', 
'5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
'7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
'9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS())', 
'22 (UID 3479 FLAGS())', '23 (UID 3480 FLAGS())', '24 (UID 3481 FLAGS())'] 

# \s? lets entries like '13 (UID 3254 FLAGS())' match; without it they are silently skipped
r = re.compile(r'\d+\s\(UID\s(?P<uid>\d+)\sFLAGS\s?\((?P<data>.*)\)\)') 
uid_list = [] 
seen_uid_list = [] 
deleted_uid_list = [] 
for s in data: 
    m = r.match(s) 
    if m: 
        uid_list.append(m.group('uid')) 
        if 'Seen' in m.group('data'): seen_uid_list.append(m.group('uid')) 
        if 'Deleted' in m.group('data'): deleted_uid_list.append(m.group('uid')) 

print(uid_list) 
print(seen_uid_list) 
print(deleted_uid_list) 
1

This one works for your data sample...

uids, seen, deleted = [], [], [] 
for item in myList: 
    uids.append(int(item[7:12])) 
    if 'Se' in item[20:]: seen.append(uids[-1]) 
    elif 'De' in item[20:]: deleted.append(uids[-1])