2012-10-23 44 views
0

Algorithm to compare two lists and get the common elements in Python

I have two lists that share some common elements:

p = [('link1/d/b/c', 'target1/d/b/c'), ('link2/a/g/c', 'target2/a/g/c'), ..., ('linkn/b/b/f', 'targetn/b/b/f')] 

q = [['target1/d/b/c', 'target1', 123, 334], ['targetn/b/b/f', 'targetn', 23, 64], ... ,['targetx/f/f/f', 'targetx', 999, 888]] 

I am trying to compare them, find the common elements, and then do some work with the result:

do_job('target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c') 

Right now I am using a simple but very slow algorithm:

for item in p:
    link = item[0]
    target = item[1]
    for item2 in q:
        target2 = item2[0]
        if target2 == target:
            do_some_job(...)

I think I need to compare the two lists and build a single list that contains all the matching elements, like:

pq = [['target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c'], ..., ['targetn/b/b/f', 'targetn', 23, 64, 'linkn/b/b/f']] 

and then call do_some_job(pq) once, instead of calling it every time I find a matching element.

How can I get it?

Regards

+0

That is not a Python list. What is link1/d/b/c supposed to mean? – 2012-10-23 10:15:00

+0

Use quotes for strings such as 'target1/d/b/c'. –

Answers

5

Use chain() to flatten both lists, then use set() and intersection() to get the common elements.

In [78]: from itertools import chain 

In [79]: p 
Out[79]: 
[('link1/d/b/c', 'target1/d/b/c'), 
('link2/a/g/c', 'target2/a/g/c'), 
('linkn/b/b/f', 'targetn/b/b/f')] 

In [80]: q 
Out[80]: 
[['target1/d/b/c', 'target1', 123, 334], 
['targetn/b/b/f', 'targetn', 23, 64], 
['targetx/f/f/f', 'targetx', 999, 888]] 

In [81]: set(chain(*p)).intersection(set(chain(*q))) 
Out[81]: set(['target1/d/b/c', 'targetn/b/b/f']) 
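For anyone who wants to run this outside IPython, here is a minimal standalone sketch of the same idea. It uses chain.from_iterable instead of chain(*...), which is my substitution and not part of the answer, and it sorts the result only to make the printed order predictable. Note that it yields just the shared strings, not the merged rows the question asks for.

```python
from itertools import chain

p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Flatten each list of rows into one stream of values, then intersect the sets.
common = set(chain.from_iterable(p)) & set(chain.from_iterable(q))

# Sets are unordered, so sort before printing.
print(sorted(common))  # ['target1/d/b/c', 'targetn/b/b/f']
```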

Or use a list comprehension with short-circuiting:

In [86]: [j for i in p for j in i if j in (z for y in q for z in y)] 
Out[86]: ['target1/d/b/c', 'targetn/b/b/f'] 

Or use any():

In [87]: [j for i in p for j in i if any (j==z for y in q for z in y)] 
Out[87]: ['target1/d/b/c', 'targetn/b/b/f'] 

timeit

In [93]: %timeit set(chain(*p)).intersection(set(chain(*q))) 
100000 loops, best of 3: 7.38 us per loop      ## winner 

In [94]: %timeit [j for i in p for j in i if j in (z for y in q for z in y)] 
10000 loops, best of 3: 24.9 us per loop 

In [95]: %timeit [j for i in p for j in i if any (j==z for y in q for z in y)] 
10000 loops, best of 3: 27.4 us per loop 

In [97]: %timeit [x for x in chain(*p) if x in chain(*q)] 
10000 loops, best of 3: 12.6 us per loop 
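The timings above come from three-element lists, so the gap may look different on real data. Below is a rough sketch for repeating the comparison with the standard timeit module. The synthetic data, its size, and the helper names with_sets and with_scan are all assumptions made for illustration; no timings are quoted because they depend on the machine.

```python
import timeit
from itertools import chain

# Synthetic data (an assumption): 200 pairs in p, 200 rows in q, 100 shared targets.
p = [('link%d' % i, 'target%d' % i) for i in range(200)]
q = [['target%d' % i, 'name%d' % i, i, i * 2] for i in range(100, 300)]

def with_sets():
    # Build each set once; membership tests are then hash lookups.
    return set(chain.from_iterable(p)) & set(chain.from_iterable(q))

def with_scan():
    # chain.from_iterable(q) is rebuilt and scanned for every x in p.
    return [x for x in chain.from_iterable(p) if x in chain.from_iterable(q)]

# Both approaches should agree on which values are shared.
assert with_sets() == set(with_scan())

print('sets:', timeit.timeit(with_sets, number=5))
print('scan:', timeit.timeit(with_scan, number=5))
```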
1

You should probably use a dictionary:

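# Map each target to its link (p holds (link, target) pairs).
target_to_link = dict((v, k) for (k, v) in p)
for item in q:
    if item[0] not in target_to_link:
        continue  # skip rows of q whose target has no link in p
    args = item + [target_to_link[item[0]]]
    do_some_job(*args)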

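The target_to_link dictionary lets you get the corresponding link from a target. Just make sure that no two links share the same target, since a later pair would overwrite the earlier one in the dictionary...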

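In the for loop, we just build a temporary list of arguments, args, that combines your item (for example, ['target1/d/b/c', 'target1', 123, 334]) with the corresponding link, and we call the function with the function(*args) syntax...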
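To make the function(*args) step concrete, here is a small sketch of argument unpacking. The do_some_job below is a hypothetical stand-in; the real function and its parameter names are not shown in the question.

```python
def do_some_job(target_path, target, a, b, link):
    # Hypothetical stand-in for the asker's function.
    return '%s -> %s (%d, %d)' % (link, target_path, a, b)

item = ['target1/d/b/c', 'target1', 123, 334]
args = item + ['link1/d/b/c']

# do_some_job(*args) passes the five list elements as five positional arguments.
result = do_some_job(*args)
print(result)  # link1/d/b/c -> target1/d/b/c (123, 334)
```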


If you need to loop over p instead, you can build a dictionary like

target_to_args = dict((k[0],k[1:]) for k in q) 

and then do something like

for (link, target) in p:
    if target not in target_to_args:
        continue  # skip pairs whose target does not appear in q
    args = [target] + target_to_args[target] + [link]
    do_some_job(*args)
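If the goal is the single pq list from the question, the same dictionary idea can build it in one pass. This is a minimal sketch, assuming the first field of each row in q is unique; rows_by_target is a name I made up for illustration.

```python
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Index q by its first field so each lookup is a single dict access.
rows_by_target = {row[0]: row for row in q}

# Keep only the pairs whose target appears in q, and append the link to the row.
pq = [rows_by_target[target] + [link]
      for link, target in p
      if target in rows_by_target]

print(pq)
```

Building the index costs one pass over q and the comprehension one pass over p, so the nested loop from the question is avoided. The resulting pq can then be passed to do_some_job in a single call.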
0

A list comprehension with chain should work:

from itertools import chain
[x for x in chain(*p) if x in chain(*q)]
+0

If you mean itertools.chain, it returns an iterator, so I am not sure "in" would work? In any case, the set-based approach from ashwini's solution will probably be faster – iruvar

+0

@cravoori 'in' works fine on iterables; it behaves like 'any()' and does short-circuit. See http://pastebin.com/scfnXTyY –

+0

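@AshwiniChaudhary, it works in this example only because a new iterator is created every time "if x in chain" is evaluated, which is obviously expensive. An iterator is exhausted after a single pass, which means iterators are not suitable for containment checks. Here is an example, http://pastebin.com/209fFHUn – iruvar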
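The exhaustion point can be reproduced with a few lines. This is my own small sketch, not the contents of the linked pastebin, and the toy data is made up.

```python
from itertools import chain

q = [['a', 'b'], ['c', 'd']]

it = chain.from_iterable(q)
first = 'c' in it    # True: the search consumes 'a', 'b' and 'c'
second = 'a' in it   # False: 'a' is gone, and the search drains the rest
third = 'd' in it    # False: the iterator is now empty
print(first, second, third)  # True False False

# A set can be queried repeatedly without being consumed.
members = set(chain.from_iterable(q))
print('a' in members, 'd' in members)  # True True
```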
