2010-09-13 36 views
56
a = [1,2,3,4,5] 
b = [1,3,5,6] 
c = a and b 
print c 

实际输出:[1,3,5,6] 预期输出:[1,3,5]如何查找列表交集?

我们如何才能实现对两个名单布尔与运算(单路口)?

+3

这里的问题是,'和B'就像下面的[从文档语句](http://docs.python.org/reference/expressions.html#boolean-operations)提到的那样:“_The表达'X和y'首先评估'x';如果'x'是假的,它的值被返回;否则,'y'进行评价,将所得值是returned._” – Tadeck 2011-12-06 09:52:27

回答

123

如果顺序并不重要,你不需要担心重复,那么你可以使用交集:

>>> a = [1,2,3,4,5] 
>>> b = [1,3,5,6] 
>>> list(set(a) & set(b)) 
[1, 3, 5] 
0

如果布尔AND,你的意思项目同时出现在列表中,例如那么你应该看看Python的setfrozenset类型。

10

让一组出中较大的:

_auxset = set(a) 

然后,

c = [x for x in b if x in _auxset] 

会做你想要什么(保留b的排序,而不是a的 - 不能必然保留)并做它快速。 (使用if x in a作为列表理解中的条件也是可行的,并且避免需要构建_auxset,但不幸的是,对于长度很长的列表来说,它会慢很多)。

如果你想要的结果进行排序,而不是保留任一列表的排序,甚至更合适的方法可能是:

c = sorted(set(a).intersection(b)) 
27

如果转换两个列表的较大成一组,你可以得到该集合的交集与使用intersection()任何可迭代:

a = [1,2,3,4,5] 
b = [1,3,5,6] 
set(a).intersection(b) 
+1

这是比'列表中的任何不同(组(一)及设定(b)中)' – user1767754 2017-07-03 19:21:43

5

一个= [1,2,3,4,5] b = [1,3,5,6] C =列表(集( a)。交点(set(b)))

应该像梦一样工作。而且,如果可以的话,使用集合而不是列表来避免所有类型的变化!

7

使用列表解析对我来说是非常明显的。不确定的表现,但至少事情留在名单。

[x for x in a if x in b]

或 “是在A中,如果X值是B中的所有x值”。

+0

这似乎这保持顺序的最Python的。不知道为什么这不是upvoted更高! thx为伟大的解决方案! – 2018-03-01 06:15:28

5

下面是一些Python 2/Python 3代码,可以为基于列表和基于集合的方法生成计时信息,以找到两个列表的交集。

纯粹的列表理解算法是O(n^2),因为列表上的in是线性搜索。基于集合的算法是O(n),因为集合搜索是O(1),并且集合创建是O(n)(并且将集合转换成列表也是O(n))。因此,对于足够大的集合,基于集合的算法速度更快,但对于小集合n,创建集合的开销使得它们比纯列表comp算法慢。

#!/usr/bin/env python 

''' Time list- vs set-based list intersection 
    See http://stackoverflow.com/q/3697432/4014959 
    Written by PM 2Ring 2015.10.16 
''' 

from __future__ import print_function, division 
from timeit import Timer 

setup = 'from __main__ import a, b' 
cmd_lista = '[u for u in a if u in b]' 
cmd_listb = '[u for u in b if u in a]' 

cmd_lcsa = 'sa=set(a);[u for u in b if u in sa]' 

cmd_seta = 'list(set(a).intersection(b))' 
cmd_setb = 'list(set(b).intersection(a))' 

reps = 3 
loops = 50000 

def do_timing(heading, cmd, setup): 
    t = Timer(cmd, setup) 
    r = t.repeat(reps, loops) 
    r.sort() 
    print(heading, r) 
    return r[0] 

m = 10 
nums = list(range(6 * m)) 

for n in range(1, m + 1): 
    a = nums[:6*n:2] 
    b = nums[:6*n:3] 
    print('\nn =', n, len(a), len(b)) 
    #print('\nn = %d\n%s %d\n%s %d' % (n, a, len(a), b, len(b))) 
    la = do_timing('lista', cmd_lista, setup) 
    lb = do_timing('listb', cmd_listb, setup) 
    lc = do_timing('lcsa ', cmd_lcsa, setup) 
    sa = do_timing('seta ', cmd_seta, setup) 
    sb = do_timing('setb ', cmd_setb, setup) 
    print(la/sa, lb/sa, lc/sa, la/sb, lb/sb, lc/sb) 

输出

n = 1 3 2 
lista [0.082171916961669922, 0.082588911056518555, 0.0898590087890625] 
listb [0.069530963897705078, 0.070394992828369141, 0.075379848480224609] 
lcsa [0.11858987808227539, 0.1188349723815918, 0.12825107574462891] 
seta [0.26900982856750488, 0.26902294158935547, 0.27298116683959961] 
setb [0.27218389511108398, 0.27459001541137695, 0.34307217597961426] 
0.305460649521 0.258469975867 0.440838458259 0.301898526833 0.255455833892 0.435697630214 

n = 2 6 4 
lista [0.15915989875793457, 0.16000485420227051, 0.16551494598388672] 
listb [0.13000702857971191, 0.13060092926025391, 0.13543915748596191] 
lcsa [0.18650484085083008, 0.18742108345031738, 0.19513416290283203] 
seta [0.33592700958251953, 0.34001994132995605, 0.34146714210510254] 
setb [0.29436492919921875, 0.2953648567199707, 0.30039691925048828] 
0.473793098554 0.387009751735 0.555194537893 0.540689066428 0.441652573672 0.633583767462 

n = 3 9 6 
lista [0.27657914161682129, 0.28098297119140625, 0.28311991691589355] 
listb [0.21585917472839355, 0.21679902076721191, 0.22272896766662598] 
lcsa [0.22559309005737305, 0.2271728515625, 0.2323150634765625] 
seta [0.36382699012756348, 0.36453008651733398, 0.36750602722167969] 
setb [0.34979605674743652, 0.35533690452575684, 0.36164689064025879] 
0.760194128313 0.59330170819 0.62005595016 0.790686848184 0.61710008036 0.644927481902 

n = 4 12 8 
lista [0.39616990089416504, 0.39746403694152832, 0.41129183769226074] 
listb [0.33485794067382812, 0.33914685249328613, 0.37850618362426758] 
lcsa [0.27405810356140137, 0.2745978832244873, 0.28249192237854004] 
seta [0.39211201667785645, 0.39234519004821777, 0.39317893981933594] 
setb [0.36988520622253418, 0.37011313438415527, 0.37571001052856445] 
1.01034878821 0.85398540833 0.698928091731 1.07106176249 0.905302334456 0.740927452493 

n = 5 15 10 
lista [0.56792402267456055, 0.57422614097595215, 0.57740211486816406] 
listb [0.47309303283691406, 0.47619009017944336, 0.47628307342529297] 
lcsa [0.32805585861206055, 0.32813096046447754, 0.3349759578704834] 
seta [0.40036201477050781, 0.40322518348693848, 0.40548801422119141] 
setb [0.39103078842163086, 0.39722800254821777, 0.43811702728271484] 
1.41852623806 1.18166313332 0.819398061028 1.45237674242 1.20986133789 0.838951479847 

n = 6 18 12 
lista [0.77897095680236816, 0.78187918663024902, 0.78467702865600586] 
listb [0.629547119140625, 0.701942443848, 0.63321495056152344] 
lcsa [0.36563992500305176, 0.36638498306274414, 0.38175487518310547] 
seta [0.46695613861083984, 0.46992206573486328, 0.47583580017089844] 
setb [0.47616910934448242, 0.47661614418029785, 0.4850609302520752] 
1.66818870637 1.34819326075 0.783028414812 1.63591241329 1.32210827369 0.767878297495 

n = 7 21 14 
lista [0.9703209400177002, 0.9734041690826416, 1.0182771682739258] 
listb [0.82394003868103027, 0.82625699043273926, 0.82796716690063477] 
lcsa [0.40975093841552734, 0.41210508346557617, 0.42286920547485352] 
seta [0.5086359977722168, 0.50968098640441895, 0.51014018058776855] 
setb [0.48688101768493652, 0.4879908561706543, 0.49204087257385254] 
1.90769222837 1.61990115188 0.805587768483 1.99293236904 1.69228211566 0.841583309951 

n = 8 24 16 
lista [1.204819917678833, 1.2206029891967773, 1.258256196975708] 
listb [1.014998197555542, 1.0206191539764404, 1.0343101024627686] 
lcsa [0.50966787338256836, 0.51018595695495605, 0.51319599151611328] 
seta [0.50310111045837402, 0.50556015968322754, 0.51335406303405762] 
setb [0.51472997665405273, 0.51948785781860352, 0.52113485336303711] 
2.39478683834 2.01748351664 1.01305257092 2.34068341135 1.97190418975 0.990165516871 

n = 9 27 18 
lista [1.511646032333374, 1.5133969783782959, 1.5639569759368896] 
listb [1.2461750507354736, 1.254518985748291, 1.2613379955291748] 
lcsa [0.5565330982208252, 0.56119203567504883, 0.56451296806335449] 
seta [0.5966339111328125, 0.60275578498840332, 0.64791703224182129] 
setb [0.54694414138793945, 0.5508568286895752, 0.55375313758850098] 
2.53362406013 2.08867620074 0.932788243907 2.76380331728 2.27843203069 1.01753187594 

n = 10 30 20 
lista [1.7777848243713379, 2.1453688144683838, 2.4085969924926758] 
listb [1.5070111751556396, 1.5202279090881348, 1.5779800415039062] 
lcsa [0.5954139232635498, 0.59703707695007324, 0.60746097564697266] 
seta [0.61563014984130859, 0.62125110626220703, 0.62354087829589844] 
setb [0.56723213195800781, 0.57257509231567383, 0.57460403442382812] 
2.88774814689 2.44791645689 0.967161734066 3.13413984189 2.6567803378 1.04968299523 

使用2GHz的单核机的Linux上的一个Debian风味运行的Python 2.6.6 RAM 2GB(与Firefox在后台运行)中产生。

这些数字只是粗略的指导,因为各种算法的实际速度是由是在两个源列表元素的比例不同的影响。可使用filterlambda操作者来实现

-2

的功能的方式。

list1 = [1,2,3,4,5,6] 

list2 = [2,4,6,9,10] 

>>> filter(lambda x:x in list1, list2) 

[2, 4, 6] 

编辑:过滤掉存在于两个列表1和表X,差集,也可以使用实现:

>>> filter(lambda x:x not in list1, list2) 
[9,10] 
+0

请使用[编辑]链接来解释这个代码是如何工作的,不只是给的代码,作为解释更容易帮助未来的读者。另见[回答]。 [源(http://stackoverflow.com/users/5244995) – 2017-03-29 14:17:40