2016-08-18 89 views
2

我试图做Python列表过滤性能

In [21]: l1 = range(1,1000000) 

In [22]: l2 = range(100,90000) 

In [23]: l1.append(101) 

In [24]: print(set([x for x in l1 if l1.count(x) - l2.count(x) == 1])) 

在我的Python壳这需要年龄。一般来说,我的目标是从第二个减去一个列表,同时照顾重复。

e.g

[1,2,2,3] - [2,3] = [1,2] 

我会为任何提示如何得到这个最大500毫秒做了常规的单核设备上非常高兴。

+0

你关心结果列表中的元素排序吗? – soon

+0

检查:http://stackoverflow.com/questions/4211209/remove-all-the-elements-that-occur-in-one-list-from-another – pathoren

+0

你是否只想保留一次显示的元素l1比l2?如果是这样,那么在l1中为每个事件保留这个元素? – Moberg

回答

0

我了解到,合并模式是速度更快(读http://openbookproject.net/thinkcs/python/english3e/list_algorithms.html#alice-in-wonderland-again):

6 def substract_list(origin, substract): 
    7  """ 
    8  example: db tells us that user bought shares 1, 2, 2, 3, 4 and also sold 
    9  1, 2, 2. we need to know that he now owns 3, 4 for 10m item 
10 
11  learned from here: 
12  http://openbookproject.net/thinkcs/python/english3e/list_algorithms.html#alice-in-wonderland-again 
13 
14  lists need to be sorted and can contain duplicates 
15  """ 
16  result = [] 
17  xi = 0 # @origin 
18  yi = 0 # @subscract 
19 
20  len_substract = len(substract) 
21  len_origin = len(origin) 
22 
23  while True: 
24   # reached end of substract, append rest of origin, return 
25   if yi >= len_substract: 
26    result.extend(origin[xi:]) 
27    return result 
28 
29   # reached end of origin, return 
30   if xi >= len_origin: 
31    return result 
32 
33   # step throught the values 
34   if origin[xi] == substract[yi]: 
35    # equal values, next one pls 
36    yi += 1 
37    xi += 1 
38   elif origin[xi] > substract[yi]: 
39    yi += 1 
40   else: 
41    result.append(origin[xi]) 
42    xi += 1 

计数器在我的机器上做到了:3-4s,合并模式:0.3s

5

非保序使用collections.Counter

from collections import Counter 

a = Counter([1, 2, 2, 3]) 
b = Counter([2, 3]) 
res = list(a - b) 
# [1, 2] 

此操作,因为一个Counter-方法移除从输出其中b存在的计数大于在a计数等于或大于任何元件。

订购使用OrderedCounter保持,然后手动生成列表,例如:

from collections import Counter, OrderedDict 

class OrderedCounter(Counter, OrderedDict): 
    pass 

a = OrderedCounter([3, 2, 2, 1]) 
b = Counter([2, 3]) 
res = [k for k, v in a.items() if v - b[k] > 0] 
# [2, 1] 

最后,如果原来的范围包含非唯一值,以及你想要的元素重复的次数是减法,那么后遗留下来的:

from collections import Counter, OrderedDict 

class OrderedCounter(Counter, OrderedDict): 
    pass 

a = OrderedCounter([3, 3, 2, 2, 2, 1]) 
b = Counter([2, 3]) 
res = [k for k, v in a.items() for _ in range(v - b[k])] 
# [3, 2, 2, 1] 
+0

您击败了我:) – DeepSpace

+0

我不关心订单。所以我验证了你的第一个方法,并在2015年macbook pro上使用git 1.6s渲染时间。 thx为快速和完美的帮助! – patroqueeet

0

我想我会用Counter,然后将其转换减去两个计数器来排序列表的结果:

from collections import Counter 

l1 = [1, 2, 2, 3] 
l2 = [2, 3] 

counter_l1 = Counter(l1) 
counter_l2 = Counter(l2) 

res = counter_l1 - counter_l2 

print(sorted(res)) 
>> [1, 2] 
0

第一计数的元素:

from collections import defaultdict 

count1 = defaultdict(int) 
count2 = defaultdict(int) 

for x in l1: 
    count1[x] += 1 
for x in l2: 
    count2[x] += 1 

print([x for x, count in count1.iteritems() if count - count2[x] == 1]) 

(不要忘记l1转换为一组的最后一行)

上面的代码需要625ms我的机器上。 (不打印结果到stdout)

+0

请注意:元素可能会在l1中出现多次。或者,我们可以迭代count1键。其实,这是一个更好的主意,我会编辑我的答案。 –

+0

当你拥有'count1'和'count2'后,你不再需要担心原来的列表...... –

+0

是的,这是正确的。原始列表仅用于维护订单。 –

0

执行时间:

import time 
from functools import wraps 
from collections import defaultdict 
from collections import Counter, OrderedDict 
class OrderedCounter(Counter, OrderedDict):pass 

def tefn(fn): 
    t = time.time() 
    set(fn()) 
    return t - time.time() 

def efn(fn): 
    t = time.time() 
    fn() 
    return t - time.time() 

def runTime(tp=0, count=10): 
    def dec(fn): 
     @wraps(fn) 
     def wrap(): 
      print(fn) 
      res = tefn if tp else efn 
      return sum(res(fn) for _ in range(count))/count 
     return wrap 
    return dec 


@runTime(tp=1) 
def fnList(): 
    return [x for x in l1 if l1.count(x) - l2.count(x) == 1] 
@runTime(tp=1) 
def fnIter(): 
    return iter(x for x in l1 if l1.count(x) - l2.count(x) == 1) 
@runTime(tp=1) 
def fnYield(): 
    for x in l1: 
     if l1.count(x) - l2.count(x) == 1: 
      yield x 
@runTime() 
def fnCounter(): 
    a = Counter(l1) 
    b = Counter(l2) 
    return list(a - b) 
@runTime() 
def fnOrderedCounter(): 
    a = OrderedCounter(l1) 
    b = Counter(l2) 
    return [k for k, v in a.items() if v - b[k] > 0] 
@runTime() 
def fnDefaultdict(): 
    count1 = defaultdict(int) 
    count2 = defaultdict(int) 
    for x in l1: count1[x] += 1 
    for x in l2: count2[x] += 1 
    return [x for x, count in count1.items() if count - count2[x] == 1] 


if __name__ == '__main__': 
    l1 = range(1, 1000000) 
    l2 = range(100, 90000) 
    g = globals() 

    result = list((fn.__name__, fn()) for fn in (g[f] for f in g if f.startswith('fn'))) 
    result.sort(key=lambda r: r[1], reverse=True) 
    for e, r in enumerate(result): 
     print(e, r) 

OUT:

<function fnList at 0x02F17540> 
<function fnIter at 0x034C8DF8> 
<function fnCounter at 0x034C8F18> 
<function fnDefaultdict at 0x034CB078> 
<function fnYield at 0x034C8E88> 
<function fnOrderedCounter at 0x034C8FA8> 
0 ('fnYield', -0.8542306900024415) 
1 ('fnList', -0.8605266094207764) 
2 ('fnIter', -0.8655695915222168) 
3 ('fnDefaultdict', -1.054802918434143) 
4 ('fnCounter', -1.3413111925125123) 
5 ('fnOrderedCounter', -5.433168196678162) 
+0

亲爱的,我不确定你的代码。介意添加一些词语来解释你的方法和结果? – patroqueeet