2014-09-22 35 views

回答

2

CPython中使用this check来决定何时调整:

if (!(so->used > n_used && so->fill*3 >= (so->mask+1)*2)) 

这基本上意味着,当2/3满,容器将调整。

调整大小本身一倍的大集空间量和四倍它小的:

return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4); 

阿明·里戈在PyPy实现其套使用词典的评论所指出的,所以relevant resizing code是:

jit.conditional_call(d.resize_counter <= x * 3, 
        _ll_dict_resize_to, d, num_extra) 

这是相同的策略,因为resize_counter是词典中留下的空白空间。


在此之前,我采取了基准。您可以通过查找非常小的暂停来检测调整大小。对于小尺寸这是有些随意的,所以你必须小心去除噪音。下面是做一个脚本:

from collections import Counter 
import time 

iterations = 100 
internal_iterations = 100000 

def main(): 
    resizes = Counter() 

    for i in range(iterations): 
     print("Completion: [{:3.0%}]\r".format(i/iterations), end="") 

     s = set() 
     maxtime_so_far = 0 
     for j in range(internal_iterations): 
      start = time.time() 
      s.add(j) 
      end = time.time() 

      if end-start > maxtime_so_far: 
       maxtime_so_far = end-start 
       resizes[j] += 1 

    print() 

    format_string = "{0:<6} = 0b{0:<18b} [found %: {1:2.0%}]" 

    for index in sorted(resizes): 
     chance = resizes[index]/iterations 

     if chance >= 0.05: 
      print(format_string.format(index, chance)) 

main() 

,这里是对CPython的输出:

Completion: [99%] 
0  = 0b0     [found %: 100%] 
5  = 0b101    [found %: 83%] 
21  = 0b10101    [found %: 12%] 
85  = 0b1010101   [found %: 94%] 
341 = 0b101010101   [found %: 97%] 
1365 = 0b10101010101  [found %: 100%] 
5461 = 0b1010101010101  [found %: 100%] 
21845 = 0b101010101010101 [found %: 100%] 
87381 = 0b10101010101010101 [found %: 100%] 

你可以看到10101...₂模式,这是你从除以3 2的幂,会得到什么这对象将调整大小。 (之后它会调整大小,因为脚本是0索引的)。

在PyPy3,改变迭代的次数是大于(iterations = 1000; internal_iterations = 100000),我得到

Completion: [100%] 
0  = 0b0     [found %: 78%] 
5  = 0b101    [found %: 6%] 
21  = 0b10101    [found %: 5%] 
341 = 0b101010101   [found %: 24%] 
1365 = 0b10101010101  [found %: 66%] 
5461 = 0b1010101010101  [found %: 100%] 
21845 = 0b101010101010101 [found %: 100%] 
87381 = 0b10101010101010101 [found %: 71%] 

这意味着该战略是PyPy一样。根据迭代次数

Completion: [100%] 
0  = 0b0     [found %: 13%] 
5  = 0b101    [found %: 11%] 
21  = 0b10101    [found %: 22%] 
22  = 0b10110    [found %: 6%] 
23  = 0b10111    [found %: 5%] 
24  = 0b11000    [found %: 5%] 
341 = 0b101010101   [found %: 30%] 
1365 = 0b10101010101  [found %: 66%] 
5461 = 0b1010101010101  [found %: 98%] 

奇怪的是,可能是由于JIT或GC,有时我得到更多的东西一样。我想这是由于围绕该点的迭代次数相对较少,并且这可能并不意味着太多。如果GC收集发生在项目20的附近,则容易造成这种压力。

+1

调整设置从未在'setobject.py'中。相反,它在[字典​​的RPython实现](https://bitbucket.org/pypy/pypy/raw/default/rpython/rtyper/lltypesystem/rdict.py)中。集合只是RPython的“void”值的字典。 – 2014-09-22 09:43:52

+0

太好了,谢谢:)。这使事情更加明显。 – Veedrac 2014-09-22 10:16:30

相关问题