Python2字典中的非单调内存消耗

有人可以解释CPython 2.7中字典的非单调内存使用情况吗？Python2字典中的非单调内存消耗

>>> import sys 
>>> sys.getsizeof({}) 
280 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}) 
280 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6}) 
1048 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7}) 
1048 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'e 
ight': 8}) 
664 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'e 
ight': 8, 'nine': 9}) 
664

Python3是合理的在这里，它打印的{'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7}大小为480

我想这在Ubuntu 15.10和OS X 10.11。

来源

2016-04-30 Dmytro Sirenko

https://github.com/python/cpython/blob/2.7/Objects /dictobject.c –

TLDR：6和7条目字典文字预设哈希表严重，然后在调整大小上四倍的大小。

当CPython的2.7评估一个字典文字，它开始在条目填充之前，它使用来创建字典操作码是BUILD_MAP。这需要一个参数，对于字典有多少条目包含一个提示，which it uses to presize the dict：

TARGET(BUILD_MAP) 
    { 
     x = _PyDict_NewPresized((Py_ssize_t)oparg); 
     PUSH(x); 
     if (x != NULL) DISPATCH(); 
     break; 
    }

这是为了尽量减少时间创建过程中字典的大小调整的数量，但由于他们没有考虑的加载因子，它并不完全消除调整大小。

如source code comments所示，_PyDict_NewPresized旨在“创建预先确定大小以容纳估计数量的元素的新字典”。创建的字典中的哈希表的确切大小受到许多实现细节的影响，例如最小大小（#define PyDict_MINSIZE 8）以及大小为2的幂的要求（以避免在实现中需要划分）。

对于dict文字多达7个条目，_PyDict_NewPresized初始化8项哈希表;对于8个条目，它初始化一个16条目的散列表，因为它使用的调整大小例程总是选择比参数大的容量。

Dicts resize on insertion when they become at least 2/3 full.对于6-和7-条目字典的文字，该字典内开始了与8个条目，所以在第六插入发生调整大小。该字典内足够小，调整大小四倍的哈希表的大小：

return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);

mp->ma_used是在这一点上在哈希表中，6中使用的条目的数量。 6小于50000，所以我们称dictresize(mp, 4 * 6)，它将散列表的大小调整为32个条目，其中2的最小功率大于24。

相比之下，对于8项词典文字，散列表从16个条目开始。该字典在创建过程中不会变为2/3，因此最初的16条散列表在字典创建后仍然存在，并且结果字典小于6和7条字典字面值。

的Python 3使用different growth policy，其他字典实现变化中，这就是为什么你在Python看到不同的结果3.

来源

2017-02-09 00:22:53 user2357112

嗯，我已经尝试了一下，让我们来看看：

dct = {'four': 3, 'three': 2, 'two': 1, 'one': 0} 
print(sys.getsizeof(dct))        # = 272 
print(sys.getsizeof(dict(dct)))      # = 272 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 272 

dct = {'four': 3, 'three': 2, 'five': 4, 'two': 1, 'one': 0} 
print(sys.getsizeof(dct))        # = 272 
print(sys.getsizeof(dict(dct)))      # = 272 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 272 

dct = {'six': 5, 'three': 2, 'two': 1, 'four': 3, 'five': 4, 'one': 0} 
print(sys.getsizeof(dct))        # = 1040 
print(sys.getsizeof(dict(dct)))      # = 656 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 1040 

dct = {'seven': 6, 'six': 5, 'three': 2, 'two': 1, 'four': 3, 'five': 4, 'one': 0} 
print(sys.getsizeof(dct))        # = 1040 
print(sys.getsizeof(dict(dct)))      # = 656 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 1040 

dct = {'seven': 6, 'six': 5, 'three': 2, 'two': 1, 'four': 3, 'five': 4, 'eight': 7, 'one': 0} 
print(sys.getsizeof(dct))        # = 656 
print(sys.getsizeof(dict(dct)))      # = 1040 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 1040

我不知道什么样的优化是在这里发生的，但我认为这是因为这些结构使用不同的“最佳实践”。我的意思是何时为散列表分配多少内存。例如，如果你有十个一个或多个元素，你得到另一个奇怪的差异：

dct = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10:10, 11:11} 
print(sys.getsizeof(dct))        # = 1808 
print(sys.getsizeof(dict(dct)))      # = 1040 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 1040

所以这可能只是某种内存消耗“优化”创建以不同的方式字典的时候，为什么会存在这种不单调异常对于使用6或7个元素时的字面语法：我不知道。也许一些内存优化出了问题，它是一个错误，它分配了太多的内存？我还没有阅读源代码（还）。

来源

2016-04-30 13:57:52 MSeifert

Python2字典中的非单调内存消耗

回答

相关问题