How do I make Python use the same memory for all identical strings?

Possible duplicate:
What does python intern do, and when should it be used?

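I am working on a Python program that has to handle an array of millions of string objects. I discovered that if they all come from the same quoted string, each additional "string" is simply a reference to the first, master string. However, if the strings are read from a file and are all equal, each one still requires a new memory allocation.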

That is, this takes about 14 MB of storage:

a = ["foo" for a in range(0,1000000)]

While this takes more than 65 MB of storage:

a = ["foo".replace("o","1") for a in range(0,1000000)]

Now I can make it take much less memory with this:

s = {"f11":"f11"} 
a = [s["foo".replace("o","1")] for a in range(0,1000000)]

But this seems silly. Is there a simpler way to do this?

2012-08-05 vy32

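@Maulwurfn, just because the answers are the same doesn't mean the questions are the same. – 2012-08-05 17:16:48

Why not store the value of the 'replace' operation first? – JBernardo 2012-08-05 17:17:05

How are you measuring the size of the lists? If I use 'sys.getsizeof(["foo" for a in range(0,1000000)])' I get the same size as for 'sys.getsizeof(["foo".replace("o","1") for a in range(0,1000000)])', at least in Python 3.2 – 2012-08-05 18:54:32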
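
A note on the measurement question: `sys.getsizeof` on a list counts only the list object and its array of references, not the strings it points to, which would explain why both lists report the same size. Here is a rough Python 3 sketch of a measurement that also counts each distinct string object once. The helper name `deep_list_size` is made up for illustration, and the exact byte counts depend on the Python version and platform.

```python
import sys

def deep_list_size(items):
    """Size of the list plus each distinct element object, counted once."""
    seen = {}
    for obj in items:
        seen[id(obj)] = obj  # key by identity, so shared objects count once
    return sys.getsizeof(items) + sum(sys.getsizeof(o) for o in seen.values())

n = 100000
shared = ["foo" for _ in range(n)]                      # one string object, n references
separate = ["foo".replace("o", "1") for _ in range(n)]  # CPython builds a new string per call

# The list shells alone are the same size ...
print(sys.getsizeof(shared) == sys.getsizeof(separate))
# ... but counting the referenced strings shows the difference.
print(deep_list_size(shared), deep_list_size(separate))
```

This only approximates what the process really uses, since it ignores allocator overhead, but it is enough to show where the extra memory goes.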

Just call intern(), which tells Python to store the string and fetch it from memory:

a = [intern("foo".replace("o","1")) for a in range(0,1000000)]

This also comes to around 18 MB, the same as the first example.

Also note the comment below if you are using Python 3. Thx @Abe Karplus

2012-08-05 17:31:57 erikbwork

Note that in Python 3, 'intern' has been renamed to 'sys.intern'. – 2012-08-05 17:32:35

+1 I didn't know about 'intern()'. – 2012-08-05 17:54:42

Thanks very much. I didn't know about intern. Yes, I'm using Python 3, so I need to use sys.intern(). – vy32 2012-08-05 20:50:07
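
For Python 3 readers, the answer's one-liner could look roughly like the sketch below. It is an illustration rather than the answerer's own code, and it uses a smaller list than the question so it runs quickly. The count for the non-interned list reflects CPython behaviour and is not guaranteed by the language.

```python
import sys

n = 100000

# Without interning, each replace() call typically yields a separate string object.
plain = ["foo".replace("o", "1") for _ in range(n)]

# sys.intern() returns the single canonical copy for each distinct string value.
interned = [sys.intern("foo".replace("o", "1")) for _ in range(n)]

print(len({id(s) for s in plain}))     # number of distinct objects, usually n in CPython
print(len({id(s) for s in interned}))  # 1
```

For strings read from a file, the same call can be applied to each line as it is read, for example `sys.intern(line.rstrip("\n"))`.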

You could try something like this:

strs = ["this is string1", "this is string2", "this is string1", "this is string2",
        "this is string3", "this is string4", "this is string5", "this is string1",
        "this is string5"]
new_strs = []
for x in strs:
    if x in new_strs:
        # find the index of the equal string already stored and append
        # a reference to it instead of appending the string itself
        new_strs.append(new_strs[new_strs.index(x)])
    else:
        new_strs.append(x)

print([id(y) for y in new_strs])

Strings that are equal will now have the same id()

Output:

[18632400, 18632160, 18632400, 18632160, 18651400, 18651440, 18651360, 18632400, 18651360]

2012-08-05 17:21:20

Nice idea. Unfortunately, it is an O(n**2) algorithm, and it gets very slow as the list grows. – 2012-08-05 17:23:00

Keeping a dictionary of the strings seen so far should work

new_strs = []
str_record = {}  # maps each distinct string value to the first object seen with it
for x in strs:
    if x not in str_record:
        str_record[x] = x
    new_strs.append(str_record[x])

(Untested)
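
A self-contained Python 3 sketch of the same dictionary idea, with a quick check that equal strings end up sharing one object. The function name `dedupe_strings` and the sample data are made up for illustration. The strings are built at run time so that they start out as separate objects, much like strings read from a file.

```python
def dedupe_strings(strs):
    """Return a list in which equal strings are the same object (one O(n) pass)."""
    canonical = {}
    # setdefault stores s the first time a value is seen and
    # returns the stored copy on every later occurrence
    return [canonical.setdefault(s, s) for s in strs]

# Nine strings with three distinct values, each created separately at run time.
strs = ["this is string%d" % (i % 3) for i in range(9)]
new_strs = dedupe_strings(strs)

print(len({id(s) for s in strs}))      # typically 9 distinct objects in CPython
print(len({id(s) for s in new_strs}))  # 3
```

Unlike `sys.intern`, the dictionary can be dropped once the list is built, so the canonical copies live only as long as the list that refers to them.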

2012-08-05 17:29:26
