Is it possible to increase the amount of RAM a Python process can use?

I am running a classification/feature-extraction task on a Windows server with 64 GB of RAM, and somehow Python thinks I am running out of memory:

[email protected] /cygdrive/c/NaiveBayes
$ python run_classify_comments.py > tenfoldcrossvalidation.txt
Traceback (most recent call last):
  File "run_classify_comments.py", line 70, in <module>
    run_classify_comments()
  File "run_classify_comments.py", line 51, in run_classify_comments
    NWORDS = get_all_words("./data/HUGETEXTFILE.txt")
  File "run_classify_comments.py", line 16, in get_all_words
    def get_all_words(path): return words(file(path).read())
  File "run_classify_comments.py", line 15, in words
    def words(text): return re.findall('[a-z]+', text.lower())
  File "C:\Program Files (x86)\Python26\lib\re.py", line 175, in findall
    return _compile(pattern, flags).findall(string)
MemoryError

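So the re module runs out of memory on a machine with 64 GB of RAM... I don't think so... Why is this happening, and how can I configure Python to use all of the available memory on my machine?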


2011-06-14 josephmisiti

Is your version of Windows 64-bit? Is your Python build 64-bit? Have you checked how much memory the process actually uses? – 2011-06-14 20:16:45

"Program Files (x86)" suggests that Windows is 64-bit but Python is not – unbeli 2011-06-14 20:19:47

unbeli is correct – josephmisiti 2011-06-15 20:59:31
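A quick way to confirm what unbeli spotted is to ask the interpreter for its own pointer width. On Windows, a 32-bit process normally gets only 2 GB of user address space regardless of how much RAM is installed. A `MemoryError` at that size is therefore expected for a 32-bit Python. The sketch below is written for Python 3, and the helper names are hypothetical:

```python
import platform
import struct
import sys


def python_pointer_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    # "P" is a void pointer; its size is 4 bytes on 32-bit and 8 on 64-bit builds
    return struct.calcsize("P") * 8


def is_64bit_python():
    # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print("Interpreter architecture:", platform.architecture()[0])
    print("Pointer bits:", python_pointer_bits())
    print("64-bit Python:", is_64bit_python())
```

If this reports 32 bits, installing a 64-bit Python build lifts the address-space limit. Reading the file incrementally, as the answers below suggest, is still the more robust fix.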

Just rewrite the program so that it reads the text file one line at a time:

def get_all_words(path):
    # the start value [] lets sum() concatenate the per-line word lists
    return sum((words(line) for line in open(path)), [])

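Note the inner parentheses. They make this a generator expression, which is lazy and is evaluated only as sum() asks for each item. As a result, only one line of the file is read into memory at a time, although the final word list is still built in full. This is easy to do because only get_all_words(path) needs to change.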
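If the downstream code only needs to iterate over the words once (for example, to count them), the word list does not have to exist at all. The following is a minimal sketch assuming Python 3. It reuses the question's `words` tokenizer, and `iter_all_words` is a hypothetical name. The function yields words one by one, so memory use is bounded by the longest line rather than by the file size:

```python
import re


def words(text):
    # Same tokenizer as in the question: runs of lowercase ASCII letters
    return re.findall('[a-z]+', text.lower())


def iter_all_words(path):
    """Yield words lazily; only the current line of the file is held in memory."""
    with open(path) as f:
        for line in f:  # file objects iterate line by line
            yield from words(line)
```

For example, `collections.Counter(iter_all_words(path))` builds word frequencies without ever materializing the full text or the full word list. One caveat: a file with no newlines would still be read as a single huge "line".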


2011-06-14 20:27:53

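It looks like the problem is that the entire text is read into memory and then turned into a list of words by re.findall(). Are you reading more than 64 GB of text this way? Depending on how your NaiveBayes algorithm is implemented, you may be better off building your frequency dictionary incrementally, so that only the dictionary is kept in memory rather than the entire text. More information about your implementation would help answer your question more directly.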
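An incremental frequency dictionary along those lines might look like the sketch below. It assumes Python 2.7+ or 3 (for `collections.Counter`) and the same `[a-z]+` tokenization as the question, and `count_words` is a hypothetical name. Memory then grows with the size of the vocabulary, not with the size of the text:

```python
import collections
import re

WORD_RE = re.compile('[a-z]+')  # same pattern as the question's words()


def count_words(path):
    """Build word frequencies line by line; only the Counter grows with the data."""
    counts = collections.Counter()
    with open(path) as f:
        for line in f:
            # tokenize one line at a time and fold it into the running totals
            counts.update(WORD_RE.findall(line.lower()))
    return counts
```

Usage: `count_words("./data/HUGETEXTFILE.txt").most_common(10)` returns the ten most frequent words and their counts.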


2011-06-15 14:45:45 dmh

I actually fixed it by calling "del" inside the loop that generates the features (during cross-validation) – josephmisiti 2011-06-15 20:58:57
