I'm using pandas to build a ring buffer, but its memory usage keeps growing. What am I doing wrong? Is there a memory leak when building a buffer with pandas?

Here is the code (edited slightly from the first version of this question):

import pandas as pd
import numpy as np
import resource


# preallocated 10000-row buffer
tempdata = np.zeros((10000, 3))
tdf = pd.DataFrame(data=tempdata, columns=['a', 'b', 'c'])

i = 0
while True:
    i += 1
    # new 1000-row chunk to push into the buffer
    littledf = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
    # drop the oldest 1000 rows and append the new chunk
    tdf = pd.concat([tdf[1000:], littledf], ignore_index=True)
    del littledf
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print 'total memory:%d kb' % (int(currentmemory) / 1000)

This is what I get:

total memory:37945 kb 
total memory:38137 kb 
total memory:38137 kb 
total memory:38768 kb 
total memory:38768 kb 
total memory:38776 kb 
total memory:38834 kb 
total memory:38838 kb 
total memory:38838 kb 
total memory:38850 kb 
total memory:38854 kb 
total memory:38871 kb 
total memory:38871 kb 
total memory:38973 kb 
total memory:38977 kb 
total memory:38989 kb 
total memory:38989 kb 
total memory:38989 kb 
total memory:39399 kb 
total memory:39497 kb 
total memory:39587 kb 
total memory:39587 kb 
total memory:39591 kb 
total memory:39604 kb 
total memory:39604 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39608 kb 
total memory:39612 kb 

I don't know whether it is related to this:

https://github.com/pydata/pandas/issues/2659

Tested on a MacBook Air with Anaconda Python.
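One caveat when comparing these numbers across machines (a general fact about resource.getrusage, not something stated in the original post): ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so a small helper such as this hypothetical max_rss_kb keeps the units consistent:

import resource
import sys


def max_rss_kb():
    # peak resident set size, normalised to kilobytes:
    # ru_maxrss is in bytes on macOS ('darwin') and in kilobytes on Linux
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == 'darwin':
        return rss // 1024
    return rss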

Strange, I copy-pasted this code and I get no leak, on 0.12 and 0.13rc. –

I added the output I get (and changed the code slightly). Do you see the same thing or something different? – Fra

I get "total memory:59 kb" all the way down. Maybe it's the OS/setup; adding more details might help. Although this might work better as a separate GitHub issue. Have you tried adding gc.collect, as in that other issue? –
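For completeness, here is a minimal sketch of the gc.collect suggestion from the comment above, applied to the original loop (illustrative only; whether it actually lowers the reported numbers was not verified here):

import gc

import numpy as np
import pandas as pd

tdf = pd.DataFrame(np.zeros((10000, 3)), columns=['a', 'b', 'c'])

for i in range(1, 10001):
    littledf = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
    tdf = pd.concat([tdf[1000:], littledf], ignore_index=True)
    del littledf
    gc.collect()  # explicitly run the garbage collector after each iteration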

Answer

Instead of using concat, why not update the DataFrame in place? i % 10 determines which 1000-row slot you write to on each update.

i = 0
while True:
    i += 1
    # overwrite one 1000-row slot in place instead of building a new frame
    tdf.iloc[1000 * (i % 10):1000 * (i % 10) + 1000] = np.random.rand(1000, 3)
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print 'total memory:%d kb' % (int(currentmemory) / 1000)
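The in-place slice assignment reuses the preallocated block, so no new DataFrame is created per iteration, which is what keeps the footprint flat. If the rows are later needed oldest-to-newest, the slots can be unrolled with something like this hypothetical in_order helper (the names in_order, slot_size and n_slots are not from the original answer, and it assumes the buffer has wrapped at least once):

import numpy as np
import pandas as pd


def in_order(tdf, i, slot_size=1000, n_slots=10):
    # i is the loop counter after the last write; the newest slot is i % n_slots,
    # so the oldest data starts at the beginning of the following slot
    start = (((i % n_slots) + 1) % n_slots) * slot_size
    rolled = np.roll(tdf.values, -start, axis=0)
    return pd.DataFrame(rolled, columns=tdf.columns)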