2013-08-17

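I'm trying to download a large file (2.5 GB) from Google Cloud Storage using the code sample provided with the GS Python library. It works for smaller files (I've tested it on a few 1-2 KB files). I'm running this with Python on Windows 7.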

import os
import StringIO
import boto

dest_dir = 'c:\\downloadfolder'   # local destination folder
networkbucket = 'bucketname'      # placeholder: the name of your bucket

uri = boto.storage_uri(networkbucket, 'gs')
for obj in uri.get_bucket():
    print obj.name
    name = str(obj.name)
    local_dst_uri = boto.storage_uri(os.path.join(dest_dir, name), 'file')
    # The whole object is buffered in an in-memory StringIO first...
    object_contents = StringIO.StringIO()
    src_uri = boto.storage_uri(networkbucket + '/' + name, 'gs')
    src_uri.get_key().get_file(object_contents)
    object_contents.seek(0)
    # ...and only then written to the local file
    local_dst_uri.new_key().set_contents_from_file(object_contents)
    object_contents.close()

With Python 2.7.5, I get a MemoryError:

Traceback (most recent call last):
  File "C:\folder\GS_Transfer.py", line 52, in <module>
    src_uri.get_key().get_file(object_contents)
  File "C:\gsutil\third_party\boto\boto\gs\key.py", line 165, in get_file
    query_args=query_args)
  File "C:\gsutil\third_party\boto\boto\s3\key.py", line 1455, in _get_file_internal
    for bytes in self:
  File "C:\gsutil\third_party\boto\boto\s3\key.py", line 364, in next
    data = self.resp.read(self.BufferSize)
  File "C:\gsutil\third_party\boto\boto\connection.py", line 414, in read
    return httplib.HTTPResponse.read(self, amt)
  File "C:\Python27\lib\httplib.py", line 567, in read
    s = self.fp.read(amt)
  File "C:\Python27\lib\socket.py", line 400, in read
    buf.write(data)
MemoryError: out of memory

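I can download the file fine from the command line with gsutil.py cp. How should I modify this code? I've been trying to find a way to download the file in parts, but I don't know how.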
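Downloading "in parts" mostly means never holding more than one piece in memory: read a fixed-size chunk from the source stream, write it straight to the destination file, repeat. The traceback shows boto already reads the response in BufferSize chunks; the problem is only that each chunk is appended to a StringIO. Below is a minimal sketch of the chunked loop, assuming any file-like source with a read(size) method. The function name copy_in_chunks and its progress callback are hypothetical, not part of boto.

```python
def copy_in_chunks(src, dst, chunk_size=1024 * 1024, progress=None):
    """Copy src to dst one chunk at a time and return the number of bytes copied.

    At most chunk_size bytes are held in memory at any moment.
    progress is an optional callable(bytes_so_far), similar in spirit to
    boto's cb argument (hypothetical helper, not boto's API).
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        dst.write(chunk)  # write immediately instead of accumulating
        total += len(chunk)
        if progress is not None:
            progress(total)
    return total
```

With a real download, dst would be a file opened in 'wb' mode, so memory use stays at roughly chunk_size regardless of the object's size.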


You are running out of memory: http://docs.python.org/2/library/exceptions.html


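You are reading 2.5 GB of data into an in-memory object. StringIO does not support on-disk storage, so you've simply run out of memory. Why aren't you using a file here?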
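If you want to keep the "buffer first, save afterwards" shape of the original loop, the standard library's tempfile.SpooledTemporaryFile behaves like an in-memory buffer until it exceeds max_size and then transparently rolls over to a temporary file on disk. A sketch under that assumption; buffer_then_save and the write_source callable (standing in for something like get_file, which writes into the file object it is given) are hypothetical names. Writing straight into the destination file avoids the extra copy and is simpler still.

```python
import shutil
import tempfile


def buffer_then_save(write_source, dest_path, max_in_memory=8 * 1024 * 1024):
    """Buffer data in a spooled temp file, then copy it to dest_path.

    write_source: hypothetical callable that writes the object's bytes into
    the file-like object it receives (the role get_file() plays above).
    """
    # Kept in RAM up to max_in_memory bytes, then rolled over to disk,
    # unlike StringIO, which would keep growing in memory.
    with tempfile.SpooledTemporaryFile(max_size=max_in_memory) as spool:
        write_source(spool)
        spool.seek(0)  # rewind before reading the buffered data back
        with open(dest_path, "wb") as dst:
            shutil.copyfileobj(spool, dst)  # copies in fixed-size pieces
```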

Answer


The problem is that you are reading the entire object's contents into memory with StringIO. You can use boto's KeyFile class instead:

from boto.s3.keyfile import KeyFile 

and use it in place of StringIO:

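# Inside the for loop, replacing the StringIO lines:
local_dst_uri = boto.storage_uri(os.path.join(dest_dir, name), 'file')
src_uri = boto.storage_uri(networkbucket + '/' + name, 'gs')
# KeyFile wraps the GCS key in a read-only file-like object, so data is
# read from the network as it is written to disk, not buffered in RAM
keyfile = KeyFile(src_uri.get_key())
local_dst_uri.new_key().set_contents_from_file(keyfile)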
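Why this works: the destination only needs an object with a read(size) method, and it pulls data from it a piece at a time (as far as I can tell, boto's file-scheme key copies from it with shutil.copyfileobj). The class below is a hypothetical, simplified stand-in for KeyFile, not boto's implementation: it exposes read(size) over an iterator of byte chunks, fetching only as many chunks as each call needs.

```python
import shutil


class ChunkReader(object):
    """Read-only file-like wrapper over an iterator of byte chunks.

    Hypothetical stand-in for boto's KeyFile: callers such as
    shutil.copyfileobj pull data on demand instead of receiving it all at once.
    """

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._pending = b""  # bytes fetched but not yet returned

    def read(self, size=-1):
        # Fetch chunks until the request can be satisfied or the source ends.
        while size < 0 or len(self._pending) < size:
            try:
                self._pending += next(self._chunks)
            except StopIteration:
                break
        if size < 0:
            data, self._pending = self._pending, b""
        else:
            data, self._pending = self._pending[:size], self._pending[size:]
        return data
```

For example, shutil.copyfileobj(ChunkReader(chunks), open(path, 'wb')) writes the chunks to disk while holding only about one read's worth of data at a time.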