
Backing up a large datastore kind (1TB+) to Google Cloud Storage

Has anyone successfully backed up a large datastore kind to Cloud Storage? This is an experimental feature, so support from Google is pretty sketchy.

The kind we would like to back up to Cloud Storage (with the ultimate goal of ingesting it into BigQuery from Cloud Storage) is currently sitting at 1.2TB. The backup is scheduled with the following cron job:

- description: BackUp 
    url: /_ah/datastore_admin/backup.create?name=OurApp&filesystem=gs&gs_bucket_name=OurBucket&queue=backup&kind=LargeKind 
    schedule: every day 00:00 
    timezone: America/Regina 
    target: ah-builtin-python-bundle 

We keep running into the following error message:

Traceback (most recent call last): 
    File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 182, in handle 
    input_reader, shard_state, tstate, quota_consumer, ctx) 
    File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 263, in process_inputs 
    entity, input_reader, ctx, transient_shard_state): 
    File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 318, in process_data 
    output_writer.write(output, ctx) 
    File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 711, in write 
    ctx.get_pool("file_pool").append(self._filename, str(data)) 
    File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 266, in append 
    self.flush() 
    File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 288, in flush 
    f.write(data) 
    File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 297, in __exit__ 
    self.close() 
    File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 291, in close 
    self._make_rpc_call_with_retry('Close', request, response) 
    File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 427, in _make_rpc_call_with_retry 
    _make_call(method, request, response) 
    File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 250, in _make_call 
    rpc.check_success() 
    File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 570, in check_success 
    self.__rpc.CheckSuccess() 
    File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess 
    raise self.exception 
DeadlineExceededError: The API call file.Close() took too long to respond and was cancelled. 

Answer


There seems to be an undocumented time limit of 28 seconds for write operations from GAE to Cloud Storage. This also applies to write operations made on a backend, so the maximum file size you can create in Cloud Storage from GAE depends on your throughput. Our solution is to split the file: every time the writer task approaches 20 seconds, it closes the current file and opens a new one, and afterwards we join these files locally. For us this results in files of about 500KB (compressed), so this may not be an acceptable solution for you...
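The rotation-and-join idea above can be sketched in plain Python. This is a generic illustration, not the GAE Files API: `RotatingWriter`, `join_parts`, and the 20-second budget are assumptions standing in for the answer's writer task, and the sketch writes to local files so it stays runnable anywhere.

```python
import os
import time


class RotatingWriter:
    """Write to a sequence of numbered part files, starting a new part
    whenever the time spent on the current one exceeds the budget
    (20 s here, mirroring the workaround described in the answer)."""

    def __init__(self, prefix, budget_seconds=20.0):
        self.prefix = prefix
        self.budget = budget_seconds
        self.part = 0
        self._open_new_part()

    def _open_new_part(self):
        self.filename = "%s.part%04d" % (self.prefix, self.part)
        self._file = open(self.filename, "wb")
        self._started = time.time()
        self.part += 1

    def write(self, data):
        # Rotate before the write if the current part is over budget,
        # so no single file stays open long enough to hit the deadline.
        if time.time() - self._started > self.budget:
            self._file.close()
            self._open_new_part()
        self._file.write(data)

    def close(self):
        self._file.close()


def join_parts(prefix, out_path):
    """Concatenate the part files locally, in order, into one output file."""
    with open(out_path, "wb") as out:
        part = 0
        while True:
            name = "%s.part%04d" % (prefix, part)
            if not os.path.exists(name):
                break
            with open(name, "rb") as f:
                out.write(f.read())
            part += 1
```

In the real setup the writes would go through the Cloud Storage file API instead of `open()`, but the rotation logic is the same: close and reopen based on elapsed time, then stitch the parts back together after download.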