Running Python subprocess.call on a tgz file to untar and stream the output

I am using a subprocess call to untar a file on the command line. I need to stream the output of that call into a temporary file so that I can read the contents of the "+CONTENTS" folder inside the tgz file.

My failing output is:

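./streamContents.py
rsh: ftp: No address associated with hostname
tar (child): ftp://myftpserver.com/pkgsrc/doxygen_pkgs/test.tgz: Cannot open: Input/output error
tar (child): Error is not recoverable: exiting now

gzip: stdin: unexpected end of file
tar: Child returned status 2
tar: Error exit delayed from previous errors
Traceback (most recent call last):
  File "./streamContents.py", line 29, in <module>
    stream = proc.stdout.read(8196)
AttributeError: 'int' object has no attribute 'stdout'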

#!/usr/bin/python 

from io import BytesIO 
import urllib2 
import tarfile 
import ftplib 
import socket 
import threading 
import subprocess 

tarfile_url = "ftp://myftpserver.com/pkgsrc/doxygen_pkgs/test.tgz"

try: 
    ftpstream = urllib2.urlopen(tarfile_url) 
except URLerror, e: 
    print "URL timeout" 
except socket.timeout: 
    print "Socket timeout" 


# BytesIO creates an in-memory temporary file. 
tmpfile = BytesIO() 
last_size = 0 
tfile_extract = "" 

while True: 
    proc = subprocess.call(['tar','-xzvf', tarfile_url], stdout=subprocess.PIPE) 
    # Download a piece of the file from the ftp connection 
    stream = proc.stdout.read(8196) 
    if not stream: break 
    tmpfile.write(bytes(stream)) 
    # Seeking back to the beginning of the temporary file. 
    tmpfile.seek(0) 
    # r|gz forbids seeking backward; r:gz allows seeking backward 
    try: 
     tfile = tarfile.open(fileobj=tmpfile, mode="r:gz") 
     print tfile.extractfile("+CONTENTS") 
     tfile_extract_text = tfile_extract.read() 
     print tfile_extract.tell() 
     tfile.close() 
     if tfile_extract.tell() > 0 and tfile_extract.tell() == last_size: 
      print tfile_extract_text 
      break 
     else: 
      last_size = tfile_extract.tell() 
    except Exception: 
     tfile.close() 
     pass 


tfile_extract_text = tfile_extract.read() 
print tfile_extract_text 

# When you're done: 
tfile.close() 
tmpfile.close()

Source: 2015-04-06, rayray84

Why are you calling tar repeatedly in a loop?

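Sorry, I hadn't noticed that; I see it is part of my problem. Initially I was trying to stream directly from the tar file. The tarfile module won't let me stream directly from it, because it needs to build its index before it lets me stream. – rayray84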

Also, running 'tar' on an ftp URL looks wrong. You need to save the file to disk and run 'tar' on the local file. – vikramls
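The AttributeError in the question's traceback comes from subprocess.call, which waits for the command to finish and returns only its integer exit status, so there is no stdout attribute to read. A minimal sketch of the difference follows, assuming Python 3 (the question itself uses Python 2). The child command is a hypothetical stand-in that prints two lines; it is not the tar invocation from the question.

```python
import subprocess
import sys

# Hypothetical stand-in command: a child Python process that prints two lines.
# In the question this role is played by a tar invocation.
cmd = [sys.executable, "-c", "print('line one'); print('line two')"]

# subprocess.call blocks until the command exits and returns its exit status (an int).
status = subprocess.call(cmd, stdout=subprocess.DEVNULL)

# subprocess.Popen returns a process object whose stdout pipe can be read in chunks.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
chunks = []
while True:
    chunk = proc.stdout.read(8192)
    if not chunk:  # an empty bytes object signals end of stream
        break
    chunks.append(chunk)
proc.stdout.close()
returncode = proc.wait()
output = b"".join(chunks)

print(type(status).__name__, status)
print(output.decode())
```

Note that Popen should be created once, outside the read loop; creating it inside the loop, as the question does with call, would restart the command on every iteration.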

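Expanding on my comment above: you need to download the tar file into a temporary file using urllib2 and tempfile, and then open that temporary file with tarfile.

Here is some code to get you started: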

import urllib2 
import tarfile 
from tempfile import TemporaryFile 

f_url = 'url_of_your_tar_archive' 
ftpstream = urllib2.urlopen(f_url) 
tmpfile = TemporaryFile() 

# Download contents of tar to a temporary file 
while True:
    s = ftpstream.read(16384)
    if not s:
        break
    tmpfile.write(s)
ftpstream.close() 

# Access the temporary file to extract the file you need 
tmpfile.seek(0) 
tfile = tarfile.open(fileobj=tmpfile, mode='r:gz') 
print tfile.getnames() 
contents = tfile.extractfile("+CONTENTS").read() 
print contents
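
The code above targets Python 2 (urllib2 and the print statement). For readers on Python 3, the same approach might look roughly like the sketch below. The function name read_member_from_stream and the sample archive are hypothetical, introduced only for illustration; the sample archive is built in memory so the sketch runs without a network connection.

```python
import io
import shutil
import tarfile
import tempfile


def read_member_from_stream(stream, member_name):
    """Spool a gzip-compressed tar stream to a temp file, then return one member's bytes."""
    with tempfile.TemporaryFile() as tmpfile:
        # Copy the whole stream to disk in 16 KiB chunks.
        shutil.copyfileobj(stream, tmpfile, 16384)
        # Rewind so tarfile can read from the start; "r:gz" needs a seekable file.
        tmpfile.seek(0)
        with tarfile.open(fileobj=tmpfile, mode="r:gz") as tfile:
            member = tfile.extractfile(member_name)  # raises KeyError if absent
            if member is None:
                raise ValueError("%s is not a regular file" % member_name)
            return member.read()


def make_sample_archive():
    """Build a tiny in-memory .tgz (made-up data standing in for the real download)."""
    buf = io.BytesIO()
    data = b"@name example-1.0\n"
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo("+CONTENTS")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


contents = read_member_from_stream(make_sample_archive(), "+CONTENTS")
print(contents.decode())
```

With a real archive, the object returned by urllib.request.urlopen(f_url), the Python 3 counterpart of urllib2.urlopen, could be passed as the stream argument in place of the sample archive.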

Source: 2015-04-06 20:06:53, vikramls

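Thanks! @vikramls I think I was trying/hoping to make this work in a way it isn't suited for. As a side note, is there any way to stop the transfer after reading the first 1 MB? The files are fairly large (100-200 MB each), so it takes a while to copy one just to read the contents folder, which sits within the first 1 MB. – rayray84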
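
One possible way to avoid copying the whole archive is tarfile's stream mode ("r|gz"), which reads members sequentially and does not need to seek. The sketch below assumes Python 3 and assumes the wanted entry appears near the start of the archive, as the comment above says. The names read_member_streaming and CountingReader, and the sample data, are hypothetical and exist only to show that reading stops early.

```python
import io
import os
import tarfile


class CountingReader:
    """Non-seekable wrapper that records how many bytes were pulled (illustrative helper)."""

    def __init__(self, raw):
        self._raw = raw
        self.bytes_read = 0

    def read(self, size=-1):
        data = self._raw.read(size)
        self.bytes_read += len(data)
        return data


def read_member_streaming(stream, member_name):
    """Scan a gzip-compressed tar stream in order and stop at the first matching member."""
    with tarfile.open(fileobj=stream, mode="r|gz") as tfile:
        for info in tfile:
            if info.name == member_name:
                member = tfile.extractfile(info)
                return member.read() if member is not None else None
    return None


def make_sample_archive():
    """Made-up archive: a small +CONTENTS entry first, then 1 MiB of random filler."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in (("+CONTENTS", b"@name example-1.0\n"),
                           ("payload.bin", os.urandom(1024 * 1024))):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive_bytes = make_sample_archive()
reader = CountingReader(io.BytesIO(archive_bytes))
contents = read_member_streaming(reader, "+CONTENTS")
print(contents.decode())
print("read %d of %d bytes" % (reader.bytes_read, len(archive_bytes)))
```

With a real download, the response from urllib.request.urlopen would take the place of reader. Returning early only stops reading; closing the response afterwards is what lets the connection end, and how much data the server has already sent by then is outside the script's control.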
