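Handling single file extraction from a corrupted GZ (TAR)
This is my first post on Stack Overflow, and I have a question about extracting a single file from a TAR file that uses GZ compression. I'm not the best at Python, so I may be doing this incorrectly; any help would be much appreciated.
Scenario:
A corrupted *.tar.gz file comes in. The first file in the archive contains important information for obtaining the SN of the system. This can be used to identify the machine so that we can notify its administrator that the file was corrupted.
The problem:
Using the regular UNIX tar binary, I am able to extract just the README file from the archive even though the archive is incomplete; a full extraction returns an error. In Python, however, I am unable to extract just that one file: an exception is raised even when I specify only the single file.
Current workaround:
I'm using "os.popen" to call the UNIX tar binary in order to obtain just the README file.
Desired solution:
Use the Python tarfile package to extract just the single file.
Example errors:
UNIX (works):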
[[email protected] tmp]# tar -xvzf bundle.tar.gz README
README
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
[[email protected] tmp]#
[[email protected] tmp]# ls
bundle.tar.gz README
Python:
>>> import tarfile
>>> tar = tarfile.open("bundle.tar.gz")
>>> data = tar.extractfile("README").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib64/python2.4/tarfile.py", line 1364, in extractfile
    tarinfo = self.getmember(member)
  File "/usr/lib64/python2.4/tarfile.py", line 1048, in getmember
    tarinfo = self._getmember(name)
  File "/usr/lib64/python2.4/tarfile.py", line 1762, in _getmember
    members = self.getmembers()
  File "/usr/lib64/python2.4/tarfile.py", line 1059, in getmembers
    self._load() # all members, we first have to
  File "/usr/lib64/python2.4/tarfile.py", line 1778, in _load
    tarinfo = self.next()
  File "/usr/lib64/python2.4/tarfile.py", line 1588, in next
    self.fileobj.seek(self.offset)
  File "/usr/lib64/python2.4/gzip.py", line 377, in seek
    self.read(1024)
  File "/usr/lib64/python2.4/gzip.py", line 225, in read
    self._read(readsize)
  File "/usr/lib64/python2.4/gzip.py", line 273, in _read
    self._read_eof()
  File "/usr/lib64/python2.4/gzip.py", line 309, in _read_eof
    raise IOError, "CRC check failed"
IOError: CRC check failed
>>> print data
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'data' is not defined
Python (handling the exception):
>>> tar = tarfile.open("bundle.tar.gz")
>>> try:
...     data = tar.extractfile("README").read()
... except:
...     pass
...
>>> print(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'data' is not defined
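Looking at the tarfile.py code, extractfile calls getmember, which ends up calling getmembers. getmembers scans the entire tar file, and gzip croaks when it hits the EOF/corruption. Try supplying an already-decompressed stream so that the CRC exception is not raised. – kevpie 2010-12-04 04:32:32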
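A rough sketch of one way to avoid the full member scan seen in the traceback: tarfile also offers a stream mode ("r|gz") that walks the archive member by member, so a file near the front can be read before the damaged part is reached. This is written for Python 3 and has not been checked against the Python 2.4 shown above. The function name read_member_from_stream is made up for illustration. Note that this is a different technique from handing tarfile an already-decompressed stream.

```python
import tarfile
import zlib


def read_member_from_stream(archive_path, wanted_name):
    """Return the bytes of wanted_name, or None if it cannot be read.

    Hypothetical helper: reads the archive sequentially and stops at the
    first matching regular file, so later corruption is never touched.
    """
    try:
        # "r|gz" opens a forward-only stream; members are yielded one at a
        # time instead of being indexed up front via getmembers().
        with tarfile.open(archive_path, mode="r|gz") as tar:
            for member in tar:
                if member.name == wanted_name and member.isfile():
                    # In stream mode the data must be read right away,
                    # before advancing to the next member.
                    return tar.extractfile(member).read()
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        # The stream broke before the wanted member could be read in full.
        pass
    return None
```

Called as read_member_from_stream("bundle.tar.gz", "README"), it should return the README contents as bytes if that member sits before the damaged region, and None otherwise.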