Reading .gz file contents with Python

I'm new to Python and am running into a problem reading the contents of .gz files.

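I have a folder full of .gz files that I downloaded programmatically through a private API. Each .gz file contains a single .xml file, so I need to iterate over the directory and extract them.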
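
For reference, a minimal Python 3 sketch of such a loop, assuming each .gz file holds one XML document and the output should sit next to the input with ".gz" swapped for ".xml". The function name and the default pattern are hypothetical. `shutil.copyfileobj` streams the decompressed bytes, so large files are not read into memory all at once.

```python
import glob
import gzip
import os
import shutil


def extract_gz_files(directory, pattern="*.gz"):
    """Decompress every file in `directory` matching `pattern` to a .xml file.

    Returns the list of written paths. Assumes matched names end in ".gz";
    the naming rule (drop ".gz", add ".xml") is for illustration only.
    """
    written = []
    # os.path.join avoids depending on a trailing slash in `directory`
    for gz_path in sorted(glob.glob(os.path.join(directory, pattern))):
        xml_path = gz_path[:-len(".gz")] + ".xml"
        # Binary mode on both sides: the decompressed bytes are copied unchanged
        with gzip.open(gz_path, "rb") as src, open(xml_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(xml_path)
    return written
```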

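The problem is that when I programmatically extract these .gz files to their respective .xml versions... the files are created without errors, and when I open one in TextWrangler it looks like a normal .xml file, but not when I view it in a hex editor. Also, when I programmatically open the .xml file and print its contents, it shows up as a bunch of (binary?) garbled text.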

By contrast, if I extract one of the files manually (i.e. with OS X rather than Python), the file looks the way I would expect in the hex editor.
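
The hex-editor check can also be done in code: every gzip stream begins with the two bytes 0x1f 0x8b (RFC 1952), while an extracted XML file normally starts with `<` (possibly after a byte-order mark). A small sketch with a hypothetical helper name that reports whether a file still carries the gzip signature:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip signature."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

If a file produced by the extraction loop still returns True here, it has only been decompressed once out of two layers.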

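Here is my code snippet (it uses the glob and gzip modules; siteid, resource and workingDir are defined earlier in my program):

import glob
import gzip

searchpattern = siteid + "_" + resource + "_*.gz"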
for infile in glob.glob(workingDir + searchpattern): 
    print infile 

    #read the zipped contents (https://docs.python.org/2/library/gzip.html) 
    f = gzip.open(infile, 'rb') 
    file_content = f.read() 
    file_content = str(file_content) #This was an attempt to fix 
    print file_content # This shows a bunch of mumbo jumbo 

    #write the contents we just read to a new file (uncompressed) 
    newfilename = infile[0:-3] # the filename without the ".gz" 
    newfilename = newfilename + ".xml" 
    fnew = open(newfilename, 'w+b') 
    fnew.write(str(file_content)) 
    fnew.close() 

    #delete the .gz version of the file 
    #os.remove(infile)
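
A side note for anyone porting this to Python 3: there `gzip.open(..., 'rb').read()` returns `bytes`, and `str()` on bytes gives the `b'...'` representation rather than the text. A sketch that reads the decompressed XML as text instead, assuming the XML is UTF-8 (check its `<?xml ... encoding=...?>` declaration); the helper name is hypothetical. If the file was gzipped twice, decoding fails with `UnicodeDecodeError` (the inner gzip header byte 0x8b is not valid UTF-8), which makes the problem easier to spot than printing raw bytes.

```python
import gzip


def read_gz_text(path, encoding="utf-8"):
    """Return the decompressed contents of a .gz file as a str.

    Mode "rt" makes gzip.open wrap the stream in a text decoder, so no
    str() conversion is needed. UTF-8 is an assumed default.
    """
    with gzip.open(path, "rt", encoding=encoding) as f:
        return f.read()
```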

2015-02-09 Adam

So this turned out to be a silly mistake on my part, but I'll leave this as a follow-up for anyone else who makes the same mistake.

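The problem was that, earlier in my program, I was gzipping content that was already gzipped. With that in mind, there is nothing wrong with the code snippet in this question, and (technically) nothing wrong with the code that creates the .gz files either, as you can see below. The fix is to open the output file normally instead of with the gzip library earlier in the program.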
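
The double-compression failure is easy to reproduce in memory. A sketch with made-up XML bytes: compressing data that is already gzip produces something that needs two rounds of decompression, and after one round the result still starts with the gzip signature `\x1f\x8b`.

```python
import gzip

xml = b"<?xml version='1.0'?><root/>"
once = gzip.compress(xml)     # stands in for the already-gzipped API response
twice = gzip.compress(once)   # equivalent to writing `once` through gzip.open(..., "wb")

first_pass = gzip.decompress(twice)
# One decompression only strips the outer layer: the result is still gzip data
assert first_pass == once
assert first_pass[:2] == b"\x1f\x8b"
# A second decompression recovers the original XML
assert gzip.decompress(first_pass) == xml
```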

# Download and write the contents of each response to a .gz file
if limitCounter < limit or int(limit) == 0:
    print _name + " " + scopeStartDate + " through " + scopeEndDate + " at " + href
    file = api.get(href)
    gz_file_content = file.content  # the response body is already gzip-compressed
    #gz_file = gzip.open(workingDir + _name, "wb")  # This breaks the program later: it gzips the data a second time
    gz_file = open(workingDir + _name, 'wb')  # This works: the bytes are saved unchanged
    gz_file.write(gz_file_content)
    gz_file.close()
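
If the intermediate .gz files are not needed, one alternative (a sketch, not what the author did) is to decompress the downloaded bytes in memory and write the XML directly. `response_bytes` stands in for `file.content` from the snippet above; whether the API always returns gzip data is an assumption, so the sketch checks the signature first. The helper name is hypothetical.

```python
import gzip


def save_as_xml(response_bytes, xml_path):
    """Write the XML contained in `response_bytes` to `xml_path`.

    Assumes the payload is either gzip-compressed XML or plain XML.
    """
    if response_bytes[:2] == b"\x1f\x8b":  # gzip signature (RFC 1952)
        response_bytes = gzip.decompress(response_bytes)
    with open(xml_path, "wb") as f:  # binary mode: bytes are written unchanged
        f.write(response_bytes)
```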

2015-02-13 18:28:48 Adam

If I run this on XML files, I don't get any problems with the program.

If I gzip an XML file, extract it with this program, and compare the program's output with the original file, there is no difference.
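
That round-trip check can be scripted with the standard library. A sketch with a hypothetical helper name: gzip a sample XML file once, extract it the way the question's loop does (`gzip.open(..., 'rb')` and write the bytes), and compare byte-for-byte with `filecmp.cmp(..., shallow=False)`.

```python
import filecmp
import gzip
import os
import shutil
import tempfile


def round_trip_matches(xml_path):
    """Gzip `xml_path`, extract it again, and report whether the bytes match."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        gz_path = os.path.join(tmp_dir, "sample.gz")
        out_path = os.path.join(tmp_dir, "sample.xml")
        # Compress the original exactly once
        with open(xml_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Extract it the same way the question's loop does
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            dst.write(src.read())
        # shallow=False compares file contents, not just os.stat() signatures
        return filecmp.cmp(xml_path, out_path, shallow=False)
```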

The program does not add an extra ".xml" extension.
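
Whether an extra extension appears depends on the input names: `infile[0:-3] + ".xml"` turns `..._1.gz` into `..._1.xml`, but would turn `..._1.xml.gz` into `..._1.xml.xml`. A hypothetical helper that handles both naming styles:

```python
def xml_name_for(gz_path):
    """Map 'a.gz' -> 'a.xml' and 'a.xml.gz' -> 'a.xml'."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz path: %r" % gz_path)
    base = gz_path[:-len(".gz")]
    # Only append .xml when the remaining name does not already end with it
    return base if base.endswith(".xml") else base + ".xml"
```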

2015-02-10 13:12:05