2017-01-23 94 views

I have the following code snippet for merging Unicode CSV files in Python 2.7:

import csv, sys, os

rootdir = sys.argv[1]
outfileName = rootdir + "\\root-dir.csv"  # hardcoded path
for root, subFolders, files in os.walk(rootdir):
    #for subdir in subFolders:
    for file in files:
        filePath = os.path.join(root, file)
        with open(filePath) as csvin:
            readfile = csv.reader(csvin, delimiter=',')
            with open(outfileName, 'a') as csvout:
                writefile = csv.writer(csvout, delimiter=',', lineterminator='\n')
                for row in readfile:
                    row.extend([file])  # append the source file name as the last column
                    writefile.writerow(row)
print("Ready!")

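It works great with ASCII files, but not with Unicode ones. Here is a sample autoruns log file: https://cloud.mail.ru/public/6Gqc/MKjKaqs8B. I need to merge several such files into one. How can I change this code to do that? It has to work on Python 2.7.

Thanks in advance!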

Answers

The Python 2 documentation for the csv module has a good example of reading from and writing to Unicode CSV files:

# Python 2 only: relies on cStringIO and the unicode type.
import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """

    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

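I tried using it, but it does not read the data correctly. When I try to open an original file, it throws an error: 'utf8' codec can't decode byte 0xff in position 0. When I remove the first 2 bytes of the file, it throws a different error: line contains NULL byte – Oleg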


@Oleg That sounds like your data files are UTF-16, not UTF-8. –
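Both errors fit that diagnosis: a UTF-16 little-endian file usually starts with the byte order mark FF FE, and every ASCII character in it is stored with a zero byte next to it, which is what the csv module reports as a NULL byte. A minimal sketch for checking this, assuming the files start with a byte order mark; `detect_bom_encoding` is a hypothetical helper name, not part of the standard library:

```python
import codecs


def detect_bom_encoding(path, default="utf-8"):
    """Guess a file's encoding from its byte order mark (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM bytes.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No BOM found: fall back to the caller's guess (an assumption, not a detection).
    return default
```

The returned name can be passed as the `encoding` argument when opening the file. A file without a byte order mark cannot be identified this way, so the fallback is only a guess.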


Maybe. Do you know of a way to read UTF-16? – Oleg
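One possible way to read UTF-16 on Python 2.7 is `io.open`, which decodes the file to unicode text; each line is then re-encoded to UTF-8 before it reaches the csv module, because the Python 2 csv module only handles byte strings. The following is a sketch under assumptions, not a tested fix for the autoruns files: it assumes every input is UTF-16 with a byte order mark and writes the merged output as UTF-8. `read_rows` and `merge_csv_files` are hypothetical names. The `PY2` branches are the ones relevant to the question; the other branches only keep the sketch runnable on Python 3.

```python
import csv
import io
import os
import sys

PY2 = sys.version_info[0] == 2


def read_rows(path, encoding="utf-16"):
    """Yield each CSV row of `path` as a list of text strings."""
    # newline="" leaves line-ending handling to the csv module.
    with io.open(path, "r", encoding=encoding, newline="") as f:
        if PY2:
            # Feed the csv module UTF-8 byte strings, then decode each cell.
            for row in csv.reader(line.encode("utf-8") for line in f):
                yield [cell.decode("utf-8") for cell in row]
        else:
            for row in csv.reader(f):
                yield row


def merge_csv_files(rootdir, out_path, in_encoding="utf-16"):
    """Merge every file under `rootdir` into one UTF-8 CSV at `out_path`."""
    out_abs = os.path.abspath(out_path)
    if PY2:
        out = open(out_path, "wb")
    else:
        out = io.open(out_path, "w", encoding="utf-8", newline="")
    with out:
        writer = csv.writer(out, delimiter=",", lineterminator="\n")
        for root, _dirs, files in os.walk(rootdir):
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.abspath(path) == out_abs:
                    continue  # never merge the output file into itself
                for row in read_rows(path, in_encoding):
                    if PY2:
                        row = [cell.encode("utf-8") for cell in row]
                    # As in the question, the file name becomes the last column
                    # (on Python 2 it is written as the bytes os.walk returned).
                    row.append(name)
                    writer.writerow(row)
```

With the layout from the question this would be called as `merge_csv_files(sys.argv[1], os.path.join(sys.argv[1], "root-dir.csv"))`. Unlike the original script, the output file is opened once and overwritten rather than appended to, so repeated runs do not accumulate duplicate rows.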