2013-06-26 46 views
0

File's write method writes null bytes instead of the correct strings

I have a Python script that processes a data file:

out = open('result/process/'+name+'.res','w')
out.write("source,rssi,lqi,packetId,run,counter\n")
f = open('result/resultat0.res','r')
for ligne in [x for x in f if x != '']:
    chaine = ligne.rstrip('\n')
    tmp = chaine.split(',')
    if (len(tmp) == 6):
        out.write(','.join(tmp)+"\n")
f.close()

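The full source code is here.

I use this script on several computers and it does not behave the same way on all of them. On the first computer, with Python 2.6.6, the result is what I expect. On the others (Python 2.6.6, 3.3.2, 2.7.5), however, the file object's write method puts null bytes in the file instead of the values I want during most of the processing. I get this result: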

$ hexdump -C result/process/1.res 
00000000 73 6f 75 72 63 65 2c 72 73 73 69 2c 6c 71 69 2c |source,rssi,lqi,| 
00000010 70 61 63 6b 65 74 49 64 2c 72 75 6e 2c 63 6f 75 |packetId,run,cou| 
00000020 6e 74 65 72 0a 00 00 00 00 00 00 00 00 00 00 00 |nter............| 
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 
* 
0003a130 00 00 00 00 00 00 00 00 00 00 31 33 2c 36 35 2c |..........13,65,| 
0003a140 31 34 2c 38 2c 39 38 2c 31 33 31 34 32 0a 31 32 |14,8,98,13142.12| 
0003a150 2c 34 37 2c 31 37 2c 38 2c 39 38 2c 31 33 31 34 |,47,17,8,98,1314| 
0003a160 33 0a 33 2c 34 35 2c 31 38 2c 38 2c 39 38 2c 31 |3.3,45,18,8,98,1| 
0003a170 33 31 34 34 0a 31 31 2c 38 2c 32 33 2c 38 2c 39 |3144.11,8,23,8,9| 
0003a180 38 2c 31 33 31 34 35 0a 39 2c 32 30 2c 32 32 2c |8,13145.9,20,22,| 

Do you have any idea how to solve this problem?

+1

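I looked at the full code in the link. Are you new to Python and object-oriented programming? The problem is that you use global variables and open files all over the place, storing the file handles in a dictionary: this is very hard to follow. Your code desperately needs refactoring. – MattH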

+1

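Some general comments. Have you tried a debugger? Add a print statement inside the list comprehension to verify that the output is what you expect. Instead of 'rstrip', try 'strip()' to remove all end-of-line characters, including trailing whitespace. –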

+0

I tried a print statement, and the output is the correct lines, not null bytes. – user2523255
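Correct strings at print time but null bytes on disk suggest the damage happens at the file level rather than in the string handling. The thread does not confirm the actual cause. As a hedged sketch of one way this can happen on a POSIX system: the same path is opened twice in 'w' mode, the second open truncates the file, and the first handle keeps writing at its old offset, so the gap is filled with null bytes. The file name `demo.res` and the handle names are hypothetical and are not taken from the asker's script.

```python
import os
import tempfile

# Hypothetical scratch file; not a path from the asker's script.
path = os.path.join(tempfile.mkdtemp(), 'demo.res')

first = open(path, 'w')
first.write('x' * 100)
first.flush()                # 100 bytes are on disk; this handle's offset is 100

second = open(path, 'w')     # opening with 'w' truncates the file to 0 bytes
second.write('header\n')
second.close()

# The first handle still writes at offset 100, not at the new end of file,
# so the region between the header and this row becomes a run of null bytes.
first.write('13,65,14,8,98,13142\n')
first.close()

with open(path, 'rb') as f:
    data = f.read()
```

If something like this is going on, opening each output file exactly once, or opening it in append mode, avoids the gap.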

Answers

2

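A few considerations:

  • In a decade of programming in Python, I have never come across a compelling reason to use global. Pass parameters to functions instead.
  • To make sure a file is closed when you are done with it, use the with statement.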
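As a minimal sketch of both points, assuming the six-field comma-separated rows from the question: the function below receives its paths as parameters instead of reading globals, and it opens both files in with blocks so they are closed even if an exception is raised. The name `copy_valid_rows` and the temporary demo files are hypothetical.

```python
import os
import tempfile


def copy_valid_rows(in_path, out_path, expected_fields=6):
    """Copy rows with the expected number of fields; return how many were written."""
    written = 0
    # Both files are closed when their block exits, even on an exception.
    with open(in_path, 'r') as src:
        with open(out_path, 'w') as dst:
            for raw in src:
                row = raw.rstrip('\n')
                if len(row.split(',')) == expected_fields:
                    dst.write(row + '\n')
                    written += 1
    return written


# Hypothetical demo data in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_in = os.path.join(demo_dir, 'in.res')
demo_out = os.path.join(demo_dir, 'out.res')
with open(demo_in, 'w') as f:
    f.write('13,65,14,8,98,13142\nbad,line\n\n12,47,17,8,98,13143\n')

rows_written = copy_valid_rows(demo_in, demo_out)
with open(demo_out) as f:
    demo_result = f.read()
```

Because everything the function needs arrives through its arguments, it can be exercised on a small sample file without touching any shared state.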

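Here is an (untested) attempt at refactoring your code for sanity, assuming you have enough memory to hold all of the lines under a particular identifier.

If you still have null bytes in your result files after this refactoring, then we have a reasonable basis for continuing to debug.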

import os
import re
from contextlib import closing

def list_files_to_process(directory='results'):
    """
    Return a list of files from directory where the file extension is '.res',
    case insensitive.
    """
    results = []
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        if os.path.isfile(filepath) and filename.lower().endswith('.res'):
            results.append(filepath)
    return results

def group_lines(sequence):
    """
    Generator, process a sequence of lines, separated by a particular line.
    Yields batches of lines along with the id from the separator.
    """
    separator = re.compile(r'^A:(?P<id>\d+):$')
    batch = []
    batch_id = None
    for line in sequence:
        # Iterating over a file keeps the line ending, so strip it first
        line = line.rstrip('\r\n')
        if not line:  # Ignore blanks
            continue
        m = separator.match(line)
        if m is not None:
            if batch_id is not None or len(batch) > 0:
                yield (batch_id, batch)
            batch_id = m.group('id')
            batch = []
        else:
            batch.append(line)
    if batch_id is not None or len(batch) > 0:
        yield (batch_id, batch)

def filename_for_results(batch_id, result_directory):
    """
    Return an appropriate filename for a batch_id under the result directory
    """
    return os.path.join(result_directory, "results-%s.res" % (batch_id,))

def open_result_file(filename, header="source,rssi,lqi,packetId,run,counter"):
    """
    Return an open file object in append mode, having appended a header if
    filename doesn't exist or is empty
    """
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        # No need to write header
        return open(filename, 'a')
    else:
        f = open(filename, 'a')
        f.write(header + '\n')
        return f

def process_file(filename, result_directory='results/processed'):
    """
    Open filename and process its contents. Uses group_lines() to group
    lines into different files based upon specific line acting as a
    content separator.
    """
    error_filename = filename_for_results('error', result_directory)
    with open(filename, 'r') as in_file, open(error_filename, 'w') as error_out:
        for batch_id, lines in group_lines(in_file):
            if len(lines) == 0:
                error_out.write("Received batch %r with 0 lines\n" % (batch_id,))
                continue
            out_filename = filename_for_results(batch_id, result_directory)
            with closing(open_result_file(out_filename)) as out_file:
                for line in lines:
                    if line.startswith('L') and line.endswith('E') and line.count(',') == 5:
                        # Drop the leading 'L' and trailing 'E' markers
                        line = line[1:-1]
                        out_file.write(line + '\n')
                    else:
                        error_out.write("Unknown line, batch=%r: %r\n" % (batch_id, line))

if __name__ == '__main__':
    files = list_files_to_process()
    for filename in files:
        print("Processing %s" % (filename,))
        process_file(filename)
+1

+1! –

+0

It works with a few small corrections (each line has a newline at the end). Thanks!!! – user2523255

+0

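@user2523255: You're welcome. For some reason I mistakenly believed that iterating over a file object ate the newlines. Did it also fix your null-byte problem? – MattH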
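The newline point can be checked with a short sketch. It uses an in-memory `io.StringIO` as a stand-in for a real file (an assumption made for brevity; the sample lines are invented to resemble the 'L…E' format the answer's code checks for). Iterating yields each line with its trailing newline still attached, so a test such as `endswith('E')` fails and a blank line is not falsy until the line ending has been stripped.

```python
import io

# In-memory stand-in for a text file; the content is invented sample data.
sample = io.StringIO(u'A:1:\nL13,65,14,8,98,13142E\n\n')

# Iteration keeps the line terminator on every line.
raw_lines = [line for line in sample]

sample.seek(0)
# Stripping the terminator gives the bare line content.
clean_lines = [line.rstrip('\n') for line in sample]

ends_with_e_raw = raw_lines[1].endswith('E')
ends_with_e_clean = clean_lines[1].endswith('E')
```

Here `ends_with_e_raw` is False while `ends_with_e_clean` is True, and the final blank line is truthy in `raw_lines` but an empty string in `clean_lines`.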
