Parsing and modifying a file with Python

Suppose I have the following file:

H 0 -15.7284260000000  -16.4229420000000  0.364919000000000 
H 0 -16.4853770000000  -15.1118660000000  0.364919000000000 
O 0 -17.9378060000000  -14.2325190000000  0.944687000000000 
H 0 -18.7307670000000  -14.6487540000000  0.606761000000000 
H 0 -17.9738160000000  -13.3376780000000  0.606761000000000 
H 0 -17.1677320000000  -11.1468579990000  0.307511000000000 
...

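and the following list of line numbers from the file, grouped according to some criterion: index = [[1,3],[4,7],[2,5,6]].

I want to rewrite the file, appending a label to each line according to that criterion, i.e. lines 1 and 3 get the label 'H', lines 4 and 7 the label 'M', and lines 2, 5 and 6 the label 'L', to obtain the file: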

H 0 -15.7284260000000  -16.4229420000000  0.364919000000000 H 
H 0 -16.4853770000000  -15.1118660000000  0.364919000000000 L 
O 0 -17.9378060000000  -14.2325190000000  0.944687000000000 H 
H 0 -18.7307670000000  -14.6487540000000  0.606761000000000 M 
H 0 -17.9738160000000  -13.3376780000000  0.606761000000000 L 
H 0 -17.1677320000000  -11.1468579990000  0.307511000000000 L 
H 0 -10.3904079990000  -10.7642359990000  0.664160000000000 M 
...

I am using the code below, but I have not been able to work the required conditions into the write() call. Any help is welcome. Thanks in advance.

import sys

# `file` is assumed to hold the path of the input file.
try:
    input_file = open(file, 'r')
    input = input_file.readlines()
    print('Input file "' + file + '" was read')
except IOError:
    error_mssg = 'Please provide an input file'
    sys.exit(error_mssg)

ii = 0
with open('output.com', 'w') as output:
    while ii <= len(input) - 1:
        if input[ii].strip() == '':
            break
        # Every line gets ' H' here; the label is not yet chosen from index.
        output.write(input[ii].strip() + ' H' + '\n')
        ii = ii + 1

Source

2017-03-06 Panadestein

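What exactly are you unable to do? –

'i.e. lines 1 and 3 get the label 'H', lines 4 and 7 the label 'M', and so on, to obtain the file': how do you decide between H and M? What would '[2,5,6]' get, and according to what criterion? –

I am unable to pick a label based on the data in the list and append it to the corresponding line. This is only an example, sorry if it was unclear; lines [2,5,6] would get another label. – Panadestein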

import sys

file = 'input.txt'

try:
    input_file = open(file, 'r')
    input_lines = input_file.readlines()
    print('Input file "' + file + '" was read')
except IOError:
    error_mssg = 'Please provide an input file'
    sys.exit(error_mssg)

index_mapping = {'H': [1,3],
                 'M': [4,7],
                 'L': [2,5,6]}

# Invert the mapping so each line number points to its label.
index_mapping_reversed = {val : key for key in index_mapping for val in index_mapping[key]}

index_mapping_reversed
# {1: 'H', 2: 'L', 3: 'H', 4: 'M', 5: 'L', 6: 'L', 7: 'M'}

with open('output.txt','w') as output:
    for idx, line in enumerate(input_lines):
        suffix = ''
        # enumerate() counts from 0, the line numbers in the mapping from 1.
        if idx + 1 in index_mapping_reversed:
            suffix = ' ' + index_mapping_reversed.get(idx + 1, '')
        output.write(line.strip() + suffix + '\n')

Contents of output.txt:

H 0 -15.7284260000000  -16.4229420000000  0.364919000000000 H 
H 0 -16.4853770000000  -15.1118660000000  0.364919000000000 L 
O 0 -17.9378060000000  -14.2325190000000  0.944687000000000 H 
H 0 -18.7307670000000  -14.6487540000000  0.606761000000000 M 
H 0 -17.9738160000000  -13.3376780000000  0.606761000000000 L 
H 0 -17.1677320000000  -11.1468579990000  0.307511000000000 L
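If the groups arrive as the nested list from the question rather than as a dict, the same lookup can be built by pairing each group with a label. The following is only a minimal sketch: it assumes the labels 'H', 'M', 'L' correspond to the groups in order, and the helper names build_lookup and label_lines are made up for illustration.

```python
def build_lookup(groups, labels):
    """Map each 1-based line number to the label of the group containing it."""
    lookup = {}
    for group, label in zip(groups, labels):
        for line_number in group:
            lookup[line_number] = label
    return lookup


def label_lines(lines, lookup):
    """Return the stripped lines, with a label appended where one is defined."""
    labelled = []
    for number, line in enumerate(lines, start=1):
        label = lookup.get(number)
        text = line.strip()
        labelled.append(text + ' ' + label if label else text)
    return labelled


index = [[1, 3], [4, 7], [2, 5, 6]]
lookup = build_lookup(index, 'HML')  # assumed label order, not given by the list itself
sample = ['a 0 1.0\n', 'b 0 2.0\n', 'c 0 3.0\n', 'd 0 4.0\n']  # made-up sample lines
result = label_lines(sample, lookup)
```

Working on a list of lines rather than directly on files keeps the labelling logic easy to check in isolation; writing result to a file is then a separate step.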

Source

2017-03-06 15:44:58

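The simplest approach for you is probably to do some intermediate processing before you write the lines back out.

You want to append a character to each line in a list, given several list/character pairings: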

def append_char(text, char, lines):
    """Given a list of text lines, text, a char, and a list of line
    numbers, lines, append the char to each line identified by number.
    Note that line numbers start at 1, while text indexes start at 0.
    """
    for l in lines:
        text[l-1] += ' ' + char

Then run it like this:

letters = 'HM' 

for i, ch in enumerate(letters): 
    append_char(input, ch, index[i])

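Be aware that if there are any collisions (a line number that appears in more than one list), you will get 'blah H M' rather than 'blah HM', if that matters.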
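If collisions do matter, one option is to collect all labels for a line first and append them in a single step. This is a sketch under assumptions, not part of the answer above: collect_labels and append_labels are hypothetical helper names, and the sample index is constructed so that line 1 appears in two groups.

```python
from collections import defaultdict


def collect_labels(groups, letters):
    """Gather every label assigned to each 1-based line number."""
    labels = defaultdict(list)
    for group, letter in zip(groups, letters):
        for line_number in group:
            labels[line_number].append(letter)
    return labels


def append_labels(text, labels, sep=''):
    """Return new lines with all labels of a line joined by sep and appended."""
    out = []
    for number, line in enumerate(text, start=1):
        if number in labels:
            line = line + ' ' + sep.join(labels[number])
        out.append(line)
    return out


# Line 1 appears in both groups, so it collects both letters.
index = [[1, 3], [1, 2]]
text = ['first', 'second', 'third']
tagged = append_labels(text, collect_labels(index, 'HM'))
```

Passing sep=' ' instead reproduces the spaced form that append_char would give.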

Source

2017-03-06 15:35:11

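# Map 0-based line indexes to the label to append.
d = {0: 'H',
     1: 'H',
     2: 'M',
    }

def ending(i):
    label = d.get(i)
    return (' ' + label if label else '') + '\n'

with open('input.txt') as f:
    lines = f.readlines()

with open('output.txt', 'w') as o:
    for i, line in enumerate(lines):
        # Strip the original newline so the label stays on the same line.
        o.write('{}{}'.format(line.rstrip(), ending(i)))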

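The code above shows one approach. It encapsulates the logic that determines how each line ends in the function ending. If you know in advance which lines need to change, a dictionary like this one will do. If the ending has to be computed (say, from the line itself), rewrite ending accordingly, making sure it takes as arguments all the information needed to determine the line's ending.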
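As an illustration of an ending that is computed from the line itself, here is a minimal sketch. The rule it applies (oxygen atoms get 'H', everything else 'L') is invented purely for the example, as is the helper name relabel; substitute whatever criterion actually applies.

```python
def ending(i, line):
    """Return the label for a line; the rule used here is purely illustrative."""
    fields = line.split()
    if not fields:
        return ''
    # Assumed rule: lines whose first field is 'O' get 'H', all others 'L'.
    return ' H' if fields[0] == 'O' else ' L'


def relabel(lines):
    """Strip trailing whitespace and append whatever ending() decides."""
    return [line.rstrip() + ending(i, line) + '\n' for i, line in enumerate(lines)]


sample = [
    'H 0 -16.4853770000000  -15.1118660000000  0.364919000000000 \n',
    'O 0 -17.9378060000000  -14.2325190000000  0.944687000000000 \n',
]
relabelled = relabel(sample)
```

The index i is unused by this particular rule but stays in the signature so that position-based and content-based criteria can be combined.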

Source

2017-03-06 15:39:42

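There is no reason to read everything into memory: if you have to process large files, it will not speed anything up and only wastes memory.

I do not understand how you obtain the magic values 'H' and 'M', so I assume they are given in the index array, and I preprocess that array into a map {line_number: label}. Then I only need to read the input one line at a time and append the label, if there is one: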

index = [([1,3], 'H'), ([4,7], 'M'), ([2,5,6], None)]

def preprocess(index):
    """Build a {line_number: label} map, skipping groups without a label."""
    h = {}
    for elt in index:
        if elt[1] is not None:
            for num in elt[0]:
                h[num] = elt[1]
    return h

h = preprocess(index)
with open(file, 'r') as inputfile:
    with open('output.com', 'w') as output:
        # Iterating over the file object reads one line at a time.
        for num, line in enumerate(inputfile, 1):
            if num in h:
                line = line.rstrip() + " " + h[num] + "\n"
            output.write(line)

Source

2017-03-06 16:01:48

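Thank you for your answer. The question I posted is only part of a larger problem, but you may be right that I do not need to load everything into memory; I will check. – Panadestein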
