2012-10-10 50 views

Remove duplicate lines, except lines that contain a "{" or "}"

I have a very large text file with content like this:

@INBOOK{Ackermann1999-b, 
    author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
     K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
     and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
     Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
     K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
     and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
     Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
     K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
     and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
     Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann}, 
    year = {1980}, 
    timestamp = {1995-12-02} 
}  

I want to remove the duplicate lines, except for lines that contain a brace { or }. The result should look like this:

@INBOOK{Ackermann1999-b, 
    author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
     Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann}, 
    year = {1980}, 
    timestamp = {1995-12-02} 
} 

I came across this Python script, thanks to Vinay Sajip:

lines_seen = set() # holds lines already seen 
outfile = open("literatur_clean.txt", "w") 
for line in open("literatur_dupl.txt", "r"): 
    if line not in lines_seen: # not a duplicate 
        outfile.write(line) 
        lines_seen.add(line) 
outfile.close() 

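But it also removes the lines with a closing brace } and the lines with identical author data. So I need a condition for the braces.

Can someone point me to how to add this condition?

Thanks in advance,

Answers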

if ('{' in line or '}' in line) and line not in lines_seen: # not a duplicate 

Thanks eumiro, with a small modification ("or" instead of "and") it works perfectly: if ('{' in line or '}' in line) or line not in lines_seen: – StandardNerd
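
For reference, here is a minimal sketch that combines the script from the question with the "or" condition from the comment above. The function names dedupe_keep_braces and clean_file are made up for illustration and do not appear in the thread. The with statement is a stylistic choice here, not part of the original script. Note that lines_seen covers the whole file, so a brace-free line that already appeared in an earlier entry is dropped from later entries too.

```python
def dedupe_keep_braces(lines):
    """Yield each line unless it is a repeat; lines with '{' or '}' are always kept.

    Hypothetical helper name, chosen for this sketch only.
    """
    lines_seen = set()  # holds lines already seen
    for line in lines:
        # Keep the line if it contains a brace OR has not been seen before.
        if "{" in line or "}" in line or line not in lines_seen:
            yield line
            lines_seen.add(line)


def clean_file(src_path, dst_path):
    """Copy src_path to dst_path with the filter above applied (hypothetical wrapper)."""
    with open(src_path, "r") as infile, open(dst_path, "w") as outfile:
        outfile.writelines(dedupe_keep_braces(infile))
```

With the file names from the question, this would be called as clean_file("literatur_dupl.txt", "literatur_clean.txt").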