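With the help of eumiro (Delete duplicate rows in textfile - except it contains a "{" or "}"), I successfully removed duplicate lines from a large text file using Python. Going from a 60 MB to a 3 MB text file is a big step.
But now I want to remove duplicate words like these: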
@INBOOK{Miller1992,
author = {Miller, Rowland S. und Mark R. Leary and Miller, Rowland S. und Mark
R. Leary and Miller, Rowland S. und Mark R. Leary and Miller, Rowland
S. und Mark R. Leary and Miller, Rowland S. und Mark R. Leary and
Miller, Rowland S. und Mark R. Leary and Miller, Rowland S. und Mark
Miller, Rowland S. und Mark R. Leary},
year = {1992},
editor = {Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun
A. van Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun A.
van Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun A. van
Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk
and Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk and
Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun
and Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk},
title = {Handbook of discourse analysis (Bd. 3/4)},
The result should look like this:
@INBOOK{Miller1992,
author = {Miller, Rowland S. und Mark R. Leary},
year = {1992},
editor = {Teun A. van Dijk},
title = {Handbook of discourse analysis (Bd. 3/4)},
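The step from input to expected result can be sketched as an order-preserving de-duplication of the names inside a single field value. This is a minimal sketch that assumes names are separated by a lowercase ` and ` (the BibTeX name separator). The German `und` inside "Miller, Rowland S. und Mark R. Leary" is not a separator, so that string stays one name. `dedupe_names` is a hypothetical helper name:

```python
def dedupe_names(value):
    """Remove repeated names from a BibTeX name list (author/editor value),
    keeping the first occurrence of each name in its original order.

    Assumption: names are separated by a lowercase " and ". Newlines and runs
    of spaces are collapsed first, so a name wrapped across two lines compares
    equal to an unwrapped copy of the same name.
    """
    flat = " ".join(value.split())  # "Teun\nA. van Dijk" -> "Teun A. van Dijk"
    names = [name.strip() for name in flat.split(" and ")]
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats
    unique = dict.fromkeys(name for name in names if name)
    return " and ".join(unique)


print(dedupe_names("Teun A. van Dijk and Teun A. van Dijk and Teun\nA. van Dijk"))
# Teun A. van Dijk
```

One caveat: a fragment with a missing ` and ` separator is treated as one long, distinct name and survives. In the author sample above, a line ending in `und Mark` runs straight into `Miller, ...` on the next line, so that fragment would remain.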
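The text file has 70,000 lines, and author names can appear in several entries. So only duplicates inside one pair of curly braces (which may span several lines) should be removed, for example: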
author = {Miller, Rowland S. und Mark R. Leary and Miller, Rowland S. und Mark
R. Leary and Miller, Rowland S. und Mark R. Leary and Miller, Rowland
S. und Mark R. Leary and Miller, Rowland S. und Mark R. Leary and
Miller, Rowland S. und Mark R. Leary and Miller, Rowland S. und Mark
Miller, Rowland S. und Mark R. Leary},
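Because a field can span several lines, a line-by-line check cannot see the whole brace group. One possible approach, sketched here under assumptions, is to glue physical lines together until the field's braces balance. The entry's own outer braces (`@INBOOK{key,` ... `}`) are skipped by passing through lines that contain no `=` when no field is open. This also keeps de-duplication local to one field, so a name repeated across entries is untouched. `join_field_lines` is a hypothetical name, and the sketch assumes braces inside values are balanced:

```python
def join_field_lines(lines):
    """Glue a field such as author = {...} that spans several physical lines
    into one string, so it can be de-duplicated as a whole.

    Lines without "=" outside a field (the "@INBOOK{key," header and the
    closing "}") are passed through, so the entry's own outer braces are
    never counted. Assumes braces inside values are balanced.
    """
    buffer, depth = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if not buffer and "=" not in line:
            yield line  # entry header or closing brace
            continue
        buffer.append(line)
        depth += line.count("{") - line.count("}")
        if depth <= 0:  # braces balanced: the field is complete
            yield "\n".join(buffer)
            buffer, depth = [], 0
    if buffer:  # unterminated field at end of input: emit what we have
        yield "\n".join(buffer)


sample = [
    "@INBOOK{Miller1992,",
    "author = {Miller, Rowland S. und Mark R. Leary and Miller, Rowland",
    "S. und Mark R. Leary},",
    "year = {1992},",
    "}",
]
for logical in join_field_lines(sample):
    print(repr(logical))
# '@INBOOK{Miller1992,'
# 'author = {Miller, Rowland S. und Mark R. Leary and Miller, Rowland\nS. und Mark R. Leary},'
# 'year = {1992},'
# '}'
```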
I wanted to modify my Python script that removes duplicate lines so that it removes duplicate words inside the curly braces, but I got stuck:
words_seen = set()  # holds words already seen
with open("literatur_dupl.txt", "r") as infile, open("literatur_clean.txt", "w") as outfile:
    for line in infile:
        if '{' in line or '}' in line:
            # some code to check whether the words are duplicate
            pass
        outfile.write(line)  # for now every line is copied unchanged
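One way to fill in the missing part is sketched below, under a few assumptions. It does not use a global `words_seen` set, because that would also drop names that legitimately repeat across entries. Instead, each multi-line field is glued together until its braces balance. Only `author` and `editor` values are then split on ` and ` and de-duplicated, so a title containing "and" is never touched. `clean_lines`, `clean_file`, `dedupe_names`, `FIELD_RE` and `NAME_FIELDS` are hypothetical names. The regex assumes one `name = {value},` field per logical line. Each cleaned field is written back on one line, matching the expected result above:

```python
import re

# Assumption: only BibTeX name-list fields are de-duplicated.
NAME_FIELDS = {"author", "editor"}

# "name = {value}," possibly spanning several lines (re.DOTALL lets . match \n).
FIELD_RE = re.compile(r"^(\s*(\w+)\s*=\s*\{)(.*)(\}\s*,?\s*)$", re.DOTALL)


def dedupe_names(value):
    """Drop repeated " and "-separated names, keeping first occurrences."""
    names = (n.strip() for n in " ".join(value.split()).split(" and "))
    return " and ".join(dict.fromkeys(n for n in names if n))


def clean_lines(lines):
    """Yield cleaned lines; multi-line fields are glued until braces balance."""
    buffer, depth = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if not buffer and "=" not in line:
            yield line  # entry header such as "@INBOOK{Miller1992," or "}"
            continue
        buffer.append(line)
        depth += line.count("{") - line.count("}")
        if depth > 0:
            continue  # field value continues on the next line
        field = "\n".join(buffer)
        buffer, depth = [], 0
        match = FIELD_RE.match(field)
        if match and match.group(2).lower() in NAME_FIELDS:
            prefix, _, value, suffix = match.groups()
            yield prefix + dedupe_names(value) + suffix
        else:
            yield field  # other fields pass through unchanged
    if buffer:
        yield "\n".join(buffer)  # unbalanced tail: keep it as-is


def clean_file(src, dst):
    """Stream src through clean_lines and write the result to dst."""
    with open(src, "r", encoding="utf-8") as infile, \
            open(dst, "w", encoding="utf-8") as outfile:
        for line in clean_lines(infile):
            outfile.write(line + "\n")
```

For the files from the question this would be called as `clean_file("literatur_dupl.txt", "literatur_clean.txt")`. UTF-8 is an assumption; adjust the encoding if the export uses something else (e.g. Latin-1). The file is processed as a stream, so 70,000 lines are not all held in memory at once.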
Thanks for your answers. The first approach doesn't seem to fit, but I'll try the second one. – StandardNerd