Is there a way to remove duplicate, consecutive words/phrases in a string? For example:
[IN]:foo foo bar bar foo bar
[OUT]:foo bar foo bar
I have tried this:
>>> s = 'this is a foo bar bar black sheep , have you any any wool woo , yes sir yes sir three bag woo wu wool'
>>> [i for i,j in zip(s.split(),s.split()[1:]) if i!=j]
['this', 'is', 'a', 'foo', 'bar', 'black', 'sheep', ',', 'have', 'you', 'any', 'wool', 'woo', ',', 'yes', 'sir', 'yes', 'sir', 'three', 'bag', 'woo', 'wu']
>>> " ".join([i for i,j in zip(s.split(),s.split()[1:]) if i!=j]+[s.split()[-1]])
'this is a foo bar black sheep , have you any wool woo , yes sir yes sir three bag woo wu wool'
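For the single-word case, the same result can probably be had without the `zip` and the manual last-word fix-up. A minimal sketch, assuming whitespace tokenization is good enough; the function name `dedupe_consecutive_words` is made up for illustration:

```python
from itertools import groupby

def dedupe_consecutive_words(text):
    # groupby() groups runs of equal adjacent items, so keeping only
    # each group's key drops consecutive repeats, including at the end.
    return " ".join(word for word, _ in groupby(text.split()))

print(dedupe_consecutive_words("foo foo bar bar foo bar"))
# foo bar foo bar
```

This only handles repeated single words, not repeated multi-word phrases.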
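When it gets a bit more complicated and I want to remove repeated phrases (say a phrase can be made up of at most 5 words), how do I go about it? For example: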
[IN]:foo bar foo bar foo bar
[OUT]:foo bar
Another example:
[IN]:this is a sentence sentence sentence this is a sentence where phrases phrases duplicate where phrases duplicate . sentence are not prhases .
[OUT]:this is a sentence where phrases duplicate . sentence are not prhases .
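One way the phrase case might be approached is to collapse repeats of length 1, then 2, and so on up to 5 words, and repeat the whole sweep until nothing changes. This is only a sketch under the assumptions above (whitespace tokens, phrases of at most 5 words); `dedupe_phrases` and `max_len` are hypothetical names, and this is not necessarily the most efficient approach:

```python
def dedupe_phrases(text, max_len=5):
    tokens = text.split()
    changed = True
    while changed:  # re-sweep until a full pass removes nothing
        changed = False
        # Shortest phrases first: collapsing single words can expose
        # longer repeated phrases that were not adjacent before.
        for n in range(1, max_len + 1):
            out = []
            for tok in tokens:
                out.append(tok)
                # Drop the last n tokens while they repeat the n before them.
                while len(out) >= 2 * n and out[-n:] == out[-2 * n:-n]:
                    del out[-n:]
            if len(out) != len(tokens):
                changed = True
            tokens = out
    return " ".join(tokens)

print(dedupe_phrases("foo bar foo bar foo bar"))
# foo bar
```

Note that this also turns `foo foo bar bar foo bar` into `foo bar`, since after the single-word pass the phrase `foo bar` repeats. Each sweep does a slice comparison per token, so the cost per pass grows roughly with the number of tokens times the phrase length.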
Clever answer! +1 But wouldn't there be performance problems if this were applied to a very large string? – ridgerunner