2013-03-27 121 views
6

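Replace all words from a word list with another string in Python

I have a user-entered string, and I want to search it and replace any occurrence of a word from a list with a replacement string.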

import re 

prohibitedWords = ["MVGame","Kappa","DatSheffy","DansGame","BrainSlug","SwiftRage","Kreygasm","ArsonNoSexy","GingerPower","Poooound","TooSpicy"] 


# word[1] contains the user entered message 
themessage = str(word[1])  
# would like to implement a foreach loop here but not sure how to do it in python 
for themessage in prohibitedwords: 
    themessage = re.sub(prohibitedWords, "(I'm an idiot)", themessage) 

print themessage 

The code above doesn't work, and I'm fairly sure I don't understand how Python for loops work.

+0

You should check out the Python spambayes implementation; it may be more scalable. – dusual 2013-03-27 12:18:01

Answers

11

You can do this with a single call to sub:

big_regex = re.compile('|'.join(map(re.escape, prohibitedWords))) 
the_message = big_regex.sub("repl-string", str(word[1])) 

Example:

>>> import re 
>>> prohibitedWords = ['Some', 'Random', 'Words'] 
>>> big_regex = re.compile('|'.join(map(re.escape, prohibitedWords))) 
>>> the_message = big_regex.sub("<replaced>", 'this message contains Some really Random Words') 
>>> the_message 
'this message contains <replaced> really <replaced> <replaced>' 

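Note that using str.replace in a loop can lead to subtle bugs, because a later replacement can match text inserted by an earlier one: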

>>> words = ['random', 'words'] 
>>> text = 'a sample message with random words' 
>>> for word in words: 
...  text = text.replace(word, 'swords') 
... 
>>> text 
'a sample message with sswords swords' 

whereas re.sub, which makes a single pass over the text, gives the correct result:

>>> big_regex = re.compile('|'.join(map(re.escape, words))) 
>>> big_regex.sub("swords", 'a sample message with random words') 
'a sample message with swords swords' 
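
One caveat with the alternation approach, sketched here under assumptions: Python's regex engine tries alternatives from left to right, so when one word is a prefix of another, the shorter word can match first and leave the tail of the longer word behind. Sorting the words longest-first before joining avoids this. The helper name `build_pattern` and the word `KappaPride` are hypothetical and not part of the original post.

```python
import re

def build_pattern(words):
    # Hypothetical helper: put longer words first so that 'KappaPride'
    # is tried before its prefix 'Kappa'.
    ordered = sorted(words, key=len, reverse=True)
    return re.compile('|'.join(map(re.escape, ordered)))

words = ['Kappa', 'KappaPride']  # the second entry is made up for illustration

# Unsorted alternation: 'Kappa' wins and 'Pride' is left over.
naive = re.compile('|'.join(map(re.escape, words)))
print(naive.sub('<replaced>', 'KappaPride'))

# Longest-first alternation: the whole word is replaced.
print(build_pattern(words).sub('<replaced>', 'KappaPride'))
```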

As thg435 pointed out, if you want to replace whole words rather than every matching substring, you can add word boundaries to the regex:

big_regex = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, words))) 

This replaces 'random' in 'random words' but not in 'pseudorandom words'.
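
A minimal runnable sketch of the word-boundary variant follows. The helper name `replace_whole_words` is hypothetical, and the sketch assumes every word in the list starts and ends with a word character, since `\b` only marks a boundary next to letters, digits, or underscores.

```python
import re

def replace_whole_words(text, words, replacement):
    # Hypothetical helper: a non-capturing group lets one pair of \b
    # anchors apply to every alternative in the list.
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, words)))
    return pattern.sub(replacement, text)

words = ['random', 'words']
print(replace_whole_words('random words', words, 'swords'))
print(replace_whole_words('pseudorandom words', words, 'swords'))
```

The first call replaces both words; the second leaves 'pseudorandom' alone and replaces only 'words'.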

+0

Could you show an example run? – 2013-03-27 12:03:51

+0

However, if you have a lot of words to replace, you will have to break it up. – DSM 2013-03-27 12:15:18

+0

You might want to wrap your expression in '\b' to avoid replacing "tail" in "retailer". – georg 2013-03-27 12:31:30

4

Try this:

prohibitedWords = ["MVGame","Kappa","DatSheffy","DansGame","BrainSlug","SwiftRage","Kreygasm","ArsonNoSexy","GingerPower","Poooound","TooSpicy"] 

themessage = str(word[1])  
for word in prohibitedWords: 
    themessage = themessage.replace(word, "(I'm an idiot)") 

print themessage 
+0

This is fragile: as Bakuriu explained, it breaks easily when one prohibited word is a substring of another. – Adam 2013-03-27 12:19:51

+0

@codesparkle That doesn't mean it is wrong; which option you choose always depends on the circumstances. – 2013-03-27 12:25:48

0

Code:

prohibitedWords =["MVGame","Kappa","DatSheffy","DansGame", 
        "BrainSlug","SwiftRage","Kreygasm", 
        "ArsonNoSexy","GingerPower","Poooound","TooSpicy"] 
themessage = 'Brain' 
self_criticism = '(I`m an idiot)' 
final_message = [i.replace(themessage, self_criticism) for i in prohibitedWords] 
print final_message 

Result:

['MVGame', 'Kappa', 'DatSheffy', 'DansGame', '(I`m an idiot)Slug', 'SwiftRage', 
'Kreygasm', 'ArsonNoSexy', 'GingerPower', 'Poooound','TooSpicy'] 