2016-10-08 35 views

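Regular expression to replace some dots with commas in customer comments

I need to write a regular expression that replaces '.' with ',' in some patients' comments about a drug. They should have used commas after mentioning side effects, but some of them used dots. For example: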

text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it." 

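I wrote regex code to detect one word (such as "headache") or two words (such as "bad dream") surrounded by two dots.

Detecting one word surrounded by two dots: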

text= re.sub (r'(\.)(\s*\w+\s*\.)',r',\2 ', text) 

Detecting two words surrounded by two dots:

text = re.sub (r'(\.)(\s*\w+\s\w+\s*\.)',r',\2 ', text11) 

This is the output:

the drug side-effects are: night mare, nausea, night sweat. bad dream, dizziness, severe headache. I suffered, she suffered. she told I should change it. 

But it should be as follows, with the dot after "night sweat" also changed to ',':

the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it. 

My code did not replace that dot. Also, if a sentence starts with a subject pronoun (such as "I" or "she"), I do not want to change the dot before it to a comma, even if the sentence has only two words (such as "I suffered"). I do not know how to add this condition to my code.

Any suggestions? Thanks!

See https://regex101.com/r/awW1Hc/1. Is this what you want to achieve? You will have to hardcode the pronouns; there is no way around that.

@Sebastian Proske, thank you! Works perfectly! – Mary

Answers

You can use the following pattern:

\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$) 

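This matches a dot, then captures one or two words, the first of which is not one of the pronouns you mentioned (you will most likely need to expand that list). The captured words must be followed by a character that is neither a word character nor whitespace (for example . ! : ,) or by the end of the string. Because that last check is a lookahead, the following dot is not consumed and can start the next match.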
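To illustrate why the lookahead matters, here is a small sketch comparing it with a pattern that consumes the closing dot, as the attempts in the question do. The sample string and the variable names are made up for the illustration; the pronoun check is left out to keep the comparison short.

```python
import re

# Hypothetical sample: four single-word items separated by dots.
sample = "headache. nausea. dizziness. fever."

# Like the question's attempt: the closing dot is part of the match,
# so it cannot also serve as the opening dot of the next item.
consuming = re.findall(r'\.\s*\w+\s*\.', sample)

# Like the answer's pattern: the closing dot is only checked with a
# lookahead, so it stays available for the next match.
with_lookahead = re.findall(r'\.\s*\w+\s*(?=\.)', sample)

print(consuming)        # ['. nausea.', '. fever.']
print(with_lookahead)   # ['. nausea', '. dizziness', '. fever']
```

The consuming version skips every other item, which is one plausible reason some dots in the question's output were left unchanged.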

You can then replace each match with ,\1

In Python:

import re 
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it." 
text = re.sub(r'\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)', r',\1', text, flags=re.I) 
print(text) 

Output:

the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it. 

This is most likely not completely failsafe, and you may need to extend the pattern to cover some edge cases.
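One way to make the pronoun list easier to extend is to build the alternation from a Python list. The following is only a sketch: the names `PRONOUNS`, `build_pattern` and `dots_to_commas` are hypothetical, and the pronoun list is an assumption that you would adjust to your own data.

```python
import re

# Assumed list of subject pronouns; extend or trim as needed.
PRONOUNS = ["i", "you", "he", "she", "it", "we", "they"]

def build_pattern(pronouns):
    """Compile the answer's pattern with a configurable pronoun list."""
    alternation = "|".join(re.escape(p) for p in pronouns)
    return re.compile(
        r'\.(\s*(?!(?:' + alternation + r')\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)',
        flags=re.I,
    )

def dots_to_commas(text, pronouns=PRONOUNS):
    """Replace a dot with a comma when one or two non-pronoun words follow."""
    return build_pattern(pronouns).sub(r',\1', text)

text = ("the drug side-effects are: night mare. nausea. night sweat. "
        "bad dream. dizziness. severe headache. I suffered. she suffered. "
        "she told I should change it.")
result = dots_to_commas(text)
print(result)
```

With this list, a short sentence such as "fever. we stopped." is left unchanged, whereas a pattern that only knows "i" and "she" would turn its first dot into a comma.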
