Formatting a regular expression in Python

I have a list of words:

wordlist = ['hypothesis' , 'test' , 'results' , 'total']

and I have a sentence:

sentence = "These tests will benefit in the long run."

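I want to check whether any of the words in wordlist appear in the sentence. I know you can check whether they occur as substrings of the sentence with: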

for word in wordlist:
    if word in sentence:
        print word

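However, with substring matching I start matching words that are not in wordlist. For example, test is reported here as a substring even though the word in the sentence is tests. I can solve this with regular expressions, but is it possible to build the regex by formatting each new word into it? That is, if I want to see whether the word is in the sentence: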

for some_word_goes_in_here in wordlist:
    if re.search('.*(some_word_goes_in_here).*', sentence):
        print some_word_goes_in_here

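In this case the regex treats some_word_goes_in_here as the literal pattern to search for, not as the value of the variable some_word_goes_in_here. Is there a way to format the value of some_word_goes_in_here into the pattern so that the regex searches for that value?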

Source

2014-01-08 kolonel

If you have a better solution, I'm eager to hear it. – kolonel

Try using:

if re.search(r'\b' + word + r'\b', sentence):

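\b is a word boundary: it matches at the position between a word character and a non-word character (word characters are any letter, digit, or underscore), so your word only matches when it stands alone.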

For example,

>>> import re 
>>> wordlist = ['hypothesis' , 'test' , 'results' , 'total'] 
>>> sentence = "The total results for the test confirm the hypothesis" 
>>> for word in wordlist: 
...  if re.search(r'\b' + word + r'\b', sentence): 
...    print word 
... 
hypothesis 
test 
results 
total

With your string:

>>> sentence = "These tests will benefit in the long run." 
>>> for word in wordlist: 
...  if re.search(r'\b' + word + r'\b', sentence): 
...   print word 
... 
>>>

Nothing is printed.
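The same check can be wrapped in a small function. This is only a sketch, written for Python 3 (the sessions above are Python 2), and the name words_in_sentence is a hypothetical helper, not something from the answer. It assumes the words contain no regex metacharacters.

```python
import re

def words_in_sentence(wordlist, sentence):
    """Return the words from wordlist that occur as whole words in sentence."""
    found = []
    for word in wordlist:
        # \b on both sides rejects partial matches such as 'test' inside 'tests'.
        if re.search(r'\b' + word + r'\b', sentence):
            found.append(word)
    return found

wordlist = ['hypothesis', 'test', 'results', 'total']
print(words_in_sentence(wordlist, "These tests will benefit in the long run."))
print(words_in_sentence(wordlist, "The total results for the test confirm the hypothesis"))
```

The first call should print an empty list and the second all four words, in wordlist order.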

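Source

2014-01-08 10:58:54 Jerry

Thanks. Yes, but in this case nothing should match. – kolonel

@kolonel I used a different string, but let me add yours in a moment – Jerry

Don't use 'list' as a variable name; it shadows the built-in type. –

Use \b word boundaries to test for the words: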

for word in wordlist:
    if re.search(r'\b{}\b'.format(re.escape(word)), sentence):
        print '{} matched'.format(word)

But you could also split the sentence into separate words. Turning the word list into a set makes the test more efficient:

words = set(wordlist)
matched = words.intersection(sentence.split())  # no looping over `words` required
if matched:
    print matched

Demo:

>>> import re 
>>> wordlist = ['hypothesis' , 'test' , 'results' , 'total'] 
>>> sentence = "These tests will benefit in the long run." 
>>> for word in wordlist: 
...  if re.search(r'\b{}\b'.format(re.escape(word)), sentence): 
...   print '{} matched'.format(word) 
... 
>>> words = set(wordlist) 
>>> words.intersection(sentence.split()) 
set([]) 
>>> sentence = 'Lets test this hypothesis that the results total the outcome' 
>>> for word in wordlist: 
...  if re.search(r'\b{}\b'.format(re.escape(word)), sentence): 
...   print '{} matched'.format(word) 
... 
hypothesis matched 
test matched 
results matched 
total matched 
>>> words.intersection(sentence.split()) 
set(['test', 'total', 'hypothesis', 'results'])

Source

2014-01-08 11:00:57

I considered using 're.escape' and decided against it, because _words_ don't need escaping. In the more general case it is good advice. – Alfe
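To illustrate the more general case mentioned in this comment, here is a small sketch (Python 3) of what can go wrong without re.escape. The search term "a.b" and the sentence are made-up examples, not data from the question.

```python
import re

sentence = "the axb value"
word = "a.b"  # hypothetical search term containing a regex metacharacter

# Unescaped, '.' matches any character, so the pattern also matches 'axb'.
unescaped = re.search(r'\b' + word + r'\b', sentence)

# re.escape turns '.' into a literal dot, so only the exact text 'a.b' can match.
escaped = re.search(r'\b' + re.escape(word) + r'\b', sentence)

print(unescaped is not None)  # True
print(escaped is not None)    # False
```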

@MartijnPieters Thanks. – kolonel

@MartijnPieters I think splitting the sentence into words could introduce errors, because finding the boundaries between words is not a simple task. – kolonel
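The concern in this comment can be seen with punctuation. The sketch below (Python 3, with a made-up sentence) compares str.split() with one possible tokenizer, re.findall(r'\w+', ...). The latter is an assumption chosen for illustration and will not handle every case, such as hyphenated words or apostrophes.

```python
import re

wordlist = ['hypothesis', 'test', 'results', 'total']
sentence = "We ran the test, then checked the results."

words = set(wordlist)

# str.split() only splits on whitespace, so the tokens are 'test,' and 'results.'.
naive = words.intersection(sentence.split())

# \w+ extracts runs of word characters, dropping adjacent punctuation.
tokens = re.findall(r'\w+', sentence)
tokenized = words.intersection(tokens)

print(naive)      # set()
print(tokenized)  # contains 'test' and 'results'
```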

I would use something like this:

words = "hypothesis test results total".split()
# ^^^ but you can use your literal list if you prefer that
for word in words:
    if re.search(r'\b%s\b' % (word,), sentence):
        print word

You can even speed this up by using a single regular expression:

for foundWord in re.findall(r'\b' + r'\b|\b'.join(words) + r'\b', sentence):
    print foundWord
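A variant of the single-regex idea, sketched for Python 3: it escapes each word and wraps the alternation in a non-capturing group, so \b only has to be written once on each side. Compiling the pattern up front is a design choice of this sketch, not part of the answer.

```python
import re

words = ['hypothesis', 'test', 'results', 'total']

# Escape each word, join with '|', and wrap the alternation in a
# non-capturing group so the two \b anchors apply to every alternative.
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b')

sentence = 'Lets test this hypothesis that the results total the outcome'
print(pattern.findall(sentence))
print(pattern.findall("These tests will benefit in the long run."))
```

findall returns the matches in the order they appear in the sentence, so the first call should list test, hypothesis, results, total and the second should return an empty list.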

Source

2014-01-08 11:03:20 Alfe

Thanks for the solution. – kolonel
