Get whole words instead of substrings

I want to search a file of sentences and get the sentences that contain certain words. I wrote this code to do that.

def finding(q):
    for item in sentences:
        if item.lower().find(q.lower()) != -1:
            list.append(item)

        for sentence in list:
            outfile.write(sentence+'\r\n')

finding('apple')
finding('banana')

The problem is that this finds substrings rather than words. For example, the sentence 'appletree is big.' would also be extracted.

Source

2013-12-21 user3119123

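The code 'item.lower().find(q.lower()) != -1' is better spelled 'q.lower() in item.lower()' – Eric

Don't use the name 'list'; it shadows the built-in type. Use 'found' or something similarly descriptive.

Careful: 'list' is never initialized inside the function – Eric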

Split the line into words; the simplest way is to use str.split():

for line in sentences:
    if any(q.lower() == word.lower() for word in line.split()):
        outfile.write(line + '\n')

You could also add .strip('?!."()') to each word to remove the most common punctuation, perhaps.
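
A minimal sketch of that punctuation idea, assuming a hypothetical helper named contains_word (the name and signature are illustrative, not part of the original answer). Each whitespace-separated token is stripped of the listed punctuation before it is compared with the query:

```python
def contains_word(line, q, punctuation='?!."()'):
    """Return True if q occurs in line as a whole word, ignoring case."""
    target = q.lower()
    # Strip leading/trailing punctuation from each token before comparing.
    return any(word.strip(punctuation).lower() == target
               for word in line.split())


print(contains_word('I ate an apple.', 'apple'))        # True
print(contains_word('The appletree is big.', 'apple'))  # False
```

Note that strip() only removes characters at the ends of a token, so punctuation inside a word (for example an apostrophe) is left alone.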

Note that a Python file opened in text mode already writes \r\n on Windows when you write \n. The code above also writes matching lines directly to the output file.
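
To check the newline behaviour on a given platform, something like the following sketch could be used. The temporary file name out.txt is an assumption made for illustration; the file is written in text mode and read back in binary mode so the stored bytes are visible:

```python
import os
import tempfile

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Text mode: '\n' is translated to the platform line separator on write.
with open(path, 'w') as outfile:
    outfile.write('this has a banana' + '\n')

# Binary mode shows the bytes that were actually stored on disk.
with open(path, 'rb') as infile:
    raw = infile.read()

# True on every platform: the line ends with os.linesep
# ('\r\n' on Windows, '\n' elsewhere).
print(raw == ('this has a banana' + os.linesep).encode('ascii'))
```

Writing '\r\n' explicitly in text mode on Windows would therefore produce '\r\r\n' in the file.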

Alternatively, use a regular expression to find matches:

import re

def finding(q, sentences, outfile):
    pattern = re.compile(r'\b{}\b'.format(re.escape(q)), flags=re.IGNORECASE)
    for line in sentences:
        # search() scans the whole line; match() would only test its start
        if pattern.search(line):
            outfile.write(line + '\n')

re.IGNORECASE makes the match case-insensitive, \b adds word boundaries, and re.escape() escapes any regular-expression metacharacters in the input query.
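
The effect of each piece can be seen in a small sketch. The helper name word_pattern is hypothetical and only used here for illustration; the flag is spelled re.IGNORECASE (short form re.I) in the standard library:

```python
import re


def word_pattern(q):
    # re.escape() backslash-escapes metacharacters so q is matched literally.
    return re.compile(r'\b{}\b'.format(re.escape(q)), flags=re.IGNORECASE)


pattern = word_pattern('apple')
print(bool(pattern.search('An Apple a day')))        # True
print(bool(pattern.search('The appletree is big')))  # False

# search() scans the whole line; match() only tries the start of the string.
print(bool(pattern.match('An Apple a day')))         # False

# Escaping matters for queries that contain metacharacters such as '.'.
print(re.escape('a.b'))                              # a\.b
print(bool(word_pattern('a.b').search('see axb')))   # False
```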

Source

2013-12-21 16:10:46

An alternative:

sentences = [ 
    'this has a banana', 
    'this one does not', 
    'bananatree should not be here', 
    'go go banana go' 
] 

import re
# list() is needed on Python 3, where filter() returns an iterator
found = list(filter(re.compile(r'\bbanana\b', flags=re.I).search, sentences))
# ['this has a banana', 'go go banana go']

Source

2013-12-21 16:31:42
