Text search using Python

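I am working on a text search project and am using TextBlob to search for sentences in a text. TextBlob efficiently pulls out all sentences that contain the keywords. However, for effective research, I also want to pull out the one sentence before and the one sentence after each match, and I have not been able to figure out how to do that.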

Below is the code I am using:

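from textblob import TextBlob

def extract_sents(text, word):
    # Comma-separated keywords become a set for intersection tests
    search_words = set(word.split(','))
    # Lower-case the text so matching is case-insensitive
    sents = ''.join([s.lower() for s in text])
    blob = TextBlob(sents)
    # Keep each sentence that shares at least one word with the keywords
    matches = [str(s) for s in blob.sentences if search_words & set(s.words)]
    print(search_words)
    print(matches)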

Source

2014-07-21 Raghav Shaligram

Are there some indentation errors in your code? –

I would suggest taking a look at `nltk` – cengizkrbck

@cengizkrbck TextBlob seems to work better than nltk. The one thing I cannot figure out is how to pull out the sentence before and the sentence after a match. –

If you want to get the lines before and after the match, you can write a loop that remembers the previous line, or use slices, like [from:to], on the blob.sentences list.
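The loop idea could be sketched roughly as follows. This is only an illustration under stated assumptions: it works on a plain list of sentence strings rather than on blob.sentences, the function name with_neighbors is hypothetical, and words are split with a simple regular expression instead of TextBlob's tokenizer.

```python
import re

def with_neighbors(sentences, search_words):
    """Return [previous, match, following] for each matching sentence.

    Hypothetical helper: `sentences` is a plain list of strings,
    `search_words` is a set of lower-case keywords.
    """
    results = []
    previous = None  # sentence seen in the prior iteration
    waiting = None   # latest match that still needs its following sentence
    for sentence in sentences:
        if waiting is not None:
            waiting[2] = sentence  # fill in the sentence after the match
            waiting = None
        words = set(re.findall(r"\w+", sentence.lower()))
        if search_words & words:
            waiting = [previous, sentence, None]
            results.append(waiting)
        previous = sentence
    return results

sample = ["The cat sat.", "A dog barked.", "Birds sing.", "The dog slept."]
for previous, match, following in with_neighbors(sample, {"dog"}):
    print(previous, "|", match, "|", following)
```

A match at the very start or very end of the text simply gets None for the missing neighbor.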

The best way is probably to use the enumerate builtin function.

match_region = [list(map(str, blob.sentences[i-1:i+2]))  # from previous to next sentence
                for i, s in enumerate(blob.sentences)    # i is index, s is element
                if search_words & set(s.words)]          # same as your condition

Here, blob.sentences[i-1:i+2] extracts the sublist spanning from index i-1 (inclusive) to index i+2 (exclusive), and map converts the elements of this sublist to strings.

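Note: actually, you probably want to use max(0, i-1) instead of i-1; otherwise i-1 can be -1, which Python interprets as the last element, typically producing an empty slice. On the other hand, it is not a problem if i+2 is greater than the length of the list.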
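A minimal sketch of the clamped version, again assuming a plain list of sentence strings in place of blob.sentences. The name context_windows and the before/after parameters are hypothetical, and the regular-expression word split only stands in for TextBlob's s.words.

```python
import re

def context_windows(sentences, search_words, before=1, after=1):
    """Return, for each matching sentence, the slice of surrounding sentences."""
    windows = []
    for i, sentence in enumerate(sentences):
        words = set(re.findall(r"\w+", sentence.lower()))
        if search_words & words:
            start = max(0, i - before)  # clamp so the start index never goes negative
            # An end index past the list length is safe: slicing just stops early
            windows.append(sentences[start:i + after + 1])
    return windows

sample = ["Dogs bark.", "Cats purr.", "Birds sing.", "Dogs sleep."]
print(sample[-1:2])                        # unclamped slice for i = 0: []
print(context_windows(sample, {"dogs"}))
```

With the clamp, a match in the first sentence keeps the match itself and the sentence after it instead of collapsing to an empty list.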

Source

2014-07-21 14:33:53
