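Searching for a keyword in Python

I want to write a Python script that searches a document for a keyword and retrieves the whole sentence the keyword appears in. From my research I saw that acora can be used for this, but I still haven't had any success with it.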

2011-06-30 Ryan

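`$ cat document.txt | grep "keyword"` – 2011-06-30 06:25:50

@Franklin That is quite different from what he asked. He is asking for the sentence. –

Yes, I realize that grep "keyword" only finds "keyword". What I'm looking for is this: if the keyword appears, I want to grab the whole sentence that contains it. Any ideas? – Ryan

This is how you can do it simply in the shell. You should write it into a script yourself.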

>>> text = '''this is sentence 1. and that is sentence 
       2. and sometimes sentences are good. 
       when that's sentence 4, there's a good reason. and that's 
       sentence 5.''' 
>>> for line in text.split('.'): 
...  if 'and' in line: 
...   print(' '.join(line.split()))  # collapse whitespace so each sentence prints on one line
... 
and that is sentence 2 
and sometimes sentences are good 
and that's sentence 5

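Here I split text with .split('.') and iterate over the pieces, checking each one for the word and and printing it if it is there.

Keep in mind that this is case-sensitive. There are also a lot of things to consider in your solution, such as the fact that ! and ? end sentences too (but sometimes they don't):

This is a sentence (ha?), or so you think (!), so?

will be split into

This is a sentence (ha
), or so you think (
), so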
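
A minimal sketch of one way to handle both caveats: match case-insensitively, and split only where `.`, `!` or `?` is followed by whitespace, so the parenthesised `?` and `!` above no longer break the sentence. The name `sentences_with` and the splitting rule are assumptions made for illustration, and abbreviations such as "e.g. this" would still be split incorrectly.

```python
import re

def sentences_with(keyword, text):
    """Return the sentences of `text` that contain `keyword` as a whole word."""
    # Split after ., ! or ? only when whitespace follows, keeping the punctuation.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # \b anchors the keyword at word boundaries; IGNORECASE treats "And" like "and".
    pattern = re.compile(r'\b%s\b' % re.escape(keyword), re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

text = "This is a sentence (ha?), or so you think (!), so? And this is another one. Nothing here."
for sentence in sentences_with("and", text):
    print(sentence)
```

With this rule the first sentence stays in one piece, and only the sentence starting with "And" is printed.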

2011-06-30 06:32:21

>>> text = """Hello, this is the first sentence. This is the second. 
And this may or may not be the third. Am I right? No? lol...""" 

>>> import re 
>>> s = re.split(r'[.?!:]+', text) 
>>> def search(word, sentences): 
     return [i for i in sentences if re.search(r'\b%s\b' % word, i)] 

>>> search('is', s) 
['Hello, this is the first sentence', ' This is the second']
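
To see why the `\b` word boundaries in `search` matter, the sketch below runs a plain substring test and a boundary-anchored regex over the same sentences. The helper names `substring_hits` and `word_hits` are made up for this illustration.

```python
import re

text = """Hello, this is the first sentence. This is the second.
And this may or may not be the third. Am I right? No? lol..."""
sentences = re.split(r'[.?!:]+', text)

def substring_hits(word, sentences):
    # Plain containment: "is" also matches inside "this".
    return [s.strip() for s in sentences if word in s]

def word_hits(word, sentences):
    # \b requires a word boundary on both sides, so "this" no longer counts.
    pattern = re.compile(r'\b%s\b' % re.escape(word))
    return [s.strip() for s in sentences if pattern.search(s)]

print(substring_hits('is', sentences))
print(word_hits('is', sentences))
```

The substring version also returns the third sentence, because "this" contains the characters "is"; the word-boundary version returns only the first two.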

2011-06-30 06:35:55 JBernardo

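-1: Your function will match the third sentence even though it does not contain the word "is". It contains the *sequence* 'is' inside the word "this". – Blair

@Blair Oh, right. I didn't realize that. It's very simple to fix, and you should downvote the other answers too, since they also use 'word in sentence' to find the answer. – JBernardo

@Blair I can't believe you actually did that. Try being nice, bro. – JBernardo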

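I don't have much experience with this, but you might be looking for nltk.

Try this; use span_tokenize to find which span the index of your word falls into, then look up that sentence.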
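
The span idea can be sketched with the standard library alone. In the example below a simple regex stands in for nltk's sentence tokenizer (an assumption, made so the snippet runs without nltk installed; a real tokenizer handles abbreviations far better), and `sentence_spans` and `sentence_containing` are hypothetical helper names.

```python
import re

def sentence_spans(text):
    """Yield (start, end) character offsets of each sentence in `text`."""
    # A sentence is a run of text ending in ., ! or ? (or the end of the string).
    for match in re.finditer(r'[^.!?]+[.!?]*', text):
        start, end = match.span()
        # Skip leading whitespace so spans begin at the first real character.
        start += len(match.group()) - len(match.group().lstrip())
        if start < end:
            yield start, end

def sentence_containing(keyword, text):
    """Return the sentence covering the first whole-word match of `keyword`, or None."""
    found = re.search(r'\b%s\b' % re.escape(keyword), text)
    if found is None:
        return None
    # Find the span that the match index falls into and return that slice.
    for start, end in sentence_spans(text):
        if start <= found.start() < end:
            return text[start:end]
    return None

text = "Hello, this is the first sentence. This is the second. Am I right?"
print(sentence_containing("second", text))
```

Working with offsets rather than split strings keeps the original text intact, so the returned sentence is an exact slice of the document.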

2011-06-30 06:36:46 nattofriends

Use the grep or egrep command through Python's subprocess module; that may help you.

e.g.:

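from subprocess import Popen, PIPE

# Run grep without a shell; universal_newlines=True returns text instead of bytes.
proc = Popen(["grep", "word1", "document.txt"], stdout=PIPE, universal_newlines=True)
# To search for 2 different words:
# proc = Popen(["egrep", "word1|word2", "document.txt"], stdout=PIPE, universal_newlines=True)
data, _ = proc.communicate()  # wait for grep to finish and collect its output
lines = data.splitlines()     # one matching line per list element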

2011-06-30 09:16:39 Yajushi
