2016-11-09

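How to concatenate search patterns in CLiPS pattern.search

I am using CLiPS pattern.search (Python 2.7) to do pattern matching in text. I need to extract the phrases that correspond to two patterns, 'VBN NP' and 'NP TO NP'. I can search for each one separately and then join the results: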

from pattern.en import parse,parsetree 
from pattern.search import search 

text="Published case-control studies have a lot of information about susceptibility to asthma." 
sentenceTree = parsetree(text, relations=True, lemmata=True) 
matches = [] 
for match in search("VBN NP",sentenceTree): 
    matches.append(match.string) 
for match in search("NP TO NP",sentenceTree): 
    matches.append(match.string) 
print matches 
# Output: [u'Published case-control studies', u'susceptibility to asthma'] 

But I'd like to combine these into a single search pattern. If I try the following, I get no results at all.

matches = [] 
for match in search("VBN NP|NP TO NP",sentenceTree): 
    matches.append(match.string) 
print matches 
#Output: [] 

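The official documentation gives no clue about this. I also tried '{VBN NP} | {NP TO NP}' and '[VBN NP] | [NP TO NP]', but without any luck.

The question is: is it possible to concatenate search patterns in CLiPS pattern.search? If the answer is yes, how can it be done?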

Answer

This pattern works well for me: {VBN NP} *+ {NP TO NP}, used together with the match() and group() methods.

>>> from pattern.search import match 
>>> from pattern.en import parsetree 


>>> t = parsetree('Published case-control studies have a lot of information about susceptibility to asthma.',relations= True) 

>>> m = match('{VBN NP} *+ {NP TO NP}',t) 

>>> m.group(0) #matches the complete pattern 
[Word(u'Published/VBN'), Word(u'case-control/NN'), Word(u'studies/NNS'), Word(u'have/VBP'), Word(u'a/DT'), Word(u'lot/NN'), Word(u'of/IN'), Word(u'information/NN'), Word(u'about/IN'), Word(u'susceptibility/NN'), Word(u'to/TO'), Word(u'asthma/NN')] 
>>> m.group(1) # matches the first group 
[Word(u'Published/VBN'), Word(u'case-control/NN')] 
>>> m.group(2) # matches the second group 
[Word(u'susceptibility/NN'), Word(u'to/TO'), Word(u'asthma/NN')] 

Finally, the results can be collected like this:

>>> matches=[] 
>>> for i in range(2): 
...  matches.append(m.group(i+1).string) 
... 
>>> matches 
[u'Published case-control', u'susceptibility to asthma'] 

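This is just one example. The word order may be different, or only one of the two pattern types may occur in the sentence ... or more than two. In those cases it will fail.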
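
One way to cope with the cases raised in the comment is to stop looking for a single combined pattern and run each pattern on its own, which is what the question already does by hand. As far as I can tell from the pattern.search syntax, `|` separates alternatives inside a single constraint (for example `NN|VB` for one word), so 'VBN NP|NP TO NP' is probably read as the sequence VBN, NP|NP, TO, NP, and that sequence does not occur in the sentence. Below is a minimal sketch of a helper that loops over a list of patterns. The name `search_all` and the `search_fn` parameter are my own inventions, not part of the library; `search_fn` is only there so the helper can be tried without pattern installed.

```python
def search_all(patterns, tree, search_fn=None):
    """Run every pattern separately and return all matched strings.

    `search_all` and `search_fn` are hypothetical names for this sketch.
    By default the real pattern.search.search function is used.
    """
    if search_fn is None:
        # Imported lazily so the helper can be defined without pattern installed.
        from pattern.search import search as search_fn
    found = []
    for p in patterns:
        # Each pattern is searched independently, so it does not matter
        # which patterns occur in the sentence, in what order, or how often.
        for m in search_fn(p, tree):
            found.append(m.string)
    return found

# Intended usage with the library (assumes pattern is installed):
# from pattern.en import parsetree
# tree = parsetree(text, relations=True, lemmata=True)
# matches = search_all(["VBN NP", "NP TO NP"], tree)
```

With the sentence from the question this should give the same list as the two separate loops there. The results are grouped by pattern rather than by position in the sentence, so if the original word order matters they would still need to be sorted afterwards.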