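Spacy NLP - chunking with regular expressions
spaCy includes a noun_chunks feature for retrieving a set of noun phrases. The function english_noun_chunks (below) builds those phrases from tokens with word.pos == NOUN and a noun-phrase dependency label: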
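from spacy.symbols import NOUN  # universal POS symbol compared against word.pos

def english_noun_chunks(doc):
    """Yield (start, end, label) spans for base noun phrases in a parsed doc."""
    # Dependency labels that mark a noun as the head of a noun phrase
    labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj',
              'attr', 'root']
    np_deps = [doc.vocab.strings[label] for label in labels]
    conj = doc.vocab.strings['conj']
    np_label = doc.vocab.strings['NP']
    for i in range(len(doc)):
        word = doc[i]
        if word.pos == NOUN and word.dep in np_deps:
            # The phrase runs from the noun's leftmost descendant up to the noun itself
            yield word.left_edge.i, word.i+1, np_label
        elif word.pos == NOUN and word.dep == conj:
            # Walk up the chain of conjuncts to the first coordinated item
            head = word.head
            while head.dep == conj and head.head.i < head.i:
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
                yield word.left_edge.i, word.i+1, np_label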
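I want to extract chunks from a sentence that match a given regular expression over POS tags. For example, zero or more adjectives followed by one or more nouns: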
{(<JJ>)*(<NN | NNS | NNP>)+}
Is this possible without rewriting the english_noun_chunks function?
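One way to apply a tag pattern like the one above without touching english_noun_chunks is to run an ordinary Python regex over the sequence of fine-grained POS tags. This is a minimal sketch, not a spaCy API: regex_chunks and CHUNK_RE are hypothetical names, and it assumes Penn Treebank tags such as the ones spaCy exposes as token.tag_. With a parsed doc you could build the input as [(t.text, t.tag_) for t in doc] and turn each (start, end) pair back into a span with doc[start:end].

```python
import re
from typing import List, Tuple

# The question's pattern: zero or more JJ, then one or more NN/NNS/NNP.
# Tags are wrapped in <...> so an alternative can't match part of a longer tag.
CHUNK_RE = re.compile(r"(?:<JJ>)*(?:<(?:NN|NNS|NNP)>)+")

def regex_chunks(tagged: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans whose tag sequence matches CHUNK_RE."""
    encoded = "".join(f"<{tag}>" for _, tag in tagged)
    spans = []
    for m in CHUNK_RE.finditer(encoded):
        # Each token contributes exactly one "<" (assuming tags never contain
        # "<" or ">"), so counting them converts character offsets to token indices.
        start = encoded.count("<", 0, m.start())
        end = encoded.count("<", 0, m.end())
        spans.append((start, end))
    return spans

sentence = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
            ("jumped", "VBD"), ("over", "IN"), ("lazy", "JJ"), ("dogs", "NNS")]
for start, end in regex_chunks(sentence):
    print(" ".join(word for word, _ in sentence[start:end]))
# quick brown fox
# lazy dogs
```

Because finditer returns non-overlapping, leftmost-longest-by-greed matches, each adjective run is attached to the noun run that follows it, and adjectives not followed by a noun are skipped.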
What about the fact that this function is compiled to C by Cython? – Serendipity
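You're right: the file has a '.pyx' extension, so if you rewrite it in Python you will lose some performance. But do you need to rewrite it at all, or could you simply filter the final results? –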
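A sketch of the filtering idea from the comment above: leave the chunker as it is and keep only those chunks whose complete tag sequence matches the pattern. keep_matching is a hypothetical helper, and the chunks here are plain (word, tag) lists; for spaCy output you might build each one as [(t.text, t.tag_) for t in chunk] while iterating over doc.noun_chunks.

```python
import re
from typing import List, Sequence, Tuple

# The question's tag pattern; fullmatch() below forces it to cover the whole chunk.
PATTERN = re.compile(r"(?:<JJ>)*(?:<(?:NN|NNS|NNP)>)+")

def keep_matching(chunks: Sequence[Sequence[Tuple[str, str]]]) -> List[str]:
    """Return the text of each chunk whose tags are JJ* followed by NN/NNS/NNP+."""
    kept = []
    for chunk in chunks:
        tags = "".join(f"<{tag}>" for _, tag in chunk)
        if PATTERN.fullmatch(tags):
            kept.append(" ".join(word for word, _ in chunk))
    return kept

# Hypothetical chunks, e.g. what a noun chunker might return, as (word, tag) pairs.
chunks = [
    [("quick", "JJ"), ("brown", "JJ"), ("fox", "NN")],
    [("the", "DT"), ("lazy", "JJ"), ("dog", "NN")],
    [("it", "PRP")],
]
print(keep_matching(chunks))  # ['quick brown fox']
```

Because the match is anchored to the whole chunk, noun chunks that begin with a determiner (such as "the lazy dog") are rejected; if you want to keep them, one option is to strip leading DT tokens before matching.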