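My question is similar to this question. In spaCy, I can do part-of-speech (POS) tagging and noun phrase identification separately; what I am after is merging the POS tags of noun phrase chunks. For example: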
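import spacy
# Older spaCy versions accepted the shortcut 'en' here.
nlp = spacy.load('en_core_web_sm')
# Adjacent string literals inside parentheses are joined into one string.
sentence = ('For instance , consider one simple phenomena : '
            'a question is typically followed by an answer , '
            'or some explicit statement of an inability or refusal to answer .')
token = nlp(sentence)
# Pair each token's text with its coarse-grained POS tag.
token_tag = [(word.text, word.pos_) for word in token]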
The output looks like:
[('For', 'ADP'),
('instance', 'NOUN'),
(',', 'PUNCT'),
('consider', 'VERB'),
('one', 'NUM'),
('simple', 'ADJ'),
('phenomena', 'NOUN'),
...]
For noun phrases, or chunks, I can use noun_chunks, which yields each chunk of words as follows:
[nc for nc in token.noun_chunks] # [instance, one simple phenomena, an answer, ...]
I am wondering whether there is a way to group the POS tags based on noun_chunks, so that I get output like this:
[('For', 'ADP'),
('instance', 'NOUN'), # or NOUN_CHUNKS
(',', 'PUNCT'),
('consider', 'VERB'),
('one simple phenomena', 'NOUN_CHUNKS'),
...]
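
To make the desired transformation concrete, here is a minimal sketch in plain Python. It is not a spaCy feature: `merge_chunk_tags` is a hypothetical helper, and the token list and chunk indices are hand-written stand-ins for the output above. It assumes the chunks arrive as non-overlapping (start, end) token indices with an exclusive end, which is what `nc.start` and `nc.end` give for each span in `token.noun_chunks`. Single-token chunks such as "instance" keep their original tag here, which is one of the two options mentioned above.

```python
def merge_chunk_tags(tagged, chunk_spans, label="NOUN_CHUNKS"):
    """Collapse each multi-token chunk into a single (text, label) pair.

    tagged      -- list of (text, pos) pairs, one per token
    chunk_spans -- non-overlapping (start, end) token indices, end exclusive
    """
    end_by_start = {start: end for start, end in chunk_spans}
    merged = []
    i = 0
    while i < len(tagged):
        end = end_by_start.get(i)
        if end is not None and end - i > 1:
            # Multi-token chunk: join the words and relabel the whole span.
            text = " ".join(word for word, _ in tagged[i:end])
            merged.append((text, label))
            i = end
        else:
            # Ordinary token, or a single-token chunk: keep its POS tag.
            merged.append(tagged[i])
            i += 1
    return merged


# Hand-written stand-ins (assumed values). With spaCy they would come from:
#   token_tag   = [(w.text, w.pos_) for w in token]
#   chunk_spans = [(nc.start, nc.end) for nc in token.noun_chunks]
token_tag = [
    ("For", "ADP"), ("instance", "NOUN"), (",", "PUNCT"),
    ("consider", "VERB"), ("one", "NUM"), ("simple", "ADJ"),
    ("phenomena", "NOUN"),
]
chunk_spans = [(1, 2), (4, 7)]  # "instance", "one simple phenomena"

merged = merge_chunk_tags(token_tag, chunk_spans)
print(merged)
```

Working on plain tuples rather than on the Doc itself leaves the original tokenization untouched, so the per-token tags remain available if they are needed later.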