2016-01-05 157 views

How do I print only the tags in Python? Suppose I have a string like this:

text = "They refuse to permit us." 

txt = nltk.word_tokenize(text) 

With this, if I print the POS tags with nltk.pos_tag(txt), I get

[('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP')]

How can I print out only this:

[ 'PRP', 'VBP', 'TO', 'VB', 'PRP' ]

Answers


pos_tag returns a list of (word, tag) tuples, so iterate over it and take the second element of each tuple.

>>> tagged = nltk.pos_tag(txt) 
>>> tags = [ e[1] for e in tagged] 
>>> tags 
['PRP', 'VBP', 'TO', 'VB', 'PRP'] 
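
The same idea can be written with tuple unpacking instead of indexing, which some readers find clearer. This is only a sketch: the `tagged` list below is hard-coded as a stand-in for the result of nltk.pos_tag(txt) (an assumption, so that the snippet runs without NLTK or its models installed).

```python
# Hard-coded stand-in for the output of nltk.pos_tag(txt);
# with NLTK installed you would call the tagger instead.
tagged = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"),
          ("permit", "VB"), ("us", "PRP")]

# Unpack each (word, tag) pair and keep only the tag.
tags = [tag for _word, tag in tagged]
print(tags)
```

Note that nltk.word_tokenize also returns the final "." as its own token, so the real list will usually end with one more pair, ('.', '.').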

Have a look at Unpacking a list/tuple of pairs into two lists/tuples:

>>> from nltk import pos_tag, word_tokenize 
>>> text = "They refuse to permit us." 
>>> tagged_text = pos_tag(word_tokenize(text)) 
>>> tokens, pos = zip(*tagged_text) 
>>> pos 
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.') 
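
One caveat with zip(*...): it yields tuples, and unpacking into two names fails when the tagged list is empty. A minimal sketch of a guard follows; `split_pairs` is a hypothetical helper name, not part of NLTK, and the `tagged` list is hard-coded in place of a real pos_tag call.

```python
def split_pairs(tagged):
    """Split a list of (word, tag) pairs into two lists.

    Returns two empty lists for empty input, where a bare
    `tokens, pos = zip(*tagged)` would raise ValueError.
    """
    if not tagged:
        return [], []
    tokens, pos = zip(*tagged)
    return list(tokens), list(pos)


# Hard-coded stand-in for pos_tag(word_tokenize(text)).
tagged = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"),
          ("permit", "VB"), ("us", "PRP"), (".", ".")]

tokens, pos = split_pairs(tagged)
print(pos)
```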

At some point you may find that POS tagging is slow, in which case you can create the tagger once and reuse it (see Slow performance of POS tagging. Can I do some kind of pre-warming?):

>>> from nltk import word_tokenize 
>>> from nltk.tag import PerceptronTagger 
>>> tagger = PerceptronTagger() 
>>> text = "They refuse to permit us." 
>>> tagged_text = tagger.tag(word_tokenize(text)) 
>>> tokens, pos = zip(*tagged_text) 
>>> pos 
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.') 
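
The point of creating the tagger yourself is to build it once and reuse it for every sentence. A rough sketch of that pattern is below. `tags_for_sentences` and `StubTagger` are hypothetical names introduced for illustration; the stub only mimics the `.tag(tokens)` method so the snippet runs without NLTK, and in real use you would pass a single PerceptronTagger() instance instead.

```python
def tags_for_sentences(token_lists, tagger):
    """Tag every pre-tokenized sentence with the same tagger object.

    `tagger` is any object with a .tag(tokens) method that returns
    (word, tag) pairs, such as an nltk PerceptronTagger instance.
    """
    return [[tag for _word, tag in tagger.tag(tokens)]
            for tokens in token_lists]


class StubTagger:
    """Hypothetical stand-in that labels every token 'NN'."""

    def tag(self, tokens):
        return [(token, "NN") for token in tokens]


tagger = StubTagger()  # created once, outside any loop
sentences = [["They", "refuse"], ["to", "permit", "us"]]
all_tags = tags_for_sentences(sentences, tagger)
print(all_tags)
```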

You can iterate over the result like this:

print([x[1] for x in nltk.pos_tag(txt)])