2011-05-11 50 views
2

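I have the following code, which takes words from an input text file and uses WordNet to print each word's synonyms, definitions and example sentences. WordNet separates the synonyms into synsets according to part of speech, i.e. the synonyms that are verbs and the synonyms that are adjectives are printed separately. I would like to print the part of speech along with the synonyms of each word.

For example, for the word flabbergasted the synonyms are 1) flabbergast, boggle, bowl over, which are verbs, and 2) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken, which are adjectives.

How do I print the part of speech along with the synonyms? The code I have so far is below: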


import nltk 
from nltk.corpus import wordnet as wn 
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle') 
fp = open('sample.txt','r') 
data = fp.read() 
tokens= nltk.wordpunct_tokenize(data) 
text = nltk.Text(tokens) 
words = [w.lower() for w in text] 
for a in words: 
    print a 
syns = wn.synsets(a) 
for s in syns: 
    print 
    print "definition:" s.definition 
    print "synonyms:" 
    for l in s.lemmas: 
     print l.name 
    print "examples:" 
    for b in s.examples: 
     print b 
    print 

Answers

1

It looks like you messed up your indentation:

for a in words: 
    print a 
syns = wn.synsets(a) 

It seems that syns = wn.synsets(a) should be inside the for loop over words, so that the lookup runs for every word rather than only the last one:

for w in words:
    print w
    syns = wn.synsets(w)
    for s in syns:
        print
        print "definition:", s.definition
        print "synonyms:"
        for l in s.lemmas:
            print l.name
        print "examples:"
        for b in s.examples:
            print b
    print
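
The loop above is written for Python 2 and the NLTK release of that time. As a rough sketch of the same loop for Python 3, assuming NLTK 3.x (where `definition`, `lemmas`, `name` and `examples` are called as methods) and an installed WordNet corpus, something like the following could work. The helper names `format_synset` and `print_word_senses` are made up for this illustration.

```python
def format_synset(synset):
    """Return printable lines for one synset (NLTK 3 style accessors)."""
    lines = ["definition: " + synset.definition(), "synonyms:"]
    lines.extend("  " + lemma.name() for lemma in synset.lemmas())
    lines.append("examples:")
    lines.extend("  " + example for example in synset.examples())
    return lines


def print_word_senses(words):
    """Print definition, synonyms and examples for every sense of every word."""
    # Imported lazily so format_synset can be used without NLTK installed.
    from nltk.corpus import wordnet as wn

    for word in words:
        print(word)
        # The lookup sits inside the loop, so it runs once per word.
        for synset in wn.synsets(word):
            print("\n".join(format_synset(synset)))
            print()
```

Calling `print_word_senses(words)` with the `words` list from the question would then print one block per sense of each word.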
0

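A lemma has a synset attribute, and the synset stores its own part of speech in its pos attribute. So if we have a lemma l, we can access its part of speech like this: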

>>> l = wn.lemma('gladden.v.01.joy') 
>>> l.synset.pos 
'v' 

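More generally, we can extend this into a loop that reads through your file. I use the with statement because it closes the file cleanly once the block is done.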

>>> with open('sample.txt') as f: 
...  raw = f.read() 
...  for sentence in nltk.sent_tokenize(raw): 
...   sentence = nltk.wordpunct_tokenize(sentence) 
...   for word in sentence: 
...    for synset in wn.synsets(word): 
...     for lemma in synset.lemmas: 
...      print lemma.name, lemma.synset.pos 
... 
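
The session above uses the attribute-style API of the NLTK release it was written against. In NLTK 3.x these accessors are methods, so the equivalent call is `lemma.synset().pos()`, and `lemma.synset.pos` fails with an AttributeError. A minimal sketch under that assumption, grouping the synonyms by part of speech, might look like this. The function names are invented for the example, and 's' is the tag WordNet uses for satellite adjectives.

```python
def synonyms_by_pos(synsets):
    """Group lemma names by the part of speech of the synset they belong to."""
    grouped = {}
    for synset in synsets:
        names = grouped.setdefault(synset.pos(), [])
        for lemma in synset.lemmas():
            # lemma.synset().pos() would give the same tag as synset.pos().
            if lemma.name() not in names:
                names.append(lemma.name())
    return grouped


def print_synonyms_with_pos(word):
    """Print one line per part of speech: the tag, then its synonyms."""
    # Imported lazily; requires NLTK 3.x and the WordNet corpus.
    from nltk.corpus import wordnet as wn

    for pos, names in synonyms_by_pos(wn.synsets(word)).items():
        print(pos, ", ".join(names))
```

Calling `print_synonyms_with_pos('flabbergasted')` would print the verb synonyms and the adjective synonyms on separate lines, each prefixed by its tag.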

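If you want to make sure you only select lemmas with the same part of speech as the word you are looking at, then you also need to determine that word's part of speech: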

>>> import nltk 
>>> from nltk.corpus import wordnet as wn 
>>> with open('sample.txt') as f: 
...  raw = f.read() 
...  for sentence in nltk.sent_tokenize(raw): 
...   sentence = nltk.pos_tag(nltk.wordpunct_tokenize(sentence)) 
...   for word, pos in sentence: 
...    print word, pos 

I'll leave combining these two as an exercise for the reader.
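
For readers who want a starting point for that exercise, here is one possible sketch rather than the answer's own solution. It assumes Python 3 with NLTK 3.x and its tokenizer, tagger and WordNet data installed. The prefix table that maps Penn Treebank tags (as returned by `nltk.pos_tag`) to WordNet tags is an assumption of this example, as are all three function names.

```python
# Assumed mapping from Penn Treebank tag prefixes to WordNet POS letters.
# "s" is WordNet's tag for satellite adjectives, so it is grouped with "a".
PENN_PREFIX_TO_WORDNET = (
    ("JJ", ("a", "s")),
    ("VB", ("v",)),
    ("NN", ("n",)),
    ("RB", ("r",)),
)


def wordnet_pos_for(penn_tag):
    """Return the WordNet POS letters matching a Penn Treebank tag."""
    for prefix, letters in PENN_PREFIX_TO_WORDNET:
        if penn_tag.startswith(prefix):
            return letters
    return ()  # determiners, punctuation, etc. have no WordNet entry


def lemmas_with_pos(synsets, penn_tag):
    """Yield (lemma name, WordNet POS) for synsets agreeing with penn_tag."""
    wanted = wordnet_pos_for(penn_tag)
    for synset in synsets:
        if synset.pos() in wanted:
            for lemma in synset.lemmas():
                yield lemma.name(), synset.pos()


def print_tagged_synonyms(path):
    """Tag each word in the file, then print only same-POS synonyms."""
    # Lazy import: needs NLTK plus its tokenizer, tagger and WordNet data.
    import nltk
    from nltk.corpus import wordnet as wn

    with open(path) as f:
        raw = f.read()
    for sentence in nltk.sent_tokenize(raw):
        for word, tag in nltk.pos_tag(nltk.wordpunct_tokenize(sentence)):
            for name, pos in lemmas_with_pos(wn.synsets(word), tag):
                print(word, tag, name, pos)
```

Calling `print_tagged_synonyms('sample.txt')` would print each word with its tagger label, followed by each synonym and WordNet tag that agrees with that label. Words tagged with something outside the table simply produce no output.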

+0

The pos attribute gives me the following error: lemma.synset.pos AttributeError: 'function' object has no attribute 'pos' – 2015-05-20 09:52:40

+0

Thanks for the comment. The NLTK API has changed since I wrote this answer. I'll find some time to update it. – 2015-05-20 19:50:24

+0

Can you tell me which functions give the pos for a lemma and a synset? – 2015-05-20 20:43:53