Using SentiWordNet with NLTK in Python

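I am doing sentiment analysis on Twitter data using Python NLTK. I need a dictionary that contains the positive and negative polarity of words. I have read a lot about SentiWordNet, but when I use it in my project it does not give good results, and it is slow. I think I am not using it correctly. Can anyone tell me the right way to use it? Here are the steps I have followed so far: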

Tokenize the tweets
POS-tag the tokens
Pass each tagged token to SentiWordNet

I use the NLTK package for tokenization and POS tagging. Here is part of my code:

import nltk
from nltk.corpus import sentiwordnet as swn

# Penn Treebank tag fragments mapped to the WordNet POS codes that senti_synsets() expects.
POS_MAP = {'NN': 'n', 'VB': 'v', 'JJ': 'a', 'RB': 'r'}

tokens = nltk.word_tokenize(row)  # tokenization; row is one line of the file in which the tweets are saved
tagged = nltk.pos_tag(tokens)     # POS tagging

pscore = nscore = 0.0  # running totals for this tweet
for word, tag in tagged:
    pos = next((p for fragment, p in POS_MAP.items() if fragment in tag), None)
    if pos is None:
        continue  # tag is not a noun, verb, adjective or adverb
    # senti_synsets() may return an iterator, so len() cannot be called on it directly; build the list once.
    synsets = list(swn.senti_synsets(word, pos))
    if synsets:
        pscore += synsets[0].pos_score()  # positive score of the word's first sense
        nscore += synsets[0].neg_score()  # negative score of the word's first sense

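Finally, I count how many tweets are positive and how many are negative. Where am I going wrong? How should I use it? Are there other similar dictionaries that are easy to use?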
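The counting step described above is not shown in the code. A minimal sketch of it follows, assuming each tweet has already been reduced to a (pscore, nscore) pair by the loop above. The names `label_tweet` and `count_labels` and the tie rule (equal scores count as neutral) are illustrative assumptions, not part of the original code.

```python
from collections import Counter

def label_tweet(pscore, nscore):
    """Return 'positive', 'negative' or 'neutral' for one tweet's summed scores."""
    if pscore > nscore:
        return 'positive'
    if nscore > pscore:
        return 'negative'
    return 'neutral'  # assumption: ties (including 0 vs 0) are treated as neutral

def count_labels(scores):
    """Count labels over an iterable of (pscore, nscore) pairs, one pair per tweet."""
    return Counter(label_tweet(p, n) for p, n in scores)

# Hypothetical scores for three tweets.
print(count_labels([(0.5, 0.125), (0.0, 0.75), (0.0, 0.0)]))
```

Keeping an explicit neutral bucket makes it visible how many tweets received no score at all, which can help tell a lexicon coverage problem apart from a bug in the scoring loop.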

Source

2015-11-27 jeny

I don't quite understand what your problem is. Speed? – b3000

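No. I have about 4000 tweets. Using SentiWordNet, I get only 10 positive and 18 negative tweets, which is certainly not a correct result. Of course speed is also an issue, but the main issue is effectiveness. Is there a mistake in the code? – jeny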

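SentiWordNet's coverage is smaller than the noisy vocabulary you get from tweets. You have to normalize the words in real tweets so that they match SentiWordNet entries, for example by rewriting an informal spelling of "you" to its standard form, and so on. – alvas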
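A rough sketch of the kind of normalization this comment suggests, using only the standard library. The `SLANG` table and the specific cleaning rules are assumptions chosen for illustration; a real tweet normalizer would need a much larger table and probably lemmatization as well.

```python
import re

# Hypothetical slang table; a real one would be much larger.
SLANG = {'u': 'you', 'ur': 'your', 'gr8': 'great', 'luv': 'love'}

def normalize_tweet(text):
    """Return a cleaned, lower-cased list of word tokens from a raw tweet."""
    text = text.lower()
    text = re.sub(r'https?://\S+', ' ', text)   # drop URLs
    text = re.sub(r'@\w+', ' ', text)           # drop @mentions
    text = text.replace('#', ' ')               # keep hashtag words, drop the '#'
    text = re.sub(r'(.)\1{2,}', r'\1\1', text)  # squeeze runs of 3+ identical characters down to 2
    words = re.findall(r"[a-z0-9']+", text)     # keep letters, digits and apostrophes
    return [SLANG.get(w, w) for w in words]     # replace known slang, leave other words alone

print(normalize_tweet('@bob I luv this phone soooo much!!! http://t.co/xyz #happy'))
```

Squeezing repeated letters to two rather than one keeps words such as "good" intact, but it leaves forms like "soo" that would still need a spelling correction step before a lexicon lookup can succeed.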

Yes, there are other lexicons you can use. You can find a short list of them here: http://sentiment.christopherpotts.net/lexicons.html#resources. Bing Liu's Opinion Lexicon looks quite easy to use.

Besides linking to those lexicons, that site is a very good tutorial on sentiment analysis.
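As a sketch of how a word-list lexicon like this can be used: NLTK ships a corpus reader for Bing Liu's Opinion Lexicon as `nltk.corpus.opinion_lexicon`, which requires `nltk.download('opinion_lexicon')` to have been run first. The helper names `load_opinion_lexicon` and `lexicon_polarity` and the simple hit-count scoring are assumptions made for illustration, not something the answer prescribes.

```python
def load_opinion_lexicon():
    """Load Bing Liu's Opinion Lexicon through NLTK as two sets of words.

    Assumes NLTK is installed and nltk.download('opinion_lexicon') has been run.
    """
    from nltk.corpus import opinion_lexicon
    return set(opinion_lexicon.positive()), set(opinion_lexicon.negative())

def lexicon_polarity(tokens, positive, negative):
    """Return (positive hits - negative hits); the sign gives the tweet's polarity."""
    pos_hits = sum(1 for t in tokens if t.lower() in positive)
    neg_hits = sum(1 for t in tokens if t.lower() in negative)
    return pos_hits - neg_hits

# Tiny hand-made word sets so the example runs without downloading anything.
demo_pos, demo_neg = {'good', 'love'}, {'bad', 'hate'}
print(lexicon_polarity(['I', 'love', 'this', 'good', 'phone'], demo_pos, demo_neg))  # 2
print(lexicon_polarity(['bad', 'battery'], demo_pos, demo_neg))                      # -1
```

With the real lexicon, the two sets returned by `load_opinion_lexicon()` would be passed in place of `demo_pos` and `demo_neg`. Set membership tests keep each lookup cheap, which may help with the speed concern raised in the question.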

Source

2015-12-23 11:28:45 nestoralvaro
