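NLTK collocations for specific words

I know how to get bigram and trigram collocations with NLTK and apply them to my own corpus. The code is below.

But I am not sure (1) how to get the collocations for a specific word, and (2) whether NLTK has a collocation measure based on the log-likelihood ratio.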

import nltk 
from nltk.collocations import * 
from nltk.tokenize import word_tokenize 

text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentence" 

trigram_measures = nltk.collocations.TrigramAssocMeasures() 
finder = TrigramCollocationFinder.from_words(word_tokenize(text)) 

# Print every trigram with its PMI score, highest score first
for i in finder.score_ngrams(trigram_measures.pmi):
    print(i)


2014-01-16 Sabba

Try this code:

import nltk
from nltk.collocations import *
bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()

# Filter function: returns True for n-grams that do NOT contain 'creature';
# apply_ngram_filter() removes every n-gram for which it returns True
creature_filter = lambda *w: 'creature' not in w


## Bigrams
# (the Genesis corpus must be available, e.g. via nltk.download('genesis'))
finder = BigramCollocationFinder.from_words(
    nltk.corpus.genesis.words('english-web.txt'))
# only bigrams that appear 3+ times
finder.apply_freq_filter(3)
# only bigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest likelihood ratio
print(finder.nbest(bigram_measures.likelihood_ratio, 10))


## Trigrams
finder = TrigramCollocationFinder.from_words(
    nltk.corpus.genesis.words('english-web.txt'))
# only trigrams that appear 3+ times
finder.apply_freq_filter(3)
# only trigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest likelihood ratio
print(finder.nbest(trigram_measures.likelihood_ratio, 10))

It uses the likelihood ratio measure and filters out every n-gram that does not contain the word 'creature'.


2014-01-17 11:54:31 bogs

As for question 2, yes! NLTK has the likelihood ratio among its association measures. The first question is still unanswered!

http://nltk.org/api/nltk.metrics.html?highlight=likelihood_ratio#nltk.metrics.association.NgramAssocMeasures.likelihood_ratio
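For intuition about what a log-likelihood ratio score measures, here is a minimal pure-Python sketch of the G² statistic for a 2x2 bigram contingency table. The function name and its arguments are hypothetical and chosen only for illustration; NLTK's own implementation may differ in details such as smoothing, so treat this as an approximation of the idea rather than the library's code.

```python
import math


def log_likelihood_ratio(n_ii, n_ix, n_xi, n_xx):
    """G^2 statistic for a bigram (w1, w2) (illustrative helper, not NLTK's API).

    n_ii: count of the bigram (w1, w2)
    n_ix: count of bigrams whose first word is w1
    n_xi: count of bigrams whose second word is w2
    n_xx: total number of bigrams
    """
    # Observed cells of the 2x2 contingency table
    observed = [
        n_ii,                        # w1 followed by w2
        n_ix - n_ii,                 # w1 followed by another word
        n_xi - n_ii,                 # another word followed by w2
        n_xx - n_ix - n_xi + n_ii,   # neither w1 first nor w2 second
    ]
    # Row and column totals matching each cell above
    rows = [n_ix, n_ix, n_xx - n_ix, n_xx - n_ix]
    cols = [n_xi, n_xx - n_xi, n_xi, n_xx - n_xi]
    # Expected counts if w1 and w2 occurred independently
    expected = [r * c / n_xx for r, c in zip(rows, cols)]
    # Empty cells contribute nothing to the sum
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


# Two words that always occur together score higher than
# two words that co-occur exactly as often as chance predicts.
print(log_likelihood_ratio(20, 20, 20, 100))
print(log_likelihood_ratio(10, 20, 50, 100))
```

A score near zero means the pair co-occurs about as often as independence would predict; larger scores indicate a stronger association.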


2014-01-17 03:57:58 Sabba

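For question 1, try:

# 'finder' and 'trigram_measures' are the objects from the question's code
target_word = "electronic" # your choice of word
# drop every trigram that does not contain the target word
finder.apply_ngram_filter(lambda w1, w2, w3: target_word not in (w1, w2, w3))
for i in finder.score_ngrams(trigram_measures.likelihood_ratio):
    print(i)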

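The idea is to filter out whatever you don't want. This method is normally used to filter words in specific positions of the n-gram, and you can tweak it to suit your needs.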
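The position-specific filtering described above can be sketched as follows. This is a minimal example under a few assumptions: it reuses the sample sentence from the question, tokenizes by plain whitespace (so no tokenizer data is needed), and picks "bar" in the first position purely for illustration.

```python
from nltk.collocations import TrigramAssocMeasures, TrigramCollocationFinder

# Sample sentence from the question, split on whitespace
text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")

trigram_measures = TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(text.split())

target_word = "bar"  # illustrative choice of word
# The filter removes every trigram for which it returns True,
# so only trigrams whose FIRST word is the target survive.
finder.apply_ngram_filter(lambda w1, w2, w3: w1 != target_word)

# (trigram, score) pairs, sorted by likelihood ratio, highest first
scored = finder.score_ngrams(trigram_measures.likelihood_ratio)
for trigram, score in scored:
    print(trigram, score)
```

Changing the condition to `w2 != target_word` or `w3 != target_word` would pin the target to the second or third position instead.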


2014-01-17 04:22:01 dmvianna
