NLTK FreqDist: plot normalized counts?

In NLTK you can easily compute word counts for a text, for example by doing

from nltk.probability import FreqDist 
fd = FreqDist([word for word in text.split()])

where text is a string. You can then plot the distribution with

fd.plot()

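which gives you a nice line plot of the count for each word. The docs do not mention any way to plot the actual frequencies, which you can look up with fd.freq(x).

Is there a direct way to plot the normalized counts, without converting the data into another data structure, normalizing, and plotting separately?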
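For illustration, here is a minimal sketch of the difference between the raw count and the value returned by fd.freq(x). The sample sentence is an assumption made for the example, not part of the original question.

```python
from nltk.probability import FreqDist

# Assumed sample text, tokenized by whitespace for simplicity.
text = "This is an example . This is test . example is for freq dist ."
fd = FreqDist(text.split())

count = fd["is"]       # raw count of the token "is"
total = fd.N()         # total number of tokens counted
share = fd.freq("is")  # relative frequency: count / total

print(count, total, share)
```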

Source

2016-07-27 mar tin

You can replace each fd[word] with fd[word] / total:

from nltk.probability import FreqDist

text = "This is an example . This is test . example is for freq dist ."
fd = FreqDist(text.split())

# Read the total once, before any entry is overwritten.
total = fd.N()
# Replace each raw count with its share of the total.
for word in list(fd):
    fd[word] /= float(total)

fd.plot()

Note: you will lose the original FreqDist values.

Source
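If losing the raw counts is a problem, one alternative is to leave the FreqDist untouched and draw fd.freq(sample) with matplotlib directly. The following is only a sketch under that assumption: plot_relative_freq is a hypothetical helper written for this example, not an NLTK function, and the sample text is the one used above.

```python
import matplotlib.pyplot as plt
from nltk.probability import FreqDist


def plot_relative_freq(fd, n=None):
    """Hypothetical helper: plot fd.freq(sample) for the n most common
    samples (all samples if n is None) and return the matplotlib Axes."""
    samples = [sample for sample, _ in fd.most_common(n)]
    shares = [fd.freq(sample) for sample in samples]

    fig, ax = plt.subplots()
    positions = range(len(samples))
    ax.plot(positions, shares)
    ax.set_xticks(positions)
    ax.set_xticklabels(samples, rotation=90)
    ax.set_xlabel("Samples")
    ax.set_ylabel("Relative frequency")
    return ax


text = "This is an example . This is test . example is for freq dist ."
fd = FreqDist(text.split())
ax = plot_relative_freq(fd)
# fd still holds the raw counts; only the plotted values are normalized.
```

Calling plt.show() afterwards displays the figure. Because the helper only reads from fd, the same object can still be used for counts later on.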

2016-07-31 05:36:27 RAVI

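Apologies for the lack of documentation. In NLTK, FreqDist gives you the raw counts in a text (i.e. the frequency of each word), whereas ProbDist gives you the probability of a word given the text.

For more information, you will have to read some code: https://github.com/nltk/nltk/blob/develop/nltk/probability.py

The specific lines that do the normalization are at https://github.com/nltk/nltk/blob/develop/nltk/probability.py#L598

So, to get a normalized ProbDist, you can do the following: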

>>> from nltk.corpus import brown 
>>> from nltk.probability import FreqDist 
>>> from nltk.probability import DictionaryProbDist 
>>> brown_freqdist = FreqDist(brown.words()) 
# Cast the frequency distribution into probabilities 
>>> brown_probdist = DictionaryProbDist(brown_freqdist) 
# Something strange in NLTK to note though:
# when asking for probabilities from a ProbDist built without
# normalization, it looks like it returns the count instead...
>>> brown_freqdist['said'] 
1943 
>>> brown_probdist.prob('said') 
1943 
>>> brown_probdist.logprob('said') 
10.924070185585345 
>>> brown_probdist = DictionaryProbDist(brown_freqdist, normalize=True) 
>>> brown_probdist.logprob('said') 
-9.223104921442907 
>>> brown_probdist.prob('said') 
0.0016732805599763002
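DictionaryProbDist has no plot() method, but its samples() and prob() methods are enough to draw the probabilities with matplotlib. The following is a minimal sketch, not an NLTK feature: it assumes a small whitespace-tokenized sample text instead of the Brown corpus (so no corpus download is needed) and a bar chart with words on the x-axis and probability on the y-axis.

```python
import matplotlib.pyplot as plt
from nltk.probability import DictionaryProbDist, FreqDist

# Assumed sample text; swap in brown.words() to use the Brown corpus.
text = "This is an example . This is test . example is for freq dist ."
freqdist = FreqDist(text.split())

# normalize=True rescales the counts so that they sum to 1.
probdist = DictionaryProbDist(freqdist, normalize=True)

# Order the samples from most to least probable.
samples = sorted(probdist.samples(), key=probdist.prob, reverse=True)
probs = [probdist.prob(sample) for sample in samples]

fig, ax = plt.subplots()
positions = range(len(samples))
ax.bar(positions, probs)
ax.set_xticks(positions)
ax.set_xticklabels(samples, rotation=90)
ax.set_xlabel("Samples")
ax.set_ylabel("Probability")
```

Calling plt.show() afterwards displays the chart. For a large vocabulary it is usually more readable to slice samples down to the top few entries before plotting.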

Source

2016-07-28 04:09:47 alvas

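Thanks. Too bad it has no plot() method to produce a plot like the one FreqDist makes. Also, FreqDist already has a 'freq' method that does the normalization, but that does not solve my problem of plotting directly from the object.

Plotting probabilities may not make sense. In that case, what would your x-axis and y-axis be? – alvas

Instead of the counts I want the frequency of occurrence, that's all. It does make sense: I want to know what share of the corpus each word accounts for. I understand that in linguistics the word "frequency" is used to mean the count, but I want the proportion.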
