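Python SciPy chisquare test returns a different p-value from Excel and LibreOffice
After reading a recent blog post about an application of the Poisson distribution, I tried to reproduce its results using Python's 'scipy.stats' module as well as the Excel/LibreOffice 'POISSON' and 'CHITEST' functions.
For the expected values shown in the article, I simply used: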
import scipy.stats
for i in range(8):
    print(scipy.stats.poisson.pmf(i, 2)*31)
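This reproduces the table shown in the blog post. I also recreated it in LibreOffice, with the values 0 to 7 in cells A1, A2, ..., A8 of column A and the simple formula '=POISSON(A1, 2, 0)*31' repeated down the first 8 rows of column B.
So far so good. Now for the chi-squared test p-value:
In LibreOffice, I just entered the observed values in cells C1-C8 and used '=CHITEST(C1:C8, B1:B8)', which reproduces the p-value of 0.18 reported in the article. With scipy.stats, however, I can't seem to reproduce this value: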
import numpy as np
import scipy.stats
obs = [4, 10, 7, 5, 4, 0, 0, 1]
exp = [scipy.stats.poisson.pmf(i, 2)*31 for i in range(8)]
# we only estimated one variable (the rate of 2 killings per year via 62/31)
# so dof will be N-1-estimates
estimates = 1
print(scipy.stats.chisquare(np.array(obs), np.array(exp), ddof=len(obs)-1-estimates))
# (10.112318133864241, 0.0014728159441179519)
# the p-test value reported is 0.00147, not 0.18...
#
# Maybe I need to aggregate categories with observations less than 5
# (as suggested in many textbooks of statistics for chi-squared tests)?
observedAggregateLessThan5 = [14, 7, 5, 5]
expectedAggregateLessThan5 = [exp[0]+exp[1], exp[2], exp[3], sum(exp[4:])]
print(scipy.stats.chisquare(np.array(observedAggregateLessThan5), np.array(expectedAggregateLessThan5), ddof=len(observedAggregateLessThan5)-1-estimates))
# (0.53561749342466913, 0.46425467595930309)
# Again the p-test value computed is not 0.18, it is 0.46...
What am I doing wrong?
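One possible explanation, offered as a sketch rather than a confirmed diagnosis: in `scipy.stats.chisquare`, `ddof` is documented as an adjustment subtracted from the default `k - 1` degrees of freedom, not as the degrees of freedom themselves, so passing `len(obs)-1-estimates` leaves only one degree of freedom. The sketch below computes the Pearson statistic by hand and evaluates it with `scipy.stats.chi2.sf` under three candidate degrees of freedom. It assumes that CHITEST on a single column of `k` cells uses `k - 1` degrees of freedom; the names `stat`, `candidates`, and `p_values` are illustrative only.

```python
import numpy as np
from scipy import stats

obs = np.array([4, 10, 7, 5, 4, 0, 0, 1])
# Expected counts for 0..7 events under Poisson(2), scaled to 31 observations
exp = stats.poisson.pmf(np.arange(8), 2) * 31

# Pearson chi-squared statistic, computed directly
stat = ((obs - exp) ** 2 / exp).sum()

k = len(obs)
estimates = 1
# Candidate degrees of freedom (the labels are interpretations, not confirmed facts)
candidates = [
    ("df = k - 1 (assumed CHITEST behaviour)", k - 1),
    ("df = k - 1 - estimates (one fitted parameter)", k - 1 - estimates),
    ("df = 1 (effect of ddof = k - 1 - estimates)", k - 1 - (k - 1 - estimates)),
]

p_values = {}
for label, df in candidates:
    # Upper-tail probability of the chi-squared distribution with df degrees of freedom
    p_values[df] = stats.chi2.sf(stat, df)
    print(f"{label}: statistic = {stat:.4f}, p = {p_values[df]:.5f}")
```

If the first line prints a value near 0.18 and the last one a value near 0.00147, the discrepancy comes from how the degrees of freedom are specified rather than from the statistic itself. The statistic is computed by hand here because, as far as I know, newer SciPy releases reject `f_obs` and `f_exp` whose totals differ, and the expected counts for 0 to 7 sum to slightly less than 31.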