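numpy: generating a discrete probability distribution
I am following a code example I found at http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#subclassing-rv-discrete to implement a random number generator for discrete values of a normal distribution. The exact example (not surprisingly) works fine, but when I modify it to allow only left-tail or right-tail results, the count around 0 comes out too low (bin zero should contain more values). I must be hitting a boundary condition, but I can't work out where. Am I missing something?
This is the result of counting the random numbers per bin: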
np.bincount(rvs) [1082 2069 1833 1533 1199 837 644 376 218 111 55 20 12 7 2 2]
This is the histogram, produced by the code below:
import numpy as np
from scipy import stats

np.random.seed(42)

def draw_discrete_gaussian(rng, tail='both'):
    # number of integer support points of the distribution minus 1
    npoints = rng if tail == 'both' else rng * 2
    # integer division keeps the grid integer-valued (Python 2 behaviour of "/")
    npointsh = npoints // 2
    npointsf = float(npoints)
    # bounds for the truncated normal
    nbound = 4
    # actual bounds of truncated normal
    normbound = (1 + 1 / npointsf) * nbound
    # integer grid
    grid = np.arange(-npointsh, npointsh + 2, 1)
    # bin limits for the truncnorm
    gridlimitsnorm = (grid - 0.5) / npointsh * nbound
    # used later in the analysis
    gridlimits = grid - 0.5
    grid = grid[:-1]
    probs = np.diff(stats.truncnorm.cdf(gridlimitsnorm, -normbound, normbound))
    gridint = grid
    normdiscrete = stats.rv_discrete(values=(gridint, np.round(probs, decimals=7)), name='normdiscrete')
    # print('mean = %6.4f, variance = %6.4f, skew = %6.4f, kurtosis = %6.4f' % normdiscrete.stats(moments='mvsk'))
    rnd_val = normdiscrete.rvs()
    if tail == 'both':
        return rnd_val
    if tail == 'left':
        return -abs(rnd_val)
    elif tail == 'right':
        return abs(rnd_val)

rng = 15
tail = 'right'
rvs = [draw_discrete_gaussian(rng, tail=tail) for i in range(10000)]

if tail == 'both':
    rng_min = rng / -2.0
    rng_max = rng / 2.0
elif tail == 'left':
    rng_min = -rng
    rng_max = 0
elif tail == 'right':
    rng_min = 0
    rng_max = rng

gridlimits = np.arange(rng_min - .5, rng_max + 1.5, 1)
print(gridlimits)
f, l = np.histogram(rvs, bins=gridlimits)

# cheap way of creating histogram
import matplotlib.pyplot as plt
# IPython/Jupyter magic; drop this line when running as a plain script
%matplotlib inline
bins, edges = f, l
left, right = edges[:-1], edges[1:]
X = np.array([left, right]).T.flatten()
Y = np.array([bins, bins]).T.flatten()
# print('rvs', rvs)
print('np.bincount(rvs)', np.bincount(rvs))
plt.plot(X, Y)
plt.show()
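Looking at the graph, it seems to me that bin 0 contains everything from -0.5 to 0.5. If so, it is not surprising that it is about half the height of the next bin: you never generate results from the left half of that bin. – user2357112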
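To make the effect of `abs()` concrete, here is a minimal sketch that rebuilds the symmetric probabilities from the question (for `rng = 15`) and folds them onto the non-negative side. The helper name `folded_pmf` is hypothetical and not part of the original post. Every value `k > 0` receives the mass of both `-k` and `+k`, while 0 receives its own mass only once.

```python
import numpy as np
from scipy import stats

def folded_pmf(rng=15, nbound=4):
    """Return (symmetric probs on -rng..rng, probs of abs(X) on 0..rng).

    Hypothetical helper that mirrors the grid construction in the question
    for tail != 'both' (npoints = 2 * rng).
    """
    npoints = 2 * rng
    npointsh = npoints // 2
    normbound = (1 + 1 / float(npoints)) * nbound
    grid = np.arange(-npointsh, npointsh + 2, 1)
    gridlimitsnorm = (grid - 0.5) / npointsh * nbound
    probs = np.diff(stats.truncnorm.cdf(gridlimitsnorm, -normbound, normbound))
    grid = grid[:-1]
    # Fold: accumulate the mass of -k and +k into bin k.
    folded = np.zeros(rng + 1)
    np.add.at(folded, np.abs(grid), probs)
    return probs, folded

probs, folded = folded_pmf()
centre = len(probs) // 2               # index of the value 0 in the symmetric grid
print(folded[0] / probs[centre])       # bin 0 keeps its original mass
print(folded[1] / probs[centre + 1])   # bin 1 gets the mass of -1 and +1
print(folded[1] / folded[0])           # a little under 2
```

The last ratio comes out slightly below 2, which is in line with the 1082 versus 2069 counts reported in the question.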
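@user2357112: I might be wrong, but I think this is just due to the visualization (it is centred on the bin number, whereas the bin is actually bounded at +0.5). If I use `gridlimits = np.arange(rng_min, rng_max + 2, 1)`, I get the same graph. – orange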
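I also think @user235711 is right. When you take the abs, you need to combine the probs of the negative and positive bins. Check that the bin starting at zero has the same length as the other bins. I would simply use the correct truncated normal on the right or on the left, i.e. one that starts or ends at zero. – user333700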
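One way to read that last suggestion is sketched below. It builds the one-sided distribution directly from a normal truncated to `[0, nbound]`, instead of folding a symmetric one with `abs()`. The function name `one_sided_discrete_gaussian`, the choice of `rng + 1` equal-width bins starting at zero, and the resulting bin width (`nbound / (rng + 1)`, not the question's scaling) are all assumptions for illustration, not something the original posts specify.

```python
import numpy as np
from scipy import stats

def one_sided_discrete_gaussian(rng, tail='right', nbound=4):
    """Discrete half-normal on 0..rng ('right') or -rng..0 ('left').

    Hypothetical helper: all bins, including the one at zero, have equal width.
    """
    edges = np.arange(0, rng + 2)                # rng + 1 bins: [0,1), [1,2), ...
    z = edges / float(rng + 1) * nbound          # map bin edges onto [0, nbound]
    probs = np.diff(stats.truncnorm.cdf(z, 0, nbound))
    probs = probs / probs.sum()                  # guard against rounding drift
    support = np.arange(0, rng + 1)
    if tail == 'left':
        # Mirror onto -rng..0, keeping the support in ascending order.
        support = -support[::-1]
        probs = probs[::-1]
    return stats.rv_discrete(values=(support, probs), name='halfnormdiscrete')

rng = 15
right = one_sided_discrete_gaussian(rng, tail='right')
left = one_sided_discrete_gaussian(rng, tail='left')
samples = right.rvs(size=10000, random_state=42).astype(int)
print(np.bincount(samples, minlength=rng + 1))
```

With this construction the probability of 0 is the largest of all bins, so the dip at zero should disappear. Building the distribution once and drawing all samples with `size=` also avoids recreating `rv_discrete` for every draw.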