2016-11-18 111 views
0

How do I plot a histogram in Python whose bar heights are a function of the bar widths?

I have this data:

[-152, -132, -132, -128, -122, -121, -120, -113, -112, -108, 
-107, -107, -106, -106, -106, -105, -101, -101, -99, -89, -87, 
-86, -83, -83, -80, -80, -79, -74, -74, -74, -71, -71, -69, 
-67, -67, -65, -62, -61, -60, -60, -59, -55, -54, -54, -52, 
-50, -49, -48, -48, -47, -44, -43, -38, -37, -35, -34, -34, 
-29, -27, -27, -26, -24, -24, -19, -19, -19, -19, -18, -16, 
-16, -16, -15, -14, -14, -12, -12, -12, -4, -1, 0, 0, 1, 2, 7, 
14, 14, 14, 14, 18, 18, 19, 24, 29, 29, 41, 45, 51, 72, 150, 155] 

I want to plot it as a histogram using these bin edges:

[-160,-110,-90,-70,-40,-10,20,50,80,160] 

I used this code:

import matplotlib.pyplot as plt 
... 
plt.hist(data, bins) 
plt.show() 

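The problem with this plot is that the bar heights do not take the bar widths into account, whereas the frequency should be represented by the area of each bar (see this page). How can I plot this kind of histogram? Thanks in advance.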

+1

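In general, histograms are not required to use the bar area as the measure of frequency. Very often the bar height is used as the frequency measure instead, and matplotlib's hist function does the latter, so you cannot use that function here. In any case, it is a good idea to separate data analysis from visualization. So first compute the histogram, e.g. using ['numpy.histogram'](https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html), and then plot it, e.g. via ['matplotlib.pyplot.hist()'](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist) – ImportanceOfBeingErnest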
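The "compute first, then plot" idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not the commenter's own code: the helper names `frequency_density` and `plot_frequency_density` and the small sample are made up for the example, and the bars are drawn with `bar` rather than `hist` so that each height can be set to count divided by width.

```python
import numpy as np


def frequency_density(data, bin_edges):
    """Return (counts, widths, heights) where heights = counts / widths."""
    edges = np.asarray(bin_edges, dtype=float)
    counts, _ = np.histogram(data, bins=edges)
    widths = np.diff(edges)
    return counts, widths, counts / widths


def plot_frequency_density(data, bin_edges, ax=None):
    """Draw bars whose areas (height * width) equal the bin counts."""
    # Imported lazily so the numeric part above works without matplotlib.
    import matplotlib.pyplot as plt

    counts, widths, heights = frequency_density(data, bin_edges)
    if ax is None:
        ax = plt.gca()
    # align='edge' makes every bar start at its left bin edge.
    ax.bar(bin_edges[:-1], heights, width=widths, align='edge', edgecolor='gray')
    ax.set_ylabel('Count per unit of x')
    return ax


# Hypothetical sample, for illustration only (not the data from the question).
sample = [1, 2, 2, 3, 8, 9, 15, 18]
edges = [0, 5, 10, 20]
counts, widths, heights = frequency_density(sample, edges)
print(counts, widths, heights)
```

Multiplying each height by its width gives back the bin count, which is the "area equals frequency" property asked for in the question. To see the picture, call `plot_frequency_density(sample, edges)` and then `plt.show()`.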

+1

I think this question is a good place to start: http://stackoverflow.com/questions/17429669/how-to-plot-a-histogram-with-unequal-widths-without-computing-it-from-raw-data –

Answers

0

Thanks to Nikos Tavoularis for this post.

My solution code:

import requests 
from bs4 import BeautifulSoup 
import re 
import matplotlib.pyplot as plt 
import numpy as np 

regex = r"((-?\d+(\s?,\s?)?)+)\n" 
page = requests.get('http://www.stat.berkeley.edu/~stark/SticiGui/Text/histograms.htm') 
soup = BeautifulSoup(page.text, 'lxml') 
# The data lives inside the script tags, not inside the HTML TABLE element 
scripts = soup.find_all('script') 
target = scripts[23].string 
hits = re.findall(regex, target, flags=re.MULTILINE) 
data = [] 
if hits: 
    for val, _, _ in hits: 
        data.extend([int(x) for x in re.findall(r"-?\d+", val)]) 
print(sorted(data)) 
print('Length of data:', len(data), "\n") 

# Intervals 
bins = np.array([-160, -110, -90, -70, -40, -10, 20, 50, 80, 160]) 

# calculating histogram 
widths = bins[1:] - bins[:-1] 
freqs = np.histogram(data, bins)[0] 
heights = freqs/widths 
mainlabel = 'The deviations of the 100 measurements from a ' \ 
       'base value of {}, times {}'.format(r'$9.792838\ ^m/s^2$', r'$10^8$') 
hlabel = 'Data gravity' 

# plot with various axes scales 
plt.close('all') 
fig = plt.figure() 
plt.suptitle(mainlabel, fontsize=16) 
# My screen resolution is: 1920x1080 
# Note: window.wm_geometry is only available with the Tk backend 
plt.get_current_fig_manager().window.wm_geometry("900x1100+1050+0") 

# Bar chart 
ax1 = plt.subplot(211) # 2-rows, 1-column, position-1 
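# align='edge' anchors each bar at its left bin edge (Matplotlib >= 2.0 centers bars by default) 
barlist = plt.bar(bins[:-1], heights, width=widths, align='edge', facecolor='yellow', alpha=0.7, edgecolor='gray') 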
plt.title('Bar chart') 
plt.xlabel(hlabel, labelpad=30) 
plt.ylabel('Heights') 
plt.xticks(bins, fontsize=10) 
# Change the colors of bars at the edges... 
twentyfifth, seventyfifth = np.percentile(data, [25, 75]) 
for patch, rightside, leftside in zip(barlist, bins[1:], bins[:-1]): 
    if rightside < twentyfifth: 
        patch.set_facecolor('green') 
    elif leftside > seventyfifth: 
        patch.set_facecolor('red') 
# code from: https://stackoverflow.com/questions/6352740/matplotlib-label-each-bin 
# Label the raw counts and the percentages below the x-axis... 
bin_centers = 0.5 * np.diff(bins) + bins[:-1] 
for count, x in zip(freqs, bin_centers): 
    # Label the raw counts 
    ax1.annotate(str(count), xy=(x, 0), xycoords=('data', 'axes fraction'), 
        xytext=(0, -18), textcoords='offset points', va='top', ha='center', fontsize=9) 

    # Label the percentages 
    percent = '%0.0f%%' % (100 * float(count)/freqs.sum()) 
    ax1.annotate(percent, xy=(x, 0), xycoords=('data', 'axes fraction'), 
        xytext=(0, -28), textcoords='offset points', va='top', ha='center', fontsize=9) 
plt.grid(True) 

# Histogram Plot 
ax2 = plt.subplot(223) # 2-rows, 2-column, position-3 
plt.hist(data, bins, alpha=0.5) 
plt.title('Histogram') 
plt.xlabel(hlabel) 
plt.ylabel('Frequency') 
plt.grid(True) 

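# Normalized histogram plot (bar areas sum to 1) 
ax3 = plt.subplot(224) # 2-rows, 2-column, position-4 
# 'normed' was renamed to 'density' in Matplotlib 2.1 and later removed 
plt.hist(data, bins, alpha=0.5, density=True, facecolor='g') 
plt.title('Histogram (density)') 
plt.xlabel(hlabel) 
plt.ylabel('Density') 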
plt.grid(True) 

plt.tight_layout(pad=1.5, w_pad=0, h_pad=0) 
plt.show() 


1

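From the docstring:

normed : boolean, optional

If True, the first element of the return tuple will be the counts normalized to form a probability density, i.e., n/(len(x)*dbin), i.e., the integral of the histogram will sum to 1. If stacked is also True, the sum of the histograms is normalized to 1.

Default is False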

plt.hist(data, bins=bins, density=True)  # called normed=True in Matplotlib releases before 2.1 
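To make the normalization in the docstring concrete, here is a small check using the data and bin edges from the question. It is only a sketch: it uses `numpy.histogram` with `density=True` and assumes that this is the same normalization the plotting call applies. It compares the result with the formula n/(len(x)*dbin) worked out by hand.

```python
import numpy as np

# Data and bin edges copied from the question.
data = [-152, -132, -132, -128, -122, -121, -120, -113, -112, -108,
        -107, -107, -106, -106, -106, -105, -101, -101, -99, -89, -87,
        -86, -83, -83, -80, -80, -79, -74, -74, -74, -71, -71, -69,
        -67, -67, -65, -62, -61, -60, -60, -59, -55, -54, -54, -52,
        -50, -49, -48, -48, -47, -44, -43, -38, -37, -35, -34, -34,
        -29, -27, -27, -26, -24, -24, -19, -19, -19, -19, -18, -16,
        -16, -16, -15, -14, -14, -12, -12, -12, -4, -1, 0, 0, 1, 2, 7,
        14, 14, 14, 14, 18, 18, 19, 24, 29, 29, 41, 45, 51, 72, 150, 155]
bins = np.array([-160, -110, -90, -70, -40, -10, 20, 50, 80, 160])

counts, _ = np.histogram(data, bins)
widths = np.diff(bins)

# The docstring formula: n / (len(x) * dbin), evaluated per bin.
manual_density = counts / (len(data) * widths)

# The same quantity as computed by NumPy.
density, _ = np.histogram(data, bins, density=True)

# Total bar area: sum of height * width over all bins.
total_area = float(np.sum(density * widths))

print(counts)
print(np.allclose(manual_density, density), total_area)
```

The y-axis values are therefore small by construction. The first bin, for instance, holds 9 of the 100 values and is 50 units wide, so its height is 9 / (100 * 50) = 0.0018.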


+0

OK, the graph is correct, but why those values on the y-axis? –

+0

@ΟυιλιαμΑκκευα Labeling the axes is widely considered good practice, but you can get rid of the labels if you like. Do you need help with that? – Goyo