2012-02-08 74 views
2

Parsing HTML data from Wunderground

I am trying to download historical weather data from Wunderground. My problem is that I need the full METAR information.

Here is an example of what I want to download: CSV with full METAR

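Since I want to download hourly data for a whole year, I need to script it. But whatever I have tried (bash with wget, or Python), I still cannot get the page with the full METAR from a script.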
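For the "whole year" part, one possible way to enumerate the daily pages is sketched below in Python 3. The URL pattern is copied from the script in this question; the names `URL_TEMPLATE` and `daily_urls` are made up for illustration, and the sketch only builds URLs, it does not fetch anything.

```python
from datetime import date, timedelta

# URL pattern taken from the question; not verified against the live site.
URL_TEMPLATE = (
    "http://www.wunderground.com/history/airport/{station}/"
    "{year}/{month}/{day}/DailyHistory.html"
    "?theprefset=SHOWMETAR&theprefvalue=1&format=1"
)


def daily_urls(station, year):
    """Yield (date, url) pairs for every day of the given year."""
    current = date(year, 1, 1)
    while current.year == year:
        yield current, URL_TEMPLATE.format(
            station=station,
            year=current.year,
            month=current.month,
            day=current.day,
        )
        current += timedelta(days=1)
```

Each URL would then be requested in turn, one request per day of hourly observations.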

Here is an example of my script:

import urllib2 
from BeautifulSoup import BeautifulSoup 
url = "http://www.wunderground.com/history/airport/KBUF/2011/1/1/DailyHistory.html?theprefset=SHOWMETAR&theprefvalue=1&format=1" 
page = urllib2.urlopen(url) 
dailyData = page.read()        
print dailyData 

What I get is something like this:

12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,200,2011-01-01 05:54:00<br /> 
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,200,2011-01-01 06:54:00<br /> 
2:54 AM,50.0,44.1,80,29.95,10.0,SSW,8.1,-,N/A,,Mostly Cloudy,200,2011-01-01 07:54:00<br /> 
3:54 AM,51.1,44.1,77,29.93,10.0,SSE,5.8,-,N/A,,Scattered Clouds,150,2011-01-01 08:54:00<br /> 

Through a web browser, this is what I get. Note the new column starting with METAR:

12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00 
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00 
2:54 AM,50.0,44.1,80,29.95,10.0,SSW,8.1,-,N/A,,Mostly Cloudy,METAR KBUF 010754Z 20007KT 10SM BKN050 BKN130 10/07 A2994 RMK AO2 SLP140 T01000067,200,2011-01-01 07:54:00 
3:54 AM,51.1,44.1,77,29.93,10.0,SSE,5.8,-,N/A,,Scattered Clouds,METAR KBUF 010854Z 15005KT 10SM SCT050 SCT130 11/07 A2992 RMK AO2 SLP134 T01060067 58000,150,2011-01-01 08:54:00 

Any solution would be greatly appreciated. Thanks!

+0

The link you provided does not give me METAR in Firefox. Perhaps you are not using the link you think you are? – jjlin 2012-02-08 21:41:19

+0

@jjlin: It gives `METAR` in Chrome. – RanRag 2012-02-08 21:46:10

Answers

3

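Browsing Wunderground, I found a link labeled "Show full METARS". After clicking it, pointing the browser at the "Comma Delimited File" link you posted displays the METAR data. It appears to set a cookie. For example, page.info() shows a "Prefs" cookie that includes "SHOWMETAR:1":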

Set-Cookie: Prefs=FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|SHOWMETAR:1|PHOTOTHUMBS:50|HISTICAO:KBUF*NULL|; path=/; expires=Fri, 01-Jan-2020 00:00:00 GMT; domain=.wunderground.com 
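To check that preference from a script rather than by eye, the cookie value can be split into key/value pairs. This is a minimal Python 3 sketch; the `KEY:VALUE|` layout is inferred only from the single header shown above, and `parse_prefs` is a hypothetical helper name.

```python
def parse_prefs(cookie_value):
    """Split a 'KEY:VALUE|KEY:VALUE|' string into a dict."""
    prefs = {}
    for item in cookie_value.split("|"):
        if not item:
            continue  # the trailing "|" leaves an empty piece
        key, _, value = item.partition(":")
        prefs[key] = value
    return prefs


# Value of the Prefs cookie copied from the header above.
prefs_value = (
    "FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|"
    "HIST0:NULL|GIFT:1|SHOWMETAR:1|PHOTOTHUMBS:50|HISTICAO:KBUF*NULL|"
)
prefs = parse_prefs(prefs_value)
print(prefs.get("SHOWMETAR"))  # prints 1
```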

import urllib2
import cookielib

# Keep cookies between requests by routing everything through one opener.
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))

# First request: should set the SHOWMETAR preference cookie.
setmetar = 'http://www.wunderground.com/cgi-bin/findweather/getForecast?setpref=SHOWMETAR&value=1'
request = urllib2.Request(setmetar)
response = opener.open(request)

# Second request: same opener, so the stored cookie is sent along.
url = "http://www.wunderground.com/history/airport/KBUF/2011/1/1/DailyHistory.html?theprefset=SHOWMETAR&theprefvalue=1&format=1"
request = urllib2.Request(url)
page = opener.open(request)
# print(page.info())
dailyData = page.read()
print dailyData

yields

TimeEST,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,FullMetar,WindDirDegrees,DateUTC<br /> 
12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00<br /> 
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00<br /> 
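Once the response includes the FullMetar column, the rows still end in `<br />`, so they need a little cleanup before they parse as CSV. Here is a rough Python 3 sketch using the standard `csv` module; `parse_history_csv` is a hypothetical helper, and it assumes the METAR text never contains a comma, which holds for the rows shown here but is not guaranteed.

```python
import csv
import io


def parse_history_csv(raw_text):
    """Return a list of dicts, one per observation row."""
    # Each row ends with an HTML "<br />" tag; drop it.
    cleaned = raw_text.replace("<br />", "")
    # Skip blank lines so the first non-empty line becomes the header.
    lines = [line.strip() for line in cleaned.splitlines() if line.strip()]
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


# Sample copied from the output above.
sample = """TimeEST,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,FullMetar,WindDirDegrees,DateUTC<br />
12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00<br />
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00<br />
"""

rows = parse_history_csv(sample)
for row in rows:
    print(row["DateUTC"], row["FullMetar"])
```

In Python 3 the bytes returned by `page.read()` would need to be decoded to text before being passed to this function.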
+0

Yiihaa! This works... thanks a lot! – ery 2012-02-08 22:59:17

0

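When I visit that URL in a browser, I see the same data as the first sample you included. Looking around the Wunderground site, there appears to be a way to sign up for a developer/API account. If you have already done so and were logged in when you retrieved the data, the difference may be due to extra information being available to registered users.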

If you do need to authenticate to get the full data, it would be worth your time to look into using mechanize to help you manage cookies.

Otherwise, I suspect there is a difference in the URL you are using; the extended data may be requested with an additional parameter.

+0

Thanks, the answer was the cookie, as the other answer pointed out. – ery 2012-02-08 23:00:13