Python download file: urllib2 and ClientForm

2012-01-12

I'm going nuts trying to download a CSV file from a website (the URL is in the code below).

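There are 4 forms on the site. I managed to set the date on the correct form, then I post that form and get an HTTP response with the correct HTML. But I want to actually download the CSV rather than the HTML of the response. I think I have to submit 2 forms, first the date and afterwards the CSV selection, but in the first response I don't get any forms back.

Here is my code: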

#!/usr/bin/env python 
import csv 
from urllib2 import urlopen 
from ClientForm import ParseResponse 
import urllib2 

proxy = urllib2.ProxyHandler({'http': '172.26.10.100:8080'}) 
# proxy = urllib2.ProxyHandler({}) 
opener = urllib2.build_opener(proxy) 
urllib2.install_opener(opener) 

headers = { 
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/2010010' \ 
    '1 Firefox/4.0.1', 
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 
    'Accept-Language':'en-us,en;q=0.5', 
    'Accept-Charset':'ISO-8859-1,utf-8;q=0.7,*;q=0.7'} 

# set the request 
url = "http://www.opcom.ro/rapoarte/raportPIPsiVolumTranzactionat.php?lang=en" 
request = urllib2.Request(url, None, headers) 

try: 
    response = urllib2.urlopen(request) 

except urllib2.HTTPError, response: 
    pass 

print response.geturl() 
print response.info() # headers 
# print response.read() # body 

# get forms from response 
forms = ParseResponse(response, backwards_compat=False) 
response.close() 

# print "###FORMS: " ,len(forms)  
# for i in range(len(forms)): 
    # print "@@@@@" 
    # print forms[i] 

form1 = forms[1] 

# setting a specific date in the form 
form1.set_value("7", kind="text", nr=0) 
form1.set_value("10", kind="text", nr=2) 
form1.set_value("2011", kind="text", nr=4) 
print form1 

# Submit the form
request2 = form1.click()  # returns a urllib2.Request object

try: 
    response2 = urllib2.urlopen(request2) 

except urllib2.HTTPError, response2: 
    pass 

print response2.geturl() 
print response2.info() # headers 
# print response2.read() # body 
with open('salida.txt', 'w') as f: 
    f.write(response2.read()) 

forms2 = ParseResponse(response2, backwards_compat=False) 
response2.close() 

print "###FORMS 2: " ,len(forms2) 

Note that the first form (the date picker) is forms[1] in the array of forms. forms[2] is the selection box for CSV or XML. The code to select CSV is:

# form2 = forms2[2] 


# # Select CSV file in selection control 
# form2.find_control("menu_sari").items[1].selected = True # check 

But I commented it out, because I don't get that form in the response after the POST.

Any help/feedback is very welcome.

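Have you looked at using something like mechanize? http://pypi.python.org/pypi/mechanize – 2012-01-12 12:39:08

I went to the ClientForm website and found this: "The functionality this module provided is now part of mechanize. I don't intend to make further standalone releases of ClientForm." Still, I think it can be done either way, but I'm doing something wrong, because it doesn't retrieve the CSV – user1145469 2012-01-12 14:15:35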

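Answer

Looking at the page source, you can see that when you fill in the form, a PHP script is called to generate the CSV file, and the result is a save-file dialog.

For example, on this page: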

http://www.opcom.ro/rapoarte/raportPIPsiVolumTranzactionat.php?lang=en

if you fill in the form selecting 13/1/2012 and CSV, the following URL is called:

http://www.opcom.ro/rapoarte/export_csv_raportPIPsiVolumTranzactionat.php?zi=13&luna=1&an=2012&limba=ro
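To make the pattern in that URL explicit, here is a small sketch that splits it into its path and query parameters. It assumes Python 3, where the URL helpers live in `urllib.parse`; the parameter names are Romanian (`zi` is day, `luna` is month, `an` is year, `limba` is language).

```python
from urllib.parse import urlparse, parse_qs

url = ("http://www.opcom.ro/rapoarte/export_csv_raportPIPsiVolumTranzactionat.php"
       "?zi=13&luna=1&an=2012&limba=ro")

parts = urlparse(url)
# parse_qs returns a list of values per key; each key appears once here,
# so keep only the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)
print(params)
```

The date chosen in the form ends up entirely in the query string, so no form submission is involved in this request.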

The following works for me:

>>> import urllib2 
>>> url = "http://www.opcom.ro/rapoarte/export_csv_raportPIPsiVolumTranzactionat.php?zi=13&luna=1&an=2012&limba=ro" 
>>> request = urllib2.Request(url) 
>>> response = urllib2.urlopen(request) 
>>> print response.read() 
"PIP si volum tranzactionat pentru ziua de livrare: 13/1/2012" 

"","Pret mediu [lei/MWh]","Volum [MWh]" 
"ROPEX_DAM_Base (1-24)","185.23","31226.488" 
"ROPEX_DAM_Peak (7-22)","239.68","22773.036" 
"ROPEX_DAM_Off_Peak (1-6) & (23-24)","76.32","8453.452" 

"Interval","Pret de Inchidere a Pietei [lei/MWh]","Volum Tranzactionat [MWh]" 
"1","29.99","876.148" 
"2","70.00","1057.729" 
"3","50.00","1058.868" 
"4","50.00","1044.700" 
"5","50.00","1061.574" 
"6","61.71","1015.513" 
"7","86.08","1070.586" 
"8","181.00","1187.392" 
"9","222.00","1434.829" 
"10","230.00","1515.633" 
"11","262.60","1539.495" 
"12","226.00","1538.931" 
"13","225.00","1559.273" 
"14","271.00","1515.113" 
"15","266.42","1513.220" 
"16","250.00","1534.506" 
"17","263.02","1481.099" 
"18","298.00","1351.114" 
"19","283.42","1336.266" 
"20","280.00","1398.646" 
"21","271.00","1450.183" 
"22","219.34","1346.750" 
"23","170.00","1219.772" 
"24","128.88","1119.148" 

>>> 
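The response above is not a single table: it is a title line, a summary table and an hourly table, separated by blank lines. A minimal sketch of reading it with the standard `csv` module follows, assuming Python 3 and that the text has already been decoded to `str`. `SAMPLE` is an excerpt of the output above, and `split_sections` is a hypothetical helper, not something the site or any library provides.

```python
import csv
import io

# Excerpt of the response shown above (first three hourly rows only).
SAMPLE = '''"PIP si volum tranzactionat pentru ziua de livrare: 13/1/2012"

"","Pret mediu [lei/MWh]","Volum [MWh]"
"ROPEX_DAM_Base (1-24)","185.23","31226.488"
"ROPEX_DAM_Peak (7-22)","239.68","22773.036"
"ROPEX_DAM_Off_Peak (1-6) & (23-24)","76.32","8453.452"

"Interval","Pret de Inchidere a Pietei [lei/MWh]","Volum Tranzactionat [MWh]"
"1","29.99","876.148"
"2","70.00","1057.729"
"3","50.00","1058.868"
'''


def split_sections(text):
    """Split the export into sections, each a list of parsed CSV rows."""
    sections, current = [], []
    for row in csv.reader(io.StringIO(text)):
        if row:
            current.append(row)
        elif current:
            # csv.reader yields an empty list for a blank line,
            # which marks the end of the current section.
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


sections = split_sections(SAMPLE)
hourly = sections[-1]                 # the last section is the hourly table
header, rows = hourly[0], hourly[1:]
# Map interval number to closing price in lei/MWh.
prices = {int(row[0]): float(row[1]) for row in rows}
print(header)
print(prices)
```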

See if this helps. If you find you can't get the forms to work, you can always build the URL yourself, following their pattern.
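Building the URL from a date could look roughly like the sketch below. It assumes Python 3 (`urllib.parse` and `urllib.request` instead of `urllib2`) and that the site still accepts the same parameters. `build_csv_url` and `download_csv` are hypothetical helper names introduced for illustration.

```python
import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Export endpoint taken from the URL shown in the answer.
EXPORT_URL = "http://www.opcom.ro/rapoarte/export_csv_raportPIPsiVolumTranzactionat.php"


def build_csv_url(day, language="ro"):
    """Return the export URL for a delivery day (hypothetical helper)."""
    # A list of pairs keeps the parameters in the same order as the original URL.
    query = urlencode([
        ("zi", day.day),      # day of month
        ("luna", day.month),  # month
        ("an", day.year),     # year
        ("limba", language),  # language code
    ])
    return EXPORT_URL + "?" + query


def download_csv(day, path):
    """Fetch the CSV for `day` and save the raw bytes to `path` (hypothetical helper)."""
    with urlopen(build_csv_url(day)) as response, open(path, "wb") as out:
        out.write(response.read())


# Reproduces the URL from the answer for 13/1/2012.
print(build_csv_url(datetime.date(2012, 1, 13)))
```

Calling `download_csv(datetime.date(2012, 1, 13), "salida.csv")` would then write the file to disk; the output filename here is only an example.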

Awesome! Thanks a lot – user1145469 2012-01-13 11:55:26