2009-07-04 111 views
1

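Automatically pressing a "submit" button using Python

The bus company I use runs an awful website (Hebrew, English) that turns a simple "timetable from A to B today" query into a nightmare. I suspect they are trying to push people toward their costly SMS query system.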

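I'm trying to harvest the entire timetable from the site by submitting a query from every possible point to every possible point, which would total about 10K queries. The query result appears in a pop-up window. I'm quite new to web programming, but I'm familiar with the basics of Python.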

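  1. What is the most elegant way to parse the page, select a value from a drop-down menu, and press "submit" from a script?
  2. How do I give the program the contents of the new pop-up window as input?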

Thanks!

Answers

10

Twill is a simple scripting language for web browsing. It happens to sport a Python API:

twill is essentially a thin shell around the mechanize package. All twill commands are implemented in the commands.py file, and pyparsing does the work of parsing the input and converting it into Python commands (see parse.py). Interactive shell work and readline support is implemented via the cmd module (from the standard Python library).

An example of "pressing" submit, taken from the documentation linked above:

from twill.commands import go, showforms, formclear, fv, submit 

go('http://issola.caltech.edu/~t/qwsgi/qwsgi-demo.cgi/') 
go('./widgets') 
showforms() 

formclear('1') 
fv("1", "name", "test") 
fv("1", "password", "testpass") 
fv("1", "confirm", "yes") 
showforms() 

submit('0') 
+0

Because of an error, I had to use submit() rather than submit('0'): HiddenControl instance has no attribute '_click'. See: lists.idyll.org/pipermail/twill/2006-August/000526.html – user391339 2014-09-11 07:59:58

10

I would suggest you use mechanize. Here is a code snippet from their page that shows how to submit a form:


import re 
from mechanize import Browser 

br = Browser() 
br.open("http://www.example.com/") 
# follow second link with element text matching regular expression 
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1) 
assert br.viewing_html() 
print(br.title())
print(response1.geturl())
print(response1.info())  # headers
print(response1.read())  # body
response1.close() # (shown for clarity; in fact Browser does this for you) 

br.select_form(name="order") 
# Browser passes through unknown attributes (including methods) 
# to the selected HTMLForm (from ClientForm). 
br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__) 
response2 = br.submit() # submit current form 

# print currently selected form (don't call .submit() on this, use br.submit()) 
print(br.form)

7

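You rarely want to actually "press the submit button"; it is usually simpler to make the GET or POST request directly to the resource that handles the form. Look at the HTML that contains the form, and see which parameters it submits, to which URL, and whether it uses GET or POST. You can build these requests quite easily with urllib(2).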
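As a rough sketch of that approach (Python 3, where urllib2 became urllib.request): the form markup, the field names `origin` and `destination`, and the URL below are all made up for illustration, so the real values have to be read from the site's HTML. The code collects the `<option>` values from a drop-down and builds one POST request per ordered pair of stations without sending anything.

```python
from html.parser import HTMLParser
from itertools import permutations
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form markup; the real page will differ.
SAMPLE_HTML = """
<form action="/timetable" method="post">
  <select name="origin">
    <option value="1">Haifa</option>
    <option value="2">Tel Aviv</option>
    <option value="3">Jerusalem</option>
  </select>
</form>
"""


class OptionCollector(HTMLParser):
    """Collect the value attribute of every <option> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            value = dict(attrs).get("value")
            if value is not None:
                self.values.append(value)


def build_request(url, origin, destination):
    """Build (but do not send) a POST request for one station pair.

    The field names "origin" and "destination" are assumptions.
    """
    data = urlencode({"origin": origin, "destination": destination})
    return Request(url, data=data.encode("ascii"), method="POST")


collector = OptionCollector()
collector.feed(SAMPLE_HTML)
stations = collector.values

# One request per ordered (from, to) pair of distinct stations.
requests_to_send = [
    build_request("http://www.example.com/timetable", a, b)
    for a, b in permutations(stations, 2)
]
print(len(requests_to_send))
```

Passing one of these objects to `urllib.request.urlopen(...)` and calling `.read()` on the result returns the response body. If the pop-up is just the page the form posts to, which seems likely but needs checking against the site, that body is the pop-up's content. With about 10K queries it would be polite to pause between requests.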

+1

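The mechanize package can help you avoid the boring details of "... see which parameters it submits ...". Twill wraps mechanize and provides a higher level of abstraction. – gimel 2009-07-05 17:40:37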