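Navigating a website in Python, scraping and posting

There are already many good resources on Stack Overflow, but I am still stuck. I have gone through these sources: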
- how to submit query to .aspx page in python
- Submitting a post request to an aspx page
- Scrapping aspx webpage with Python using BeautifulSoup
- http://www.pythonforbeginners.com/cheatsheet/python-mechanize-cheat-sheet
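I am trying to access http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx and select a Parish. I believe this forces a post and then lets me select a year, which posts again and opens up further selections. Following the sources above, I have written my script in several different ways, but I have not managed to submit to the site in a way that lets me enter a year.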
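For context, here is a minimal sketch of what such a postback typically carries on an ASP.NET WebForms page: every hidden input (`__VIEWSTATE`, `__EVENTVALIDATION`, and so on) is echoed back, `__EVENTTARGET` names the control that triggered the post, and the chosen dropdown value is included. It uses only the Python 3 standard library rather than mechanize. The control name `ddlParish`, the option value `01`, and the sample HTML are hypothetical, and whether this particular site's dropdown really posts back this way is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, selections):
    """Return a url-encoded form body for a postback triggered by one control."""
    collector = HiddenInputCollector()
    collector.feed(html)
    fields = dict(collector.fields)         # __VIEWSTATE, __EVENTVALIDATION, ...
    fields["__EVENTTARGET"] = event_target  # control that triggered the post
    fields["__EVENTARGUMENT"] = ""
    fields.update(selections)               # e.g. the chosen dropdown value
    return urlencode(fields)


# Hypothetical page fragment and control name, for illustration only.
sample_html = """
<form method="post" action="TaxRolls.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <select name="ddlParish"><option value="01">Example</option></select>
</form>
"""
body = build_postback(sample_html, "ddlParish", {"ddlParish": "01"})
print(body)
```

The real control names would have to be read from the page source, and each response returns fresh hidden values that need to be collected again before the next post (for example, the year selection).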
My current code:
import urllib
from bs4 import BeautifulSoup
import mechanize
# request headers sent with every call; Origin now points at the site being queried
headers = [
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Origin', 'http://www.latax.state.la.us'),
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'),
    ('Content-Type', 'application/x-www-form-urlencoded'),
    ('Referer', 'http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx'),
    ('Accept-Encoding', 'gzip,deflate,sdch'),
    ('Accept-Language', 'en-US,en;q=0.8'),
]
br = mechanize.Browser()
br.addheaders = headers
url = 'http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx'
# first HTTP request without form data
response = br.open(url)
# read the body once and reuse it for both parsing and saving
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
# parse and retrieve two vital form values
viewstate = soup.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
eventvalidation = soup.findAll("input", {"type": "hidden", "name": "__EVENTVALIDATION"})
# form data intended for the follow-up POST
formData = (
    ('__EVENTVALIDATION', eventvalidation[0]['value']),
    ('__VIEWSTATE', viewstate[0]['value']),
    ('__VIEWSTATEENCRYPTED', ''),
)
# save the returned page so it can be inspected
try:
    fout = open('C:\\GIS\\tmp.htm', 'wb')
except IOError:
    print('Could not open output file\n')
else:
    fout.write(html)
    fout.close()
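The script above collects the form values but stops before sending them anywhere. As a rough sketch of the missing step, the tuple of fields can be url-encoded and attached to a request, which turns it into a POST. This uses the Python 3 standard library instead of mechanize, the field values are placeholders, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx'

# Placeholder values; in the script these come from the parsed page.
form_data = (
    ('__EVENTVALIDATION', 'placeholder-eventvalidation'),
    ('__VIEWSTATE', 'placeholder-viewstate'),
    ('__VIEWSTATEENCRYPTED', ''),
)

# url-encode the pairs and convert to bytes, as required for a request body
encoded = urlencode(form_data).encode('ascii')

# supplying data makes this a POST instead of a GET
request = Request(
    url,
    data=encoded,
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
)

print(request.get_method())
# To actually send it: urllib.request.urlopen(request)
```

With mechanize, the equivalent would presumably be passing the encoded body as the second argument to `br.open(url, data)`, though a real postback would likely also need the dropdown value and the event target fields.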
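I also tried this in the shell; what I entered and what I received (trimmed to reduce bulk) can be found at http://pastebin.com/KAW5VtXp
Whichever way I try to change the value in the Parish dropdown list and post, I get taken to a webmaster login page.
Am I approaching this the right way? Any thoughts would be very helpful.
Thanks!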