2014-03-06

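There are already a lot of good resources on Stack Overflow, but I still have a question. I have looked at these sources: browse website, scrape and post in Python.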

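I am trying to access http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx and select a parish. I believe this forces a post, which then lets me select a year, which posts again and allows more selections. Following the sources above, I have written my script several different ways, and none of them has managed to submit the page so that it lets me enter a year.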
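TaxRolls.aspx is an ASP.NET WebForms page (the `__VIEWSTATE` and `__EVENTVALIDATION` fields give this away), so each postback has to send the page's hidden state fields back to the server. As a minimal sketch using only Python 3's standard library, those hidden fields can be collected from a page's HTML as below. `HiddenFieldParser`, `hidden_fields`, and the sample markup and values are hypothetical, not copied from the real page.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            # A hidden input without a value attribute is sent back as ""
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields (e.g. __VIEWSTATE) found in `html`."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical markup shaped like an ASP.NET form; the values are made up
sample = """
<form method="post" action="TaxRolls.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgK" />
  <input type="hidden" name="__VIEWSTATEENCRYPTED" />
</form>
"""
print(hidden_fields(sample))
```

Because the server issues new state values on every response, this collection step would be repeated on each page in the parish, year, and search sequence.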

My current code:

import mechanize
from bs4 import BeautifulSoup

url = 'http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx'

headers = [
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Origin', 'http://www.latax.state.la.us'),
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'),
    ('Content-Type', 'application/x-www-form-urlencoded'),
    ('Referer', url),
    # gzip is not requested: mechanize does not decompress responses by default
    ('Accept-Language', 'en-US,en;q=0.8'),
]

br = mechanize.Browser()
br.addheaders = headers

# First HTTP request, without form data
response = br.open(url)
html = response.read()  # read once; the body cannot be read a second time
soup = BeautifulSoup(html, 'html.parser')

# Parse and retrieve the two vital ASP.NET form values
viewstate = soup.find('input', {'type': 'hidden', 'name': '__VIEWSTATE'})['value']
eventvalidation = soup.find('input', {'type': 'hidden', 'name': '__EVENTVALIDATION'})['value']

# Note: no parish value or __EVENTTARGET yet, and this data is not posted anywhere
formData = (
    ('__EVENTVALIDATION', eventvalidation),
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEENCRYPTED', ''),
)

# Save the first page for inspection
try:
    with open('C:\\GIS\\tmp.htm', 'wb') as fout:
        fout.write(html)
except IOError:
    print('Could not open output file')

I also tried this in the shell; what I typed and what I received (trimmed to reduce bulk) can be found at http://pastebin.com/KAW5VtXp

Anyway, I tried changing the value of the Parish dropdown and then posting back to the site.
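On ASP.NET pages, changing a dropdown usually calls the page's `__doPostBack` JavaScript, which, assuming the Parish dropdown has AutoPostBack enabled, submits the form with `__EVENTTARGET` set to the dropdown's `name`, together with the hidden state fields and the newly selected value. A minimal sketch of building that POST body follows; the helper `postback_body`, its `extra` parameter, and the option value `"55"` are hypothetical (the real option values would come from the page's `<option value="...">` tags).

```python
from urllib.parse import urlencode


def postback_body(hidden, target, value, extra=None):
    """Build a urlencoded body for an ASP.NET AutoPostBack on `target`.

    hidden: hidden fields scraped from the current page (__VIEWSTATE, ...)
    target: the control's `name`, e.g. the parish dropdown
    value:  the <option value="..."> to submit for that control
    extra:  any other form fields to send along (hypothetical parameter)
    """
    fields = dict(hidden)
    # __doPostBack() fills in these two hidden fields before submitting
    fields["__EVENTTARGET"] = target
    fields["__EVENTARGUMENT"] = ""
    fields.update(extra or {})
    fields[target] = value
    return urlencode(fields)


hidden = {"__VIEWSTATE": "dDwtMTA4", "__EVENTVALIDATION": "wEWAgK"}  # made-up values
body = postback_body(hidden, "ctl00$ContentPlaceHolderMain$ddParish", "55")
print(body)
```

The resulting string would be sent as an `application/x-www-form-urlencoded` POST to the same URL. The response typically carries fresh `__VIEWSTATE` and `__EVENTVALIDATION` values, and those, not the original ones, are what the next step (choosing the year) would need to send.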

Am I approaching this the right way? Any ideas would be very helpful.

Thanks!

Answer

I ended up using Selenium.

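from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
wait = WebDriverWait(driver, 20)  # selections post back and reload the page
driver.get("http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx")

# Choose the parish; this posts back so that a year can be chosen
elem = driver.find_element(By.NAME, "ctl00$ContentPlaceHolderMain$ddParish")
elem.send_keys("TERREBONNE PARISH")
elem.send_keys(Keys.RETURN)

# Choose the tax year
elem = wait.until(EC.presence_of_element_located((By.NAME, "ctl00$ContentPlaceHolderMain$ddYear")))
elem.send_keys("2013")
elem.send_keys(Keys.RETURN)

# Pick the second search-field radio button (used here to search by APN)
elem = wait.until(EC.element_to_be_clickable((By.ID, "ctl00_ContentPlaceHolderMain_rbSearchField_1")))
elem.click()

APN = 'APN # here'
elem = wait.until(EC.presence_of_element_located((By.NAME, "ctl00$ContentPlaceHolderMain$txtSearch")))
elem.send_keys(APN)
elem.send_keys(Keys.RETURN)

# Access the PDF: generate the report, then click the second link on the page
elem = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "Generate Report")))
elem.click()
elements = driver.find_elements(By.TAG_NAME, "a")
elements[1].click()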