2014-07-03

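Scraping an ASPX page with Python mechanize: getting search results

I've been trying to scrape congressional financial disclosure reports with mechanize. The form submits successfully, but I can't find any search results in the response. My script is as follows: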

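from mechanize import Browser

br = Browser()
# identify as a desktop Firefox browser
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open('http://clerk.house.gov/public_disc/financial-search.aspx')
# select the page's ASP.NET form by name and allow its fields to be changed
br.select_form(name='aspnetForm')
br.set_all_readonly(False)
# choose the filing year
br['filing_year'] = ['2008']

# submit via the search button and read back the resulting HTML
response = br.submit(name='search_btn')
html = response.read()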

I'm new to scraping and would appreciate any corrections or suggestions. Thanks!
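One way to check whether the `html` returned by a script like this actually contains results is to look for the results table in it. Below is a minimal Python 3 sketch using only the standard library's `html.parser`. It assumes, as in the Selenium answer, that the results live in a `<table id="search_results">`, and that this table has no nested tables. Rows containing no `<td>` cells (such as a `<th>` header row) are skipped.

```python
from html.parser import HTMLParser


class ResultsTableParser(HTMLParser):
    """Collect the text of each <td> cell, row by row, inside the table
    whose id equals table_id (assumed: 'search_results', no nested tables)."""

    def __init__(self, table_id="search_results"):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.rows = []
        self.current_row = None   # list of cell strings for the open <tr>
        self.current_cell = None  # text fragments for the open <td>

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.current_row = []
        elif self.in_table and tag == "td" and self.current_row is not None:
            self.current_cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "td" and self.current_cell is not None:
            self.current_row.append("".join(self.current_cell).strip())
            self.current_cell = None
        elif tag == "tr" and self.current_row is not None:
            if self.current_row:  # skip rows with no <td> cells (e.g. <th> headers)
                self.rows.append(self.current_row)
            self.current_row = None
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        # text inside a cell, including text inside nested tags such as <a>
        if self.current_cell is not None:
            self.current_cell.append(data)


def extract_rows(html, table_id="search_results"):
    """Return the table's data rows as lists of cell text ([] if no table)."""
    parser = ResultsTableParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows
```

An empty list from `extract_rows(html)` would suggest the submitted response never contained the results table. If `html` is bytes, it needs to be decoded first (for example `html.decode('utf-8')`, assuming the page is UTF-8).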


Are you OK with an alternative solution using selenium? – alecxe


@alecxe Sure, if that's the preferred approach. – sirallen

Answer


Here is an alternative solution that drives a real browser with the help of selenium.

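from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait

# initialize webdriver instance and visit url
url = "http://clerk.house.gov/public_disc/financial-search.aspx"
browser = webdriver.Firefox()
try:
    browser.get(url)

    # find select tag and select 2008
    select = Select(browser.find_element(By.ID, 'ctl00_cphMain_txbFiling_year'))
    select.select_by_value('2008')

    # find "search" button and click it
    button = browser.find_element(By.ID, 'ctl00_cphMain_btnSearch')
    button.click()

    # wait up to 10 seconds for the postback to render the results table
    table = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.ID, 'search_results')))

    # display results, skipping the first row (presumably the header) and the last row
    for row in table.find_elements(By.TAG_NAME, 'tr')[1:-1]:
        print([cell.text for cell in row.find_elements(By.TAG_NAME, 'td')])
finally:
    # close the browser and end the webdriver session
    browser.quit()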

Prints:

[u'ABERCROMBIE, HON.NEIL', u'HI01', u'2008', u'FD Amendment'] 
[u'ABERCROMBIE, HON.NEIL', u'HI01', u'2008', u'FD Original'] 
[u'ACKERMAN, HON.GARY L.', u'NY05', u'2008', u'FD Amendment'] 
[u'ACKERMAN, HON.GARY L.', u'NY05', u'2008', u'FD Amendment'] 
...
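The rows printed above are plain lists of cell text. If they are to be saved rather than printed, a minimal Python 3 sketch with the standard `csv` module might look like this. The column names are hypothetical, inferred from the sample output (member name, an office code such as HI01, filing year, filing type), and rows whose length does not match are assumed to be stray non-data rows and skipped.

```python
import csv
import io

# Hypothetical column names, inferred from the printed rows
# (member name, state/district code, filing year, filing type).
FIELDS = ["member", "office", "year", "filing_type"]


def rows_to_csv(rows, fieldnames=FIELDS):
    """Serialize lists of cell text into CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(fieldnames)
    for row in rows:
        if len(row) != len(fieldnames):
            continue  # assumed to be a non-data row (pager, footer, etc.)
        writer.writerow(row)
    return buf.getvalue()
```

In the Selenium script, the lists could be appended to a `rows` list instead of printed, and `rows_to_csv(rows)` written to a file. `csv.writer` quotes fields that contain commas, so names like `ABERCROMBIE, HON.NEIL` stay in a single column.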