2017-09-11 56 views
2

Cannot download PDF files from links generated by a JavaScript function using Python 3.6.0 + Selenium 3.4.3

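Using Selenium with the Firefox 47.0.2 binary and Python 3.6.0, I start from this page and click the "Pesquisar" button. On the next page I fill in the date range in the form (format d/m/y) and click the new "Pesquisar" button. I then get a list of PDF files, which I want to download.

When I print page_source I can see the generated links, but I don't understand why Selenium cannot find them.

The simplified code is as follows: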

from selenium import webdriver 
from selenium.webdriver.support.ui import Select 
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.common.by import By 
from datetime import datetime, date, timedelta 
from calendar import monthrange 
import time 


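# profile, binary and capabilities are configured elsewhere; their setup is omitted from this simplified snippet.
driver = webdriver.Firefox(firefox_profile=profile, firefox_binary=binary, capabilities=capabilities)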
driver.maximize_window() 
wait = WebDriverWait(driver, 10) 

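months = range(1, 13)
# monthrange() returns (weekday of the 1st, number of days in the month),
# so only limits[1] is a day of the month.
limits = monthrange(2017, 8)

#num_docs = limites[1]-limites[0]

# The first day of the month is always 1; the last day is limits[1].
date_input_begin = '{num:0{width}}'.format(num=1, width=2) + '08' + '2017'
date_input_end = '{num:0{width}}'.format(num=limits[1], width=2) + '08' + '2017'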

today = datetime.now().date() 
date = today 

date = date - timedelta(24) 

driver.get("http://dje.trf2.jus.br/DJE/Paginas/Externas/inicial.aspx") 

driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrInicial_btnPesquisar").click() 

wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar"]'))) 

select1 = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_ddlAreaJudicial")) 
select1.select_by_index(3) 

select2 = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_ddlRegistrosPaginas")) 
select2.select_by_index(6) 

element_date_begin = driver.find_element_by_id(
    'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_tbxDataInicial') 
element_date_begin.clear() 
element_date_begin.send_keys(date_input_begin) 

element_date_end = driver.find_element_by_id(
    'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_tbxDataFinal') 
element_date_end.clear() 
element_date_end.send_keys(date_input_end) 

driver.find_element_by_id('ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar').submit() 

wait.until(EC.presence_of_element_located((By.ID, 'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar'))) 
wait.until(EC.element_to_be_clickable((By.ID, 'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar'))) 

time.sleep(5) 
driver.find_element_by_id('ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar').click() 

wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_lblNomeCaderno"]'))) 

# This lookup is the call that raises NoSuchElementException.
driver.find_element_by_xpath(
    '//*[@id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_grvCadernos_ct102_lnkData"]').click()

However, when I look up the link by ID or by XPath, I get the following error:

File "C:\Users\b2002032064079\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//*[@id=\"ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_grvCadernos_ct102_lnkData\"]"}

I'm new to scraping and would really appreciate any help! Thanks!

Answers

1

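First: which browser are you using? Second: the site is slow, so maybe try a longer wait time. Third: is the XPath correct? I think the problem is the XPath. Try checking it with XPath Helper in Chrome.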
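Since the question says the links are visible in page_source, one way to check the locator is to pull the actual id values out of that HTML and compare them with the one in the XPath. The following is only a sketch: the function name and the sample HTML fragment are made up for illustration, and the real markup of the page is not shown in this thread. ASP.NET usually names generated controls ctlNN (with the letter l), so it may be worth checking whether ct102 in the locator should read ctl02, but that is an assumption until the real ids are inspected.

```python
import re


def ids_ending_with(page_source, suffix):
    """Return every id attribute value in the HTML that ends with `suffix`."""
    pattern = r'id="([^"]*' + re.escape(suffix) + r')"'
    return re.findall(pattern, page_source)


# Hypothetical fragment standing in for driver.page_source.
sample = '<a id="ctl00_grvCadernos_ctl02_lnkData" href="javascript:x()">01/08/2017</a>'

found = ids_ending_with(sample, "lnkData")
for element_id in found:
    # Compare each real id, character by character, with the id in the locator.
    print(element_id)
```

With a live session the call would be ids_ending_with(driver.page_source, "lnkData"); if the printed ids differ from the one in the XPath, the locator rather than the timing is the likely culprit.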

+0

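@biligung Regarding the XPath, I changed the last line of the code to driver.find_element_by_xpath('/html/body/form/div[7]/div/div/div[1]/div[2]/div/div[2]/div/table/tbody/tr[2]/td[1]/a').click(), where the XPath is now the one obtained from Firebug. Thank you! As for the wait time, I'm still trying to work it out for the iterated downloads. – viniciusdoss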
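For the iterated downloads, an absolute XPath copied from Firebug tends to break as soon as the page layout shifts. A possible alternative, sketched below, is to match all links by the end of their id and to locate them again before every click, because a click may trigger a postback that invalidates earlier element references. The selector assumes that every PDF link id ends in _lnkData, which is inferred from the single id in the question and not confirmed. The polling loop is a hand-rolled stand-in for WebDriverWait so that the sketch has no imports beyond the standard library; the string "css selector" is the value of selenium's By.CSS_SELECTOR.

```python
import time

CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR
LINK_SELECTOR = 'a[id$="_lnkData"]'  # assumption: every PDF link id ends with _lnkData


def wait_for_links(driver, selector=LINK_SELECTOR, timeout=30, poll=0.5):
    """Poll until at least one matching link exists, then return the list."""
    deadline = time.time() + timeout
    while True:
        links = driver.find_elements(CSS, selector)
        if links:
            return links
        if time.time() >= deadline:
            raise RuntimeError("no link matched %r within %s s" % (selector, timeout))
        time.sleep(poll)


def click_all_links(driver, selector=LINK_SELECTOR, timeout=30):
    """Click every matching link, locating the links again before each click."""
    total = len(wait_for_links(driver, selector, timeout))
    clicked = 0
    for index in range(total):
        links = wait_for_links(driver, selector, timeout)
        if index >= len(links):
            break  # the list shrank after a reload; stop instead of failing
        links[index].click()
        clicked += 1
    return clicked
```

Calling click_all_links(driver) after the results page has loaded would click each link in turn and return how many were clicked. Whether each click actually saves a file depends on the browser's download settings.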

+0

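If it helped, give it a thumbs up :D If you need to wait, there are ways to use JavaScript to check whether the page has been reloaded, but I don't know about downloads. Those are handled by an OS dialog, so just set that up manually. – biligunb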
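Both points in this comment can be sketched in code. The JavaScript check can be as simple as reading document.readyState through execute_script, and the save dialog can usually be avoided in Firefox by setting download preferences on the profile before the browser starts. The preference names below are standard Firefox preferences; the helper names and the download directory are hypothetical, and the sketch assumes the server sends the files with the MIME type application/pdf, which this thread does not confirm.

```python
def pdf_download_prefs(download_dir):
    """Firefox preferences that save PDFs to download_dir without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": download_dir,
        "browser.download.manager.showWhenStarting": False,
        # Assumption: the files are served as application/pdf.
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        "pdfjs.disabled": True,  # do not open PDFs in the built-in viewer
    }


def build_profile(download_dir):
    """Apply the preferences to a new FirefoxProfile."""
    from selenium import webdriver  # imported here so the helpers above work without selenium

    profile = webdriver.FirefoxProfile()
    for name, value in pdf_download_prefs(download_dir).items():
        profile.set_preference(name, value)
    return profile


def page_is_loaded(driver):
    """True once the browser reports that the document has finished loading."""
    return driver.execute_script("return document.readyState") == "complete"
```

The profile returned by build_profile(r"C:\pdfs") (a made-up path) could be passed as firefox_profile when creating the driver, as in the question's code. Note that readyState only reports on the page itself, not on whether a download has finished.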