2017-08-30 21 views
1

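Can't find forms in the HTML code - web scraping with Python and Selenium

I'm trying to scrape this website, but there are some forms that need to be filled in first.

The main goal is to fill in these 5 forms (each one appears after the previous one has been selected) and download the data via the "Consultar" button.

These forms are generated with JavaScript, and I can't find them in the page's HTML source. When I inspect the frame in Google Chrome I can see the form IDs, but my code doesn't find them.

This is only a prototype of my code. I can't move forward without knowing what I can do to locate these forms.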

from selenium import webdriver 
from bs4 import BeautifulSoup 
import time 
import os 

#Variables 

url = 'http://www.anbima.com.br/pt_br/informar/sistema-reune.htm' 
path_phantom = 'C:\\Users\\TBMEPYG\\AppData\\Local\\Continuum\\Anaconda3\\Lib\\site-packages\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe' 

#Processing 

driver = webdriver.PhantomJS(executable_path= path_phantom) 

driver.get(url) 

data = driver.find_element_by_id('data_ref') 

data.send_keys("21/08/2017") 

driver.quit() 

Edit:

I updated my code to this:

from selenium import webdriver

path_phantom = 'C:\\Users\\TBMEPYG\\AppData\\Local\\Continuum\\Anaconda3\\Lib\\site-packages\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe'
driver = webdriver.PhantomJS(executable_path=path_phantom)
driver.get('http://www.anbima.com.br/reune/reune.asp')


driver.switch_to.frame(driver.find_element_by_xpath('//iframe[@class="full"]'))
data = driver.find_element_by_name('Dt_Ref')
data.clear()
data.send_keys('21/08/2017')

And I got this error:

CD: C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3 
Current directory: C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3 
python "C:\Users\TBMEPYG\Desktop\vamo.py" 
Process started >>> 
Traceback (most recent call last): 
    File "C:\Users\TBMEPYG\Desktop\vamo.py", line 8, in <module> 
    data = driver.find_element_by_name('Dt_Ref') 
    File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 426, in find_element_by_name 
    return self.find_element(by=By.NAME, value=name) 
    File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 832, in find_element 
    'value': value})['value'] 
    File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 297, in execute 
    self.error_handler.check_response(response) 
    File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response 
    raise exception_class(message, screen, stacktrace) 
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with name 'Dt_Ref'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"89","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:62040","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"name\", \"value\": \"Dt_Ref\", \"sessionId\": \"bdd3fc70-8dd0-11e7-aeb1-85b8cfbe0d1c\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/bdd3fc70-8dd0-11e7-aeb1-85b8cfbe0d1c/element"}} 
Screenshot: available via screen 

EDIT2:

Another possibility is to use the link found inside the page source: http://www.anbima.com.br/reune/reune.asp

When I changed the code to the following, I got a different error:

from selenium import webdriver 

path_phantom = 'C:\\Users\\TBMEPYG\\AppData\\Local\\Continuum\\Anaconda3\\Lib\\site-packages\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe' 
driver = webdriver.PhantomJS(executable_path= path_phantom) 
driver.get('http://www.anbima.com.br/reune/reune.asp') 


data = driver.find_element_by_name('Dt_Ref') 
data.clear() 
data.send_keys('21/08/2017') 

Error:

Traceback (most recent call last): 
    File "C:\Users\TBMEPYG\Desktop\vamo.py", line 9, in <module> 
    data = driver.find_element_by_name('Dt_Ref') 
    File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 426, in find_element_by_name 
    return self.find_element(by=By.NAME, value=name) 
    File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 832, in find_element 
    'value': value})['value'] 
    File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 297, in execute 
    self.error_handler.check_response(response) 
    File "C:\Users\TBMEPYG\AppData\Local\Continuum\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response 
    raise exception_class(message, screen, stacktrace) 
selenium.common.exceptions.WebDriverException: Message: {"request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"89","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:61820","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"name\", \"value\": \"Dt_Ref\", \"sessionId\": \"e61dd170-8dcf-11e7-a019-41573671066b\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/e61dd170-8dcf-11e7-a019-41573671066b/element"}} 
Screenshot: available via screen 
+0

That's not a form, it's a table – Hackerman

+0

@Hackerman Any idea how to interact with this table? Inspecting the elements made me believe they were all forms, but I'm really new to the JavaScript world. –

+0

Try this: 'data = driver.find_element_by_name('Dt_Ref')' – Hackerman

Answers

2

To be able to handle the elements inside the form, you need to switch to the iframe first:

driver.switch_to.frame(driver.find_element_by_xpath('//iframe[@class="full"]')) 
data = driver.find_element_by_name('Dt_Ref') 
data.clear() 
data.send_keys('21/08/2017') 
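As a minimal sketch of the same idea, the switch can be wrapped in a small helper that also switches back to the top-level document afterwards, so later lookups outside the iframe keep working. The function name `fill_field_in_iframe` is hypothetical. The locator strings "xpath" and "name" are the values of Selenium's `By.XPATH` and `By.NAME`, written out here so the helper can also be exercised with a stand-in driver when no browser is available.

```python
def fill_field_in_iframe(driver, iframe_xpath, field_name, text):
    """Switch into an iframe, fill one input by name, then switch back.

    `driver` is assumed to be a Selenium WebDriver (or an object with the
    same find_element / switch_to interface).
    """
    # Locate the iframe in the top-level document ("xpath" == By.XPATH).
    frame = driver.find_element("xpath", iframe_xpath)
    driver.switch_to.frame(frame)
    try:
        # Inside the iframe, the field is now visible ("name" == By.NAME).
        field = driver.find_element("name", field_name)
        field.clear()
        field.send_keys(text)
    finally:
        # Always return to the top-level document, even if the lookup fails.
        driver.switch_to.default_content()
```

With a real driver this would be called as fill_field_in_iframe(driver, '//iframe[@class="full"]', 'Dt_Ref', '21/08/2017').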
+0

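I believe I've solved this step another way. I found the "second page", http://www.anbima.com.br/reune/reune.asp, but I still can't scrape the Dt_Ref element. When I inspect the HTML of this link, all the information is there, but I can't scrape it. The error shown before was because the driver couldn't find the element, but now this is the error: selenium.common.exceptions.WebDriverExceptio –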

+0

My code works for me on both pages (on the second one, without switching). Can you share the full exception log? – Andersson

+0

Sure. I edited the question, because the error is too long to post in a comment. –

-1

I think you messed up the single and double quotes. The correct code would be:

driver.get("http://www.anbima.com.br/reune/reune.asp") 
data = driver.find_element_by_name("Dt_Ref") 
data.clear() 
data.send_keys("21/08/2017") 
+0

How does that solve the problem? In Python it doesn't matter whether you use single or double quotes – Andersson

+0

That makes no difference in the code. –

0

Just an update:

The problem wasn't solved, but I know what it is.

It's a bug in PhantomJS. So if you run into the same problem, try using Chrome or Firefox instead.

Thanks for the answers.
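For anyone switching browsers as suggested, here is a rough sketch of a fallback that tries Chrome first and then Firefox. The helper name `first_working_driver` is hypothetical, and it assumes each entry is a zero-argument callable such as `webdriver.Chrome` or `webdriver.Firefox`; whether those start without extra configuration depends on the Selenium version and on the browser and driver binaries installed.

```python
def first_working_driver(factories):
    """Return (name, driver) for the first factory that starts a browser.

    `factories` is a list of (name, zero-argument callable) pairs, e.g.
    [("chrome", webdriver.Chrome), ("firefox", webdriver.Firefox)].
    """
    errors = []
    for name, factory in factories:
        try:
            return name, factory()
        except Exception as exc:  # e.g. browser or driver binary is missing
            errors.append(f"{name}: {exc}")
    # Nothing started: report every failure so the cause is visible.
    raise RuntimeError("no browser could be started: " + "; ".join(errors))
```

The returned driver can then be used exactly like the PhantomJS one in the question (driver.get, find_element and so on), followed by driver.quit() when done.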

+0

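I'd say your problem has been solved. You should mark Andersson's answer as the accepted solution, since he helped by testing the error for you, and try the headless link he provided. – Tetora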