I am using the Selenium WebDriver from a Python script. Here is my code. I need to visit every link on a web page, and also the links on each of its sub-pages.

from selenium import webdriver

d = webdriver.Chrome()
d.get("http://localhost:8080")
list_links = d.find_elements_by_tag_name('a')

for i in list_links:
    print(i.get_attribute('href'))

The program above correctly prints the links:

https://www.w3schools.com/ 
https://www.ubuntu.com/ 
None 

But when I run the code below:

d = webdriver.Chrome() 
d.get("http://localhost:8080") 
list_links = d.find_elements_by_tag_name('a') 

for i in list_links: 
    url=i.get_attribute('href') 
    print url 
    d.get(url) 

It navigates to the first link, https://www.w3schools.com/, successfully. Then it fails with:

Traceback (most recent call last): 
File "web_nav.py", line 20, in <module> 
url=i.get_attribute('href') 
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 141, in get_attribute 
resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name}) 
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 493, in _execute 
return self._parent.execute(command, params) 
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute 
self.error_handler.check_response(response) 
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response 
raise exception_class(message, screen, stacktrace) 
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document 
(Session info: chrome=59.0.3071.115) 
(Driver info: chromedriver=2.30.477691 (6ee44a7247c639c0703f291d320bdf05c1531b57),platform=Linux 4.4.0-31-generic x86_64)

I am using Ubuntu 14.04, Python, and the Selenium WebDriver.

Answers

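First collect all the URLs, then navigate to them. Calling d.get(url) loads a new page, so the element objects found on the previous page become stale; the href strings you have already read are not affected.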

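from selenium import webdriver

d = webdriver.Chrome()
d.get("http://localhost:8080")
list_links = d.find_elements_by_tag_name('a')
# Read every href while the elements are still attached to the page
urls = []
for i in list_links:
    href = i.get_attribute('href')
    if href:  # anchors without an href return None
        urls.append(href)
# Navigate only after all the strings have been collected
for url in urls:
    d.get(url)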

You can simplify this with a function:

def get_link_urls(url, driver):
    # Load the page, then return the href of every link that has one
    driver.get(url)
    urls = []
    for link in driver.find_elements_by_tag_name('a'):
        href = link.get_attribute('href')
        if href:
            urls.append(href)
    return urls

urls = get_link_urls("http://localhost:8080", d)
sub_urls = []
for url in urls:
    sub_urls.extend(get_link_urls(url, d))

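You saved me a lot of work, thank you. But this only navigates to the links on the first page, is that right? Is there a way to follow sub-page links down to a specific depth? – Kit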

For example, here I first navigate to https://www.w3schools.com/, and then I need to follow the links inside that page down to a given depth. – Kit
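
One possible way to handle a depth limit is a breadth-first walk that remembers which URLs it has already seen. This is only a sketch, not part of the original answer: `crawl`, `make_visit`, and `max_depth` are hypothetical names, and it assumes Python 3. It locates anchors with `driver.find_elements("tag name", "a")`, which should be equivalent to `find_elements_by_tag_name('a')`, and it reads every href before the next `get` call so that no element goes stale.

```python
from collections import deque


def crawl(start_url, visit, max_depth):
    """Breadth-first walk that opens each URL at most once.

    visit(url) is expected to load the page and return the href
    values found on it (None entries are skipped).
    Pages at max_depth are opened but their links are not followed.
    """
    seen = {start_url}
    visited = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        links = visit(url)
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not enqueue children
        for link in links:
            if link and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


def make_visit(driver):
    """Wrap a Selenium driver as a visit(url) callable (hypothetical helper)."""
    def visit(url):
        driver.get(url)
        anchors = driver.find_elements("tag name", "a")
        # Convert to plain strings right away, before navigating again
        return [a.get_attribute("href") for a in anchors]
    return visit
```

With the driver `d` from the answer, a call might look like `crawl("http://localhost:8080", make_visit(d), max_depth=2)`. On a large public site the number of pages grows quickly with depth, so it is probably wise to start with a small limit or to restrict the walk to one domain.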

I need to extend this code so that it saves the dynamic HTML of each page while navigating. Please help me solve this. – Kit
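
For saving the pages, one option is to write out `driver.page_source` after each `get`, since that property returns the HTML of the current DOM as the browser sees it at that moment. The sketch below is an illustration under assumptions, not a tested recipe for this site: `save_pages`, `url_to_filename`, and `out_dir` are made-up names, it assumes Python 3, and it does not wait for scripts that keep modifying the page after the load finishes.

```python
import os
import re


def url_to_filename(url, index):
    """Build a filesystem-safe file name from a URL (hypothetical helper)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")[:80]
    return "{:04d}_{}.html".format(index, slug)


def save_pages(driver, urls, out_dir):
    """Open each URL and write the rendered HTML into out_dir.

    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        driver.get(url)
        path = os.path.join(out_dir, url_to_filename(url, index))
        with open(path, "w", encoding="utf-8") as f:
            f.write(driver.page_source)
        saved.append(path)
    return saved
```

Used with the `urls` list collected in the answer, this could be called as `save_pages(d, urls, "pages")`. The index prefix keeps file names unique even when two URLs reduce to the same slug.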