I need to navigate every link on a web page and the links on its sub-pages

I am using the Selenium driver with a Python script to do this. Here is my code.

from selenium import webdriver

d = webdriver.Chrome()
d.get("http://localhost:8080")
list_links = d.find_elements_by_tag_name('a')

# Print the href of every link on the page
for i in list_links:
    print(i.get_attribute('href'))

The program above correctly prints:

https://www.w3schools.com/ 
https://www.ubuntu.com/ 
None

But when I run the following code:

d = webdriver.Chrome() 
d.get("http://localhost:8080") 
list_links = d.find_elements_by_tag_name('a') 

for i in list_links: 
    url=i.get_attribute('href') 
    print url 
    d.get(url)

It navigates to the first link, https://www.w3schools.com/, successfully. Then it fails with:

Traceback (most recent call last):
  File "web_nav.py", line 20, in <module>
    url=i.get_attribute('href')
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 141, in get_attribute
    resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 493, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=59.0.3071.115)
  (Driver info: chromedriver=2.30.477691 (6ee44a7247c639c0703f291d320bdf05c1531b57),platform=Linux 4.4.0-31-generic x86_64)

I am using Ubuntu 14.04, Python, and Selenium WebDriver.

Asked by Kit, 2017-07-08

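First collect all the URLs, then navigate to them. Calling d.get(url) loads a new page, so the elements in list_links no longer belong to the current document; that is why the next get_attribute call raises StaleElementReferenceException.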

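d = webdriver.Chrome()
d.get("http://localhost:8080")
list_links = d.find_elements_by_tag_name('a')
# Read every href while the elements are still attached to the page
urls = []
for i in list_links:
    href = i.get_attribute('href')
    if href:  # skip anchors without an href (these show up as None)
        urls.append(href)
# Only now leave the page
for url in urls:
    d.get(url)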

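You can simplify this with a function:

def get_link_urls(url, driver):
    # Load the page and return the href of every link on it
    driver.get(url)
    urls = []
    for link in driver.find_elements_by_tag_name('a'):
        href = link.get_attribute('href')
        if href:  # skip anchors without an href
            urls.append(href)
    return urls

urls = get_link_urls("http://localhost:8080", d)
sub_urls = []
for url in urls:
    sub_urls.extend(get_link_urls(url, d))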

Answered 2017-07-08 05:35:09

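You saved me a lot of work, thank you. But this only navigates to the links on the first page, is that right? Is there a way to follow sub-page links down to a specific depth? – Kit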

For example: here I first navigate to https://www.w3schools.com/. I need to follow the links inside that page down to a given depth. – Kit
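One possible way to follow links down to a given depth is a breadth-first crawl that remembers which URLs it has already seen. The sketch below is only an illustration, not part of the original answer: the names crawl, selenium_links, get_links and max_depth are made up here, and it uses find_elements(By.TAG_NAME, ...) rather than find_elements_by_tag_name. The crawl logic takes the link-fetching function as a parameter, so it can be tried without a browser.

```python
from collections import deque


def selenium_links(driver, url):
    """Load url in the browser and return the href of every <a> on the page."""
    # Imported here so that crawl() below can be tried without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(url)
    hrefs = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
    return [h for h in hrefs if h]  # drop anchors that have no href


def crawl(start_url, get_links, max_depth):
    """Breadth-first crawl. Returns {url: depth} for every page that was loaded.

    get_links(url) must load the page and return the URLs it links to.
    Pages at max_depth are loaded, but their links are not followed.
    """
    seen = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        links = get_links(url)  # this is the step that actually opens the page
        if seen[url] == max_depth:
            continue
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen[link] = seen[url] + 1
                queue.append(link)
    return seen
```

With the driver d from the question, a call could look like crawl("http://localhost:8080", lambda u: selenium_links(d, u), max_depth=2). Because pages such as https://www.w3schools.com/ link to many other sites, it is probably worth keeping max_depth small or filtering the links by host inside get_links.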

I need to extend this code so that it saves the dynamic HTML pages while navigating. Please help me solve this. – Kit
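For saving pages, Selenium exposes the current DOM as a string through driver.page_source, so one option is to write that string to a file after each d.get(url). The helper below is a rough sketch under assumptions: save_page, filename_for, the "pages" directory and the file-naming scheme are all invented for illustration.

```python
import os
import re


def filename_for(url):
    """Turn a URL into a filesystem-safe file name (hypothetical naming scheme)."""
    name = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")
    return (name or "index") + ".html"


def save_page(html, url, out_dir="pages"):
    """Write html to out_dir as UTF-8 and return the path of the file written."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    path = os.path.join(out_dir, filename_for(url))
    with open(path, "wb") as f:
        f.write(html.encode("utf-8"))
    return path
```

Inside the navigation loop this would be used as d.get(url) followed by save_page(d.page_source, url). Note that page_source reflects the DOM at the moment it is read, so content that JavaScript adds later may require waiting before the call.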
