2017-08-19 31 views

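Selenium does not get the HTML for the PDF links

I'm trying to download the PDF slides from this website using Python and Selenium, but I think the links to the slides only appear after a script has loaded. I tried waiting for the JavaScript to load, but it still doesn't find anything. Any ideas?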

import os, sys, time, random 
import requests 
from selenium import webdriver 
from bs4 import BeautifulSoup 

url = 'https://mila.umontreal.ca/en/cours/deep-learning-summer-school-2017/slides' 

browser = webdriver.Chrome() 
browser.get(url) 
browser.implicitly_wait(3) 
html = browser.page_source 
links = browser.find_elements_by_class_name('flip-entry') 
print(links) 
browser.quit() 

At first glance: why do you set 'html = browser.page_source' and then never use 'html'? – JacobIRR

Answers


The reason is that the links are not on the main page. The links you are looking for are inside an iframe, which points to https://drive.google.com/embeddedfolderview?hl=fr&id=0ByUKRdiCDK7-c0k1TWlLM1U1RXc#list


You can open that URL directly in your code instead of the main page's URL. Or you can switch to the frame:

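from selenium.webdriver.common.by import By

# Switch into the iframe that embeds the Google Drive folder view.
browser.switch_to.frame(browser.find_element(By.CLASS_NAME, "iframe-class"))
# Collect the anchors inside elements with class "flip-entry".
links = browser.find_elements(By.CSS_SELECTOR, ".flip-entry a")

for link in links:
    print(link.get_attribute("href"))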
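
The first option above, opening the iframe's URL directly, may not need a browser at all. Here is a minimal sketch using only the standard library. It assumes that Google Drive returns the folder listing as plain HTML to a simple GET request, and that the slide links sit inside elements with the `flip-entry` class, as in the question. `FlipEntryLinkParser`, `extract_slide_links` and `fetch_slide_links` are hypothetical names made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# URL the iframe points to, taken from the answer above.
FOLDER_URL = ("https://drive.google.com/embeddedfolderview"
              "?hl=fr&id=0ByUKRdiCDK7-c0k1TWlLM1U1RXc#list")

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FlipEntryLinkParser(HTMLParser):
    """Collect the href of every <a> nested inside a 'flip-entry' element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._stack = []  # (tag, is_flip_entry) for each currently open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inside_entry = any(is_entry for _, is_entry in self._stack)
        if tag == "a" and inside_entry and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag in VOID_TAGS:
            return
        classes = (attrs.get("class") or "").split()
        self._stack.append((tag, "flip-entry" in classes))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates sloppy HTML).
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


def extract_slide_links(html):
    """Return the hrefs matching the CSS selector '.flip-entry a' in html."""
    parser = FlipEntryLinkParser()
    parser.feed(html)
    return parser.links


def fetch_slide_links(url=FOLDER_URL):
    """Download the folder view and return its slide links (assumes static HTML)."""
    with urlopen(url) as response:
        page = response.read().decode("utf-8", errors="replace")
    return extract_slide_links(page)
```

Calling `fetch_slide_links()` and printing each item would list the links if the assumption holds. If the result is empty, the listing is probably built by JavaScript, and switching to the frame with Selenium remains the fallback.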

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://mila.umontreal.ca/en/cours/deep-learning-summer-school-2017/slides'
browser = webdriver.Chrome()
browser.get(url)
# Switch into the iframe that embeds the Google Drive folder view.
browser.switch_to.frame(browser.find_element(By.CLASS_NAME, 'iframe-class'))
# '.flip-entry a' is a CSS selector, so it needs a CSS locator, not a class-name lookup.
links = browser.find_elements(By.CSS_SELECTOR, '.flip-entry a')
for link in links:
    print(link.get_attribute("href"))
browser.quit()