Wait before returning the page source in Selenium (not timeout())

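I am trying to scrape this website. As you can see, when it opens it first shows a wrong page for a few seconds and only then loads the actual page, which is the one I am interested in.

For clarity: the first/wrong page and the second, right page.

As expected, with BeautifulSoup or Requests I only get the HTML of the first page, not the right one.

I tried Selenium with set_page_load_timeout(), but it also returns only the first/wrong page rather than the actual page.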

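from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
# Limits how long get() may take to load; it does not delay reading page_source
driver.set_page_load_timeout(7)
url = 'https://images.nga.gov/en/search/do_quick_search.html?q=%221949.7.1%22'
driver.get(url)
source = BeautifulSoup(driver.page_source, 'html.parser')
print(source)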

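I looked for related questions, but they are all about setting a timeout, which does not seem to be the problem here: the page does load, it just isn't the page I want.

Is there a way to get the source after 7 seconds?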

2017-06-06 Mitchell van Zuylen

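You can use the title_is() expected condition to wait for the exact moment the required page opens, when the page title changes from "Just a moment..." to "National Gallery of Art | NGA Images". That way you get the source as soon as the page is ready instead of always waiting a fixed 7 seconds: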

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait


driver = webdriver.Chrome()
url = 'https://images.nga.gov/en/search/do_quick_search.html?q=%221949.7.1%22'
title = "National Gallery of Art | NGA Images"
driver.get(url)
# Block until the title matches, for at most 10 seconds
wait(driver, 10).until(EC.title_is(title))
source = BeautifulSoup(driver.page_source, 'html.parser')
print(source)

2017-06-06 19:49:26 Andersson
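
If the final title is not known in advance, one possible variation is to wait until the title stops being the interstitial's title instead. WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value, so a small custom condition is enough. This is only a sketch: the name title_changed_from is made up for illustration and is not part of Selenium, and treating an empty title as "still loading" is an assumption.

```python
class title_changed_from:
    """Wait condition: truthy once the page title differs from old_title.

    Hypothetical helper, not part of Selenium. It only relies on the
    driver exposing a `title` attribute, as Selenium drivers do.
    """

    def __init__(self, old_title):
        self.old_title = old_title

    def __call__(self, driver):
        title = driver.title
        # Assumption: an empty title means the next page is still loading.
        return bool(title) and title != self.old_title


# Intended use with Selenium (not executed here):
#   wait(driver, 10).until(title_changed_from("Just a moment..."))
#   source = BeautifulSoup(driver.page_source, 'html.parser')
```

The trade-off is that this condition also passes if the site lands on some other page (an error page, for example), so checking for the exact title with title_is() is stricter when the title is known.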

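Is the 10 in wait() a timeout? I.e. if the EC becomes true after x seconds, the script simply continues, as long as x < 10?

Yes, 10 is the timeout in seconds. If the title still has not changed after 10 seconds, you will get a TimeoutException. – Andersson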
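
To make the timeout semantics from the comments concrete, here is a rough pure-Python sketch of the "poll until true or give up" pattern that an explicit wait follows. It is not Selenium's implementation: poll_until and WaitTimeout are hypothetical names, and the 0.5 second default interval simply mirrors WebDriverWait's default poll frequency.

```python
import time


class WaitTimeout(Exception):
    """Raised by poll_until when the condition never becomes truthy."""


def poll_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly and return its first truthy result.

    Raises WaitTimeout if `timeout` seconds pass without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return right away; the rest of the timeout is not waited out.
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Demo: a condition that becomes true on its third call.
calls = []


def ready_on_third_call():
    calls.append(1)
    return len(calls) >= 3


start = time.monotonic()
poll_until(ready_on_third_call, timeout=10, poll_interval=0.05)
elapsed = time.monotonic() - start  # far below the 10 s timeout
```

The timeout is an upper bound, not a fixed delay: the call returns on the first successful check and only raises once the deadline has passed.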