Web scraping with a "Next" button in Python

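I am scraping the comments from a web page and need to scan every page until there are no more comments. The comments span multiple pages. My first thought was to use a while loop, but I am not sure where to start. The page's HTML looks similar to this.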

HTML of the last page:

Any help is appreciated.


2016-12-25 Pythoner1234

Show the code you have tried. Also share the HTML of the last page. – Andersson

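Check whether the URL takes the page number as a parameter. If it does, you don't need the Next button at all. For parsing the HTML, I would suggest [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc) –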
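If the URL did carry a page number, the loop suggested in this comment could look roughly like the sketch below. Everything in it is an assumption made for illustration: the parameter name `page`, the helper names `page_url` and `scrape_all_pages`, and the two callables `fetch` (URL in, HTML out) and `extract_comments` (HTML in, list of comments out) that the caller would supply. It stops at the first page that yields no comments.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, param="page"):
    """Return base_url with the query parameter `param` set to `page`.

    The parameter name "page" is an assumption; real sites may use another.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def scrape_all_pages(base_url, fetch, extract_comments, max_pages=1000):
    """Collect comments page by page until a page yields none.

    `fetch` and `extract_comments` are hypothetical callables supplied by
    the caller; `max_pages` is a safety cap against endless pagination.
    """
    comments = []
    for page in range(1, max_pages + 1):
        html = fetch(page_url(base_url, page))
        found = extract_comments(html)
        if not found:
            break  # an empty page is treated as the end of the comments
        comments.extend(found)
    return comments
```

In practice `fetch` could wrap `urllib.request.urlopen` and `extract_comments` could wrap a BeautifulSoup query; both are left open here because the page's HTML is not shown in the question.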

The URL has no page parameter. I can't come up with a loop that works. Any code suggestions are very welcome. @Andersson – Pythoner1234

Try clicking Next for as long as it can be found:

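from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://www.some_site.com")
while True:
    # do whatever you want with the current page
    try:
        # Selenium 4 removed find_element_by_xpath; use find_element(By.XPATH, ...)
        driver.find_element(By.XPATH, '//a/span[text()="Next"]').click()
    except NoSuchElementException:
        # no "Next" link left, so this was the last page
        break
driver.quit()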


2016-12-25 12:19:34 Andersson

I get this error: NameError: global name 'driver' is not defined. I haven't used Selenium before; my code makes requests with urllib2 and parses them with BeautifulSoup. @Andersson – Pythoner1234

Answer updated. If you use an 'http' request to get the page's "HTML" source, why do you need 'selenium'? – Andersson

To click the next button on HTML rendered the way a browser would render it. I couldn't find a more straightforward way than Selenium. – Pythoner1234
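For comparison, if the "Next" button turns out to be an ordinary link with an `href` rather than something driven by JavaScript, a while loop over plain HTTP requests may be enough. The sketch below is only that: it assumes the link has the same shape as the XPath in the answer (an `<a>` containing `<span>Next</span>`, here matched after stripping whitespace), it uses the standard library's `html.parser` instead of BeautifulSoup to stay dependency-free, and the names `NextLinkFinder`, `iter_pages`, and `fetch` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> that contains <span>Next</span>."""

    def __init__(self):
        super().__init__()
        self.next_href = None
        self._a_href = None    # href of the <a> currently being parsed
        self._in_span = False  # True while inside a <span> within that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._a_href = dict(attrs).get("href")
        elif tag == "span" and self._a_href is not None:
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._a_href = None
        elif tag == "span":
            self._in_span = False

    def handle_data(self, data):
        if self._in_span and self.next_href is None and data.strip() == "Next":
            self.next_href = self._a_href


def iter_pages(start_url, fetch, max_pages=1000):
    """Yield (url, html) per page, following "Next" links until none is left.

    `fetch` is a caller-supplied callable mapping a URL to its HTML text.
    Already visited URLs and `max_pages` guard against endless loops.
    """
    url = start_url
    seen = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        finder = NextLinkFinder()
        finder.feed(html)
        # Resolve relative links against the page they were found on
        url = urljoin(url, finder.next_href) if finder.next_href else None
```

A `fetch` built on `urllib.request.urlopen(url).read().decode("utf-8")` would fit the requests-plus-parser setup described in the comments (`urllib2` is the Python 2 name for that module). If the button only works through JavaScript, this approach finds no `href` to follow and Selenium remains the more direct route.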
