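Cannot import QWebPage from PyQt5 in Python 3.5

2016-08-25

I am trying to write code that scrapes content from JavaScript-rendered web pages, and I found some examples online that use PyQt5. However, after installing PyQt5 5.7 for Python 3.5, I cannot import its module (ImportError: cannot import name 'QWebPage'). My code is below for reference. I would be very grateful if anyone could suggest how to fix this, or another way to scrape JavaScript-rendered web content.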

# standard imports 
import sys 

# third-party imports 
import requests 
from bs4 import BeautifulSoup 
from pyvirtualdisplay import Display 
from PyQt5.QtWebEngineWidgets import QWebPage  # fails: this module has no QWebPage (it provides QWebEnginePage)
from PyQt5.QtWidgets import QApplication 



class Render(QWebPage):
    """Render HTML with PyQt5 WebKit."""

    def __init__(self, html):
        self.html = None
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().setHtml(html)
        self.app.exec_()

    def _loadFinished(self, result):
        self.html = self.mainFrame().toHtml()
        self.app.quit()


url = 'https://impythonist.wordpress.com/2015/01/06/ultimate-guide-for-scraping-javascript-rendered-web-pages/' 

# get the raw HTML 
source_html = requests.get(url).text 

# return the JavaScript rendered HTML 
with Display(visible=0, size=(800, 600)): 
    rendered_html = Render(source_html).html 

# get the BeautifulSoup 
soup = BeautifulSoup(rendered_html, 'html.parser') 

print('title is %r' % soup.select_one('title').text) 
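The ImportError is expected from the QtWebEngineWidgets import above: QWebPage is a QtWebKit class and lives in PyQt5.QtWebKitWidgets, while PyQt5.QtWebEngineWidgets offers QWebEnginePage instead. As a minimal diagnostic sketch (the helper find_web_page_class and the CANDIDATES list are hypothetical names, not part of PyQt5), the following reports which of the two classes the installed PyQt5 actually provides:

```python
import importlib

# Candidate (module, class) pairs: QWebPage is the QtWebKit class,
# QWebEnginePage is its QtWebEngine counterpart.
CANDIDATES = [
    ("PyQt5.QtWebKitWidgets", "QWebPage"),
    ("PyQt5.QtWebEngineWidgets", "QWebEnginePage"),
]


def find_web_page_class():
    """Return the first (module, class) pair that can be imported, or None."""
    for module_name, class_name in CANDIDATES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # PyQt5 itself, or this particular Qt module, is not installed.
            continue
        if hasattr(module, class_name):
            return module_name, class_name
    return None


if __name__ == "__main__":
    print(find_web_page_class())
```

If this prints None, neither web module is available and the scraping code cannot work with that PyQt5 install as-is.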

Answer

Try importing from this module instead:

from PyQt5.QtWebKitWidgets import QWebView, QWebPage
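This import path is where QWebPage lives, but it only works when the installed PyQt5 still includes the QtWebKit module; PyQt5 builds based on Qt 5.6 and later generally no longer ship it. For such installs, a minimal sketch of the question's Render class ported to QtWebEngine might look like the following. It assumes PyQt5.QtWebEngineWidgets is installed, and render_html with its base_url parameter are hypothetical names, not PyQt5 API. The main differences from the WebKit version: the class is QWebEnginePage, there is no mainFrame(), and toHtml() delivers its result asynchronously through a callback.

```python
import sys


def render_html(html, base_url="about:blank"):
    """Load `html` into a QWebEnginePage and return the DOM serialized after loading."""
    # Imported lazily so this file can be loaded even where PyQt5 is missing.
    # QtWebEngineWidgets must be imported before the QApplication is created.
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()
    result = {}

    def on_html(rendered):
        # Receives the serialized HTML, then stops the event loop.
        result["html"] = rendered
        app.quit()

    def on_load_finished(ok):
        # Unlike QtWebKit, toHtml() is asynchronous and takes a callback.
        page.toHtml(on_html)

    page.loadFinished.connect(on_load_finished)
    page.setHtml(html, QUrl(base_url))
    app.exec_()
    return result.get("html")
```

With the question's variables, the call would be rendered_html = render_html(source_html, base_url=url); passing the page URL as the base lets relative script and resource links resolve.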