Python + Selenium + PhantomJS渲染为PDF

当PhantomJS与Selenium和Python结合使用时，是否可以使用PhantomJS's呈现PDF功能？（即通过Selenium在Python中模仿page.render('file.pdf')行为）。Python + Selenium + PhantomJS渲染为PDF

我意识到这使用GhostDriver和GhostDriver并不真正支持很多的打印方式。

如果另一种选择是不可能的硒不是我的耳朵。

来源

2013-06-04 Rejected

你看过Pypdf2吗？ http://www.blog.pythonlibrary.org/tag/python-pdf-series/ – Amit

@Amit：相当广泛，因为我一直都在使用它。即使Phaseit自己也说过“PyPDF2没有HTML知识”。它不会可靠地呈现任何HTML。 – Rejected

@Rejected是否需要在测试期间以确切的状态进行截图？或者你只是想加载一个页面并渲染为PDF？ –

你可以使用selenium.selenium.capture_screenshot('file.png')但这会给你一个PNG屏幕截图而不是pdf。似乎没有办法将屏幕截图作为pdf。

这里是capture_screenshot文档：http://selenium.googlecode.com/git/docs/api/py/selenium/selenium.selenium.html?highlight=screenshot#selenium.selenium.selenium.capture_screenshot

来源

2014-03-30 22:29:43 KAtkinson

PDF是一个关键因素。由于众多原因，例如文本搜索，表单，嵌入式媒体等，我无法下载到一个简单的图像。 – Rejected

试过pdfkit？它可以从HTML页面呈现PDF文件。

来源

2014-04-02 10:13:24 moodh

我也研究过它。 PDFKit转换HTML - > PDF，但没有其他功能。内容分析，以确定页面是否包含PDF之前所需的内容悲伤是不可能的。 – Rejected

是的，我有与PDFKit相同的问题，我想要更高级的渲染，使用它与JS框架相当麻烦.. :( – moodh

“内容分析，以确定一个页面是否包含所需的内容” - >好吧，你不能自己做内容分析，如果它匹配，那么你只需发送它与pdfkit呈现。这就是我如何做到这一点。 – Jonathan

@rejected，我知道你提到不想使用的子过程，但是......

你实际上可能能够利用比你预期的子进程通信等等。理论上，您可以采取Ariya's stdin/stdout example并将其扩展为相对通用的包装脚本。它可能会首先接受要加载的页面，然后在该页面上侦听（&执行）您的测试操作。最终，你可能会揭开序幕，.render甚至做出错误处理的一般拍摄：

try { 
    // load page & execute stdin commands 
} catch (e) { 
    page.render(page + '-error-state.pdf'); 
}

来源

2014-04-04 18:20:25

执行通过stdin接收的代码需要通过'eval'完成，并且从我尝试这样做的经验中，它都是inse治愈和不可靠。除非我错了？ – Rejected

尽管您希望对您的输入保持谨慎（从可靠性的角度来看），但您可能不必担心安全问题，因为（我假设）您拥有该流程。 –

你也可以列出特定的命令等，以便更快地处理意外错误。然而，我设想的最佳场景是，您可以将屏幕截图之前可能发生的测试（或其他逻辑）提取到单独的.js文件并将其加载到页面中（http://phantomjs.org/api/phantom/）方法/注入-js.html）。你可以让Python最大传递一个arg来加载特定的文件JS。 –

下面是使用硒和特殊命令GhostDriver 溶液（它应该工作，因为GhostDriver 1.1.0和1.9 PhantomJS。 6，测试与PhantomJS 1.9.8）：

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

"""Download a webpage as a PDF.""" 


from selenium import webdriver 


def download(driver, target_path): 
    """Download the currently displayed page to target_path.""" 
    def execute(script, args): 
     driver.execute('executePhantomScript', 
         {'script': script, 'args': args}) 

    # hack while the python interface lags 
    driver.command_executor._commands['executePhantomScript'] = ('POST', '/session/$sessionId/phantom/execute') 
    # set page format 
    # inside the execution script, webpage is "this" 
    page_format = 'this.paperSize = {format: "A4", orientation: "portrait" };' 
    execute(page_format, []) 

    # render current page 
    render = '''this.render("{}")'''.format(target_path) 
    execute(render, []) 


if __name__ == '__main__': 
    driver = webdriver.PhantomJS('phantomjs') 
    driver.get('http://stackoverflow.com') 
    download(driver, "save_me.pdf")

也看到我的回答对同一问题here。

来源

2015-02-01 23:21:30 MTuner

Python + Selenium + PhantomJS渲染为PDF

回答

相关问题