如何使用Python截取网站截图/图像？

我想实现的是从python的任何网站获取网站截图。如何使用Python截取网站截图/图像？

ENV：Linux的

2009-07-28 Esteban Feldman

快速搜索网站会带来很多很多近似重复的内容。这是一个很好的开始：http：//stackoverflow.com/questions/713938/how-can-i-generate-a-screenshot-of-a-webpage-using-a-server-side-script – Shog9 2009-07-28 22:55:30

Shog9：谢谢！你的链接有一些...会检查它。 – 2009-07-28 23:22:11

Shog9：你为什么不把它添加为答案？所以它可以给你点。 – 2009-07-28 23:27:05

在Mac上，有webkit2png，在Linux + KDE上，您可以使用khtml2png。我试过前者，效果很好，听说后者正在使用。

我最近遇到了QtWebKit，它声称是跨平台的（Qt将WebKit放入他们的库中，我想）。但我从来没有尝试过，所以我不能告诉你更多。

QtWebKit链接显示了如何从Python进行访问。你应该至少可以使用子进程对其他进程做同样的事情。

来源

2009-07-29 01:12:48 ars

你不提你在运行什么样的环境，这使得一个很大的不同，因为没有一个纯Python Web浏览器是能够呈现HTML的。

但是，如果您使用的是Mac，我已经使用webkit2png，并取得了巨大成功。如果没有，正如其他人指出的那样，有很多选择。

来源

2009-07-28 23:28:08

我不能评论ars的答案，但我实际上得到Roland Tapken's code运行使用QtWebkit，它工作得很好。

只想确认Roland在他的博客上发布的内容在Ubuntu上的效果如何。我们的产品版本最终没有使用他写的任何东西，但我们使用的PyQt/QtWebKit绑定取得了很大的成功。

来源

2009-07-29 04:19:46 aezell

这里有一个简单的解决方案使用的WebKit： http://webscraping.com/blog/Webpage-screenshots-with-webkit/

import sys 
import time 
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 
from PyQt4.QtWebKit import * 

class Screenshot(QWebView): 
    def __init__(self): 
     self.app = QApplication(sys.argv) 
     QWebView.__init__(self) 
     self._loaded = False 
     self.loadFinished.connect(self._loadFinished) 

    def capture(self, url, output_file): 
     self.load(QUrl(url)) 
     self.wait_load() 
     # set to webpage size 
     frame = self.page().mainFrame() 
     self.page().setViewportSize(frame.contentsSize()) 
     # render image 
     image = QImage(self.page().viewportSize(), QImage.Format_ARGB32) 
     painter = QPainter(image) 
     frame.render(painter) 
     painter.end() 
     print 'saving', output_file 
     image.save(output_file) 

    def wait_load(self, delay=0): 
     # process app events until page loaded 
     while not self._loaded: 
      self.app.processEvents() 
      time.sleep(delay) 
     self._loaded = False 

    def _loadFinished(self, result): 
     self._loaded = True 

s = Screenshot() 
s.capture('http://webscraping.com', 'website.png') 
s.capture('http://webscraping.com/blog', 'blog.png')

来源

2012-08-20 01:21:49 hoju

下面是从各种渠道帮助抓住我的解决方案。它需要完整的网页屏幕截图，并裁剪（可选），并从裁剪后的图像生成缩略图。以下是要求：安装的NodeJS

使用节点的包管理器安装phantomjs

：

要求npm -g install phantomjs
安装硒（在你的virtualenv，如果你正在使用）
安装imageMagick
将幻影添加到系统路径（在窗口中）

import os 
from subprocess import Popen, PIPE 
from selenium import webdriver 

abspath = lambda *p: os.path.abspath(os.path.join(*p)) 
ROOT = abspath(os.path.dirname(__file__)) 


def execute_command(command): 
    result = Popen(command, shell=True, stdout=PIPE).stdout.read() 
    if len(result) > 0 and not result.isspace(): 
     raise Exception(result) 


def do_screen_capturing(url, screen_path, width, height): 
    print "Capturing screen.." 
    driver = webdriver.PhantomJS() 
    # it save service log file in same directory 
    # if you want to have log file stored else where 
    # initialize the webdriver.PhantomJS() as 
    # driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log') 
    driver.set_script_timeout(30) 
    if width and height: 
     driver.set_window_size(width, height) 
    driver.get(url) 
    driver.save_screenshot(screen_path) 


def do_crop(params): 
    print "Croping captured image.." 
    command = [ 
     'convert', 
     params['screen_path'], 
     '-crop', '%sx%s+0+0' % (params['width'], params['height']), 
     params['crop_path'] 
    ] 
    execute_command(' '.join(command)) 


def do_thumbnail(params): 
    print "Generating thumbnail from croped captured image.." 
    command = [ 
     'convert', 
     params['crop_path'], 
     '-filter', 'Lanczos', 
     '-thumbnail', '%sx%s' % (params['width'], params['height']), 
     params['thumbnail_path'] 
    ] 
    execute_command(' '.join(command)) 


def get_screen_shot(**kwargs): 
    url = kwargs['url'] 
    width = int(kwargs.get('width', 1024)) # screen width to capture 
    height = int(kwargs.get('height', 768)) # screen height to capture 
    filename = kwargs.get('filename', 'screen.png') # file name e.g. screen.png 
    path = kwargs.get('path', ROOT) # directory path to store screen 

    crop = kwargs.get('crop', False) # crop the captured screen 
    crop_width = int(kwargs.get('crop_width', width)) # the width of crop screen 
    crop_height = int(kwargs.get('crop_height', height)) # the height of crop screen 
    crop_replace = kwargs.get('crop_replace', False) # does crop image replace original screen capture? 

    thumbnail = kwargs.get('thumbnail', False) # generate thumbnail from screen, requires crop=True 
    thumbnail_width = int(kwargs.get('thumbnail_width', width)) # the width of thumbnail 
    thumbnail_height = int(kwargs.get('thumbnail_height', height)) # the height of thumbnail 
    thumbnail_replace = kwargs.get('thumbnail_replace', False) # does thumbnail image replace crop image? 

    screen_path = abspath(path, filename) 
    crop_path = thumbnail_path = screen_path 

    if thumbnail and not crop: 
     raise Exception, 'Thumnail generation requires crop image, set crop=True' 

    do_screen_capturing(url, screen_path, width, height) 

    if crop: 
     if not crop_replace: 
      crop_path = abspath(path, 'crop_'+filename) 
     params = { 
      'width': crop_width, 'height': crop_height, 
      'crop_path': crop_path, 'screen_path': screen_path} 
     do_crop(params) 

     if thumbnail: 
      if not thumbnail_replace: 
       thumbnail_path = abspath(path, 'thumbnail_'+filename) 
      params = { 
       'width': thumbnail_width, 'height': thumbnail_height, 
       'thumbnail_path': thumbnail_path, 'crop_path': crop_path} 
      do_thumbnail(params) 
    return screen_path, crop_path, thumbnail_path 


if __name__ == '__main__': 
    ''' 
     Requirements: 
     Install NodeJS 
     Using Node's package manager install phantomjs: npm -g install phantomjs 
     install selenium (in your virtualenv, if you are using that) 
     install imageMagick 
     add phantomjs to system path (on windows) 
    ''' 

    url = 'http://stackoverflow.com/questions/1197172/how-can-i-take-a-screenshot-image-of-a-website-using-python' 
    screen_path, crop_path, thumbnail_path = get_screen_shot(
     url=url, filename='sof.png', 
     crop=True, crop_replace=False, 
     thumbnail=True, thumbnail_replace=False, 
     thumbnail_width=200, thumbnail_height=150, 
    )

这些是所生成的图像：

来源

2013-08-05 21:30:57

-1

尝试此..

#!/usr/bin/env python 

import gtk.gdk 

import time 

import random 

while 1 : 
    # generate a random time between 120 and 300 sec 
    random_time = random.randrange(120,300) 

    # wait between 120 and 300 seconds (or between 2 and 5 minutes) 
    print "Next picture in: %.2f minutes" % (float(random_time)/60) 

    time.sleep(random_time) 

    w = gtk.gdk.get_default_root_window() 
    sz = w.get_size() 

    print "The size of the window is %d x %d" % sz 

    pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB,False,8,sz[0],sz[1]) 
    pb = pb.get_from_drawable(w,w.get_colormap(),0,0,0,0,sz[0],sz[1]) 

    ts = time.time() 
    filename = "screenshot" 
    filename += str(ts) 
    filename += ".png" 

    if (pb != None): 
     pb.save(filename,"png") 
     print "Screenshot saved to "+filename 
    else: 
     print "Unable to get the screenshot."

来源

2013-11-18 10:09:39 Anand

如何使用Python截取网站截图/图像？

回答

相关问题