2013-01-01

Possible duplicate:
How to download image using requests

How do I download images from a list of scraped URLs?

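I have this Python script that scrapes image URLs from a Tumblr blog, and I'd like to download the images to a local folder on my desktop. How would I go about doing this?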

import requests
from bs4 import BeautifulSoup


def make_soup(url):
    # download a page with requests and create a BeautifulSoup object
    raw_page = requests.get(url).text
    soup = BeautifulSoup(raw_page, 'html.parser')
    return soup


def get_images(soup):
    # pull the Tumblr-hosted image URLs from the current page
    images = []
    for image in soup.find_all('img'):
        url = image.get('src', '')
        if 'media.tumblr.com' in url:
            images.append(url)
    return images


def scrape_blog(url):
    # scrape the entire blog: the first page, then every "next page" link
    soup = make_soup(url)
    images = get_images(soup)
    next_page = soup.find('a', id='nextpage')

    while next_page is not None:
        soup = make_soup(url + next_page['href'])
        images.extend(get_images(soup))
        next_page = soup.find('a', id='nextpage')

    return images


url = 'http://x.tumblr.com'
images = scrape_blog(url)

Answer

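Python's `urllib2` (called `urllib.request` in Python 3) may be what you're looking for. If you need to do anything more complex, such as using cookies or authentication, look at a wrapper library like Requests, which puts a much friendlier interface on many of the more cumbersome parts of the standard library.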
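A minimal sketch of the standard-library route, assuming Python 3 (where `urllib2` became `urllib.request`) and that the scraped image URLs are absolute (`http://...`). `download_images` and `filename_from_url` are hypothetical helper names, not part of any library. Each file is named after the last segment of its URL path, so two URLs ending in the same name would overwrite each other.

```python
import os
import shutil
import urllib.parse
import urllib.request


# Hypothetical helper: name the local file after the last segment of the URL path.
def filename_from_url(url):
    path = urllib.parse.urlparse(url).path
    name = os.path.basename(path)
    # fall back to a generic name when the URL path ends in "/"
    return name or "image"


# Hypothetical helper: download every URL into `folder` and return the saved paths.
# Assumes absolute URLs; files with the same name overwrite each other.
def download_images(urls, folder):
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(folder, filename_from_url(url))
        # urlopen returns a file-like response; copy it to disk in chunks
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

With the question's script, `download_images(images, os.path.expanduser('~/Desktop/tumblr_images'))` would save each image into a `tumblr_images` folder on the desktop (the folder name is only an example). `shutil.copyfileobj` copies in chunks, so a large image is never held in memory all at once. If you prefer Requests, which the script already imports, writing `requests.get(url).content` to the file in place of the `urlopen` copy does the same job for images of ordinary size.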