4

Downloading an image using Python Mechanize

I'm trying to write a Python script that downloads an image and sets it as my wallpaper. Unfortunately, the mechanize documentation is quite poor. My script follows the link correctly, but I'm having trouble actually saving the image on my computer. From what I've researched, the .retrieve() method should do the job, but how do I specify the path the file should be downloaded to? Here is what I have...

def followLink(browser, fixedLink):
    browser.open(fixedLink)

    if browser.find_link(url_regex=r'1600x1200'):
        browser.follow_link(url_regex=r'1600x1200')
    elif browser.find_link(url_regex=r'1400x1050'):
        browser.follow_link(url_regex=r'1400x1050')
    elif browser.find_link(url_regex=r'1280x960'):
        browser.follow_link(url_regex=r'1280x960')
    return
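The missing piece, roughly, is saving the response the browser ends up on after following the link. A minimal sketch, assuming the followed link points directly at the image file and with wallpaper_path as a placeholder destination:

def saveImage(browser, wallpaper_path):
    # After followLink() the browser is sitting on the image response;
    # read its bytes and write them to the chosen path.
    data = browser.response().read()
    with open(wallpaper_path, 'wb') as f:
        f.write(data)

# e.g. saveImage(browser, '/home/me/wallpaper.jpg')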

Answers

3

You can get/download the image by opening the URL of the img src.

image_response = browser.open_novisit(img['src']) 

Now, to save the file, just use open():

with open('image_out.png', 'wb') as f: 
    f.write(image_response.read()) 
+0

It doesn't work. I get the error: "NameError: global name 'img' is not defined". And where is the image supposed to be saved? – XVirtusX 2013-03-24 01:58:35

+1

The 'img' in the first line here means it is looking for an '<img>' tag. Point it at the tag that holds, in its 'src' attribute, the URL of the image you want to save. Also, the image will be saved in the same folder as the script, as the f.write statement shows. – 2013-03-26 19:00:18
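Putting the two snippets above together, a rough sketch of what the comment describes, assuming BeautifulSoup is used to locate the <img> tag (page_url and the output filename are placeholders):

import mechanize
from BeautifulSoup import BeautifulSoup

browser = mechanize.Browser()
page_url = 'http://example.com/'       # placeholder page holding the image
html = browser.open(page_url).read()
img = BeautifulSoup(html).find('img')  # the <img> tag the comment refers to

# open_novisit fetches the src URL without changing the browser's state
# (assumes the src is an absolute URL)
image_response = browser.open_novisit(img['src'])

# the path passed to open() decides where the file ends up;
# a bare filename like this lands next to the script
with open('image_out.png', 'wb') as f:
    f.write(image_response.read())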

9
import mechanize, os
from BeautifulSoup import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html)
image_tags = soup.findAll('img')
for image in image_tags:
    # build a local filename from the image URL
    filename = image['src'].lstrip('http://')
    filename = os.path.join(dir, filename.replace('/', '_'))
    # fetch the image bytes and write them to disk
    data = browser.open(image['src']).read()
    browser.back()
    save = open(filename, 'wb')
    save.write(data)
    save.close()

This should let you download all the images on the web page. As for parsing the HTML, you are better off using BeautifulSoup or lxml. Downloading is just reading the data and then writing it to a local file. You should assign your own value to dir; that is where your images will be stored.
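For the lxml route the answer mentions, a roughly equivalent sketch (assuming lxml is installed; url and dir are placeholders, as above):

import mechanize, os
import lxml.html

browser = mechanize.Browser()
html = browser.open(url).read()
doc = lxml.html.fromstring(html)
# the XPath pulls the src attribute of every <img> on the page
for src in doc.xpath('//img/@src'):
    filename = os.path.join(dir, src.split('/')[-1])  # last path segment as local name
    data = browser.open(src).read()
    browser.back()
    with open(filename, 'wb') as f:
        f.write(data)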

5

Not sure why this solution hasn't come up, but you can also use the mechanize.Browser.retrieve function. Perhaps it only works in newer versions of mechanize and therefore wasn't mentioned?
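In its simplest form, and getting back to the original question about specifying a path, the call looks roughly like this (the URL and destination are just examples):

import mechanize

browser = mechanize.Browser()
# the second argument is the local path the file gets written to
browser.retrieve('http://example.com/wallpaper_1600x1200.jpg',
                 '/home/me/wallpaper.jpg')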

Anyway, if you want to shorten the answer by zhangyangyu, you could do it like this:

import mechanize, os 
from BeautifulSoup import BeautifulSoup 

browser = mechanize.Browser() 
html = browser.open(url) 
soup = BeautifulSoup(html) 
image_tags = soup.findAll('img') 
for image in image_tags:
    filename = image['src'].lstrip('http://')
    filename = os.path.join(dir, filename.replace('/', '_'))
    # retrieve() downloads the URL straight into the given local file
    browser.retrieve(image['src'], filename)
    browser.back()

Also keep in mind that you will probably want to put all of this inside a try/except block, like so:

import mechanize, os 
from BeautifulSoup import BeautifulSoup 

browser = mechanize.Browser() 
html = browser.open(url) 
soup = BeautifulSoup(html) 
image_tags = soup.findAll('img') 
for image in image_tags: 
    filename = image['src'].lstrip('http://') 
    filename = os.path.join(dir, filename.replace('/', '_')) 
    try:
        browser.retrieve(image['src'], filename)
        browser.back()
    except (mechanize.HTTPError, mechanize.URLError) as e:
        pass
        # Use e.code and e.read() with HTTPError
        # Use e.reason.args with URLError

Of course, you will need to adapt this to your own needs. Maybe you want it to blow up when it runs into a problem; that depends entirely on what you are trying to achieve.
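For instance, if you would rather have it blow up than silently skip a failed image, the except block in the loop above could re-raise instead of passing:

    try:
        browser.retrieve(image['src'], filename)
        browser.back()
    except (mechanize.HTTPError, mechanize.URLError):
        # re-raise instead of swallowing the failure
        raise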

0

This is pretty ugly, but it "works" for me, building on 0xc0000022l's earlier answer:

import mechanize, os
from BeautifulSoup import BeautifulSoup
import urllib2

def DownloadIMGs(url): # IMPORTANT: URL WITH HTTP OR HTTPS
    print "From", url
    dir = r'F:\Downloadss' #Dir for Downloads
    basicImgFileTypes = ['png','bmp','cur','ico','gif','jpg','jpeg','psd','raw','tif']

    browser = mechanize.Browser()
    html = browser.open(url)
    soup = BeautifulSoup(html)
    image_tags = soup.findAll('img')
    print "N Images:", len(image_tags)
    print
    #---------SAVE PATH
    #check if available
    if not os.path.exists(dir):
        os.makedirs(dir)
    #---------SAVE PATH
    for image in image_tags:

        #---------SAVE PATH + FILENAME (Where It is downloading)
        filename = image['src']
        fileExt = filename.split('.')[-1]
        fileExt = fileExt[0:3]

        if (fileExt in basicImgFileTypes):
            print 'File Extension:', fileExt
            filename = filename.replace('?', '_')
            filename = os.path.join(dir, filename.split('/')[-1])
            num = filename.find(fileExt) + len(fileExt)
            filename = filename[:num]
        else:
            filename = filename.replace('?', '_')
            filename = os.path.join(dir, filename.split('/')[-1]) + '.' + basicImgFileTypes[0]
        print 'File Saving:', filename
        #---------SAVE PATH + FILENAME (Where It is downloading)

        #--------- FULL URL PATH OF THE IMG
        imageUrl = image['src']
        print 'IMAGE SRC:', imageUrl

        if (imageUrl.find('http://') > -1 or imageUrl.find('https://') > -1):
            pass
        else:
            # relative src: prepend the scheme and host taken from the page URL
            if (url.find('http://') > -1):
                imageUrl = url[len('http://'):]
                imageUrl = 'http://' + imageUrl.split('/')[0] + image['src']
            elif (url.find('https://') > -1):
                imageUrl = url[len('https://'):]
                imageUrl = 'https://' + imageUrl.split('/')[0] + image['src']
            else:
                imageUrl = image['src']

        print 'IMAGE URL:', imageUrl
        #--------- FULL URL PATH OF THE IMG

        #--------- TRY DOWNLOAD
        try:
            browser.retrieve(imageUrl, filename)
            print "Downloaded:", image['src'].split('/')[-1]
            print
        except (mechanize.HTTPError, mechanize.URLError) as e:
            print "Can't Download:", image['src'].split('/')[-1]
            print
            pass
        #--------- TRY DOWNLOAD
    browser.close()

DownloadIMGs('https://stackoverflow.com/questions/15593925/downloading-a-image-using-python-mechanize')