2016-09-29 23 views
4

Spider must return Request, BaseItem, dict or None, got 'set'

I am trying to download the images of all products from here. My spider looks like this:

from shopclues.items import ImgData
import scrapy


class multipleImages(scrapy.Spider):
    name = 'multipleImages'
    start_urls = ['http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera']

    def parse(self, response):
        for url in response.css('div.products-grid div.grid-product'):
            yield {
                ImgData(image_urls=[url.css('img::attr(src)').extract()])
            }

items.py

import scrapy 
from scrapy.item import Item 
class ShopcluesItem(scrapy.Item): 
    # define the fields for your item here like: 
    # name = scrapy.Field() 
    pass 

class ImgData(Item): 
    image_urls=scrapy.Field() 
    images=scrapy.Field() 

But I get the following error when I run the spider:

2016-09-29 11:56:19 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/robots.txt> (referer: None) 
2016-09-29 11:56:20 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> (referer: None) 
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> 
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> 
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> 
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> 
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> 

What does this error mean? What could be causing it?

Answers

4

Pass a list of URLs to the pipeline.

def parse(self, response):
    # Collect every product image URL into a single item.
    images = ImgData()
    images['image_urls'] = []
    for url in response.css('div.products-grid div.grid-product'):
        images['image_urls'].append(url.css('img::attr(src)').extract_first())
    yield images
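
For the yielded `image_urls` to actually be downloaded, an images pipeline has to be enabled in the project settings. A minimal sketch of `settings.py`, assuming Scrapy's built-in `ImagesPipeline` (which also needs Pillow installed); the storage directory name is a hypothetical placeholder:

```python
# settings.py (sketch): enable Scrapy's built-in images pipeline.
# The pipeline reads the item's `image_urls` field and writes its
# results to the `images` field, matching the ImgData item above.
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

# Hypothetical directory where downloaded images are stored.
IMAGES_STORE = 'downloaded_images'
```
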
3

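In Python, {} is the notation for defining either a set or a dictionary, depending on what you put inside the braces. If it holds a list of values, as in {a, b, c, d}, it is a set; if it holds key-value pairs, as in {a: b, c: d}, it is a dictionary.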
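
A quick way to see the difference in an interpreter; the variable names here are only for illustration:

```python
# Braces holding bare values build a set.
as_set = {'a', 'b', 'c'}
# Braces holding key: value pairs build a dict.
as_dict = {'a': 1, 'b': 2}
# Empty braces are an empty dict; an empty set is written set().
empty = {}

print(type(as_set).__name__)   # set
print(type(as_dict).__name__)  # dict
print(type(empty).__name__)    # dict
```
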

You are yielding a set on this line:

yield { 
    ImgData(image_urls=[url.css('img::attr(src)').extract()]) 
} 

I assume you meant to yield a dictionary?

yield { 
    'images': ImgData(image_urls=[url.css('img::attr(src)').extract()]), 
} 
+0

I actually want to download all the images available at the given path. I got it working for a single image with `yield ImgData(image_urls=[url.css('img::attr(src)').extract_first()])`, but doing the same for multiple images fails with: Error processing {'image_urls': [[u'http://cdn.shopclues.com/images/thumbnails/25469/200/200/canoneos750dkitefs1855mmisstmdslr400x400imae77typaskcv7f14343509861444028422.jpg'
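
The double brackets in that message suggest a nested list: `extract()` already returns a list, so wrapping it in another pair of brackets puts a list where the pipeline expects a URL string. A plain-Python sketch of the difference, using a hypothetical URL in place of real selector results:

```python
# Hypothetical stand-in for what url.css('img::attr(src)').extract() returns:
# a list of strings, even when only one element matched.
extracted = ['http://example.com/a.jpg']

# Wrapping that list in brackets nests it, so each entry of image_urls
# is itself a list rather than a URL string.
nested = [extracted]

# Using the list directly keeps it flat; taking one element (what
# extract_first() does) and wrapping it also gives a flat list.
flat = list(extracted)
first_only = [extracted[0]]

print(nested)      # [['http://example.com/a.jpg']]
print(flat)        # ['http://example.com/a.jpg']
print(first_only)  # ['http://example.com/a.jpg']
```
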
