我试图从here下载所有产品的图像。我的蜘蛛的样子:蜘蛛必须返回请求,BaseItem,字典或无,得到'设置'
from shopclues.items import ImgData
import scrapy
class multipleImages(scrapy.Spider):
name='multipleImages'
start_urls=['http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera',]
def parse (self, response):
for url in response.css('div.products-grid div.grid-product):
yield {
ImgData(image_urls=[url.css('img::attr(src)').extract()])
}
和items.py:
import scrapy
from scrapy.item import Item
class ShopcluesItem(scrapy.Item):
# define the fields for your item here like:
# name = scrapy.Field()
pass
class ImgData(Item):
image_urls=scrapy.Field()
images=scrapy.Field()
,但我得到上运行的蜘蛛以下错误:
2016-09-29 11:56:19 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/robots.txt> (referer: None)
2016-09-29 11:56:20 [scrapy] DEBUG: Crawled (200) <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera> (referer: None)
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
2016-09-29 11:56:20 [scrapy] ERROR: Spider must return Request, BaseItem, dict or None, got 'set' in <GET http://www.shopclues.com/electronic-accessories-8/cameras-18/cameras-special.html?search=1&q1=camera>
这个错误是什么意思?可能是错误的可能原因是什么?
我实际上想要下载所有可以在给定路径下使用的图像。我使用'yield ImgData(image_urls = [url.css('img :: attr(src)')。extract_first()])来完成单个图像的操作''但是对于多个图像使用相同的错误会导致错误处理{'image_urls':[[u'http:// cdn.shopclues.com/images/thumbnails/25469/200/200/canoneos750dkitefs1855mmisstmdslr400x400imae77typaskcv7f14343509861444028422.jpg''。 –