In the code below, the second level works: the parse function executes roughly 32 times (the for loop iterates over 32 hrefs, measured), and each sub-link should then be scraped for data (32 individual URLs handled by the parse_next function). But parse_next executes only once (a single pass) or is never called, and the output CSV file is empty. Can anybody help me find where I went wrong?

The spider:
import scrapy
import logging

from ScrapyTestProject.items import ScrapytestprojectItem

logger = logging.getLogger('mycustomlogger')

class QuotesSpider(scrapy.Spider):
    name = "nestedurl"
    allowed_domains = ['www.grohe.in']
    start_urls = [
        'https://www.grohe.com/in/7780/bathroom/bathroom-faucets/essence/',
    ]

    def parse(self, response):
        logger.info("Parse function called on %s", response.url)
        for divs in response.css('div.viewport div.workspace div.float-box'):
            item = {
                'producturl': divs.css('a::attr(href)').extract_first(),
                'imageurl': divs.css('a img::attr(src)').extract_first(),
                'description': divs.css('a div.text::text').extract() + divs.css('a span.nowrap::text').extract(),
            }
            next_page = response.urljoin(item['producturl'])
            # logger.info("This is an information %s", next_page)
            yield scrapy.Request(next_page, callback=self.parse_next, meta={'item': item})
            # yield item

    def parse_next(self, response):
        item = response.meta['item']
        logger.info("Parse function called on2 %s", response.url)
        item['headline'] = response.css('div#content a.headline::text').extract()
        return item
        # response.css('div#product-variants a::attr(href)').extract()
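For reference, the response.urljoin call in the loop above resolves each relative href against the page URL, following the same rules as the standard-library urllib.parse.urljoin. A minimal sketch of that resolution, using the spider's start URL as the base and a hypothetical relative href of the kind extract_first() might return:

```python
from urllib.parse import urljoin

# Base page the spider is parsing (taken from start_urls above).
base = 'https://www.grohe.com/in/7780/bathroom/bathroom-faucets/essence/'

# Hypothetical relative href, standing in for a real extract_first() result.
href = '../essence-new/product-123/'

# '..' steps up one path segment before appending the rest.
print(urljoin(base, href))
# → https://www.grohe.com/in/7780/bathroom/bathroom-faucets/essence-new/product-123/
```

If the hrefs on the page are already absolute URLs, urljoin simply returns them unchanged, so this call is safe either way.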
Checked your loop, and it should work as written, so there is probably some kind of error showing up in the logs. Have you tried running the spider at DEBUG log level? That should give you some indication of where things go wrong. – Casper
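Following the comment's suggestion, one way to see why requests vanish (assuming the spider is named nestedurl, as in the name attribute above) is to run the crawl at DEBUG log level; requests dropped by middleware, such as offsite filtering, are logged at this level:

```shell
# Run the spider with verbose logging so filtered/dropped requests are visible.
scrapy crawl nestedurl --loglevel=DEBUG

# Equivalent short form:
scrapy crawl nestedurl -L DEBUG
```

This is a CLI fragment that must be run from inside the Scrapy project directory; the same effect can be had by setting LOG_LEVEL = 'DEBUG' in the project's settings.py.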