Why doesn't yield scrapy.Request() recurse in Scrapy?

Here is my code. Why doesn't yield scrapy.Request() recurse?

import scrapy

# QuoteItem is defined in the project's items.py (its import is not shown in the question)


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com/']  # note the trailing slash
    start_urls = ['http://quotes.toscrape.com//']

    def parse(self, response):
        # Yield one item per quote block on the page
        for quote in response.css('.quote'):
            item = QuoteItem()
            item['text'] = quote.css('.text::text').extract_first()
            item['author'] = quote.css('.author::text').extract_first()
            item['tags'] = quote.css('.tags .tag::text').extract()
            yield item

        # Follow the "Next" link with the same callback; the last page has none
        next_page = response.css('.pager .next a::attr(href)').extract_first()
        if next_page is not None:
            yield scrapy.Request(url=response.urljoin(next_page), callback=self.parse)

I'm new to Scrapy. I thought this would keep recursing through every page, but in practice it doesn't. Why is that?


2017-07-08 lxacoder

What do you mean by recursion? – eLRuLL

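The problem here is that Scrapy builds a regular expression from allowed_domains and uses it to decide whether each request yielded by your callbacks belongs to one of the specified domains. The entry quotes.toscrape.com/ (with the trailing slash) never matches the host quotes.toscrape.com, so the request for the next page is dropped as off-site and the crawl ends after the first page.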
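The check can be sketched in plain Python. This is a simplified stand-in for Scrapy's off-site filter, not its actual code: it assumes the filter escapes each allowed_domains entry, builds the pattern ^(.*\.)?(domain1|domain2)$, and searches the request's hostname with it. The names host_regex and is_allowed are hypothetical.

```python
import re
from urllib.parse import urlparse


def host_regex(allowed_domains):
    """Build a host-matching pattern from allowed_domains (simplified sketch)."""
    domains = [re.escape(d) for d in (allowed_domains or []) if d is not None]
    if not domains:
        # No allowed_domains at all: the empty pattern matches every host
        return re.compile('')
    # Optional subdomain prefix, then one of the allowed domains, anchored at both ends
    return re.compile(r'^(.*\.)?(%s)$' % '|'.join(domains))


def is_allowed(url, allowed_domains):
    """Return True if the URL's hostname passes the simplified off-site check."""
    host = urlparse(url).hostname or ''
    return bool(host_regex(allowed_domains).search(host))


next_url = 'http://quotes.toscrape.com/page/2/'
print(is_allowed(next_url, ['quotes.toscrape.com/']))  # False
print(is_allowed(next_url, ['quotes.toscrape.com']))   # True
print(is_allowed(next_url, []))                        # True
```

The trailing $ anchor is what makes a stray / fatal: a parsed hostname never contains a slash, so an entry ending in one can never match anything.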

Just change the string quotes.toscrape.com/ to quotes.toscrape.com if you only want to allow requests to that particular subdomain.
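If the allowed_domains entries are copied from browser URLs, a small normalizing step avoids this mistake. The helper below, to_bare_domain, is hypothetical and not part of Scrapy; it assumes each entry is either a domain (possibly with a trailing path) or a full URL with a scheme, and reduces it to the bare hostname.

```python
from urllib.parse import urlparse


def to_bare_domain(entry):
    """Reduce 'quotes.toscrape.com/' or 'http://quotes.toscrape.com//' to 'quotes.toscrape.com'."""
    # urlparse only recognizes a network location after '//', so prepend it when there is no scheme
    if '//' not in entry:
        entry = '//' + entry
    return urlparse(entry).hostname or ''


allowed_domains = [to_bare_domain(d) for d in ['quotes.toscrape.com/', 'http://quotes.toscrape.com//']]
print(allowed_domains)  # ['quotes.toscrape.com', 'quotes.toscrape.com']
```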

You can also remove the variable entirely if you want to allow requests to every domain.


2017-07-08 14:54:39 eLRuLL
