Scrapy crawler not completing all loops of the parse function
I have this code in my crawler:
class StackSpider(InitSpider):
    name = 'stack'
    allowed_domains = ['sitepoint.com']
    start_urls = ["http://www.sitepoint.com"]
    start_page = "http://www.sitepoint.com"
    item = StackItem()

    def init_request(self):
        return Request(url=self.start_page, callback=self.parse)

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="headline_area"]')
        items = []
        ivar = 1
        for site in sites[:5]:
            item = StackItem()
            log.msg(' LOOP' + str(ivar) + '', level=log.ERROR)
            item['title'] = "yoo ma"
            request = Request("http://www.sitepoint.com/getting-to-know-css3-selectors-structural-pseudo-classes/", callback=self.test1)
            request.meta['item'] = item
            ivar = ivar + 1
            yield request

    def test1(self, response):
        log.msg(' LOOP 2 \n', level=log.ERROR)
        item = response.meta['item']
        item['desc'] = "test4"
        return item
I did this as per the documentation, but it only goes through one loop. I mean, in the log on screen I only see
LOOP1
LOOP2
when it should be repeated 3 times.
I tried different combinations of return and yield:
return request
and return item
gives the output LOOP1 LOOP2
yield request
and return item
gives the output LOOP1 LOOP1 LOOP1 LOOP2
yield request
and yield item
gives the output LOOP1 LOOP1 LOOP1 LOOP2
return request
and yield item
gives the output LOOP1 LOOP2
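The difference between the return and yield variants can be reproduced in plain Python, independent of Scrapy. This is a minimal sketch; the function names and the sample list are hypothetical and only illustrate why a return inside the loop stops after the first iteration, while yield lets every iteration run.

```python
def parse_with_return(sites):
    """Leaves the function on the first iteration, so the loop body runs once."""
    for i, site in enumerate(sites, start=1):
        return "LOOP%d" % i


def parse_with_yield(sites):
    """Generator: hands back one value per iteration until the loop is done."""
    for i, site in enumerate(sites, start=1):
        yield "LOOP%d" % i


sites = ["a", "b", "c"]  # stand-in for the selected elements

first_only = parse_with_return(sites)
all_loops = list(parse_with_yield(sites))

print(first_only)  # LOOP1
print(all_loops)   # ['LOOP1', 'LOOP2', 'LOOP3']
```

This matches the observation that the return request variants only ever log LOOP1 once.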
How can I get LOOP1 LOOP2 LOOP1 LOOP2 and so on?
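One possible explanation for LOOP2 appearing only once, offered as an assumption rather than a confirmed diagnosis: every request created in the loop targets the same URL, and Scrapy filters out requests it has already seen unless they are created with dont_filter=True. The sketch below is a toy stand-in for that kind of filtering, not Scrapy's actual implementation; the schedule function and its (url, dont_filter) tuples are hypothetical.

```python
def schedule(requests):
    """Toy duplicate filter: skip URLs already seen unless dont_filter is set.

    This only imitates the idea; it is not Scrapy's real scheduler code.
    """
    seen = set()
    scheduled = []
    for url, dont_filter in requests:
        if not dont_filter and url in seen:
            continue  # dropped as a duplicate, its callback would never run
        seen.add(url)
        scheduled.append(url)
    return scheduled


URL = "http://www.sitepoint.com/getting-to-know-css3-selectors-structural-pseudo-classes/"

filtered = schedule([(URL, False)] * 3)   # same URL three times, filtering on
unfiltered = schedule([(URL, True)] * 3)  # same URL three times, filtering off

print(len(filtered), len(unfiltered))  # 1 3
```

If this is the cause, passing dont_filter=True to Request in the loop, or requesting a different URL per iteration, should let test1 run once per request.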
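Fix your indentation –
Apparently sites = hxs.select('//div[@class="top"]') only returns two items.... Nobody can verify this, because you are missing important information needed to reproduce the problem. Hence -1 –
I can confirm that it has many items, from the scrapy shell. That is why I sliced it for testing – user19140477031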