Scrapy is not returning all the items it should

I'm trying to get Scrapy to crawl a website, but only the pages that match a certain pattern, and it's giving me a headache.

The site is structured like this:

website.com/category/page1/ 
website.com/category/page2/ 
website.com/category/page3/

and so on.
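LinkExtractor's `allow` argument takes regular expressions that are searched anywhere in the absolute URL. They are not shell globs, so a trailing `page*` means "pag" followed by zero or more "e", not "page followed by anything". The stdlib-only sketch below assumes the example URLs above reflect the real layout and compares a hypothetical `page\d+` pattern with the glob-style one. `LISTING`, `GLOB_LIKE` and `allowed()` are illustrative names, not Scrapy API.

```python
import re

# Hypothetical allow= pattern for the layout above: /category/page<N>/
LISTING = re.compile(r"/category/page\d+/")
# Glob-style pattern: in a regex "e*" means "zero or more e", so this
# also matches any URL that merely contains "/category/pag".
GLOB_LIKE = re.compile(r"/category/page*")


def allowed(url, pattern):
    """Mimic an allow= check: the regex may match anywhere in the URL."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in (
        "https://www.website.com/category/page1/",
        "https://www.website.com/category/page375/",
        "https://www.website.com/category/",
        "https://www.website.com/category/pagination-help/",
    ):
        print(url, allowed(url, LISTING), allowed(url, GLOB_LIKE))
```

The last URL is accepted by the glob-style pattern but rejected by the `\d+` one, which is usually what you want when only the numbered listing pages should be followed.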

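I need it to start crawling from the category page and then follow every link that leads to another page (there are 375 pages in total, and of course the number is not fixed).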

The problem is that it crawls about 10 pages before I stop it, but it only returns 10-15 items, when there should be 200+.
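One explanation consistent with these numbers (roughly one page's worth of items after about ten pages): the callback in the code below reopens `output.csv` with mode `"w"` every time Scrapy calls it, and `"w"` truncates the file, so each page replaces the rows written for the previous one. A minimal stdlib simulation of that effect, with made-up page and item counts and hypothetical helper names:

```python
import os
import tempfile

# Made-up crawl: 10 pages with 15 products each.
PAGES = [[f"page{p};item{i}" for i in range(15)] for p in range(1, 11)]


def write_overwrite(path, rows):
    """Mimics the question's callback: reopens the file with mode "w"."""
    with open(path, "w") as f:
        for row in rows:
            f.write(row + "\n")


def write_append(path, rows):
    """Same callback, but mode "a" keeps rows written for earlier pages."""
    with open(path, "a") as f:
        for row in rows:
            f.write(row + "\n")


def rows_after_crawl(callback):
    """Call `callback` once per page, as Scrapy would, and count the final rows."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        for rows in PAGES:
            callback(path, rows)
        with open(path) as f:
            return sum(1 for _ in f)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("mode 'w':", rows_after_crawl(write_overwrite))  # only the last page
    print("mode 'a':", rows_after_crawl(write_append))     # every page
```

Switching to mode `"a"` is the smallest change; yielding the items and letting Scrapy's feed exports write the CSV avoids managing the file inside the callback at all.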

Here is my code, which doesn't work right:

    from scrapy.linkextractors import LinkExtractor
    from scrapy.selector import Selector
    from scrapy.spiders import CrawlSpider, Rule

    # Website is the project's scrapy.Item subclass (its definition is not shown).


    class WSSpider(CrawlSpider):
        name = "ws"
        allowed_domains = ["website.com"]
        start_urls = ["https://www.website.com/category/"]
        # Follow links whose URL matches the pattern and send each page to parse_product.
        rules = (
            Rule(LinkExtractor(allow=("/level_one/page*",)), callback="parse_product", follow=True),
        )

        def parse_product(self, response):
            # Collect one item per product block (.pb-infos) on this page.
            sel = Selector(response)
            sites = sel.css(".pb-infos")
            items = []
            for site in sites:
                item = Website()
                item["brand"] = site.css(".pb-name .pb-mname::text").extract()
                item["referinta"] = site.css(".pb-name a::text").extract()
                item["disponibilitate"] = site.css(".pb-availability::text").extract()
                item["pret_vechi"] = site.css(".pb-sell .pb-old::text").extract()
                item["pret"] = site.css(".pb-sell .pb-price::text").extract()
                item["procent"] = site.css(".pb-sell .pb-savings::text").extract()
                items.append(item)
            #return items
            # Write this page's items as ";"-separated lines; the file is opened
            # with mode "w" each time this callback runs.
            f = open("output.csv", "w")
            for item in items:
                line = \
                    item["brand"][0].strip(), ";", \
                    item["referinta"][-1].strip(), ";", \
                    item["disponibilitate"][0].strip(), ";", \
                    item["pret_vechi"][0].strip().strip(" lei"), ";", \
                    item["pret"][0].strip().strip(" lei"), ";", \
                    item["procent"][0].strip().strip("Mai ieftin cu "), "\n"
                f.write("".join(line))
            f.close()

Any help is much appreciated!

Source

2015-07-03 Faryus