I can't get data using rules in Scrapy

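I'm building a spider with Scrapy. It works if I don't use any rules, but now I'm trying to add a Rule that follows the paginator and scrapes all the remaining pages, and I can't figure out why it doesn't work.

Spider code: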

allowed_domains = ['guia.bcn.cat']
start_urls = ['http://guia.bcn.cat/index.php?pg=search&q=*:*']

rules = (
    Rule(SgmlLinkExtractor(allow=("index.php?pg=search&from=10&q=*:*&nr=10"),
                           restrict_xpaths=("//div[@class='paginador']",)),
         callback="parse_item", follow=True),
)

def parse_item(self, response):
    ...

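I also tried setting just "index.php" in the rule's allow parameter, but that doesn't work either.

I read on the Scrapy group that I don't need to add "a/" or "a/@href", because SgmlLinkExtractor searches for links automatically.

The console output looks fine, but nothing is scraped.

Any ideas?

Thanks in advance

EDIT:

It works with this code: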

from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from bcncat.items import BcncatItem
import re


class BcnSpider(CrawlSpider):
    name = 'bcn'
    allowed_domains = ['guia.bcn.cat']
    start_urls = ['http://guia.bcn.cat/index.php?pg=search&q=*:*']

    # Follow every link inside the paginator whose URL contains "index.php".
    rules = (
        Rule(
            SgmlLinkExtractor(
                allow=(re.escape("index.php"),),
                restrict_xpaths=("//div[@class='paginador']",)),
            callback="parse_item",
            follow=True),
    )

    def parse_item(self, response):
        self.log("parse_item")
        sel = Selector(response)
        i = BcncatItem()
        #i['domain_id'] = sel.xpath('//input[@id="sid"]/@value').extract()
        #i['name'] = sel.xpath('//div[@id="name"]').extract()
        #i['description'] = sel.xpath('//div[@id="description"]').extract()
        return i

Source

2014-01-13 Carlos Espeleta

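The allow parameter of SgmlLinkExtractor is a regular expression (or a list of them), so "?", "*" and "." are treated as special characters.

You can use allow=(re.escape("index.php?pg=search&from=10&q=*:*&nr=10")) (with import re somewhere at the beginning of your script).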
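To see the effect of the special characters at the regex level, here is a minimal sketch using only Python's re module. The URL is a hypothetical paginator link modelled on the pattern in the question; the hrefs the site actually emits may differ, so this only illustrates how the pattern is interpreted, not what the spider will extract.

```python
import re

# Hypothetical paginator link, modelled on the URLs in the question.
url = "http://guia.bcn.cat/index.php?pg=search&from=10&q=*:*&nr=10"
raw_pattern = "index.php?pg=search&from=10&q=*:*&nr=10"

# Unescaped: "p?" means "an optional p", and "=*" / ":*" mean
# "zero or more '='" / "zero or more ':'", so the literal "?" and "*"
# characters in the URL are never matched.
unescaped_match = re.search(raw_pattern, url)

# Escaped: every metacharacter in the pattern is matched literally.
escaped_match = re.search(re.escape(raw_pattern), url)

print(unescaped_match is not None)  # False
print(escaped_match is not None)    # True
```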

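EDIT: actually, the re.escape rule above doesn't work. But since you have already restricted the region you want to extract links from, you can simply use allow=('index.php')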
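A rough way to picture why the broad pattern is enough: as far as I know, allow patterns are applied as a regex search against each candidate URL, so a short pattern keeps every paginator link. The sketch below models that with plain Python. filter_links is a hypothetical helper written for this illustration, not part of Scrapy, and the from= values in the links are assumed.

```python
import re

# Hypothetical hrefs found inside the paginator <div>; the from= values are assumed.
paginator_links = [
    "http://guia.bcn.cat/index.php?pg=search&from=10&q=*:*&nr=10",
    "http://guia.bcn.cat/index.php?pg=search&from=20&q=*:*&nr=10",
    "http://guia.bcn.cat/index.php?pg=search&from=30&q=*:*&nr=10",
]


def filter_links(links, allow):
    """Keep links matched by at least one allow pattern (simplified model)."""
    # Accept either a single pattern string or an iterable of patterns.
    patterns = [allow] if isinstance(allow, str) else list(allow)
    compiled = [re.compile(p) for p in patterns]
    return [link for link in links if any(rx.search(link) for rx in compiled)]


# Note: ('index.php') is just a parenthesised string, while
# ('index.php',) is a one-element tuple. This helper accepts both.
broad = filter_links(paginator_links, ("index.php"))
narrow = filter_links(paginator_links, (re.escape("from=10&"),))

print(len(broad))   # 3
print(len(narrow))  # 1
```

The broad pattern keeps all three pages, while a pattern tied to one from= value keeps only that page, which is one reason a very specific allow expression can stop the crawl early.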

Source

2014-01-13 16:16:24

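If I use allow=('index.php') it doesn't do anything

I uploaded a sample CrawlSpider and the console log: https://gist.github.com/redapple/8405909

Now it works! I don't know exactly how Python works, but if I uncomment one of the item lines, e.g. #i['domain_id'] = sel.xpath('//input[@id="sid"]/@value').extract(), the console sometimes shows an **indentation error**, and to fix it I have to remove the tab characters. Is this normal? Is it a newbie mistake? Thanks a lot for your reply and your work!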
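The comment doesn't include the traceback, so the following is only a guess at the cause: if the uncommented line is indented with a tab while the surrounding lines use spaces, Python refuses to compile the block. A minimal sketch, assuming Python 3 and a made-up two-line function:

```python
# Assumption: the error comes from mixing tabs and spaces in one block.
mixed_source = "def f():\n    x = 1\n\ty = 2\n"      # 4 spaces, then a tab
clean_source = "def f():\n    x = 1\n    y = 2\n"    # spaces only


def indentation_problem(source):
    """Return the name of the indentation-related exception, or None."""
    try:
        compile(source, "<demo>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None


mixed_result = indentation_problem(mixed_source)
clean_result = indentation_problem(clean_source)
print(mixed_result, clean_result)
```

Configuring the editor to insert spaces instead of tabs usually avoids this class of error.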
