Scrapy XPath not working


I'm new to Scrapy, and I'm curious why my scraper isn't working. Here is my code:

import scrapy

from tutorial.items import TutorialItem


class tutSpider(scrapy.Spider):
    name = "tutorial"
    allowed_domains = ["backpage.com"]
    start_urls = [
        "http://chicago.backpage.com/FemaleEscorts/naughtiest-_girl-next-door/20557457"
    ]

    def parse(self, response):
        # sel = response.xpath('//*')
        item = TutorialItem()
        item['title'] = response.xpath('//div[@id="postingTitle"]/h1/text()').extract()
        item['link'] = response.xpath('a/@href').extract()
        item['desc'] = response.xpath('//body/div[@id="postingBody"]/text()').extract()
        yield item

It produces the following JSON file:

[{"title": [], "link": [], "desc": []}]

I believe it is failing to find the elements I specified, even though I'm 100% sure those div IDs are valid. They are nested inside other divs within the body.

Source

2015-11-06 Matt

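I'm voting to close this question as off-topic because it essentially says **my XPath doesn't work, but I know it's correct**. Nobody can answer this without the DOM, and since every DOM and XPath is different, the question is too localized to be useful to anyone else with a similar problem. Also, don't spam unrelated tags. This isn't specific to Python if it is really just a bad XPath query. –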

There are plenty of online XPath sites that will actually write the XPath for you from the content of a given URL. –

As you guessed, the problem is with the XPath itself.

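For item['title'], the h1 node sits inside another node (an a element) that is missing from the XPath you used, so the path never reaches it. The link and desc expressions have similar problems. The expressions should be: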

item['title'] = response.xpath('//div[@id="postingTitle"]/a/h1/text()').extract() 
item['link'] = response.xpath('//div[@id="postingBody"]/a/@href').extract() 
item['desc'] = response.xpath('//div[@id="postingBody"]//text()').extract()
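To see why the original expressions came back empty, here is a minimal sketch run against a made-up fragment of markup. The HTML is an assumption (the real page is not shown in the question); it only mirrors the structure described above, with the h1 wrapped in an a and the posting divs nested inside another div. It uses the standard library's xml.etree.ElementTree rather than Scrapy so it runs anywhere; ElementTree supports only a subset of XPath, so `.text` stands in for `text()` and `.get("href")` for `@href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an assumption that mirrors the structure described
# in the answer (h1 wrapped in a link, divs nested in a wrapper div).
# It is not the real page.
HTML = """
<html>
  <body>
    <div id="wrapper">
      <div id="postingTitle"><a href="/listing"><h1>Example title</h1></a></div>
      <div id="postingBody">Some text <a href="/more">more</a></div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(HTML.strip())

# Original title path: h1 is not a direct child of the div, so nothing matches.
original_title = [h.text for h in root.findall(".//div[@id='postingTitle']/h1")]

# Corrected title path: the intermediate <a> step is included.
fixed_title = [h.text for h in root.findall(".//div[@id='postingTitle']/a/h1")]

# Original link path: a bare "a" only looks at direct children of the
# context node (here the <html> root), so nothing matches.
original_links = [a.get("href") for a in root.findall("a")]

# Corrected link path: find the div anywhere, then take its child <a>.
fixed_links = [a.get("href") for a in root.findall(".//div[@id='postingBody']/a")]

# Original desc path: requires postingBody to be a direct child of <body>.
original_desc = root.findall("./body/div[@id='postingBody']")

# Corrected desc path: match the div at any depth.
fixed_desc = root.findall(".//div[@id='postingBody']")

print(original_title, fixed_title)
print(original_links, fixed_links)
print(len(original_desc), len(fixed_desc))
```

In each pair, the only difference is whether every step in the path corresponds to a real parent-child relationship in the document. A single `/` means "direct child", so any skipped wrapper element makes the whole expression match nothing.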

As @Jarrod Roberson pointed out, there are many tools that generate XPath expressions and validate them.

If you use Firefox with Firebug, try FirePath. It is always a good idea to try your XPaths before putting them into your spider.
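If you would rather check expressions from a script than from a browser add-on, something along these lines can flag the ones that match nothing. This is only a sketch: `find_empty_paths`, `SAMPLE`, and `CANDIDATES` are hypothetical names, the markup is made up, and it uses the standard library's ElementTree (a subset of XPath) instead of Scrapy's own selectors. With Scrapy installed, running `scrapy shell` against the URL and calling `response.xpath(...)` interactively serves the same purpose on the real page.

```python
import xml.etree.ElementTree as ET


def find_empty_paths(markup, paths):
    """Return the names of the paths that match nothing in the markup."""
    root = ET.fromstring(markup.strip())
    return [name for name, path in paths.items() if not root.findall(path)]


# Hypothetical saved markup and candidate expressions (assumptions).
SAMPLE = (
    "<html><body>"
    "<div id='postingTitle'><a href='/x'><h1>T</h1></a></div>"
    "</body></html>"
)
CANDIDATES = {
    "title_original": ".//div[@id='postingTitle']/h1",
    "title_fixed": ".//div[@id='postingTitle']/a/h1",
}

empty = find_empty_paths(SAMPLE, CANDIDATES)
print(empty)
```

Any name that shows up in the result is an expression worth revisiting before it goes into the spider.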

Source

2015-11-06 22:42:21
