2015-09-10 93 views

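Confusion about using XPath when scraping websites with Scrapy

I can't work out which part of the XPath to select when trying to scrape certain elements of a website. In this case, I am trying to scrape all the websites linked to in this article (for example, this section of the XPath):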

data-track="Body Text Link: External" href="http://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/brca-related-cancer-risk-assessment-genetic-counseling-and-genetic-testing"> 

My spider runs, but it doesn't scrape anything!

My code is below:

import scrapy
from scrapy.selector import Selector

from nymag.items import nymagItem


class nymagSpider(scrapy.Spider):
    name = 'nymag'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['nymag.com']
    start_urls = ["http://nymag.com/thecut/2015/09/should-we-all-get-the-breast-cancer-gene-test.html"]

    def parse(self, response):
        # I'm pretty sure the below line is the issue
        links = Selector(response).xpath('//*[@id="primary"]/main/article/div/span')
        for link in links:
            item = nymagItem()
            # This might also be wrong - am trying to extract the href section
            item['link'] = link.xpath('a/@href').extract()
            yield item

Answers


There is also a simpler way: select every a element that has both a data-track and an href attribute:

In [1]: for link in response.xpath("//div[@id = 'primary']/main/article//a[@data-track and @href]"):
   ...:     print(link.xpath("@href").extract()[0])
   ...:
//nymag.com/tags/healthcare/ 
//nymag.com/author/Susan%20Rinkunas/ 
http://twitter.com/sueonthetown 
http://www.facebook.com/sharer/sharer.php?u=http://nymag.com/thecut/2015/09/should-we-all-get-the-breast-cancer-gene-test.html%3Fmid%3Dfb-share-thecut 
https://twitter.com/share?text=Should%20All%20Women%20Get%20Tested%20for%20the%20Breast%20Cancer%20Gene%3F&url=http://nymag.com/thecut/2015/09/should-we-all-get-the-breast-cancer-gene-test.html%3Fmid%3Dtwitter-share-thecut&via=TheCut 
https://plus.google.com/share?url=http%3A%2F%2Fnymag.com%2Fthecut%2F2015%2F09%2Fshould-we-all-get-the-breast-cancer-gene-test.html 
http://pinterest.com/pin/create/button/?url=http://nymag.com/thecut/2015/09/should-we-all-get-the-breast-cancer-gene-test.html%3Fmid%3Dpinterest-share-thecut&description=Should%20All%20Women%20Get%20Tested%20for%20the%20Breast%20Cancer%20Gene%3F&media=http:%2F%2Fpixel.nymag.com%2Fimgs%2Ffashion%2Fdaily%2F2015%2F09%2F08%2F08-angelina-jolie.w750.h750.2x.jpg 
whatsapp://send?text=Should%20All%20Women%20Get%20Tested%20for%20the%20Breast%20Cancer%20Gene%3F%0A%0Ahttp%3A%2F%2Fnymag.com%2Fthecut%2F2015%2F09%2Fshould-we-all-get-the-breast-cancer-gene-test.html&mid=whatsapp 
mailto:?subject=Should%20All%20Women%20Get%20Tested%20for%20the%20Breast%20Cancer%20Gene%3F&body=I%20saw%20this%20on%20The%20Cut%20and%20thought%20you%20might%20be%20interested...%0A%0AShould%20All%20Women%20Get%20Tested%20for%20the%20Breast%20Cancer%20Gene%3F%0AIt's%20not%20a%20crystal%20ball.%0Ahttp%3A%2F%2Fnymag.com%2Fthecut%2F2015%2F09%2Fshould-we-all-get-the-breast-cancer-gene-test.html%3Fmid%3Demailshare%5Fthecut 
... 
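
To see what the two attribute tests in that predicate filter out, here is a minimal sketch that needs neither Scrapy nor network access. It is an illustration under stated assumptions, not the code from the session above: the markup in SAMPLE is a made-up, simplified stand-in for the article page, tracked_links is a hypothetical helper name, and the standard library's xml.etree.ElementTree is swapped in for Scrapy's selector. ElementTree supports only a subset of XPath and has no "and" operator, so the condition is written as two chained predicates, [@data-track][@href], which select the same elements here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the article markup (not the real page).
SAMPLE = """
<html><body>
  <div id="primary"><main><article>
    <div><span>
      <a data-track="Body Text Link: External" href="http://example.com/a">tracked</a>
      <a href="http://example.com/untracked">no data-track attribute</a>
      <a data-track="Anchor Only">no href attribute</a>
    </span></div>
    <p><a data-track="Tag Link" href="//example.com/tags/x/">deeper in the article</a></p>
  </article></main></div>
  <div id="sidebar"><a data-track="Side" href="http://example.com/side">outside primary</a></div>
</body></html>
"""


def tracked_links(markup):
    """Return href values of <a> elements under div#primary/main/article
    that carry both a data-track and an href attribute."""
    root = ET.fromstring(markup.strip())
    # "//a" matches anchors at any depth below <article>; the two chained
    # predicates keep only anchors that have both attributes.
    anchors = root.findall(".//div[@id='primary']/main/article//a[@data-track][@href]")
    return [a.get("href") for a in anchors]


links = tracked_links(SAMPLE)
for href in links:
    print(href)
```

In this sketch, the anchors missing either attribute and the anchor outside the div with id "primary" are skipped, while anchors nested at different depths inside the article are both kept because of the // step.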

Thank you very much! Works perfectly :) – cgp25