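Confused about using XPath when scraping websites with Scrapy

I can't work out which part of the XPath to select when trying to scrape certain elements of a website. In this case, I want to scrape all the links to other websites in this article, for example the link in this section of the page source: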
data-track="Body Text Link: External" href="http://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/brca-related-cancer-risk-assessment-genetic-counseling-and-genetic-testing">
My spider runs, but it doesn't scrape anything! My code is below:
import scrapy
from scrapy.selector import Selector
from nymag.items import nymagItem

class nymagSpider(scrapy.Spider):
    name = 'nymag'
    # allowed_domains expects bare domain names, not full URLs
    allowed_domains = ['nymag.com']
    start_urls = ["http://nymag.com/thecut/2015/09/should-we-all-get-the-breast-cancer-gene-test.html"]

    def parse(self, response):
        # I'm pretty sure the below line is the issue
        links = Selector(response).xpath('//*[@id="primary"]/main/article/div/span')
        for link in links:
            item = nymagItem()
            # This might also be wrong - am trying to extract the href section
            item['link'] = link.xpath('a/@href').extract()
            yield item
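
One way to avoid depending on the exact nesting under `#primary` is to filter the `<a>` elements by the `data-track` attribute shown above. The sketch below is only an illustration of that kind of attribute predicate: it uses the standard library's `xml.etree.ElementTree` so it runs without Scrapy, the `SAMPLE` markup and the `external_links` helper are hypothetical stand-ins, and it assumes the article's external links really do all carry that `data-track` value.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the article markup. Only the
# data-track and href values on the first link come from the question.
SAMPLE = """
<div id="primary">
  <p>
    <a data-track="Body Text Link: External" href="http://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/brca-related-cancer-risk-assessment-genetic-counseling-and-genetic-testing">recommendation</a>
    <a data-track="Nav Link" href="http://nymag.com/thecut/">The Cut</a>
  </p>
</div>
"""


def external_links(markup):
    """Return the href of every <a> marked as an external body-text link."""
    root = ET.fromstring(markup)
    # ElementTree implements only a subset of XPath: it can filter on an
    # attribute value, but the href is read with .get() instead of /@href.
    anchors = root.findall(".//a[@data-track='Body Text Link: External']")
    return [a.get("href") for a in anchors]


links = external_links(SAMPLE)
print(links)
```

Inside a Scrapy spider, the same filter could be written as `response.xpath('//a[@data-track="Body Text Link: External"]/@href').extract()`, which returns the matching `href` strings directly. Whether it matches anything depends on the live page still using that attribute.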
Thanks a lot! Works perfectly :) – cgp25