2017-05-12 40 views
3

I have some HTML as shown below. How can I use XPath to get only the value 9 from this input?

<ol class="z1"> 
     <li><h3>Number Theory - HCF LCM</h3> 
      <p lang="title">How many pairs of integers (x, y) exist such that the product of x, y and HCF (x, y) = 1080?</p> 
      <ol class="xyz"> 
       <li>8</li> 
       <li>7</li> 
       <li>9</li> 
       <li>12</li> 
      </ol> 
     <ul class="exp"><li class="grey fleft"><span class="qlabs_tooltip_bottom qlabs_tooltip_style_33" style="cursor:pointer;"><span><strong>Correct Answer</strong>Choice (C).</br>9</span> Correct answer</span></li><li class="primary fleft"><a href="hcf-lcm_1.shtml">Explanatory Answer</a></li><li class="grey1 fleft">HCF LCM</li><li class="red1 flrt">Hard</li> 
     </ul> 
     </li> 
</ol> 

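I want the value 9 that comes after the <br> inside the ul with class exp.

I wrote an XPath query that grabs everything under the correct answer, but it doesn't quite do the job: './/ul[@class="exp"]/li/span/span/text()'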
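To see why the query above returns more than the 9, it can help to list what sits inside the inner span. The sketch below uses Python's standard-library xml.etree.ElementTree rather than Scrapy, on a trimmed copy of the snippet in which the stray `</br>` has been rewritten as `<br/>` so that it parses as XML. Both choices are assumptions made for illustration; an HTML parser may treat `</br>` differently.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted markup; </br> is rewritten as <br/> (an assumption)
# so that the strict XML parser accepts it.
SNIPPET = """
<ul class="exp">
  <li class="grey fleft"><span class="qlabs_tooltip_bottom"><span><strong>Correct Answer</strong>Choice (C).<br/>9</span> Correct answer</span></li>
  <li class="red1 flrt">Hard</li>
</ul>
"""

root = ET.fromstring(SNIPPET)
# root is the <ul>; the path is relative to it.
inner = root.find("./li/span/span")

# Text that follows a child element is stored on that child as .tail.
pieces = [(child.tag, child.text, child.tail) for child in inner]
for tag, text, tail in pieces:
    print(tag, repr(text), repr(tail))

# The value after the line break is the tail of <br>.
answer = inner.find("br").tail.strip()
print(answer)
```

In this shape the inner span carries two pieces of trailing text, "Choice (C)." after the strong element and "9" after the line break, which is why a plain text() step returns both.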

Any help is highly appreciated.

I am trying to run this XPath expression in Scrapy:

import scrapy


class BrickSetSpider(scrapy.Spider):
    name = "cat_spider"
    start_urls = ['http://iim-cat-questions-answers.2iim.com/quant/number-system/hcf-lcm/']

    def parse(self, response):
        CLASS_SELECTOR = '//ol[@class="z1"]/li'
        problems = []
        for lis in response.xpath(CLASS_SELECTOR):
            question = lis.xpath('.//p[@lang="title"]/text()').extract_first().strip()
            choices = lis.xpath('.//ol[@class="xyz"]/li/text()').extract()
            # Keep only the text node that does not contain "Choice"
            ANSWER_SELECTOR = './/ul[@class="exp"]/li/span/span/text()[not(contains(.,"Choice"))]'
            correct_answer = lis.xpath(ANSWER_SELECTOR).extract_first()
            explanation = lis.xpath('.//ul[@class="exp"]/li[2]/a/@href').extract_first().strip()
            difficulty = lis.xpath('.//ul[@class="exp"]/li[last()]/text()').extract_first().strip()
            # Problem is a class from the asker's project; its definition is not shown
            p = Problem(question, choices, correct_answer, explanation, difficulty)
            print(question, choices, correct_answer)
+0

Do you want to get only the text '9', the one in the correct answer? – NarendraR

+0

Yes, @NarendraRajput – PirateApp

Answers

3

Try the expression below and let me know if it's not what you need:

//ul[@class="exp"]//strong[.="Correct answer"]/following::text()[2] 
+0

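This one returns null. As far as I understand, you are trying to search for the text "Correct answer" inside the strong element? – PirateApp

+1

Shouldn't "9" be found below **"Correct answer"**? – Andersson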

+0

Nope, it returns [NULL]. This is the page I am testing the XPath expression against: http://iim-cat-questions-answers.2iim.com/quant/number-system/hcf-lcm/ – PirateApp
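One thing stands out when comparing this expression with the markup posted in the question: the `<strong>` there reads "Correct Answer" with a capital A, while the predicate tests for "Correct answer", and XPath string comparison is case-sensitive. Whether this is the only reason for the empty result on the live page is not confirmed in the thread. A minimal check with the standard-library ElementTree (not Scrapy; the `[.='text']` predicate needs Python 3.7 or later) on a trimmed, XML-clean copy of the snippet:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted markup, with </br> rewritten as <br/> (an assumption)
# so that it is well-formed XML.
SNIPPET = """
<ul class="exp"><li><span><span><strong>Correct Answer</strong>Choice (C).<br/>9</span> Correct answer</span></li></ul>
"""
root = ET.fromstring(SNIPPET)

# Same predicate as in the answer, once with each capitalisation.
lower = root.findall(".//strong[.='Correct answer']")
upper = root.findall(".//strong[.='Correct Answer']")
print(len(lower), len(upper))
```

Only the capitalised variant selects the strong element in this copy of the markup.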

1
response.xpath('//ol[@class="xyz"]/li[3]/text()').extract_first() 

UPDATE

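check = response.xpath('//ol[@class="z1"]/li/ul/li/span/span/strong/text()').extract_first() 
# extract_first() returns None when nothing matches; the label reads "Correct Answer" in the posted HTML
if check and "correct answer" in check.lower(): 
    correct_answer = response.xpath('//ol[@class="z1"]/li/ol/li[3]/text()').extract_first() 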
+0

Thanks for posting a solution, but the correct answer is not to be taken from the 4 options; it is inside the ul with class = exp mentioned below them. – PirateApp

+0

@PirateApp I updated my answer – parik

3

Use the following XPath to get the required text:

.//ul[@class="exp"]/li/span/span/text()[not(contains(.,'Choice'))] 
+0

Works with the XPath Chrome plugin, but not in Scrapy, anywhere – PirateApp
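The thread does not establish why the expression behaves differently in the browser and in Scrapy. One possibility (an assumption, not verified here) is that the two parsers handle the stray `</br>` differently, so that "Choice (C)." and "9" are separate text nodes in one and a single node in the other. A fallback that does not depend on the node split is to take the whole text of the inner span and cut out the part after the "Choice (X)." label. The helper name `answer_from_text` and the regular expression are illustrative, not from the thread:

```python
import re

# Hypothetical helper: pull the answer out of the inner span's full text,
# whether or not the parser split it into separate text nodes.
ANSWER_RE = re.compile(r"Choice\s*\([A-Za-z]\)\.\s*(.+)", re.S)


def answer_from_text(text):
    """Return the text after 'Choice (X).', or None if the label is absent."""
    if not text:
        return None
    match = ANSWER_RE.search(text)
    return match.group(1).strip() if match else None


# Both shapes of the same content go through the same code path.
print(answer_from_text("Correct AnswerChoice (C).9"))
print(answer_from_text("Correct Answer Choice (C).\n 9 "))
```

In the spider, the input string could come from something like `lis.xpath('string(.//ul[@class="exp"]/li/span/span)').extract_first()`, which returns the concatenated text of the inner span.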