Using XPath to extract multiple elements of the same type

I need all the answers on this page, structured as multiple items, each with the author's name and the text of the answer.

https://answers.yahoo.com/question/index?qid=20151007080620AAVNtY1

If I use this code:

item = YahooItem()
text_to_gain = hxs.xpath('//a[contains(@class,"uname Clr- b")]/text()').extract()
if text_to_gain:
    item['author'] = str(text_to_gain[0]).strip()
else:
    item['author'] = "Anonymous"

item['type'] = "Answer"

text_to_gain = hxs.xpath('//span[contains(@class,"ya-q-full-text")][@itemprop="text"]/text()').extract()
if text_to_gain:
    item['text'] = str(text_to_gain[0]).strip()
else:
    item['text'] = "NULL"
yield item

I get only one item. I also tried changing hxs or using an iterator, for example:

all_answer = hxs.xpath('//li[contains(@class,"Cf Py-14 ya-other-answer Pend-14 ")]').extract()

but it does not work.

Source

2015-10-11 RedVelvet

Could you edit your post to add a specific question? It is hard to tell what you are asking. –

Could you please provide the complete code of your spider? – alecxe

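You can get all the answers and their authors with the expressions below. This first expression selects every answer to the question on the page, including the best answer: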

all_answers = hxs.xpath("descendant::*[@itemtype='https://schema.org/Answer']")

Now iterate over each answer answ. The following XPath expressions, evaluated relative to each answ node, select the text and the author, respectively:

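# Call xpath() on the answer node itself so the search is limited to that answer.
text = answ.xpath("descendant::*[@itemprop='text']")
# A leading "//" would search the whole document again, so use a relative axis here.
author = answ.xpath("descendant::a[starts-with(@class,'uname')]")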
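
To make the per-answer iteration concrete, here is a minimal sketch of the same idea. It is not the Scrapy code itself: it uses the standard library's xml.etree.ElementTree so that it runs without extra dependencies, and ElementTree supports only a subset of XPath, so the starts-with() test is replaced by a check in Python. The SAMPLE markup and the extract_answers name are assumptions made for illustration and do not reflect the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real page.
SAMPLE = """
<ul>
  <li itemscope="" itemtype="https://schema.org/Answer">
    <a class="uname Clr-b">Alice</a>
    <span itemprop="text"> First answer </span>
  </li>
  <li itemscope="" itemtype="https://schema.org/Answer">
    <span itemprop="text">Second answer</span>
  </li>
</ul>
"""


def extract_answers(markup):
    """Return one dict per answer node found in the markup."""
    root = ET.fromstring(markup)
    items = []
    # One node per answer, matched on its schema.org itemtype.
    for answ in root.findall(".//*[@itemtype='https://schema.org/Answer']"):
        # Relative query: only descendants of this answer are searched.
        text_node = answ.find(".//*[@itemprop='text']")
        # ElementTree has no starts-with(), so the class prefix is checked in Python.
        authors = [a for a in answ.iter("a")
                   if a.get("class", "").startswith("uname")]
        items.append({
            "type": "Answer",
            "author": (authors[0].text or "").strip() if authors else "Anonymous",
            "text": (text_node.text or "").strip() if text_node is not None else "NULL",
        })
    return items


answers = extract_answers(SAMPLE)
```

The point of the sketch is that each query runs against the answer node rather than the whole document, so every answer produces its own item and a missing author falls back to "Anonymous" for that answer only. In Scrapy the same loop would call answ.xpath(...) on each selector in all_answers and yield one item per iteration.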

Source

2015-10-14 13:58:19 legrass
