XPath: selecting a node's text and child nodes

I am using Python Scrapy to scrape some data from a website.

The site's content looks like this:

<html> 
    <div class="details"> 
    <div class="a"> not needed</div> 
    content 1 
    <p>content 2</p> 
    <div>content 2</div> 
    <p>content 2</p> 
    <div>content 2</div> 
    <p>content 2</p> 
    <div class="b"> this is also not needed</div> 
    </div> 
</html>

I need to get the complete HTML, excluding the divs with class a and b.

So my output would look like this:

<div class="details"> 
content 1 
<p>content 2</p> 
<div>content 2</div> 
<p>content 2</p> 
<div>content 2</div> 
<p>content 2</p> 
</div>

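How can I write the correct XPath for this? Or should I write an XPath that selects the div with class "details" and then use string manipulation to remove the divs with class 'a' and 'b'?

Note that "content 1" here is plain text sitting directly inside the div with class "details"; it is not wrapped in a child element.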

Source

2014-11-24 sajith

You can get all child nodes except the divs with class a or b by using node() together with the self:: axis:

//div[@class="details"]/node()[not(self::div[@class="a" or @class="b"])]

Demo using the scrapy shell:

$ scrapy shell index.html 
>>> nodes = response.xpath('//div[@class="details"]/node()[not(self::div[@class="a" or @class="b"])]').extract() 
>>> print(''.join(nodes))
    content 1 
    <p>content 2</p> 
    <div>content 2</div> 
    <p>content 2</p> 
    <div>content 2</div> 
    <p>content 2</p>

Source

2014-11-24 05:09:01 alecxe
