How do I select only the leaf nodes with Nokogiri?

I'm looking for some advice on how this could be done. I'd like a solution that uses only XPath.

An HTML example:

<div> 
    <div> 
    <div>text div (leaf)</div> 
    <p>text paragraph (leaf)</p> 
    </div> 
</div> 
<p>text paragraph 2 (leaf)</p>

Code:

doc = Nokogiri::HTML.fragment("- the html above -") 
result = doc.xpath("*[not(child::*)]") 


[#<Nokogiri::XML::Element:0x3febf50f9328 name="p" children=[#<Nokogiri::XML::Text:0x3febf519b718 "text paragraph 2 (leaf)">]>]

But this XPath only gives me the last "p". What I want is something like a flatten behavior that returns only the leaf nodes.

Here are some related answers on Stack Overflow:

How to select all leaf nodes using XPath expression?

XPath - Get node with no child of specific type

Thanks


2013-07-26 Luccas

What values do you want? –

All the nodes that have "(leaf)" in their text – Luccas

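@Luccas: Do you want just the text, or the elements that contain it? I.e. do you want 'text paragraph (leaf)' or '<p>text paragraph (leaf)</p>'? If you only want the text, do you want each text node separately, or all of the text concatenated into a single string? – Borodin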

The problem with your code is this statement:

doc = Nokogiri::HTML.fragment("- the html above -")

Look at this:

require 'nokogiri' 

html = <<END_OF_HTML 
<div> 
    <div> 
    <div>text div (leaf)</div> 
    <p>text paragraph (leaf)</p> 
    </div> 
</div> 
<p>text paragraph 2 (leaf)</p> 
END_OF_HTML 


doc = Nokogiri::HTML(html) 
#doc = Nokogiri::HTML.fragment(html) 
results = doc.xpath("//*[not(child::*)]") 
results.each {|result| puts result} 

--output:-- 
<div>text div (leaf)</div> 
<p>text paragraph (leaf)</p> 
<p>text paragraph 2 (leaf)</p>

If I run this instead:

doc = Nokogiri::HTML.fragment(html) 
results = doc.xpath("//*[not(child::*)]") 
results.each {|result| puts result}

I get no output.


2013-07-26 20:16:35 7stud

See https://github.com/sparklemotion/nokogiri/issues/213 and https://github.com/sparklemotion/nokogiri/issues/572 – Phrogz

You can find all element nodes that have no child elements with:

//*[not(*)]

For example:

require 'nokogiri' 

doc = Nokogiri::HTML.parse <<-end 
<div> 
    <div> 
    <div>text div (leaf)</div> 
    <p>text paragraph (leaf)</p> 
    </div> 
</div> 
<p>text paragraph 2 (leaf)</p> 
end 

puts doc.xpath('//*[not(*)]').length 
#=> 3 

doc.xpath('//*[not(*)]').each do |e| 
    puts e.text 
end 
#=> "text div (leaf)" 
#=> "text paragraph (leaf)" 
#=> "text paragraph 2 (leaf)"


2013-07-26 20:14:37

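In XPath, text is a node itself. So, given your comment, you want to select only the contents of tags, not the tags that contain the content. With that approach you would also catch a <br/> (if there is one).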

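I guess you are actually looking for all elements that do not contain other elements (tags), which is not exactly what you have been asking for. In that case, @Justin Ko's answer and this XPath expression will work for you: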

//*[not(*)]

If you really want to look for all leaf nodes, you cannot use the * selector. You need to use node() instead:

//node()[not(node())]

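A node can be an element, but also a text node, a comment, a processing instruction, an attribute, or even an XML document (though a document cannot occur inside other elements). Attributes are not children in the XPath data model, so //node() never returns them.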

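If you really only want the text nodes, go for //text() as @Priti suggested. That is slightly different from what leaf nodes are, but it selects exactly what you asked for (the parts you highlighted as "(leaf)").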


2013-07-26 21:38:37
