ruby - Traversing elements with Nokogiri

I have HTML like this:

... 
<table> 
<tbody> 
    ... 
    <tr> 
    <th> head </th> 
    <td> td1 text</td>
    <td> td2 text</td>
    ... 
    </tr> 
</tbody> 
<tfoot> 
</tfoot> 
</table> 

...

I am using Nokogiri with Ruby. I want to traverse each row and collect the text of the th and its corresponding td cells into a hash.

Source

2011-06-06 Sayuj

require "nokogiri"

# Parse the HTML input
html_data = "...stripped HTML markup code..."
html_doc = Nokogiri::HTML(html_data)

# Iterate over each row in the table
# Note that you may need to refine the CSS selector below
result = html_doc.css("table tr").inject({}) do |all, row|

    # Modify if you need to collect only the first td, for example
    all[row.css("th").text] = row.css("td").text

    # Return the accumulator so inject passes the hash on to the next row
    all
end

Source

2011-06-06 07:57:01

I have not run this code, so I am not completely sure, but the general idea should be correct:

require "nokogiri"

html_doc = Nokogiri::HTML("<html> ... </html>")
result = []
# Build one hash per row, keyed by the name of each child node
html_doc.xpath("//tr").each do |tr|
    hash = {}
    tr.children.each do |node|
        hash[node.node_name] = node.content
    end
    result << hash
end
puts result.inspect

See the documentation for more information: http://nokogiri.org/Nokogiri/XML/Node.html

Source

2011-06-06 07:53:30

I'm getting some empty nodes as children. How do I skip these? – Sayuj 2011-06-06 08:51:22

tr.css("th") and tr.css("td") will do this. Thanks, Evgeny. – Sayuj 2011-06-06 08:54:14
