2015-10-23 26 views

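I am converting an XML table into an HTML table, and I have to rearrange some of the nodes along the way. How can I get the children of a Nokogiri node as raw HTML instead of escaped &lt;tags&gt;?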

To do the transformation, I scrape the XML, put it into a two-dimensional array, and then build the new HTML output from that array.
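The placement step described above can be sketched in plain Ruby, without Nokogiri. This is only an illustration: the `headers` list of `[H level, text]` pairs is a hypothetical stand-in for the scraped `CHED` elements, and the cell texts are shortened.

```ruby
# Hypothetical input: [H level, header text] pairs in document order.
headers = [
  [1, 'Disc diameter'],
  [1, 'Covered'],
  [2, 'Number'],
  [2, 'Exhaust'],
  [1, 'Not covered'],
  [2, 'Number'],
  [2, 'Exhaust']
]

grid = []       # grid[row][column], with nil for empty positions
column = 0
prev_row = -1

headers.each do |level, text|
  row = level - 1
  # Move to the next column whenever the level does not go deeper.
  column += 1 if row <= prev_row
  prev_row = row
  (grid[row] ||= [])[column] = text
end

grid.each { |row| p row }
```

The `nil` entries mark positions that a neighbouring cell has to cover with a rowspan or colspan, which is why the later steps skip `nil` cells.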

But some of the cells contain HTML tags, and after my conversion <su> becomes &lt;su&gt;.

The XML data is:

<BOXHD> 
    <CHED H="1">Disc diameter, inches (cm)</CHED> 
    <CHED H="1">One-half or more of disc covered</CHED> 
    <CHED H="2">Number <SU>1</SU> 
    </CHED> 
    <CHED H="2">Exhaust foot <SU>3</SU>/min.</CHED> 
    <CHED H="1">Disc not covered</CHED> 
    <CHED H="2">Number <SU>1</SU> 
    </CHED> 
    <CHED H="2">Exhaust foot<SU>3</SU>/min.</CHED> 
</BOXHD> 

The steps I am using to convert it into an HTML table are:

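require 'nokogiri'

class TableCell

    attr_accessor :text, :rowspan, :colspan

    def initialize(text='')
        @text = text
        @rowspan = 1
        @colspan = 1
    end
end
@frag = Nokogiri::HTML(xml)

# make a 2d array to store how the cells should be arranged
@data = [] # rows of TableCell objects (initialization was not shown in the original snippet)
column = 0
prev_row = -1
@frag.xpath("boxhd/ched").each do |ched|
    # the H attribute is the 1-based header level; the HTML parser lowercases names
    row = ched.xpath("@h").first.value.to_i - 1
    # a cell that is not deeper than the previous one starts a new column
    if row <= prev_row
        column += 1
    end
    prev_row = row
    (@data[row] ||= [])[column] = TableCell.new(ched.inner_html)
end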

# methods to find colspan and rowspan, put them in @data 
# ... snip ... 

# now build an html table 
doc = Nokogiri::HTML::DocumentFragment.parse ""
Nokogiri::HTML::Builder.with(doc) do |html|
    html.table {
        @data.each do |tr|
            html.tr {
                tr.each do |th|
                    next if th.nil?
                    html.th(:rowspan => th.rowspan, :colspan => th.colspan).table_header th.text
                end
            }
        end
    }
end

This produces the following HTML (note that the <su> tags are escaped):

<table> 
    <tr> 
     <th rowspan="2" colspan="1" class="table_header">Disc diameter, inches (cm)</th> 
     <th rowspan="1" colspan="2" class="table_header">One-half or more of disc covered</th> 
     <th rowspan="1" colspan="2" class="table_header">Disc not covered</th> 
    </tr> 
    <tr> 
     <th rowspan="1" colspan="1" class="table_header">Number &lt;su&gt;1&lt;/su&gt; </th> 
     <th rowspan="1" colspan="1" class="table_header">Exhaust foot &lt;su&gt;3&lt;/su&gt;/min.</th> 
     <th rowspan="1" colspan="1" class="table_header">Number &lt;su&gt;1&lt;/su&gt;</th> 
     <th rowspan="1" colspan="1" class="table_header">Exhaust foot&lt;su&gt;3&lt;/su&gt;/min.</th> 
    </tr> 
</table> 

How do I get the raw HTML instead of entities?

I tried these without success:

@data[row][column] = TableCell.new(ched.children) 
@data[row][column] = TableCell.new(ched.children.to_s) 
@data[row][column] = TableCell.new(ched.to_s) 

Answers

0

I gave up on the builder and simply constructed the HTML by hand:

def html_headers()

    rows = Array.new
    @data.each do |row|
        cells = Array.new
        row.each do |cell|
            next if cell.nil?
            # cell.text is interpolated as-is, so markup such as <su> is not escaped
            cells << "<th rowspan=\"%d\" colspan=\"%d\">%s</th>" %
                [cell.rowspan,
                 cell.colspan,
                 cell.text]
        end
        rows << "<tr>%s</tr>" % cells.join
    end
    rows.join

end

# call the method only after it has been defined
headers = html_headers()

def replace_nodes(headers) 

    # ... snip ... 

    @frag.xpath("boxhd").each do |old| 
     puts "replacing boxhd..." 
     old.replace headers 
    end 

    # ... snip ... 

end 

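I don't understand why, but it appears that the text I replaced the <BOXHD> tag with gets parsed and becomes searchable, because I was able to change the names of tags that came from the cell.text data.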

1

This may help you understand what is happening:

require 'nokogiri' 

doc = Nokogiri::XML('<root><foo></foo></root>') 

doc.at('foo').content = '<html><body>bar</body></html>' 
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n <foo>&lt;html&gt;&lt;body&gt;bar&lt;/body&gt;&lt;/html&gt;</foo>\n</root>\n" 

doc.at('foo').children = '<html><body>bar</body></html>' 
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n <foo>\n <html>\n  <body>bar</body>\n </html>\n </foo>\n</root>\n" 

doc.at('foo').children = Nokogiri::XML::Document.new.create_cdata '<html><body>bar</body></html>' 
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n <foo><![CDATA[<html><body>bar</body></html>]]></foo>\n</root>\n" 