How does Nokogiri parse an HTML-formatted string into a DOM?

I have been reading through the Nokogiri source code, but I still can't work out how Nokogiri parses a string into elements. The source code can be found here:

https://github.com/sparklemotion/nokogiri/tree/master/lib/nokogiri

For example, I have a string:

raw = "<html> <body> body <div>this is div </div> </body> <html>" 

Nokogiri::HTML(raw) 
=> 
#(Document:0x4d0c786 { 
    name = "document", 
    children = [ 
    #(DTD:0x4d0bc6e { name = "html" }), 
    #(Element:0x4cfa46e { 
     name = "html", 
     children = [ 
     #(Element:0x4cf9bfe { 
      name = "body", 
      children = [ 
      #(Text "body"), 
      #(Element:0x4cf9348 { 
       name = "div", 
       children = [ #(Text "this is div")] 
       })] 
      })] 
     })] 
    })

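I looked in nokogiri/lib/nokogiri/xml/sax, but I couldn't see anywhere how the HTML string is interpreted. While reading the source, I also noticed that the output above contains the data type Element, yet I couldn't find anywhere in the source code where class Element is declared.

More generally, can anyone explain how Nokogiri parses a string into the data structure shown above?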


2012-12-11 qusr

Nokogiri uses [libxml2](http://www.xmlsoft.org/), a native C library. It is libxml2 that actually does the parsing.

Thanks. Do you know how Ruby interacts with libxml2? – qusr

You will probably have to look at the C code (https://github.com/sparklemotion/nokogiri/tree/master/ext/nokogiri)

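As mentioned, Nokogiri uses libxml2 to do the actual parsing. This is done through a native (read: written in C) Ruby extension, which is why the parsing logic and some of the classes do not appear under lib/. Ruby has a well documented standard interface for building native extensions. Here is a good guide.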
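One way to see why a class or method can be missing from a gem's Ruby files is to ask Ruby where a method was defined. The sketch below uses only core Ruby and assumes CRuby, where `String#upcase` is implemented in C; `Greeter` is a hypothetical class added purely for contrast. It does not touch Nokogiri itself, but the same check could in principle be pointed at Nokogiri's node classes.

```ruby
# A method written in Ruby: Ruby can report the file and line of its definition.
class Greeter # hypothetical class, for contrast only
  def hello
    "hello"
  end
end

ruby_location = Greeter.instance_method(:hello).source_location

# A core method implemented in C (assumption: running on CRuby):
# there is no Ruby source file to point at, so source_location is nil.
c_location = String.instance_method(:upcase).source_location

puts "Greeter#hello defined at: #{ruby_location.inspect}"
puts "String#upcase defined at: #{c_location.inspect}"
```

Classes and methods registered by a C extension behave like the second case: they exist at runtime, but searching the gem's lib/ directory for their definition finds nothing, and the place to look is the extension's C source.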


2012-12-12 09:19:13
