2012-12-23

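HTML to C# objects, recursive function?

I want to convert an HTML document into C# objects. I have a sample list of names in an ordered list, shown below. I am using Html Agility Pack.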

<ol> 
    <li>Heather</li> 
    <li>Channing</li> 
    <li>Briana</li> 
    <li>Amber</li> 
    <li>Sabrina</li> 
    <li>Jessica 
     <ol> 
      <li>Melody</li> 
      <li>Dakota</li> 
      <li>Sierra</li> 
      <li>Vandi</li> 
      <li>Crystal</li> 
      <li>Samantha</li> 
      <li>Autumn</li> 
      <li>Ruby</li> 
     </ol></li> 
    <li>Taylor</li> 
    <li>Tara</li> 
    <li>Tammy</li> 
    <li>Laura</li> 
    <li>Shelly</li> 
    <li>Shantelle</li> 
    <li>Bob and Alice 
     <ol> 
     <li>Courtney</li> 
     <li>Misty</li> 
     <li>Jenny</li> 
     <li>Christa</li> 
     <li>Mindy</li> 
     </ol></li> 
    <li>Noel</li> 
    <li>Shelby</li> 
</ol> 

These are the objects I created to represent the list of names, i.e. people and their children.

public class PeopleList { 
    public List<Person> People {get; set;} 
} 

public class Person { 
    public string Name {get; set;} 
    public PeopleList Children {get; set;} 
} 

I think a recursive function would be the best way to build these objects. Can anyone offer ideas on how to convert the HTML into C# objects?

Abu.

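Just use XPath to get the relevant part of the document, then use a recursive function to return the nested list.

Do you have an example? I have been trying to use a recursive function but can't figure it out.

How do you want to populate it: a) a PeopleList of people whose children are of type Person, or b) a list of people who each have their own PeopleList? – Anthill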

Answers

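// Requires "using System.Linq;" and the HtmlAgilityPack package.
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Null if the document has no top-level <ol>.
PeopleList list = Recurse(doc.DocumentNode);

// Builds a PeopleList from the first <ol> directly under root,
// recursing into each <li> to pick up nested lists.
PeopleList Recurse(HtmlAgilityPack.HtmlNode root)
{
    var ol = root.Element("ol");
    if (ol == null) return null; // no nested list: this person has no children

    return new PeopleList
    {
        People = ol.Elements("li")
            .Select(li => new Person
            {
                // The first child of the <li> is the text node holding the name.
                Name = li.FirstChild.InnerText.Trim(),
                Children = Recurse(li)
            })
            .ToList()
    };
}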
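
For the XPath route suggested in the comments, here is a minimal sketch. The class name XPathPeopleParser and its methods are made up for illustration, and it assumes the first <ol> in the document is the list you want. Note that HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches, so that case is handled explicitly.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class PeopleList
{
    public List<Person> People { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public PeopleList Children { get; set; }
}

// Hypothetical helper, not part of HtmlAgilityPack.
public static class XPathPeopleParser
{
    // Parses the first <ol> found in the HTML; returns null if there is none.
    public static PeopleList Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return FromOl(doc.DocumentNode.SelectSingleNode("//ol"));
    }

    // Converts one <ol> node into a PeopleList, recursing into nested lists.
    private static PeopleList FromOl(HtmlNode ol)
    {
        if (ol == null) return null; // no nested list: no children

        // SelectNodes returns null when nothing matches.
        var items = ol.SelectNodes("./li");
        if (items == null) return new PeopleList { People = new List<Person>() };

        return new PeopleList
        {
            People = items.Select(li => new Person
            {
                // The first text node directly under the <li> holds the name.
                Name = li.SelectSingleNode("./text()")?.InnerText.Trim(),
                // A directly nested <ol>, if any, holds the children.
                Children = FromOl(li.SelectSingleNode("./ol"))
            }).ToList()
        };
    }
}
```

Calling XPathPeopleParser.Parse(html) on the list from the question should give a PeopleList whose "Jessica" and "Bob and Alice" entries have non-null Children, while the others have Children set to null.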

I would look into HtmlAgilityPack: http://htmlagilitypack.codeplex.com/

I haven't used it for this particular task, but it works really well for parsing HTML.

I am already using HtmlAgilityPack. I am trying to figure out how to parse this list of names and their child lists into C# objects.

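Just for fun, or if you really do want a list of PeopleList in which each person has their own PeopleList :P, you can do it like this (HtmlAgilityPack isn't needed for the markup you posted):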

namespace StackFun 
{ 
    using System.Collections.Generic; 
    using System.Linq; 
    using System.Xml.Linq; 

    public class PeopleList 
    { 
     public List<Person> People { get; set; } 
    } 

    public class Person 
    { 
     public string Name { get; set; } 
     public PeopleList Children { get; set; } 
    } 

    class Program 
    { 
     static IEnumerable<PeopleList> GetChildren(PeopleList parent, IEnumerable<XElement> children) 
     { 
      parent.People = new List<Person>(); 
      foreach (var child in children) 
      { 
       var person = new Person 
       { 
        Name = ((XText)child.FirstNode).Value.Trim(new[] { ' ', '\r', '\n' }), 
       }; 
       parent.People.Add(person); 
       foreach (var childrenOf in child.Elements("ol").SelectMany(BuildFromXml)) 
       { 
        person.Children = childrenOf; 
       } 
      } 
      yield return parent; 

     } 

     static IEnumerable<PeopleList> BuildFromXml(XElement node) 
     { 
      return GetChildren(new PeopleList(), node.Elements("li")); 
     } 

     static void Main(string[] args) 
     { 
      const string xml = @"<ol> 
      <li>Heather</li> 
      <li>Channing</li> 
      <li>Briana</li> 
      <li>Amber</li> 
      <li>Sabrina</li> 
      <li>Jessica 
       <ol> 
        <li>Melody</li> 
        <li>Dakota</li> 
        <li>Sierra</li> 
        <li>Vandi</li> 
        <li>Crystal</li> 
        <li>Samantha</li> 
        <li>Autumn</li> 
        <li>Ruby</li> 
       </ol></li> 
      <li>Taylor</li> 
      <li>Tara</li> 
      <li>Tammy</li> 
      <li>Laura</li> 
      <li>Shelly</li> 
      <li>Shantelle</li> 
      <li>Bob and Alice 
       <ol> 
       <li>Courtney</li> 
       <li>Misty</li> 
       <li>Jenny</li> 
       <li>Christa</li> 
       <li>Mindy</li> 
       </ol></li> 
      <li>Noel</li> 
      <li>Shelby</li> 
     </ol>"; 

      var doc = XDocument.Parse(xml); 
      var listOfPeople = BuildFromXml(doc.Root).ToList(); 
     } 
    } 
} 

What you probably want, though (I'm guessing, since you didn't specify), is a list of people and their children. You could start with:

static IEnumerable<Person> Populate(IEnumerable<XElement> children)
{
    foreach (var child in children)
    {
        var person = new Person
        {
            // The name is the first text node inside the <li>.
            Name = ((XText)child.FirstNode).Value.Trim(new[] { ' ', '\r', '\n' }),
            Children = new PeopleList { People = new List<Person>() }
        };
        // Recurse into any nested <ol> and collect its people as this person's children.
        foreach (var childOf in child.Elements("ol").SelectMany(BuildFromXml))
        {
            person.Children.People.Add(childOf);
        }
        yield return person;
    }
}

static IEnumerable<Person> BuildFromXml(XElement node)
{
    return Populate(node.Elements("li"));
}

If you want (or need) to use HtmlAgilityPack, the code might look like this:

using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static IEnumerable<Person> Populate(IEnumerable<HtmlNode> children)
    {
        foreach (var child in children)
        {
            var person = new Person
            {
                // InnerText also contains the nested list's text, so keep only the first line.
                Name = child.InnerText.Split(new char[] { '\r', '\n' })[0].Trim(),
                Children = new PeopleList { People = new List<Person>() }
            };
            // Recurse into any nested <ol> to collect this person's children.
            foreach (var childOf in child.Elements("ol").SelectMany(BuildFromHtml))
            {
                person.Children.People.Add(childOf);
            }
            yield return person;
        }
    }

    static IEnumerable<Person> BuildFromHtml(HtmlNode node)
    {
        return Populate(node.Elements("li"));
    }

    static void Main(string[] args) 
    { 
     const string html = @"<ol> 
      <li>Heather</li> 
      <li>Channing</li> 
      <li>Briana</li> 
      <li>Amber</li> 
      <li>Sabrina</li> 
      <li>Jessica 
       <ol> 
        <li>Melody</li> 
        <li>Dakota</li> 
        <li>Sierra</li> 
        <li>Vandi</li> 
        <li>Crystal</li> 
        <li>Samantha</li> 
        <li>Autumn</li> 
        <li>Ruby</li> 
       </ol></li> 
      <li>Taylor</li> 
      <li>Tara</li> 
      <li>Tammy</li> 
      <li>Laura</li> 
      <li>Shelly</li> 
      <li>Shantelle</li> 
      <li>Bob and Alice 
       <ol> 
       <li>Courtney</li> 
       <li>Misty</li> 
       <li>Jenny</li> 
       <li>Christa</li> 
       <li>Mindy</li> 
       </ol></li> 
      <li>Noel</li> 
      <li>Shelby</li> 
     </ol>"; 

     var doc = new HtmlDocument(); 
     doc.LoadHtml(html); 
     var listOfPeople = BuildFromHtml(doc.DocumentNode.FirstChild).ToList(); 
    } 
}
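
To check that the nesting came out right, you could walk the result with another small recursive function. This is only a sketch: PeopleDump and Render are names invented here, and it assumes the Person and PeopleList classes from the question, treating a null Children or People as "no children".

```csharp
using System.Collections.Generic;
using System.Text;

public class PeopleList
{
    public List<Person> People { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public PeopleList Children { get; set; }
}

// Hypothetical helper for inspecting the parsed tree.
public static class PeopleDump
{
    // Renders one name per line, indenting two spaces per nesting level.
    public static string Render(PeopleList list, int depth = 0)
    {
        var sb = new StringBuilder();
        if (list?.People == null) return sb.ToString(); // nothing to print

        foreach (var person in list.People)
        {
            sb.Append(' ', depth * 2).Append(person.Name).Append('\n');
            // Recurse one level deeper for this person's children.
            sb.Append(Render(person.Children, depth + 1));
        }
        return sb.ToString();
    }
}
```

With the last version above, where BuildFromHtml returns IEnumerable<Person>, wrap the result first, e.g. PeopleDump.Render(new PeopleList { People = listOfPeople }).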