2014-11-04 96 views
0

I have been struggling with this for quite a while: converting a specific HTML structure into a specific XML structure.

I want to convert HTML to XML. The structure is shown below.

I am using HtmlAgilityPack to turn the HTML into a valid XML structure. After that step, my HTML looks like this:

<div class="menuItem1" video="" preview=""> 
    Menu 1 
    <div class="subMenu1"> 
     <div class="menuItem2" video="" preview=""> 
      Menu 2 
      <div class="subMenu2"> 
       <div class="menuItem3" video="" preview=""> 
        Menu 3 
        <div class="subMenu3"> 
         <div class="" video="" preview="">Menu 4</div> 
        </div> 
        <div class="treeExpand"></div> 
       </div> 
       <div class="menuItem3" video="" preview="">Menu 3</div> 
       <div class="menuItem3" video="" preview="">Menu 3</div> 
      </div> 
      <div class="treeExpand"></div> 
     </div> 
    </div> 
    <div class="treeExpand"></div> 
</div> 
<div class="menuItem1" video="" preview=""> 
    Menu 1 
    <div class="subMenu1"> 
     <div class="menuItem2" video="" preview=""> 
      Menu 2 
      <div class="subMenu2"> 
       <div class="menuItem3" video="" preview=""> 
        Menu 3 
        <div class="subMenu3"> 
         <div class="" video="" preview="">Menu 4</div> 
        </div> 
        <div class="treeExpand"></div> 
       </div> 
       <div class="menuItem3" video="" preview="">Menu 3</div> 
       <div class="menuItem3" video="" preview="">Menu 3</div> 
      </div> 
      <div class="treeExpand"></div> 
     </div> 
    </div> 
    <div class="treeExpand"></div> 
</div> 

This is exactly what I want. I can now load it into an XElement with this C# code:

XDocument doc = XDocument.Parse(THE_HTML_STRING_AS_SHOWN_ABOVE); 
XDocument docw = new XDocument(new XElement("Navigation", doc.Root)); 
XElement root = docw.Root; 
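
One thing to check at this step: `XDocument.Parse` requires a single root element, so a fragment with several top-level `div` elements has to be wrapped before it is parsed. A minimal sketch, assuming the fragment is held in a string; the `FragmentLoader` class, the `WrapFragment` helper and the sample markup are hypothetical and not part of the original code:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class FragmentLoader
{
    // Hypothetical helper: wraps a multi-root fragment in one root element
    // so that it can be parsed as a single XML tree.
    public static XElement WrapFragment(string fragment, string rootName)
    {
        return XElement.Parse("<" + rootName + ">" + fragment + "</" + rootName + ">");
    }

    static void Main()
    {
        // Two sibling top-level elements, as in the HTML shown above.
        string fragment =
            "<div class=\"menuItem1\" video=\"\" preview=\"\">Menu 1</div>" +
            "<div class=\"menuItem1\" video=\"\" preview=\"\">Menu 1</div>";

        XElement root = WrapFragment(fragment, "Navigation");
        Console.WriteLine(root.Elements("div").Count()); // prints 2
    }
}
```

Wrapping the string before parsing avoids the extra `XDocument` that only exists to add the `Navigation` root.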

I created a method to which I pass the root:

GenerateXmlFromHtml(root); 

The code of this method:

private string GenerateXmlFromHtml(XElement elem) 
{ 
    StringBuilder sbNavigationXml = new StringBuilder(); 
    try 
    { 
     //HTML will always have a video and preview, according to the generation of the html structure. 

     string text = string.Empty; 
     string videopath = string.Empty; 
     string previewpath = string.Empty; 
     XText textNode; 

     foreach (XElement element in elem.Elements()) 
     { 
      element.Name = "MenuItem"; //Change element name. 

      string htmlClass; 
      try { htmlClass = element.Attribute("class").Value; } 
      catch { htmlClass = ""; } 

      if (!string.IsNullOrEmpty(htmlClass)) 
      { 
       if (htmlClass.Contains("subMenu")) 
       { 
        element.AddBeforeSelf(element.Elements()); 
        element.Remove(); 
        GenerateXmlFromHtml(element); 
       } 
       else if (htmlClass.Contains("menuItem")) 
       { 
        textNode = element.Nodes().OfType<XText>().FirstOrDefault(); 
        text = textNode.Value; 
        videopath = element.Attribute("video").Value; 
        previewpath = element.Attribute("preview").Value; 

        if (element.HasElements) 
        { 
         sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\">"); 
         sbNavigationXml.AppendLine(GenerateXmlFromHtml(element)); 
         sbNavigationXml.AppendLine("</MenuItem>"); 
        } 
        else 
        { 
         sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\" />"); 
        } 
       } 
       else if (htmlClass.Contains("treeExpand")) 
       { 
        element.AddBeforeSelf(element.Elements()); 
        element.Remove(); 
        GenerateXmlFromHtml(element); 
       } 
      } 
      else 
      { 
       element.AddBeforeSelf(element.Elements()); 
       element.Remove(); 
       GenerateXmlFromHtml(element); 
      } 
     } 
    } 
    catch (Exception) 
    { 
     throw; 
    } 
    return sbNavigationXml.ToString(); 
} 

In the end, I want it to produce this XML output:

<Navigation> 
    <MenuItem Text="Menu 1" VideoPath="" PreviewPath=""> 
    <MenuItem Text="Menu 2"> 
     <MenuItem Text="Menu 3"> 
     <MenuItem Text="Menu 4" VideoPath="" PreviewPath="" /> 
     </MenuItem> 
     <MenuItem Text="Menu 3" /> 
     <MenuItem Text="Menu 3" /> 
    </MenuItem> 
    </MenuItem> 
    <MenuItem Text="Menu 1" VideoPath="" PreviewPath=""> 
    <MenuItem Text="Menu 2"> 
     <MenuItem Text="Menu 3"> 
     <MenuItem Text="Menu 4" VideoPath="" PreviewPath="" /> 
     </MenuItem> 
     <MenuItem Text="Menu 3" /> 
     <MenuItem Text="Menu 3" /> 
    </MenuItem> 
    </MenuItem> 
</Navigation> 

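In other words, the subMenu and treeExpand divs should disappear, and then I want to generate the XML, but so far I am still failing miserably. Please ask if anything is unclear. Any help is appreciated!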

============================================== ================================================== ===

EDIT: Here is the fixed recursive method, for anyone who wants to see it:

private string GenerateXmlFromHtml(XElement elem) 
{ 
    //HTML will always have a video and preview, according to the generation of the html structure. 
    StringBuilder sbNavigationXml = new StringBuilder(); 
    string text = string.Empty; 
    string videopath = string.Empty; 
    string previewpath = string.Empty; 
    XText textNode; 

    try 
    { 
     foreach (XElement element in elem.Elements()) 
     { 
      //element.Name = "MenuItem"; //Change element name. 
      //Read the class attribute; the cast yields null when the attribute is missing.
      string htmlClass = (string)element.Attribute("class") ?? "";

      if (!string.IsNullOrEmpty(htmlClass)) 
      { 
       if (htmlClass.Contains("subMenu")) 
       { 
        if (element.HasElements) 
        { 
         sbNavigationXml.AppendLine(GenerateXmlFromHtml(element)); 
        } 
       } 
       else if (htmlClass.Contains("menuItem")) 
       { 
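        textNode = element.Nodes().OfType<XText>().FirstOrDefault(); //Get the element's first text node (the menu label).
        text = textNode != null ? textNode.Value.Trim() : string.Empty; //Guard against a missing text node and trim layout whitespace.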
        videopath = element.Attribute("video").Value; //Get node VideoPath attribute value. 
        previewpath = element.Attribute("preview").Value; //Get node PreviewPath attribute value. 

        if (element.HasElements) 
        { 
         sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\">"); 
         sbNavigationXml.AppendLine(GenerateXmlFromHtml(element)); 
         sbNavigationXml.AppendLine("</MenuItem>"); 
        } 
        else 
        { 
         sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\" />"); 
        } 
       } 
       else if (htmlClass.Contains("treeExpand")) 
       { 
        //DO NOTHING 
       } 
      } 
      else 
      { 
       if (element.HasElements) 
       { 
        sbNavigationXml.AppendLine(GenerateXmlFromHtml(element)); 
       } 
      } 
     } 
    } 
    catch (Exception) 
    { 
     throw; 
    } 
    return sbNavigationXml.ToString(); 
} 
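
One caveat with the string concatenation above: attribute values are written verbatim, so a label containing `&`, `<` or `"` would produce malformed XML. A minimal sketch of the same `MenuItem` built with LINQ to XML, which escapes such characters when it serializes; the `EscapingDemo` class, the `BuildMenuItem` helper and the sample label are hypothetical:

```csharp
using System;
using System.Xml.Linq;

static class EscapingDemo
{
    // Hypothetical helper: builds one MenuItem element.
    // XAttribute values are escaped automatically when the element is serialized.
    public static XElement BuildMenuItem(string text, string videoPath, string previewPath)
    {
        return new XElement("MenuItem",
            new XAttribute("Text", text),
            new XAttribute("VideoPath", videoPath),
            new XAttribute("PreviewPath", previewPath));
    }

    static void Main()
    {
        // The ampersand in the label is written as &amp; in the output.
        XElement item = BuildMenuItem("Tom & Jerry", "", "");
        Console.WriteLine(item.ToString());
    }
}
```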
+0

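Side note: people usually get this wrong the other way around - parsing HTML with regular expressions but still constructing the XML with a proper API. Is there a reason you need to build the XML with string concatenation? – 2014-11-04 15:12:57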

+0

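@AlexeiLevenkov - No, I can do whatever I want... this is just the path I took, but anything else that produces the XML output is fine, even if I have to do something completely different. – 2014-11-04 15:14:26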

+0

See [How can I build XML in C#](http://stackoverflow.com/questions/284324/how-can-i-build-xml-in-c) for guidance. – 2014-11-04 15:15:34
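
As an illustration of that suggestion, the whole conversion can be sketched with LINQ to XML instead of a `StringBuilder`. This is only a sketch under stated assumptions: the `MenuConverter` class and its method names are hypothetical, every `div` that is neither a `subMenu` wrapper nor a `treeExpand` marker is treated as a menu entry (including the ones with `class=""`), and all three attributes are always emitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class MenuConverter
{
    // Yields the MenuItem elements that correspond to the children of 'parent'.
    // subMenu wrappers are flattened; treeExpand divs produce no output.
    public static IEnumerable<XElement> ConvertChildren(XElement parent)
    {
        foreach (XElement child in parent.Elements())
        {
            string htmlClass = (string)child.Attribute("class") ?? string.Empty;

            if (htmlClass.Contains("treeExpand"))
            {
                continue; // purely visual marker
            }
            if (htmlClass.Contains("subMenu"))
            {
                // Emit the wrapper's converted children in place of the wrapper.
                foreach (XElement nested in ConvertChildren(child))
                {
                    yield return nested;
                }
                continue;
            }

            // Assumption: anything else is a menu entry, including class="".
            XText label = child.Nodes().OfType<XText>().FirstOrDefault();
            yield return new XElement("MenuItem",
                new XAttribute("Text", label != null ? label.Value.Trim() : string.Empty),
                new XAttribute("VideoPath", (string)child.Attribute("video") ?? string.Empty),
                new XAttribute("PreviewPath", (string)child.Attribute("preview") ?? string.Empty),
                ConvertChildren(child));
        }
    }

    // Wraps the converted entries in the Navigation root.
    public static XElement ToNavigation(XElement htmlRoot)
    {
        return new XElement("Navigation", ConvertChildren(htmlRoot));
    }

    static void Main()
    {
        XElement html = XElement.Parse(
            "<root>" +
            "<div class=\"menuItem1\" video=\"\" preview=\"\">Menu 1" +
            "<div class=\"subMenu1\">" +
            "<div class=\"\" video=\"\" preview=\"\">Menu 2</div>" +
            "</div>" +
            "<div class=\"treeExpand\"></div>" +
            "</div>" +
            "</root>");

        Console.WriteLine(ToNavigation(html));
    }
}
```

Because the output tree is built from `XElement` and `XAttribute` objects, nesting and escaping are handled by the API, and the input tree is never modified while it is being enumerated.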

Answers

1

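Try keeping the input and the output in separate documents.

Then walk the input and write to your output XmlDocument (a separate variable) in the format you want.

Something like...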

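using System.Xml;

class Converter
{
    public XmlDocument Convert(XmlDocument inputDocument)
    {
     XmlDocument result = new XmlDocument();
     // A new XmlDocument has no DocumentElement yet, so create the output root first.
     XmlElement root = result.CreateElement("Navigation");
     result.AppendChild(root);
     ConvertNode(inputDocument.DocumentElement, root, result);
     return result;
    }

    public void ConvertNode(XmlNode inputNode, XmlNode outputNode, XmlDocument outputDoc)
    {
     // Only elements carry attributes; text nodes and comments are skipped.
     XmlElement inputElement = inputNode as XmlElement;
     if (inputElement == null) return;

     // check element class (GetAttribute returns "" when the attribute is missing)
     string htmlClass = inputElement.GetAttribute("class");

     // Children attach to the current output node unless a new MenuItem is created.
     XmlNode parentForChildren = outputNode;

     if (htmlClass.Contains("menuItem"))
     {
      XmlNode newNode = outputDoc.CreateElement("MenuItem");
      outputNode.AppendChild(newNode);
      parentForChildren = newNode;
     }

     /// check other wanted nodes etc..

     // Always recurse, so wrapper divs such as subMenu do not cut off their descendants.
     foreach (XmlNode node in inputElement.ChildNodes)
     {
      ConvertNode(node, parentForChildren, outputDoc);
     }
    }
}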
+0

I am using a parser. That is how I convert the HTML into a valid XML structure, which I then process as normal XML using XElement. – 2014-11-04 15:24:07

+0

I don't see how this answers the question... did you look at the code in the post? – 2014-11-04 15:30:19

+0

@AlexeiLevenkov I think the separation still applies. I changed the important part. – rodrigogq 2014-11-04 15:34:09