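Import ONIX XML into a DataSet, ignoring HTML tags
2013-07-16

Some time ago I wrote a process for importing ONIX files into a retail database system. (ONIX is an XML standard that publishers use to distribute their catalogue information.) The process reads each XML file directly into a DataSet, and it works well for most of the files we receive, but there are occasional exceptions.

In this particular case, the file I was trying to import contains HTML tags in the product description field. This confuses the standard DataSet.ReadXml() method, because it tries to interpret the HTML tags as XML. Some ONIX files avoid the problem by wrapping such content in CDATA sections, but in this case the publisher has chosen to use an attribute on the tag to indicate that the field is HTML formatted, for example: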

<othertext> 
     <d102>03</d102> 
     <d104 textformat="05"> 
      <p>Enter a world where bloody battles, and heroic deeds combine in the historic struggle to unite Britain in the face of a common enemy.</p> 
      <p>The third instalment in Bernard Cornwell’s King Alfred series, follows on from the outstanding previous novels The Last Kingdom and The Pale Horseman.</p> 
      <p>The year is 878 and the Vikings have been thrown out of Wessex. Uhtred, fresh from fighting for Alfred in the battle to free Wessex, travels north to seek revenge for his father's death, killed in a bloody raid by Uhtred's old enemy, renegade Danish lord, Kjartan.</p> 
      <p>While Kjartan lurks in his formidable stronghold of Dunholm, the north is overrun by chaos, rebellion and fear. Together with a small band of warriors, Uhtred plans his attack on his enemy, revenge fuelling his anger, resolute on bloody retribution. But, he finds himself betrayed and ends up on a desperate slave voyage to Iceland. Rescued by a remarkable alliance of old friends and enemies, he and his allies, together with Alfred the Great, are free to fight once more in a battle for power, glory and honour.</p> 
      <p>‘The Lords of the North’ is a tale of England's making, a powerful story of betrayal, struggle and romance, set in an England torn apart by turmoil and upheaval.</p> 
     </d104> 
    </othertext> 

The textformat="05" attribute indicates HTML.
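To see why ReadXml() struggles with this, here is a minimal sketch in plain C# (not LINQPad; the class and method names are hypothetical) that loads a trimmed copy of the sample above with schema inference and lists the tables it creates. Under ReadXml's inference rules, an element that carries attributes becomes a table and a repeated element becomes a table of its own, so the HTML paragraphs are expected to turn into rows of a separate `p` table instead of staying one description string.

```csharp
using System.Collections.Generic;
using System.Data;
using System.IO;

static class InferenceDemo
{
    // Trimmed version of the <othertext> sample: the HTML is NOT protected by CDATA
    public const string Sample =
        "<othertext>" +
        "<d102>03</d102>" +
        "<d104 textformat=\"05\">" +
        "<p>First paragraph.</p>" +
        "<p>Second paragraph.</p>" +
        "</d104>" +
        "</othertext>";

    // Loads the XML with DataSet.ReadXml (schema inferred from the data)
    // and returns the names of the tables that inference produced.
    public static List<string> InferredTables(string xml)
    {
        var ds = new DataSet();
        using (var reader = new StringReader(xml))
        {
            ds.ReadXml(reader);
        }

        var names = new List<string>();
        foreach (DataTable table in ds.Tables)
            names.Add(table.TableName);
        return names;
    }
}
```

Calling `InferenceDemo.InferredTables(InferenceDemo.Sample)` should list a `p` table next to `d104`; the paragraph text ends up spread across rows of `p`, and the markup itself is no longer available as a string to display on a website.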

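Is there a way to import this with ReadXml() without writing custom code to interpret the HTML, or do I first need to insert CDATA tags programmatically to get around it?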

Note: I don't want to strip out the HTML tags, because the data will be displayed on a website.

Answer


Here is a LINQPad program that should find the textformat="05" nodes and wrap their contents in CDATA sections. See this stackoverflow post

void Main() 
{ 
    string xml = @"<othertext> 
      <d102>03</d102> 
      <d104 textformat=""05""> 
       <p>Enter a world where bloody battles, and heroic deeds combine in the historic struggle to unite Britain in the face of a common enemy.</p> 
       <p>The third instalment in Bernard Cornwell’s King Alfred series, follows on from the outstanding previous novels The Last Kingdom and The Pale Horseman.</p> 
       <p>The year is 878 and the Vikings have been thrown out of Wessex. Uhtred, fresh from fighting for Alfred in the battle to free Wessex, travels north to seek revenge for his father's death, killed in a bloody raid by Uhtred's old enemy, renegade Danish lord, Kjartan.</p> 
       <p>While Kjartan lurks in his formidable stronghold of Dunholm, the north is overrun by chaos, rebellion and fear. Together with a small band of warriors, Uhtred plans his attack on his enemy, revenge fuelling his anger, resolute on bloody retribution. But, he finds himself betrayed and ends up on a desperate slave voyage to Iceland. Rescued by a remarkable alliance of old friends and enemies, he and his allies, together with Alfred the Great, are free to fight once more in a battle for power, glory and honour.</p> 
       <p>‘The Lords of the North’ is a tale of England's making, a powerful story of betrayal, struggle and romance, set in an England torn apart by turmoil and upheaval.</p> 
      </d104> 
     </othertext>"; 

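    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(xml);
    // Select every child of <othertext> flagged as HTML (textformat="05")
    var nodes = xmlDoc.SelectNodes("//othertext/*[@textformat='05']");
    foreach(XmlNode node in nodes)
    {
     // Capture the raw HTML markup, clear the element's children,
     // then re-add the markup as a single CDATA section
     var cdata = xmlDoc.CreateCDataSection(node.InnerXml);
     node.InnerText = string.Empty;
     node.AppendChild(cdata);
     node.InnerXml.Dump();   // LINQPad: display the rewritten content
    }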
} 
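The program above stops after dumping the rewritten nodes. A minimal sketch of the full import, assuming you then want to hand the result straight to DataSet.ReadXml as in the question (the class and method names are hypothetical), might look like this:

```csharp
using System.Data;
using System.Linq;
using System.Xml;

static class OnixHtmlImport
{
    // Wraps the contents of every <othertext> child flagged textformat="05" (HTML)
    // in a CDATA section, then reads the document into a DataSet.
    public static DataSet Load(string onixXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(onixXml);

        // Materialize the matches first so the document is not modified
        // while the XPath result is still being enumerated.
        var htmlNodes = doc.SelectNodes("//othertext/*[@textformat='05']")
                           .Cast<XmlNode>()
                           .ToList();

        foreach (XmlNode node in htmlNodes)
        {
            string html = node.InnerXml;           // raw markup, e.g. "<p>...</p>"
            while (node.HasChildNodes)             // drop the parsed HTML elements
                node.RemoveChild(node.FirstChild);
            node.AppendChild(doc.CreateCDataSection(html));
        }

        var ds = new DataSet();
        using (var reader = new XmlNodeReader(doc))
        {
            ds.ReadXml(reader);                    // schema is still inferred
        }
        return ds;
    }
}
```

With the sample from the question, the paragraph markup should come back as one string in the `d104` table's text column, ready to store and render as HTML. The XPath is the one from the answer; if other composites in your feed also carry textformat="05", a broader `//*[@textformat='05']` would catch those as well.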

Thanks - that did the trick! The only adjustment I had to make was in the following line (note the double slash): `var nodes = xmlDoc.SelectNodes("//othertext/*[@textformat='05']");` – Billious
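The double slash matters because a single leading slash anchors the path at the document root, so `/othertext/...` only matches when `<othertext>` is the root element, as it is in the test string above. In a full ONIX message it sits deeper, inside `<product>`, and `//` (descendant-or-self) reaches it at any depth. A small sketch of the difference; the nested message below is a hypothetical, heavily trimmed ONIX fragment:

```csharp
using System.Xml;

static class XPathDepthDemo
{
    // Hypothetical, heavily trimmed ONIX message: <othertext> is nested, not the root
    public const string Message =
        "<ONIXmessage><product><othertext>" +
        "<d102>03</d102><d104 textformat=\"05\"><p>Text</p></d104>" +
        "</othertext></product></ONIXmessage>";

    // Returns how many nodes the given XPath expression selects in the message
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Message);
        return doc.SelectNodes(xpath).Count;
    }
}
```

Here `Count("/othertext/*[@textformat='05']")` returns 0, while `Count("//othertext/*[@textformat='05']")` returns 1.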


I've updated my answer with your correction. Thanks –