2009-05-18 160 views

A possible (working) solution:

Private Sub ReadXMLAttributes(ByVal oXML As String) 
    ReadXMLAttributes(oXML, "mso-infoPathSolution") 
End Sub 
Private Sub ReadXMLAttributes(ByVal oXML As String, ByVal oTagName As String) 
    Try 
     Dim XmlDoc As New Xml.XmlDocument 
     XmlDoc.LoadXml(oXML) 
     oFileInfo = New InfoPathDocument 
     Dim XmlNodes As Xml.XmlNodeList = XmlDoc.GetElementsByTagName(oTagName) 
     For Each xNode As Xml.XmlNode In XmlNodes 
      With xNode 
       oFileInfo.SolutionVersion = .Attributes(InfoPathSolution.solutionVersion).Value 
       oFileInfo.ProductVersion = .Attributes(InfoPathSolution.productVersion).Value 
       oFileInfo.PIVersion = .Attributes(InfoPathSolution.PIVersion).Value 
       oFileInfo.href = .Attributes(InfoPathSolution.href).Value 
       oFileInfo.name = .Attributes(InfoPathSolution.name).Value 
      End With 
     Next 
    Catch ex As Exception 
     MsgBox(ex.Message, MsgBoxStyle.OkOnly, "ReadXMLAttributes") 
    End Try 
End Sub 

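This works, but it will still suffer from the problem described below if the attributes are reordered. The only way I can think of to avoid that is to hard-code the attribute names into my program and have it loop through the parsed tag, searching for the specified names to pick out the entries.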
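The name-based lookup described above can be sketched as follows. This is a minimal sketch, not the poster's code: it assumes the InfoPathSolution members used earlier are numeric positions (the post does not show them), and GetAttributeOrEmpty is a hypothetical helper introduced only for illustration. Looking an attribute up by its name does not depend on the order in which the attributes appear.

```vbnet
Imports System
Imports System.Xml

Module AttributeLookupSketch
    ' Hypothetical helper: returns the named attribute's value,
    ' or an empty string when the attribute is missing.
    Function GetAttributeOrEmpty(ByVal node As XmlNode, ByVal attributeName As String) As String
        If node.Attributes Is Nothing Then Return String.Empty
        Dim attr As XmlAttribute = node.Attributes(attributeName)
        If attr Is Nothing Then Return String.Empty
        Return attr.Value
    End Function

    Sub Main()
        ' The attributes are deliberately listed in a different order
        ' than in the InfoPath header shown in the question.
        Dim doc As New XmlDocument()
        doc.LoadXml("<data productVersion=""12.0.0"" solutionVersion=""1.0.0.2"" />")
        Dim node As XmlNode = doc.DocumentElement
        Console.WriteLine(GetAttributeOrEmpty(node, "solutionVersion"))
        Console.WriteLine(GetAttributeOrEmpty(node, "productVersion"))
    End Sub
End Module
```

Returning an empty string for a missing attribute is a design choice made for this sketch; the original code would instead throw when an expected attribute is absent.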

Note: InfoPathDocument is a custom class I wrote; it is nothing complicated:

Public Class InfoPathDocument 
    Private _sVersion As String 
    Private _pVersion As String 
    Private _piVersion As String 
    Private _href As String 
    Private _name As String 
    Public Property SolutionVersion() As String 
     Get 
      Return _sVersion 
     End Get 
     Set(ByVal value As String) 
      _sVersion = value 
     End Set 
    End Property 
    Public Property ProductVersion() As String 
     Get 
      Return _pVersion 
     End Get 
     Set(ByVal value As String) 
      _pVersion = value 
     End Set 
    End Property 
    Public Property PIVersion() As String 
     Get 
      Return _piVersion 
     End Get 
     Set(ByVal value As String) 
      _piVersion = value 
     End Set 
    End Property 
    Public Property href() As String 
     Get 
      Return _href 
     End Get 
     Set(ByVal value As String) 
      If value.ToLower.StartsWith("file:///") Then 
       value = value.Substring(8) 
      End If 
      _href = Form1.PathToUNC(URLDecode(value)) 
     End Set 
    End Property 
    Public Property name() As String 
     Get 
      Return _name 
     End Get 
     Set(ByVal value As String) 
      _name = value 
     End Set 
    End Property 
    Sub New() 

    End Sub 
    Sub New(ByVal oSolutionVersion As String, ByVal oProductVersion As String, ByVal oPIVersion As String, ByVal oHref As String, ByVal oName As String) 
     SolutionVersion = oSolutionVersion 
     ProductVersion = oProductVersion 
     PIVersion = oPIVersion 
     href = oHref 
     name = oName 
    End Sub 
    Public Function URLDecode(ByVal StringToDecode As String) As String 
     Dim TempAns As String = String.Empty 
     Dim CurChr As Integer = 1 
     Dim oRet As String = String.Empty 
     Try 
      Do While CurChr <= Len(StringToDecode) ' also ends the loop if a truncated %-escape moves the index past the end
       Select Case Mid(StringToDecode, CurChr, 1) 
        Case "+" 
         oRet &= " " 
        Case "%" 
         oRet &= Chr(Val("&h" & Mid(StringToDecode, CurChr + 1, 2))) 
         CurChr = CurChr + 2 
        Case Else 
         oRet &= Mid(StringToDecode, CurChr, 1) 
       End Select 
       CurChr += 1 
      Loop 
     Catch ex As Exception 
      MsgBox(ex.Message, MsgBoxStyle.OkOnly, "URLDecode") 
     End Try 
     Return oRet 
    End Function 
End Class 

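Original question: Reading XML tag information

I am working on a project that requires reading XML documents, specifically forms saved from Microsoft InfoPath.

Here is a simple example of what I will be working with, along with some background information that may help: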

<?xml version="1.0" encoding="UTF-8"?> 
<?mso-infoPathSolution solutionVersion="1.0.0.2" productVersion="12.0.0" PIVersion="1.0.0.0" href="file:///C:\Users\darren\Desktop\simple_form.xsn" name="urn:schemas-microsoft-com:office:infopath:simple-form:-myXSD-2009-05-15T14-16-37" ?> 
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?> 
<my:myFields xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2009-05-15T14:16:37" xml:lang="en-us"> 
    <my:first_name>John</my:first_name> 
    <my:last_name>Doe</my:last_name> 
</my:myFields> 

My goal right now is to extract the version ID and the location of the form. This is easy with a regex:

Dim _doc As New XmlDocument 
_doc.Load(_thefile) 
Dim oRegex As String = "^solutionVersion=""(?<sVersion>[0-9.]*)"" productVersion=""(?<pVersion>[0-9.]*)"" PIVersion=""(?<piVersion>[0-9.]*)"" href=""(?<href>.*)"" name=""(?<name>.*)""$" 
Dim rx As New Regex(oRegex), m As Match = Nothing 
For Each section As XmlNode In _doc.ChildNodes 
    m = rx.Match(section.InnerText.Trim) 
    If m.Success Then 
     Dim temp As String = m.Groups("name").Value.Substring(m.Groups("name").Value.ToLower.IndexOf("infopath") + ("infopath").Length + 1) 
     fileName = temp.Substring(0, temp.LastIndexOf(":")) 
     fileVersion = m.Groups("sVersion").Value 
    End If 
Next 

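The only problem with this working solution is what happens if the layout of the InfoPath file header changes, for example if the solutionVersion and productVersion attributes swap places (Microsoft seems to like doing things like that).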
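For comparison, a regex only depends on attribute order when it matches the whole header at once. A minimal sketch of the alternative, matching each name="value" pair separately and collecting the pairs in a dictionary, is shown here. ParsePseudoAttributes is a hypothetical helper name, and the pattern assumes double-quoted values only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PseudoAttributeRegexSketch
    ' Hypothetical helper: collects every name="value" pair, whatever the order.
    ' Assumption: values are always wrapped in double quotes.
    Function ParsePseudoAttributes(ByVal data As String) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)
        Dim pairPattern As New Regex("(?<name>[\w:.-]+)\s*=\s*""(?<value>[^""]*)""")
        For Each m As Match In pairPattern.Matches(data)
            result(m.Groups("name").Value) = m.Groups("value").Value
        Next
        Return result
    End Function

    Sub Main()
        ' productVersion and solutionVersion are swapped on purpose.
        Dim data As String = "productVersion=""12.0.0"" solutionVersion=""1.0.0.2"" PIVersion=""1.0.0.0"""
        Dim attrs As Dictionary(Of String, String) = ParsePseudoAttributes(data)
        Console.WriteLine(attrs("solutionVersion"))
    End Sub
End Module
```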

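So I chose to try VB.NET's XML parsing capabilities to achieve the above result sans regex.

The child node of the _doc object that contains the information I need does not have any child nodes of its own:

_doc.ChildNodes(1).HasChildNodes = False 

Can anyone help me with this?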

Answers

1

Processing instructions are part of the XML document, but their attributes are not parsed. Try this code:

// Load the original xml... 
var xml = new XmlDocument(); 
xml.Load(_thefile); 

// Select out the processing instruction... 
var infopathProcessingInstruction = xml.SelectSingleNode("/processing-instruction()[local-name(.) = \"mso-infoPathSolution\"]"); 

// Since the processing instruction does not expose its attributes, create a new XML document... 
var xmlInfoPath = new XmlDocument(); 
xmlInfoPath.LoadXml("<data " + infopathProcessingInstruction.InnerText + " />"); 

// Get the data... 
var solutionVersion = xmlInfoPath.DocumentElement.GetAttribute("solutionVersion"); 
var productVersion = xmlInfoPath.DocumentElement.GetAttribute("productVersion"); 

Awesome, thank you! – Anders 2009-05-19 14:28:48

0

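The problem is that the tags you want to parse are not really part of the XML document. They belong to the XML prolog, which contains processing instructions. That is why they are not available as elements in the XmlDocument.

My only idea (other than checking the documentation for a way to access these elements) is to move just the mso-infoPathSolution entry into an XmlDocument of its own, after stripping the <? ?> and replacing them with < />. You can then access the attributes regardless of their order.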


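Any idea how to get this particular node into a new XmlDocument? I am relatively new to XML parsing and manipulation. At the moment I am trying to modify newNode's OuterXml, but it is ReadOnly, so the quest continues! – Anders 2009-05-18 17:07:07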
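One way to get the node into a new XmlDocument without touching the read-only OuterXml is to build the element text from the processing instruction's Target and Data properties and load that into a fresh document. The following is a minimal VB.NET sketch under stated assumptions: PromoteProcessingInstruction is a hypothetical helper name, and the instruction's data is assumed to consist of well-formed name="value" pairs (a value containing a raw & or < would make LoadXml fail).

```vbnet
Imports System
Imports System.Xml

Module ProcessingInstructionSketch
    ' Hypothetical helper: copies the named processing instruction into a new
    ' XmlDocument as an ordinary element, so its pseudo-attributes can be read by name.
    ' Returns Nothing when no matching processing instruction is found.
    Function PromoteProcessingInstruction(ByVal source As XmlDocument, ByVal targetName As String) As XmlDocument
        For Each node As XmlNode In source.ChildNodes
            Dim instruction As XmlProcessingInstruction = TryCast(node, XmlProcessingInstruction)
            If instruction IsNot Nothing AndAlso instruction.Target = targetName Then
                Dim promoted As New XmlDocument()
                ' Same effect as swapping "<?target data?>" for "<target data />".
                promoted.LoadXml("<" & instruction.Target & " " & instruction.Data & " />")
                Return promoted
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim source As New XmlDocument()
        source.LoadXml("<?mso-infoPathSolution productVersion=""12.0.0"" solutionVersion=""1.0.0.2"" ?><root />")
        Dim promoted As XmlDocument = PromoteProcessingInstruction(source, "mso-infoPathSolution")
        If promoted IsNot Nothing Then
            Console.WriteLine(promoted.DocumentElement.GetAttribute("solutionVersion"))
        End If
    End Sub
End Module
```

Scanning the top-level child nodes avoids an XPath query; checking the node type with TryCast also skips the XML declaration, which is not an XmlProcessingInstruction.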