2015-06-25 75 views
3

Reading continuously with XmlReader

I have a very large XML file. Here is a simplified version of its format.

<?xml version='1.0' encoding='UTF-8'?> 
<Sender> 
<SenderID>571099948</SenderID> 
<Sponsors> 
    <Sponsor> 
    <SponsorID>TEST01</SponsorID> 
    <Contracts> 
     <Contract> 
     <ContractID>000001</ContractID> 
     <Member> 
      <SSN>1111111111</SSN> 
      <Gender>M</Gender> 
      <Benefits> 
      <Benefit BenefitType="AAA"> 
      </Benefit> 
      <Benefit BenefitType="BBB"> 
      </Benefit> 
      </Benefits> 
     </Member> 
     <Member> 
      <SSN>4444444444</SSN> 
      <Gender>F</Gender> 
      <Benefits> 
      <Benefit BenefitType="AAA"> 
      </Benefit> 
      </Benefits> 
     </Member> 
     </Contract> 
     <Contract> 
     <ContractID>0000002</ContractID> 
     <Member> 
      <SSN>2222222222</SSN> 
      <Gender>F</Gender> 
      <Benefits> 
      <Benefit BenefitType="CCC"> 
      </Benefit> 
      <Benefit BenefitType="DDD"> 
      </Benefit> 
      </Benefits> 
     </Member> 
     </Contract> 
     <Contract> 
     <ContractID>0000003</ContractID> 
     <Member> 
      <SSN>333333333</SSN> 
      <Gender>F</Gender> 
      <Benefits> 
      <Benefit BenefitType="CCC"> 
      </Benefit> 
      </Benefits> 
     </Member> 
     </Contract> 
    </Contracts> 
    </Sponsor> 
    <Sponsor> 
    <SponsorID>TEST02</SponsorID> 
    <Contracts> 
     <Contract> 
     <ContractID>0000011</ContractID> 
     <Member> 
      <SSN>1111111111</SSN> 
      <Gender>M</Gender> 
      <Benefits> 
      </Benefits> 
     </Member> 
     </Contract> 
     <Contract> 
     <ContractID>0000002</ContractID> 
     <Member> 
      <SSN>2222222222</SSN> 
      <Gender>F</Gender> 
      <Benefits> 
      </Benefits> 
     </Member> 
     </Contract> 
    </Contracts> 
    </Sponsor> 
</Sponsors> 
</Sender> 

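I want to get all the information in each Contract node, together with the SponsorID from its parent node. Here is the code I use to read part of the XML file with XmlReader: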

    static IEnumerable<XElement> SimpleStreamAxis(string inputUrl, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(inputUrl))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (reader.Name == elementName)
                    {
                        XElement el = XNode.ReadFrom(reader) as XElement;
                        if (el != null)
                        {
                            yield return el;
                        }
                    }
                }
            }
        }
    }

Here is the problem. I can't use this, because a whole Sponsor tree may be too large to fit in memory.

var sponsor = SimpleStreamAxis(file, "Sponsor"); 

I can't use this either, because a Contract node on its own does not tell me which SponsorID it belongs to.

var contract = SimpleStreamAxis(file, "Contract"); 

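Is there a way to read the SponsorID of a Sponsor, move the cursor forward and read all the Contract nodes under that Sponsor, then move on to the next Sponsor and read its SponsorID and Contract nodes, and so on?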

Answers

1

Try this:

using (XmlReader xmlReader = XmlReader.Create("file.xml"))
{
    // Jump from one SponsorID to the next without buffering anything in between.
    while (xmlReader.ReadToFollowing("SponsorID"))
    {
        string sponsorId = xmlReader.ReadElementContentAsString();

        // process SponsorID
        Console.WriteLine(sponsorId);

        if (xmlReader.ReadToFollowing("Contract"))
        {
            do
            {
                // Load only the current Contract into memory.
                using (XmlReader contractSubtree = xmlReader.ReadSubtree())
                {
                    XElement contractElement = XElement.Load(contractSubtree);

                    // process Contract
                    Console.WriteLine(contractElement.Element("ContractID"));
                }
            } while (xmlReader.ReadToNextSibling("Contract"));
        }
    }
}
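
One thing to keep in mind with the loop above: `ReadToFollowing("Contract")` searches the rest of the document, so a Sponsor that has no Contract children would run on into the next Sponsor's contracts. A minimal sketch of one way to avoid that, assuming each SponsorID appears before its Contracts, is to confine the search to each Sponsor with `ReadSubtree()`. The class name `SponsorContractReader`, the method `ReadContracts`, and the sample XML are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SponsorContractReader
{
    // Hypothetical helper: streams (SponsorID, Contract) pairs one Contract at a time.
    public static IEnumerable<KeyValuePair<string, XElement>> ReadContracts(XmlReader reader)
    {
        while (reader.ReadToFollowing("Sponsor"))
        {
            // The subtree reader cannot see past the closing </Sponsor> tag.
            using (XmlReader sponsor = reader.ReadSubtree())
            {
                string sponsorId = null;
                sponsor.Read(); // position on <Sponsor>
                while (!sponsor.EOF)
                {
                    bool isElement = sponsor.NodeType == XmlNodeType.Element;
                    if (isElement && sponsor.LocalName == "SponsorID")
                    {
                        // XNode.ReadFrom consumes the element and moves to the next node.
                        sponsorId = ((XElement)XNode.ReadFrom(sponsor)).Value;
                    }
                    else if (isElement && sponsor.LocalName == "Contract")
                    {
                        var contract = (XElement)XNode.ReadFrom(sponsor);
                        yield return new KeyValuePair<string, XElement>(sponsorId, contract);
                    }
                    else
                    {
                        sponsor.Read();
                    }
                }
            }
        }
    }

    public static void Main()
    {
        // Sponsor A has no contracts; its ID must not leak onto B's contracts or vice versa.
        const string xml =
            "<Sender><Sponsors>" +
            "<Sponsor><SponsorID>A</SponsorID><Contracts/></Sponsor>" +
            "<Sponsor><SponsorID>B</SponsorID><Contracts>" +
            "<Contract><ContractID>1</ContractID></Contract>" +
            "<Contract><ContractID>2</ContractID></Contract>" +
            "</Contracts></Sponsor>" +
            "</Sponsors></Sender>";

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (var pair in ReadContracts(reader))
            {
                // prints "B: 1" then "B: 2"
                Console.WriteLine("{0}: {1}", pair.Key, (string)pair.Value.Element("ContractID"));
            }
        }
    }
}
```

Only one Contract element is materialized at a time, so memory use stays bounded by the largest single Contract rather than by a whole Sponsor.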
1

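Yes, this can be done, assuming SponsorID always comes before the Contract nodes.

The basic idea is to read through the XML file until you find elements with the desired names, "SponsorID" and "Contract", and then yield them for higher-level processing: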

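public static IEnumerable<XElement> StreamNamedElements(XmlReader reader, IEnumerable<XName> names)
{
    var nameSet = new HashSet<XName>(names);

    while (!reader.EOF)
    {
        // Use LocalName, not Name: XName.Get expects the name without a namespace prefix.
        if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
        {
            // XNode.ReadFrom leaves the reader on the node after the element,
            // so do not call Read() again here or an adjacent element could be skipped.
            XElement el = XNode.ReadFrom(reader) as XElement;
            if (el != null)
                yield return el;
        }
        else
        {
            reader.Read();
        }
    }
}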

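As long as SponsorID is always present and comes before the Contract elements, this enumerates the elements correctly. However, if a SponsorID is missing or out of order, a Contract could be paired with the previous sponsor's SponsorID. This error can be trapped by using ReadSubtree() to limit the scope of each "SponsorID" to its containing "Sponsor" element: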

public static IEnumerable<XmlReader> StreamNamedSubtrees(XmlReader reader, IEnumerable<XName> names)
{
    var nameSet = new HashSet<XName>(names);

    while (reader.Read())
    {
        // Use LocalName, not Name: XName.Get expects the name without a namespace prefix.
        if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
        {
            // Disposing the subtree reader advances the outer reader to the end of the subtree if the caller did not.
            using (var subReader = reader.ReadSubtree())
            {
                yield return subReader;
            }
        }
    }
}

And then use it like this:

using (var sr = new StringReader(xml))
using (var reader = XmlReader.Create(sr))
{
    foreach (var subReader in StreamNamedSubtrees(reader, new[] { (XName)"Sponsor" }))
    {
        XElement sponsorID = null;
        foreach (var el in StreamNamedElements(subReader, new[] { (XName)"SponsorID", (XName)"Contract" }))
        {
            if (el.Name == "SponsorID")
            {
                sponsorID = el;
            }
            else if (el.Name == "Contract")
            {
                if (sponsorID == null)
                    throw new InvalidOperationException();
                // Example "higher processing"
                Debug.WriteLine(string.Format("{0}: {1}", sponsorID.Value, el.ToString()));
            }
        }
    }
}
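
A detail worth knowing about the `XNode.ReadFrom` pattern used in this thread: `XNode.ReadFrom` leaves the reader positioned on the node that follows the element it consumed. If a loop then calls `Read()` unconditionally, that following node is skipped. With indented XML the skipped node is usually whitespace, but with compact XML it can be the next matching element. The sketch below shows the difference; the class `ReadFromDemo`, both method names, and the sample XML are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ReadFromDemo
{
    // Calls Read() after every XNode.ReadFrom, so the node right after each match is skipped.
    public static IEnumerable<XElement> WithExtraRead(XmlReader reader, string name)
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == name)
                yield return (XElement)XNode.ReadFrom(reader);
        }
    }

    // Calls Read() only when the current node was not consumed by XNode.ReadFrom.
    public static IEnumerable<XElement> WithoutExtraRead(XmlReader reader, string name)
    {
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == name)
                yield return (XElement)XNode.ReadFrom(reader);
            else
                reader.Read();
        }
    }

    public static void Main()
    {
        // No whitespace between the <c> elements.
        const string compact = "<r><c>1</c><c>2</c><c>3</c><c>4</c></r>";

        using (var reader = XmlReader.Create(new StringReader(compact)))
            Console.WriteLine(string.Join(",", WithExtraRead(reader, "c").Select(e => e.Value)));    // prints "1,3"

        using (var reader = XmlReader.Create(new StringReader(compact)))
            Console.WriteLine(string.Join(",", WithoutExtraRead(reader, "c").Select(e => e.Value))); // prints "1,2,3,4"
    }
}
```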
+0

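Thanks! The problem with using a dictionary to keep the sponsorID is that whenever the sponsorID changes, it always yields one extra result that pairs the new sponsorID with the old Contract. – seattleSummer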

+0

@seattleSummer - Answer updated to address the issue you found. Removing the dictionary actually made it simpler. – dbc

+0

I don't see the point of using this loop: foreach (var subReader in StreamNamedSubtrees(reader, new[] { (XName)"Sponsor" })) – seattleSummer