2015-10-31 104 views
0

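OutOfMemory exception with streams and BsonWriter in Json.Net

I am using Json.net and am running into a problem when creating a large BSON file. I have the following test code: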

Imports System.IO 
Imports Newtonsoft.Json 

Public Class Region 
    Public Property Id As Integer 
    Public Property Name As String 
    Public Property FDS_Id As String 
End Class 

Public Class Regions 
    Inherits List(Of Region) 

    Public Sub New(capacity As Integer) 
     MyBase.New(capacity) 
    End Sub 
End Class 

Module Module1 
    Sub Main() 
     Dim writeElapsed2 = CreateFileBson_Stream(GetRegionList(5000000)) 
     GC.Collect(0) 
    End Sub 

    Public Function GetRegionList(count As Integer) As Regions 
     Dim regions As New Regions(count - 1) 
     For lp = 0 To count - 1 
      regions.Add(New Region With {.Id = lp, .Name = lp.ToString, .FDS_Id = lp.ToString}) 
     Next 
     Return regions 
    End Function 

    Public Function CreateFileBson_Stream(regions As Regions) As Long 
     Dim sw As New Stopwatch 
     sw.Start() 
     Dim lp = 0 

     Using stream = New StreamWriter("c:\atlas\regionsStream.bson") 
      Using writer = New Bson.BsonWriter(stream.BaseStream) 
       writer.WriteStartArray() 

       For Each item In regions 
        writer.WriteStartObject() 
        writer.WritePropertyName("Id") 
        writer.WriteValue(item.Id) 
        writer.WritePropertyName("Name") 
        writer.WriteValue(item.Name) 
        writer.WritePropertyName("FDS_Id") 
        writer.WriteValue(item.FDS_Id) 
        writer.WriteEndObject() 

        lp += 1 
        If lp Mod 1000000 = 0 Then 
         writer.Flush() 
         stream.Flush() 
         stream.BaseStream.Flush() 
        End If 
       Next 

       writer.WriteEndArray() 
      End Using 
     End Using 

     sw.Stop() 
     Return sw.ElapsedMilliseconds 
    End Function 
End Module 

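I also tried a FileStream instead of a StreamWriter in the first Using statement, and it made no difference.

CreateFileBson_Stream fails with an OutOfMemory exception at a little over 3 million records. The memory profiler in Visual Studio shows memory continuing to climb even though I am flushing everything I can.

The list of 5 million regions takes about 468 MB in memory.

Interestingly, if I use the following code to produce JSON instead, it works and memory stays steady at 500 MB: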

Public Function CreateFileJson_Stream(regions As Regions) As Long 
     Dim sw As New Stopwatch 
     sw.Start() 
     Using stream = New StreamWriter("c:\atlas\regionsStream.json") 
      Using writer = New JsonTextWriter(stream) 
       writer.WriteStartArray() 

       For Each item In regions 
        writer.WriteStartObject() 
        writer.WritePropertyName("Id") 
        writer.WriteValue(item.Id) 
        writer.WritePropertyName("Name") 
        writer.WriteValue(item.Name) 
        writer.WritePropertyName("FDS_Id") 
        writer.WriteValue(item.FDS_Id) 
        writer.WriteEndObject() 
       Next 

       writer.WriteEndArray() 
      End Using 
     End Using 
     sw.Stop() 
     Return sw.ElapsedMilliseconds 
    End Function 

I'm fairly sure this is a problem with BsonWriter, but I can't see what else I can do. Any ideas?

Answers

-1

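Found it: BsonWriter is trying to be 'smart'. Because I am generating the output as an array of regions, it appears to keep the whole array in memory regardless of any flushing you do.

To prove this, I took out the start-array and end-array writes and ran the routine again: memory usage stayed at 500 MB and the program ran fine.

My guess is that this was fixed in JsonWriter but not in BsonWriter.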

2

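According to the BSON specification, every object or array (called a document in the standard) must begin with a count of the total number of bytes that make up the document: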

document ::=  int32 e_list "\x00"  BSON Document. int32 is the total number of bytes comprising the document. 
e_list  ::=  element e_list 
    | "" 
element  ::=  "\x01" e_name double 64-bit binary floating point 
    | "\x02" e_name string UTF-8 string 
    | "\x03" e_name document Embedded document 
    | "\x04" e_name document Array 
    | ... 

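So when the root object or array is written, the total number of bytes that will be written to the file must be calculated in advance.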
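To make the length prefix concrete, here is a minimal sketch that hand-encodes the single-element document { "Id": 1 } using only the .NET base class library. The class and method names (BsonLengthPrefixDemo, EncodeIdDocument) are made up for this illustration and are not part of Json.NET. It buffers the element list first, because the int32 at the front cannot be written until the size of everything after it is known.

```csharp
using System;
using System.IO;
using System.Text;

// Illustration only: hypothetical names, not part of Json.NET.
public static class BsonLengthPrefixDemo
{
    // Hand-encodes the BSON document { "Id": <value> }.
    public static byte[] EncodeIdDocument(int value)
    {
        using (var body = new MemoryStream())
        using (var bodyWriter = new BinaryWriter(body))
        {
            // e_list: buffered first, because its size is needed for the prefix.
            bodyWriter.Write((byte)0x10);                    // element type: 32-bit integer
            bodyWriter.Write(Encoding.UTF8.GetBytes("Id"));  // e_name ...
            bodyWriter.Write((byte)0);                       // ... terminated by NUL
            bodyWriter.Write(value);                         // BinaryWriter writes little-endian
            bodyWriter.Flush();
            var elements = body.ToArray();

            using (var doc = new MemoryStream())
            using (var docWriter = new BinaryWriter(doc))
            {
                // Total size = 4-byte prefix + elements + trailing NUL.
                docWriter.Write(4 + elements.Length + 1);
                docWriter.Write(elements);
                docWriter.Write((byte)0);
                docWriter.Flush();
                return doc.ToArray();
            }
        }
    }

    public static void Main()
    {
        // Prints 0D-00-00-00-10-49-64-00-01-00-00-00-00 (0x0D = 13 bytes in total).
        Console.WriteLine(BitConverter.ToString(EncodeIdDocument(1)));
    }
}
```

For a root array of 5 million objects, the same constraint applies to the outermost prefix: it depends on the encoded size of every element that follows it.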

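Json.NET's BsonWriter and the underlying BsonBinaryWriter implement this by caching all tokens to be written in a tree. Once the contents of the root token have been finalized, they recursively calculate the sizes and only then write the tree out. (The alternatives would be to make the application, i.e. your code, somehow precompute this information, which is practically impossible, or to seek back and forth in the output stream to write it, which is only possible for streams where Stream.CanSeek == true.)

In your initial implementation the array is the root BSON document, so Json.NET must cache the entire array contents in order to calculate their size. In your second implementation you are actually writing multiple root BSON documents to the file. This avoids the need to calculate an overall byte count, but the result may not be considered valid BSON; some BSON readers will only load the first document, see Insert multiple BSonDocuments from file into MongoDB.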

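Update

Based on BsonBinaryWriter, I have created a helper method that incrementally serializes an enumerable to a stream for which Stream.CanSeek == true. It does not need to cache the entire BSON document in memory; instead it seeks back to the start of the stream to write the final byte count. Since Json.NET is written in C# and my primary language is C#, this is in C# as well. If you need it converted to VB.NET, let me know and I can try.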

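using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;
using Newtonsoft.Json.Serialization;

public static class BsonExtensions 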
{ 
    public static void SerializeEnumerable<T>(IEnumerable<T> enumerable, Stream stream, JsonSerializerSettings settings = null) 
    { 
     // Adapted from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonBinaryWriter.cs 
     if (enumerable == null || stream == null) 
      throw new ArgumentNullException("enumerable == null || stream == null"); 
     if (!stream.CanSeek || !stream.CanWrite) 
      throw new ArgumentException("!stream.CanSeek || !stream.CanWrite"); 

     var serializer = JsonSerializer.CreateDefault(settings); 
     var contract = serializer.ContractResolver.ResolveContract(typeof(T)); 
     BsonType rootType; 
     if (contract is JsonObjectContract) 
      rootType = BsonType.Object; 
     else if (contract is JsonArrayContract) 
      rootType = BsonType.Array; 
     else 
      throw new ArgumentException(string.Format("\"{0}\" maps to neither a BSON object nor a BSON array", typeof(T).FullName)); 

     stream.Flush(); // Just in case. 
     var initialPosition = stream.Position; 
     var writer = new BinaryWriter(stream); // Do NOT dispose, leave the incoming Stream open for the caller to dispose if desired. 

     writer.Write((int)0); // Placeholder for the document size; overwritten below once the final size is known. 

     ulong index = 0; 
     var buffer = new byte[256]; 
     foreach (var item in enumerable) 
     { 
      writer.Write((sbyte)rootType); 
      WriteString(writer, index.ToString(CultureInfo.InvariantCulture), buffer); 
      using (var bsonWriter = new BsonWriter(writer) { CloseOutput = false }) 
      { 
       serializer.Serialize(bsonWriter, item); 
      } 
      index++; 
     } 

     writer.Write((byte)0); 
     writer.Flush(); 

     var finalPosition = stream.Position; 
     stream.Position = initialPosition; 
     writer.Write(checked((int)(finalPosition - initialPosition))); 
     stream.Position = finalPosition; 
    } 

    private static readonly Encoding Encoding = new UTF8Encoding(false); 

    private static void WriteString(BinaryWriter writer, string s, byte[] buffer) 
    { 
     if (s != null) 
     { 
      if (s.Length < buffer.Length/Encoding.GetMaxByteCount(1)) 
      { 
       var byteCount = Encoding.GetBytes(s, 0, s.Length, buffer, 0); 
       writer.Write(buffer, 0, byteCount); 
      } 
      else 
      { 
       byte[] bytes = Encoding.GetBytes(s); 
       writer.Write(bytes); 
      } 
     } 

     writer.Write((byte)0); 
    } 
} 

internal enum BsonType : sbyte 
{ 
    // Taken from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonType.cs 
    Number = 1, 
    String = 2, 
    Object = 3, 
    Array = 4, 
    Binary = 5, 
    Undefined = 6, 
    Oid = 7, 
    Boolean = 8, 
    Date = 9, 
    Null = 10, 
    Regex = 11, 
    Reference = 12, 
    Code = 13, 
    Symbol = 14, 
    CodeWScope = 15, 
    Integer = 16, 
    TimeStamp = 17, 
    Long = 18, 
    MinKey = -1, 
    MaxKey = 127 
} 

You can use this to serialize to a local FileStream or a MemoryStream, but not to, say, a DeflateStream, which cannot be repositioned.
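As a usage sketch, the following round-trips a lazily generated sequence through the BsonExtensions.SerializeEnumerable helper above. It assumes that helper and the Json.NET package are available in the same project. The C# Region class is a hypothetical counterpart of the question's VB.NET class, and BsonRoundTripDemo, GenerateRegions and RoundTrip are names invented for this example. Because the helper writes the items as elements of a single root document, the reader is told to treat the root as an array via ReadRootValueAsArray.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

// Hypothetical C# counterpart of the question's VB.NET Region class.
public class Region
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string FDS_Id { get; set; }
}

// Illustration only: assumes BsonExtensions from the answer is in the same project.
public static class BsonRoundTripDemo
{
    // Yields regions one at a time, so the full list never has to exist in memory.
    public static IEnumerable<Region> GenerateRegions(int count)
    {
        for (var i = 0; i < count; i++)
            yield return new Region { Id = i, Name = i.ToString(), FDS_Id = i.ToString() };
    }

    public static List<Region> RoundTrip(int count)
    {
        // MemoryStream is seekable; a FileStream could be used the same way.
        using (var stream = new MemoryStream())
        {
            BsonExtensions.SerializeEnumerable(GenerateRegions(count), stream);

            // Rewind and read the root document back as an array.
            stream.Position = 0;
            using (var reader = new BsonReader(stream) { ReadRootValueAsArray = true })
            {
                return JsonSerializer.CreateDefault().Deserialize<List<Region>>(reader);
            }
        }
    }

    public static void Main()
    {
        var regions = RoundTrip(1000);
        Console.WriteLine(regions.Count);
    }
}
```

Reading everything back into a List is only done here to check the round trip; for a file with millions of records the reading side would need its own streaming approach.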

+0

@Liam - answer updated with a possible solution. – dbc