Kryo serialization in Chronicle Map - slow byte reading

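I use Chronicle Map extensively in Scala and recently decided to try Kryo serialization. I added custom marshallers (code below). This reduced the size of my store by 14G (about 62%) and everything still works, but the speed is unacceptable.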

I created a sample use case and did a few runs on the same data:

[Using kryo] took 6883, and then 7187, 6954, 7225, 13051 
[Not using kryo] took 2326, and then 1352, 1493, 1500, 1187

So it is several times slower. Here is the reading marshaller:

class KryoMarshallerReader[T] extends BytesReader[T] {
  val kryo = // Reference to KryoPool from Twitter's Chill library

  override def read(in: Bytes[_], using: T): T = {
    val bytes = benchmark("array allocation") {
      new Array[Byte](in.readRemaining().toInt)
    }

    benchmark("reading bytes") {
      in.read(bytes)
    }

    benchmark("deserialising") {
      kryo.fromBytes(bytes).asInstanceOf[T]
    }
  }

  override def readMarshallable(wire: WireIn): Unit = {}

  override def writeMarshallable(wire: WireOut): Unit = {}
}

I then averaged the execution times of these three stages (all in milliseconds) and realised that reading the bytes is the slowest:

                stage  Average time (ms)
               (fctr)              (dbl)
1  [array allocation]          0.9432907
2     [deserialising]          0.9944112
3     [reading bytes]         13.2367265

Now the question is: what am I doing wrong?

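I looked at the interface of Bytes[_] and it looks like it reads bytes one by one. Is there a way to use a buffer or something magical that is capable of bulk loading?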

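Update: Eventually I replaced array allocation + reading bytes with in.toByteArray, but it is still slow, because it copies the bytes one by one. Simply running reads on the map shows that reading the bytes is the bottleneck.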

2016-11-29 Anton

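The number of bytes passed to BytesReader.read(), in.readRemaining(), is not the size of the serialized form of the object; it is more than that. The serialized form of your object is guaranteed to start at in.readPosition(), but it usually ends earlier than in.readLimit() (since readRemaining() = readLimit() - readPosition()). In general, a BytesReader/BytesWriter pair of implementations should take care of determining the end of the object's bytes itself (if it needs to); e.g. see the implementation of CharSequenceArrayBytesMarshaller in the BytesReader and BytesWriter section of the Chronicle Map tutorial: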

public final class CharSequenceArrayBytesMarshaller
        implements BytesWriter<CharSequence[]>, BytesReader<CharSequence[]> {
    ...

    @Override
    public void write(Bytes out, @NotNull CharSequence[] toWrite) {
        out.writeInt(toWrite.length); // care about writing the size ourselves!
        ...
    }

    @NotNull
    @Override
    public CharSequence[] read(Bytes in, @Nullable CharSequence[] using) {
        int len = in.readInt(); // care about reading the size ourselves!
        ...
    }
}
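
The point about the limit lying beyond the end of the object can be illustrated without Chronicle Map at all. The following is only a minimal sketch: java.nio.ByteBuffer stands in for Bytes (an assumption made so the example runs on the JDK alone), and LengthPrefixedDemo is a hypothetical name. The writer stores the payload length first, so the reader consumes exactly one value instead of everything up to the limit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ByteBuffer is a stand-in for Chronicle's Bytes.
final class LengthPrefixedDemo {

    // Write the payload preceded by its length, so the reader knows where it ends.
    static void write(ByteBuffer out, String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        out.putInt(payload.length);
        out.put(payload);
    }

    // Read exactly one value: the length prefix, then that many bytes in one bulk call.
    static String read(ByteBuffer in) {
        int len = in.getInt();
        byte[] payload = new byte[len];
        in.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        write(buf, "hello");
        buf.flip();
        // Simulate a read limit far beyond the end of the object.
        buf.limit(buf.capacity());
        // remaining() reports the whole region, not the size of the value.
        System.out.println("remaining before read: " + buf.remaining());
        System.out.println("value: " + read(buf));
        System.out.println("bytes actually consumed: " + buf.position());
    }
}
```

Allocating an array of remaining() bytes here would copy the whole 1024-byte region, although the value itself occupies only 9 bytes (a 4-byte length plus 5 bytes of payload).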

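But since you are implementing Kryo serialization, which should be conceptually similar to standard Java serialization, you should take the sources of SerializableReader and SerializableDataAccess and modify them to use Kryo instead of standard Java serialization (but note that those sources are LGPLv3-licensed). In particular, those implementations use Bytes.inputStream() and Bytes.outputStream() to bridge to standard Java serialization, which doesn't know about Bytes but does know about InputStream/OutputStream, without copying the bytes unnecessarily. I'm pretty sure Kryo supports InputStream/OutputStream as well.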
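
The stream-bridging idea can be sketched with the JDK alone. This is an illustration under stated assumptions, not the SerializableReader source: a ByteArrayInputStream over a larger array stands in for the stream that Bytes.inputStream() would provide, standard Java serialization stands in for Kryo, and StreamBridgeDemo is a hypothetical name. The deserializer reads straight from the stream and stops at the end of the object, so no intermediate array of readRemaining() bytes is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; the streams are stand-ins for Bytes.inputStream()/outputStream().
final class StreamBridgeDemo {

    // Serialize a value through an OutputStream-based serializer.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Deserialize one object directly from the stream. The stream is not closed here
    // because it belongs to the caller.
    static Object deserialize(InputStream in) {
        try {
            ObjectInputStream ois = new ObjectInputStream(in);
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] serialized = serialize("hello");
        // The readable region is larger than the object: 100 unrelated trailing bytes.
        byte[] region = Arrays.copyOf(serialized, serialized.length + 100);
        Object value = deserialize(new ByteArrayInputStream(region));
        System.out.println(value);
    }
}
```

With Kryo, the analogous step would presumably be to wrap the stream in Kryo's Input class, which has a constructor taking an InputStream; that variant is not shown here because it needs the Kryo and Chronicle libraries on the classpath.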

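Be very careful about keeping kryo as an instance field of any serializer interface (BytesReader in your case) without implementing StatefulCopyable. You may easily introduce either a concurrency bottleneck or a concurrency bug (a data race). Check the Understanding StatefulCopyable section in the Chronicle Map tutorial and the Chronicle Map custom serialization checklist.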
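
The reason for the warning can be shown with a small sketch. Everything in it is hypothetical: Copyable is a locally defined stand-in for the idea behind StatefulCopyable, not Chronicle Map's actual interface, and ScratchReader plays the role of a serializer that holds mutable state, such as a Kryo instance. Giving each user its own copy avoids both locking on a shared instance and racing on it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the idea behind StatefulCopyable; NOT Chronicle Map's interface.
interface Copyable<T> {
    T copy();
}

// Hypothetical reader with mutable state, comparable to a serializer holding a Kryo instance.
final class ScratchReader implements Copyable<ScratchReader> {
    // Mutable scratch buffer: two threads sharing one instance would race on it.
    private byte[] scratch = new byte[16];

    // Copy the requested slice into the scratch buffer and decode it as UTF-8.
    String decode(byte[] src, int offset, int len) {
        if (scratch.length < len) {
            scratch = new byte[len];
        }
        System.arraycopy(src, offset, scratch, 0, len);
        return new String(scratch, 0, len, StandardCharsets.UTF_8);
    }

    // Each caller (for example, each thread) gets an instance with fresh, unshared state.
    @Override
    public ScratchReader copy() {
        return new ScratchReader();
    }

    // Exposed only so the demo can show that copies do not share the buffer.
    boolean sharesStateWith(ScratchReader other) {
        return this.scratch == other.scratch;
    }

    public static void main(String[] args) {
        ScratchReader original = new ScratchReader();
        ScratchReader perThread = original.copy();
        byte[] data = "xxhelloyy".getBytes(StandardCharsets.UTF_8);
        System.out.println(perThread.decode(data, 2, 5));
        System.out.println("shares state: " + original.sharesStateWith(perThread));
    }
}
```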

2016-12-01 07:33:31 leventov
