2016-01-21 18 views
0
Spark application java.lang.OutOfMemoryError: Direct buffer memory

    I am using the following runtime Spark configuration values:

    spark-submit --executor-memory 8G --spark.yarn.executor.memoryOverhead 2G

    But it still raises the following out-of-memory error:

    I have a pairRDD with 8362269460 rows, and the partition size is 128. The error is raised when running pairRDD.groupByKey.saveAsTextFile. Any clue?

    Update: I added a filter, and the data is now 2300000000 rows. Running in the spark shell, there is no error. My cluster: 19 datanodes, 1 namenode

        Min Resources: <memory:150000, vCores:150>
        Max Resources: <memory:300000, vCores:300>
    

    Thanks for your help.

    org.apache.spark.shuffle.FetchFailedException: java.lang.OutOfMemoryError: Direct buffer memory 
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:321) 
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:306) 
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51) 
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) 
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) 
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) 
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) 
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:132) 
        at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60) 
        at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:89) 
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
        at org.apache.spark.scheduler.Task.run(Task.scala:88) 
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
        at java.lang.Thread.run(Thread.java:745) 
    Caused by: io.netty.handler.codec.DecoderException: Direct buffer memory 
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:234) 
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) 
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) 
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) 
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) 
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) 
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
        ... 1 more 
    Caused by: java.lang.OutOfMemoryError: Direct buffer memory 
        at java.nio.Bits.reserveMemory(Bits.java:658) 
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
        at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:651) 
        at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237) 
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:215) 
        at io.netty.buffer.PoolArena.reallocate(PoolArena.java:358) 
        at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:121) 
        at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) 
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) 
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) 
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) 
        at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92) 
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:228) 
        ... 10 more 
    ) 
    

    I would like to know how to configure the direct memory size correctly. Best regards

    +1

    Please format your question properly and give it some context – manRo

    +0

    @ssyue -XX:MaxDirectMemorySize –

    +0

    @manRo Sorry, English is my weak point. – ssyue

    Answers

    2

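    I don't know any details of the Spark application, but I found the memory configuration here. You need to set -XX:MaxDirectMemorySize in the same way as any other JVM memory setting (via -XX:). Try using spark.executor.extraJavaOptions.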

    If you are using spark-submit, you can use:

    ./bin/spark-submit --name "My app" ... 
        --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:MaxDirectMemorySize=512m" myApp.jar 
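
For reference, here is a hedged sketch of how the settings from the question could be combined with this option. spark-submit takes arbitrary Spark properties as `--conf key=value`, so `--spark.yarn.executor.memoryOverhead 2G` as written in the question is not an option it recognizes. In Spark 1.x, `spark.yarn.executor.memoryOverhead` is documented as a number of megabytes, so 2G becomes 2048. The `MaxDirectMemorySize` value of 2g and the helper name `submit_command` are illustrative assumptions, not tested recommendations, and the script only prints the command (a dry run) rather than launching a job.

```shell
#!/bin/sh
# Values taken from the question.
EXECUTOR_MEMORY="8G"
MEMORY_OVERHEAD_MB=2048      # 2G expressed in megabytes
# Assumption: illustrative direct-memory cap, tune for your workload.
MAX_DIRECT_MEMORY="2g"

# Hypothetical helper: prints the spark-submit command on one line.
submit_command() {
  printf '%s' "./bin/spark-submit"
  printf ' %s' "--executor-memory $EXECUTOR_MEMORY"
  printf ' %s' "--conf spark.yarn.executor.memoryOverhead=$MEMORY_OVERHEAD_MB"
  printf ' %s' "--conf \"spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=$MAX_DIRECT_MEMORY\""
  printf ' %s\n' "myApp.jar"
}

# Dry run: show the command instead of executing it.
submit_command
```

Whether raising the direct memory limit is enough depends on the job. A groupByKey over billions of rows can still produce very large shuffle blocks, so this is only a starting point.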
    
    +0

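    But this memory error more likely means that your application has a memory problem, for example that you read the entire stream content into a memory buffer –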

    +0

    I'll give it a try. Thanks. – ssyue

    +0

    It doesn't work. Any other solutions? – ssyue