
hadoop for spark: increase the number of partitions

Every time I try to load a huge Elasticsearch index using the hadoop-2.6.0-for-spark connector, I get the error below.

I am running Spark on YARN.
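
For context, this is roughly what such a load looks like. A minimal sketch, not the actual code from the question: the node address, the index name "myindex/mytype", and the use of the esRDD API from elasticsearch-hadoop are assumptions.

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // elasticsearch-hadoop: adds esRDD to SparkContext

val conf = new SparkConf()
  .setAppName("es-load")
  .set("es.nodes", "es-host:9200") // hypothetical Elasticsearch node

val sc = new SparkContext(conf)

// Read the whole index as an RDD of (documentId, document) pairs.
// elasticsearch-hadoop 2.x creates one Spark partition per Elasticsearch
// shard, so a huge index with few shards yields very large partitions.
val esRdd = sc.esRDD("myindex/mytype")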

15/09/30 17:57:21 ERROR shuffle.OneForOneBlockFetcher: Failed while starting block fetches 
java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE 
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836) 
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125) 
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113) 
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206) 
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127) 
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134) 
at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:512) 
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:302) 
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57) 
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57) 
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) 
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) 
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) 
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57) 
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114) 
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87) 
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101) 
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) 
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) 
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) 
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) 
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) 
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) 
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) 
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) 
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) 
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) 
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244) 
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) 
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) 
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) 
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) 
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) 
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
at java.lang.Thread.run(Thread.java:745) 

at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162) 
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103) 
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) 
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) 

So far, the only solution I have seen is to increase the number of partitions, but how on earth do I do that with hadoop-2.6.0-for-spark?
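
Presumably something like the following would split the data on the Spark side (esRdd and the count of 400 are placeholders carried over from the sketch above), but repartition() only kicks in after the documents have already been read from the shards:

// repartition() shuffles the loaded RDD into more, smaller partitions,
// so that each cached or shuffled block stays well below 2 GB.
val smaller = esRdd.repartition(400)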

Any ideas?

Answer


I eventually solved this by increasing the executor memory. The exception seems to be thrown when Spark tries to cache an RDD to disk and the block being cached is too large: as the stack trace shows, DiskStore memory-maps the file via FileChannelImpl.map, which cannot map more than Integer.MAX_VALUE bytes (2 GB).
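
For reference, a minimal sketch combining this fix with the repartitioning idea from the question. The 8g value, the partition count, and the storage level are illustrative assumptions, not details from the original answer:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel

// More executor memory makes it less likely that cached blocks spill to
// disk at all (same effect as spark-submit --executor-memory 8g).
val conf = new SparkConf().set("spark.executor.memory", "8g")

// Independently, repartitioning before persisting keeps any block that
// does reach disk under the 2 GB FileChannelImpl.map limit from the trace.
esRdd.repartition(400).persist(StorageLevel.MEMORY_AND_DISK_SER)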