2015-08-18

Hi, the GC overhead limit error is driving me crazy. I have 20 executors using 25 GB each, and I simply don't understand how it can hit the GC overhead limit; my dataset is not that large either. Once this GC error occurs in one executor, that executor is lost, and the other executors are then slowly lost as well with IOException, "Rpc client disassociated", and "shuffle not found" errors. Please help me with this problem; I'm new to Spark, so this is very frustrating. Thanks in advance. In short: Spark executors are lost with "GC overhead limit exceeded" even with 20 executors of 25 GB each.

WARN scheduler.TaskSetManager: Lost task 7.0 in stage 363.0 (TID 3373, myhost.com): java.lang.OutOfMemoryError: GC overhead limit exceeded 
      at org.apache.spark.sql.types.UTF8String.toString(UTF8String.scala:150) 
      at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(rows.scala:120) 
      at org.apache.spark.sql.columnar.STRING$.actualSize(ColumnType.scala:312) 
      at org.apache.spark.sql.columnar.compression.DictionaryEncoding$Encoder.gatherCompressibilityStats(compressionSchemes.scala:224) 
      at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.gatherCompressibilityStats(CompressibleColumnBuilder.scala:72) 
      at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.appendFrom(CompressibleColumnBuilder.scala:80) 
      at org.apache.spark.sql.columnar.NativeColumnBuilder.appendFrom(ColumnBuilder.scala:87) 
      at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:148) 
      at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:124) 
      at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277) 
      at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) 
      at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) 
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:242) 
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) 
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) 
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) 
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
      at org.apache.spark.scheduler.Task.run(Task.scala:70) 
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 

Answer


"GC overhead limit exceeded" is thrown when the JVM spends more than about 98% of its time in garbage collection while recovering very little heap. It often shows up when Scala code leans heavily on immutable data structures, because every transformation forces the JVM to allocate a new object and garbage-collect the previous one. So if that is your problem, try using mutable data structures in the hot path; a sketch of the difference follows.
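As a rough illustration of that point (a generic Scala sketch, not code from your job): building a large collection through repeated immutable copies produces far more short-lived garbage than filling one mutable buffer in place.

    import scala.collection.mutable.ArrayBuffer

    // Immutable style: every :+ returns a new Vector (with structural sharing),
    // so the loop allocates many short-lived objects for the GC to reclaim.
    def buildImmutable(n: Int): Vector[Int] = {
      var acc = Vector.empty[Int]
      var i = 0
      while (i < n) { acc = acc :+ i; i += 1 }
      acc
    }

    // Mutable style: a single ArrayBuffer is grown in place,
    // producing far less garbage for the same result.
    def buildMutable(n: Int): ArrayBuffer[Int] = {
      val buf = new ArrayBuffer[Int](n)
      var i = 0
      while (i < n) { buf += i; i += 1 }
      buf
    }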

Please read this page to learn how to tune GC: http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning. A minimal sketch of how to pass GC options to the executors follows.
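For example, GC options reach the executor JVMs through spark.executor.extraJavaOptions. A minimal sketch, assuming you set them in the driver program (the flags shown here just turn on GC logging, which is the starting point the tuning guide suggests; the right collector and heap settings depend on your workload):

    import org.apache.spark.{SparkConf, SparkContext}

    // Forward GC logging flags to every executor JVM so you can see
    // how often and how long collections run before changing settings.
    val conf = new SparkConf()
      .setAppName("gc-tuning-example")
      .set("spark.executor.extraJavaOptions",
           "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")

    val sc = new SparkContext(conf)

The resulting GC logs appear in each executor's stdout on the worker nodes, which you can then use to decide whether to resize the heap or change collectors.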