
I have a Spark standalone setup (v1.4.1) with 3 workers, and the Apache Spark worker executors keep exiting with exit status 1.

I have an application that reads a stream from a Kafka topic, processes the data, and writes it to another Kafka topic.

Last night the application went down and all the workers crashed.

The worker log reports the following:

16/02/04 21:02:10 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=52180" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54330" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:52180/user/CoarseGrainedScheduler" "--executor-id" "24279" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160201182749-0007" "--worker-url" "akka.tcp://[email protected]:57853/user/Worker" 
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24279/stdout with daily rolling 
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24279/stderr with daily rolling 
16/02/04 21:02:10 INFO Worker: Executor app-20160129184621-0001/1430 finished with state EXITED message Command exited with code 1 exitStatus 1 
16/02/04 21:02:10 INFO Worker: Asked to launch executor app-20160129184621-0001/1431 for stream-elaboration 
16/02/04 21:02:10 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=57297" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54326" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:57297/user/CoarseGrainedScheduler" "--executor-id" "1431" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160129184621-0001" "--worker-url" "akka.tcp://[email protected]:57853/user/Worker" 
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1431/stdout with daily rolling 
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1431/stderr with daily rolling 
16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24279 finished with state EXITED message Command exited with code 1 exitStatus 1 
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160201182749-0007/24280 for stream-elaboration 
16/02/04 21:02:11 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=52180" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54330" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:52180/user/CoarseGrainedScheduler" "--executor-id" "24280" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160201182749-0007" "--worker-url" "akka.tcp://[email protected]:57853/user/Worker" 
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24280/stdout with daily rolling 
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24280/stderr with daily rolling 
16/02/04 21:02:11 INFO Worker: Executor app-20160129184621-0001/1431 finished with state EXITED message Command exited with code 1 exitStatus 1 
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160129184621-0001/1432 for stream-elaboration 
16/02/04 21:02:11 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=57297" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54326" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:57297/user/CoarseGrainedScheduler" "--executor-id" "1432" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160129184621-0001" "--worker-url" "akka.tcp://[email protected]:57853/user/Worker" 
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1432/stdout with daily rolling 
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1432/stderr with daily rolling 
16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24280 finished with state EXITED message Command exited with code 1 exitStatus 1 
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160201182749-0007/24281 for stream-elaboration 

At the end of the log:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp291507283-42" 
Exception in thread "qtp291507283-37" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "ExecutorRunner for app-20160201182749-0007/29488" java.lang.OutOfMemoryError: GC overhead limit exceeded 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkWorker-scheduler-1" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "qtp291507283-38" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "JMX server connection timeout 81" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "JMX server connection timeout 81" 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkWorker-10" 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp291507283-40" 
Exception in thread "qtp291507283-35" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 

Exception in thread "qtp291507283-39" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "qtp291507283-41" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "qtp291507283-36" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 

.... 

Running:

ps aux | grep "worker" 

the process is still alive, but I can't see it in the Spark UI.

Why do the worker executors keep restarting so frequently?

Answers

1

The log shows multiple java.lang.OutOfMemoryError: GC overhead limit exceeded messages, which means your executors are throwing an error that causes them to exit.

This error indicates that your program is spending too much time in GC (see more details here). To solve it, you can try one of these paths:

  • The brute-force way is to disable that safeguard by adding -XX:-UseGCOverheadLimit to your executors' JVM options, but it may leave your application doing mostly GC and therefore running very slowly
  • Profile your job's memory usage and optimize it - your code may be consuming more memory than it needs, putting heavy pressure on the GC
  • Tune your memory settings - for example, if you can give the executors more heap space, the pressure may drop (see the sketch after this list)
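As a minimal sketch of the last two bullets, assuming the application builds its own SparkConf (the app name is taken from the worker log above, and the memory value and batch interval are purely illustrative), the executor heap and JVM options could be raised like this:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative sketch only: give each executor more heap (the logs show -Xmx1024M)
// and, only as a last resort, relax the GC overhead safeguard from the first bullet.
object StreamElaboration {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("stream-elaboration")
      .set("spark.executor.memory", "2g")
      .set("spark.executor.extraJavaOptions", "-XX:-UseGCOverheadLimit")
    val ssc = new StreamingContext(conf, Seconds(10))
    // ... create the Kafka input stream, elaborate it, and write to the output topic ...
    ssc.start()
    ssc.awaitTermination()
  }
}

The same settings can also be passed at submit time, e.g. spark-submit --executor-memory 2g --conf "spark.executor.extraJavaOptions=-XX:-UseGCOverheadLimit" ..., so the code itself does not have to change.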
0

You essentially don't have enough memory to run this process smoothly. Options that come to mind:

  1. Specify more memory. As you mentioned, try something like -Xmx512m or higher.
  2. Debug your code to look for possible memory leaks.
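As a hedged sketch of point 2, assuming you can change the executor JVM options (the dump directory below is an assumption), asking the executors to write a heap dump on OutOfMemoryError gives you something concrete to inspect for leaks; this would be added where the SparkConf is built:

import org.apache.spark.SparkConf

// Illustrative only: capture a heap dump when an executor hits an OutOfMemoryError,
// so the dominant allocations can be inspected offline (e.g. with jmap or Eclipse MAT).
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
       "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp")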

Why do the worker executors keep restarting so frequently?

You are using Spark Streaming, which is built on top of Spark, so it has the same fault tolerance for worker nodes; this means that if a worker goes down because of some unexpected error, as in your case, the Spark engine will try to restart it.