2017-07-10

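Error (application returned with exit code 1) when running Spark in YARN cluster mode

I have a Spark job that keeps returning with exit code 1, and I cannot figure out what this particular exit code means or why the application returns with it. This is what I see in the NodeManager logs: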

2017-07-10 07:54:03,839 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1499673023544_0001_01_000001 and exit code: 1 
ExitCodeException exitCode=1: 
     at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) 
     at org.apache.hadoop.util.Shell.run(Shell.java:456) 
     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) 
     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) 
     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1499673023544_0001_01_000001 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=1: 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at org.apache.hadoop.util.Shell.run(Shell.java:456) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:  at java.lang.Thread.run(Thread.java:745) 
2017-07-10 07:54:03,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1 
2017-07-10 07:54:03,846 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1499673023544_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE 
2017-07-10 07:54:03,846 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1499673023544_0001_01_000001 
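The NodeManager lines above name the failing container and its exit code but not the cause, so the next step is usually to find the matching application and read its container logs. Below is a minimal Python sketch of that lookup. The function names are hypothetical. The container ID layout (`container_<clusterTimestamp>_<appSeq>_<attempt>_<n>`, with an optional `e<NN>` epoch field) is an assumption based on the IDs shown in these logs.

```python
import re

# Matches the NodeManager warning emitted when a container launch fails.
LAUNCH_FAILURE = re.compile(
    r"Exception from container-launch with container ID: "
    r"(?P<container>container_\S+) and exit code: (?P<code>-?\d+)"
)


def failed_containers(log_lines):
    """Return (container_id, exit_code) pairs found in NodeManager log lines."""
    found = []
    for line in log_lines:
        match = LAUNCH_FAILURE.search(line)
        if match:
            found.append((match.group("container"), int(match.group("code"))))
    return found


def application_id(container_id):
    """Derive the application ID from a container ID.

    Assumes the layout container_<clusterTimestamp>_<appSeq>_<attempt>_<n>,
    optionally with an epoch field (container_e<NN>_...) after the prefix.
    """
    parts = container_id.split("_")
    if parts[1].startswith("e"):  # skip the optional epoch field
        parts = [parts[0]] + parts[2:]
    return "application_{}_{}".format(parts[1], parts[2])


if __name__ == "__main__":
    sample = [
        "2017-07-10 07:54:03,839 WARN org.apache.hadoop.yarn.server.nodemanager."
        "DefaultContainerExecutor: Exception from container-launch with container "
        "ID: container_1499673023544_0001_01_000001 and exit code: 1",
    ]
    for container, code in failed_containers(sample):
        print(container, code, application_id(container))
```

With the application ID in hand, `yarn logs -applicationId application_1499673023544_0001` should print the aggregated container logs, provided log aggregation is enabled on the cluster.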

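When I check the logs for the specific application (and its container), they do not show any particular stack trace or error message. This is what I see in the container's log (stderr) when the job terminates: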

INFO impl.ContainerManagementProtocolProxy: Opening proxy : myplayground:52311 
17/07/10 07:54:02 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. myplayground:36322 
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:49562/user/Executor#509101946]) with ID 1 
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 
17/07/10 07:54:03 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 
17/07/10 07:54:03 ERROR yarn.ApplicationMaster: User application exited with status 1 
17/07/10 07:54:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1) 
17/07/10 07:54:03 INFO spark.SparkContext: Invoking stop() from shutdown hook 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 
17/07/10 07:54:03 INFO ui.SparkUI: Stopped Spark web UI at http://x.x.x.x:37961 
17/07/10 07:54:03 INFO scheduler.DAGScheduler: Stopping DAGScheduler 
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors 
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down 
17/07/10 07:54:03 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
17/07/10 07:54:03 INFO storage.MemoryStore: MemoryStore cleared 
17/07/10 07:54:03 INFO storage.BlockManager: BlockManager stopped 
17/07/10 07:54:03 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 
17/07/10 07:54:03 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
17/07/10 07:54:03 INFO spark.SparkContext: Successfully stopped SparkContext 
17/07/10 07:54:03 INFO util.ShutdownHookManager: Shutdown hook called 
17/07/10 07:54:03 INFO util.ShutdownHookManager: Deleting directory /tmp/Hadoop-hadoop/nm-local-dir/usercache/myprdusr/appcache/application_1499673023544_0001/spark-2adeda9f-9244-4519-b87f-ec895a50cfcd 
17/07/10 07:54:03 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
17/07/10 07:54:03 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 

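So in both logs, all I can see is that the application exited with exit code 1. Can anyone tell me what this particular exit code means and what the likely reasons are for YARN throwing this exception?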

Answer


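I was finally able to solve this problem. The bash script from which I invoke spark-submit was passing it an invalid argument. When the job starts, a script called launch_container.sh executes org.apache.spark.deploy.yarn.ApplicationMaster with the arguments that were passed to spark-submit, and ApplicationMaster returns with an exit code of 1 when any of those arguments is invalid.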

More information here
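Since the root cause was an invalid argument passed through a wrapper script, one way to catch it earlier is to check the arguments before calling spark-submit. The following is only a sketch in Python, not the original bash script. The jar name, class name, application arguments, and the emptiness check itself are hypothetical; a real job would validate whatever its own arguments require.

```python
import subprocess


def validate_app_args(app_args):
    """Reject arguments that are empty or whitespace-only.

    Illustrative assumption: an unset variable expanding to an empty
    string is one way a wrapper script can pass a bad argument.
    """
    for position, arg in enumerate(app_args):
        if not isinstance(arg, str) or not arg.strip():
            raise ValueError(
                "argument {} is empty or not a string: {!r}".format(position, arg)
            )
    return list(app_args)


def build_submit_command(app_jar, main_class, app_args):
    """Build a spark-submit command line for YARN cluster mode."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--class", main_class,
        app_jar,
    ] + validate_app_args(app_args)


def submit(command):
    """Run spark-submit and return its exit code so the caller can log it."""
    completed = subprocess.run(command)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical jar, class, and arguments, for illustration only.
    cmd = build_submit_command(
        "my-job.jar", "com.example.MyJob", ["2017-07-10", "/data/in"]
    )
    print(" ".join(cmd))
```

Printing the full command before submitting it makes a bad argument visible on the client side, where it is easier to spot than in the container logs.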
