2016-10-27

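Spark-submit from inside a Docker container to a YARN cluster

I have Spark 1.6.1 installed in a Docker container. I can run my Spark Python application locally, but when I try to submit it to a YARN cluster outside my host (spark-submit --master yarn myapp.py), it stays in the ACCEPTED state. The stderr log of my application shows the following: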

16/10/26 11:07:25 INFO ApplicationMaster: Waiting for Spark driver to be  reachable. 
16/10/26 11:08:28 ERROR ApplicationMaster: Failed to connect to driver at 172.18.0.4:50229, retrying ... 
16/10/26 11:09:31 ERROR ApplicationMaster: Failed to connect to driver at 172.18.0.4:50229, retrying ... 
16/10/26 11:09:32 ERROR ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Failed to connect to driver! 
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:501) 
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:362) 
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:204) 
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:672) 
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) 
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) 
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) 
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:670) 
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:697) 
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala) 

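The driver is at 172.18.0.4:50229, which is my container. Since my container runs on a host whose IP is 10.xx.xx.xx, it makes sense that the cluster cannot reach it. How can I tell Spark to connect to the host instead of the container? Or does anyone have a solution for this?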

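PS: I checked the following link: Making spark use /etc/hosts file for binding in YARN cluster mode, which is very similar to my problem. But according to the corresponding Spark issue, it will not be fixed.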

Answer


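So, to answer my own question: I had to run my container on the host network. If you are behind a proxy, take care to use the correct virtual interface (eth1) for both SPARK_LOCAL_IP (environment variable) and spark.driver.host (configuration option).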
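
A minimal sketch of how those two settings could be passed to a client-mode submit. The helper name build_spark_submit and the address 10.0.0.5 are placeholders invented for illustration. The real value should be the address of the host interface (eth1 above) that the YARN nodes can reach.

```python
import os
import shlex


def build_spark_submit(driver_ip, app="myapp.py"):
    """Return (argv, env) for a client-mode submit that advertises driver_ip.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    env = dict(os.environ)
    # SPARK_LOCAL_IP: the address Spark binds to on this machine.
    env["SPARK_LOCAL_IP"] = driver_ip
    argv = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "client",
        # spark.driver.host: the address the ApplicationMaster connects back to.
        "--conf", "spark.driver.host=" + driver_ip,
        app,
    ]
    return argv, env


# 10.0.0.5 is a placeholder address, not a value from the original post.
argv, env = build_spark_submit("10.0.0.5")
print("SPARK_LOCAL_IP=" + env["SPARK_LOCAL_IP"],
      " ".join(shlex.quote(a) for a in argv))
# To launch for real, one option is: subprocess.call(argv, env=env)
```

Setting both values to the same reachable address keeps the address the driver binds to and the address it advertises to the cluster consistent.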

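The YARN cluster had trouble connecting back to the driver because the container's IP is assigned according to the network the container is attached to.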

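Since the container is on the host network, any service deployed by the container is automatically exposed, with no need to expose or bind ports.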
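
For illustration, starting the container on the host network could look like the sketch below. The helper build_docker_run and the image name my-spark-image are made up for this example, and it assumes a Docker version that accepts the --network flag.

```python
import shlex


def build_docker_run(image, command):
    """Return the argv that runs `command` in `image` on the host network.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "docker", "run", "--rm",
        # Share the host's network stack: the container gets no bridge
        # address of its own, so no port publishing flags are needed.
        "--network", "host",
        image,
    ] + list(command)


# "my-spark-image" is a made-up image name for illustration.
argv = build_docker_run(
    "my-spark-image",
    ["spark-submit", "--master", "yarn", "--deploy-mode", "client", "myapp.py"],
)
print(" ".join(shlex.quote(a) for a in argv))
```

With host networking the driver port opened inside the container is a port on the host itself, which is why the cluster can connect back to it.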

PS: I deploy my application in client mode.