2015-06-17

Running a JAR in Hadoop on Google Cloud using yarn-client

I want to run a JAR on Hadoop on Google Cloud using yarn-client. I am on the Hadoop master node and use this command:

spark-submit --class find --master yarn-client find.jar

but it returns this error:

15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/10.240.180.15:8032 
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/10.240.180.15:8032. Already tried 0 
time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 

What is the problem? In case it is useful, here is my yarn-site.xml:

<?xml version="1.0" ?> 
<!-- 
    <configuration> 
     <!-- Site specific YARN configuration properties --> 
     <property> 
     <name>yarn.nodemanager.remote-app-log-dir</name> 
     <value>/yarn-logs/</value> 
     <description> 
      The remote path, on the default FS, to store logs. 
     </description> 
     </property> 
     <property> 
     <name>yarn.nodemanager.aux-services</name> 
     <value>mapreduce_shuffle</value> 
     </property> 
     <property> 
     <name>yarn.resourcemanager.hostname</name> 
     <value>hadoop-m-on8g</value> 
     </property> 
     <property> 
     <name>yarn.nodemanager.resource.memory-mb</name> 
     <value>5999</value> 
     <description> 

Answer

In your case, it looks like the YARN ResourceManager may be unhealthy for an unknown reason; you can try to fix YARN with the following:

sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh 
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh 
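If the restart works, the ResourceManager should come back up listening on port 8032. As a quick sanity check before retrying spark-submit, you can probe the RPC port from the master; this is just a sketch using bash's /dev/tcp redirection, with the hostname and port taken from the error log in the question:

```shell
# check_port HOST PORT - print "open" if a TCP connection succeeds, else "closed".
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# The ResourceManager RPC address from the error log above:
check_port hadoop-m-on8g 8032
```

If this still prints "closed" after a restart, look at the ResourceManager log files under the Hadoop install's logs directory for the actual failure.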

However, it looks like you're using the Click-to-Deploy solution; due to some bugs and insufficient memory configuration, Click-to-Deploy's Spark + Hadoop 2 deployment doesn't actually currently support Spark on YARN. You'd typically run into something like this if you just try to run with --master yarn-client out of the box:

15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
    appMasterRpcPort: -1 
    appStartTime: 1434561664937 
    yarnAppState: ACCEPTED 

15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
    appMasterRpcPort: -1 
    appStartTime: 1434561664937 
    yarnAppState: ACCEPTED 

15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
    appMasterRpcPort: 0 
    appStartTime: 1434561664937 
    yarnAppState: RUNNING 

15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED 
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null} 
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null} 

The well-supported way to deploy a cluster on Google Compute Engine that uses Hadoop 2 and is configured to be able to run Spark on YARN is to use bdutil. You'd run something like:

./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d \ 
    -e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh 
./bdutil -e my_custom_env.sh deploy 

# Shorthand for logging in to the master 
./bdutil -e my_custom_env.sh shell 

# Handy way to run a socks proxy to make it easy to access the web UIs 
./bdutil -e my_custom_env.sh socksproxy 

# When done, delete your cluster 
./bdutil -e my_custom_env.sh delete 
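Once a bdutil cluster deployed with spark_on_yarn_env.sh is up, you'd log in to the master and submit the same jar as in the question; the jar path on the master is wherever you copy it to (assumed here to be the login user's home directory):

```shell
# Log in to the master node
./bdutil -e my_custom_env.sh shell

# Then, on the master, submit the jar from the question:
spark-submit --class find --master yarn-client find.jar
```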

With spark_on_yarn_env.sh, Spark should default to yarn-client, though you can always re-specify --master yarn-client if you want. You can see a more detailed description of the flags available in bdutil with ./bdutil --help. Here are the help entries for the flags I included above:

-b, --bucket 
    Google Cloud Storage bucket used in deployment and by the cluster. 

-d, --use_attached_pds 
    If true, uses additional non-boot volumes, optionally creating them on 
    deploy if they don't exist already and deleting them on cluster delete. 

-e, --env_var_files 
    Comma-separated list of bash files that are sourced to configure the cluster 
    and installed software. Files are sourced in order with later files being 
    sourced last. bdutil_env.sh is always sourced first. Flag arguments are 
    set after all sourced files, but before the evaluate_late_variable_bindings 
    method of bdutil_env.sh. see bdutil_env.sh for more information. 

-P, --prefix 
    Common prefix for cluster nodes. 

-p, --project 
    The Google Cloud Platform project to use to create the cluster. 

-z, --zone 
    Specify the Google Compute Engine zone to use. 
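To double-check that yarn-client is being picked up as the default, you can inspect Spark's defaults file on the master after logging in; note the /home/hadoop/spark-install path here is an assumption about the bdutil install layout, not something stated above:

```shell
# Log in to the master, then look at the configured default master:
./bdutil -e my_custom_env.sh shell
grep -i "spark.master" /home/hadoop/spark-install/conf/spark-defaults.conf
```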

Hi, thanks for your help. I tried your commands, but when I try to launch spark-submit, it reports this message: yarn.Client: Application report from ResourceManager: application identifier: application_1434614478260_0003 appId: 3 clientToAMToken: null appDiagnostics: appMasterHost: N/A appQueue: default appMasterRpcPort: -1 appStartTime: 1434617006538 yarnAppState: ACCEPTED distributedFinalState: UNDEFINED appTrackingUrl: http://hadoop-m-565h:8088/proxy/application_1434614478260_0003/ appUser – user3836982


If I try to use bdutil, in the second step, when I deploy custom_env it returns this: Thu Jun 18 13:00:11 UTC 2015: Command failed: wait ${SUBPROC} on line 326. Thu Jun 18 13:00:11 UTC 2015: Exit code of failed command: 1 Thu Jun 18 13:00:11 UTC 2015: Detailed debug info available in file: /tmp/bdutil-20150618-130008-iVA/debuginfo.txt – user3836982


Do you have the contents of /tmp/bdutil-20150618-130008-iVA/debuginfo.txt? If you don't want to post them here, you can send them to [email protected]. –
