
Apache Spark Worker Timeout

I have been running into one problem after another with Spark, and I believe it has something to do with networking, permissions, or both. Nothing in the master or worker logs, and none of the errors thrown, points to the cause.

15/12/29 19:19:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:43 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/8 is now EXITED (Command exited with code 1) 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Executor app-20151229141057-0000/8 removed: Command exited with code 1 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 8 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor added: app-20151229141057-0000/10 on worker-20151229141026-127.0.0.1-48818 (127.0.0.1:48818) with 2 cores 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151229141057-0000/10 on hostPort 127.0.0.1:48818 with 2 cores, 1024.0 MB RAM 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/10 is now LOADING 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/10 is now RUNNING 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/9 is now EXITED (Command exited with code 1) 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Executor app-20151229141057-0000/9 removed: Command exited with code 1 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 9 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor added: app-20151229141057-0000/11 on worker-20151229141023-127.0.0.1-35452 (127.0.0.1:35452) with 2 cores 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151229141057-0000/11 on hostPort 127.0.0.1:35452 with 2 cores, 1024.0 MB RAM 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/11 is now LOADING 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/11 is now RUNNING 
15/12/29 19:21:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
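
For context, this repeating warning usually means that executors keep dying before they can register, or that the application is asking for more memory or cores than any worker offers. A minimal sketch of capping the application's demands below what each worker advertises in the log above (spark.executor.memory and spark.cores.max are standard Spark settings; the values here are illustrative only):

import org.apache.spark.SparkConf

// Sketch: request less than the 1024 MB / 2 cores each worker advertises above.
val conf = new SparkConf()
  .setAppName("Simple App")
  .set("spark.executor.memory", "512m") // per-executor heap, below the worker's 1024 MB
  .set("spark.cores.max", "2")          // standalone-mode cap on total cores for the app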

I am trying to run a standalone install of Spark 1.5.2 on Ubuntu 14.04. Everything appears to be configured correctly, but the job never seems to complete, and every worker times out.

[screenshot: Spark master web UI showing the two registered workers]

This is on the remote machine, separate from the one I am executing the job from...

[screenshot from the remote machine]

The code below is just one of their examples. I have also tried the Pi Estimation example and hit the same problem; a sketch of launching it follows.
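
For reference, a sketch of launching the bundled Pi example against the same master (run-example is the stock launcher script shipped with Spark, and the MASTER environment variable is the documented way to point it at a cluster; the URL is the one used elsewhere in this post):

MASTER=spark://46.101.xxx.xxx:7077 ./bin/run-example SparkPi 10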

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/Users/user/spark.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple App").setMaster("spark://46.101.xxx.xxx:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
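
One possible culprit worth noting, since the executors in the log above exit with code 1 right after launching: in standalone mode every executor must connect back to the driver, so a driver running on a machine the workers cannot reach (e.g. a laptop behind NAT) produces exactly this loop. A minimal sketch of pinning the driver's address, assuming the client has a reachable IP (spark.driver.host is a standard setting; the address shown is a placeholder):

val conf = new SparkConf()
  .setAppName("Simple App")
  .setMaster("spark://46.101.xxx.xxx:7077")
  .set("spark.driver.host", "<reachable-client-ip>") // placeholder: executors must be able to connect here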

Has anyone come across this issue before? I would greatly appreciate any help solving it.

--edit-- Additional information.

#spark-env.sh 
export SPARK_LOCAL_IP="46.101.xxx.xxx" 
export SPARK_MASTER_IP="46.101.xxx.xxx" 
export SPARK_PUBLIC_DNS="46.101.xxx.xxx" 
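
If resources rather than networking turn out to be the problem, the worker side can also be pinned down in the same file; SPARK_WORKER_MEMORY and SPARK_WORKER_CORES are standard spark-env.sh variables, and the values here are only an illustration:

export SPARK_WORKER_MEMORY=1g   # memory this worker offers to executors
export SPARK_WORKER_CORES=2     # cores this worker offers to executors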

Tried Java 7 & Java 8, with Scala 2.10.6 and 2.11.latest.

Master started with ./start-master.sh; worker started with ./start-slave.sh spark://46.101.xxx.xxx:7077.

Running on Ubuntu 14.04.3 LTS (DigitalOcean). No firewall. Can telnet from the remote machine to both the master and the workers. Master and workers are on the same machine.

Tested with Spark 1.5.2 and 1.5.0. Java, Scala, and Spark versions are kept consistent between the client machine (making the request) and the remote server (master and workers).
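
One quick way to double-check that claim is to compare the version banner printed on each machine; --version is a stock spark-submit flag:

./bin/spark-submit --version   # run on both the client and the server; the banners should match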

Answer


It looks like your application cannot find the workers. When you launched the cluster, did you start any slaves and connect them to the master?

To start your workers and connect them to the master, run the following:

./bin/spark-class org.apache.spark.deploy.worker.Worker spark://ip:port 

where spark://ip:port is the URL of the master.
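
As a follow-up check, the standalone master's web UI (port 8080 by default) lists every registered worker; a sketch of probing it from the client, using the master IP from the question:

curl http://46.101.xxx.xxx:8080   # the Workers table in the returned page should list each worker as ALIVE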


You can see the two workers connected to the master in the screenshots above. I am just running the './start-all.sh' command. Manually running './start-master.sh' and './start-worker.sh' has the same effect. –