
Apache Spark Worker Timeout

I have been running into one problem after another with Spark, and I believe it has something to do with networking, permissions, or both. Nothing in the master or worker logs, and none of the errors thrown, points to the cause.

15/12/29 19:19:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:43 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/8 is now EXITED (Command exited with code 1) 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Executor app-20151229141057-0000/8 removed: Command exited with code 1 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 8 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor added: app-20151229141057-0000/10 on worker-20151229141026-127.0.0.1-48818 (127.0.0.1:48818) with 2 cores 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151229141057-0000/10 on hostPort 127.0.0.1:48818 with 2 cores, 1024.0 MB RAM 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/10 is now LOADING 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/10 is now RUNNING 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/9 is now EXITED (Command exited with code 1) 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Executor app-20151229141057-0000/9 removed: Command exited with code 1 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 9 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor added: app-20151229141057-0000/11 on worker-20151229141023-127.0.0.1-35452 (127.0.0.1:35452) with 2 cores 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151229141057-0000/11 on hostPort 127.0.0.1:35452 with 2 cores, 1024.0 MB RAM 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/11 is now LOADING 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/11 is now RUNNING 
15/12/29 19:21:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
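
For context, this repeating warning usually means that executors keep dying before they can register, or that the application is asking for more memory or cores than any worker offers. A minimal sketch of capping the application's demands below what each worker advertises in the log above (spark.executor.memory and spark.cores.max are standard Spark settings; the values here are illustrative only):

import org.apache.spark.SparkConf

// Sketch: request less than the 1024 MB / 2 cores each worker advertises above.
val conf = new SparkConf()
  .setAppName("Simple App")
  .set("spark.executor.memory", "512m") // per-executor heap, below the worker's 1024 MB
  .set("spark.cores.max", "2")          // standalone-mode cap on total cores for the app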

I am trying to run a standalone install of Spark 1.5.2 on Ubuntu 14.04. Everything appears to be configured correctly, but the job never seems to complete, and every worker times out.

[screenshot: Spark master web UI showing the two registered workers]

This is on the remote machine, separate from the one I am executing the job from...

[screenshot from the remote machine]

The code below is just one of their examples. I have also tried the Pi Estimation example and hit the same problem; a sketch of launching it follows.
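
For reference, a sketch of launching the bundled Pi example against the same master (run-example is the stock launcher script shipped with Spark, and the MASTER environment variable is the documented way to point it at a cluster; the URL is the one used elsewhere in this post):

MASTER=spark://46.101.xxx.xxx:7077 ./bin/run-example SparkPi 10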

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/Users/user/spark.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple App").setMaster("spark://46.101.xxx.xxx:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
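
One possible culprit worth noting, since the executors in the log above exit with code 1 right after launching: in standalone mode every executor must connect back to the driver, so a driver running on a machine the workers cannot reach (e.g. a laptop behind NAT) produces exactly this loop. A minimal sketch of pinning the driver's address, assuming the client has a reachable IP (spark.driver.host is a standard setting; the address shown is a placeholder):

val conf = new SparkConf()
  .setAppName("Simple App")
  .setMaster("spark://46.101.xxx.xxx:7077")
  .set("spark.driver.host", "<reachable-client-ip>") // placeholder: executors must be able to connect here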

Has anyone come across this issue before? I would greatly appreciate any help solving it.

--edit-- Additional information.

#spark-env.sh 
export SPARK_LOCAL_IP="46.101.xxx.xxx" 
export SPARK_MASTER_IP="46.101.xxx.xxx" 
export SPARK_PUBLIC_DNS="46.101.xxx.xxx" 
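
If resources rather than networking turn out to be the problem, the worker side can also be pinned down in the same file; SPARK_WORKER_MEMORY and SPARK_WORKER_CORES are standard spark-env.sh variables, and the values here are only an illustration:

export SPARK_WORKER_MEMORY=1g   # memory this worker offers to executors
export SPARK_WORKER_CORES=2     # cores this worker offers to executors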

Tried Java 7 & Java 8, with Scala 2.10.6 and 2.11.latest.

Master started with ./start-master.sh; worker started with ./start-slave.sh spark://46.101.xxx.xxx:7077.

Running on Ubuntu 14.04.3 LTS (DigitalOcean). No firewall. Can telnet from the remote machine to both the master and the workers. Master and workers are on the same machine.

Tested with Spark 1.5.2 and 1.5.0. Java, Scala, and Spark versions are kept consistent between the client machine (making the request) and the remote server (master and workers).
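
One quick way to double-check that claim is to compare the version banner printed on each machine; --version is a stock spark-submit flag:

./bin/spark-submit --version   # run on both the client and the server; the banners should match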

Answer


It looks like your application cannot find the workers. When you launched the cluster, did you start any slaves and connect them to the master?

To start your workers and connect them to the master, run the following:

./bin/spark-class org.apache.spark.deploy.worker.Worker spark://ip:port 

where spark://ip:port is the URL of the master.
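
As a follow-up check, the standalone master's web UI (port 8080 by default) lists every registered worker; a sketch of probing it from the client, using the master IP from the question:

curl http://46.101.xxx.xxx:8080   # the Workers table in the returned page should list each worker as ALIVE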


You can see the two workers connected to the master in the screenshots above. I am just running the './start-all.sh' command. Manually running './start-master.sh' and './start-worker.sh' has the same effect. –