
How to properly submit a Spark job to a standalone cluster

I have just set up a Spark 2.0 single-machine, single-node standalone cluster on Ubuntu 14 and am trying to submit a PySpark job to it:

~/spark/spark-2.0.0$ bin/spark-submit --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --master spark://ip-10-180-191-14:7077 examples/src/main/python/pi.py 

Spark gives me this message:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 

Here is the complete output:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
16/07/27 17:45:18 INFO SparkContext: Running Spark version 2.0.0 
16/07/27 17:45:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/07/27 17:45:18 INFO SecurityManager: Changing view acls to: ubuntu 
16/07/27 17:45:18 INFO SecurityManager: Changing modify acls to: ubuntu 
16/07/27 17:45:18 INFO SecurityManager: Changing view acls groups to: 
16/07/27 17:45:18 INFO SecurityManager: Changing modify acls groups to: 
16/07/27 17:45:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set() 
16/07/27 17:45:19 INFO Utils: Successfully started service 'sparkDriver' on port 36842. 
16/07/27 17:45:19 INFO SparkEnv: Registering MapOutputTracker 
16/07/27 17:45:19 INFO SparkEnv: Registering BlockManagerMaster 
16/07/27 17:45:19 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e25f3ae9-be1f-4ea3-8f8b-b3ff3ec7e978 
16/07/27 17:45:19 INFO MemoryStore: MemoryStore started with capacity 366.3 MB 
16/07/27 17:45:19 INFO SparkEnv: Registering OutputCommitCoordinator 
16/07/27 17:45:19 INFO log: Logging initialized @1986ms 
16/07/27 17:45:19 INFO Server: jetty-9.2.16.v20160414 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/jobs,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/jobs/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/jobs/job,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/jobs/job/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/stages,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/stages/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/stages/stage,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/stages/stage/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/stages/pool,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/stages/pool/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/storage,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/storage/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/storage/rdd,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/storage/rdd/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/environment,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/environment/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/executors,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/executors/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/executors/threadDump,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/executors/threadDump/json,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/static,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/api,null,AVAILABLE} 
16/07/27 17:45:19 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/stages/stage/kill,null,AVAILABLE} 
16/07/27 17:45:19 INFO ServerConnector: Started ServerConnector@...{HTTP/1.1}{0.0.0.0:4040} 
16/07/27 17:45:19 INFO Server: Started @2150ms 
16/07/27 17:45:19 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
16/07/27 17:45:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.180.191.14:4040 
16/07/27 17:45:19 INFO Utils: Copying /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py to /tmp/spark-ee1ceb06-a7c4-4b18-8577-adb02f97f31e/userFiles-565d5e0b-5879-40d3-8077-d9d782156818/pi.py 
16/07/27 17:45:19 INFO SparkContext: Added file file:/home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py at spark://10.180.191.14:36842/files/pi.py with timestamp 1469641519759 
16/07/27 17:45:19 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://ip-10-180-191-14:7077... 
16/07/27 17:45:19 INFO TransportClientFactory: Successfully created connection to ip-10-180-191-14/10.180.191.14:7077 after 25 ms (0 ms spent in bootstraps) 
16/07/27 17:45:20 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20160727174520-0006 
16/07/27 17:45:20 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39047. 
16/07/27 17:45:20 INFO NettyBlockTransferService: Server created on 10.180.191.14:39047 
16/07/27 17:45:20 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.180.191.14, 39047) 
16/07/27 17:45:20 INFO BlockManagerMasterEndpoint: Registering block manager 10.180.191.14:39047 with 366.3 MB RAM, BlockManagerId(driver, 10.180.191.14, 39047) 
16/07/27 17:45:20 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.180.191.14, 39047) 
16/07/27 17:45:20 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/metrics/json,null,AVAILABLE} 
16/07/27 17:45:20 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 
16/07/27 17:45:20 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/SQL,null,AVAILABLE} 
16/07/27 17:45:20 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/SQL/json,null,AVAILABLE} 
16/07/27 17:45:20 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/SQL/execution,null,AVAILABLE} 
16/07/27 17:45:20 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/SQL/execution/json,null,AVAILABLE} 
16/07/27 17:45:20 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@...{/static/sql,null,AVAILABLE} 
16/07/27 17:45:20 INFO SharedState: Warehouse path is 'file:/home/ubuntu/spark/spark-2.0.0/spark-warehouse'. 
16/07/27 17:45:20 INFO SparkContext: Starting job: reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43 
16/07/27 17:45:20 INFO DAGScheduler: Got job 0 (reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43) with 2 output partitions 
16/07/27 17:45:20 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43) 
16/07/27 17:45:20 INFO DAGScheduler: Parents of final stage: List() 
16/07/27 17:45:20 INFO DAGScheduler: Missing parents: List() 
16/07/27 17:45:20 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43), which has no missing parents 
16/07/27 17:45:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.6 KB, free 366.3 MB) 
16/07/27 17:45:21 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.0 KB, free 366.3 MB) 
16/07/27 17:45:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.180.191.14:39047 (size: 3.0 KB, free: 366.3 MB) 
16/07/27 17:45:21 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012 
16/07/27 17:45:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43) 
16/07/27 17:45:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
16/07/27 17:45:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
16/07/27 17:45:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 

I am not running Spark on top of Hadoop or YARN, just standalone on its own. What can I do to make these Spark jobs run?

Answers


Try setting the master to local, like this, in order to use local mode:

~/spark/spark-2.0.0$ bin/spark-submit --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --master local[2] examples/src/main/python/pi.py 

You may also need to use the --py-files option. See Spark submit options.
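
For example, a local-mode submission that also ships extra Python code might look like the line below; the deps.zip archive is just a hypothetical placeholder for whatever extra modules your script needs:

~/spark/spark-2.0.0$ bin/spark-submit --master local[2] --py-files deps.zip examples/src/main/python/pi.py 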


As shown above, setting the master to local will only make your program run in local mode - which is useful for getting started and for small workloads on a single machine - but it does not set anything up to run on a cluster. What you need to do in order to run your program on a real cluster (potentially across several machines) is to start a master and slaves using the scripts that ship with Spark:

<spark-install-dir>/sbin/start-master.sh

Your slaves (you must have at least one) should be started with:

<spark-install-dir>/sbin/start-slave.sh spark://<master-address>:7077

This way you will be able to run in a real cluster mode. You will see the master UI on port 8080 of the master machine - it shows your workers, jobs, and so on. Port 4040 on the machine running the driver shows the application UI. Port 8081 shows the worker UI (if you run several slaves on the same machine, the first one gets port 8081, the second 8082, and so on).
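
Put together, the whole sequence for the setup in the question might look roughly like the sketch below. The master URL spark://ip-10-180-191-14:7077 comes from the question, and the -c/-m worker options shown here are just one way to cap the worker's cores and memory:

# start the master, then one worker offering 1 core and 2 GB to executors,
# then submit the example against the cluster (values are illustrative)
~/spark/spark-2.0.0$ sbin/start-master.sh 
~/spark/spark-2.0.0$ sbin/start-slave.sh spark://ip-10-180-191-14:7077 -c 1 -m 2g 
~/spark/spark-2.0.0$ bin/spark-submit --master spark://ip-10-180-191-14:7077 --executor-memory 1024m examples/src/main/python/pi.py 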

You can run as many slaves as you want, on as many machines as you want, and give each slave as many cores as you like (you can even run several slaves on the same machine - just give each of them a sensible share of cores and memory so they don't confuse the scheduler). One way to do that is sketched after this paragraph.
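
A minimal sketch, assuming you use the standard standalone-mode variables in conf/spark-env.sh (the numbers are purely illustrative):

# conf/spark-env.sh - run two workers on this machine (illustrative values)
SPARK_WORKER_INSTANCES=2   # worker processes started on this machine
SPARK_WORKER_CORES=1       # cores each worker may give to executors
SPARK_WORKER_MEMORY=2g     # memory each worker may give to executors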
