Spark + standalone cluster: unable to start workers from another machine

I have been setting up a Spark standalone cluster following this link. I have 2 machines; the first (ubuntu0) acts as both master and worker, and the second (ubuntu1) is just a worker. Passwordless SSH has been configured correctly for both machines and tested manually in both directions.

Now, when I try ./start-all.sh, the master and the worker on the master machine (ubuntu0) both start properly. This is indicated by (1) the WebUI being accessible (localhost:8081 in my case) and (2) the worker being registered/displayed in the WebUI. However, the other worker, on the second machine (ubuntu1), does not start. The error shown is:

ubuntu1: ssh: connect to host ubuntu1 port 22: Connection timed out 

Now, this is strange given that I have already configured passwordless SSH correctly on both sides. So I logged in to the second machine and tried to start the worker manually with these commands:

./spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu0:7707 
./spark-class org.apache.spark.deploy.worker.Worker spark://<ip>:7707 

However, this is the result:

14/05/23 13:49:08 INFO Utils: Using Spark's default log4j profile:  
           org/apache/spark/log4j-defaults.properties 
14/05/23 13:49:08 WARN Utils: Your hostname, ubuntu1 resolves to a loopback address:  
         127.0.1.1; using 192.168.122.1 instead (on interface virbr0) 
14/05/23 13:49:08 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
14/05/23 13:49:09 INFO Slf4jLogger: Slf4jLogger started 
14/05/23 13:49:09 INFO Remoting: Starting remoting 
14/05/23 13:49:09 INFO Remoting: Remoting started; listening on addresses : 
           [akka.tcp://sparkWorker@ubuntu1.local:42739] 
14/05/23 13:49:09 INFO Worker: Starting Spark worker ubuntu1.local:42739 with 8 cores, 
           4.8 GB RAM 
14/05/23 13:49:09 INFO Worker: Spark home: /home/ubuntu1/jaysonp/spark/spark-0.9.1 
14/05/23 13:49:09 INFO WorkerWebUI: Started Worker web UI at http://ubuntu1.local:8081 
14/05/23 13:49:09 INFO Worker: Connecting to master spark://ubuntu0:7707... 
14/05/23 13:49:29 INFO Worker: Connecting to master spark://ubuntu0:7707... 
14/05/23 13:49:49 INFO Worker: Connecting to master spark://ubuntu0:7707... 
14/05/23 13:50:09 ERROR Worker: All masters are unresponsive! Giving up. 

Below are the contents of spark-env.sh on my master and slave\worker machines:

SPARK_MASTER_IP=192.168.3.222 
STANDALONE_SPARK_MASTER_HOST=`hostname -f` 
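
For reference, ./start-all.sh launches the remote workers listed in conf/slaves on the master, so for this setup that file would presumably contain one worker hostname per line, e.g.:

ubuntu0 
ubuntu1 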

How should I fix this? Thanks in advance!

Answers

Answer (2 votes)

For those who still get errors when starting workers on different machines, I just want to share that using IP addresses in conf/slaves worked for me. Hope this helps!
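
A minimal sketch of such a conf/slaves file on the master, with one worker IP per line (the addresses here are hypothetical; substitute your own workers'):

192.168.3.222 
192.168.3.223 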

Answer (0 votes)

Using hostnames in conf/slaves worked well for me. These are the steps I would take (sketched below):

  • Check the SSH public keys
  • scp /etc/spark/conf.dist/spark-env.sh to your workers
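
A minimal sketch of those two steps run from the master, assuming a hypothetical worker hostname worker1 and the conf path above:

# confirm that passwordless SSH from the master to the worker works 
ssh worker1 'echo ok' 

# push the master's spark-env.sh to the worker so both sides share the same master settings 
scp /etc/spark/conf.dist/spark-env.sh worker1:/etc/spark/conf.dist/spark-env.sh 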

The relevant part of my spark-env.sh:

export STANDALONE_SPARK_MASTER_HOST=`hostname`

export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST

Answer (0 votes)

I guess you missed something in your configuration; that's what I gathered from your log:

  1. Check /etc/hosts and make sure ubuntu1 is in the master's host list with an IP that matches the slave's actual IP address (see the sketch after this list).
  2. Add export SPARK_LOCAL_IP='ubuntu1' to the spark-env.sh file on your slave.
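
A minimal sketch of the /etc/hosts entries this refers to, with hypothetical addresses; the important part is that ubuntu1 resolves to its real LAN address and not to a loopback address such as 127.0.1.1, which is exactly what the worker log above warns about:

127.0.0.1       localhost 
192.168.3.222   ubuntu0 
192.168.3.223   ubuntu1 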
Answer (1 vote)

I had a similar problem today, running Spark 1.5.1 on RHEL 6.7. I have 2 machines; their hostnames are master.domain.com and slave.domain.com.

I installed the standalone version of Spark (pre-built for Hadoop 2.6) and installed Oracle JDK 8u66.

Spark download:

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz 

Java download:

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u66-b17/jdk-8u66-linux-x64.tar.gz" 

After Spark and Java were extracted under my home directory, I did the following:

On 'master.domain.com' I ran:

./sbin/start-master.sh

The WebUI became available at http://master.domain.com:8080 (no slaves running).

On 'slave.domain.com' I tried ./sbin/start-slave.sh spark://master.domain.com:7077, which failed as follows:

Spark Command: /root/java/bin/java -cp /root/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master.domain.com:7077 
======================================== 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
15/11/06 11:03:51 INFO Worker: Registered signal handlers for [TERM, HUP, INT] 
15/11/06 11:03:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/11/06 11:03:51 INFO SecurityManager: Changing view acls to: root 
15/11/06 11:03:51 INFO SecurityManager: Changing modify acls to: root 
15/11/06 11:03:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
15/11/06 11:03:52 INFO Slf4jLogger: Slf4jLogger started 
15/11/06 11:03:52 INFO Remoting: Starting remoting 
15/11/06 11:03:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@10.80.70.38:50573] 
15/11/06 11:03:52 INFO Utils: Successfully started service 'sparkWorker' on port 50573. 
15/11/06 11:03:52 INFO Worker: Starting Spark worker 10.80.70.38:50573 with 8 cores, 6.7 GB RAM 
15/11/06 11:03:52 INFO Worker: Running Spark version 1.5.1 
15/11/06 11:03:52 INFO Worker: Spark home: /root/spark-1.5.1-bin-hadoop2.6 
15/11/06 11:03:53 INFO Utils: Successfully started service 'WorkerUI' on port 8081. 
15/11/06 11:03:53 INFO WorkerWebUI: Started WorkerWebUI at http://10.80.70.38:8081 
15/11/06 11:03:53 INFO Worker: Connecting to master master.domain.com:7077... 
15/11/06 11:04:05 INFO Worker: Retrying connection to master (attempt # 1) 
15/11/06 11:04:05 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-4,5,main] 
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@... rejected from java.util.concurrent.ThreadPoolExecutor@...[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1] 
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) 
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) 
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) 
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) 
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:211) 
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:210) 
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) 
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) 
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) 
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
    at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters(Worker.scala:210) 
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$reregisterWithMaster$1.apply$mcV$sp(Worker.scala:288) 
    at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119) 
    at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$reregisterWithMaster(Worker.scala:234) 
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:521) 
    at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177) 
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126) 
    at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197) 
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125) 
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) 
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) 
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) 
    at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59) 
    at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42) 
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) 
    at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42) 
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467) 
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92) 
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) 
    at akka.actor.ActorCell.invoke(ActorCell.scala:487) 
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) 
    at akka.dispatch.Mailbox.run(Mailbox.scala:220) 
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) 
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
15/11/06 11:04:05 INFO ShutdownHookManager: Shutdown hook called 

./sbin/start-slave.sh spark://<master-IP>:7077 as above also failed.

./sbin/start-slave.sh spark://master:7077 WORKED, and the worker shows up in the master's WebUI:

Spark Command: /root/java/bin/java -cp /root/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077 
======================================== 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
15/11/06 11:08:15 INFO Worker: Registered signal handlers for [TERM, HUP, INT] 
15/11/06 11:08:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/11/06 11:08:15 INFO SecurityManager: Changing view acls to: root 
15/11/06 11:08:15 INFO SecurityManager: Changing modify acls to: root 
15/11/06 11:08:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
15/11/06 11:08:16 INFO Slf4jLogger: Slf4jLogger started 
15/11/06 11:08:16 INFO Remoting: Starting remoting 
15/11/06 11:08:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@10.80.70.38:40780] 
15/11/06 11:08:17 INFO Utils: Successfully started service 'sparkWorker' on port 40780. 
15/11/06 11:08:17 INFO Worker: Starting Spark worker 10.80.70.38:40780 with 8 cores, 6.7 GB RAM 
15/11/06 11:08:17 INFO Worker: Running Spark version 1.5.1 
15/11/06 11:08:17 INFO Worker: Spark home: /root/spark-1.5.1-bin-hadoop2.6 
15/11/06 11:08:17 INFO Utils: Successfully started service 'WorkerUI' on port 8081. 
15/11/06 11:08:17 INFO WorkerWebUI: Started WorkerWebUI at http://10.80.70.38:8081 
15/11/06 11:08:17 INFO Worker: Connecting to master master:7077... 
15/11/06 11:08:17 INFO Worker: Successfully registered with master spark://master:7077 

Note: I have not added any extra configuration to conf/spark-env.sh.

Note 2: when looking at the master's WebUI, the Spark master URL shown at the top is the one that actually worked for me, so when in doubt I would say just use that one.
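
Presumably this is because the worker has to address the master with exactly the host string the master bound to (these Akka-based Spark versions match the address literally), which would explain why the FQDN and the raw IP were rejected while the short hostname succeeded. A minimal sketch for pulling that URL straight from the master's log instead of the WebUI, assuming the default log location (the exact file name varies with user and host):

# the master logs its URL on startup, e.g. "Starting Spark master at spark://master:7077" 
MASTER_URL=$(grep -oh 'spark://[^ ]*' $SPARK_HOME/logs/*Master*.out | head -n 1) 
./sbin/start-slave.sh "$MASTER_URL" 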

I hope this helps ;)

Comment (0 votes)

I just ran into the same problem and can confirm that you indeed *must* start the worker(s) with the master URL exactly as it is shown in the web UI (i.e. with the same $SPARK_MASTER_IP value as on the master).
