Spark with a Docker Mesos cluster hangs on authentication

I'm trying to simulate a multi-node Mesos cluster using Docker and Zookeeper, and to run a simple (py)Spark job on top of it. The Docker containers and the pyspark script all run on the same machine. However, when I execute my Spark script, it hangs at:

No credentials provided. Attempting to register without authentication 

The Mesos slave keeps outputting:

I0929 14:59:32.925915 62 slave.cpp:1959] Asked to shut down framework 20150929-143802-1224741292-5050-33-0060 by [email protected]:5050 
W0929 14:59:32.926035 62 slave.cpp:1974] Cannot shut down unknown framework 20150929-143802-1224741292-5050-33-0060 

And the Mesos master keeps outputting:

I0929 14:38:15.169683 39 master.cpp:2094] Received SUBSCRIBE call for framework 'test' at [email protected]:46693 
I0929 14:38:15.169845 39 master.cpp:2164] Subscribing framework test with checkpointing disabled and capabilities [ ] 
E0929 14:38:15.170361 42 socket.hpp:174] Shutdown failed on fd=15: Transport endpoint is not connected [107] 
I0929 14:38:15.170409 36 hierarchical.hpp:391] Added framework 20150929-143802-1224741292-5050-33-0000 
I0929 14:38:15.170534 39 master.cpp:1051] Framework 20150929-143802-1224741292-5050-33-0000 (test) at [email protected]:46693 disconnected 
I0929 14:38:15.170549 39 master.cpp:2370] Disconnecting framework 20150929-143802-1224741292-5050-33-0000 (test) at [email protected]:46693 
I0929 14:38:15.170555 39 master.cpp:2394] Deactivating framework 20150929-143802-1224741292-5050-33-0000 (test) at [email protected]:46693 
E0929 14:38:15.170560 42 socket.hpp:174] Shutdown failed on fd=16: Transport endpoint is not connected [107] 
I0929 14:38:15.170593 39 master.cpp:1075] Giving framework 20150929-143802-1224741292-5050-33-0000 (test) at [email protected]:46693 0ns to failover 
W0929 14:38:15.170835 41 master.cpp:4482] Master returning resources offered to framework 20150929-143802-1224741292-5050-33-0000 because the framework has terminated or is inactive 
I0929 14:38:15.170855 36 hierarchical.hpp:474] Deactivated framework 20150929-143802-1224741292-5050-33-0000 
I0929 14:38:15.170990 37 hierarchical.hpp:814] Recovered cpus(*):8; mem(*):31092; disk(*):443036; ports(*):[31000-32000] (total: cpus(*):8; mem(*):31092; disk(*):443036; ports(*):[31000-32000], allocated:) on slave 20150929-051336-1224741292-5050-19-S0 from framework 20150929-143802-1224741292-5050-33-0000 
I0929 14:38:15.171820 41 master.cpp:4469] Framework failover timeout, removing framework 20150929-143802-1224741292-5050-33-0000 (test) at [email protected]:46693 
I0929 14:38:15.171835 41 master.cpp:5112] Removing framework 20150929-143802-1224741292-5050-33-0000 (test) at [email protected]:46693 
I0929 14:38:15.172130 41 hierarchical.hpp:428] Removed framework 20150929-143802-1224741292-5050-33-0000 

The Mesos master Docker image is built with the following Dockerfile:

FROM ubuntu:14.04 

ENV MESOS_V 0.24.0 

# update 
RUN apt-get update 
RUN apt-get upgrade -y 

# dependencies 
RUN apt-get install -y wget openjdk-7-jdk build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev maven libapr1-dev libsvn-dev 

# mesos 
RUN wget http://www.apache.org/dist/mesos/${MESOS_V}/mesos-${MESOS_V}.tar.gz 
RUN tar -zxf mesos-*.tar.gz 
RUN rm mesos-*.tar.gz 
RUN mv mesos-* mesos 
WORKDIR mesos 
RUN mkdir build 
RUN ./configure 
RUN make 
RUN make install 

RUN ldconfig 

EXPOSE 5050 

ENTRYPOINT ["/bin/bash"] 

I execute the mesos-master command manually:

LIBPROCESS_IP=${MASTER_IP} mesos-master --registry=in_memory --ip=${MASTER_IP} --zk=zk://172.17.0.75:2181/mesos --advertise_ip=${MASTER_IP} 

The Mesos slave Docker image is built with the same Dockerfile, except that port 5051 is exposed instead. I then run the following command inside its container:

LIBPROCESS_IP=172.17.0.72 mesos-slave --master=zk://172.17.0.75:2181/mesos 

The pyspark script is:

import os 
import pyspark 

src = 'file:///{}/README.md'.format(os.environ['SPARK_HOME']) 

leader_ip = '172.17.0.75' 
conf = pyspark.SparkConf() 
conf.setMaster('mesos://zk://{}:2181/mesos'.format(leader_ip)) 
conf.set('spark.executor.uri', 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.0-bin-hadoop2.6.tgz') 
conf.setAppName('my_test_app') 

sc = pyspark.SparkContext(conf=conf) 

lines = sc.textFile(src) 
words = lines.flatMap(lambda x: x.split(' ')) 
word_count = (words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x+y)) 
print(word_count.collect()) 

Here is the full output of the pyspark script:

15/09/29 11:07:59 INFO SparkContext: Running Spark version 1.5.0 
15/09/29 11:07:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/09/29 11:07:59 WARN Utils: Your hostname, hubble resolves to a loopback address: 127.0.1.1; using 192.168.1.2 instead (on interface em1) 
15/09/29 11:07:59 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
15/09/29 11:07:59 INFO SecurityManager: Changing view acls to: ftseng 
15/09/29 11:07:59 INFO SecurityManager: Changing modify acls to: ftseng 
15/09/29 11:07:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ftseng); users with modify permissions: Set(ftseng) 
15/09/29 11:08:00 INFO Slf4jLogger: Slf4jLogger started 
15/09/29 11:08:00 INFO Remoting: Starting remoting 
15/09/29 11:08:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:38860] 
15/09/29 11:08:00 INFO Utils: Successfully started service 'sparkDriver' on port 38860. 
15/09/29 11:08:00 INFO SparkEnv: Registering MapOutputTracker 
15/09/29 11:08:00 INFO SparkEnv: Registering BlockManagerMaster 
15/09/29 11:08:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-28695bd2-fc83-45f4-b0a0-eefcfb80a3b5 
15/09/29 11:08:00 INFO MemoryStore: MemoryStore started with capacity 530.3 MB 
15/09/29 11:08:00 INFO HttpFileServer: HTTP File server directory is /tmp/spark-89444c7a-725a-4454-87db-8873f4134580/httpd-341c3da9-16d5-43a4-93ee-0e8b47389fdb 
15/09/29 11:08:00 INFO HttpServer: Starting HTTP Server 
15/09/29 11:08:00 INFO Utils: Successfully started service 'HTTP file server' on port 51405. 
15/09/29 11:08:00 INFO SparkEnv: Registering OutputCommitCoordinator 
15/09/29 11:08:00 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/09/29 11:08:00 INFO SparkUI: Started SparkUI at http://192.168.1.2:4040 
15/09/29 11:08:00 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Client environment:zookeeper.version=zookeeper C client 3.4.5 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Client environment:host.name=hubble 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Client environment:os.name=Linux 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Client environment:os.arch=3.19.0-25-generic 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Client environment:os.version=#26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Client environment:user.name=ftseng 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Client environment:user.home=/home/ftseng 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Client environment:user.dir=/home/ftseng 
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):[email protected][email protected]: Initiating client connection, host=172.17.0.75:2181 sessionTimeout=10000 watcher=0x7fc0962b7176 sessionId=0 sessionPasswd=<null> context=0x7fc078001860 flags=0 
I0929 11:08:00.651923 32328 sched.cpp:164] Version: 0.24.0 
2015-09-29 11:08:00,652:32221(0x7fc06bfff700):[email protected][email protected]: initiated connection to server [172.17.0.75:2181] 
2015-09-29 11:08:00,657:32221(0x7fc06bfff700):[email protected][email protected]: session establishment complete on server [172.17.0.75:2181], sessionId=0x150177fcfc40014, negotiated timeout=10000 
I0929 11:08:00.658051 32322 group.cpp:331] Group process (group(1)@127.0.1.1:48692) connected to ZooKeeper 
I0929 11:08:00.658083 32322 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) 
I0929 11:08:00.658100 32322 group.cpp:403] Trying to create path '/mesos' in ZooKeeper 
I0929 11:08:00.659600 32326 detector.cpp:156] Detected a new leader: (id='2') 
I0929 11:08:00.659904 32325 group.cpp:674] Trying to get '/mesos/json.info_0000000002' in ZooKeeper 
I0929 11:08:00.661052 32326 detector.cpp:481] A new leading master ([email protected]:5050) is detected 
I0929 11:08:00.661201 32320 sched.cpp:262] New master detected at [email protected]:5050 
I0929 11:08:00.661798 32320 sched.cpp:272] No credentials provided. Attempting to register without authentication 

To those downvoting the question: care to explain what's wrong with it? – frnsys

Answer


After a lot more experimentation, it looks like it was an issue with the host machine's IP address (its local network address, 192.168.xx.xx) being used when it should have been using the Docker network address (172.17.xx.xx).

I managed to get things working by running:

LIBPROCESS_IP=172.17.xx.xx python test_spark.py 

I'm now hitting a different error, but it seems unrelated, so I think this command solves my problem.

I'm still not very familiar with Mesos/Spark and haven't yet figured out why this fixes things, so if anyone could add an explanation, that would be very helpful.
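
For what it's worth, here is a minimal sketch of applying the same fix from inside the script rather than prefixing the command; this is an assumption on my part, not something from the original setup. It relies on the pyspark JVM gateway inheriting the driver process's environment, so the variables must be set before the SparkContext is created, and 172.17.0.1 is only a placeholder for whatever Docker-network address the Mesos master and slaves can actually reach.

import os 
import pyspark 

# Hypothetical address: replace with the Docker-network IP your containers can reach. 
os.environ['LIBPROCESS_IP'] = '172.17.0.1'   # address libmesos binds to and advertises to the master 
os.environ['SPARK_LOCAL_IP'] = '172.17.0.1'  # keep Spark's own services off 127.0.1.1 as well 

conf = pyspark.SparkConf() 
conf.setMaster('mesos://zk://172.17.0.75:2181/mesos') 
conf.set('spark.executor.uri', 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.0-bin-hadoop2.6.tgz') 
conf.setAppName('my_test_app') 

sc = pyspark.SparkContext(conf=conf) 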