2017-02-14

I am unable to run pyspark in YARN client mode (pyspark works in standalone mode, though). I can run Spark when I type the following commands:

$ pyspark

$ pyspark --master local[2]

but not when I run this one:

$ pyspark --master yarn-client

It gives me a huge stack trace, which is shown below.

$ pyspark --master yarn-client 
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead. 
Setting default log level to "WARN". 
To adjust logging level use sc.setLogLevel(newLevel). 
17/02/13 22:04:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/02/13 22:04:15 WARN util.Utils: Your hostname, aamir-UX303LAB resolves to a loopback address: 127.0.1.1; using 10.0.0.240 instead (on interface wlan0) 
17/02/13 22:04:15 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
17/02/13 22:04:17 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 
17/02/13 22:04:33 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED! 
17/02/13 22:04:33 ERROR spark.SparkContext: Error initializing SparkContext. 
java.lang.IllegalStateException: Spark context stopped while waiting for backend 
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:584) 
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162) 
at org.apache.spark.SparkContext.<init>(SparkContext.scala:546) 
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
at py4j.Gateway.invoke(Gateway.java:236) 
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
at py4j.GatewayConnection.run(GatewayConnection.java:214) 
at java.lang.Thread.run(Thread.java:745) 
17/02/13 22:04:33 ERROR client.TransportClient: Failed to send RPC 8657965417329630894 to /10.0.0.240:60580: java.nio.channels.ClosedChannelException 
java.nio.channels.ClosedChannelException 
17/02/13 22:04:33 ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful 
java.io.IOException: Failed to send RPC 8657965417329630894 to /10.0.0.240:60580: java.nio.channels.ClosedChannelException 
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249) 
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233) 
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) 
at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845) 
at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873) 
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
at java.lang.Thread.run(Thread.java:745) 
Caused by: java.nio.channels.ClosedChannelException 
17/02/13 22:04:33 ERROR util.Utils: Uncaught exception in thread Yarn application state monitor 
org.apache.spark.SparkException: Exception thrown in awaitResult 
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) 
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) 
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) 
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) 
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) 
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:508) 
at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:93) 
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:151) 
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:455) 
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1605) 
at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1798) 
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1287) 
at org.apache.spark.SparkContext.stop(SparkContext.scala:1797) 
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:108) 
Caused by: java.io.IOException: Failed to send RPC 8657965417329630894 to /10.0.0.240:60580: java.nio.channels.ClosedChannelException 
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249) 
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233) 
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) 
at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845) 
at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873) 
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
at java.lang.Thread.run(Thread.java:745) 
Caused by: java.nio.channels.ClosedChannelException 
17/02/13 22:04:33 WARN spark.SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at: 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
py4j.Gateway.invoke(Gateway.java:236) 
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
py4j.GatewayConnection.run(GatewayConnection.java:214) 
java.lang.Thread.run(Thread.java:745) 
17/02/13 22:04:33 ERROR spark.SparkContext: Error initializing SparkContext. 
org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode! 
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:352) 
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:366) 
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834) 
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167) 
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) 
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149) 
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497) 
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
at py4j.Gateway.invoke(Gateway.java:236) 
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
at py4j.GatewayConnection.run(GatewayConnection.java:214) 
at java.lang.Thread.run(Thread.java:745) 
17/02/13 22:04:33 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered! 
17/02/13 22:04:33 ERROR util.Utils: Uncaught exception in thread Thread-2 
org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode! 
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:352) 
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:152) 
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:455) 
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1605) 
at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1798) 
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1287) 
at org.apache.spark.SparkContext.stop(SparkContext.scala:1797) 
at org.apache.spark.SparkContext.<init>(SparkContext.scala:565) 
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
at py4j.Gateway.invoke(Gateway.java:236) 
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
at py4j.GatewayConnection.run(GatewayConnection.java:214) 
at java.lang.Thread.run(Thread.java:745) 
17/02/13 22:04:33 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running 
Traceback (most recent call last): 
File "/usr/local/spark/python/pyspark/shell.py", line 47, in <module> 
spark = SparkSession.builder.getOrCreate() 
File "/usr/local/spark/python/pyspark/sql/session.py", line 169, in getOrCreate 
sc = SparkContext.getOrCreate(sparkConf) 
File "/usr/local/spark/python/pyspark/context.py", line 294, in getOrCreate 
SparkContext(conf=conf or SparkConf()) 
File "/usr/local/spark/python/pyspark/context.py", line 115, in __init__ 
conf, jsc, profiler_cls) 
File "/usr/local/spark/python/pyspark/context.py", line 168, in _do_init 
self._jsc = jsc or self._initialize_context(self._conf._jconf) 
File "/usr/local/spark/python/pyspark/context.py", line 233, in _initialize_context 
return self._jvm.JavaSparkContext(jconf) 
File "/usr/local/spark/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1401, in __call__ 
File "/usr/local/spark/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode! 
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:352) 
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:366) 
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834) 
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167) 
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) 
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149) 
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497) 
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240) 
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
at py4j.Gateway.invoke(Gateway.java:236) 
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) 
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) 
at py4j.GatewayConnection.run(GatewayConnection.java:214) 
at java.lang.Thread.run(Thread.java:745) 

>>> 
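A trace this long repeats itself, and the earliest ERROR line is often the most informative one. The sketch below is only an illustration (the function name `first_error` and the regular expression are assumptions, not part of Spark): it pulls the first ERROR entry out of a log in the format shown above. Here that entry reports that the YARN application had already exited, so the later ClosedChannelException and YarnSparkHadoopUtil errors are probably follow-on failures rather than the root cause.

```python
import re

# Matches log lines of the form seen above, e.g.
# "17/02/13 22:04:33 ERROR spark.SparkContext: Error initializing SparkContext."
LOG_LINE = re.compile(
    r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} "
    r"(?P<level>[A-Z]+) (?P<logger>\S+): (?P<msg>.*)$"
)


def first_error(log_text):
    """Return (logger, message) for the first ERROR line, or None if there is none."""
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            return match.group("logger"), match.group("msg")
    return None


# Three lines copied from the trace above.
sample = """\
17/02/13 22:04:17 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/02/13 22:04:33 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
17/02/13 22:04:33 ERROR spark.SparkContext: Error initializing SparkContext.
"""

print(first_error(sample))
```

If the first error points at the YARN side, as it does here, the application's own logs in YARN are likely to say more than the driver-side trace.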

I have Hadoop installed in pseudo-distributed mode and have run start-dfs.sh and start-yarn.sh. Both seem to be running properly, since $ jps gives me:

14002 SecondaryNameNode 
13796 DataNode 
14311 NodeManager 
15658 Jps 
14171 ResourceManager 
13631 NameNode 
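The check done by eye above can be sketched in a few lines. This is a minimal illustration, assuming the five daemon names listed in the output are the ones expected for a pseudo-distributed setup; `missing_daemons` is a hypothetical helper, not a Hadoop tool.

```python
# Daemons expected in a pseudo-distributed setup (taken from the jps output above).
EXPECTED = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected daemon names that do not appear in jps output, sorted."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <main class name>".
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(set(expected) - running)


jps_output = """\
14002 SecondaryNameNode
13796 DataNode
14311 NodeManager
15658 Jps
14171 ResourceManager
13631 NameNode
"""

print(missing_daemons(jps_output))
```

An empty result only shows that the JVMs are alive; it does not prove that the NodeManager has registered with the ResourceManager or has enough resources to start a container.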

I am not on a virtual machine. I am using Ubuntu with Hadoop 2.7 and Spark 2.0.1.

Entries in spark-env.sh:

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop 

export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop 
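Spark reads the cluster's client-side configuration from the directory these variables point to, so it can be worth confirming that the directory actually contains the files. A small sketch, assuming core-site.xml and yarn-site.xml are the files of interest; `missing_conf_files` is a hypothetical helper and the default path is the one from the entries above.

```python
import os


def missing_conf_files(conf_dir, names=("core-site.xml", "yarn-site.xml")):
    """Return the config file names that are not present in conf_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(conf_dir, n))]


# Fall back to the path used in spark-env.sh above if the variable is unset.
conf_dir = os.environ.get("HADOOP_CONF_DIR", "/usr/local/hadoop/etc/hadoop")
print(conf_dir, "missing:", missing_conf_files(conf_dir))
```

This checks only that the files exist, not that their contents are correct.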

My .bashrc looks like this:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/ 
export PATH=$PATH:$JAVA_HOME/bin 

export HADOOP_HOME=/usr/local/hadoop 
export PATH=$PATH:$HADOOP_HOME/bin 
export PATH=$PATH:$HADOOP_HOME/sbin 
export HADOOP_MAPRED_HOME=$HADOOP_HOME 
export HADOOP_COMMON_HOME=$HADOOP_HOME 
export HADOOP_HDFS_HOME=$HADOOP_HOME 
export YARN_HOME=$HADOOP_HOME 
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib" 

export HIVE_HOME=/usr/local/hive 
export PATH=$PATH:$HIVE_HOME/bin 
export CLASSPATH=$CLASSPATH:HADOOP_HOME/lib/*:. 
export CLASSPATH=$CLASSPATH:HIVE_HOME/lib/*:. 

export DERBY_HOME=/usr/local/derby 
export PATH=$PATH:$DERBY_HOME/bin 
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar 

export SPARK_HOME=/usr/local/spark 
export PATH=$PATH:$SPARK_HOME/bin 
export PATH=$PATH:$SPARK_HOME/sbin 
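One detail in the .bashrc above: the two CLASSPATH lines for Hadoop and Hive write HADOOP_HOME and HIVE_HOME without a leading $, so the shell keeps them as literal relative paths instead of expanding the variables. Whether this has anything to do with the YARN failure is not established. The sketch below (the helper name `suspicious_entries` and the naming heuristic are assumptions) flags classpath entries whose first path component looks like an unexpanded variable name.

```python
import re


def suspicious_entries(classpath):
    """Return entries whose first path component looks like an unexpanded variable name."""
    flagged = []
    for entry in classpath.split(":"):
        first_component = entry.split("/", 1)[0]
        # Heuristic: all-caps identifiers such as HADOOP_HOME are probably
        # variable names that were meant to be written as $HADOOP_HOME.
        if re.fullmatch(r"[A-Z][A-Z0-9_]*", first_component):
            flagged.append(entry)
    return flagged


# What CLASSPATH would roughly contain after the exports above are evaluated.
sample = "HADOOP_HOME/lib/*:.:HIVE_HOME/lib/*:.:/usr/local/derby/lib/derby.jar"
print(suspicious_entries(sample))
```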

Thanks a ton for the help!

Answer

Try running pyspark --master yarn --deploy-mode client
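This follows the deprecation warning printed at the top of the log: since Spark 2.0 the master is "yarn" and the deploy mode is given separately. A minimal sketch of that mapping (`normalize_master` is a hypothetical helper, not part of Spark); it only addresses the warning and is not known to resolve the failure itself.

```python
def normalize_master(master):
    """Map the deprecated 'yarn-client' / 'yarn-cluster' masters to separate arguments."""
    legacy = {"yarn-client": "client", "yarn-cluster": "cluster"}
    if master in legacy:
        return ["--master", "yarn", "--deploy-mode", legacy[master]]
    # Anything else (for example local[2]) is passed through unchanged.
    return ["--master", master]


print(["pyspark"] + normalize_master("yarn-client"))
```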

Not working :( Same error, I guess.. – aamirr