Spark-submit can't import SparkContext

I'm running Spark 1.4.1 on a local Mac laptop and can use pyspark interactively without any problems. Spark was installed through Homebrew, and I'm using Anaconda Python. However, when I try to use spark-submit, I get the following error:

15/09/04 08:51:09 ERROR SparkContext: Error initializing SparkContext. 
java.io.FileNotFoundException: Added file file:test.py does not exist. 
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329) 
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305) 
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458) 
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458) 
    at scala.collection.immutable.List.foreach(List.scala:318) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458) 
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
    at py4j.Gateway.invoke(Gateway.java:214) 
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) 
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) 
    at py4j.GatewayConnection.run(GatewayConnection.java:207) 
    at java.lang.Thread.run(Thread.java:745) 
15/09/04 08:51:09 ERROR SparkContext: Error stopping SparkContext after init error. 
java.lang.NullPointerException 
    at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152) 
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1216) 
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96) 
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1659) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565) 
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
    at py4j.Gateway.invoke(Gateway.java:214) 
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) 
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) 
    at py4j.GatewayConnection.run(GatewayConnection.java:207) 
    at java.lang.Thread.run(Thread.java:745) 
Traceback (most recent call last): 
    File "test.py", line 35, in <module> sc = SparkContext("local","test") 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 113, in __init__ 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 165, in _do_init 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 219, in _initialize_context 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__ 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: java.io.FileNotFoundException: Added file file:test.py does not exist. 
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329) 
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305) 
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458) 
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458) 
    at scala.collection.immutable.List.foreach(List.scala:318) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458) 
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
    at py4j.Gateway.invoke(Gateway.java:214) 
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) 
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) 
    at py4j.GatewayConnection.run(GatewayConnection.java:207) 
    at java.lang.Thread.run(Thread.java:745) 

Here is my code:

from pyspark import SparkContext 

if __name__ == "__main__": 
    sc = SparkContext("local","test") 
    sc.parallelize([1,2,3,4]) 
    sc.stop() 
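
For completeness, the command being run is the plain relative-path form implied by the "file:test.py" in the trace, executed from the same directory as test.py (the working directory is an assumption): 

spark-submit test.py 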

If I move the file anywhere under the /usr/local/Cellar/apache-spark/1.4.1/ directory, then spark-submit works fine. I have my environment variables set as follows:

export SPARK_HOME="/usr/local/Cellar/apache-spark/1.4.1" 
export PATH=$SPARK_HOME/bin:$PATH 
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/lib/py4j-0.8.2.1-src.zip 

I'm sure something is misconfigured in my environment, but I can't seem to track it down.

+1

Try 'spark-submit /text.py'; it looks like 'spark-submit' can't find your Python script. –

+0

I tried the full path and still get the same error. I also checked the folder permissions, and that doesn't seem to be the problem. – caleboverman

+4

Try adding the directory containing 'test.py' to your PYTHONPATH. –

Answer

0

The Python file executed by spark-submit should be on the PYTHONPATH. Either add the full path of the directory containing it:

export PYTHONPATH=full/path/to/dir:$PYTHONPATH 

or add '.' to the PYTHONPATH if you are already inside the directory containing the Python script:

export PYTHONPATH='.':$PYTHONPATH 
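
Putting it together, a minimal end-to-end run might look like this (a sketch, assuming test.py lives in a hypothetical ~/spark-scripts directory): 

cd ~/spark-scripts 
export PYTHONPATH='.':$PYTHONPATH 
spark-submit test.py 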

Thanks to @Def_Os for pointing this out!