Strange error initializing SparkContext in Python

I have been using Spark 2.0.1, but tried to upgrade to a newer version, i.e. 2.1.1, by downloading the tar file to my local machine and changing the paths.

However, now when I try to run any program, it fails while initializing the SparkContext, i.e.
sc = SparkContext()
and I tried to run the whole sample code:
import os
os.environ['SPARK_HOME']="/opt/apps/spark-2.1.1-bin-hadoop2.7/"
from pyspark import SparkContext
from pyspark.sql import *
sc = SparkContext()
sqlContext = SQLContext(sc)
df_tract_alpha= sqlContext.read.parquet("tract_alpha.parquet")
print (df_tract_alpha.count())
The exception I get is right at the start itself, i.e.:
Traceback (most recent call last):
  File "/home/vna/scripts/global_score_pipeline/test_code_here.py", line 47, in <module>
    sc = SparkContext()
  File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 182, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 249, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
  File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NumberFormatException: For input string: "Ubuntu"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
I am not passing "Ubuntu" anywhere in my variables or my env variables either..
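Spark parses several settings (ports, memory sizes, core counts) as numbers, so a stray value like "Ubuntu" in one of them would trigger exactly this `NumberFormatException`. A quick sanity check is to scan the environment for the offending string; this is a minimal sketch, and the variable name in the example is purely hypothetical:

```python
import os

def find_suspect_env(needle="Ubuntu", environ=None):
    """Return all environment variables whose value contains `needle`."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if needle in v}

# Scan the real environment:
print(find_suspect_env())

# Illustration with a fabricated environment (hypothetical variable name):
sample = {"SPARK_MASTER_PORT": "Ubuntu", "PATH": "/usr/bin"}
print(find_suspect_env(environ=sample))  # → {'SPARK_MASTER_PORT': 'Ubuntu'}
```

Any hit in a variable Spark treats as numeric would be a likely culprit.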
I also tried changing it to sc = SparkContext(master='local'), but the problem is the same.
Please help with this issue.
EDIT: contents of spark-defaults.conf
spark.master                     spark://master:7077
# spark.eventLog.enabled         true
# spark.eventLog.dir             hdfs://namenode:8021/directory
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              8g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.driver.extraClassPath      /opt/apps/spark-2.1.1-bin-hadoop2.7/jars/mysql-connector-java-5.1.35-bin.jar
spark.executor.extraClassPath    /opt/apps/spark-2.1.1-bin-hadoop2.7/jars/mysql-connector-java-5.1.35-bin.jar
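For reference, spark-defaults.conf is a plain whitespace-separated key/value format where `#` starts a comment line and everything after the first whitespace run is the value. A small parser sketch (not part of Spark's API, just an illustration of the format) makes it easy to dump exactly the values Spark will see and spot one that should be numeric but isn't:

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf text: 'key value' per line, '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out settings
        parts = line.split(None, 1)  # split on first whitespace run only
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf

sample = """spark.master        spark://master:7077
# spark.eventLog.enabled true
spark.driver.memory 8g"""
print(parse_spark_defaults(sample))
# → {'spark.master': 'spark://master:7077', 'spark.driver.memory': '8g'}
```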
I checked my configs. They seem fine; I have now added the contents to the question. I am not even using spark executor cores. – Viv
Even grep -R "Ubuntu" . in the spark folder yields no results. – Viv
Strange. I would try using the command-line shell tools to see whether you can open a context at all. Sometimes Scala ('spark-shell') gives better error messages; pyspark error messages tend to be obscured by the py4j interface. – santon