
I am programming with PySpark in the Eclipse IDE and am trying to move to Spark 1.4.1 so that I can eventually program in Python 3. The following program works in Spark 1.3.1, but in Spark 1.4.1 it throws the exception: py4j.Py4JException: Method read([]) does not exist

from pyspark import SparkContext, SparkConf 
from pyspark.sql.types import * 
from pyspark.sql import SQLContext 

if __name__ == '__main__': 
    conf = SparkConf().setAppName("MyApp").setMaster("local") 

    global sc 
    sc = SparkContext(conf=conf)  

    global sqlc 
    sqlc = SQLContext(sc) 

    symbolsPath = 'SP500Industry.json' 
    symbolsRDD = sqlc.read.json(symbolsPath) 

    print "Done"" 

The traceback I get is as follows:

Traceback (most recent call last): 
  File "/media/gavin/20A6-76BF/Current Projects Luna/PySpark Test/Test.py", line 21, in <module> 
    symbolsRDD = sqlc.read.json(symbolsPath) #rdd with all symbols (and their industries 
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 582, in read 
    return DataFrameReader(self) 
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 39, in __init__ 
    self._jreader = sqlContext._ssql_ctx.read() 
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__ 
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value 
py4j.protocol.Py4JError: An error occurred while calling o18.read. Trace: 
py4j.Py4JException: Method read([]) does not exist 
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333) 
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342) 
    at py4j.Gateway.invoke(Gateway.java:252) 
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:207) 
    at java.lang.Thread.run(Thread.java:745)

The external libraries on my project are ... spark-1.4.1-bin-hadoop2.6/python ... spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip ... spark-1.4.1-bin-hadoop2.6/python/lib/pyspark.zip (I have tried both with and without this last one).

Can anyone help me figure out what I am doing wrong?

Answer


You need to set the format to 'json' before calling load. Otherwise, Spark assumes you are trying to load a Parquet file.

symbolsRDD = sqlc.read.format('json').load(symbolsPath) 
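(For what it's worth, in the 1.4 API read.json(path) is itself shorthand for read.format('json').load(path), so the explicit form above should behave the same as the original call.)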

However, I still cannot figure out why you are getting a read-method error. Spark should instead be complaining that it found an invalid Parquet file.
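One hedged check that might narrow this down (an assumption on my part; nothing in the traceback confirms it): py4j's "Method read([]) does not exist" means the JVM-side SQLContext object has no read() method, and that method was only added in Spark 1.4. So the Python sources on the project path may be 1.4.1 while the gateway is launching an older Spark JVM, e.g. through a stale SPARK_HOME. A minimal sketch to compare the two sides, reusing the sc from the question's script:

# Sketch, assuming the SparkContext `sc` from the question's script. 
# sc.version asks the JVM-side SparkContext for its version string. 
print("JVM Spark version: " + sc.version) 

import pyspark 
# Where the Python-side pyspark package was imported from; it should sit 
# under the same spark-1.4.1-bin-hadoop2.6 tree as the running JVM. 
print("PySpark imported from: " + pyspark.__file__) 

If the two disagree, pointing SPARK_HOME and the project's external libraries at the same 1.4.1 distribution would be the first thing to try.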


I get exactly the same error as mentioned in the OP, even with your adjustment. Thanks for your help. –