2016-09-09

I am creating a DataFrame using PySpark, but I am getting an error.

I use the following code to create a DataFrame from the example data bundled with Spark:

df = spark.read.load("c:/spark/examples/src/main/resources/users.parquet") 

This produces the following lengthy error message:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". 
SLF4J: Defaulting to no-operation (NOP) logger implementation 
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 
16/09/09 15:41:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 
16/09/09 15:41:51 WARN Hive: Failed to access metastore. This class should not accessed in runtime. 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
     at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236) 
     at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) 
     at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166) 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258) 
     at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359) 
     at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263) 
     at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39) 
     at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38) 
     at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46) 
     at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45) 
     at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50) 
     at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48) 
     at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63) 
     at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) 
     at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) 
     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) 
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) 
     at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:498) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:280) 
     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128) 
     at py4j.commands.CallCommand.execute(CallCommand.java:79) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) 
     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) 
     ... 36 more 
Caused by: java.lang.reflect.InvocationTargetException 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) 
     ... 42 more 
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 
     at org.apache.hadoop.fs.Path.initialize(Path.java:205) 
     at org.apache.hadoop.fs.Path.<init>(Path.java:171) 
     at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159) 
     at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:600) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) 
     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) 
     ... 47 more 
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 
     at java.net.URI.checkPath(URI.java:1823) 
     at java.net.URI.<init>(URI.java:745) 
     at org.apache.hadoop.fs.Path.initialize(Path.java:202) 
     ... 58 more 
16/09/09 15:41:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "c:\Spark\python\pyspark\sql\readwriter.py", line 147, in load 
    return self._df(self._jreader.load(path)) 
    File "c:\Spark\python\lib\py4j-0.10.1-src.zip\py4j\java_gateway.py", line 933, in __call__ 
    File "c:\Spark\python\pyspark\sql\utils.py", line 63, in deco 
    return f(*a, **kw) 
    File "c:\Spark\python\lib\py4j-0.10.1-src.zip\py4j\protocol.py", line 312, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling o27.load. 
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258) 
     at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359) 
     at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263) 
     at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39) 
     at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38) 
     at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46) 
     at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45) 
     at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50) 
     at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48) 
     at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63) 
     at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) 
     at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) 
     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) 
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) 
     at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:498) 
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) 
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
     at py4j.Gateway.invoke(Gateway.java:280) 
     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128) 
     at py4j.commands.CallCommand.execute(CallCommand.java:79) 
     at py4j.GatewayConnection.run(GatewayConnection.java:211) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) 
     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
     ... 33 more 
Caused by: java.lang.reflect.InvocationTargetException 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) 
     ... 39 more 
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 
     at org.apache.hadoop.fs.Path.initialize(Path.java:205) 
     at org.apache.hadoop.fs.Path.<init>(Path.java:171) 
     at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159) 
     at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:600) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) 
     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) 
     ... 44 more 
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 
     at java.net.URI.checkPath(URI.java:1823) 
     at java.net.URI.<init>(URI.java:745) 
     at org.apache.hadoop.fs.Path.initialize(Path.java:202) 
     ... 55 more 

I think one cause may be this line:

java.net.URISyntaxException: Relative path in absolute URI: file:c:/Spark/bin/spark-warehouse 

I am not sure how to resolve this, so any assistance is greatly appreciated.
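For what it's worth, the `URISyntaxException` at the bottom of the trace complains about the default warehouse location `file:c:/Spark/bin/spark-warehouse`, which is missing the `///` after the scheme. A commonly reported workaround on Windows (a configuration sketch I have not verified against this exact setup) is to point `spark.sql.warehouse.dir` at a well-formed `file:///` URI when building the session; the path `C:/tmp/spark-warehouse` below is an arbitrary example, not taken from the question:

```python
# Configuration sketch (assumption, not verified here): override the malformed
# default warehouse URI with a well-formed file:/// URI before any Hive
# metastore access is triggered. The warehouse path is an arbitrary example.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("warehouse-dir-workaround")
    .config("spark.sql.warehouse.dir", "file:///C:/tmp/spark-warehouse")
    .getOrCreate()
)

# The original read from the question, attempted with the overridden config.
df = spark.read.load("c:/spark/examples/src/main/resources/users.parquet")
```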

Answer


This was a problem with the Spark installation. I had installed it locally. I created RDDs and everything was fine, until I tried to create a Spark DataFrame from those RDDs... big error.

The problem was with the prebuilt Spark version: spark-2.0.0-bin-hadoop2.7.

I deleted spark-2.0.0-bin-hadoop2.7, then downloaded and installed Spark 1.6 (spark-1.6.2-bin-hadoop2.6), and installed py4j with pip instead of unzipping and using the copy bundled with the prebuilt Spark.

I can now create DataFrames.

My takeaway from settling on spark-1.6.2-bin-hadoop2.6 is twofold: 1. if you are installing on Windows 7 and want to use Spark DataFrames, use spark-1.6.2-bin-hadoop2.6; 2. SparkSession is not available (it only appeared with Spark 2), so you must use SQLContext... oh well!
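The SQLContext route mentioned in point 2 looks roughly like this on the 1.6 API (a sketch from the Spark 1.6-era entry points, not tested against this particular installation):

```python
# Spark 1.6-style entry point: there is no SparkSession, so wrap the
# SparkContext in a SQLContext and read through that instead.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="sqlcontext-example")
sqlContext = SQLContext(sc)

# Same parquet read as in the question, via the 1.6 reader API.
df = sqlContext.read.load("c:/spark/examples/src/main/resources/users.parquet")
df.show()
```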
