Prevent Spark HiveContext from connecting to Hive

I'm using HiveContext in Apache Spark 1.3 because I need its better query support (compared with 1.3's SQLContext).

I'm running on an Azure HDInsight Spark cluster. The driver's HiveContext tries to connect to a Hive metastore that doesn't exist, and this crashes the driver.

I don't actually need Hive support.

What is the best way to stop Spark's HiveContext from trying to connect to Hive? For example, by unsetting a particular environment property? (There are about 100 preset properties that might be relevant.)
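
For context, this is roughly the driver code pattern that triggers the failure; the app name and input path are placeholders, but jsonFile matches the stack trace below:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Placeholder app name and input path; only the shape matters.
    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
    val hc = new HiveContext(sc)

    // Constructing the HiveContext is cheap. The metastore connection is
    // only attempted when the first query forces SessionState.start;
    // here that happens inside jsonFile (see the trace below).
    val df = hc.jsonFile("wasb:///example/data.json")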


Edit: stack trace:

15/10/14 06:35:29 WARN metastore: Failed to connect to the MetaStore Server... 
15/10/14 06:35:50 WARN metastore: Failed to connect to the MetaStore Server... 
15/10/14 06:36:11 WARN metastore: Failed to connect to the MetaStore Server... 
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346) 
     at org.apache.spark.sql.hive.HiveContext.sessionState$lzycompute(HiveContext.scala:241) 
     at org.apache.spark.sql.hive.HiveContext.sessionState(HiveContext.scala:237) 
     at org.apache.spark.sql.hive.HiveContext$QueryExecution.<init>(HiveContext.scala:385) 
     at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:91) 
     at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:50) 
     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131) 
     at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) 
     at org.apache.spark.sql.SQLContext.load(SQLContext.scala:728) 
     at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:564) 
     ..<snip>.. 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72) 
     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465) 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340) 
     ... 47 more 
Caused by: java.lang.reflect.InvocationTargetException 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410) 
     ... 52 more 
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 
connect timed out 
     at org.apache.thrift.transport.TSocket.open(TSocket.java:185) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72) 
     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465) 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340) 
     at org.apache.spark.sql.hive.HiveContext.sessionState$lzycompute(HiveContext.scala:241) 
     at org.apache.spark.sql.hive.HiveContext.sessionState(HiveContext.scala:237) 
     at org.apache.spark.sql.hive.HiveContext$QueryExecution.<init>(HiveContext.scala:385) 
     at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:91) 
     at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:50) 
     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131) 
     at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) 
     at org.apache.spark.sql.SQLContext.load(SQLContext.scala:728) 
     at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:564) 
     ..<snip>.. 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.net.SocketTimeoutException: connect timed out 
     at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method) 
     at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85) 
     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) 
     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) 
     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) 
     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172) 
     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) 
     at java.net.Socket.connect(Socket.java:579) 
     at org.apache.thrift.transport.TSocket.open(TSocket.java:180) 
     ... 59 more 
) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:382) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214) 
     ... 57 more 

Answer


The relevant property is hive.metastore.uris.

It is preset to thrift://headnodehost:9083 because C:\apps\dist\spark-1.3.1.2.2.7.1-0004\hive-site.xml is preloaded. That file appears earlier on the generated CLASSPATH than my own hive-site.xml, so mine is ignored.
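
For reference, the entry in the preloaded file presumably looks like the standard hive-site.xml property block (the name and value come from above; the exact XML contents are assumed):

    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://headnodehost:9083</value>
    </property>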

I could not find a simple, working way to override that property value. (If you know of one, please comment.)
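
For the record, the obvious programmatic attempt is to change the property on the context before the first query forces the metastore connection. A sketch only; attempts along these lines did not win against the preloaded hive-site.xml for me:

    // Untested sketch: blank out the metastore URI before any query
    // triggers SessionState.start (which opens the connection).
    // In my setup the preloaded hive-site.xml still took precedence.
    val hc = new HiveContext(sc)
    hc.setConf("hive.metastore.uris", "")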

As a hacky workaround, I simply moved hive-site.xml out of the way. Of course, this has to be done by hand over RDP (which you must enable on your headnode).
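
On the Windows headnode that amounts to something like the following (the path is the one from above; the .bak name is my own choice):

    rem Run over RDP on the headnode; renaming keeps a backup around.
    move "C:\apps\dist\spark-1.3.1.2.2.7.1-0004\hive-site.xml" "C:\apps\dist\spark-1.3.1.2.2.7.1-0004\hive-site.xml.bak"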