
Apache Spark DataFrame createJDBCTable exception: trying to import a text file and save it to Hive over JDBC so that a reporting tool can import it.

We are running spark-1.5.1-bin-hadoop2.6 (master + 1 slave), the JDBC Thrift server, and a beeline client. They all seem to connect and communicate with each other. From what I can tell, Hive is included in this release in the datanucleus jars. I have configured the directories to hold the Hive files, but I have no conf/hive-config.xml.

A simple input CSV file:

Administrator,FiveHundredAddresses1,92121 
Ann,FiveHundredAddresses2,92109 
Bobby,FiveHundredAddresses3,92101 
Charles,FiveHundredAddresses4,92111 

The users table was pre-created with the beeline client:

CREATE TABLE users(first_name STRING, last_name STRING, zip_code STRING); 
show tables; -- it's there 

Then, in a Scala REPL session on the master:

val connectionUrl = "jdbc:hive2://x.y.z.t:10000/users?user=blah&password=" 
val userCsvFile = sc.textFile("/home/blah/Downloads/Users4.csv") 
case class User(first_name:String, last_name:String, work_zip:String) 
val users = userCsvFile.map(_.split(",")).map(l => User(l(0), l(1), l(2))) 
val usersDf = sqlContext.createDataFrame(users) 
usersDf.count() // 4 
usersDf.schema // res92: org.apache.spark.sql.types.StructType = StructType(StructField(first_name,StringType,true), StructField(last_name,StringType,true), StructField(work_zip,StringType,true)) 
usersDf.insertIntoJDBC(connectionUrl,"users",true) 

usersDf.createJDBCTable(connectionUrl, "users", true) // without pre-creating the table in beeline 

OR

val properties = new java.util.Properties 
properties.setProperty("user", "blah") 
properties.setProperty("password", "blah") 
val connectionUrl = "jdbc:hive2://172.16.3.10:10000" 
contactsDf.write.jdbc(connectionUrl,"contacts", properties) 

Each of these throws:

warning: there were 1 deprecation warning(s); re-run with -deprecation for details 
java.sql.SQLException: org.apache.spark.sql.AnalysisException: cannot recognize input near 'TEXT' ',' 'last_name' in column type; line 1 pos 
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296) 
at org.apache.hive.jdbc.HiveStatement.executeUpdate(HiveStatement.java:406) 
at org.apache.hive.jdbc.HivePreparedStatement.executeUpdate(HivePreparedStatement.java:119) 
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:275) 
at org.apache.spark.sql.DataFrame.insertIntoJDBC(DataFrame.scala:1629) 

Any ideas where I am going wrong? Is it possible to write to Hive over JDBC from a DataFrame in this version at all?

Thanks for the help!

Jon


Figured it out (see the answer below).

Answer


After a lot of searching (this now works), you can do the following in the REPL. The JDBC route fails because the CREATE TABLE statement Spark generates uses the column type TEXT, which HiveQL does not recognize; saving through Spark's built-in Hive support sidesteps that DDL entirely:

import org.apache.spark.sql.SaveMode 
contactsDf.saveAsTable("contacts", SaveMode.Overwrite) 
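
For reference, DataFrame.saveAsTable(name, mode) is itself a deprecated wrapper in Spark 1.4+; the equivalent call through the DataFrameWriter API is:

import org.apache.spark.sql.SaveMode 

// Non-deprecated writer-API form of the call above; behavior is the same: 
// the table is written through Spark's Hive support, not over JDBC. 
contactsDf.write.mode(SaveMode.Overwrite).saveAsTable("contacts") 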

I also configured $SPARK_INSTALL_LOC/conf/hive-site.xml as follows:

<configuration> 

<property> 
    <name>javax.jdo.option.ConnectionURL</name> 
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value> 
    <description>JDBC connect string for a JDBC metastore</description> 
</property> 

<property> 
    <name>javax.jdo.option.ConnectionDriverName</name> 
    <value>org.apache.derby.jdbc.EmbeddedDriver</value> 
    <description>Driver class name for a JDBC metastore</description> 
</property> 

<property> 
    <name>hive.metastore.warehouse.dir</name> 
    <value>/user/hive-warehouse</value> 
    <description>Where to store metastore data</description> 
</property> 

</configuration> 
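
With that file in place, spark-shell's sqlContext is a HiveContext backed by this metastore, so the saved table can be read straight back. A minimal end-to-end sketch (the Contact case class and the reuse of Users4.csv here are illustrative assumptions; adapt names and paths):

import org.apache.spark.sql.SaveMode 

case class Contact(first_name: String, last_name: String, work_zip: String) 

// Parse the CSV into a DataFrame, same pattern as usersDf above. 
val contactsDf = sqlContext.createDataFrame( 
  sc.textFile("/home/blah/Downloads/Users4.csv") 
    .map(_.split(",")) 
    .map(l => Contact(l(0), l(1), l(2)))) 

// Persist as a Hive-managed table under hive.metastore.warehouse.dir ... 
contactsDf.saveAsTable("contacts", SaveMode.Overwrite) 

// ... and read it back through the metastore to confirm the round trip. 
sqlContext.table("contacts").show() 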

The other key point: with Derby as the Hive metastore database, you cannot (at least as I have it configured) run the Thrift JDBC server and the REPL at the same time, because of Derby's concurrency limitations. Reconfigured with Postgres or MySQL or similar, simultaneous access might be possible, as sketched below.
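
For example, pointing the metastore at MySQL instead of embedded Derby would look roughly like this in hive-site.xml (a sketch only: mysql-host, the metastore database name, and the credentials are placeholders, and the MySQL JDBC driver jar would also need to be on Spark's classpath):

<property> 
    <name>javax.jdo.option.ConnectionURL</name> 
    <!-- placeholder host/port/database --> 
    <value>jdbc:mysql://mysql-host:3306/metastore?createDatabaseIfNotExist=true</value> 
    <description>JDBC connect string for a JDBC metastore</description> 
</property> 

<property> 
    <name>javax.jdo.option.ConnectionDriverName</name> 
    <value>com.mysql.jdbc.Driver</value> 
    <description>Driver class name for a JDBC metastore</description> 
</property> 

<property> 
    <name>javax.jdo.option.ConnectionUserName</name> 
    <value>hive</value> <!-- placeholder credentials --> 
</property> 

<property> 
    <name>javax.jdo.option.ConnectionPassword</name> 
    <value>hivepassword</value> 
</property> 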
