Writing a DataFrame to a Hive table in Apache Spark with Java

0

I am trying to perform the simple task of writing a DataFrame to a Hive table; the code below is written in Java. I am using the Cloudera VM with no modifications.

import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public static void main(String[] args) {
    String master = "local[*]";

    SparkSession sparkSession = SparkSession
      .builder().appName(JsonToHive.class.getName())
      //.config("spark.sql.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse/")
      .enableHiveSupport().master(master).getOrCreate();

    SparkContext context = sparkSession.sparkContext();
    context.setLogLevel("ERROR");

    SQLContext sqlCtx = sparkSession.sqlContext();
    // SQLContext.jsonFile() was removed in Spark 2.x; read JSON through the session's reader
    Dataset<Row> rowDataset = sparkSession.read().json("employees.json");
    rowDataset.printSchema();
    // registerTempTable() is deprecated; createOrReplaceTempView() is the Spark 2.x replacement
    rowDataset.createOrReplaceTempView("employeesData");

    Dataset<Row> firstRow = sqlCtx.sql("select employee.firstName, employee.addresses from employeesData");
    firstRow.show();

    sparkSession.catalog().listTables().select("*").show();

    // SaveMode.ErrorIfExists is the default, and fails when the table already exists
    firstRow.write().mode(SaveMode.ErrorIfExists).saveAsTable("default.employee");
    sparkSession.close();
}

I have already created a managed table in Hive using HQL:

CREATE TABLE employee (firstName STRING, lastName STRING, addresses ARRAY < STRUCT < street:STRING, city:STRING, state:STRING > >) STORED AS PARQUET; 

Here is what I am reading from "employees.json":

{"employee":{"firstName":"Neil","lastName":"Irani","addresses":[{"street":"36th","city":"NYC","state":"Ny"},{"street":"37th","city":"NYC","state":"Ny"},{"street":"38th","city":"NYC","state":"Ny"}]}} 

It reads the simple JSON file fine, but when it writes it says "Table default.employee already exists;" and does not append the contents. How can I append the contents to the Hive table?

If I set the mode to "append", it does not complain, but it does not write the contents either…

firstRow.write().mode("append").saveAsTable("default.employee");
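One likely explanation, given the classpath problem discovered in the UPDATE below: without hive-site.xml on the classpath, enableHiveSupport() falls back to a local Derby metastore, so the append lands in a Spark-local default.employee rather than the table Hive knows about. A quick way to check which table Spark is resolving (a sketch; DESCRIBE FORMATTED is supported in Spark 2.x SQL):

// If "Location" points at a local spark-warehouse directory instead of HDFS,
// Spark is not talking to the real Hive metastore.
sparkSession.sql("DESCRIBE FORMATTED default.employee").show(100, false);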

Any help would be appreciated… Thanks.

+-------------+--------+-----------+---------+-----------+
|         name|database|description|tableType|isTemporary|
+-------------+--------+-----------+---------+-----------+
|     employee| default|       null|  MANAGED|      false|
|employeesdata|    null|       null|TEMPORARY|       true|
+-------------+--------+-----------+---------+-----------+

UPDATE

/usr/lib/hive/conf/hive-site.xml was not on the classpath, so Spark could not see the Hive tables; after adding it to the classpath it worked fine. Since I was running from IntelliJ, I had this problem. In production the spark-conf folder is linked to hive-site.xml…
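If editing the IDE classpath is awkward, the metastore can also be pointed at directly on the builder. A minimal sketch, assuming a metastore Thrift service on the default port 9083; the host and port here are assumptions, so check hive.metastore.uris in hive-site.xml for the real value:

import org.apache.spark.sql.SparkSession;

// Point Spark at the Hive metastore explicitly instead of relying on
// hive-site.xml being on the classpath. thrift://localhost:9083 is an
// assumption -- use the value of hive.metastore.uris from hive-site.xml.
SparkSession spark = SparkSession.builder()
    .appName(JsonToHive.class.getName())
    .master("local[*]")
    .config("hive.metastore.uris", "thrift://localhost:9083")
    .enableHiveSupport()
    .getOrCreate();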

+1

You need to create a HiveContext: HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx.sc()); –

+0

I think the root problem is that I cannot connect to the local Hive; the calls below return "Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'employee' not found in database 'default';" hiveContext.sql("SHOW COLUMNS FROM default.employee").show(); sqlCtx.sql("SHOW COLUMNS FROM default.employee").show(); – Manjesh

+0

Setting the config on the HiveContext… no luck… hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse"); hiveContext.sql("SHOW COLUMNS FROM employee").show(); – Manjesh
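A side note on that attempt: 50070 is the NameNode web UI port, not the HDFS file-system port, so a warehouse URI built on it will not resolve. A sketch of the corrected call, assuming the common RPC port 8020 (verify against fs.defaultFS in core-site.xml):

// 8020 is the usual HDFS RPC port on the Cloudera VM; this value is an
// assumption -- confirm it from fs.defaultFS in core-site.xml.
hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:8020/user/hive/warehouse");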

Answer

1

It looks like you should use insertInto(String tableName) instead of saveAsTable(String tableName):

firstRow.write().mode("append").insertInto("default.employee"); 
+0

I think the most fundamental problem is that I cannot connect to the local Hive; the calls below return "Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'employee' not found in database 'default';" hiveContext.sql("SHOW COLUMNS FROM default.employee").show(); sqlCtx.sql("SHOW COLUMNS FROM default.employee").show(); – Manjesh

+0

Setting the config on the HiveContext… no luck… hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse"); hiveContext.sql("SHOW COLUMNS FROM employee").show(); – Manjesh

+0

Have you tried 'insertInto("employee")'? –
