Writing a DataFrame to a Hive table in Apache Spark with Java

0

I am trying to perform the simple task of writing a DataFrame to a Hive table; the code below is written in Java. I am using the Cloudera VM with no modifications.

import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public static void main(String[] args) {
    String master = "local[*]";

    SparkSession sparkSession = SparkSession
      .builder().appName(JsonToHive.class.getName())
      //.config("spark.sql.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse/")
      .enableHiveSupport().master(master).getOrCreate();

    SparkContext context = sparkSession.sparkContext();
    context.setLogLevel("ERROR");

    SQLContext sqlCtx = sparkSession.sqlContext();
    // SQLContext.jsonFile() was removed in Spark 2.x; read JSON through the session's reader
    Dataset<Row> rowDataset = sparkSession.read().json("employees.json");
    rowDataset.printSchema();
    // registerTempTable() is deprecated; createOrReplaceTempView() is the Spark 2.x replacement
    rowDataset.createOrReplaceTempView("employeesData");

    Dataset<Row> firstRow = sqlCtx.sql("select employee.firstName, employee.addresses from employeesData");
    firstRow.show();

    sparkSession.catalog().listTables().select("*").show();

    // SaveMode.ErrorIfExists is the default, and fails when the table already exists
    firstRow.write().mode(SaveMode.ErrorIfExists).saveAsTable("default.employee");
    sparkSession.close();
}

I have already created a managed table in Hive using HQL:

CREATE TABLE employee (firstName STRING, lastName STRING, addresses ARRAY < STRUCT < street:STRING, city:STRING, state:STRING > >) STORED AS PARQUET; 

Here is what I am reading from "employees.json":

{"employee":{"firstName":"Neil","lastName":"Irani","addresses":[{"street":"36th","city":"NYC","state":"Ny"},{"street":"37th","city":"NYC","state":"Ny"},{"street":"38th","city":"NYC","state":"Ny"}]}} 

It reads the simple JSON file fine, but when it writes it says "Table default.employee already exists;" and does not append the contents. How can I append the contents to the Hive table?

If I set the mode to "append", it does not complain, but it does not write the contents either…

firstRow.write().mode("append").saveAsTable("default.employee");
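One likely explanation, given the classpath problem discovered in the UPDATE below: without hive-site.xml on the classpath, enableHiveSupport() falls back to a local Derby metastore, so the append lands in a Spark-local default.employee rather than the table Hive knows about. A quick way to check which table Spark is resolving (a sketch; DESCRIBE FORMATTED is supported in Spark 2.x SQL):

// If "Location" points at a local spark-warehouse directory instead of HDFS,
// Spark is not talking to the real Hive metastore.
sparkSession.sql("DESCRIBE FORMATTED default.employee").show(100, false);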

Any help would be appreciated… Thanks.

+-------------+--------+-----------+---------+-----------+
|         name|database|description|tableType|isTemporary|
+-------------+--------+-----------+---------+-----------+
|     employee| default|       null|  MANAGED|      false|
|employeesdata|    null|       null|TEMPORARY|       true|
+-------------+--------+-----------+---------+-----------+

UPDATE

/usr/lib/hive/conf/hive-site.xml was not on the classpath, so Spark could not see the Hive tables; after adding it to the classpath it worked fine. Since I was running from IntelliJ, I had this problem. In production the spark-conf folder is linked to hive-site.xml…
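If editing the IDE classpath is awkward, the metastore can also be pointed at directly on the builder. A minimal sketch, assuming a metastore Thrift service on the default port 9083; the host and port here are assumptions, so check hive.metastore.uris in hive-site.xml for the real value:

import org.apache.spark.sql.SparkSession;

// Point Spark at the Hive metastore explicitly instead of relying on
// hive-site.xml being on the classpath. thrift://localhost:9083 is an
// assumption -- use the value of hive.metastore.uris from hive-site.xml.
SparkSession spark = SparkSession.builder()
    .appName(JsonToHive.class.getName())
    .master("local[*]")
    .config("hive.metastore.uris", "thrift://localhost:9083")
    .enableHiveSupport()
    .getOrCreate();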

+1

You need to create a HiveContext: HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx.sc()); –

+0

I think the root problem is that I cannot connect to the local Hive; the calls below return "Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'employee' not found in database 'default';" hiveContext.sql("SHOW COLUMNS FROM default.employee").show(); sqlCtx.sql("SHOW COLUMNS FROM default.employee").show(); – Manjesh

+0

Setting the config on the HiveContext… no luck… hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse"); hiveContext.sql("SHOW COLUMNS FROM employee").show(); – Manjesh
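A side note on that attempt: 50070 is the NameNode web UI port, not the HDFS file-system port, so a warehouse URI built on it will not resolve. A sketch of the corrected call, assuming the common RPC port 8020 (verify against fs.defaultFS in core-site.xml):

// 8020 is the usual HDFS RPC port on the Cloudera VM; this value is an
// assumption -- confirm it from fs.defaultFS in core-site.xml.
hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:8020/user/hive/warehouse");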

Answer

1

It looks like you should use insertInto(String tableName) instead of saveAsTable(String tableName):

firstRow.write().mode("append").insertInto("default.employee"); 
+0

I think the most fundamental problem is that I cannot connect to the local Hive; the calls below return "Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'employee' not found in database 'default';" hiveContext.sql("SHOW COLUMNS FROM default.employee").show(); sqlCtx.sql("SHOW COLUMNS FROM default.employee").show(); – Manjesh

+0

Setting the config on the HiveContext… no luck… hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse"); hiveContext.sql("SHOW COLUMNS FROM employee").show(); – Manjesh

+0

Have you tried 'insertInto("employee")'? –
