2017-07-21 46 views

Creating a Hive table for nested JSON data

I am unable to load nested JSON data into a Hive table. Can anyone help? Here is what I have tried.

Sample input:

{"DocId":"ABC","User1":{"Id":1234,"Username":"sam1234","Name":"Sam","ShippingAddress":{"Address1":"123 Main St.","Address2":null,"City":"Durham","State":"NC"},"Orders":[{"ItemId":6789,"OrderDate":"11/11/2012"},{"ItemId":4352,"OrderDate":"12/12/2012"}]}} 

In Hive (CDH3):

ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar; 

CREATE TABLE json_tab(
    DocId string, 
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>> 
) 
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' 
STORED AS TEXTFILE; 

hive> select * from json_tab; 
OK 
NULL null 

All I get here is NULLs.
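
One way to narrow down where the NULLs come from is to bypass the SerDe and read each line as a plain string. The following is only a diagnostic sketch, not a fix: the table name json_raw and the file path are hypothetical, and it relies on nothing beyond Hive's built-in get_json_object function. Keep in mind that text-based JSON SerDes expect one complete JSON object per line, as in the sample input.

```sql
-- Hypothetical staging table: one column holding each raw line of the file.
CREATE TABLE json_raw (line STRING);

-- Placeholder path; point it at the JSON file you want to inspect.
LOAD DATA LOCAL INPATH '/path/to/data/nested.json' INTO TABLE json_raw;

-- The paths use the key names exactly as they appear in the sample input.
SELECT
    get_json_object(line, '$.DocId')                      AS doc_id,
    get_json_object(line, '$.User1.Id')                   AS user_id,
    get_json_object(line, '$.User1.ShippingAddress.City') AS city,
    get_json_object(line, '$.User1.Orders[0].ItemId')     AS first_item_id
FROM json_raw;
```

If these columns come back populated, the file itself is readable and the problem is more likely in the SerDe or the table definition. If they are also NULL, check the file contents and whether any data was loaded at all.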

I also tried the HCatalog jar:

ADD JAR /home/training/Desktop/hcatalog-core-0.11.0.jar; 

CREATE TABLE json_tab(
    DocId string, 
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>> 
) 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'; 

But my CREATE TABLE statement fails with the error below:

FAILED: Error in metadata: Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Can anyone help me? Thanks in advance.

Answers


You can use the org.openx.data.jsonserde.JsonSerDe class to read JSON data.

You can download the jar file from http://www.congiu.net/hive-json-serde/1.3.6-SNAPSHOT/cdh4/ and then follow these steps:

add jar /path/to/jar/json-serde-1.3.6-jar-with-dependencies.jar; 

CREATE TABLE json_tab(
    DocId string, 
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>> 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; 

LOAD DATA LOCAL INPATH '/path/to/data/nested.json' INTO TABLE json_tab; 

SELECT DocId, User1.Id, User1.ShippingAddress.City as city, 
User1.Orders[0].ItemId as order0id, 
User1.Orders[1].ItemId as order1id from json_tab; 


Result:
ABC  1234 Durham 6789 4352 
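
Indexing with Orders[0] and Orders[1] only works when the number of orders is known in advance. As a sketch, assuming the json_tab table above loads correctly, Hive's LATERAL VIEW with explode() turns each array element into its own row. The aliases order_rows and ord are illustrative names, not part of the original answer.

```sql
-- One output row per element of the Orders array.
SELECT
    DocId,
    User1.Id      AS user_id,
    ord.ItemId    AS item_id,
    ord.OrderDate AS order_date
FROM json_tab
LATERAL VIEW explode(User1.Orders) order_rows AS ord;
```

For the sample input this would be expected to return two rows for DocId ABC, one per order.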

Thanks. I tried the suggested jar-with-dependencies, but the CREATE TABLE statement throws this error: "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V". Could you take a look and let me know what can be done? I tried on both CDH3 and CDH5. – user2531569

I was getting the same exception.

I added the jars below and it worked for me.

ADD JAR /home/cloudera/Data/json-serde-1.3.7.3.jar; 
ADD JAR /home/cloudera/Data/hive-hcatalog-core-0.13.0.jar;