2014-11-03 55 views
0

我有以下格式加载JSON数组猪

[ 
    { 
    "id": 2, 
    "createdBy": 0, 
    "status": 0, 
    "utcTime": "Oct 14, 2014 4:49:47 PM", 
    "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia", 
    "longitude": 77.5983817, 
    "latitude": 12.9832418, 
    "createdDate": "Sep 16, 2014 2:59:03 PM", 
    "accuracy": 5, 
    "loginType": 1, 
    "mobileNo": "0000005567" 
    }, 
    { 
    "id": 4, 
    "createdBy": 0, 
    "status": 0, 
    "utcTime": "Oct 14, 2014 4:52:48 PM", 
    "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia", 
    "longitude": 77.5983817, 
    "latitude": 12.9832418, 
    "createdDate": "Oct 8, 2014 5:24:42 PM", 
    "accuracy": 5, 
    "loginType": 1, 
    "mobileNo": "0000005566" 
    } 
] 

json文件,当我尝试使用JsonLoader类我越来越喜欢意外的结束输入的,一个错误的数据加载到猪:预期适合对象

a = LOAD '/user/root/jsoneg/exp.json' USING JsonLoader('id:int,createdBy:int,status:int,utcTime:chararray,placeName:chararray,longitude:double,latitude:double,createdDate:chararray,accuracy:double,loginType:double,mobileNo:chararray'); 
b = foreach a generate $0,$1,$2; 
dump b; 
+0

是您的JSON文件格式有一个JSON对象在单线上?我认为猪正试图逐行解析json,而json被分成多行。 – mbaxi 2014-11-03 10:18:18

回答

3

我也面临类似的问题的某个时候回来,接近标记后来我才知道,猪八戒JSON将不支持多JSON格式。它总会期望json输入必须在单行中。

而不是原生的Jsonloader,我建议你使用elephantbird json loader。这对于Jsons格式非常有用。

您可以从下面的链接

http://www.java2s.com/Code/Jar/e/elephant.htm 

下载的罐子我改变了你的输入格式,以单行和加载通过elephantbird如下

input.json 
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]} 

PigScript: 
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar'; 
REGISTER '/tmp/elephant-bird-pig-4.1.jar'; 

A = LOAD 'input.json ' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad'); 
B = FOREACH A GENERATE FLATTEN($0#'test'); 
C = FOREACH B GENERATE FLATTEN($0) AS mymap; 
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status'; 
DUMP D; 

Output: 
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0) 
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0) 
+0

感谢您的代码。节省了很多时间 – Astro 2016-04-08 19:08:54