2016-03-10 137 views
6

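U-SQL unable to extract data from JSON file

I am trying to extract data from a JSON file using U-SQL. The query either runs successfully without producing any output data, or it fails with a "vertex failed fast" error.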

The JSON file looks like this:

{ 
    "results": [ 
    { 
     "name": "Sales/Account", 
     "id": "7367e3f2-e1a5-11e5-80e8-0933ecd4cd8c", 
     "deviceName": "HP", 
     "deviceModel": "g6-pavilion", 
     "clientip": "0.41.4.1" 
    }, 
    { 
     "name": "Sales/Account", 
     "id": "c01efba0-e0d5-11e5-ae20-af6dc1f2c036", 
     "deviceName": "acer", 
     "deviceModel": "veriton", 
     "clientip": "10.10.14.36" 
    } 
    ] 
} 

My U-SQL script:

REFERENCE ASSEMBLY [Newtonsoft.Json]; 
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json"; 

DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv"; 

@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor(); 

@jsonify=SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(results,"name","id","deviceName","deviceModel","clientip") AS rec FROM @trail2; 

@logSchema=SELECT rec["name"] AS sysName, 
       rec["id"] AS sysId, 
       rec["deviceName"] AS domainDeviceName, 
       rec["deviceModel"] AS domainDeviceModel, 
       rec["clientip"] AS domainClientIp 
     FROM @jsonify; 

OUTPUT @logSchema TO @out USING Outputters.Tsv(); 

Answers

8

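Actually, the JsonExtractor accepts a rowpath parameter, expressed in JSONPath, that lets you identify the JSON objects or JSON array items you want to map to rows. So you can extract the data from your JSON file with a single statement: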

// @input is assumed to hold the path to the JSON file (declared as @in in the question's script).
@logSchema = 
    EXTRACT name string, id string, deviceName string, deviceModel string, clientip string 
    FROM @input 
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("results[*]"); 
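
For reference, here is a sketch of how that single statement might fit into the question's full script. It assumes the same assemblies are registered and reuses the question's paths and output column names. The rowset name @records is a hypothetical name chosen for illustration, and the script has not been run against the sample file.

```
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @in string = "adl://xyz.azuredatalakestore.net/todelete.json";
DECLARE @out string = "adl://xyz.azuredatalakestore.net/todelete.tsv";

// Map each item of the "results" array to one row.
// @records is a hypothetical rowset name.
@records =
    EXTRACT name string,
            id string,
            deviceName string,
            deviceModel string,
            clientip string
    FROM @in
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("results[*]");

// Rename the columns as in the question's script.
@logSchema =
    SELECT name AS sysName,
           id AS sysId,
           deviceName AS domainDeviceName,
           deviceModel AS domainDeviceModel,
           clientip AS domainClientIp
    FROM @records;

OUTPUT @logSchema TO @out USING Outputters.Tsv();
```

Because the extractor produces typed columns directly, this version needs neither the JsonTuple call nor an intermediate file.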
0

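Sarath,

The problem is that your @trail2 rowset contains a JSON array, and as far as I know, JsonFunctions cannot parse [{...},{...}]. So I wrote it out to a file and read it back in with the extractor, which can parse the array.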

REFERENCE ASSEMBLY [Newtonsoft.Json]; 
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json"; 
DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv"; 
DECLARE @mid string="adl://xyz.azuredatalakestore.net/intermediate.txt"; 


@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor(); 

OUTPUT @trail2 TO @mid USING Outputters.Text(quoting:false); 

@jsonify=EXTRACT name string, 
       id string, 
       deviceName string , 
       deviceModel string, 
       clientip string 
FROM @mid USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor(); 

@logSchema=SELECT name AS sysName, 
       id AS sysId, 
       deviceName AS domainDeviceName, 
       deviceModel AS domainDeviceModel, 
       clientip AS domainClientIp 
     FROM @jsonify; 

OUTPUT @logSchema TO @out USING Outputters.Tsv(); 
+0

Thanks Michael, that solved the problem.

+0

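There is no need for the intermediate file (which actually requires you to submit two jobs, because a script cannot read data that it creates); you can do this more efficiently. See my alternative answer.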