2014-11-04

Creating a structured Hive table from unstructured GPS data packets in CSV format

I have a CSV file like the one shown below.

VTS,51,0071,9739965515,NM,GP,INF01,V,19,072219,291014,0000.0000,N,00000.0000,E,07AE
VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,169,B205
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072311,291014,0000.0000,N,00000.0000,E,C24E
VTS,01,0097,9739965515,NM,GP,19,072311,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,171,B358
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072319,291014,0000.0000,N,00000.0000,E,012F
VTS,51,0071,9739965515,NM,GP,INF01,V,19,072326,291014,0000.0000,N,00000.0000,E,B2E6
VTS,01,0097,9739965515,NM,GP,18,072326,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,173,EAA0
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072333,291014,0000.0000,N,00000.0000,E,9896
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072340,291014,0000.0000,N,00000.0000,E,9B23

The values map to the following fields:

pkt_header,gprs_pkt_id,pkt_length,sim_no,msg_id,gprs_pkt,gsm_sig_strength,utc_time,pkt_validation,latitude,direction_n_s,longitude,direction_e_w,speed,track_angle,utc_date,fuel_adc_values,ignition,odometer_values,supply_int,battery_adc,pkt_id,check_sum

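A value of 01 in the second field, gprs_pkt_id, marks a valid packet. My use case is to filter the CSV data down to valid packets only. I am using a regular expression for this, but I cannot get the complete records back. Any help would be greatly appreciated.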

The Hive query I used is shown below.

CREATE EXTERNAL TABLE sky_track_testing1(
    pkt_header STRING,
    gprs_pkt_id STRING,
    pkt_length STRING,
    sim_no STRING,
    msg_id STRING,
    gprs_pkt STRING,
    gsm_sig_strength STRING,
    utc_time STRING,
    pkt_validation STRING,
    latitude STRING,
    direction_n_s STRING,
    longitude STRING,
    direction_e_w STRING,
    speed STRING,
    track_angle STRING,
    utc_date STRING,
    fuel_adc_values STRING,
    ignition STRING,
    odometer_values STRING,
    supply_int STRING,
    battery_adc STRING,
    pkt_id STRING,
    check_sum STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    "input.regex" = "^(VTS,01).*$"
)
STORED AS TEXTFILE
LOCATION '/user/root/sky_track';

This is definitely a wrong query. Please help me.
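One likely reason the query above does not return whole records: the contrib RegexSerDe fills one column per capturing group, and "^(VTS,01).*$" has only one group, so at most pkt_header receives a value and the remaining columns come back NULL. Rows that do not match the regex are, as far as I understand the SerDe's documented behavior, returned with every column NULL rather than dropped. The following is a minimal sketch of a regex with one group per column, not a tested fix; the table name sky_track_regex is hypothetical, and it assumes valid packets always carry exactly 23 comma-separated fields as in the sample.

```sql
-- Sketch only: sky_track_regex is a hypothetical table name.
-- Assumes every valid (01) packet has exactly 23 comma-separated fields.
CREATE EXTERNAL TABLE sky_track_regex (
  pkt_header STRING, gprs_pkt_id STRING, pkt_length STRING, sim_no STRING,
  msg_id STRING, gprs_pkt STRING, gsm_sig_strength STRING, utc_time STRING,
  pkt_validation STRING, latitude STRING, direction_n_s STRING,
  longitude STRING, direction_e_w STRING, speed STRING, track_angle STRING,
  utc_date STRING, fuel_adc_values STRING, ignition STRING,
  odometer_values STRING, supply_int STRING, battery_adc STRING,
  pkt_id STRING, check_sum STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- 23 capturing groups: (VTS), (01), then 21 generic "anything but a comma" groups.
  "input.regex" = "^(VTS),(01),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$"
)
STORED AS TEXTFILE
LOCATION '/user/root/sky_track';

-- Non-matching lines (for example the 51 packets) are expected to surface
-- as all-NULL rows, so they are filtered out at query time.
SELECT * FROM sky_track_regex WHERE pkt_header IS NOT NULL;
```

The regex only decides how a line is split into columns; it does not remove lines from the table, which is why the WHERE clause is still needed.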

Answers

1

I suggest you use Pig for this:

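-- PigStorage(',') splits each line on commas. The default delimiter is a tab,
-- which would put the whole line into pkt_header and leave the other fields null.
a = load '/user/root/sky_track' using PigStorage(',') as (pkt_header,gprs_pkt_id,pkt_length,sim_no,msg_id,gprs_pkt,gsm_sig_strength,utc_time,pkt_validation,latitude,direction_n_s,longitude,direction_e_w,speed,track_angle,utc_date,fuel_adc_values,ignition,odometer_values,supply_int,battery_adc,pkt_id,check_sum);
-- Keep only valid packets.
b = filter a by gprs_pkt_id == '01';
store b into '/user/root/sky_track_valid';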

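This Pig script creates the output folder and file, but nothing is written into it. Anyhow, I got it done with a slight modification to the script, and it works. – 2014-11-05 05:52:45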

L = LOAD '/user/skytrack/syslog' USING PigStorage(',') AS (
    pkt_header:chararray,
    gprs_pkt_id:int,
    pkt_length:chararray,
    sim_no:chararray,
    msg_id:chararray,
    gprs_pkt:chararray,
    gsm_sig_strength:int,
    utc_time:chararray,
    pkt_validation:chararray,
    latitude:double,
    direction_n_s:chararray,
    longitude:double,
    direction_e_w:chararray,
    speed:double,
    track_angle:double,
    utc_date:chararray,
    fuel_adc_values:int,
    ignition:int,
    odometer_values:int,
    supply_int:int,
    battery_adc:int,
    pkt_id:int,
    check_sum:chararray);
VC = FILTER L BY (gprs_pkt_id == 1);
DUMP VC;
STORE VC INTO 'input/valid/packet123456';
– 2014-11-05 05:54:14

0

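Yes, as the answer above says, Pig is a good fit for your data, so it is worth trying. If you would rather use Hive, see the example below (this dataset does not need a regular expression).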

hive> CREATE TABLE sky_track_testing1(
    > pkt_header STRING, 
    > gprs_pkt_id STRING, 
    > pkt_length STRING, 
    > sim_no STRING, 
    > msg_id STRING, 
    > gprs_pkt STRING, 
    > gsm_sig_strength STRING, 
    > utc_time STRING, 
    > pkt_validation STRING, 
    > latitude STRING, 
    > direction_n_s STRING, 
    > longitude STRING, 
    > direction_e_w STRING, 
    > speed STRING, 
    > track_angle STRING, 
    > utc_date STRING, 
    > fuel_adc_values STRING, 
    > ignition STRING, 
    > odometer_values STRING, 
    > supply_int STRING, 
    > battery_adc STRING, 
    > pkt_id STRING, 
    > check_sum STRING 
    >) 
    > ROW FORMAT 
    > DELIMITED FIELDS TERMINATED BY ',' 
    > LINES TERMINATED BY '\n' 
    > STORED AS TEXTFILE; 
OK 
Time taken: 0.1 seconds 

hive> select * from sky_track_testing1 where gprs_pkt_id='01';
OK 
VTS 01 0097 9739965515 SP GP 18 072253 V 0000.0000 N 00000.0000 E 0.0 0.0 291014 0000 00 4000 1999 169 B205 
VTS 01 0097 9739965515 NM GP 19 072311 V 0000.0000 N 00000.0000 E 0.0 0.0 291014 0000 00 4000 1999 171 B358 
VTS 01 0097 9739965515 NM GP 18 072326 V 0000.0000 N 00000.0000 E 0.0 0.0 291014 0000 00 4000 1999 173 EAA0 
Time taken: 14.328 seconds
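The session above does not show how the CSV data reached the table. One way to avoid a separate load step is sketched below, assuming the files already sit at /user/root/sky_track (the location given in the question) and that sky_track_testing1 is the comma-delimited table created in this answer. The names sky_track_csv and sky_track_valid are hypothetical.

```sql
-- Sketch only: sky_track_csv and sky_track_valid are hypothetical names.
-- Reuse the column list and row format of the delimited table defined above,
-- but read the files in place from the question's HDFS location.
CREATE EXTERNAL TABLE sky_track_csv
LIKE sky_track_testing1
LOCATION '/user/root/sky_track';

-- Expose only valid packets (gprs_pkt_id = '01') under a stable name.
CREATE VIEW sky_track_valid AS
SELECT *
FROM sky_track_csv
WHERE gprs_pkt_id = '01';

SELECT * FROM sky_track_valid;
```

Because the table is external, dropping it later leaves the underlying files untouched. The shorter 51 packets simply get NULL in their missing trailing columns and are excluded by the view's filter.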