I have a CSV file as shown below, and I want to create a structured Hive table from its unstructured GPS data packets.
VTS,51,0071,9739965515,NM,GP,INF01,V,19,072219,291014,0000.0000,N,00000.0000,E,07AE
VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,169,B205
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072311,291014,0000.0000,N,00000.0000,E,C24E
VTS,01,0097,9739965515,NM,GP,19,072311,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,171,B358
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072319,291014,0000.0000,N,00000.0000,E,012F
VTS,51,0071,9739965515,NM,GP,INF01,V,19,072326,291014,0000.0000,N,00000.0000,E,B2E6
VTS,01,0097,9739965515,NM,GP,18,072326,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,173,EAA0
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072333,291014,0000.0000,N,00000.0000,E,9896
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072340,291014,0000.0000,N,00000.0000,E,9B23
These are mapped to the following fields:
pkt_header,gprs_pkt_id,pkt_length,sim_no,msg_id,gprs_pkt,gsm_sig_strength,utc_time,pkt_validation,latitude,direction_n_s,longitude,direction_e_w,speed,track_angle,utc_date,fuel_adc_values,ignition,odometer_values,supply_int,battery_adc,pkt_id,check_sum
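The second field, gprs_pkt_id, marks a valid packet when its value is 01. My use case is to filter the CSV data down to valid packets only. I tried a regular expression, but I can't get the full record. Any help would be much appreciated.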
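Outside Hive, the filtering described above can be sketched in plain Python. Split each line on commas, keep it only if the second field is `01` and it has all 23 fields, then pair the values with the field names. The length check matters because the `VTS,51` packets carry only 16 fields, so the 23-name mapping only fits `VTS,01` packets. This is a minimal illustration that assumes one packet per line and no commas inside field values. `FIELDS`, `valid_packets`, and `SAMPLE` are hypothetical names, and `SAMPLE` is simply the first four packets from the data above.

```python
# Field names from the mapping above; they describe the 23-field "VTS,01" layout.
FIELDS = [
    "pkt_header", "gprs_pkt_id", "pkt_length", "sim_no", "msg_id", "gprs_pkt",
    "gsm_sig_strength", "utc_time", "pkt_validation", "latitude", "direction_n_s",
    "longitude", "direction_e_w", "speed", "track_angle", "utc_date",
    "fuel_adc_values", "ignition", "odometer_values", "supply_int", "battery_adc",
    "pkt_id", "check_sum",
]


def valid_packets(lines):
    """Yield one dict per valid packet (gprs_pkt_id == "01" and exactly 23 fields)."""
    for line in lines:
        parts = line.strip().split(",")
        # Type-51 packets are shorter, so the length check also rejects them.
        if len(parts) == len(FIELDS) and parts[1] == "01":
            yield dict(zip(FIELDS, parts))


# Hypothetical sample: the first four packets from the question.
SAMPLE = [
    "VTS,51,0071,9739965515,NM,GP,INF01,V,19,072219,291014,0000.0000,N,00000.0000,E,07AE",
    "VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,169,B205",
    "VTS,51,0071,9739965515,NM,GP,INF01,V,18,072311,291014,0000.0000,N,00000.0000,E,C24E",
    "VTS,01,0097,9739965515,NM,GP,19,072311,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,171,B358",
]

if __name__ == "__main__":
    for pkt in valid_packets(SAMPLE):
        print(pkt["utc_time"], pkt["check_sum"])
```

Run against `SAMPLE`, it prints `072253 B205` and `072311 B358`, which are the two `VTS,01` packets in the sample.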
The Hive query I used is shown below:
CREATE EXTERNAL TABLE sky_track_testing1 (
  pkt_header STRING, gprs_pkt_id STRING, pkt_length STRING, sim_no STRING,
  msg_id STRING, gprs_pkt STRING, gsm_sig_strength STRING, utc_time STRING,
  pkt_validation STRING, latitude STRING, direction_n_s STRING,
  longitude STRING, direction_e_w STRING, speed STRING, track_angle STRING,
  utc_date STRING, fuel_adc_values STRING, ignition STRING,
  odometer_values STRING, supply_int STRING, battery_adc STRING,
  pkt_id STRING, check_sum STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "^(VTS,01).*$")
STORED AS TEXTFILE
LOCATION '/user/root/sky_track';
This query is clearly wrong. Please help.
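A likely cause: RegexSerDe fills the table columns from the capture groups of `input.regex` in order, one group per column, and it expects the number of groups to equal the number of columns. `^(VTS,01).*$` has a single group for 23 columns, so it cannot produce the full record. The sketch below uses Python's `re` only to build and check such a pattern. This simple syntax behaves the same in Java's regex engine, which Hive uses. It assumes one packet per line and no commas inside fields. `NUM_COLUMNS`, `PATTERN`, `valid`, and `invalid` are illustrative names, not part of the original setup.

```python
import re

NUM_COLUMNS = 23  # one per column of sky_track_testing1

# One capture group per column. The first two groups are pinned to the literals
# "VTS" and "01", so only valid packets match. Each remaining group captures
# one comma-free field.
PATTERN = "^(VTS),(01)" + ",([^,]*)" * (NUM_COLUMNS - 2) + "$"
regex = re.compile(PATTERN)

valid = "VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,169,B205"
invalid = "VTS,51,0071,9739965515,NM,GP,INF01,V,19,072219,291014,0000.0000,N,00000.0000,E,07AE"

if __name__ == "__main__":
    print(PATTERN)
    print(regex.groups)                        # 23 capture groups
    print(regex.match(valid) is not None)      # True
    print(regex.match(invalid) is not None)    # False
    print(re.compile("^(VTS,01).*$").groups)   # original pattern: 1 group
```

The printed pattern contains no backslashes, so it could be used as the `input.regex` value without extra escaping. RegexSerDe does not drop non-matching lines such as the `VTS,51` packets. In my understanding they typically come back as rows of NULLs, so a query may still need a condition such as `WHERE gprs_pkt_id IS NOT NULL`.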
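This Pig script creates the output folder and files but writes nothing into them... Anyhow, I got it working with a small modification to the script, and it works... – 2014-11-05 05:52:45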
L = LOAD '/user/skytrack/syslog' USING PigStorage(',') AS (
    pkt_header:chararray, gprs_pkt_id:int, pkt_length:chararray, sim_no:chararray,
    msg_id:chararray, gprs_pkt:chararray, gsm_sig_strength:int, utc_time:chararray,
    pkt_validation:chararray, latitude:double, direction_n_s:chararray,
    longitude:double, direction_e_w:chararray, speed:double, track_angle:double,
    utc_date:chararray, fuel_adc_values:int, ignition:int, odometer_values:int,
    supply_int:int, battery_adc:int, pkt_id:int, check_sum:chararray);
VC = filter L by (gprs_pkt_id == 1);
dump VC;
STORE VC INTO 'input/valid/packet123456';
– 2014-11-05 05:54:14