2016-03-31 50 views

Hive query CSV text delimiter issue. I am trying to import the data below into Hive:

name,phone,address

Arverne,(718) 634-4784,"*312 Beach 54 Street 
Arverne, NY 11692 
(40.59428994144626, -73.78442865540268)*" 

Astoria,(718) 278-2220,"*14 01 Astoria Boulevard 
Long Island City, NY 11102 
(40.77152402451418, -73.92643545073543)*" 

Auburndale,(718) 352-2027,"*25 55 Francis Lewis Boulevard 
Flushing, NY 11358 
(40.76035096822195, -73.79632645819947)*" 

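But the address does not come through correctly, which corrupts the table data. I think the issue is with the line terminator (it defaults to \n, while each address spans 3-4 lines), because when I run the sample data below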

a,b,"e,f" 

x,y,"l,m" 

with the query below

create table test(c1 string, c2 string, c3 string) 
row format serde 'com.bizo.hive.serde.csv.CSVSerde' 
with serdeproperties(
"separatorChar" = ","); 

it works fine:

test.c1 test.c2 test.c3

a b c,d 

e f g,z 

How can I achieve this for the multi-line addresses?

Answer


This is what I have worked out.

>>> CREATE TABLE hiveTest(name string, phone string, address string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; 
>>> load data inpath 'file.csv' into table hiveTest; 

>>> select name from hiveTest; 
+-------------+--+ 
| name  | 
+-------------+--+ 
| Arverne  | 
| Astoria  | 
| Auburndale | 
+-------------+--+ 
>>> select address from hiveTest; 
+--------------------------------------------+--+ 
|     address     | 
+--------------------------------------------+--+ 
| "312 Beach 54 Street Arverne    | 
| "14 01 Astoria Boulevard Long Island City | 
| "25 55 Francis Lewis Boulevard Flushing | 
+--------------------------------------------+--+ 

I hope it helps.
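
One possible workaround, offered as a sketch rather than a confirmed fix: Hive's text input splits the file into records at every newline before the SerDe parses the fields, so a quoted address that spans several physical lines is broken up no matter which quoteChar is set. Flattening each address onto a single line before loading avoids that. The function name `flatten_multiline_csv` and the inline sample are made up for illustration; Python's standard `csv` module does the quote-aware parsing.

```python
import csv
import io


def flatten_multiline_csv(src, dst):
    """Copy CSV rows from src to dst, one physical line per record.

    Every run of whitespace inside a field, including embedded line
    breaks, is collapsed to a single space.
    """
    reader = csv.reader(src)                       # understands quoted multi-line fields
    writer = csv.writer(dst, lineterminator="\n")  # re-quotes fields that contain commas
    for row in reader:
        writer.writerow([" ".join(field.split()) for field in row])


# Hypothetical sample shaped like one record from the question.
sample = (
    'Arverne,(718) 634-4784,"312 Beach 54 Street\n'
    'Arverne, NY 11692\n'
    '(40.59428994144626, -73.78442865540268)"\n'
)

out = io.StringIO()
flatten_multiline_csv(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

For a real file, open the input and output with `newline=""` and pass the file objects instead of the `StringIO` buffers. The address still contains commas, so the flattened file needs a table that honours quotes (for example the CSVSerde from the question with "quoteChar" set to a double quote) rather than plain FIELDS TERMINATED BY ','.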


The address is truncated. It is supposed to be "312 Beach 54 Street Arverne, NY 11692 (40.59428994144626, -73.78442865540268)". – sr7


Try this: create table my_table(name string, phone string, address string) row format serde 'com.bizo.hive.serde.csv.CSVSerde' with serdeproperties ("separatorChar" = "\t", "quoteChar" = "'", "escapeChar" = "\\") stored as textfile. Change the serdeproperties as required. – srikanth


I already tried these options ("separatorChar" = ",", "quoteChar" = "\"", "escapeChar" = "\n") and it still does not work. You can get the actual data from this link: https://nycopendata.socrata.com/Recreation/Queens-Library-Branches/kh3d-xhq7? – sr7