2015-03-25

Hive: joining array columns into a single column

I have a Hive table with the following schema, which contains array columns:

hive>desc books; 
gen_id     int           
author     array<string>        
rating     double        
genres     array<string> 

hive>select * from books; 

| gen_id | rating | author    | genres
+--------+--------+-----------+-----------
| 1      | 10     | ["A","B"] | ["X","Y"]
| 2      | 20     | ["C","A"] | ["Z","X"]
| 3      | 30     | ["D"]     | ["X"]

Is there a way to run a SELECT query that returns single rows with the two arrays joined, like this:

| gen_id | rating | JoinData
+--------+--------+-------------------
| 1      | 10     | ["A","B","X","Y"]
| 2      | 20     | ["C","A","Z","X"]
| 3      | 30     | ["D","X"]
| 1      | 10     | "Y"

Can someone guide me on how to get this result? Thanks in advance for any help.

Answer

The answer is in this thread:
[1]: http://stackoverflow.com/questions/21578477/array-intersect-hive

For those who don't want to go through the thread:

1) Create a temporary function from the UDF: CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';

2) Run a SELECT statement:

select gen_id 
    , rating 
    , combine(author, genres) as JoinData 
from books
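
Putting the two steps together, a full session might look like the sketch below. It assumes the Brickhouse jar is already available on the machine running the Hive client; the jar path is a placeholder, not something given in the thread. The second query is an alternative that uses only built-in functions (concat_ws and split). It is an assumption of mine rather than part of the linked answer, and it only works if no array element contains the separator character. I have not checked how either query behaves with NULL or empty arrays.

```sql
-- Assumption: hypothetical path; point this at wherever the Brickhouse jar lives.
ADD JAR /path/to/brickhouse.jar;

-- Register the UDF for the current session under the name "combine".
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';

-- Concatenate the two array<string> columns into one array per row.
SELECT gen_id
     , rating
     , combine(author, genres) AS JoinData
FROM books;

-- Alternative without a UDF (assumes no element contains a comma):
-- concat_ws flattens both arrays into one comma-separated string,
-- and split turns that string back into an array<string>.
SELECT gen_id
     , rating
     , split(concat_ws(',', author, genres), ',') AS JoinData
FROM books;
```

The temporary function only exists for the current session, so the ADD JAR and CREATE TEMPORARY FUNCTION statements have to be repeated (or placed in an init script) each time.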