2016-03-09 21 views
-2

Problem getting a Hive regular expression to work

I have a string like the following:

"ITheme:Sports,Genre:SportingEvent,Genre:Sports,Genre:Football,Genre:Pro,ITheme:Football" 

If I use the query below, it returns only the first Genre match, i.e. SportingEvent:

select split(regexp_extract(coalesce('ITheme:Sports,Genre:SportingEvent,Genre:Sports,Genre:Football,Genre:Pro,ITheme:Football'), '(Genre:.[^,]+)', 0),':')[1] 

I want output like this, as a separate column:

    Genre
    SportingEvent,Sports,Football
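
The single-match behaviour described in the question can be reproduced outside Hive. Hive's regex functions are built on Java regular expressions, so the following minimal Java sketch mimics the query by taking one match and splitting it on `:`, then contrasts it with a loop over all matches. The class and method names (`FirstMatchOnly`, `firstGenre`, `allGenres`) are hypothetical, and this is an illustration of the regex behaviour, not of Hive internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstMatchOnly {
    static final String INPUT =
        "ITheme:Sports,Genre:SportingEvent,Genre:Sports,Genre:Football,Genre:Pro,ITheme:Football";

    // Same pattern as in the question's query.
    static final Pattern GENRE = Pattern.compile("(Genre:.[^,]+)");

    // Mimics the query: take a single match, then split it on ':'.
    static String firstGenre(String s) {
        Matcher m = GENRE.matcher(s);
        return m.find() ? m.group(0).split(":")[1] : "";
    }

    // Getting every value requires iterating over all matches instead.
    static List<String> allGenres(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = GENRE.matcher(s);
        while (m.find()) {
            out.add(m.group(0).split(":")[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(firstGenre(INPUT));                 // SportingEvent
        System.out.println(String.join(",", allGenres(INPUT))); // SportingEvent,Sports,Football,Pro
    }
}
```

Note that the loop also returns Pro, since the input contains `Genre:Pro`.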
+0

Do you see this pretty (and unfortunately not so shiny) edit button? Really, your question is hard to read. – Jan

+0

You got it wrong; Genre is the column name in the output. – Peter2711

Answers

0

Here is an answer to the question above, but on large datasets it will take a lot of time:

select collect_set(myCol2) from ( select myCol1,regexp_extract(myCol1,'Genre:(.*)',1) as myCol2 from ( select split('ITheme:Sports,Genre:SportingEvent,Genre:Sports,Genre:Football,Genre:Pro,ITheme:Football',',') as a ) v1 LATERAL VIEW explode(v1.a) myTable1 AS myCol1 ) v2 where myCol2 != ''
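
The steps in that query (split on commas, explode into rows, extract the part after `Genre:`, drop empty results, collect distinct values) can be sketched in Java as follows. This is only an illustration of the logic under stated assumptions: the class and method names are hypothetical, and a `LinkedHashSet` stands in for `collect_set`, which removes duplicates but, unlike this sketch, should not be relied on to preserve order.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExplodeAndCollect {
    static final String INPUT =
        "ITheme:Sports,Genre:SportingEvent,Genre:Sports,Genre:Football,Genre:Pro,ITheme:Football";

    static Set<String> genres(String s) {
        Pattern p = Pattern.compile("Genre:(.*)");
        Set<String> out = new LinkedHashSet<>();      // distinct values, like collect_set
        for (String token : s.split(",")) {           // split(...) followed by explode(...)
            Matcher m = p.matcher(token);
            // Keep capture group 1 only when the token matched and the value is non-empty,
            // which corresponds to the "where myCol2 != ''" filter.
            if (m.find() && !m.group(1).isEmpty()) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", genres(INPUT))); // SportingEvent,Sports,Football,Pro
    }
}
```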

0

This will also work:

select regexp_replace( regexp_replace( regexp_replace( regexp_replace('ITheme:Sports,Genre:SportingEvent,Genre:Sports,Genre:Football,Genre:Pro,ITheme:Football','((\\w*)(?<!Genre):.[^,]*)','') ,'(^,)|(,$)','' ) , ',{2,}', ',' ) , 'Genre:','' ) as Genre
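
The chain of replacements can be checked step by step with Java's `String.replaceAll`, which uses the same regex dialect. This is a minimal sketch, not a Hive implementation: the class and method names are hypothetical, and the first pattern uses a negative lookbehind on `Genre`, so that the `key:value` pairs being deleted are the ones whose key is not `Genre`.

```java
public class ReplaceChain {
    static final String INPUT =
        "ITheme:Sports,Genre:SportingEvent,Genre:Sports,Genre:Football,Genre:Pro,ITheme:Football";

    static String genres(String s) {
        return s
            // 1. Delete every key:value pair whose key does not end in "Genre".
            .replaceAll("((\\w*)(?<!Genre):.[^,]*)", "")
            // 2. Trim the comma left at the start or end of the string.
            .replaceAll("(^,)|(,$)", "")
            // 3. Collapse runs of commas left by deleted pairs in the middle.
            .replaceAll(",{2,}", ",")
            // 4. Strip the "Genre:" prefix from the surviving pairs.
            .replaceAll("Genre:", "");
    }

    public static void main(String[] args) {
        System.out.println(genres(INPUT)); // SportingEvent,Sports,Football,Pro
    }
}
```

Because this approach works on the string directly, it avoids the explode and aggregation steps of the previous answer, at the cost of a harder-to-read pattern.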