2017-06-13

Find all occurrences in a string - Apache Presto

I have the following rows in Hive (HDFS) and use Presto as the query engine.

1,@markbutcher72 @charlottegloyn Not what Belinda Carlisle thought. And yes, she was singing about Edgbaston. 
2,@tomkingham @markbutcher72 @charlottegloyn It's true the garden of Eden is currently very green... 
3,@MrRhysBenjamin @gasuperspark1 @markbutcher72 Actually it's Springfield Park, the (occasional) home of the might 

The requirement is to produce the following output with a Presto query. How can we do this?

1,markbutcher72 
1,charlottegloyn 
2,tomkingham 
2,markbutcher72 
2,charlottegloyn 
3,MrRhysBenjamin 
3,gasuperspark1 
3,markbutcher72 

This is unclear. Is it a single-column Hive table? 2 columns? More? ... –

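@DuduMarkovitz - Thanks for the reply. The Hive table has 2 columns, ID and TEXT. Ideally I want to tokenize the string iteratively, taking everything from each @ up to the next space. I was looking at strpos(text, '@'), but that only returns the first occurrence of '@', not every occurrence. –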

Answer

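-- One output row per @mention: regexp_extract_all returns an array of all matches,
-- and unnest turns that array into rows. (?<=@) is a lookbehind, so the @ itself
-- is not part of the match; \S+ takes everything up to the next whitespace.
select t.id 
     ,u.token 

from mytable as t 
     cross join unnest (regexp_extract_all(text,'(?<=@)\S+')) as u(token) 
; 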

+----+----------------+
| id | token          |
+----+----------------+
| 1  | markbutcher72  |
| 1  | charlottegloyn |
| 2  | tomkingham     |
| 2  | markbutcher72  |
| 2  | charlottegloyn |
| 3  | MrRhysBenjamin |
| 3  | gasuperspark1  |
| 3  | markbutcher72  |
+----+----------------+
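A variant without the lookbehind, as a sketch: Presto's three-argument form regexp_extract_all(string, pattern, group) returns the given capturing group of every match, so '@(\S+)' with group 1 should yield the same tokens. To keep it self-contained, this assumes the Hive table is replaced by a hypothetical inline `mytable` CTE built from the sample rows in the question (apostrophes doubled for SQL). Note that \S+ in either version also keeps trailing punctuation, so a mention written as "@user," would come back as "user,".

```sql
-- Hypothetical stand-in for the Hive table: the question's sample rows inlined with VALUES.
with mytable (id, text) as (
    values
        (1, '@markbutcher72 @charlottegloyn Not what Belinda Carlisle thought. And yes, she was singing about Edgbaston.'),
        (2, '@tomkingham @markbutcher72 @charlottegloyn It''s true the garden of Eden is currently very green...'),
        (3, '@MrRhysBenjamin @gasuperspark1 @markbutcher72 Actually it''s Springfield Park, the (occasional) home of the might')
)
select t.id
     , u.token
from mytable as t
     -- group 1 is the (\S+) part, i.e. the mention without the leading @
     cross join unnest (regexp_extract_all(t.text, '@(\S+)', 1)) as u(token)
;
```

Against these rows it is expected to return the same eight id/token pairs as the lookbehind query.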

Brilliant. Thanks a ton. –
