2015-06-20 165 views
0

I am trying to do a simple Hive transformation.

Simple Hive Transformation

Can someone suggest a way to do this? I tried collect_set and am currently looking at Klout's open-source UDFs.


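The same unit can show up again later, e.g. ABC can start at datetime 8 and then start again at datetime 9. We need to keep each unit's time spans contiguous. FYI, a simple GROUP BY gets this pattern wrong.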

Answers

0

How about using the min and max functions? I think the following will get you what you need:

SELECT 
    Unit, 
    MIN(datetime) as start, 
    MAX(datetime) as stop 
from table_name 
group by Unit 
; 

Thanks for the reply. It is not that simple. Imagine different dates and time zones, and more than one visit to the same unit.
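To make the objection concrete, here is a small sketch of what a plain GROUP BY does when a unit is visited twice. The `unit_events` data, its column names, and the integer `date_time` values are made up for illustration; they are not from the original question, and the query has not been run against a real Hive cluster.

```sql
-- Hypothetical sample data: ABC is visited, then XYZ, then ABC again.
WITH unit_events AS (
    SELECT 'ABC' AS unit, 1 AS date_time UNION ALL
    SELECT 'ABC', 2 UNION ALL
    SELECT 'XYZ', 3 UNION ALL
    SELECT 'XYZ', 4 UNION ALL
    SELECT 'ABC', 8 UNION ALL
    SELECT 'ABC', 9
)
-- Grouping by unit alone merges both ABC visits into one span.
SELECT unit,
       min(date_time) AS start_time,
       max(date_time) AS stop_time
FROM unit_events
GROUP BY unit
ORDER BY unit;
-- ABC  1  9   <- the two separate visits (1-2 and 8-9) are collapsed
-- XYZ  3  4
```

The ABC row spans 1 to 9 even though XYZ was active at 3 and 4, which is why the grouping has to be done per contiguous run rather than per unit.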

1

I think this gives you what you want. I was not able to run and debug it. Good luck!

select start_points.unit 
    , start_time as start 
    , start_time + min(stop_time - start_time) as stop 
from 
    (select * from 
     (select date_time as start_time 
     , unit 
     -- unit of the chronologically previous row (frame is walked in descending time order) 
     -- note: the earliest row sees only itself in the frame, so it is never flagged as a start 
     , last_value(unit) over (order by date_time desc rows between current row and 1 following) as previous_unit 
     from table_name 
    ) previous 
     where unit <> previous_unit 
) start_points 
left outer join 
    (select * from 
     (select date_time as stop_time 
     , unit 
     -- unit of the chronologically next row 
     -- note: the latest row sees only itself in the frame, so it is never flagged as a stop 
     , last_value(unit) over (order by date_time rows between current row and 1 following) as next_unit 
     from table_name 
    ) next 
     where unit <> next_unit 
) stop_points 
on start_points.unit = stop_points.unit 
where stop_time > start_time 
group by start_points.unit, start_time 
; 

Thanks for the pointer to window functions. Not an exact solution, but the right path.

0

I figured it out. Thanks for the pointer to window functions.

select * 
from 
(select *, 
case when lag(unit,1) over (partition by id order by effective_time_ut desc) is NULL THEN 1 
when unit<>lag(unit,1) over (partition by id order by effective_time_ut desc) then 1 
when lead(unit,1) over (partition by id order by effective_time_ut desc) is NULL then 1 
else 0 end as different_loc 
from units_we_care) a 
where different_loc=1 
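
The query above keeps the rows where the unit changes. If one row per contiguous run with its start and stop is needed, a common follow-up is to turn the change flag into a run number with a running sum and then group on it. The following is only a sketch under assumptions: the `unit_events` sample data, the integer `date_time` values, and the names `is_start`, `run_id`, `start_time` and `stop_time` are hypothetical, and it has not been run against a real Hive cluster.

```sql
-- Hypothetical sample data: id 1 visits ABC, then XYZ, then ABC again.
WITH unit_events AS (
    SELECT 1 AS id, 'ABC' AS unit, 1 AS date_time UNION ALL
    SELECT 1, 'ABC', 2 UNION ALL
    SELECT 1, 'XYZ', 3 UNION ALL
    SELECT 1, 'XYZ', 4 UNION ALL
    SELECT 1, 'ABC', 8 UNION ALL
    SELECT 1, 'ABC', 9
),
flagged AS (
    -- is_start = 1 on the first row of an id and on every row whose unit
    -- differs from the previous row's unit (in ascending time order).
    SELECT id, unit, date_time,
           CASE WHEN lag(unit, 1) OVER (PARTITION BY id ORDER BY date_time) IS NULL
                  OR lag(unit, 1) OVER (PARTITION BY id ORDER BY date_time) <> unit
                THEN 1 ELSE 0 END AS is_start
    FROM unit_events
),
numbered AS (
    -- The running total of the flags gives every row of a run the same run_id.
    SELECT id, unit, date_time,
           sum(is_start) OVER (PARTITION BY id ORDER BY date_time
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_id
    FROM flagged
)
-- One output row per contiguous run of the same unit.
SELECT id, unit,
       min(date_time) AS start_time,
       max(date_time) AS stop_time
FROM numbered
GROUP BY id, unit, run_id
ORDER BY id, start_time;
-- 1  ABC  1  2
-- 1  XYZ  3  4
-- 1  ABC  8  9
```

Grouping on `run_id` rather than on `unit` alone is what keeps the two ABC visits apart.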
0
create table temptable as 
select unit, start_date, end_time, 
    -- explicit ordering so that consecutive row numbers are neighbours in time 
    row_number() over (order by start_date) as row_num 
from 
    (select unit, min(date_time) as start_date, max(date_time) as end_time 
     from table_name 
     group by unit) a; 

select a.unit, a.start_date as start_date, nvl(b.start_date, a.end_time) as end_time 
from temptable a 
left outer join temptable b 
    on (a.row_num + 1) = b.row_num;