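Hadoop Hive: counting concurrency

In Hive, I have a table with many columns; two of them are begin_time and end_time.

I need to count the concurrency at each point in time.

A piece of the table looks like this: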

begin_time     end_time 
2011.04.26 10:19:06^A2011.04.26 10:20:22 
2011.04.26 10:19:08^A2011.04.26 10:21:49 
2011.04.26 10:19:08^A2011.04.26 11:18:46 
2011.04.26 10:19:09^A2011.04.26 12:08:36 
2011.04.26 10:19:09^A2011.04.26 11:00:16 
2011.04.26 10:19:11^A2011.04.26 10:19:17 
2011.04.26 10:19:12^A2011.04.26 10:46:21 
2011.04.26 10:19:13^A2011.04.26 10:55:43 
2011.04.26 10:19:17^A2011.04.26 10:19:41 
2011.04.26 10:19:18^A2011.04.26 10:34:41

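The result I want is how many people are present at a specific time.

For example, at 2011.04.26 10:19:08 there are 3 visitors: 1 arrived at 10:19:06 and 2 arrived at 10:19:08.

At 2011.04.26 10:19:18 the count is 9: it would be 10, but one visitor left at 2011.04.26 10:19:17.

A piece of the desired result: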

2011.04.26 10:19:06 1 
2011.04.26 10:19:08 3 
2011.04.26 10:19:09 5 
2011.04.26 10:19:11 6 
2011.04.26 10:19:12 7 
2011.04.26 10:19:13 8 
2011.04.26 10:19:17 9 
2011.04.26 10:19:18 9

Any help is much appreciated and welcome.

Source

2013-05-20 caning

Can you show what you have tried? –

I wrote a program in C to do this, but it should be done with Hadoop. – caning

You can try this in Hive (assuming the table name is test_log):

select /*+ MAPJOIN(driven) */ driven.time, count(*)  
from   
    (select time 
    from 
    (select begin_time time from test_log union all 
     select end_time time from test_log) u 
    group by time) driven 
join test_log l on true 
where 
    driven.time between l.begin_time and l.end_time 
group by driven.time

It may not be the best solution, but at least it works. You can add some filters to the driven subquery to reduce the data set.

Source
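
To check what the query is meant to compute, here is a minimal Python sketch of the same logic run on the ten sample rows from the question. It is not Hive code and not part of the original answer; the names rows and concurrency are made up for illustration. It assumes both endpoints are inclusive (as with between) and relies on the fixed-width timestamp strings sorting chronologically.

```python
# Sample rows from the question as (begin_time, end_time) strings.
# The "yyyy.MM.dd HH:mm:ss" format is fixed-width, so plain string
# comparison orders the values chronologically.
rows = [
    ("2011.04.26 10:19:06", "2011.04.26 10:20:22"),
    ("2011.04.26 10:19:08", "2011.04.26 10:21:49"),
    ("2011.04.26 10:19:08", "2011.04.26 11:18:46"),
    ("2011.04.26 10:19:09", "2011.04.26 12:08:36"),
    ("2011.04.26 10:19:09", "2011.04.26 11:00:16"),
    ("2011.04.26 10:19:11", "2011.04.26 10:19:17"),
    ("2011.04.26 10:19:12", "2011.04.26 10:46:21"),
    ("2011.04.26 10:19:13", "2011.04.26 10:55:43"),
    ("2011.04.26 10:19:17", "2011.04.26 10:19:41"),
    ("2011.04.26 10:19:18", "2011.04.26 10:34:41"),
]


def concurrency(rows):
    """Return (time, count) pairs for every distinct begin or end time."""
    # Same role as the "driven" subquery: all distinct timestamps.
    times = sorted({t for row in rows for t in row})
    # Same role as the join plus "between": count the rows whose interval
    # contains the timestamp, with both endpoints inclusive.
    return [(t, sum(1 for b, e in rows if b <= t <= e)) for t in times]


result = concurrency(rows)
for t, n in result[:8]:
    print(t, n)
```

The first eight pairs correspond to the rows listed in the desired result. Because end times are also used as sample points, the full output contains extra rows for timestamps such as 2011.04.26 10:19:41.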

2013-05-22 07:47:27 teledi

Parse Error: line 10:16 mismatched input 'between' expecting EOF near 'time' – caning

It works fine on Hive 0.9; which version are you running? – teledi

Hive version 0.8.1 – caning
