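How to optimize a non-equi join in Hive?

I have two tables: a (1,000 rows) and b (70 million rows). How can I optimize a non-equi join between them in Hive?

Table a has two fields, starttime and endtime, and table b has one field, time.

I use a mapjoin query: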

select /*+ MAPJOIN(a) */ a.starttime,a.endtime, b.time 
from a join b 
where b.time between a.starttime and a.endtime;

But it runs very, very slowly. The MapReduce job stays at 0% the whole time.

Is there another way to optimize this?


2016-07-05 Guo

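One method is to expand a so that it has one row per day. The day then serves as an equality join key, and the between condition only has to be checked against the rows of a that share that day.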
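To make the per-day idea concrete, here is a minimal sketch in plain Python rather than Hive. The function names and the sample rows are made up for illustration; in Hive the expanded copy of a would be a table joined to b on the day.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def expand_by_day(intervals):
    """Map each calendar day to the (start, end) intervals that touch it."""
    by_day = defaultdict(list)
    for start, end in intervals:
        day = start.date()
        while day <= end.date():
            by_day[day].append((start, end))
            day += timedelta(days=1)
    return by_day


def join_by_day(intervals, times):
    """Equality lookup on the day, then the original range check."""
    by_day = expand_by_day(intervals)
    matches = []
    for t in times:
        for start, end in by_day.get(t.date(), []):
            if start <= t <= end:
                matches.append((start, end, t))
    return matches


# Hypothetical sample data standing in for tables a and b.
a = [
    (datetime(2016, 7, 1, 8), datetime(2016, 7, 2, 18)),
    (datetime(2016, 7, 4, 0), datetime(2016, 7, 4, 12)),
]
b = [
    datetime(2016, 7, 1, 9),
    datetime(2016, 7, 2, 20),
    datetime(2016, 7, 3, 5),
    datetime(2016, 7, 4, 6),
]
matches = join_by_day(a, b)
```

Each row of b is compared only with the intervals that share its day, not with every row of a. The cost is that a grows by one row for every day an interval spans.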

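Another method is to use an interleaving technique. This assumes that a really partitions the times, so there are no overlaps or gaps. It also assumes that b has a primary key.

Then, for each id in b, you can get the corresponding starttime from a: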

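select ab.id, ab.time, a.starttime, a.endtime 
from (-- running max carries the latest starttime from a forward onto the rows of b
      select id, time, max(starttime) over (order by time, priority) as a_starttime 
      from ((select b.id, b.time, null as starttime, 2 as priority from b) union all 
            -- priority 1 sorts an interval start ahead of a b row with the same time
            (select null, a.starttime, a.starttime, 1 as priority from a) 
           ) ab 
     ) ab join 
     a 
     on ab.a_starttime = a.starttime 
where ab.id is not null;  -- drop the marker rows that came from a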

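Note: the technique works in two steps. The inner query tags each row of b with the starttime of the interval it falls in: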

select id, time, max(starttime) over (order by time, priority) as a_starttime 
from ((select b.id, b.time, null as starttime, 2 as priority from b) union all 
      (select null, a.starttime, a.starttime, 1 as priority from a) 
     ) ab;

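You can then use that result in an ordinary equi-join. The technique works in other databases as well; I have not had a chance to try it on Hive.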
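The interleaving query can be tried on a toy dataset. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) rather than Hive, with integer timestamps and made-up sample rows. The `where ab.id is not null` filter in the sketch is an addition that drops the marker rows contributed by a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (starttime integer, endtime integer);
    create table b (id integer primary key, time integer);
    -- a partitions 0..29 with no gaps or overlaps (the stated assumption)
    insert into a values (0, 9), (10, 19), (20, 29);
    insert into b values (1, 3), (2, 10), (3, 19), (4, 25);
""")

rows = conn.execute("""
    select ab.id, ab.time, a.starttime, a.endtime
    from (select id, time,
                 -- running max carries the latest interval start forward
                 max(starttime) over (order by time, priority) as a_starttime
          from (select b.id as id, b.time as time,
                       null as starttime, 2 as priority
                from b
                union all
                -- priority 1 sorts an interval start before a b row at the same time
                select null, a.starttime, a.starttime, 1
                from a) u
         ) ab
    join a on ab.a_starttime = a.starttime
    where ab.id is not null   -- keep only rows that originated in b
    order by ab.id
""").fetchall()

for row in rows:
    print(row)
```

The only join left is an equality join on starttime; the range condition has been replaced by a sort and a running maximum. Whether Hive runs the global `order by` in the window efficiently on 70 million rows is not something this sketch shows.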


2016-07-05 11:00:19

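Thanks for your reply! Actually, both tables have many fields, and the interleaving technique looks cumbersome and inconvenient, doesn't it? Is there another way for this situation? – Guo

@Guo . . . Not that I can think of in Hive. –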
