Optimizing nested join window functions for large PostgreSQL tables

2012-07-11

I have been running the following query against a 56 GB table (789,700,760 rows) and am hitting a bottleneck in execution time. From some earlier examples I have seen, there may be a way to 'nest' the INNER JOIN so that the query performs better on large data sets. In particular, the query below took 7.651 hours to execute on an MPP PostgreSQL deployment.

create table large_table as 
select column1, column2, column3, column4, column5, column6 
from 
(
    select 
    a.column1, a.column2, a.start_time, 
    rank() OVER( 
     PARTITION BY a.column2, a.column1 order by a.start_time DESC 
    ) as rank, 
    last_value(a.column3) OVER (
     PARTITION BY a.column2, a.column1 order by a.start_time ASC 
     RANGE BETWEEN unbounded preceding and unbounded following 
    ) as column3, 
    a.column4, a.column5, a.column6 
    from 
    (table2 s 
     INNER JOIN table3 t 
     ON s.column2=t.column2 and s.event_time > t.start_time 
    ) a 
) b 
where rank = 1;

Question 1: Is there a way to modify the above sql code to speed up the overall execution time of the query?
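The shape of this query, keeping only the rank() = 1 row per group while also computing last_value() over the full partition, can be sketched on a toy table. The sketch below uses SQLite through Python's sqlite3 module purely as a self-contained illustration (the table and column names are made up, not the asker's schema); the window-function semantics are the same as in PostgreSQL.

```python
import sqlite3

# Toy reproduction of the "latest row per group" pattern from the question.
# Names here (events, grp, payload) are illustrative only.
# Requires SQLite 3.25+ for window-function support.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (grp TEXT, start_time INTEGER, payload TEXT);
INSERT INTO events VALUES
  ('x', 1, 'first'), ('x', 3, 'latest'),
  ('y', 2, 'only');
""")

rows = con.execute("""
SELECT grp, payload, last_payload
FROM (
  SELECT grp, payload,
         rank() OVER (PARTITION BY grp ORDER BY start_time DESC) AS rnk,
         last_value(payload) OVER (
           PARTITION BY grp ORDER BY start_time ASC
           RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
         ) AS last_payload
  FROM events
) b
WHERE rnk = 1
ORDER BY grp
""").fetchall()

# One row per group: the row with the largest start_time. Note that on the
# rnk = 1 row, last_payload always equals payload, since both pick out the
# value at the maximum start_time of the partition.
print(rows)
```

Running this prints `[('x', 'latest', 'latest'), ('y', 'only', 'only')]`, which also illustrates the point raised in the comment below the question: on the rank = 1 rows, the last_value() column duplicates the row's own value.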


If rank returns only one row for each column2, column1 combination, then last_value() seems redundant. Are you expecting multiple rows? Otherwise, the value of column3 on the rank = 1 row should be the same as the computed value. – 2012-07-11 18:45:30

Answer


You can move the LAST_VALUE into the outer query, which may buy you some improvement in performance. The LAST_VALUE picks up the value of column3 at the largest start_time in each partition, which is exactly the row where rank = 1:

select column1, column2, 
     last_value(column3) OVER (PARTITION BY column2, column1 order by start_time ASC 
            RANGE BETWEEN unbounded preceding and unbounded following 
           ) as column3, 
     column4, column5, column6 
from (select a.column1, a.column2, a.start_time, 
      rank() OVER (PARTITION BY a.column2, a.column1 order by a.start_time DESC 
         ) as rank, 
      a.column3, a.column4, a.column5, a.column6 
     from (table2 s INNER JOIN 
      table3 t 
      ON s.column2 = t.column2 and s.event_time > t.start_time 
      ) a 
    ) b 
where rank = 1 

Otherwise, you will need to share the execution plan and more information about table2 and table3 to get further help.
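The equivalence this answer relies on can be checked on a toy data set: because WHERE is applied before window functions at the same query level, a last_value() computed after the rank = 1 filter sees exactly one row per partition, so it returns the same value as the original nested version. The sketch below uses SQLite through Python's sqlite3 module as a self-contained stand-in (illustrative names, not the asker's schema); SQLite 3.25+ follows the same window-function semantics as PostgreSQL here.

```python
import sqlite3

# Compare the original shape (last_value computed in the inner subquery)
# with the answer's restructured shape (last_value computed after the
# rank = 1 filter). Both should return the latest payload per group.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (grp TEXT, start_time INTEGER, payload TEXT);
INSERT INTO events VALUES
  ('x', 1, 'first'), ('x', 3, 'latest'),
  ('y', 2, 'only');
""")

# Original shape: window functions in the subquery, filter outside.
nested = con.execute("""
SELECT grp, last_payload
FROM (
  SELECT grp,
         rank() OVER (PARTITION BY grp ORDER BY start_time DESC) AS rnk,
         last_value(payload) OVER (
           PARTITION BY grp ORDER BY start_time ASC
           RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
         ) AS last_payload
  FROM events
) b
WHERE rnk = 1 ORDER BY grp
""").fetchall()

# Restructured shape: rank in the subquery, last_value in the outer query.
# WHERE rnk = 1 runs before the outer window function, so each partition
# it sees contains exactly one row.
restructured = con.execute("""
SELECT grp,
       last_value(payload) OVER (
         PARTITION BY grp ORDER BY start_time ASC
         RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_payload
FROM (
  SELECT grp, start_time, payload,
         rank() OVER (PARTITION BY grp ORDER BY start_time DESC) AS rnk
  FROM events
) b
WHERE rnk = 1 ORDER BY grp
""").fetchall()

print(nested == restructured)  # True: both return the latest payload per group
```

Whether this restructuring actually speeds up the real 56 GB query depends on the planner; the potential win is that the expensive full-partition window frame is evaluated over far fewer rows after the filter.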


Thanks for your help. I am timing the updated query, but I ran into a small issue when I used last_value(a.column3): the error given was ERROR: missing FROM-clause entry for table "a". I replaced it with last_value(column3); does that still work the same way? – user7980 2012-07-11 23:30:05