2017-05-12

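DISTINCT with ORDER BY is very slow

I'm using Postgres for the first time and finding it rather slow to run DISTINCT and GROUP BY queries. Right now I'm trying to find the latest record for each device and whether it is working. This is the first query I came up with: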

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working 
     FROM call_logs c 
     ORDER BY c.device_id, c.timestamp desc 

It works, but it takes a long time to run:

Unique (cost=94840.24..97370.54 rows=11 width=17) (actual time=424.424..556.253 rows=13 loops=1)
  -> Sort (cost=94840.24..96105.39 rows=506061 width=17) (actual time=424.423..531.905 rows=506061 loops=1)
        Sort Key: device_id, "timestamp" DESC
        Sort Method: external merge Disk: 13272kB
        -> Seq Scan on call_logs c (cost=0.00..36512.61 rows=506061 width=17) (actual time=0.059..162.932 rows=506061 loops=1)
Planning time: 0.152 ms
Execution time: 557.957 ms
(7 rows)

I've rewritten the query into something faster, but very ugly:

SELECT c.device_id, c.timestamp, c.working FROM call_logs c 
    INNER JOIN (SELECT c.device_id, MAX(c.timestamp) AS timestamp 
               FROM call_logs c 
               GROUP BY c.device_id) 
               newest on newest.timestamp = c.timestamp 

And the EXPLAIN ANALYZE output:

Nested Loop (cost=39043.34..39136.08 rows=12 width=17) (actual time=216.406..216.580 rows=15 loops=1)
  -> HashAggregate (cost=39042.91..39043.02 rows=11 width=16) (actual time=216.347..216.351 rows=13 loops=1)
        Group Key: c_1.device_id
        -> Seq Scan on call_logs c_1 (cost=0.00..36512.61 rows=506061 width=16) (actual time=0.026..125.482 rows=506061 loops=1)
  -> Index Scan using call_logs_timestamp on call_logs c (cost=0.42..8.44 rows=1 width=17) (actual time=0.016..0.016 rows=1 loops=13)
        Index Cond: ("timestamp" = (max(c_1."timestamp")))
Planning time: 0.318 ms
Execution time: 216.631 ms
(8 rows)
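A side note on the join version: it matches only on timestamp, so any other row that happens to share one of the per-device maximum timestamps (from another device, or a duplicate timestamp on the same device) is returned too. That may be why the Nested Loop produces 15 rows from 13 groups. A sketch of the same query that also matches on device_id, assuming only the call_logs columns shown above:

```sql
-- Latest row per device: aggregate, then join back on BOTH device and timestamp
SELECT c.device_id, c.timestamp, c.working
FROM call_logs c
INNER JOIN (SELECT device_id, MAX(timestamp) AS timestamp
            FROM call_logs
            GROUP BY device_id) newest
        ON newest.device_id = c.device_id   -- without this, other devices' rows can match
       AND newest.timestamp = c.timestamp;
```

Exact timestamp ties within one device will still return more than one row for that device; DISTINCT ON returns exactly one.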

Even 200 ms seems a bit slow to me, since all I want is the latest record for each device (and the table is indexed).

This is the index it uses:

CREATE INDEX call_logs_timestamp 
ON public.call_logs USING btree 
(timestamp) 
TABLESPACE pg_default; 

I tried the following index, but it doesn't help at all:

CREATE INDEX dev_ts_1 
ON public.call_logs USING btree 
(device_id, timestamp DESC, working) 
TABLESPACE pg_default; 

Any ideas? Am I missing something obvious?

Answer


200 ms really isn't that bad for going through 500K rows. But for this query:

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working 
FROM call_logs c 
ORDER BY c.device_id, c.timestamp desc 

your index call_logs(device_id, timestamp desc, working) should be the optimal index.

Two other ways to write the query that can use the same index are:

select c.device_id, c.timestamp, c.working
from (select c.device_id, c.timestamp, c.working,
             -- number each device's rows from newest (1) to oldest
             row_number() over (partition by device_id order by timestamp desc) as seqnum
      from call_logs c
     ) c
where seqnum = 1;

and:

select c.device_id, c.timestamp, c.working 
from call_logs c 
where not exists (select 1 
        from call_logs c2 
        where c2.device_id = c.device_id and 
         c2.timestamp > c.timestamp 
       ); 
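Since there are only about a dozen distinct devices among ~500K rows, one more option (not part of the answer above, but the well-known "loose index scan" pattern for PostgreSQL) is to emulate a skip scan with a recursive CTE, so that each step is a short probe into an index like dev_ts_1 instead of a read of the whole table. A sketch, assuming device_id is never NULL, that an index on (device_id, timestamp DESC, working) exists, and a PostgreSQL version with LATERAL (9.3+):

```sql
WITH RECURSIVE devices AS (
    -- smallest device_id: one index probe
    (SELECT device_id FROM call_logs ORDER BY device_id LIMIT 1)
    UNION ALL
    -- next larger device_id after the current one; yields NULL when none is left
    SELECT (SELECT c.device_id
            FROM call_logs c
            WHERE c.device_id > d.device_id
            ORDER BY c.device_id
            LIMIT 1)
    FROM devices d
    WHERE d.device_id IS NOT NULL
)
SELECT l.device_id, l.timestamp, l.working
FROM devices d
CROSS JOIN LATERAL (
    -- newest row for this device: another single index probe
    SELECT c.device_id, c.timestamp, c.working
    FROM call_logs c
    WHERE c.device_id = d.device_id
    ORDER BY c.timestamp DESC
    LIMIT 1
) l;  -- the final NULL row from the CTE matches nothing and is dropped
```

With 13 devices this is roughly 26 index lookups; whether it beats the plain DISTINCT ON query on your data is something to check with EXPLAIN ANALYZE.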

The index isn't being used. But I'm not sure what you mean by an optimal index? – user1434177


@user1434177 . . . Optimal means it is the best index for this query. The statistics on the table may be out of date. –


Thanks, I ran VACUUM ANALYZE; and now the query takes 74 ms. – user1434177
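For anyone hitting the same problem, here is a minimal sketch of refreshing statistics for just this table and then re-checking the plan. One plausible reason VACUUM (and not only ANALYZE) helped is that dev_ts_1 covers all three selected columns, and index-only scans rely on the visibility map that VACUUM maintains. That is an inference, not something confirmed in this thread.

```sql
-- Update planner statistics and the visibility map for this table only
VACUUM ANALYZE call_logs;

-- Re-check the plan: look for an Index Scan or Index Only Scan on dev_ts_1
-- instead of the Seq Scan + Sort seen earlier
EXPLAIN ANALYZE
SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working
FROM call_logs c
ORDER BY c.device_id, c.timestamp DESC;
```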