DISTINCT with ORDER BY is very slow

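So I'm using Postgres for the first time and finding it rather slow at running DISTINCT and GROUP BY queries. Right now I'm trying to find the latest record for each device and whether it is working. This is the first query I came up with: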

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working 
     FROM call_logs c 
     ORDER BY c.device_id, c.timestamp desc

It works, but it takes a long time to run:

Unique (cost=94840.24..97370.54 rows=11 width=17) (actual time=424.424..556.253 rows=13 loops=1) 
  -> Sort (cost=94840.24..96105.39 rows=506061 width=17) (actual time=424.423..531.905 rows=506061 loops=1) 
        Sort Key: device_id, "timestamp" DESC 
        Sort Method: external merge Disk: 13272kB 
        -> Seq Scan on call_logs c (cost=0.00..36512.61 rows=506061 width=17) (actual time=0.059..162.932 rows=506061 loops=1) 
Planning time: 0.152 ms 
Execution time: 557.957 ms 
(7 rows)

I have since rewritten the query as follows. It is faster, but very ugly:

SELECT c.device_id, c.timestamp, c.working FROM call_logs c 
    INNER JOIN (SELECT c.device_id, MAX(c.timestamp) AS timestamp 
               FROM call_logs c 
               GROUP BY c.device_id) 
               newest on newest.timestamp = c.timestamp

And the EXPLAIN ANALYZE output:

Nested Loop (cost=39043.34..39136.08 rows=12 width=17) (actual time=216.406..216.580 rows=15 loops=1) 
  -> HashAggregate (cost=39042.91..39043.02 rows=11 width=16) (actual time=216.347..216.351 rows=13 loops=1) 
        Group Key: c_1.device_id 
        -> Seq Scan on call_logs c_1 (cost=0.00..36512.61 rows=506061 width=16) (actual time=0.026..125.482 rows=506061 loops=1) 
  -> Index Scan using call_logs_timestamp on call_logs c (cost=0.42..8.44 rows=1 width=17) (actual time=0.016..0.016 rows=1 loops=13) 
        Index Cond: ("timestamp" = (max(c_1."timestamp"))) 
Planning time: 0.318 ms 
Execution time: 216.631 ms 
(8 rows)
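Note that the second query joins newest back to call_logs on timestamp only. Any row from another device that happens to share some device's newest timestamp is returned as well, which may be part of why the Nested Loop returns 15 rows for 13 devices (ties within a single device would also produce extra rows). A minimal sketch, assuming Python's sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL and hypothetical sample rows, compares the timestamp-only join with one that also matches on device_id:

```python
import sqlite3

# Hypothetical sample rows (device_id, timestamp, working). "10:05" is the
# newest row for device 2 but only an older row for device 1.
ROWS = [
    (1, "10:05", 1),
    (1, "10:09", 0),
    (2, "10:01", 1),
    (2, "10:05", 1),
]


def run(join_condition: str) -> list:
    """Run the MAX()/GROUP BY join with the given ON clause (a fixed string
    from this script, not user input) against a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE call_logs (device_id INTEGER, timestamp TEXT, working INTEGER)")
    conn.executemany("INSERT INTO call_logs VALUES (?, ?, ?)", ROWS)
    sql = f"""
        SELECT c.device_id, c.timestamp, c.working FROM call_logs c
        INNER JOIN (SELECT device_id, MAX(timestamp) AS timestamp
                    FROM call_logs GROUP BY device_id) newest
                ON {join_condition}
        ORDER BY c.device_id, c.timestamp
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Matches on timestamp alone: device 1's older "10:05" row slips in.
timestamp_only = run("newest.timestamp = c.timestamp")
# Matches each group's MAX only against rows of the same device.
with_device = run("newest.timestamp = c.timestamp AND newest.device_id = c.device_id")

print(timestamp_only)  # [(1, '10:05', 1), (1, '10:09', 0), (2, '10:05', 1)]
print(with_device)     # [(1, '10:09', 0), (2, '10:05', 1)]
```

In PostgreSQL the equivalent change is adding AND newest.device_id = c.device_id to the ON clause; rows that tie on timestamp within one device would still both be returned.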

Even 200 ms seems a bit slow to me, since all I want is the newest record for each device (and the table is indexed).

This is the index it uses:

CREATE INDEX call_logs_timestamp 
ON public.call_logs USING btree 
(timestamp) 
TABLESPACE pg_default;

I have tried the index below, but it doesn't help at all:

CREATE INDEX dev_ts_1 
ON public.call_logs USING btree 
(device_id, timestamp DESC, working) 
TABLESPACE pg_default;

Any ideas? Am I missing something obvious?


2017-05-12 user1434177

200 ms really isn't that bad for going through 500K rows. But for this query:

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working 
FROM call_logs c 
ORDER BY c.device_id, c.timestamp desc

your index on call_logs(device_id, timestamp desc, working) should be the optimal index.
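DISTINCT ON (device_id) keeps the first row of each device_id group in ORDER BY order. When rows already arrive in (device_id, timestamp DESC) order, as a scan of that index can deliver them, this is a single pass with no Sort step, whereas the first plan in the question sorts all 506061 rows and spills to disk. The Python sketch below models only those semantics over a hypothetical in-memory list; it is an illustration, not how PostgreSQL executes the query.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (device_id, timestamp, working) rows, already in the order an
# index on (device_id, timestamp DESC, working) would return them.
index_order = [
    (1, 9, False),
    (1, 5, True),
    (2, 7, True),
    (2, 3, True),
    (3, 4, False),
]


def distinct_on_first(rows):
    """Return the first row of each run of equal device_id values,
    mirroring DISTINCT ON (device_id) over pre-ordered input."""
    return [next(group) for _, group in groupby(rows, key=itemgetter(0))]


latest = distinct_on_first(index_order)
print(latest)  # [(1, 9, False), (2, 7, True), (3, 4, False)]

# Unordered input must first be sorted by (device_id, timestamp DESC),
# which is the job of the Sort node in the first plan.
unordered = [(2, 3, True), (1, 5, True), (3, 4, False), (1, 9, False), (2, 7, True)]
resorted = distinct_on_first(sorted(unordered, key=lambda r: (r[0], -r[1])))
print(resorted == latest)  # True
```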

Two other ways to write the query that can use the same index are:

select c.* 
from (select c.device_id, c.timestamp, c.working, 
             -- number each device's rows from newest to oldest
             row_number() over (partition by device_id order by timestamp desc) as seqnum 
      from call_logs c 
     ) c 
where seqnum = 1;
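A runnable sketch of the row_number() version, assuming an in-memory SQLite database via Python's sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later) and hypothetical rows. Timestamps are stored as ISO-formatted text so that they sort chronologically:

```python
import sqlite3

# Hypothetical sample rows (device_id, timestamp, working).
ROWS = [
    (1, "2017-05-11 08:00", 1),
    (1, "2017-05-12 09:30", 0),
    (2, "2017-05-12 07:15", 1),
    (2, "2017-05-10 18:45", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_logs (device_id INTEGER, timestamp TEXT, working INTEGER)")
conn.executemany("INSERT INTO call_logs VALUES (?, ?, ?)", ROWS)

# Number each device's rows from newest to oldest and keep row number 1.
latest = conn.execute("""
    SELECT c.device_id, c.timestamp, c.working
    FROM (SELECT c.device_id, c.timestamp, c.working,
                 row_number() OVER (PARTITION BY device_id ORDER BY timestamp DESC) AS seqnum
          FROM call_logs c) c
    WHERE seqnum = 1
    ORDER BY c.device_id
""").fetchall()
conn.close()

print(latest)  # [(1, '2017-05-12 09:30', 0), (2, '2017-05-12 07:15', 1)]
```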

and:

select c.device_id, c.timestamp, c.working 
from call_logs c 
where not exists (select 1 
        from call_logs c2 
        where c2.device_id = c.device_id and 
         c2.timestamp > c.timestamp 
       );
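One behavioral difference: NOT EXISTS keeps every row for which no strictly newer row of the same device exists, so if a device has two rows sharing its newest timestamp, both come back. DISTINCT ON and seqnum = 1 return exactly one row per device, and which one is unpredictable unless the ORDER BY breaks the tie. A sketch with hypothetical data, using the same SQLite stand-in via Python's sqlite3 module:

```python
import sqlite3

# Hypothetical rows; device 2 has two rows sharing its newest timestamp.
ROWS = [
    (1, "2017-05-12 09:30", 0),
    (1, "2017-05-11 08:00", 1),
    (2, "2017-05-12 07:15", 1),
    (2, "2017-05-12 07:15", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_logs (device_id INTEGER, timestamp TEXT, working INTEGER)")
conn.executemany("INSERT INTO call_logs VALUES (?, ?, ?)", ROWS)

# Keep a row only if no row of the same device is strictly newer.
latest = conn.execute("""
    SELECT c.device_id, c.timestamp, c.working
    FROM call_logs c
    WHERE NOT EXISTS (SELECT 1
                      FROM call_logs c2
                      WHERE c2.device_id = c.device_id
                        AND c2.timestamp > c.timestamp)
    ORDER BY c.device_id, c.working
""").fetchall()
conn.close()

# Both tied rows for device 2 survive.
print(latest)
```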


2017-05-12 01:53:08

The index isn't being used. But I'm not sure what you mean by an optimal index? – user1434177

@user1434177 . . . Optimal means that it is the best index for the query. The statistics on the table may be off. –

Thanks, I ran VACUUM ANALYZE; and now it takes 74 ms to run. – user1434177
