select *
from records
where id in (select max(id) from records group by option_id)
This query works fine even on millions of rows. However, as you can see from the result of the explain statement:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=30218.84..31781.62 rows=620158 width=44) (actual time=1439.251..1443.458 rows=1057 loops=1)
  -> HashAggregate (cost=30218.41..30220.41 rows=200 width=4) (actual time=1439.203..1439.503 rows=1057 loops=1)
        -> HashAggregate (cost=30196.72..30206.36 rows=964 width=8) (actual time=1438.523..1438.807 rows=1057 loops=1)
              -> Seq Scan on records records_1 (cost=0.00..23995.15 rows=1240315 width=8) (actual time=0.103..527.914 rows=1240315 loops=1)
  -> Index Scan using records_pkey on records (cost=0.43..7.80 rows=1 width=44) (actual time=0.002..0.003 rows=1 loops=1057)
        Index Cond: (id = (max(records_1.id)))
Total runtime: 1443.752 ms
(cost=0.00..23995.15 rows=1240315 width=8) <- This shows that it is scanning all rows, which is obviously inefficient.
I also tried reordering the query:
select r.* from records r
inner join (select max(id) id from records group by option_id) r2 on r2.id= r.id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=30197.15..37741.04 rows=964 width=44) (actual time=835.519..840.452 rows=1057 loops=1)
  -> HashAggregate (cost=30196.72..30206.36 rows=964 width=8) (actual time=835.471..835.836 rows=1057 loops=1)
        -> Seq Scan on records (cost=0.00..23995.15 rows=1240315 width=8) (actual time=0.336..348.495 rows=1240315 loops=1)
  -> Index Scan using records_pkey on records r (cost=0.43..7.80 rows=1 width=44) (actual time=0.003..0.003 rows=1 loops=1057)
        Index Cond: (id = (max(records.id)))
Total runtime: 840.809 ms
(cost=0.00..23995.15 rows=1240315 width=8) <- It is still scanning all rows.
I tried having no index, and also indexes on (option_id), (option_id, id), and (option_id, id desc). None of them had any effect on the query plan.
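For reference, the three index variants mentioned above could be declared roughly as follows. This is only a sketch: the index names are hypothetical (the question does not give them), and the final statement simply re-checks the plan of the original query after an index is created.

```sql
-- Hypothetical index names; the column lists come from the question.
create index records_option_id_idx on records (option_id);
create index records_option_id_id_idx on records (option_id, id);
create index records_option_id_id_desc_idx on records (option_id, id desc);

-- Refresh planner statistics, then look at the plan again.
analyze records;

explain analyze
select *
from records
where id in (select max(id) from records group by option_id);
```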
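Is there a way to execute a groupwise maximum query in Postgres without scanning all rows?
What I am looking for, in programming terms, is an index that stores the maximum id for each option_id as rows are inserted into the records table. That way, when I query for the maximum per option_id, I should only need to probe the index as many times as there are distinct option_ids.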
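One way to approximate "one index probe per distinct option_id" by hand is a recursive CTE that walks the distinct option_id values, a pattern sometimes described as emulating a loose index scan. The sketch below assumes an index on records (option_id, id) exists; the CTE names opt and latest are made up for illustration. Whether the planner actually turns each min() and max() subquery into a short index lookup should be confirmed with explain analyze on the real data.

```sql
-- Assumes an index on records (option_id, id).
with recursive opt as (
    -- Start from the smallest option_id.
    select min(option_id) as option_id
    from records
    union all
    -- Step to the next larger option_id; yields NULL once there is none,
    -- and the filter below then stops the recursion.
    select (select min(option_id)
            from records
            where option_id > opt.option_id)
    from opt
    where opt.option_id is not null
),
latest as (
    -- For each distinct option_id, look up its largest id.
    select (select max(id)
            from records
            where option_id = opt.option_id) as id
    from opt
    where opt.option_id is not null
)
select r.*
from latest
join records r on r.id = latest.id;
```

The number of recursive steps is the number of distinct option_id values plus one, so this shape is only attractive when there are few distinct values relative to the number of rows, as in the roughly 1,000 options over 1.2 million rows shown here.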
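I have seen answers from high-reputation users all over SO recommending select distinct on (thanks to @Clodoaldo Neto for giving me the keywords to search for). Here is why it does not work: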
create index index_name on records(option_id, id desc);
select distinct on (option_id) *
from records
order by option_id, id desc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.43..76053.10 rows=964 width=44) (actual time=0.049..1668.545 rows=1056 loops=1)
  -> Index Scan using records_option_id_id_idx on records (cost=0.43..73337.25 rows=1086342 width=44) (actual time=0.046..1368.300 rows=1086342 loops=1)
Total runtime: 1668.817 ms
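That's great, it is using an index. However, using an index to scan all ids does not make much sense. In my runs, it is actually slower than a simple sequential scan.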
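If the distinct option_id values were available in a table of their own, the lookup could be driven from that table instead of from records. The sketch below assumes a hypothetical options table with one row per option, which is not part of the schema shown in the question, and an index on records (option_id, id). It has not been measured against the data above, so the plan would need checking.

```sql
-- Hypothetical table, not part of the original schema:
-- create table options (option_id integer primary key);

select r.*
from options o
join records r
  on r.id = (
      -- max() restricted to a single option_id may be answered from
      -- one end of that option's range in the (option_id, id) index.
      select max(id)
      from records
      where option_id = o.option_id
  );
```

Options that have no rows in records produce a NULL from the subquery and therefore drop out of the join.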
Interestingly, MySQL 5.5 is able to optimize the query simply by using an index on records(option_id, id):
mysql> select count(1) from records;
+----------+
| count(1) |
+----------+
| 1086342 |
+----------+
1 row in set (0.00 sec)
mysql> explain extended select * from records
inner join (select max(id) max_id from records group by option_id) mr
on mr.max_id= records.id;
+------+----------+--------------------------+
| rows | filtered | Extra |
+------+----------+--------------------------+
| 1056 | 100.00 | |
| 1 | 100.00 | |
| 201 | 100.00 | Using index for group-by |
+------+----------+--------------------------+
3 rows in set, 1 warning (0.02 sec)
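"However using an index to scan all rows doesn't really make much sense" --- it does. Indexes are smaller than the whole dataset, so they are more likely to be in cache. It does not scan the actual rows, though, only the index. – zerkms
What is the plan for the *original* query with the indexes created? – zerkms
@zerkms Indexing option_id made no difference (as I stated in the question). Indexing option_id_id_desc or option_id_id also makes no difference to the query plan. – nurettin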