PostgreSQL 9.6 selects a wrong plan during aggregation against a timestamp column
I have a simple but large table "log" with three columns: user_id, day, hours.
user_id character varying(36) COLLATE pg_catalog."default" NOT NULL,
day timestamp without time zone,
hours double precision
All columns are indexed.
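For anyone who wants to reproduce this, the setup can be sketched roughly as follows. The column definitions are copied from above, and index_log_user_id and index_log_day are the index names that show up in the query plans further down. The name index_log_hours is hypothetical, and plain b-tree indexes are an assumption, since neither is stated in the post.

```sql
-- Sketch of the schema described above; column types are copied from the post.
CREATE TABLE log (
    user_id character varying(36) COLLATE pg_catalog."default" NOT NULL,
    day     timestamp without time zone,
    hours   double precision
);

-- These two index names appear in the query plans.
CREATE INDEX index_log_user_id ON log (user_id);
CREATE INDEX index_log_day ON log (day);

-- Hypothetical name: the index on hours is never named in the post.
CREATE INDEX index_log_hours ON log (hours);
```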
The problem is that aggregation against the 'day' field is extremely slow. For example, this simple query takes forever to complete:
select min(day) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'
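EXPLAIN ANALYZE shows that Postgres does what amounts to a full scan: it walks the index on day and filters out the entries that do not match user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf', which is completely counterintuitive: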
[
{
"Execution Time": 146502.05,
"Planning Time": 0.893,
"Plan": {
"Startup Cost": 789.02,
"Actual Rows": 1,
"Plans": [
{
"Startup Cost": 0.44,
"Actual Rows": 1,
"Plans": [
{
"Index Cond": "(log.day IS NOT NULL)",
"Startup Cost": 0.44,
"Scan Direction": "Forward",
"Plan Width": 8,
"Rows Removed by Index Recheck": 0,
"Actual Rows": 1,
"Node Type": "Index Scan",
"Total Cost": 1395792.54,
"Plan Rows": 1770,
"Relation Name": "log",
"Alias": "log",
"Parallel Aware": false,
"Actual Total Time": 146502.015,
"Output": [
"log.day"
],
"Parent Relationship": "Outer",
"Actual Startup Time": 146502.015,
"Schema": "public",
"Filter": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
"Actual Loops": 1,
"Rows Removed by Filter": 12665610,
"Index Name": "index_log_day"
}
],
"Node Type": "Limit",
"Plan Rows": 1,
"Parallel Aware": false,
"Actual Total Time": 146502.016,
"Output": [
"log.day"
],
"Parent Relationship": "InitPlan",
"Actual Startup Time": 146502.016,
"Plan Width": 8,
"Subplan Name": "InitPlan 1 (returns $0)",
"Actual Loops": 1,
"Total Cost": 789.02
}
],
"Node Type": "Result",
"Plan Rows": 1,
"Parallel Aware": false,
"Actual Total Time": 146502.019,
"Output": [
"$0"
],
"Actual Startup Time": 146502.019,
"Plan Width": 8,
"Actual Loops": 1,
"Total Cost": 789.03
},
"Triggers": []
}
]
What is even stranger is that an almost identical query works perfectly:
select min(hours) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'
Here Postgres first selects the entries for user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' and then aggregates over them, which is obviously the right thing to do:
[
{
"Execution Time": 5.989,
"Planning Time": 1.186,
"Plan": {
"Partial Mode": "Simple",
"Startup Cost": 6842.66,
"Actual Rows": 1,
"Plans": [
{
"Startup Cost": 66.28,
"Plan Width": 8,
"Rows Removed by Index Recheck": 0,
"Actual Rows": 745,
"Plans": [
{
"Startup Cost": 0,
"Plan Width": 0,
"Actual Rows": 745,
"Node Type": "Bitmap Index Scan",
"Index Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
"Plan Rows": 1770,
"Parallel Aware": false,
"Actual Total Time": 0.25,
"Parent Relationship": "Outer",
"Actual Startup Time": 0.25,
"Total Cost": 65.84,
"Actual Loops": 1,
"Index Name": "index_log_user_id"
}
],
"Recheck Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
"Exact Heap Blocks": 742,
"Node Type": "Bitmap Heap Scan",
"Plan Rows": 1770,
"Relation Name": "log",
"Alias": "log",
"Parallel Aware": false,
"Actual Total Time": 5.793,
"Output": [
"day",
"hours",
"user_id"
],
"Lossy Heap Blocks": 0,
"Parent Relationship": "Outer",
"Actual Startup Time": 0.357,
"Total Cost": 6838.23,
"Actual Loops": 1,
"Schema": "public"
}
],
"Node Type": "Aggregate",
"Strategy": "Plain",
"Plan Rows": 1,
"Parallel Aware": false,
"Actual Total Time": 5.946,
"Output": [
"min(hours)"
],
"Actual Startup Time": 5.946,
"Plan Width": 8,
"Actual Loops": 1,
"Total Cost": 6842.67
},
"Triggers": []
}
]
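A possible reading of the two plans above, offered as an interpretation rather than a confirmed diagnosis: the planner appears to price the Limit over index_log_day by assuming that the 1770 rows it expects for this user are spread evenly through the index, so that the first match turns up after roughly 1/1770 of the scan. The arithmetic below uses only numbers copied from the plans.

```sql
-- Startup Cost 0.44, Total Cost 1395792.54 and Plan Rows 1770 come from the
-- Index Scan node on index_log_day in the first plan.
SELECT round(0.44 + (1395792.54 - 0.44) / 1770, 2) AS interpolated_limit_cost;
-- Compare the result with the Limit node's Total Cost (789.02) and with the
-- Aggregate node's Total Cost in the min(hours) plan (6842.67).
```

If that reading is right, the day-index plan looks several times cheaper on paper than the bitmap plan, even though in practice 12665610 rows had to be discarded before the first matching row was found.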
There are two possible workarounds:
1) Rewrite the query as:
select user_id, min(day) from log where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f' group by user_id
2) Introduce a composite (paired) index, as suggested in "finding MAX(db_timestamp) query".
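The second workaround might look like the sketch below. I am assuming that the paired index means a two-column index on (user_id, day). The index name index_log_user_id_day is made up for illustration.

```sql
-- Hypothetical index name; the column order (user_id first, then day) is an assumption.
CREATE INDEX index_log_user_id_day ON log (user_id, day);

-- The original query, unchanged. With the index above, the planner should be
-- able to find the smallest day for one user_id from the index alone.
SELECT min(day)
FROM log
WHERE user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf';
```

On a table this size, building the index takes time and blocks writes unless CREATE INDEX CONCURRENTLY is used.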
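Both suggestions look fine, but I consider them workarounds (the first one is even a hack). Logically, if Postgres can choose a proper plan for 'hours', it should be able to do the same for 'day', but it does not. So this looks like a Postgres bug that occurs when aggregating over timestamp fields, though I admit I may be missing something. Could someone please advise whether this can be solved without workarounds, or whether it really is a Postgres bug that I should report?
UPD: I have reported this as a bug to the PostgreSQL bugs mailing list. I'll let everybody know whether it gets accepted.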
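Do you collect statistics for the table? – are
I have the default settings for statistics collection and assumed the statistics would be collected automatically. So do I need to collect them explicitly somehow? – 
BTW: {user_id, day} looks like a candidate key to me. – wildplasser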
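On the statistics question raised in the comments, a minimal way to check and refresh the planner statistics is sketched below. It only assumes the table is named log in a schema visible to the current user. Whether fresh statistics change the chosen plan is not established anywhere in this thread.

```sql
-- When were statistics last gathered, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'log';

-- Collect statistics explicitly for this one table.
ANALYZE log;
```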