How can I get a good query plan without selecting the partition by a literal value?

I have a large table, foos, partitioned by foo_type. The following produces a good query plan (only one partition is selected):

select count(*) from foos where foo_type=1;

However, if I change the literal "1" to an (equivalent) subquery, I end up with a plan that scans every partition:

select count(*) from foos where foo_type=(select min(foo_type) from favorite_foo_types);

How can I write a query that uses a subquery in the where clause and does not end up scanning every partition?


2017-01-10 gcbenison

Can you run the second query with explain analyze and include the output in the question?

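You didn't provide any code, so nobody has answered this question. Short answer: dynamic partition elimination does work in Greenplum, but the explain output looks different from the plan you get with a literal value.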

Example:

First, create your favorite_foo_types table.

create table public.favorite_foo_types 
(id int, foo_type int) 
distributed by (id); 

insert into public.favorite_foo_types 
values (1, 1), (2,2), (3,3), (4,4), (5,5); 

analyze public.favorite_foo_types;

Next, create your partitioned table.

create table public.foos 
(id int, foo_type int) 
distributed by (id) 
partition by list (foo_type) 
(
partition foo_1 values (1), 
partition foo_2 values (2), 
partition foo_3 values (3), 
partition foo_4 values (4), 
partition foo_5 values (5) 
); 

insert into public.foos 
select i as id, case when i between 1 and 1999 then 1 
when i between 2000 and 3999 then 2 
when i between 4000 and 5999 then 3 
when i between 6000 and 7999 then 4 
when i between 8000 and 9999 then 5 end as foo_type 
from generate_series(1,9999) as i; 

analyze public.foos;

Here is the plan when a literal value is used. You can see that it selects only one partition.

explain analyze 
select count(*) 
from public.foos 
where foo_type = 1; 

Aggregate (cost=0.00..431.07 rows=1 width=8) 
    Rows out: 1 rows with 0.722 ms to first row, 0.723 ms to end, start offset by 0.298 ms. 
    -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..431.07 rows=1 width=8) 
    Rows out: 2 rows at destination with 0.717 ms to first row, 0.718 ms to end, start offset by 0.299 ms. 
    -> Aggregate (cost=0.00..431.07 rows=1 width=8) 
      Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 0.287 ms to end, start offset by 0.663 ms. 
      -> Sequence (cost=0.00..431.07 rows=1000 width=4) 
      Rows out: Avg 999.5 rows x 2 workers. Max 1000 rows (seg0) with 0.036 ms to first row, 0.215 ms to end, start offset by 0.663 ms. 
      -> Partition Selector for foos (dynamic scan id: 1) (cost=10.00..100.00 rows=50 width=4) 
       Filter: foo_type = 1 
       Partitions selected: 1 (out of 5) 
       Rows out: 0 rows (seg0) with 0.004 ms to end, start offset by 0.663 ms. 
      -> Dynamic Table Scan on foos (dynamic scan id: 1) (cost=0.00..431.07 rows=1000 width=4) 
       Filter: foo_type = 1 
       Rows out: Avg 999.5 rows x 2 workers. Max 1000 rows (seg0) with 0.032 ms to first row, 0.140 ms to end, start offset by 0.667 ms. 
       Partitions scanned: Avg 1.0 (out of 5) x 2 workers. Max 1 parts (seg0). 
Slice statistics: 
    (slice0) Executor memory: 408K bytes. 
    (slice1) Executor memory: 195K bytes avg x 2 workers, 195K bytes max (seg0). 
Statement statistics: 
    Memory used: 128000K bytes 
Settings: optimizer=on 
Optimizer status: PQO version 1.650 
Total runtime: 1.162 ms

Now, your query and its plan:

explain analyze 
select count(*) 
from public.foos 
where foo_type=(select min(foo_type) from public.favorite_foo_types); 

Aggregate (cost=0.00..863.04 rows=1 width=8) 
    Rows out: 1 rows with 6.466 ms to end, start offset by 24 ms. 
    -> Gather Motion 2:1 (slice3; segments: 2) (cost=0.00..863.04 rows=1 width=8) 
    Rows out: 2 rows at destination with 5.415 ms to first row, 6.459 ms to end, start offset by 24 ms. 
    -> Aggregate (cost=0.00..863.04 rows=1 width=8) 
      Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 4.514 ms to end, start offset by 24 ms. 
      -> Hash Join (cost=0.00..863.04 rows=5000 width=1) 
      Hash Cond: foos.foo_type = inner.min 
      Rows out: Avg 999.5 rows x 2 workers. Max 1000 rows (seg0) with 3.464 ms to first row, 4.441 ms to end, start offset by 24 ms. 
      Executor memory: 1K bytes avg, 1K bytes max (seg0). 
      Work_mem used: 1K bytes avg, 1K bytes max (seg0). Workfile: (0 spilling, 0 reused) 
      (seg0) Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets. 
      -> Dynamic Table Scan on foos (dynamic scan id: 1) (cost=0.00..431.10 rows=5000 width=4) 
       Rows out: Avg 999.5 rows x 2 workers. Max 1000 rows (seg0) with 0.382 ms to first row, 0.478 ms to end, start offset by 27 ms. 
       Partitions scanned: Avg 1.0 (out of 5) x 2 workers. Max 1 parts (seg0). 
      -> Hash (cost=100.00..100.00 rows=50 width=4) 
       Rows in: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 0.197 ms to end, start offset by 27 ms. 
       -> Partition Selector for foos (dynamic scan id: 1) (cost=10.00..100.00 rows=50 width=4) 
       Filter: foos.id = min 
       Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 0.189 ms to first row, 0.190 ms to end, start offset by 27 ms. 
       -> Broadcast Motion 1:2 (slice2) (cost=0.00..431.00 rows=2 width=4) 
         Rows out: Avg 1.0 rows x 2 workers at destination. Max 1 rows (seg0) with 0.015 ms to end, start offset by 27 ms. 
         -> Aggregate (cost=0.00..431.00 rows=1 width=4) 
         Rows out: 1 rows with 0.020 ms to end, start offset by 26 ms. 
         -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..431.00 rows=1 width=4) 
          Rows out: 2 rows at destination with 0.009 ms to first row, 0.010 ms to end, start offset by 26 ms. 
          -> Aggregate (cost=0.00..431.00 rows=1 width=4) 
          Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 0.079 ms to end, start offset by 25 ms. 
          -> Table Scan on favorite_foo_types (cost=0.00..431.00 rows=3 width=4) 
            Rows out: Avg 2.5 rows x 2 workers. Max 3 rows (seg0) with 0.065 ms to first row, 0.067 ms to end, start offset by 25 ms. 
Slice statistics: 
    (slice0) Executor memory: 414K bytes. 
    (slice1) Executor memory: 245K bytes avg x 2 workers, 245K bytes max (seg0). 
    (slice2) Executor memory: 253K bytes (entry db). 
    (slice3) Executor memory: 8493K bytes avg x 2 workers, 8493K bytes max (seg0). Work_mem: 1K bytes max. 
Statement statistics: 
    Memory used: 128000K bytes 
Settings: optimizer=on 
Optimizer status: PQO version 1.650 
Total runtime: 30.161 ms

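Note that the plan contains "Dynamic Table Scan on foos" and, beneath it, "Partitions scanned: Avg 1.0 (out of 5)". This means it dynamically eliminated 4 partitions and scanned only 1.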
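If dynamic elimination does not kick in on a given installation, one possible fallback is to resolve the subquery first and then run the count with the value embedded as a literal, so the planner sees the same shape as the first query. The following is only a sketch under that assumption: the function name count_foos_min_type is hypothetical and is not part of the original question or answer.

```sql
-- Hypothetical helper (an assumption, not from the original answer):
-- look up the partition key first, then run the count with that
-- value written into the statement as a literal.
create or replace function public.count_foos_min_type()
returns bigint as
$$
declare
    v_type  int;
    v_count bigint;
begin
    -- Step 1: resolve the value the subquery would have produced.
    select min(foo_type) into v_type
    from public.favorite_foo_types;

    -- No favorite types means no matching rows.
    if v_type is null then
        return 0;
    end if;

    -- Step 2: build the statement with the value as a literal so the
    -- planner can pick the partition at plan time.
    execute 'select count(*) from public.foos where foo_type = '
            || v_type::text
    into v_count;

    return v_count;
end;
$$ language plpgsql;

select public.count_foos_min_type();
```

The cost of this approach is an extra round of planning for the dynamic statement, so it is mainly worth considering when explain analyze shows that every partition is really being scanned.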

There is also a graphical plan checker on greenplum.org that can help you read the plan.


2017-01-12 16:49:38

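Thanks for the explanation! I tried to reproduce these results, but the query plan I end up with still appears to select all partitions: http://pastebin.com/yUmnS844 Perhaps the different behavior is due to a difference in optimizer versions? – gcbenison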

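Have you analyzed the table and the root partition? It could also be a data type issue. Can you provide the DDL for the tables and the exact SQL you are executing?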
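The checks suggested in this comment could be run roughly as follows. This is a sketch that assumes the table names from the example above and a Greenplum version that supports the rootpartition keyword of analyze; adjust the schema and names to your own tables.

```sql
-- Which optimizer planned the query? The plans in the answer were
-- produced with optimizer=on.
show optimizer;
select version();

-- Refresh statistics on the leaf partitions, the root partition,
-- and the lookup table.
analyze public.foos;
analyze rootpartition public.foos;
analyze public.favorite_foo_types;

-- Compare the declared types of the two foo_type columns; a mismatch
-- is one candidate for the "data type issue" mentioned above.
select table_schema, table_name, column_name, data_type
from information_schema.columns
where column_name = 'foo_type'
  and table_name in ('foos', 'favorite_foo_types');
```

After running these, repeating explain analyze on the subquery version shows whether "Partitions scanned" has dropped to 1.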
