2016-03-29 27 views
1

How can I improve performance when combining two queries with the "IN" operator? I am using Postgres 9.4:

select version(); 
                version              
--------------------------------------------------------------------------------------------------------------- 
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit 

My database has a view with two columns, an integer and a text column:

\d+ gff_attributes 
+----------------+---------+-------------+-----------+---------------+ 
| Column   | Type | Modifiers | Storage | Description | 
|----------------+---------+-------------+-----------+---------------| 
| seqfeature_id | integer |    | plain  |  <null> | 
| gff_attributes | text |    | extended |  <null> | 
+----------------+---------+-------------+-----------+---------------+ 
View definition: 
SELECT qv.seqfeature_id, 
    string_agg((t.name::text || '='::text) || qv.value, ';'::text 
     ORDER BY t.name) AS gff_attributes 
    FROM term t, 
    seqfeature_qualifier_value qv 
    WHERE qv.term_id = t.term_id 
    GROUP BY qv.seqfeature_id; 

The view is built from two tables, seqfeature_qualifier_value (~55 million rows) and term (~11,000 rows):

\d+ seqfeature_qualifier_value 
       Table "public.seqfeature_qualifier_value" 
    Column  | Type |  Modifiers  | Storage | Description 
---------------+---------+--------------------+----------+------------- 
seqfeature_id | integer | not null   | plain | 
term_id  | integer | not null   | plain | 
rank   | integer | not null default 0 | plain | 
value   | text | not null   | extended | 
Indexes: 
    "seqfeature_qualifier_value_pkey" PRIMARY KEY, btree (seqfeature_id, term_id, rank) 
    "seqfeaturequal_sfid" btree (seqfeature_id) 
    "seqfeaturequal_trm" btree (term_id) 
    "seqfeaturequal_type_value" btree (term_id, value) 
Foreign-key constraints: 
    "fkseqfeature_featqual" FOREIGN KEY (seqfeature_id) REFERENCES seqfeature(seqfeature_id) ON DELETE CASCADE 
    "fkterm_featqual" FOREIGN KEY (term_id) REFERENCES term(term_id) 
Rules: 
    rule_seqfeature_qualifier_value_i AS 
    ON INSERT TO seqfeature_qualifier_value 
    WHERE ((SELECT seqfeature_qualifier_value.seqfeature_id 
      FROM seqfeature_qualifier_value 
      WHERE seqfeature_qualifier_value.seqfeature_id = new.seqfeature_id AND seqfeature_qualifier_value.term_id = new.term_id AND seqfeature_qualifier_value.rank = new.rank)) IS NOT NULL DO INSTEAD NOTHING 
Has OIDs: no 

\d+ term 
               Table "public.term" 
    Column |   Type   |      Modifiers      | Storage | Description 
-------------+------------------------+---------------------------------------------------+----------+------------- 
term_id  | integer    | not null default nextval('term_pk_seq'::regclass) | plain | 
name  | character varying(255) | not null           | extended | 
definition | text     |             | extended | 
identifier | character varying(40) |             | extended | 
is_obsolete | character(1)   |             | extended | 
ontology_id | integer    | not null           | plain | 
Indexes: 
    "term_pkey" PRIMARY KEY, btree (term_id) 
    "term_identifier_key" UNIQUE, btree (identifier) 
    "term_name_ontology_id_is_obsolete_key" UNIQUE, btree (name, ontology_id, is_obsolete) 
    "term_ont" btree (ontology_id) 
Foreign-key constraints: 
    "fkont_term" FOREIGN KEY (ontology_id) REFERENCES ontology(ontology_id) ON DELETE CASCADE 
Rules: 
    rule_term_i1 AS 
    ON INSERT TO term 
    WHERE ((SELECT term.term_id 
      FROM term 
      WHERE term.identifier::text = new.identifier::text)) IS NOT NULL DO INSTEAD NOTHING 
    rule_term_i2 AS 
    ON INSERT TO term 
    WHERE ((SELECT term.term_id 
      FROM term 
      WHERE term.name::text = new.name::text AND term.ontology_id = new.ontology_id AND term.is_obsolete = new.is_obsolete)) IS NOT NULL DO INSTEAD NOTHING 
Has OIDs: no 

Now, if I select a subset of rows from the view based on the seqfeature_id column using an explicit comparison, I get the result fairly quickly:

explain (analyze, verbose) select * 
    from gff_attributes 
    where seqfeature_id = 3596159; 
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| QUERY PLAN                                        | 
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 
| GroupAggregate (cost=337.27..734.68 rows=1 width=24) (actual time=11.690..11.690 rows=1 loops=1)                   | 
| Output: qv.seqfeature_id, string_agg((((t.name)::text || '='::text) || qv.value), ';'::text ORDER BY t.name)               | 
| Group Key: qv.seqfeature_id                                    | 
| -> Hash Join (cost=337.27..733.56 rows=110 width=24) (actual time=11.600..11.628 rows=6 loops=1)                  | 
|   Output: t.name, qv.seqfeature_id, qv.value                              | 
|   Hash Cond: (qv.term_id = t.term_id)                                | 
|   -> Index Scan using seqfeaturequal_sfid on public.seqfeature_qualifier_value qv (cost=0.56..394.66 rows=110 width=17) (actual time=0.036..0.055 rows=6 loops=1) | 
|    Output: qv.seqfeature_id, qv.term_id, qv.rank, qv.value                          | 
|    Index Cond: (qv.seqfeature_id = 3596159)                             | 
|   -> Hash (cost=194.09..194.09 rows=11409 width=15) (actual time=11.539..11.539 rows=11413 loops=1)                | 
|    Output: t.name, t.term_id                                 | 
|    Buckets: 2048 Batches: 1 Memory Usage: 540kB                            | 
|    -> Seq Scan on public.term t (cost=0.00..194.09 rows=11409 width=15) (actual time=0.009..5.108 rows=11413 loops=1)          | 
|      Output: t.name, t.term_id                                | 
| Planning time: 0.455 ms                                     | 
| Execution time: 11.753 ms                                     | 
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 

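However, when the view is combined through the IN operator with a query that returns many values, things slow down dramatically (~2 minutes):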

explain (analyse, verbose) 
    select * from gff_attributes 
    where seqfeature_id in (
     select seqfeature_id 
     from seqfeature_qualifier_value 
     where term_id = (select term_id 
      from term 
      where name = 'SRB_ortholog_id') 
     and value = '1') 
     ; 
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| QUERY PLAN                                         | 
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 
| Merge Join (cost=12911531.62..13619325.85 rows=251228 width=36) (actual time=121504.409..173449.696 rows=102 loops=1)              | 
| Output: qv.seqfeature_id, (string_agg((((t.name)::text || '='::text) || qv.value), ';'::text ORDER BY t.name))                | 
| Merge Cond: (qv.seqfeature_id = seqfeature_qualifier_value.seqfeature_id)                         | 
| InitPlan 1 (returns $0)                                      | 
|  -> Index Scan using term_name_ontology_id_is_obsolete_key on public.term (cost=0.29..8.30 rows=1 width=4) (actual time=0.036..0.037 rows=1 loops=1)      | 
|   Output: term.term_id                                    | 
|   Index Cond: ((term.name)::text = 'SRB_ortholog_id'::text)                           | 
| -> GroupAggregate (cost=12905524.15..13607037.46 rows=502457 width=24) (actual time=121295.372..172418.928 rows=3687424 loops=1)           | 
|   Output: qv.seqfeature_id, string_agg((((t.name)::text || '='::text) || qv.value), ';'::text ORDER BY t.name)               | 
|   Group Key: qv.seqfeature_id                                   | 
|   -> Sort (cost=12905524.15..13044570.67 rows=55618608 width=24) (actual time=121295.315..132671.659 rows=22189814 loops=1)           | 
|    Output: qv.seqfeature_id, t.name, qv.value                              | 
|    Sort Key: qv.seqfeature_id                                  | 
|    Sort Method: external merge Disk: 1639072kB                             | 
|    -> Hash Join (cost=336.70..2328594.94 rows=55618608 width=24) (actual time=13.358..41289.820 rows=55545757 loops=1)           | 
|      Output: qv.seqfeature_id, t.name, qv.value                            | 
|      Hash Cond: (qv.term_id = t.term_id)                              | 
|      -> Seq Scan on public.seqfeature_qualifier_value qv (cost=0.00..1215886.08 rows=55618608 width=17) (actual time=0.063..12230.988 rows=55545757 loops=1) | 
|       Output: qv.seqfeature_id, qv.term_id, qv.rank, qv.value                        | 
|      -> Hash (cost=194.09..194.09 rows=11409 width=15) (actual time=13.278..13.278 rows=11413 loops=1)              | 
|       Output: t.name, t.term_id                               | 
|       Buckets: 2048 Batches: 1 Memory Usage: 540kB                          | 
|       -> Seq Scan on public.term t (cost=0.00..194.09 rows=11409 width=15) (actual time=0.011..6.207 rows=11413 loops=1)        | 
|         Output: t.name, t.term_id                              | 
| -> Sort (cost=5999.16..5999.20 rows=14 width=4) (actual time=0.404..0.436 rows=102 loops=1)                    | 
|   Output: seqfeature_qualifier_value.seqfeature_id                              | 
|   Sort Key: seqfeature_qualifier_value.seqfeature_id                             | 
|   Sort Method: quicksort Memory: 29kB                                 | 
|   -> HashAggregate (cost=5998.76..5998.90 rows=14 width=4) (actual time=0.345..0.368 rows=102 loops=1)                | 
|    Output: seqfeature_qualifier_value.seqfeature_id                            | 
|    Group Key: seqfeature_qualifier_value.seqfeature_id                            | 
|    -> Bitmap Heap Scan on public.seqfeature_qualifier_value (cost=88.22..5994.94 rows=1527 width=4) (actual time=0.102..0.290 rows=102 loops=1)     | 
|      Output: seqfeature_qualifier_value.seqfeature_id, seqfeature_qualifier_value.term_id, seqfeature_qualifier_value.rank, seqfeature_qualifier_value.value | 
|      Recheck Cond: ((seqfeature_qualifier_value.term_id = $0) AND (seqfeature_qualifier_value.value = '1'::text))            | 
|      Heap Blocks: exact=102                                 | 
|      -> Bitmap Index Scan on seqfeaturequal_type_value (cost=0.00..87.83 rows=1527 width=0) (actual time=0.083..0.083 rows=102 loops=1)      | 
|       Index Cond: ((seqfeature_qualifier_value.term_id = $0) AND (seqfeature_qualifier_value.value = '1'::text))           | 
| Planning time: 1.010 ms                                      | 
| Execution time: 173942.270 ms                                     | 
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 

Note that the subquery is also fast when run on its own (< 1 s), and it returns 102 rows:

explain (analyse, verbose) 
    select seqfeature_id 
    from seqfeature_qualifier_value 
    where term_id = (select term_id 
     from term where name = 'SRB_ortholog_id' 
     ) 
    and value = '1' 
      ; 
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| QUERY PLAN                                    | 
|-----------------------------------------------------------------------------------------------------------------------------------------------------------| 
| Bitmap Heap Scan on public.seqfeature_qualifier_value (cost=96.52..6003.24 rows=1527 width=4) (actual time=0.104..0.319 rows=102 loops=1)    | 
| Output: seqfeature_qualifier_value.seqfeature_id                          | 
| Recheck Cond: ((seqfeature_qualifier_value.term_id = $0) AND (seqfeature_qualifier_value.value = '1'::text))           | 
| Heap Blocks: exact=102                                 | 
| InitPlan 1 (returns $0)                                 | 
|  -> Index Scan using term_name_ontology_id_is_obsolete_key on public.term (cost=0.29..8.30 rows=1 width=4) (actual time=0.035..0.037 rows=1 loops=1) | 
|   Output: term.term_id                               | 
|   Index Cond: ((term.name)::text = 'SRB_ortholog_id'::text)                      | 
| -> Bitmap Index Scan on seqfeaturequal_type_value (cost=0.00..87.83 rows=1527 width=0) (actual time=0.083..0.083 rows=102 loops=1)     | 
|   Index Cond: ((seqfeature_qualifier_value.term_id = $0) AND (seqfeature_qualifier_value.value = '1'::text))          | 
| Planning time: 0.215 ms                                 | 
| Execution time: 0.368 ms                                 | 
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ 

I am very confused about why the IN operator adds so much time to the query. Is there a way to rewrite this query to improve performance?

+0

Please edit your question and add the output of 'explain (analyze, verbose)' instead of the plain 'explain' output –

+0

I have updated the code blocks as you suggested – cts

+1

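You forgot to provide your Postgres version, among other things. Please consider the instructions in the tag info for [postgresql-performance] (http://stackoverflow.com/tags/postgresql-performance/info). We need the exact table definitions (including at least the relevant columns) and the cardinalities of the underlying tables 'term' and 'seqfeature_qualifier_value'. –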

Answers

0

Something like this?

SELECT 
t1.* 
FROM 
gff_attributes t1 
INNER JOIN 
(
    SELECT DISTINCT t3.seqfeature_id 
    FROM seqfeature_qualifier_value t3 
    INNER JOIN term t4 on t3.term_id = t4.term_id AND t4.name = 'SRB_ortholog_id' 
    WHERE 
    t3.value = '1' 
) t2 ON t1.seqfeature_id = t2.seqfeature_id 
+0

Sadly, this approach was no faster – cts

+0

Which type of database are you using? If you are using Oracle, add the optimizer hint /*+ MATERIALIZE */ to the subselect part of the query. – Andy

+0

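Strike that. From your syntax I gather you are using a PostgreSQL database. You can create a temporary table for the subselect part, as described in this link: http://stackoverflow.com/questions/15306199/materialize-common-table-expression-in-greenplum – Andy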

0

How about:

select a.* 
from 
    gff_attributes as a 
    join 
    seqfeature_qualifier_value as b on 
     a.seqfeature_id = b.seqfeature_id 
     and 
     b.value = '1' 
    join 
    term as c on 
     b.term_id = c.term_id 
     and 
     c.name = 'SRB_ortholog_id'; 
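
The join above still reads from the gff_attributes view, so the planner may aggregate all ~55 million qualifier rows before joining, as the slow plan in the question does. The following is only a sketch, assuming the view definition shown in the question: it inlines the view body so that the id filter is applied to the base table before GROUP BY. Filtering on the grouping column before aggregation returns the same groups as filtering the finished view. Whether the planner then picks the seqfeature_id index is an assumption to verify with EXPLAIN (ANALYZE).

```sql
-- Sketch only: view body inlined, id filter applied before aggregation.
SELECT qv.seqfeature_id,
       string_agg(t.name::text || '=' || qv.value, ';' ORDER BY t.name) AS gff_attributes
FROM seqfeature_qualifier_value AS qv
JOIN term AS t ON t.term_id = qv.term_id
WHERE qv.seqfeature_id IN (
        -- the ids that carry SRB_ortholog_id = '1' (102 rows in the question)
        SELECT b.seqfeature_id
        FROM seqfeature_qualifier_value AS b
        JOIN term AS c ON c.term_id = b.term_id
        WHERE c.name = 'SRB_ortholog_id'
          AND b.value = '1'
      )
GROUP BY qv.seqfeature_id;
```

Using IN rather than a plain join for the filter also avoids duplicated output rows if one seqfeature_id has several matching qualifier rows, for example with different rank values.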
0

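In general, nested queries/subqueries are expensive, whether you use IN, JOIN, or EXISTS. I have tried each approach in Transact-SQL and found that every variant produced exactly the same execution plan, so they are identical in terms of performance, at least in T-SQL.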

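The standard workaround is to pull your first query into a temporary table, add an index to it (using ALTER TABLE), and then run the subquery against the indexed temp table. This runs faster in most SQL dialects. If you want to dig deeper, google "postgresql performance problems with subqueries". You will find many posts trying to solve the same problem.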
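
A minimal sketch of that workaround in PostgreSQL syntax follows. The temp table name wanted_ids and the index name wanted_ids_sfid are made up for illustration. Note that in PostgreSQL a plain index is normally created with CREATE INDEX rather than ALTER TABLE. Whether the final join is faster depends on whether the planner can avoid aggregating the whole view, so it is worth comparing plans with EXPLAIN (ANALYZE).

```sql
-- Hypothetical temp table holding only the matching ids.
CREATE TEMP TABLE wanted_ids AS
SELECT DISTINCT sqv.seqfeature_id
FROM seqfeature_qualifier_value AS sqv
JOIN term AS t ON t.term_id = sqv.term_id
WHERE t.name = 'SRB_ortholog_id'
  AND sqv.value = '1';

-- Index the temp table and refresh its statistics so the planner
-- knows how few rows it contains.
CREATE INDEX wanted_ids_sfid ON wanted_ids (seqfeature_id);
ANALYZE wanted_ids;

-- Join the small id list against the view.
SELECT ga.*
FROM wanted_ids AS w
JOIN gff_attributes AS ga ON ga.seqfeature_id = w.seqfeature_id;
```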

0

First: replace the IN + scalar subquery (ugh!) with an EXISTS clause (and add some aliases for sanity's sake):

SELECT * 
FROM gff_attributes ga 
WHERE EXISTS (SELECT 13 
    FROM seqfeature_qualifier_value sqv 
     JOIN term t ON t.term_id = sqv.term_id 
    WHERE ga.seqfeature_id = sqv.seqfeature_id 
    AND sqv.value = '1' 
    AND t.name = 'SRB_ortholog_id' 
); 
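
If the EXISTS form still aggregates the entire view before filtering (an assumption to check with EXPLAIN), another option is a LATERAL join, available since PostgreSQL 9.3. This sketch first finds the matching ids and then runs the view's aggregation once per id, with the view body from the question inlined. The aliases ids, agg, and t2 are illustrative.

```sql
SELECT ids.seqfeature_id, agg.gff_attributes
FROM (
        -- the small driving set: ids that carry SRB_ortholog_id = '1'
        SELECT DISTINCT sqv.seqfeature_id
        FROM seqfeature_qualifier_value AS sqv
        JOIN term AS t ON t.term_id = sqv.term_id
        WHERE sqv.value = '1'
          AND t.name = 'SRB_ortholog_id'
     ) AS ids
CROSS JOIN LATERAL (
        -- same aggregation as the view, restricted to one id at a time
        SELECT string_agg(t2.name::text || '=' || qv.value, ';' ORDER BY t2.name) AS gff_attributes
        FROM seqfeature_qualifier_value AS qv
        JOIN term AS t2 ON t2.term_id = qv.term_id
        WHERE qv.seqfeature_id = ids.seqfeature_id
     ) AS agg;
```

With about a hundred driving ids, each lateral subquery should only need the rows for a single seqfeature_id, which the primary key index can serve.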

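Next: in the fat junction table (or: table), I would suggest replacing the two single-column indexes on term and feature with one composite index. This is effectively the primary key in reversed order. (BTW: is the rank field really needed to enforce uniqueness? What does it mean?)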

DROP INDEX seqfeaturequal_sfid; -- (seqfeature_id) 
DROP INDEX seqfeaturequal_trm; -- (term_id) 
    -- WHAT is "rank" ? Why is it needed? 
CREATE UNIQUE INDEX seqfeaturequal_trm_sfid 
    ON seqfeature_qualifier_value (term_id,seqfeature_id,rank); 

And of course you should also run ANALYZE seqfeature_qualifier_value; after adding the index, to refresh the statistics.

And: you should add a UNIQUE constraint on term.name; you use it in a scalar subquery, so you evidently assume it is unique.
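
A hedged sketch of that last step: the existing unique index on term covers (name, ontology_id, is_obsolete), so name alone is not guaranteed to be unique. Checking for duplicates first shows whether the constraint can be added at all. The constraint name term_name_key is made up for illustration.

```sql
-- List names that occur more than once; the constraint below fails if any exist.
SELECT name, count(*) AS n
FROM term
GROUP BY name
HAVING count(*) > 1;

-- Hypothetical constraint name; only valid if the query above returns no rows.
ALTER TABLE term ADD CONSTRAINT term_name_key UNIQUE (name);
```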

+0

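Hi, thanks for the tip about the indexes. Your suggested query does not seem to work, because the 'value' column does not exist in the gff_attributes view. The columns it has are the integer 'seqfeature_id' and the text column 'gff_attributes', which contains a concatenation of many attributes. I could do something like LIKE '%1%', but I think that would be slow. – cts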

+0

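That is why I added the remark about adding table aliases to your query. In '... and value = '1')', I had to guess which RTE the value should come from. I probably guessed wrong ... – wildplasser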

+0

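OK, I fixed it. It was at the wrong query level. You still need to answer the question about the "rank" column. (I suspect this might be a table with 3 candidate keys) – wildplasser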