2011-01-28

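Optimizing my Postgres query

Can I optimize this query, or change the table structure, to shorten the execution time? I don't really understand the EXPLAIN output. Am I missing some indexes?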

EXPLAIN SELECT COUNT(*) AS count, 
      q.query_str 
     FROM click_fact cf, 
      query q, 
      date_dim dd, 
      queries_p_day_mv qpd 
     WHERE dd.date_dim_id = qpd.date_dim_id 
     AND qpd.query_id = q.query_id 
     AND type = 'S' 
     AND cf.query_id = q.query_id 
     AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28' 
     AND qpd.interface_id IN (SELECT DISTINCT interface_id from interface WHERE lang = 'sv') 
     GROUP BY q.query_str   
     ORDER BY count DESC; 
                  QUERY PLAN                
------------------------------------------------------------------------------------------------------------------------------------- 
Sort (cost=19170.15..19188.80 rows=7460 width=12) 
    Sort Key: (count(*)) 
    -> HashAggregate (cost=18597.03..18690.28 rows=7460 width=12) 
     -> Nested Loop (cost=10.20..18559.73 rows=7460 width=12) 
       -> Nested Loop (cost=10.20..14975.36 rows=2452 width=20) 
        Join Filter: (qpd.interface_id = interface.interface_id) 
        -> Unique (cost=1.03..1.04 rows=1 width=4) 
          -> Sort (cost=1.03..1.04 rows=1 width=4) 
           Sort Key: interface.interface_id 
           -> Seq Scan on interface (cost=0.00..1.02 rows=1 width=4) 
             Filter: (lang = 'sv'::text) 
        -> Nested Loop (cost=9.16..14943.65 rows=2452 width=24) 
          -> Hash Join (cost=9.16..14133.58 rows=2452 width=8) 
           Hash Cond: (qpd.date_dim_id = dd.date_dim_id) 
           -> Seq Scan on queries_p_day_mv qpd (cost=0.00..11471.93 rows=700793 width=12) 
           -> Hash (cost=8.81..8.81 rows=28 width=4) 
             -> Index Scan using date_dim_pg_date_index on date_dim dd (cost=0.00..8.81 rows=28 width=4) 
              Index Cond: ((pg_date >= '2010-12-29'::date) AND (pg_date <= '2011-01-28'::date)) 
          -> Index Scan using query_pkey on query q (cost=0.00..0.32 rows=1 width=16) 
           Index Cond: (q.query_id = qpd.query_id) 
       -> Index Scan using click_fact_query_id_index on click_fact cf (cost=0.00..1.01 rows=36 width=4) 
        Index Cond: (cf.query_id = qpd.query_id) 
        Filter: (cf.type = 'S'::bpchar) 

Update, with EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT COUNT(*) AS count, 
      q.query_str 
     FROM click_fact cf, 
      query q, 
      date_dim dd, 
      queries_p_day_mv qpd 
     WHERE dd.date_dim_id = qpd.date_dim_id 
     AND qpd.query_id = q.query_id 
     AND type = 'S' 
     AND cf.query_id = q.query_id 
     AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28' 
     AND qpd.interface_id IN (SELECT DISTINCT interface_id from interface WHERE lang = 'sv') 
     GROUP BY q.query_str 
     ORDER BY count DESC; 
                        QUERY PLAN                      
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Sort (cost=19201.06..19220.52 rows=7784 width=12) (actual time=51017.162..51046.102 rows=17586 loops=1) 
    Sort Key: (count(*)) 
    Sort Method: external merge Disk: 632kB 
    -> HashAggregate (cost=18600.67..18697.97 rows=7784 width=12) (actual time=50935.411..50968.678 rows=17586 loops=1) 
     -> Nested Loop (cost=10.20..18561.75 rows=7784 width=12) (actual time=42.079..43666.404 rows=3868592 loops=1) 
       -> Nested Loop (cost=10.20..14975.91 rows=2453 width=20) (actual time=23.678..14609.282 rows=700803 loops=1) 
        Join Filter: (qpd.interface_id = interface.interface_id) 
        -> Unique (cost=1.03..1.04 rows=1 width=4) (actual time=0.104..0.110 rows=1 loops=1) 
          -> Sort (cost=1.03..1.04 rows=1 width=4) (actual time=0.100..0.102 rows=1 loops=1) 
           Sort Key: interface.interface_id 
           Sort Method: quicksort Memory: 25kB 
           -> Seq Scan on interface (cost=0.00..1.02 rows=1 width=4) (actual time=0.038..0.041 rows=1 loops=1) 
             Filter: (lang = 'sv'::text) 
        -> Nested Loop (cost=9.16..14944.20 rows=2453 width=24) (actual time=23.550..12553.786 rows=700808 loops=1) 
          -> Hash Join (cost=9.16..14133.80 rows=2453 width=8) (actual time=18.283..3885.700 rows=700808 loops=1) 
           Hash Cond: (qpd.date_dim_id = dd.date_dim_id) 
           -> Seq Scan on queries_p_day_mv qpd (cost=0.00..11472.08 rows=700808 width=12) (actual time=0.014..1587.106 rows=700808 loops=1) 
           -> Hash (cost=8.81..8.81 rows=28 width=4) (actual time=18.221..18.221 rows=31 loops=1) 
             -> Index Scan using date_dim_pg_date_index on date_dim dd (cost=0.00..8.81 rows=28 width=4) (actual time=14.388..18.152 rows=31 loops=1) 
              Index Cond: ((pg_date >= '2010-12-29'::date) AND (pg_date <= '2011-01-28'::date)) 
          -> Index Scan using query_pkey on query q (cost=0.00..0.32 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=700808) 
           Index Cond: (q.query_id = qpd.query_id) 
       -> Index Scan using click_fact_query_id_index on click_fact cf (cost=0.00..1.01 rows=36 width=4) (actual time=0.005..0.022 rows=6 loops=700803) 
        Index Cond: (cf.query_id = qpd.query_id) 
        Filter: (cf.type = 'S'::bpchar) 
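
I'd suggest using explicit JOIN syntax rather than the comma-separated FROM list, then checking whether you get the same plan. I know it shouldn't happen, but I have sometimes seen a big difference in speed between the two styles. Perhaps stating your intent more clearly helps the query optimizer? – iain 2011-01-28 15:14:35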


@iain: could you post a sample query whose plan changes in `PostgreSQL` when it is rewritten in `ANSI` style? – Quassnoi 2011-01-28 15:22:04

Answers

1

You can try eliminating the subquery by joining the interface table directly:

SELECT COUNT(*) AS count, 
     q.query_str 
    FROM click_fact cf, 
     query q, 
     date_dim dd, 
     queries_p_day_mv qpd, 
     interface 
    WHERE dd.date_dim_id = qpd.date_dim_id 
    AND qpd.query_id = q.query_id 
    AND cf.type = 'S' 
    AND cf.query_id = q.query_id 
    AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28' 
    -- equivalent to the IN (...) subquery only if interface.interface_id is unique 
    AND qpd.interface_id = interface.interface_id 
    AND interface.lang = 'sv' 
    GROUP BY q.query_str   
    ORDER BY count DESC; 

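Also, if the interface table is large, creating an index on lang might help. An index on date_dim_id in queries_p_day_mv might help as well.

In general, the first thing to try is to look for Seq Scans in the plan and see whether creating an index turns them into index scans.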
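
As a rough sketch of the two suggestions above: the table and column names come from the question, while the index names are made up for illustration. Whether the planner actually uses these indexes depends on the data. In the posted EXPLAIN ANALYZE output the date filter keeps all 700808 rows of queries_p_day_mv, and interface is estimated at a single matching row, so sequential scans may still win here.

```sql
-- Index names are hypothetical; tables and columns are taken from the question.
CREATE INDEX interface_lang_idx
    ON interface (lang);

CREATE INDEX queries_p_day_mv_date_dim_id_idx
    ON queries_p_day_mv (date_dim_id);

-- Refresh planner statistics so the new indexes are costed with current data.
ANALYZE interface;
ANALYZE queries_p_day_mv;
```

After creating them, rerun the query under EXPLAIN ANALYZE and compare the actual times and row counts with the plan posted in the question.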

HTH

1
SELECT COUNT(*) AS count, 
     q.query_str 
FROM date_dim dd 
JOIN queries_p_day_mv qpd 
ON  qpd.date_dim_id = dd.date_dim_id 
     AND qpd.interface_id IN 
     (
     SELECT interface_id 
     FROM interface 
     WHERE lang = 'sv' 
     ) 
JOIN query q 
ON  q.query_id = qpd.query_id 
JOIN click_fact cf 
ON  cf.query_id = q.query_id 
     AND cf.type = 'S' 
WHERE dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28' 
GROUP BY 
     q.query_str   
ORDER BY 
     count DESC 

Create the following indexes (in addition to the existing ones):

queries_p_day_mv (interface_id, date_dim_id) 
interface (lang) 
click_fact (query_id, type) 
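
Written out as statements, the list above might look like the following. This is only a sketch: the index names are invented for illustration, and the table name queries_p_day_mv is taken from the question. Skip any index that already exists on the same columns.

```sql
-- Index names are hypothetical; column lists follow the answer above.
CREATE INDEX queries_p_day_mv_interface_date_idx
    ON queries_p_day_mv (interface_id, date_dim_id);

CREATE INDEX interface_lang_lookup_idx
    ON interface (lang);

CREATE INDEX click_fact_query_id_type_idx
    ON click_fact (query_id, type);
```

The composite index on click_fact covers both the join column and the type = 'S' filter, which the existing click_fact_query_id_index currently applies as a separate Filter step on every one of the roughly 700,000 loops.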

Could you please post your table definitions?