2016-11-10

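Hardcoding a function parameter yields a 5x speedup

I have the following stored procedure, which generates a dynamic query.

Given a list of conditions/filters, it finds all Visitors that belong to a given App. The app_id is passed in as a parameter.

If I call the function with an app id and use this parameter in the dynamic query, it runs in around 200 ms.

However, if I hardcode the app_id, it runs in < 20 ms.

Here is how I call the procedure: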

SELECT id 
FROM find_matching_visitors('my_app_id', '{}', '{(field = ''app_name'' and string_value ILIKE ''My awesome app'' )}') 

Any ideas why this happens? The function, shown here in the variant with the app_id hardcoded:

CREATE OR REPLACE FUNCTION find_matching_visitors(app_id text, default_filters text[], custom_filters text[]) 
    RETURNS TABLE (
     id varchar 
    ) AS 
    $body$ 
    DECLARE 
     default_filterstring text; 
     custom_filterstring text; 
     default_filter_length integer; 
     custom_filter_length integer; 
     sql VARCHAR; 
    BEGIN 
     default_filter_length := COALESCE(array_length(default_filters, 1), 0); 
     custom_filter_length := COALESCE(array_length(custom_filters, 1), 0); 

     default_filterstring := array_to_string(default_filters, ' AND '); 
     custom_filterstring := array_to_string(custom_filters, ' OR '); 

     IF custom_filterstring = '' or custom_filterstring is null THEN 
      custom_filterstring := '1=1'; 
     END IF; 

     IF default_filterstring = '' or default_filterstring is null THEN 
      default_filterstring := '1=1'; 
     END IF; 

     sql := format(' 
        SELECT v.id FROM visitors v 
        LEFT JOIN trackings t on v.id = t.visitor_id 
        WHERE v.app_id = ''HARDCODED_APP_ID'' and (%s) and (%s) 
        group by v.id 

       ', custom_filterstring, default_filterstring, custom_filter_length, custom_filter_length); 
     RETURN QUERY EXECUTE sql; 

    END; 
    $body$ 
    LANGUAGE 'plpgsql'; 

EXPLAIN ANALYZE with the app_id hardcoded

Limit  (cost=481.86..481.99 rows=50 width=531) (actual time=25.890..25.893 rows=9 loops=1)
  ->  Sort  (cost=481.86..484.26 rows=960 width=531) (actual time=25.888..25.890 rows=9 loops=1)
        Sort Key: v0.last_seen DESC
        Sort Method: quicksort  Memory: 30kB
        ->  WindowAgg  (cost=414.62..449.97 rows=960 width=531) (actual time=25.862..25.870 rows=9 loops=1)
              ->  Hash Join  (cost=414.62..437.97 rows=960 width=523) (actual time=25.830..25.841 rows=9 loops=1)
                    Hash Cond: ((find_matching_visitors.id)::text = (v0.id)::text)
                    ->  Function Scan on find_matching_visitors  (cost=0.25..10.25 rows=1000 width=32) (actual time=15.875..15.876 rows=9 loops=1)
                    ->  Hash  (cost=354.19..354.19 rows=4814 width=523) (actual time=9.936..9.936 rows=4887 loops=1)
                          Buckets: 8192  Batches: 1  Memory Usage: 2145kB
                          ->  Seq Scan on visitors v0  (cost=0.00..354.19 rows=4814 width=523) (actual time=0.013..5.232 rows=4887 loops=1)
                                Filter: ((NOT merged) AND (((type)::text = 'user'::text) OR ((type)::text = 'lead'::text)))
                                Rows Removed by Filter: 138
Planning time: 0.772 ms
Execution time: 26.006 ms

EXPLAIN ANALYZE with the app_id passed as a parameter

Limit  (cost=481.86..481.99 rows=50 width=531) (actual time=163.579..163.581 rows=9 loops=1)
  ->  Sort  (cost=481.86..484.26 rows=960 width=531) (actual time=163.578..163.579 rows=9 loops=1)
        Sort Key: v0.last_seen DESC
        Sort Method: quicksort  Memory: 30kB
        ->  WindowAgg  (cost=414.62..449.97 rows=960 width=531) (actual time=163.553..163.560 rows=9 loops=1)
              ->  Hash Join  (cost=414.62..437.97 rows=960 width=523) (actual time=163.525..163.537 rows=9 loops=1)
                    Hash Cond: ((find_matching_visitors.id)::text = (v0.id)::text)
                    ->  Function Scan on find_matching_visitors  (cost=0.25..10.25 rows=1000 width=32) (actual time=153.918..153.918 rows=9 loops=1)
                    ->  Hash  (cost=354.19..354.19 rows=4814 width=523) (actual time=9.578..9.578 rows=4887 loops=1)
                          Buckets: 8192  Batches: 1  Memory Usage: 2145kB
                          ->  Seq Scan on visitors v0  (cost=0.00..354.19 rows=4814 width=523) (actual time=0.032..4.993 rows=4887 loops=1)
                                Filter: ((NOT merged) AND (((type)::text = 'user'::text) OR ((type)::text = 'lead'::text)))
                                Rows Removed by Filter: 138
Planning time: 1.134 ms
Execution time: 163.705 ms

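Update 1: added EXPLAIN ANALYZE output for both cases. Note: the two plans are exactly the same; only the time spent differs, almost all of it in the Function Scan on find_matching_visitors.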

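Update 2: it turns out I needed to pass app_id as an argument to the format function instead of embedding it directly. This brought the query time down to around 20-30 ms.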
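The final code is not shown in the post, so the following is only a minimal sketch of what Update 2 describes. It assumes the app id is spliced into the statement with format()'s %L placeholder, which quotes the value as a SQL literal. The COALESCE/NULLIF shorthand for the filter defaults and the removal of the unused length variables are simplifications made for this sketch, not the author's code.

```sql
-- Sketch only: same dynamic query, with app_id supplied through format().
CREATE OR REPLACE FUNCTION find_matching_visitors(app_id text, default_filters text[], custom_filters text[])
    RETURNS TABLE (id varchar) AS
$body$
DECLARE
    default_filterstring text;
    custom_filterstring text;
    sql text;
BEGIN
    -- An empty or NULL filter list falls back to a condition that is always true.
    default_filterstring := COALESCE(NULLIF(array_to_string(default_filters, ' AND '), ''), '1=1');
    custom_filterstring := COALESCE(NULLIF(array_to_string(custom_filters, ' OR '), ''), '1=1');

    -- %L inserts app_id as a quoted literal; %s inserts the filter text as-is.
    sql := format('
        SELECT v.id FROM visitors v
        LEFT JOIN trackings t ON v.id = t.visitor_id
        WHERE v.app_id = %L AND (%s) AND (%s)
        GROUP BY v.id
    ', app_id, custom_filterstring, default_filterstring);

    RETURN QUERY EXECUTE sql;
END;
$body$
LANGUAGE plpgsql;
```

With this form the executed statement contains the concrete app id, just as in the hardcoded variant. PL/pgSQL also offers EXECUTE ... USING with a $1 placeholder as another way to hand a value to a dynamic statement. Note that the filter strings are still inserted verbatim with %s, so they have to come from a trusted source.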

Which PostgreSQL version? –

Using version 9.5 – Tarlen

What does EXPLAIN ANALYSE have to say? –

Answer

Hardcoded values matter when the planner determines the optimal query plan. For example:

select * from some_table where id_person=231 
select * from some_table where id_person=10 

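When 90% of the rows in some_table have id_person = 231, PostgreSQL uses a full table scan, because that is the fastest option. When only 1% of the rows have id_person = 10, it uses an index scan. So the plan that gets used depends on the value of the parameter.

When you use a value that is not hardcoded, for example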

select * from some_table where id_person=? 

the planner cannot determine the best plan for the actual value, and the query may run slower.
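The effect described above can be tried out with a small experiment. This is a sketch under stated assumptions: the table definition, the data distribution, the index name some_table_id_person_idx and the prepared statement name by_person are all hypothetical, chosen only to mirror the example in this answer.

```sql
-- Hypothetical table: about 90% of the rows share id_person = 231.
CREATE TABLE some_table (id serial PRIMARY KEY, id_person integer NOT NULL);

INSERT INTO some_table (id_person)
SELECT CASE WHEN g % 10 = 0 THEN g ELSE 231 END
FROM generate_series(1, 100000) AS g;

CREATE INDEX some_table_id_person_idx ON some_table (id_person);
ANALYZE some_table;  -- collect statistics so the planner knows the distribution

-- Literal values: the planner can look up how common each value is.
EXPLAIN SELECT * FROM some_table WHERE id_person = 231;
EXPLAIN SELECT * FROM some_table WHERE id_person = 10;

-- Parameterized statement: the value is only known at execution time.
PREPARE by_person(integer) AS
    SELECT * FROM some_table WHERE id_person = $1;
EXPLAIN EXECUTE by_person(231);
EXPLAIN EXECUTE by_person(10);
```

Comparing the four plans shows whether the planner picks a different scan type for the common and the rare value. For prepared statements, PostgreSQL may plan the first executions with the actual parameter values and switch to a generic plan later, so the outcome can change after repeated EXECUTE calls.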

As you can see from the EXPLAIN ANALYZE output, both cases use exactly the same query plan – Tarlen

@Tarlen: this is about the execution plan of the statement _inside_ the function, not of the statement that uses the function –

OK, I found the problem and updated the post – Tarlen
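As the comment about the statement inside the function points out, EXPLAIN ANALYZE on the outer query only reports a Function Scan and hides the plan of the dynamic query. One way to see that inner plan is the auto_explain module that ships with PostgreSQL. The following is a sketch of a session setup, assuming the module is installed and the role is allowed to load it; it was not part of the original thread.

```sql
-- Sketch: log the plans of statements executed inside functions.
-- LOAD normally requires superuser privileges.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log every statement
SET auto_explain.log_analyze = on;            -- include actual row counts and timings
SET auto_explain.log_nested_statements = on;  -- include statements run inside functions
SET client_min_messages = log;                -- also show the log output in this session

SELECT id
FROM find_matching_visitors('my_app_id', '{}', '{(field = ''app_name'' and string_value ILIKE ''My awesome app'' )}');
```

The plan of the query built by the function is then written to the server log, which makes it possible to compare the hardcoded and the parameterized variant directly.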
