2017-06-22

Postgresql LARGE query optimization

I'm having some trouble with PostgreSQL. The query below takes a long time to execute (about 30 seconds uncached):

SELECT
    d.name,
    COUNT(*) AS cnt,
    'first' AS type
FROM
    tableA a
INNER JOIN tableD d ON d.name = 'FOO'
    AND a.key = d.key
WHERE
    a.date > '2017-06-01'
    AND a.date < '2017-07-01'
GROUP BY d.name
UNION ALL
SELECT
    d.name,
    COUNT(*) AS cnt,
    'second' AS type
FROM
    tableB b
INNER JOIN tableD d ON d.name = 'FOO'
    AND b.key = d.key
WHERE
    b.date > '2017-06-01'
    AND b.date < '2017-07-01'
GROUP BY d.name
UNION ALL
SELECT
    d.name,
    COUNT(*) AS cnt,
    'Third' AS type
FROM
    tableC c
INNER JOIN tableD d ON d.name = 'FOO'
    AND c.key = d.key
WHERE
    c.date > '2017-06-01'
    AND c.date < '2017-07-01'
GROUP BY d.name

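I created indexes on tableC.key (B-tree) and tableC.name (hash), and the other tables have B-tree indexes on date and key.

So my query can join through an index and can filter through an index.

tableD has a few thousand rows; the other tables have billions of rows, in some cases close to ten billion.

In the execution plan I see that the executor uses nested loops for all of my joins (except the B-D join, which uses a hash join).

Maybe I have found the culprit: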

"Node Type": "Bitmap Heap Scan",
"Parent Relationship": "Inner",
"Relation Name": "tableA",
"Alias": "a",
"Startup Cost": 2469.84,
"Total Cost": 137625.61,
"Plan Rows": 53748,
"Plan Width": 37,
"Recheck Cond": "(((key)::text = (d.key)::text) AND (date > '2017-06-01 00:00:00'::timestamp without time zone) AND (date < '2017-07-01 00:00:00'::timestamp without time zone))",
"Plans": [{
    "Node Type": "Bitmap Index Scan",
    "Parent Relationship": "Outer",
    "Index Name": "\"date + key\"",
    "Startup Cost": 0.00,
    "Total Cost": 2456.40,
    "Plan Rows": 53748,
    "Plan Width": 0,
    "Index Cond": "(((key)::text = (d.key)::text) AND (date > '2017-06-01 00:00:00'::timestamp without time zone) AND (date < '2017-07-01 00:00:00'::timestamp without time zone))"
}]

Table D:

CREATE TABLE "sch"."tableD" (
    "id" int4 NOT NULL,
    "key" varchar(36) COLLATE "default",
    "name" varchar(255) COLLATE "default"
);

CREATE INDEX "license_key" ON "sch"."tableD" USING btree ("key");
CREATE INDEX "name" ON "sch"."tableD" USING btree ("name");

Table A:

CREATE TABLE "sch"."tableA" (
    "id" int4 DEFAULT nextval('"sch".table'::regclass) NOT NULL,
    "key" varchar(255) COLLATE "default",
    "date" timestamp(6)
);

CREATE INDEX "date" ON "sch"."tableA" USING btree ("date");
CREATE INDEX "date + key" ON "sch"."tableA" USING btree ("key", "date");
CREATE INDEX "keyIndex" ON "sch"."tableA" USING btree ("key");

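Tables B and C are similar to table A.

I don't understand where the time is being lost. Can you help me solve this? This query should not take 30 seconds to run. Thanks.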

Start by measuring how long each subquery takes. Then you can narrow down the performance problem.
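One way to take that measurement is to run each branch of the UNION ALL on its own under EXPLAIN. The sketch below does this for the tableA branch only, copying the table and column spellings from the query in the question; whether those unquoted names resolve against the quoted "sch"."tableA" definition depends on the real schema and search_path, so treat them as assumptions.

```sql
-- Time the first branch in isolation; repeat with tableB and tableC.
-- ANALYZE really executes the statement and reports actual times and row counts.
-- BUFFERS adds shared-buffer hits versus blocks read, which helps separate
-- cached runs from uncached ones.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    d.name,
    COUNT(*) AS cnt
FROM
    tableA a
INNER JOIN tableD d ON d.name = 'FOO'
    AND a.key = d.key
WHERE
    a.date > '2017-06-01'
    AND a.date < '2017-07-01'
GROUP BY d.name;
```

Comparing the estimated row counts with the actual ones on each node is usually the quickest way to see where the planner's guess goes wrong.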

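Not sure, but it seems to me we could eliminate the unions and use a window function to get the counts in a single query, with a CASE expression to set the type, and outer joins. – xQbert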

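The first subquery takes the longest, but most of the rows are in tableA, so I can imagine that this is what slows the query down. If I eliminate the unions, the executor can choose a hash join (or a merge join, if I use a hash index on the keys), but it is even slower (100-120 seconds).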

Answers

Provide these B-tree (rather than hash) indexes:

b: (DATE, key) 
b: (key, DATE) 
d: (NAME, key) 
d: (key, NAME) 
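Written out as DDL, the list above might look like the following sketch. The index names are made up for illustration, and tableB is assumed to have the same "key" and "date" columns as tableA, since its definition is not shown. Note that tableA in the question already has a ("key", "date") index under the name "date + key".

```sql
-- Hypothetical index names; schema and column names follow the DDL in the question.
-- tableB's columns are assumed to match tableA's.
CREATE INDEX "tableB_date_key" ON "sch"."tableB" USING btree ("date", "key");
CREATE INDEX "tableB_key_date" ON "sch"."tableB" USING btree ("key", "date");
CREATE INDEX "tableD_name_key" ON "sch"."tableD" USING btree ("name", "key");
CREATE INDEX "tableD_key_name" ON "sch"."tableD" USING btree ("key", "name");
```

Having both column orders available lets the planner pick whichever side of the join it prefers to drive from; once the plan is stable, the unused ones can be dropped.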

It looks like you want a one-month time span, but you are excluding the very start of the month. Change > to >=.
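Applied to the first branch of the query, the change would look like this sketch (the other two branches would be adjusted the same way). With a timestamp column, the half-open range below covers all of June, including rows stamped exactly 2017-06-01 00:00:00.

```sql
-- Half-open range: start of June inclusive, start of July exclusive.
SELECT
    d.name,
    COUNT(*) AS cnt,
    'first' AS type
FROM
    tableA a
INNER JOIN tableD d ON d.name = 'FOO'
    AND a.key = d.key
WHERE
    a.date >= '2017-06-01'
    AND a.date < '2017-07-01'
GROUP BY d.name;
```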