此问题与this之一有关。我有包含设备功率值的表,我需要计算给定时间范围内的功耗并返回10个最耗电设备。我已经生成了192个设备和7742208个测量记录(每个40324)。这大概是设备在一个月内将产生多少记录。如何使用窗口函数优化SQL查询
对于这个数据量,我当前的查询需要40多秒来执行,这是因为时间跨度和设备和测量数量可能会高得多。我是否应该尝试用不同的方法解决这个问题,而不是使用滞后()OVER PARTITION以及可以进行哪些其他优化?我真的很感激代码示例的建议。
的PostgreSQL版本9.4
查询与示例值:
SELECT
t.device_id,
sum(len_y*(extract(epoch from len_x))) AS total_consumption
FROM (
SELECT
m.id,
m.device_id,
m.power_total,
m.created_at,
m.power_total+lag(m.power_total) OVER (
PARTITION BY device_id
ORDER BY m.created_at
) AS len_y,
m.created_at-lag(m.created_at) OVER (
PARTITION BY device_id
ORDER BY m.created_at
) AS len_x
FROM
measurements AS m
WHERE m.created_at BETWEEN '2015-07-30 13:05:24.403552+00'::timestamp
AND '2015-08-27 12:34:59.826837+00'::timestamp
) AS t
GROUP BY t.device_id
ORDER BY total_consumption
DESC LIMIT 10;
表信息:
Column | Type | Modifiers
--------------+--------------------------+----------------------------------------------------------
id | integer | not null default nextval('measurements_id_seq'::regclass)
created_at | timestamp with time zone | default timezone('utc'::text, now())
power_total | real |
device_id | integer | not null
Indexes:
"measurements_pkey" PRIMARY KEY, btree (id)
"measurements_device_id_idx" btree (device_id)
"measurements_created_at_idx" btree (created_at)
Foreign-key constraints:
"measurements_device_id_fkey" FOREIGN KEY (device_id) REFERENCES devices(id)
查询计划:
Limit (cost=1317403.25..1317403.27 rows=10 width=24) (actual time=41077.091..41077.094 rows=10 loops=1)
-> Sort (cost=1317403.25..1317403.73 rows=192 width=24) (actual time=41077.089..41077.092 rows=10 loops=1)
Sort Key: (sum((((m.power_total + lag(m.power_total) OVER (?))) * date_part('epoch'::text, ((m.created_at - lag(m.created_at) OVER (?)))))))
Sort Method: top-N heapsort Memory: 25kB
-> GroupAggregate (cost=1041700.67..1317399.10 rows=192 width=24) (actual time=25361.013..41076.562 rows=192 loops=1)
Group Key: m.device_id
-> WindowAgg (cost=1041700.67..1201314.44 rows=5804137 width=20) (actual time=25291.797..37839.727 rows=7742208 loops=1)
-> Sort (cost=1041700.67..1056211.02 rows=5804137 width=20) (actual time=25291.746..30699.993 rows=7742208 loops=1)
Sort Key: m.device_id, m.created_at
Sort Method: external merge Disk: 257344kB
-> Seq Scan on measurements m (cost=0.00..151582.05 rows=5804137 width=20) (actual time=0.333..5112.851 rows=7742208 loops=1)
Filter: ((created_at >= '2015-07-30 13:05:24.403552'::timestamp without time zone) AND (created_at <= '2015-08-27 12:34:59.826837'::timestamp without time zone))
Planning time: 0.351 ms
Execution time: 41114.883 ms
查询以生成测试表和数据:
CREATE TABLE measurements (
id serial primary key,
device_id integer,
power_total real,
created_at timestamp
);
INSERT INTO measurements(
device_id,
created_at,
power_total
)
SELECT
device_id,
now() + (i * interval '1 minute'),
random()*(50-1)+1
FROM (
SELECT
DISTINCT(device_id),
generate_series(0,10) AS i
FROM (
SELECT
generate_series(1,5) AS device_id
) AS dev_ids
) AS gen_table;
(device_id,created_at)上的组合索引如何?顺便说一句,恕我直言,你应该在使用前将'm.power_total + lag(m.power_total)'除以2。 (或者只取平均值) – joop
+1我长期见过的最佳问题。写得很好,适当的样本。我在一秒内创建了示例数据库。现在我应该在'series'中输入什么值来生成一个类似于您当前大小的分贝? –
您的where条件不会删除任何行。这是打算吗?排序也在磁盘上完成:'外部合并磁盘:257344kB',这需要相当长的时间(你的执行计划失去了缩进,所以它有点难以阅读)。如果您增加会话的'work_mem',直到在内存中完成排序,您应该会看到更好的性能。 –