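I have a subscription table and a payments table that I need to join. I am trying to decide between two options, and performance is a key consideration. Should I put the row-number filter in the join condition or in a preceding CTE?
Which of the two options below will perform better?
I am using Impala, and the tables are large (millions of rows). I only need one row per id and date grouping (hence the row_number() analytic function).
I have shortened my query to illustrate the question: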
OPTION 1:
WITH cte AS (
    SELECT *
         , SUM(amount) OVER (PARTITION BY id, date) AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
),
payment AS (
    SELECT *
    FROM cte
    WHERE sameday_rownum = 1
)
SELECT s.*
     , p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
OPTION 2:
WITH payment AS (
    SELECT *
         , SUM(payment_amount) OVER (PARTITION BY id, date) AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
)
SELECT s.*
     , p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
    AND p.sameday_rownum = 1
Just put the condition in the ON clause. There is no need to clutter the query with two CTEs.
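Thanks. So, given that it is an inner join, there is no performance impact? I am wondering whether this is similar to comparing a filter in the join condition with a filter in the WHERE clause of the final SQL statement. – cdabel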
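For reference, the comparison raised in this comment can be sketched as a third variant, with the filter moved to the WHERE clause of the final statement. For an INNER JOIN this returns the same rows as filtering in the ON clause (the two forms only differ for outer joins). Whether Impala produces the same plan for both is not established here and should be checked rather than assumed. The aliases s and p are added for illustration, and payment_amount is taken from Option 2.

```sql
-- Sketch only: same CTE as Option 2, filter applied in the final WHERE clause.
WITH payment AS (
    SELECT *
         , SUM(payment_amount) OVER (PARTITION BY id, date) AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
)
SELECT s.*
     , p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
WHERE p.sameday_rownum = 1;  -- equivalent to "AND p.sameday_rownum = 1" in ON for an inner join
```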
You should be able to tell whether the optimizer applies the filter at the start or at the end by looking at the query plan. – Connor
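A minimal sketch of that check in Impala, assuming the Option 2 query with aliases s and p added: prefix the statement with EXPLAIN and compare the output against the same step for Option 1. The EXPLAIN_LEVEL setting is optional and only controls how much detail is printed.

```sql
-- Optional: request a more detailed plan (Impala accepts levels 0 to 3).
SET EXPLAIN_LEVEL=2;

-- EXPLAIN prints the plan without running the query.
EXPLAIN
WITH payment AS (
    SELECT *
         , SUM(payment_amount) OVER (PARTITION BY id, date) AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
)
SELECT s.*
     , p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
    AND p.sameday_rownum = 1;
```

In the output, look for where the predicate on the row number appears relative to the ANALYTIC and HASH JOIN nodes. If it sits at the same place in both plans, the two options should cost about the same. Running each query and comparing the PROFILE output in impala-shell would confirm this with actual timings.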