OLAP function processing: why is running M times on N/M records faster than running once on N records?
I have a (very large) table like this:
CREATE SET TABLE LOAN
(LoanNumber VARCHAR(100),
LoanBalance DECIMAL(18,4),
RecTimeStamp TIMESTAMP(0)
)
PRIMARY INDEX (LoanNumber)
PARTITION BY RANGE_N
(RecTimeStamp BETWEEN
TIMESTAMP '2017-01-01 00:00:00+00:00'
AND TIMESTAMP '2017-12-31 23:59:59+00:00'
EACH INTERVAL '1' DAY
);
Usually this table gets rolled up into snapshots. For example, the April month-end snapshot would be:
-- Pretend there is just 2017 data there
CREATE SET TABLE LOAN_APRIL AS
(SELECT *
FROM LOAN
WHERE RecTimeStamp <= DATE '2017-04-30'
QUALIFY row_number() OVER
(PARTITION BY LoanNumber
ORDER BY RecTimeStamp DESC
) = 1
) WITH DATA
PRIMARY INDEX (LoanNumber);
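This usually takes a fairly long time to run. While experimenting yesterday, though, I found that I got very good execution times when I broke it apart like this: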
CREATE SET TABLE LOAN_APRIL_TMP
(LoanNumber VARCHAR(100),
LoanBalance DECIMAL(18,4),
RecTimeStamp TIMESTAMP(0)
)
PRIMARY INDEX (LoanNumber);
CREATE SET TABLE LOAN_APRIL
(LoanNumber VARCHAR(100),
LoanBalance DECIMAL(18,4),
RecTimeStamp TIMESTAMP(0)
)
PRIMARY INDEX (LoanNumber);
INSERT INTO LOAN_APRIL_TMP
SELECT *
FROM LOAN
WHERE RecTimeStamp BETWEEN DATE '2017-01-01' AND DATE '2017-01-31'
QUALIFY row_number() OVER
(PARTITION BY LoanNumber
ORDER BY RecTimeStamp DESC
) = 1;
INSERT INTO LOAN_APRIL_TMP
SELECT *
FROM LOAN
WHERE RecTimeStamp BETWEEN DATE '2017-02-01' AND DATE '2017-02-28'
QUALIFY row_number() OVER
(PARTITION BY LoanNumber
ORDER BY RecTimeStamp DESC
) = 1;
INSERT INTO LOAN_APRIL_TMP
SELECT *
FROM LOAN
WHERE RecTimeStamp BETWEEN DATE '2017-03-01' AND DATE '2017-03-31'
QUALIFY row_number() OVER
(PARTITION BY LoanNumber
ORDER BY RecTimeStamp DESC
) = 1;
INSERT INTO LOAN_APRIL_TMP
SELECT *
FROM LOAN
WHERE RecTimeStamp BETWEEN DATE '2017-04-01' AND DATE '2017-04-30'
QUALIFY row_number() OVER
(PARTITION BY LoanNumber
ORDER BY RecTimeStamp DESC
) = 1;
INSERT INTO LOAN_APRIL
SELECT *
FROM LOAN_APRIL_TMP
QUALIFY row_number() OVER
(PARTITION BY LoanNumber
ORDER BY RecTimeStamp DESC
) = 1;
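I simply ran these statements sequentially, so they were not executing in parallel. Today I am going to experiment to see how to get each piece to load in parallel.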
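A single-statement variant of the four monthly inserts might also be worth testing. The sketch below is untested and rests on two assumptions: that the table holds only 2017 data (so the month number alone identifies a slice), and that a half-open TIMESTAMP range is an acceptable way to express "January through April". It adds the month to the PARTITION BY so that one pass keeps the latest row per loan per month. The final de-duplication from LOAN_APRIL_TMP into LOAN_APRIL stays as it is above. Whether this performs like the split version or like the original one-pass query is exactly what would need measuring.

```sql
-- Sketch only: one INSERT that keeps the latest row per loan per month.
-- Assumes 2017-only data, so EXTRACT(MONTH ...) is enough to tell slices apart.
INSERT INTO LOAN_APRIL_TMP
SELECT *
FROM LOAN
WHERE RecTimeStamp >= TIMESTAMP '2017-01-01 00:00:00'
  AND RecTimeStamp <  TIMESTAMP '2017-05-01 00:00:00'
QUALIFY ROW_NUMBER() OVER
        (PARTITION BY LoanNumber, EXTRACT(MONTH FROM RecTimeStamp)
         ORDER BY RecTimeStamp DESC
        ) = 1;
```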
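Also, on a broader note, I have not been able to find enough technical documentation to answer these kinds of questions. Is there a good resource for this? I know there are plenty of proprietary concerns, but there must be something that describes, at least at a high level, how these functions are implemented.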
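In the absence of documentation, one way to get at least a high-level view is to prefix each variant with Teradata's EXPLAIN request modifier and compare the plans. The sketch below does that for the one-pass query and for a single monthly slice. I would look at how many partitions each plan says it reads and at the estimated spool sizes, but I cannot say in advance what the output will show on a given system.

```sql
-- Plan for the single-pass snapshot query over LOAN.
EXPLAIN
SELECT *
FROM LOAN
WHERE RecTimeStamp <= DATE '2017-04-30'
QUALIFY ROW_NUMBER() OVER
        (PARTITION BY LoanNumber
         ORDER BY RecTimeStamp DESC
        ) = 1;

-- Plan for one monthly slice, for comparison.
EXPLAIN
SELECT *
FROM LOAN
WHERE RecTimeStamp BETWEEN DATE '2017-04-01' AND DATE '2017-04-30'
QUALIFY ROW_NUMBER() OVER
        (PARTITION BY LoanNumber
         ORDER BY RecTimeStamp DESC
        ) = 1;
```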
@YellowBedwetter: Could you please add some information later on whether this actually improved performance? – dnoeth
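To compare the two approaches with numbers rather than wall-clock impressions, a query against the query log could look roughly like the sketch below. It assumes that DBQL logging is enabled for the user, that the DBC.QryLogV view is readable, and that the statements can be picked out by the table name in the query text. The one-day window is arbitrary.

```sql
-- Sketch only: CPU, I/O and spool for recent statements that mention LOAN_APRIL.
-- Assumes DBQL logging is on and the current user may read DBC.QryLogV.
SELECT QueryID,
       StartTime,
       FirstRespTime,
       AMPCPUTime,
       TotalIOCount,
       SpoolUsage,
       QueryText
FROM DBC.QryLogV
WHERE UserName = USER
  AND CAST(StartTime AS DATE) >= CURRENT_DATE - 1
  AND QueryText LIKE '%LOAN_APRIL%'
ORDER BY StartTime;
```

For the split version, the figures for the four monthly inserts plus the final de-duplication would have to be summed before comparing them with the one-pass statement.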