I have a Teradata view that receives about 1 billion records per day. I need to process one year of data, so roughly 365 billion records in total. The data is partitioned by date, with one partition per day. In effect I need to move billions of records from one table to another in Teradata.
I need an INSERT ... SELECT with three ID columns (the data will be grouped by these) and two measure columns (which need the SUM aggregate function).
The query looks like the following:
Insert into table1
Select
col1, col2, col3, SUM(col4), SUM(col5)
FROM
table2
WHERE coldate between 'date1' and 'date2'
GROUP BY
col1, col2, col3;
The problem is that even when I run it for a single day, the query keeps executing (it does not finish within 20 minutes), and I need to run it for a whole year.
What should I do - should I use MultiLoad, INSERT ... SELECT, or something else?
Please advise as soon as possible. Thanks.
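One common approach for a load of this size is to break the single yearly INSERT ... SELECT into smaller date-bounded batches, one per day or month, so each statement touches only a few partitions and spool stays manageable. A minimal sketch, reusing the placeholder names from the question (table1, table2, col1..col5, coldate):

```sql
-- Hedged sketch: one INSERT ... SELECT per month instead of one statement
-- for the whole year. Each run touches only that month's partitions.
INSERT INTO table1
SELECT col1, col2, col3, SUM(col4), SUM(col5)
FROM table2
WHERE coldate BETWEEN DATE '2014-01-01' AND DATE '2014-01-31'
GROUP BY col1, col2, col3;

-- Repeat with the next month's bounds (2014-02-01 .. 2014-02-28, and so
-- on), either as a generated BTEQ script or a stored-procedure loop.
```

Whether daily or monthly batches work better depends on spool limits and on whether the target table's primary index matches the source, so this is a starting point rather than a definitive recipe.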
Explain SELECT
ORIGINATING_NUMBER_VAL,
SUM(ACTIVITY_DURATION_MEAS),
SUM(Upload_Data_Volume),
SUM(Download_Data_Volume)
FROM
dp_tab_view.NETWORK_ACTIVITY_DATA_RES
WHERE
CAST(Activity_Start_Dttm as DATE) between '2014-12-01' AND '2014-12-31'
GROUP BY
ORIGINATING_NUMBER_VAL;
1) First, we lock DP_TAB.NETWORK_ACTIVITY_DATA_RES in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock
DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock
DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock
DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock
DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock
DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, and we lock
DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access.
2) Next, we do an all-AMPs RETRIEVE step from 31 partitions of
DP_TAB.NETWORK_ACTIVITY_DATA_RES in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES with a condition of (
"(DP_TAB.NETWORK_ACTIVITY_DATA_RES in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-12-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_RES in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '3015-02-09 00:00:00') AND
(DP_TAB.NETWORK_ACTIVITY_DATA_RES in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm <
TIMESTAMP '2015-01-01 00:00:00'))") into Spool 1 (all_amps), which
is built locally on the AMPs. The input table will not be cached
in memory, but it is eligible for synchronized scanning. The size
of Spool 1 is estimated with low confidence to be 1 row (70 bytes).
The estimated time for this step is 37.22 seconds.
3) We do an all-AMPs RETRIEVE step from 31 partitions of
DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES with a condition of (
"(DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-12-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm <
TIMESTAMP '2015-01-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-10-13 00:00:00') AND
(DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm <
TIMESTAMP '3015-02-10 00:00:00')))") into Spool 1 (all_amps),
which is built locally on the AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The result spool file will not be cached in memory. The size of
Spool 1 is estimated with low confidence to be 22,856,337,679 rows
(1,599,943,637,530 bytes). The estimated time for this step is 1
hour and 52 minutes.
4) We do an all-AMPs RETRIEVE step from 0 partitions of
DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan
with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in
view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-12-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm<
TIMESTAMP '2015-01-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm <
TIMESTAMP '2014-04-01 00:00:00') AND
(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-01-01 00:00:00')))") into Spool 1 (all_amps),
which is built locally on the AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The size of Spool 1 is estimated with low confidence to be
22,856,337,680 rows (1,599,943,637,600 bytes). The estimated time
for this step is 0.01 seconds.
5) We do an all-AMPs RETRIEVE step from 0 partitions of
DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan
with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in
view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-12-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm<
TIMESTAMP '2015-01-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm <
TIMESTAMP '2014-07-01 00:00:00') AND
(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-04-01 00:00:00')))") into Spool 1 (all_amps),
which is built locally on the AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The size of Spool 1 is estimated with low confidence to be
22,856,337,681 rows (1,599,943,637,670 bytes). The estimated time
for this step is 0.01 seconds.
6) We do an all-AMPs RETRIEVE step from 0 partitions of
DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan
with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in
view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-12-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm<
TIMESTAMP '2014-10-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-07-01 00:00:00') AND
(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm <
TIMESTAMP '2015-01-01 00:00:00')))") into Spool 1 (all_amps),
which is built locally on the AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The size of Spool 1 is estimated with low confidence to be
22,856,337,682 rows (1,599,943,637,740 bytes). The estimated time
for this step is 0.01 seconds.
7) We do an all-AMPs RETRIEVE step from 0 partitions of
DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan
with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in
view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-12-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm<
TIMESTAMP '2015-01-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm <
TIMESTAMP '2014-10-13 00:00:00') AND
(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-10-01 00:00:00')))") into Spool 1 (all_amps),
which is built locally on the AMPs. The input table will not be
cached in memory, but it is eligible for synchronized scanning.
The size of Spool 1 is estimated with low confidence to be
22,856,337,683 rows (1,599,943,637,810 bytes). The estimated time
for this step is 0.01 seconds.
8) We do an all-AMPs RETRIEVE step from 0 partitions of
DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan
with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in
view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >=
TIMESTAMP '2014-12-01 00:00:00') AND
((DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm<
TIMESTAMP '2014-01-01 00:00:00') AND
(DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in view
dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm <
TIMESTAMP '2015-01-01 00:00:00'))") into Spool 1 (all_amps), which
is built locally on the AMPs. The input table will not be cached
in memory, but it is eligible for synchronized scanning. The size
of Spool 1 is estimated with low confidence to be 22,856,337,684
rows (1,599,943,637,880 bytes). The estimated time for this step
is 0.01 seconds.
9) We do an all-AMPs SUM step to aggregate from Spool 1 (Last Use) by
way of an all-rows scan with a condition of (
"((CAST((NETWORK_ACTIVITY_DATA_RES.ACTIVITY_START_DTTM) AS
DATE))>= DATE '2014-12-01') AND
((CAST((NETWORK_ACTIVITY_DATA_RES.ACTIVITY_START_DTTM) AS DATE))<=
DATE '2014-12-31')") , grouping by field1 (ORIGINATING_NUMBER_VAL).
Aggregate Intermediate Results are computed globally, then placed
in Spool 4. The aggregate spool file will not be cached in memory.
The size of Spool 4 is estimated with low confidence to be
17,142,253,263 rows (1,628,514,059,985 bytes). The estimated time
for this step is 6 hours and 28 minutes.
10) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of
an all-rows scan into Spool 2 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 2 is estimated with low confidence to be
17,142,253,263 rows (1,165,673,221,884 bytes). The estimated time
for this step is 21 minutes and 27 seconds.
11) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 2 are sent back to the user as the result of
statement 1. The total estimated time is 8 hours and 42 minutes.
What are the indexes and the plan? This looks like a 'range'-type query, which defeats any composite index. Grouping billions of unindexed records is both a memory-intensive and a time-consuming task. – Matt 2015-03-03 13:21:30
What is the PI of the target table? Does it match the source table? If not, there could be skew problems in the redistribution step of the query plan. A partition with 1 billion rows is fairly deep, but depending on the system configuration it is not unmanageable. Were your statistics collected following the recommended practices for PPI tables? – 2015-03-03 16:26:16
Thanks for your reply. Here is the execution plan of the SELECT query (considering 1 month); the total estimated time is over 8 hours. Please advise. https://onedrive.live.com/?cid=73d6f5250a5bffa7&id=73D6F5250A5BFFA7!256&ithint=file,txt&authkey=!ABNlAtlSDyGDaLI – 2015-03-04 06:09:57