2017-04-19

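Partitioning data in Azure SQL Data Warehouse tables

I am trying to understand partitioned tables in Azure SQL Data Warehouse, but what I am seeing does not make sense to me. I am obviously doing something wrong, but I cannot work out what it is.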

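My intention is to populate the first table (Marc.foo) with 10000 rows of data, inspect the partition metadata, and then switch a partition into a second, empty table (Marc.foo2).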

I start by creating the two partitioned tables:

IF OBJECT_ID('Marc.foo', 'U') IS NOT NULL 
    DROP TABLE Marc.foo 
GO 

IF OBJECT_ID('Marc.foo2', 'U') IS NOT NULL 
    DROP TABLE Marc.foo2 
GO 

CREATE TABLE Marc.foo 
(
    id int NOT NULL 
) 
WITH 
( 
    DISTRIBUTION = HASH (id), 
    CLUSTERED COLUMNSTORE INDEX, 
    PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000)) 
) 
GO 

CREATE TABLE Marc.foo2 
(
    id int NOT NULL 
) 
WITH 
( 
    DISTRIBUTION = HASH (id), 
    CLUSTERED COLUMNSTORE INDEX, 
    PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000)) 
) 
GO 

I then populate the first table (Marc.foo) with 10000 rows:

IF OBJECT_ID('tempdb..#numbers', 'U') IS NOT NULL 
    DROP TABLE #numbers 
GO 

WITH 
    CTE_2 AS (SELECT 1 as id UNION ALL SELECT 1 as id), 
    CTE_4 AS (SELECT a.id FROM CTE_2 a, CTE_2 b), 
    CTE_16 AS (SELECT a.id FROM CTE_4 a, CTE_4 b), 
    CTE_256 AS (SELECT a.id FROM CTE_16 a, CTE_16 b), 
    CTE_64K AS (SELECT a.id FROM CTE_256 a, CTE_256 b) 
SELECT  id 
INTO  #numbers 
FROM  CTE_64K 

INSERT INTO Marc.foo(id) 
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM #numbers WHERE id <=10000 

Because I have just loaded data into the table, I create statistics on it:

CREATE STATISTICS stats_Marc_foo_id ON Marc.foo(id) WITH FULLSCAN 

Now I inspect the partition metadata:

SELECT  sch.name AS [schema_name], 
      tbl.[name] AS [table_name], 
      ds.type_desc, 
      prt.[partition_number], 
      rng.[value] AS [current_partition_range_boundary_value], 
      prt.[rows] AS [partition_rows] 
FROM  sys.schemas        sch 
      INNER JOIN sys.tables     tbl ON sch.schema_id  = tbl.schema_id 
      INNER JOIN sys.partitions    prt ON prt.[object_id]  = tbl.[object_id] 
      INNER JOIN sys.indexes     idx ON prt.[object_id]  = idx.[object_id] AND prt.[index_id] = idx.[index_id] 
      INNER JOIN sys.data_spaces    ds ON idx.[data_space_id] = ds.[data_space_id] 
      INNER JOIN sys.partition_schemes  ps ON ds.[data_space_id] = ps.[data_space_id] 
      INNER JOIN sys.partition_functions  pf ON ps.[function_id] = pf.[function_id] 
      LEFT JOIN sys.partition_range_values rng ON pf.[function_id] = rng.[function_id] AND rng.[boundary_id] = prt.[partition_number] 
WHERE  sch.name = 'Marc' AND 
      tbl.name = 'foo' 

Question 1: This gives me what I expect in terms of current_partition_range_boundary_value, but partition_rows (which I expected to be 1000) returns 5957 rows for every partition.

Finally, I try to SWITCH partition 1 from Marc.foo to Marc.foo2:

ALTER TABLE Marc.foo SWITCH PARTITION 1 to Marc.foo2 PARTITION 1 

I expected that when I select from Marc.foo2 I would see 1000 rows with id values from 1 to 1000, but I get zero rows back.

Question 2: What am I doing wrong?

Answers

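There is a bug in the code. Your CTE returns the number 1 for every row, which you can confirm by inspecting the contents of the #numbers table:

1 1 1 1 1

So your id <= 10000 criterion has no effect and the statement always returns 65,536 rows. Fix this by moving ROW_NUMBER into the SELECT ... INTO, for example: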

WITH 
    CTE_2 AS (SELECT 1 as id UNION ALL SELECT 1 as id), 
    CTE_4 AS (SELECT a.id FROM CTE_2 a, CTE_2 b), 
    CTE_16 AS (SELECT a.id FROM CTE_4 a, CTE_4 b), 
    CTE_256 AS (SELECT a.id FROM CTE_16 a, CTE_16 b), 
    CTE_64K AS (SELECT a.id FROM CTE_256 a, CTE_256 b) 
SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id 
INTO  #numbers 
FROM  CTE_64K 

I guess the moral of the story is: don't write your own number-generation routine without checking it :)
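A minimal sketch of such a check, assuming the #numbers temp table from the snippets above still exists in the session. With the original CTE, min_id and max_id both come back as 1; once ROW_NUMBER has been moved into the SELECT ... INTO, they should span 1 to 65536.

```sql
-- Sanity check on the generated numbers table.
-- row_count is 65536 either way (2 -> 4 -> 16 -> 256 -> 65536 via the cross joins);
-- only min_id / max_id reveal whether the ids are actually distinct.
SELECT  COUNT(*) AS row_count,
        MIN(id)  AS min_id,
        MAX(id)  AS max_id
FROM    #numbers;
```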


Putting the numbers table aside, here are the questions:

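Question 1: This gives me what I expect in terms of current_partition_range_boundary_value, but partition_rows (which I expected to be 1000) returns 5957 rows for every partition.

I still cannot get the answer I was expecting here.

Finally, I try to SWITCH partition 1 from Marc.foo to Marc.foo2: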

ALTER TABLE Marc.foo SWITCH PARTITION 1 to Marc.foo2 PARTITION 1 

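I expected that when I select from Marc.foo2 I would see 1000 rows with id values from 1 to 1000, but I get zero rows back.

Question 2: What am I doing wrong?

I had misunderstood RANGE RIGHT. If we look at the partition clause of the CREATE TABLE, we see: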

PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 
6000, 7000, 8000, 9000))

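This means that rows with an id up to, but not including, zero land in partition 1, and rows with an id between 0 and 999 land in partition 2.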
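The mapping can be checked against the data itself. The query below is only a sketch: the CASE expression is a hand-written stand-in for the RANGE RIGHT boundaries (0, 1000, ..., 9000) declared on Marc.foo, not something read from the catalog, and it counts rows straight from the table instead of relying on the partition metadata.

```sql
-- Hand-computed partition number for RANGE RIGHT boundaries 0, 1000, ..., 9000:
--   id < 0          -> partition 1
--   0 <= id < 9000  -> partition id / 1000 + 2 (integer division)
--   id >= 9000      -> partition 11 (the last partition has no upper bound)
SELECT  CASE
            WHEN id < 0     THEN 1
            WHEN id >= 9000 THEN 11
            ELSE id / 1000 + 2
        END      AS expected_partition_number,
        COUNT(*) AS row_count,
        MIN(id)  AS min_id,
        MAX(id)  AS max_id
FROM    Marc.foo
GROUP BY CASE
            WHEN id < 0     THEN 1
            WHEN id >= 9000 THEN 11
            ELSE id / 1000 + 2
         END
ORDER BY expected_partition_number;
```

No row should come back for partition 1 when every id is positive, which matches the empty result after switching partition 1.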

There are no rows in partition 1. This is working as designed. If I switch partition 2 instead, the rows appear in Marc.foo2.
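As a sketch of that last step, assuming Marc.foo2 is still empty and has the same partition boundaries as Marc.foo (as in the CREATE TABLE statements in the question):

```sql
-- Switch partition 2 (ids 0..999 under RANGE RIGHT) instead of the empty partition 1.
-- The target partition in Marc.foo2 must be empty for the switch to succeed.
ALTER TABLE Marc.foo SWITCH PARTITION 2 TO Marc.foo2 PARTITION 2;

-- Confirm that the rows moved.
SELECT  COUNT(*) AS row_count,
        MIN(id)  AS min_id,
        MAX(id)  AS max_id
FROM    Marc.foo2;
```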