2010-05-12 69 views
2

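Performing aggregate functions on multi-million-row tables

I'm having some serious performance problems with a multi-million-row table that I feel I should be able to get results from fairly quickly. Here's a rundown of what I have, how I'm querying it, and how long it's taking: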

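  • I'm running SQL Server 2008 Standard Edition, so partitioning isn't currently an option.

  • I'm trying to aggregate all views for all inventory belonging to a specific account over the last 30 days.

  • All views are stored in the following table: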

 
CREATE TABLE [dbo].[LogInvSearches_Daily](
    [ID] [bigint] IDENTITY(1,1) NOT NULL, 
    [Inv_ID] [int] NOT NULL, 
    [Site_ID] [int] NOT NULL, 
    [LogCount] [int] NOT NULL, 
    [LogDay] [smalldatetime] NOT NULL, 
CONSTRAINT [PK_LogInvSearches_Daily] PRIMARY KEY CLUSTERED 
(
    [ID] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY] 
) ON [PRIMARY] 
  • This table has 132,000,000 records and is over 4 GB in size.

  • A 10-row sample from the table:

 
ID                   Inv_ID      Site_ID     LogCount    LogDay
-------------------- ----------- ----------- ----------- -----------------------
1                    486752      48          14          2009-07-21 00:00:00
2                    119314      51          16          2009-07-21 00:00:00
3                    313678      48          25          2009-07-21 00:00:00
4                    298863      0           1           2009-07-21 00:00:00
5                    119996      0           2           2009-07-21 00:00:00
6                    463777      534         7           2009-07-21 00:00:00
7                    339976      503         2           2009-07-21 00:00:00
8                    333501      570         4           2009-07-21 00:00:00
9                    453955      0           12          2009-07-21 00:00:00
10                   443291      0           4           2009-07-21 00:00:00

(10 row(s) affected) 
  • I have the following index on LogInvSearches_Daily:
 
/****** Object: Index [IX_LogInvSearches_Daily_LogDay] Script Date: 05/12/2010 11:08:22 ******/ 
CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_LogDay] ON [dbo].[LogInvSearches_Daily] 
(
    [LogDay] ASC 
) 
INCLUDE ([Inv_ID], 
[LogCount]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] 
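  • I only need to pull inventory from the Inventory table for a specific account ID. I have an index on Inventory as well.

I'm using the following query to aggregate the data and give me the top 5 records. This query currently takes 24 seconds to return the 5 rows: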

 
StmtText 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
SELECT TOP 5 
    Sum(LogCount) AS Views 
    , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank 
    , Inv_ID 
FROM LogInvSearches_Daily D (NOLOCK) 
WHERE 
    LogDay > DateAdd(d, -30, getdate()) 
    AND EXISTS(
     SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID 
    ) 
GROUP BY Inv_ID 


(1 row(s) affected) 

StmtText 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
    |--Top(TOP EXPRESSION:((5))) 
     |--Sequence Project(DEFINE:([Expr1007]=dense_rank)) 
      |--Segment 
       |--Segment 
         |--Sort(ORDER BY:([Expr1006] DESC, [D].[Inv_ID] DESC)) 
          |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) 
           |--Sort(ORDER BY:([D].[Inv_ID] ASC)) 
            |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) 
              |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1011], [Expr1012], [Expr1010])) 
              | |--Compute Scalar(DEFINE:(([Expr1011],[Expr1012],[Expr1010])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) 
              | | |--Constant Scan 
              | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1011] AND [D].[LogDay] < [Expr1012]) ORDERED FORWARD) 
              |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOA 

(13 row(s) affected) 

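I tried using a CTE to pick up the rows first and then aggregate them, but that ran no faster and gave me essentially the same execution plan.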

 

(1 row(s) affected) 
StmtText 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
--SET SHOWPLAN_TEXT ON; 
WITH getSearches AS (
     SELECT 
      LogCount 
--   , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank 
      , D.Inv_ID 
     FROM LogInvSearches_Daily D (NOLOCK) 
      INNER JOIN propertyControlCenter.dbo.Inventory I (NOLOCK) ON Acct_ID = 18731 AND I.Inv_ID = D.Inv_ID 
     WHERE 
      LogDay > DateAdd(d, -30, getdate()) 
--  GROUP BY Inv_ID 
) 

SELECT Sum(LogCount) AS Views, Inv_ID 
FROM getSearches 
GROUP BY Inv_ID 


(1 row(s) affected) 

StmtText 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
    |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1004]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) 
     |--Sort(ORDER BY:([D].[Inv_ID] ASC)) 
      |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) 
       |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1007])) 
       | |--Compute Scalar(DEFINE:(([Expr1008],[Expr1009],[Expr1007])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) 
       | | |--Constant Scan 
       | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1008] AND [D].[LogDay] < [Expr1009]) ORDERED FORWARD) 
       |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID] AS [I]), SEEK:([I].[Acct_ID]=(18731) AND [I].[Inv_ID]=[LOALogs].[dbo].[LogInvSearches_Daily].[Inv_ID] as [D].[Inv_ID]) ORDERED FORWARD) 

(8 row(s) affected) 


(1 row(s) affected) 

So, given that I'm already getting good index seeks in my execution plans, what can I do to get this running faster?

UPDATE:

Here is the same query run without DENSE_RANK(). It takes exactly the same 24 seconds to run and gives me the same basic query plan:

 
StmtText 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
--SET SHOWPLAN_TEXT ON 
SELECT TOP 5 
    Sum(LogCount) AS Views 
    , Inv_ID 
FROM LogInvSearches_Daily D (NOLOCK) 
WHERE 
    LogDay > DateAdd(d, -30, getdate()) 
    AND EXISTS(
     SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID 
    ) 
GROUP BY Inv_ID 
ORDER BY Views, Inv_ID 
(1 row(s) affected) 

StmtText 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
    |--Sort(TOP 5, ORDER BY:([Expr1006] ASC, [D].[Inv_ID] ASC)) 
     |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) 
      |--Sort(ORDER BY:([D].[Inv_ID] ASC)) 
       |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) 
         |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1010], [Expr1011], [Expr1009])) 
         | |--Compute Scalar(DEFINE:(([Expr1010],[Expr1011],[Expr1009])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) 
         | | |--Constant Scan 
         | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1010] AND [D].[LogDay] < [Expr1011]) ORDERED FORWARD) 
         |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOALogs].[dbo].[LogInvS 

(9 row(s) affected) 


Thanks,

+0

Can you provide an example of the output you'd like to see? It's not clear why you need DENSE_RANK. – 2010-05-12 16:47:00

+0

I just need the top 5 by rank. I've just posted an update showing exactly the same performance with or without DENSE_RANK(). – 2010-05-12 18:18:09

Answers

1

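I haven't read through your whole question yet (I'll get to that shortly), but in answer to the early comment: you can use partitioning in SQL Server 2008 Standard Edition by using partitioned views. It's partitioned tables (which are admittedly more flexible) that are limited to Enterprise Edition.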

Partitioned view info: http://msdn.microsoft.com/en-us/library/ms190019.aspx
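To make the partitioned-view idea concrete, here is a minimal sketch of how it might look for this table. The monthly member tables, their names, the view name, and the choice of primary key are all assumptions made for illustration; nothing in the question confirms that (Inv_ID, Site_ID, LogDay) is unique.

```sql
-- Hypothetical monthly member tables. The CHECK constraint on LogDay is what
-- allows the optimizer to skip member tables that cannot match a date filter.
CREATE TABLE dbo.LogInvSearches_Daily_2010_04 (
    Inv_ID   int           NOT NULL,
    Site_ID  int           NOT NULL,
    LogCount int           NOT NULL,
    LogDay   smalldatetime NOT NULL
        CHECK (LogDay >= '20100401' AND LogDay < '20100501'),
    -- Assumed key: one row per inventory item, site, and day.
    CONSTRAINT PK_LogInvSearches_Daily_2010_04
        PRIMARY KEY CLUSTERED (Inv_ID, Site_ID, LogDay)
);

CREATE TABLE dbo.LogInvSearches_Daily_2010_05 (
    Inv_ID   int           NOT NULL,
    Site_ID  int           NOT NULL,
    LogCount int           NOT NULL,
    LogDay   smalldatetime NOT NULL
        CHECK (LogDay >= '20100501' AND LogDay < '20100601'),
    CONSTRAINT PK_LogInvSearches_Daily_2010_05
        PRIMARY KEY CLUSTERED (Inv_ID, Site_ID, LogDay)
);
GO  -- batch separator (SSMS/sqlcmd); CREATE VIEW must start its own batch

-- The partitioned view is a UNION ALL over the member tables.
CREATE VIEW dbo.LogInvSearches_Daily_All
AS
SELECT Inv_ID, Site_ID, LogCount, LogDay FROM dbo.LogInvSearches_Daily_2010_04
UNION ALL
SELECT Inv_ID, Site_ID, LogCount, LogDay FROM dbo.LogInvSearches_Daily_2010_05;
GO
```

Queries would then filter on LogDay against the view rather than the base table. Whether this helps a 30-day window depends on how many member tables the window touches, so it is worth checking the plan before committing to the design.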

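On the wider question, I wonder whether you really need DENSE_RANK in there. I suspect you may be confusing the ORDER BY inside DENSE_RANK with the ORDER BY of the query itself. As it stands, your TOP 5 will return 5 undefined records, because SQL Server doesn't guarantee any ordering of records unless an ORDER BY clause is specified (which you haven't done). If you move the ORDER BY down from DENSE_RANK to become the ORDER BY of the whole query, the records should come out the way I think you want, and it removes the need for the expensive DENSE_RANK ranking function.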

SELECT TOP 5 
    SUM([LogCount]) AS [Views], 
    [Inv_ID] 
FROM [LogInvSearches_Daily] D (NOLOCK) 
WHERE 
    [LogDay] > DateAdd(d, -30, getdate()) 
    AND EXISTS(
     SELECT * 
     FROM Inventory (NOLOCK) 
     WHERE Acct_ID = 18731 
      AND Inv_ID = D.Inv_ID 
    ) 
GROUP BY 
    Inv_ID 
ORDER BY 
    [Views] DESC, 
    [Inv_ID] 

UPDATE:

The time is probably being spent here:

|--Sort(ORDER BY:([D].[Inv_ID] ASC)) 

You could try creating a covering index like this:

CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_Perf] ON [dbo].[LogInvSearches_Daily] 
(
    [Inv_ID] ASC, 
    [LogDay] ASC 
) 
INCLUDE 
(
    [LogCount] 
) 

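Note that I've also changed the ORDER BY slightly (Inv_ID is now ASC rather than DESC). I suspect this change won't affect the results in a problematic way, but it may help performance, since the rows are returned in the same order in which they are grouped (although this may be irrelevant!).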

+0

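With DENSE_RANK() or without, the results are still just as slow. I've tried it both ways and I still can't get this to load in under 24 seconds. I've updated the question to show the query plan and timing for the same query without DENSE_RANK(). – 2010-05-12 18:14:09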

+0

I've updated my answer with an index suggestion. – 2010-05-12 18:33:13

+0

I think that index will do the trick. Now I just need to figure out how to create the index without taking down the whole server... – 2010-05-13 15:47:29

1

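Partitioning aside,

Based on our experience with tables larger than yours, we extract the data into a temp table (not a table variable) and aggregate on that. We don't do this for every query, only for the more complex ones.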
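As a rough illustration of the temp-table approach applied to the query in the question, something like the following could be tried. The temp table name #AcctSearches is made up for this sketch, and the final ORDER BY follows the other answer's reading of "top 5"; the rest reuses the names from the question.

```sql
-- Step 1: extract only the rows for this account and date window
-- into a temp table (deliberately not a table variable).
SELECT D.Inv_ID, D.LogCount
INTO #AcctSearches                      -- hypothetical name
FROM dbo.LogInvSearches_Daily AS D
WHERE D.LogDay > DATEADD(day, -30, GETDATE())
  AND EXISTS (
        SELECT 1
        FROM propertyControlCenter.dbo.Inventory AS I
        WHERE I.Acct_ID = 18731
          AND I.Inv_ID = D.Inv_ID
      );

-- Step 2: aggregate over the much smaller extracted set.
SELECT TOP 5
    SUM(LogCount) AS Views,
    Inv_ID
FROM #AcctSearches
GROUP BY Inv_ID
ORDER BY Views DESC, Inv_ID;

-- Clean up explicitly rather than waiting for the session to end.
DROP TABLE #AcctSearches;
```

This only pays off if the extraction step itself is cheap, so it is best combined with an index that supports the filter in step 1.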

Other than that, I agree with Daniel Renshaw's observations about DENSE_RANK.

I'd also consider moving [Inv_ID] and [LogCount] into the index key itself (rather than the INCLUDE list, and perhaps with a descending sort).
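One possible reading of that suggestion is sketched below. The index name, the column order, and which column gets the descending sort are assumptions, since the answer doesn't spell them out.

```sql
-- Variant of IX_LogInvSearches_Daily_LogDay with Inv_ID and LogCount promoted
-- from INCLUDE columns to key columns. Name, column order, and the DESC on
-- LogCount are guesses at what the suggestion intends.
CREATE NONCLUSTERED INDEX IX_LogInvSearches_Daily_LogDay_Key
ON dbo.LogInvSearches_Daily
(
    LogDay   ASC,
    Inv_ID   ASC,
    LogCount DESC
);
```

Because the date filter is a range, rows coming out of this index are still not globally ordered by Inv_ID, so the sort in the plan may well remain. Comparing the plans before and after is the only reliable check.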

+0

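Well, this is the aggregate table... we have one table that is populated millisecond by millisecond, and all of those requests are then rolled up into days. That rolled-up table is what I'm now trying to query. I can't break it down any further, because these will be dynamic queries run by users for their own accounts as needed. – 2010-05-12 18:21:53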

0

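Acct_ID lives on the Inventory table, which appears to have its own index (IX_Inventory_Acct_ID). You might have more luck with an index on Inventory(Acct_Id, Inv_Id) and with LogInvSearches_Daily clustered (or at least indexed) on (Inv_Id, LogDay).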
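A sketch of what those two changes might look like follows. The new index names are invented for illustration, and the database names are taken from the execution plans in the question. The plans already show IX_Inventory_Acct_ID seeking on both Acct_ID and Inv_ID, so the first index may turn out to be redundant.

```sql
-- Composite index on Inventory so the account filter and the join column
-- are both served by one seek. Index name is hypothetical.
CREATE NONCLUSTERED INDEX IX_Inventory_Acct_ID_Inv_ID
ON propertyControlCenter.dbo.Inventory (Acct_ID, Inv_ID);

-- Re-cluster the log table on (Inv_ID, LogDay). The existing clustered
-- primary key has to be dropped first and re-added as nonclustered.
-- On a 132-million-row table this rewrites the whole table, so it needs
-- a maintenance window.
ALTER TABLE LOALogs.dbo.LogInvSearches_Daily
    DROP CONSTRAINT PK_LogInvSearches_Daily;

CREATE CLUSTERED INDEX CIX_LogInvSearches_Daily_Inv_ID_LogDay  -- hypothetical name
ON LOALogs.dbo.LogInvSearches_Daily (Inv_ID, LogDay);

ALTER TABLE LOALogs.dbo.LogInvSearches_Daily
    ADD CONSTRAINT PK_LogInvSearches_Daily PRIMARY KEY NONCLUSTERED (ID);
```

If re-clustering is too disruptive, the nonclustered covering index on (Inv_ID, LogDay) with LogCount included, as suggested in the other answer, is the "at least indexed" alternative.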

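By the way, I don't see what the current clustered index on LogInvSearches_Daily.ID is supposed to buy you. Why would you want records with nearby IDs stored close together on disk just because they were imported together?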