2017-05-30

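Rolling count with T-SQL

Let me preface this by saying I'm not entirely sure how to phrase this question in the first place, which has been a big obstacle to finding an answer. So I may be using completely the wrong terminology.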

I want to count the number of distinct users over a period of time, using a sliding window.

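My data table contains the following columns: Id, User, RequestedOn, Query, where requests are captured by the system over time. For example, over the course of eight hours, the system was queried 370 times by 78 distinct users. I can do it with brute force and ignorance (BF&I), but like many BF&I approaches, it doesn't scale worth beans.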

In these examples, the counting window is 8 hours: each row gives the number of distinct users within a given 8-hour time slot.

Select '5/28/17 15:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 15:00' And [RequestedOn] <= '5/28/17 23:00' Union 
Select '5/28/17 14:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 14:00' And [RequestedOn] <= '5/28/17 22:00' Union 
Select '5/28/17 13:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 13:00' And [RequestedOn] <= '5/28/17 21:00' Union 
Select '5/28/17 12:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 12:00' And [RequestedOn] <= '5/28/17 20:00' Union 
Select '5/28/17 11:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 11:00' And [RequestedOn] <= '5/28/17 19:00' Union 
Select '5/28/17 10:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 10:00' And [RequestedOn] <= '5/28/17 18:00' Union 
Select '5/28/17 09:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 09:00' And [RequestedOn] <= '5/28/17 17:00' Union 
Select '5/28/17 08:00' [StartingFrom], Count(Distinct [UserName]) [Users] From [vwRequests] Where [RequestedOn] >= '5/28/17 08:00' And [RequestedOn] <= '5/28/17 16:00' 

I feel like there's a better way to do this, but I don't know where to start looking.

Pointers would be awesome!

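Can you provide some sample data and expected output:

If you are looking to rewrite the query, you could start by doing something like this? – tarheel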

Answers

If I understand correctly, you need a recursive CTE like this:

DECLARE @StartTime datetime = '2017-05-28 00:00:00' 
DECLARE @EndTime datetime = '2017-05-29 00:00:00' 

;WITH cte AS 
(
    SELECT @StartTime AS StartPeriod, dateadd(hour,8,@StartTime) AS EndPeriod 
    UNION ALL 
    SELECT dateadd(hour,1,StartPeriod), dateadd(hour,1,EndPeriod) AS EndPeriod 
    FROM cte 
    WHERE cte.StartPeriod < @EndTime 
) 
-- cte returns 
--StartPeriod    EndPeriod 
--2017-05-28 00:00:00.000 2017-05-28 08:00:00.000 
--2017-05-28 01:00:00.000 2017-05-28 09:00:00.000 
--2017-05-28 02:00:00.000 2017-05-28 10:00:00.000 
--2017-05-28 03:00:00.000 2017-05-28 11:00:00.000 
--2017-05-28 04:00:00.000 2017-05-28 12:00:00.000 
--2017-05-28 05:00:00.000 2017-05-28 13:00:00.000 
--................. 
SELECT c.StartPeriod, c.EndPeriod, ca.Users 
FROM cte c 
OUTER APPLY (
    SELECT COUNT(DISTINCT [UserName]) AS Users -- I think you should use COUNT(DISTINCT UserId) instead of UserName 
    FROM [vwRequests] 
    WHERE [RequestedOn] BETWEEN c.StartPeriod AND c.EndPeriod -- BETWEEN includes both endpoints 
) ca 
OPTION (MAXRECURSION 0) 
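
To try the recursive CTE approach without access to vwRequests, here is a minimal, self-contained sketch. The table variable @Requests and its five rows are made-up sample data, not taken from the question. It also swaps BETWEEN for a half-open window (>= start, < start + 8 hours), which is an assumption about the desired boundary behaviour: the question's own queries include both endpoints, so a request made exactly on the hour is counted in two adjacent 8-hour slots there.

```sql
-- Hypothetical sample data standing in for vwRequests.
DECLARE @Requests TABLE (
    Id          int IDENTITY(1,1) PRIMARY KEY,
    UserName    varchar(50) NOT NULL,
    RequestedOn datetime    NOT NULL
);

INSERT INTO @Requests (UserName, RequestedOn) VALUES
    ('alice', '2017-05-28T08:15:00'),
    ('bob',   '2017-05-28T09:30:00'),
    ('alice', '2017-05-28T12:45:00'),
    ('carol', '2017-05-28T16:00:00'),
    ('bob',   '2017-05-28T17:20:00');

DECLARE @StartTime datetime = '2017-05-28T08:00:00';  -- first window start
DECLARE @EndTime   datetime = '2017-05-28T10:00:00';  -- last window start

WITH cte AS
(
    -- Anchor row: the first window start.
    SELECT @StartTime AS StartPeriod
    UNION ALL
    -- Recursive step: slide the window start forward by one hour.
    SELECT DATEADD(hour, 1, StartPeriod)
    FROM cte
    WHERE StartPeriod < @EndTime
)
SELECT c.StartPeriod, ca.Users
FROM cte c
OUTER APPLY (
    SELECT COUNT(DISTINCT r.UserName) AS Users
    FROM @Requests r
    WHERE r.RequestedOn >= c.StartPeriod
      AND r.RequestedOn <  DATEADD(hour, 8, c.StartPeriod)  -- half-open: end excluded
) ca
ORDER BY c.StartPeriod
OPTION (MAXRECURSION 0);

-- With the sample rows above:
-- 2017-05-28 08:00  ->  2   (alice, bob; carol's 16:00 request falls outside)
-- 2017-05-28 09:00  ->  3   (bob, alice, carol)
-- 2017-05-28 10:00  ->  3   (alice, carol, bob)
```

With BETWEEN instead of the half-open test, the first window would report 3 users, because carol's request at exactly 16:00 would be included.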

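This is magical and does everything I need. It also introduced me to the concept of Common Table Expressions, which I didn't know about and now have more reading to do on. Thank you so much! (And yes, counting UserId will probably be faster in the long run.) – amber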

You're welcome @amber – TriV

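If you want to improve the performance of your existing query without changing too much, replace UNION with UNION ALL and add some indexes on the Username and RequestedOn columns.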

If vwRequests is a table (not a view), try the following and see what works best for you:

CREATE INDEX IX1 ON dbo.vwRequests (RequestedOn, Username) 
CREATE INDEX IX2 ON dbo.vwRequests (Username, RequestedOn) 

If vwRequests is a view, you can try adding indexes on the base table or changing the view into an indexed view.

SELECT x1.StartingFrom, x2.Users 
FROM (VALUES (8),(9),(10),(11),(12),(13),(14),(15)) h (h) 
CROSS APPLY (
    SELECT DATEADD(HOUR,h,'20170528') AS [StartingFrom] 
) x1 
CROSS APPLY (
    SELECT COUNT(DISTINCT vr.Username) AS Users 
    FROM dbo.vwRequests vr 
    WHERE vr.RequestedOn BETWEEN x1.StartingFrom AND DATEADD(HOUR,8,x1.StartingFrom) 
) x2 
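
The hard-coded VALUES list above fixes the set of window starts. As a rough sketch of how the same query might be driven by a date range instead, the hour offsets can be generated with ROW_NUMBER() over a system catalog view. The variables @StartTime and @EndTime and the CTE name HourOffsets are introduced here for illustration only, and sys.all_objects is used merely as a convenient row source, which assumes it holds at least as many rows as the number of hours requested.

```sql
DECLARE @StartTime datetime = '2017-05-28T08:00:00';  -- first window start
DECLARE @EndTime   datetime = '2017-05-28T16:00:00';  -- window starts stop before this time

WITH HourOffsets AS
(
    -- One row per hour between @StartTime and @EndTime: 0, 1, 2, ...
    SELECT TOP (DATEDIFF(hour, @StartTime, @EndTime))
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS h
    FROM sys.all_objects
)
SELECT x1.StartingFrom, x2.Users
FROM HourOffsets ho
CROSS APPLY (
    SELECT DATEADD(hour, ho.h, @StartTime) AS StartingFrom
) x1
CROSS APPLY (
    -- Same 8-hour, endpoint-inclusive window as the query above.
    SELECT COUNT(DISTINCT vr.Username) AS Users
    FROM dbo.vwRequests vr
    WHERE vr.RequestedOn BETWEEN x1.StartingFrom AND DATEADD(hour, 8, x1.StartingFrom)
) x2
ORDER BY x1.StartingFrom;
```

With the values shown, this produces the same eight window starts (08:00 through 15:00 on 2017-05-28) as the VALUES list; widening the range only requires changing the two variables.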

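Thank you! This isn't quite what I needed; it doesn't look like it scales easily, but it works. It does, however, give me new things to look up, learn, and explore in depth. For that I'm grateful, and therefore happy. <3 – amber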