2017-02-11 24 views
2

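Window function to count events in the last 10 minutes

I can use a traditional subquery approach to count the events in the last 10 minutes. For example: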

drop table if exists [dbo].[readings] 
go 

create table [dbo].[readings](
    [server] [int] NOT NULL, 
    [sampled] [datetime] NOT NULL 
) 
go 

insert into readings 
values 
(1,'20170101 08:00'), 
(1,'20170101 08:02'), 
(1,'20170101 08:05'), 
(1,'20170101 08:30'), 
(1,'20170101 08:31'), 
(1,'20170101 08:37'), 
(1,'20170101 08:40'), 
(1,'20170101 08:41'), 
(1,'20170101 09:07'), 
(1,'20170101 09:08'), 
(1,'20170101 09:09'), 
(1,'20170101 09:11') 
go 

-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21 
select server,sampled,(select count(*) from readings r2 where r2.server=r1.server and r2.sampled <= r1.sampled and r2.sampled > dateadd(minute,-10,r1.sampled)) as countinlast10minutes 
from readings r1 
order by server,sampled 
go 

How can I get the same result using a window function? I tried this:

select server,sampled, 
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes 
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes 
from readings r1 
order by server,sampled 

But the result is just a running count. Is there a system variable that refers to the current row, something like currentrow.sampled?

+0

Try this: SELECT COUNT(1) FROM readings r1 WHERE DATEDIFF(minute, GETDATE(), sampled) <= 10

Answers

2

This isn't a very pleasing answer, but one possibility is to first create a helper table containing all the minutes:

CREATE TABLE #DateTimes(datetime datetime primary key); 

WITH E1(N) AS 
(
    SELECT 1 FROM (VALUES(1),(1),(1),(1),(1), 
          (1),(1),(1),(1),(1)) V(N) 
)          -- 1*10^1 or 10 rows 
, E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows 
, E4(N) AS (SELECT 1 FROM E2 a, E2 b) -- 1*10^4 or 10,000 rows 
, E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows 
,R(StartRange, EndRange) 
AS (SELECT MIN(sampled), 
      MAX(sampled) 
    FROM readings) 
,N(N) 
AS (SELECT ROW_NUMBER() 
       OVER (
       ORDER BY (SELECT NULL)) AS N 
    FROM E8) 
INSERT INTO #DateTimes 
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange) 
FROM N, 
     R; 

With that table in place, you can use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW:

WITH T1 AS 
(SELECT Server, 
        MIN(sampled) AS StartRange, 
        MAX(sampled) AS EndRange 
     FROM  readings 
     GROUP BY Server) 
SELECT  Server, 
      sampled, 
      Cnt 
FROM  T1 
CROSS APPLY 
      (SELECT r.sampled, 
           COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt 
         FROM  #DateTimes N 
         LEFT JOIN readings r 
         ON  r.sampled = N.datetime 
           AND r.server = T1.server 
         WHERE  N.datetime BETWEEN StartRange AND EndRange) CA 
WHERE  CA.sampled IS NOT NULL 
ORDER BY sampled 

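The above assumes that there is at most one sample per minute and that all times fall on exact minutes. If that isn't true, it needs another table expression that pre-aggregates the readings by datetime rounded to the minute.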
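A minimal sketch of that pre-aggregation step, assuming the #DateTimes helper table above was built from minute-aligned values. The names PerMinute, sampled_minute and samples_in_minute are hypothetical, and COUNT is swapped for SUM so that several readings in the same minute are all counted. Note that this makes the window minute-granular rather than exact to the second.

```sql
WITH PerMinute AS
(   -- Truncate each reading to the minute and count readings per minute
    SELECT server,
           DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0) AS sampled_minute,
           COUNT(*) AS samples_in_minute
    FROM   readings
    GROUP  BY server, DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0)
)
, T1 AS
(   SELECT server,
           MIN(sampled_minute) AS StartRange,
           MAX(sampled_minute) AS EndRange
    FROM   PerMinute
    GROUP  BY server
)
SELECT T1.server,
       CA.sampled_minute,
       CA.Cnt
FROM   T1
CROSS APPLY
       (SELECT p.sampled_minute,
               -- SUM ignores the NULLs produced by minutes without readings
               SUM(p.samples_in_minute) OVER (ORDER BY N.datetime
                   ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
        FROM   #DateTimes N
        LEFT JOIN PerMinute p
               ON  p.sampled_minute = N.datetime
               AND p.server = T1.server
        WHERE  N.datetime BETWEEN T1.StartRange AND T1.EndRange) CA
WHERE  CA.sampled_minute IS NOT NULL
ORDER  BY T1.server, CA.sampled_minute;
```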

1

As far as I know, there is no simple, exact replacement for your subquery using window functions.

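Window functions operate on a set of rows and let you work with them based on partitions and ordering. What you are trying to do isn't the kind of partitioning that window functions can work with. Generating the partitions we would need in order to use a window function in this instance would only result in overly complicated code.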

I would suggest cross apply() as an alternative to your subquery.

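I'm not sure whether you meant to limit your results to the last 9 minutes, but that is what happens in your original subquery: sampled > dateadd(...) excludes a reading taken exactly 10 minutes earlier.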
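To make the boundary difference concrete, here is a small sketch against the sample data from the question. It looks at the 08:40 reading and flags which earlier rows each predicate keeps; the variable @current and the column aliases are only illustrative.

```sql
-- Compare the strict and inclusive lower bounds for the 08:40 reading
declare @current datetime = '20170101 08:40';

select s.sampled
    , StrictGreater  = case when s.sampled >  dateadd(minute,-10,@current) then 1 else 0 end
    , GreaterOrEqual = case when s.sampled >= dateadd(minute,-10,@current) then 1 else 0 end
from readings s
where s.server = 1
    and s.sampled <= @current
    /* look a little further back so the boundary row is visible */
    and s.sampled >= dateadd(minute,-20,@current)
order by s.sampled
```

The 08:30 row is the only one where the two flags differ, which is why the two counts for 08:40 differ by one.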

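Here is what a window function version would look like, based on splitting the samples into 10-minute slices, alongside a cross apply() version.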

select 
    r.server 
    , r.sampled 
    , CrossApply  = x.CountRecent 
    , OriginalSubquery = (
     select count(*) 
     from readings s 
     where s.server=r.server 
     and s.sampled <= r.sampled 
     /* doesn't include 10 minutes ago */ 
     and s.sampled > dateadd(minute,-10,r.sampled) 
     ) 
    , Slices   = count(*) over(
     /* partition by server, 10 minute slices, not the same thing*/ 
     partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0) 
     order by sampled 
    ) 
from readings r 
    cross apply (
    select CountRecent=count(*) 
    from readings i 
    where i.server=r.server 
     /* changed to >= */ 
     and i.sampled >= dateadd(minute,-10,r.sampled) 
     and i.sampled <= r.sampled 
    ) as x 
order by server,sampled 

Results: http://rextester.com/BMMF46402

+--------+---------------------+------------+------------------+--------+
| server | sampled             | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
|      1 | 01.01.2017 08:00:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:02:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:05:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:30:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:31:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:37:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:40:00 |          4 |                3 |      1 |
|      1 | 01.01.2017 08:41:00 |          4 |                3 |      2 |
|      1 | 01.01.2017 09:07:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 09:08:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 09:09:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 09:11:00 |          4 |                4 |      1 |
+--------+---------------------+------------+------------------+--------+
0

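Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that can be used in window aggregates. I think it would lead to much simpler and more natural SQL: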

select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)

We can already use expressions like this:

select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)

So it would be great if we could also reference columns of the current row.

+0

The standard SQL way to do what you want here is to use 'RANGE' instead of 'ROWS', but SQL Server does not fully support this. http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-window-clause.html
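For illustration, a sketch of what the RANGE form can look like in a dialect that accepts an interval offset in the frame. This assumes PostgreSQL 11 or later and a readings table whose sampled column is a timestamp; it is not runnable on SQL Server as written.

```sql
-- Assumption: PostgreSQL 11+ syntax, not T-SQL
select server,
       sampled,
       count(*) over (
           partition by server
           order by sampled
           -- frame covers every row from 10 minutes before the current row up to it
           range between interval '10 minutes' preceding and current row
       ) as countinlast10minutes
from readings
order by server, sampled;
```

The frame's lower bound is inclusive, so a reading taken exactly 10 minutes earlier is counted. That corresponds to the >= variant (the CrossApply column in the answer above) rather than the strict > of the original subquery.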
