2017-06-03

Q: How can I rank records based on changes in the value of one column?

I have data like the following (https://pastebin.com/vdTb1JRT):

EmployeeID Date  Onleave 
ABH12345 2016-01-01 0 
ABH12345 2016-01-02 0 
ABH12345 2016-01-03 0 
ABH12345 2016-01-04 0 
ABH12345 2016-01-05 0 
ABH12345 2016-01-06 0 
ABH12345 2016-01-07 0 
ABH12345 2016-01-08 0 
ABH12345 2016-01-09 0 
ABH12345 2016-01-10 1 
ABH12345 2016-01-11 1 
ABH12345 2016-01-12 1 
ABH12345 2016-01-13 1 
ABH12345 2016-01-14 0 
ABH12345 2016-01-15 0 
ABH12345 2016-01-16 0 
ABH12345 2016-01-17 0 

I would like to produce the following result:

EmployeeID DateValidFrom DateValidTo  OnLeave 
ABH12345 2016-01-01  2016-01-09  0 
ABH12345 2016-01-10  2016-01-13  1 
ABH12345 2016-01-14  2016-01-17  0 

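So I figured that it would help if I could somehow create a ranking column (like the one below) whose value increments each time the value in the Onleave column changes, partitioned by the EmployeeID column: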

EmployeeID Date  Onleave RankedCol 
ABH12345 2016-01-01 0   1 
ABH12345 2016-01-02 0   1 
ABH12345 2016-01-03 0   1 
ABH12345 2016-01-04 0   1 
ABH12345 2016-01-05 0   1 
ABH12345 2016-01-06 0   1 
ABH12345 2016-01-07 0   1 
ABH12345 2016-01-08 0   1 
ABH12345 2016-01-09 0   1 
ABH12345 2016-01-10 1   2 
ABH12345 2016-01-11 1   2 
ABH12345 2016-01-12 1   2 
ABH12345 2016-01-13 1   2 
ABH12345 2016-01-14 0   3 
ABH12345 2016-01-15 0   3 
ABH12345 2016-01-16 0   3 
ABH12345 2016-01-17 0   3 

Then I would be able to do the following:

SELECT 
[EmployeeID] = [EmployeeID] 
,[DateValidFrom] = MIN([Date]) 
,[DateValidTo] = MAX([Date]) 
,[OnLeave]  = [OnLeave] 
FROM RankedData -- hypothetical name: any table, view, CTE or subquery that includes RankedCol
GROUP BY 
[EmployeeID] 
,[OnLeave] 
,[RankedCol] 
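One possible way to build RankedCol, sketched under the assumption of SQL Server 2012 or later (for LAG and ordered window aggregates) and at most one row per employee per date. Each row gets a flag of 1 when its Onleave value differs from the previous row's value for the same employee, and a running SUM of those flags is the rank. The inline sample-data generator and the CTE names src, flagged and ranked are illustrative only.

```sql
WITH src AS (
    -- Rebuilds the 17 sample rows: 2016-01-01 .. 2016-01-17, on leave from day 10 to day 13
    SELECT EmployeeID = 'ABH12345',
           [Date]     = DATEADD(day, v.n - 1, CAST('2016-01-01' AS date)),
           Onleave    = CASE WHEN v.n BETWEEN 10 AND 13 THEN 1 ELSE 0 END
    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),
                 (10),(11),(12),(13),(14),(15),(16),(17)) AS v(n)
),
flagged AS (
    -- ChangeFlag = 1 on each employee's first row and whenever Onleave changes.
    -- LAG returns NULL on the first row; NULL = x is not true, so the ELSE branch applies.
    SELECT s.*,
           ChangeFlag = CASE WHEN LAG(s.Onleave) OVER (PARTITION BY s.EmployeeID ORDER BY s.[Date]) = s.Onleave
                             THEN 0 ELSE 1 END
    FROM src AS s
),
ranked AS (
    -- The running total of the flags numbers the runs 1, 2, 3, ... per employee
    SELECT f.EmployeeID, f.[Date], f.Onleave,
           RankedCol = SUM(f.ChangeFlag) OVER (PARTITION BY f.EmployeeID ORDER BY f.[Date]
                                               ROWS UNBOUNDED PRECEDING)
    FROM flagged AS f
)
-- The grouping query from the question, now run against the computed RankedCol
SELECT [EmployeeID]    = [EmployeeID]
      ,[DateValidFrom] = MIN([Date])
      ,[DateValidTo]   = MAX([Date])
      ,[OnLeave]       = [Onleave]
FROM ranked
GROUP BY [EmployeeID], [Onleave], [RankedCol]
ORDER BY [EmployeeID], [DateValidFrom];
```

On the sample data, RankedCol comes out as 1 for 2016-01-01 to 2016-01-09, 2 for 2016-01-10 to 2016-01-13 and 3 for 2016-01-14 to 2016-01-17, so the final SELECT returns the three rows of the desired result.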

Other solutions are very welcome.

Here is the test data:

WITH CTE AS (SELECT EmployeeID = 'ABH12345', [Date] = CAST(N'2016-01-01' AS Date), [Onleave] = 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-02' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-03' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-04' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-05' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-06' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-07' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-08' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-09' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-10' AS Date), 1 
UNION SELECT 'ABH12345', CAST(N'2016-01-11' AS Date), 1 
UNION SELECT 'ABH12345', CAST(N'2016-01-12' AS Date), 1 
UNION SELECT 'ABH12345', CAST(N'2016-01-13' AS Date), 1 
UNION SELECT 'ABH12345', CAST(N'2016-01-14' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-15' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-16' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-17' AS Date), 0 
) 

SELECT * FROM CTE 

+1 for the sample data – TheGameiswar


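Tip: it's helpful to tag database questions with both the appropriate software (MySQL, Oracle, DB2, ...) and version, e.g. 'sql-server-2014'. Differences in syntax and features often affect the answers. In this case, LAG is a relatively new feature. – HABO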


Added sql-server-2014, thanks @HABO –

Answers


Here is another, simpler way to get the desired output, which accesses the table only once.

-- sample of data from your question 
with t1(EmployeeID, Date1, Onleave) as(
    select 'ABH12345', cast('2016-01-01' as date), 0 union all 
    select 'ABH12345', cast('2016-01-02' as date), 0 union all 
    select 'ABH12345', cast('2016-01-03' as date), 0 union all 
    select 'ABH12345', cast('2016-01-04' as date), 0 union all 
    select 'ABH12345', cast('2016-01-05' as date), 0 union all 
    select 'ABH12345', cast('2016-01-06' as date), 0 union all 
    select 'ABH12345', cast('2016-01-07' as date), 0 union all 
    select 'ABH12345', cast('2016-01-08' as date), 0 union all 
    select 'ABH12345', cast('2016-01-09' as date), 0 union all 
    select 'ABH12345', cast('2016-01-10' as date), 1 union all 
    select 'ABH12345', cast('2016-01-11' as date), 1 union all 
    select 'ABH12345', cast('2016-01-12' as date), 1 union all 
    select 'ABH12345', cast('2016-01-13' as date), 1 union all 
    select 'ABH12345', cast('2016-01-14' as date), 0 union all 
    select 'ABH12345', cast('2016-01-15' as date), 0 union all 
    select 'ABH12345', cast('2016-01-16' as date), 0 union all 
    select 'ABH12345', cast('2016-01-17' as date), 0 
) 
-- actual query 
select w.employeeid 
    , min(w.date1)  as datevalidfrom 
    , max(w.date1)  as datevalidto 
    , w.onleave 
    from (
     -- the difference of the two row numbers is constant within each run of equal onleave values
     select row_number() over(partition by employeeid order by date1) - 
       row_number() over(partition by employeeid, onleave order by date1) as grp 
      , employeeid 
      , date1 
      , onleave 
      from t1 s 
     ) w 
-- grp alone can repeat across employees and across onleave values, so group by all three
group by w.employeeid, w.onleave, w.grp 
order by employeeid, datevalidfrom 

Result:

employeeid datevalidfrom datevalidto onleave 
---------- ------------- ----------- ----------- 
ABH12345 2016-01-01 2016-01-09 0 
ABH12345 2016-01-10 2016-01-13 1 
ABH12345 2016-01-14 2016-01-17 0 
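To see why the difference of the two row_number() values identifies each run, it can help to look at the intermediate columns. This sketch rebuilds the sample rows with an inline generator (the generator and the column names rn_all and rn_onleave are illustrative, not part of the answer) and lists both row numbers next to grp.

```sql
with t1 as (
    -- same 17 sample rows, generated: on leave from 2016-01-10 to 2016-01-13
    select 'ABH12345' as employeeid,
           dateadd(day, v.n - 1, cast('2016-01-01' as date)) as date1,
           case when v.n between 10 and 13 then 1 else 0 end as onleave
    from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),
                 (10),(11),(12),(13),(14),(15),(16),(17)) as v(n)
)
select employeeid, date1, onleave
     -- position of the row among all of the employee's rows
     , row_number() over(partition by employeeid order by date1) as rn_all
     -- position of the row among the employee's rows with the same onleave value
     , row_number() over(partition by employeeid, onleave order by date1) as rn_onleave
     -- inside a run both counters advance together, so their difference stays constant
     , row_number() over(partition by employeeid order by date1)
       - row_number() over(partition by employeeid, onleave order by date1) as grp
from t1
order by employeeid, date1;
```

For the sample data, grp is 0 for 2016-01-01 to 2016-01-09, 9 for 2016-01-10 to 2016-01-13 and 4 for 2016-01-14 to 2016-01-17. grp is only guaranteed to be constant within a run, not unique across runs: for an onleave sequence 0, 1, 0 the second and third rows both get grp = 1, so grp needs to be grouped together with employeeid and onleave.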

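This is an example of a gaps-and-islands problem. In this case, you can use date arithmetic. The key observation is that if you subtract a sequential number (numbered separately for each EmployeeId and OnLeave value, in date order) from a run of consecutive dates, you get the same date for every row in the run, and that date identifies the island.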

As a query, this looks like:

SELECT EmployeeId, MIN([Date]) as DateValidFrom, MAX([Date]) as DateValidTo, 
     OnLeave 
FROM (SELECT t.*, 
      ROW_NUMBER() OVER (PARTITION BY EmployeeId, OnLeave ORDER BY [Date]) as seqnum 
     FROM t 
    ) t 
GROUP BY EmployeeID, DATEADD(day, - seqnum, [Date]), OnLeave; 

You can run the subquery, look at the results, and do the arithmetic to see why this works.
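Following that suggestion, here is a sketch that runs just the subquery and also displays the grouping key DATEADD(day, -seqnum, [Date]) under the illustrative name IslandKey. The inline generator stands in for the table t. It assumes one row per employee per calendar day with no missing dates; a gap in the dates shifts the key and starts a new island even when OnLeave does not change.

```sql
WITH t AS (
    -- stand-in for the table t: 17 consecutive days, on leave from 2016-01-10 to 2016-01-13
    SELECT EmployeeId = 'ABH12345',
           [Date]     = DATEADD(day, v.n - 1, CAST('2016-01-01' AS date)),
           OnLeave    = CASE WHEN v.n BETWEEN 10 AND 13 THEN 1 ELSE 0 END
    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),
                 (10),(11),(12),(13),(14),(15),(16),(17)) AS v(n)
)
SELECT s.EmployeeId, s.[Date], s.OnLeave, s.seqnum,
       -- date minus the per-value counter: the same date for every day of one run
       IslandKey = DATEADD(day, -s.seqnum, s.[Date])
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY EmployeeId, OnLeave ORDER BY [Date]) AS seqnum
      FROM t
     ) AS s
ORDER BY s.EmployeeId, s.[Date];
```

On the sample data, IslandKey is 2015-12-31 for the first nine days, 2016-01-09 for the four days of leave and 2016-01-04 for the last four days, giving one distinct (OnLeave, IslandKey) pair per run.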

Here is an example.


Interesting... the output is still pretty much the same as what I started with. How can I summarize the result into just 3 rows? –


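There is another way to do this, using LAG. The groups are assigned by comparing each row's Onleave value with the previous value for the same employeeid and incrementing a running counter whenever the two differ.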

select employeeid,min(date) as date_from,max(date) as date_to,max(onleave) as onleave 
from (select t.*,sum(case when prev_ol=onleave then 0 else 1 end) over(partition by employeeid order by date) as grp 
     from (select c.*,lag(onleave,1,onleave) over(partition by employeeid order by date) as prev_ol 
      from cte c 
      ) t 
    ) t 
group by employeeid,grp 

Works like a charm! Using LAG to assign the groups is clever. Thanks a lot! –