2011-10-26 38 views
1

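SQL Server: find the datediff between different rows, and sum

I am trying to build a query to analyze data from our time-tracking system. Each time a user swipes in or out, a row records the swipe time and whether the user went on or off site (entering or leaving). For the user 'Joe Bloggs' there are 4 rows, which I want to pair up in order to calculate the total time Joe Bloggs spent on site.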

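The problem is that some records are not easy to pair. In the example given, the second user has two consecutive 'on' rows, and I need a way to ignore repeated 'on' or 'off' rows.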

ID  | Time                    | OnOffSite | UserName
------------------------------------------------------
123 | 2011-10-25 09:00:00.000 | on        | Bloggs Joe
124 | 2011-10-25 12:00:00.000 | off       | Bloggs Joe
125 | 2011-10-25 13:00:00.000 | on        | Bloggs Joe
126 | 2011-10-25 17:00:00.000 | off       | Bloggs Joe
127 | 2011-10-25 09:00:00.000 | on        | Jonesy Ian
128 | 2011-10-25 10:00:00.000 | on        | Jonesy Ian
129 | 2011-10-25 11:00:00.000 | off       | Jonesy Ian
130 | 2011-10-25 12:00:00.000 | on        | Jonesy Ian
131 | 2011-10-25 15:00:00.000 | off       | Jonesy Ian

My system is MS SQL 2005. The reporting period for the query is monthly.

Can anyone suggest a solution? My data is already grouped by user name and time in a single table, and the ID field is an identity column.

+2

For Jonesy Ian, which 'on' do you want to discard? –

+0

By 'each time a user swipes in', do you mean 'each time a user authenticates'? – npclaudiu

+0

I want to discard the second 'on', and yes, by swiping I mean authenticating. Thanks for the answers so far :) I will try to test them today. – MarcKirby

Answers

3
-- ===================== 
-- sample data 
-- ===================== 
declare @t table 
(
    ID int, 
    Time datetime, 
    OnOffSite varchar(3), 
    UserName varchar(50) 
) 

insert into @t values(123, '2011-10-25 09:00:00.000', 'on', 'Bloggs Joe') 
insert into @t values(124, '2011-10-25 12:00:00.000', 'off', 'Bloggs Joe') 
insert into @t values(125, '2011-10-25 13:00:00.000', 'on', 'Bloggs Joe') 
insert into @t values(126, '2011-10-25 17:00:00.000', 'off', 'Bloggs Joe') 
insert into @t values(127, '2011-10-25 09:00:00.000', 'on', 'Jonesy Ian') 
insert into @t values(128, '2011-10-25 10:00:00.000', 'on', 'Jonesy Ian') 
insert into @t values(129, '2011-10-25 11:00:00.000', 'off', 'Jonesy Ian') 
insert into @t values(130, '2011-10-25 12:00:00.000', 'on', 'Jonesy Ian') 
insert into @t values(131, '2011-10-25 15:00:00.000', 'off', 'Jonesy Ian') 

-- ===================== 
-- solution 
-- ===================== 
select 
    UserName, timeon, timeoff, diffinhours = DATEDIFF(hh, timeon, timeoff) 
from 
(
    select 
     UserName, 
     timeon = max(case when k = 2 and OnOffSite = 'on' then Time end), 
     timeoff = max(case when k = 1 and OnOffSite = 'off' then Time end) 
    from 
    (
     select 
      ID, 
      UserName, 
      OnOffSite, 
      Time, 
      rn = ROW_NUMBER() over(partition by username order by id) 
     from 
     (
      select 
       ID, 
       UserName, 
       OnOffSite, 
       Time, 
       rn2 = case OnOffSite 
       -- '(..order by id)' takes earliest 'on' in the sequence of 'on's 
       -- to take the latest use '(...order by id desc)' 
       when 'on' then 
        ROW_NUMBER() over(partition by UserName, OnOffSite, rn1 order by id) 
       -- '(... order by id desc)' takes the latest 'off' in the sequence of 'off's 
       -- to take the earliest use '(...order by id)' 
       when 'off' then 
        ROW_NUMBER() over(partition by UserName, OnOffSite, rn1 order by id desc) 
       end, 
       rn1 
      from 
      (
       select 
        *, 
        rn1 = ROW_NUMBER() over(partition by username order by id) + 
         ROW_NUMBER() over(partition by username, onoffsite order by id desc) 
       from @t 
      ) t 
     ) t 
     where rn2 = 1 
    ) t1 
    cross join 
    (
     select k = 1 union select k = 2 
    ) t2 
    group by UserName, rn + k 
) t 
where timeon is not null or timeoff is not null 
order by username 
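
The query above returns one row per on/off pair, while the question asks for a total per user over a monthly reporting period. A minimal sketch of that last step follows, assuming the pairs have already been produced in the shape (UserName, timeon, timeoff). The table variable `@pairs`, the column names `totalminutes` and `totalhours`, and the October 2011 date range are illustrative assumptions, not part of the original answer. It sums minutes rather than hours because DATEDIFF counts datepart boundaries crossed, so DATEDIFF(hh, ...) would report a swipe from 09:59 to 10:01 as one hour.

```sql
-- Hypothetical table variable holding already-paired intervals,
-- shaped like the output of the query above.
declare @pairs table
(
    UserName varchar(50),
    timeon datetime,
    timeoff datetime
)

insert into @pairs values('Bloggs Joe', '2011-10-25 09:00:00.000', '2011-10-25 12:00:00.000')
insert into @pairs values('Bloggs Joe', '2011-10-25 13:00:00.000', '2011-10-25 17:00:00.000')
insert into @pairs values('Jonesy Ian', '2011-10-25 09:00:00.000', '2011-10-25 11:00:00.000')
insert into @pairs values('Jonesy Ian', '2011-10-25 12:00:00.000', '2011-10-25 15:00:00.000')

select
    UserName,
    -- sum at minute granularity, then convert to hours once at the end
    totalminutes = sum(datediff(minute, timeon, timeoff)),
    totalhours = sum(datediff(minute, timeon, timeoff)) / 60.0
from @pairs
where timeon is not null
    and timeoff is not null
    -- assumed reporting month: half-open range covering October 2011
    and timeon >= '20111001'
    and timeon < '20111101'
group by UserName
order by UserName
```

With the sample pairs this gives 420 minutes (7 hours) for Bloggs Joe and 300 minutes (5 hours) for Jonesy Ian. Incomplete pairs are excluded rather than guessed at, and the half-open date range avoids missing swipes late on the last day of the month.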
+0

This answer is correct and works well with my data. There is a T-SQL master, and his name is Alexey! Thank you very much. – MarcKirby

+0

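+1. My solution was almost the same. I used ranking from the very beginning, and I can see that at some point you also changed the intermediate grouping to ranking. The only difference is how I obtained 'timeon' and 'timeoff': I used a self-join, which I think is worse in this case than the 'max(case ...)' used in your answer. Anyway, this one works fine, so... well done! :) –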

0

First, you need to talk to the business side and agree on a set of matching rules.

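After that, I suggest you add a status field to the table that records the state of each row (matched, unmatched, removed, etc.). Whenever a row is added, you should try to match it into a pair. A successful match sets the status of both rows to matched; otherwise the new row stays unmatched.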
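
The status-field idea can be sketched as follows. Everything here is an assumption made for illustration: the table variable `@swipes`, the `Status` values, and in particular the matching rule (a new 'off' pairs with the earliest unmatched 'on' of the same user, and any other earlier unmatched 'on' rows are marked removed). The real rules are whatever is agreed with the business side.

```sql
-- Hypothetical swipe table with an added Status column.
declare @swipes table
(
    ID int,
    Time datetime,
    OnOffSite varchar(3),
    UserName varchar(50),
    Status varchar(10) not null default 'unmatched'
)

-- rows already in the table: two consecutive 'on' swipes
insert into @swipes (ID, Time, OnOffSite, UserName) values(127, '2011-10-25 09:00:00.000', 'on', 'Jonesy Ian')
insert into @swipes (ID, Time, OnOffSite, UserName) values(128, '2011-10-25 10:00:00.000', 'on', 'Jonesy Ian')

-- a new 'off' swipe arrives
insert into @swipes (ID, Time, OnOffSite, UserName) values(129, '2011-10-25 11:00:00.000', 'off', 'Jonesy Ian')

declare @newid int, @user varchar(50), @time datetime, @partner int
set @newid = 129

select @user = UserName, @time = Time
from @swipes
where ID = @newid and OnOffSite = 'off'

-- assumed rule: pair with the earliest unmatched 'on' before the new 'off'
select top 1 @partner = ID
from @swipes
where UserName = @user
    and OnOffSite = 'on'
    and Status = 'unmatched'
    and Time < @time
order by Time

if @partner is not null
begin
    -- mark both halves of the pair
    update @swipes set Status = 'matched' where ID in (@newid, @partner)

    -- assumed rule: any other earlier unmatched 'on' is a duplicate swipe
    update @swipes set Status = 'removed'
    where UserName = @user
        and OnOffSite = 'on'
        and Status = 'unmatched'
        and Time < @time
end

select ID, Time, OnOffSite, UserName, Status from @swipes order by ID
```

For these three rows the result is 127 matched, 128 removed, and 129 matched. If no earlier unmatched 'on' exists, `@partner` stays null and the new row simply remains unmatched, which is the fallback described above.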