Identifying the boundaries of N groups

I have spent quite a long time trying to identify the boundaries of N groups, as follows:

Suppose you have N groups, each with multiple records, and every record has a unique starting and ending point.

In other words:

ID | GroupName | StartingPoint | EndingPoint | seq (row_number) | desired_seq
---+-----------+---------------+-------------+------------------+------------
1  | Grp1      | 2014-01-06    | 2014-01-07  | 1                | 1
2  | Grp1      | 2014-01-07    | 2014-01-08  | 2                | 2
3  | Grp2      | 2014-01-08    | 2014-01-09  | 1                | 1
4  | Grp1      | 2014-01-09    | 2014-01-10  | 3                | 1
5  | Grp2      | 2014-01-10    | 2014-01-11  | 2                | 1

As you can see, the starting point of each consecutive record is the same as the ending point of the previous record.

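Basically, I want to get the minimum and maximum dates of each group. As soon as a record with a different group name appears, it should be treated as a new group and the numbering should restart.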

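ROW_NUMBER() on its own is not sufficient for this task, because it does not reflect changes in the group name. (I have included a seq column in the sample data showing the values produced by ROW_NUMBER().)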
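
To make the difference between the two columns concrete, here is a small Python sketch (not from the thread) that reproduces both from the sample rows: the per-group count that ROW_NUMBER() OVER (PARTITION BY GroupName ...) yields, and the "restart on every group-name change" count the question asks for. The ROWS list and both function names are illustrative assumptions; rows are processed in StartingPoint order.

```python
# Sample rows from the question: (ID, GroupName, StartingPoint, EndingPoint).
# ISO-8601 date strings sort chronologically, so plain string ordering works here.
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def partitioned_row_number(rows):
    """Like ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY StartingPoint)."""
    counters = {}
    seq = []
    for _id, group, _start, _end in sorted(rows, key=lambda r: r[2]):
        counters[group] = counters.get(group, 0) + 1  # never resets within a group
        seq.append(counters[group])
    return seq


def desired_seq(rows):
    """Restart at 1 whenever GroupName differs from the previous row's."""
    seq = []
    prev_group = None
    for _id, group, _start, _end in sorted(rows, key=lambda r: r[2]):
        seq.append(seq[-1] + 1 if group == prev_group else 1)
        prev_group = group
    return seq


print(partitioned_row_number(ROWS))  # [1, 2, 1, 3, 2]
print(desired_seq(ROWS))             # [1, 2, 1, 1, 1]
```

Only the counting rule differs: the partitioned counter remembers each group across interruptions, which is why seq and desired_seq diverge at IDs 4 and 5.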

Desired result for the sample data:

1 | Grp1 | 2014-01-06 | 2014-01-08
2 | Grp2 | 2014-01-08 | 2014-01-09
3 | Grp1 | 2014-01-09 | 2014-01-10
4 | Grp2 | 2014-01-10 | 2014-01-11
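
Procedurally, the desired output is one pass over the rows in date order that starts a new output row whenever GroupName changes. A minimal Python sketch of that rule, assuming the data fits in memory; collapse_runs is a hypothetical name, not something from the thread. It can serve as a reference when checking the SQL answers.

```python
from itertools import groupby

# Sample rows from the question: (ID, GroupName, StartingPoint, EndingPoint).
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def collapse_runs(rows):
    """Merge adjacent rows (in StartingPoint order) that share a GroupName."""
    ordered = sorted(rows, key=lambda r: r[2])
    result = []
    # groupby only merges *consecutive* rows with the same key, which is the
    # "a new group name starts a new group" rule from the question.
    runs = groupby(ordered, key=lambda r: r[1])
    for sort_order, (group, run) in enumerate(runs, start=1):
        run = list(run)
        result.append((sort_order, group, min(r[2] for r in run), max(r[3] for r in run)))
    return result


for row in collapse_runs(ROWS):
    print(row)
# (1, 'Grp1', '2014-01-06', '2014-01-08')
# (2, 'Grp2', '2014-01-08', '2014-01-09')
# (3, 'Grp1', '2014-01-09', '2014-01-10')
# (4, 'Grp2', '2014-01-10', '2014-01-11')
```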

What I have tried:

;with cte as (
    select *
        , row_number() over (partition by GroupName order by startingpoint) as seq
    from table1
)
-- t1.* only: SELECT ... INTO fails if t1 and t2 both contribute the same column names
select t1.*
into #temp2
from cte t1
left join cte t2 on t1.id = t2.id and t1.seq = t2.seq - 1

select *
    , (select startingPoint from #temp2 t2
       where t1.id = t2.id and t2.seq = (select MIN(seq) from #temp2)) as Oldest
    , (select startingPoint from #temp2 t2
       where t1.id = t2.id and t2.seq = (select MAX(seq) from #temp2)) as MostRecent
from #temp2 t1

2014-01-09 Kiril Rusev

Judging from the table, it seems you could just use MIN and MAX, unless I'm missing something. – Zane

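This is a gaps-and-islands problem with subgroups. The trick is to group by the difference between two ROW_NUMBER() values, one partitioned and one unpartitioned.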

WITH t AS (
    SELECT
        GroupName,
        StartingPoint,
        EndingPoint,
        ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY StartingPoint)
          - ROW_NUMBER() OVER (ORDER BY StartingPoint) AS SubGroupId
    FROM #test
)
SELECT
    ROW_NUMBER() OVER (ORDER BY MIN(StartingPoint)) AS SortOrderId,
    GroupName          AS GroupName,
    MIN(StartingPoint) AS GroupStartingPoint,
    MAX(EndingPoint)   AS GroupEndingPoint
FROM t
GROUP BY GroupName, SubGroupId
ORDER BY SortOrderId
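
The arithmetic behind SubGroupId can be traced outside the database. Below is a Python sketch that mirrors the query step by step on the question's sample rows (assumed here; the answer itself reads from #test). Within one unbroken run both counters go up by one per row, so their difference stays fixed; a row from another group advances only the unpartitioned counter, so the next run of the same group lands on a different SubGroupId. The helper names are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (ID, GroupName, StartingPoint, EndingPoint).
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def sub_group_ids(rows):
    """Partitioned ROW_NUMBER() minus unpartitioned ROW_NUMBER(), per row."""
    per_group = defaultdict(int)
    keyed = []
    ordered = sorted(rows, key=lambda r: r[2])  # ORDER BY StartingPoint
    for overall_rn, (row_id, group, start, end) in enumerate(ordered, start=1):
        per_group[group] += 1
        keyed.append((row_id, group, start, end, per_group[group] - overall_rn))
    return keyed


def islands_by_rn_difference(rows):
    """GROUP BY GroupName, SubGroupId, then number the groups by MIN(StartingPoint)."""
    buckets = defaultdict(list)
    for _id, group, start, end, sub_id in sub_group_ids(rows):
        buckets[(group, sub_id)].append((start, end))
    summary = sorted(
        (min(s for s, _ in spans), group, max(e for _, e in spans))
        for (group, _sub_id), spans in buckets.items()
    )
    return [
        (sort_order, group, start, end)
        for sort_order, (start, group, end) in enumerate(summary, start=1)
    ]


print([k[-1] for k in sub_group_ids(ROWS)])  # [0, 0, -2, -1, -3]
for row in islands_by_rn_difference(ROWS):
    print(row)
```

Because the difference is only unique within one GroupName, the query groups by GroupName and SubGroupId together; in this data Grp1's runs get 0 and -1 and Grp2's get -2 and -3.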

2014-01-09 18:18:18 Anon

Not sure, but maybe:

SELECT DISTINCT 
    GroupName, 
    MIN(StartingPoint) OVER (PARTITION BY GroupName ORDER BY Id), 
    MAX(EndingPoint) OVER (PARTITION BY GroupName ORDER BY Id) 
FROM table1

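Because a windowed aggregate (OVER (PARTITION BY ...)) does not collapse rows, the result would contain one row per input row, i.e. duplicates; DISTINCT removes them.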
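
The mechanism described here (a windowed aggregate keeps every input row, and DISTINCT then drops the repeats) can be sketched in Python. The sketch assumes whole-partition windows, i.e. OVER (PARTITION BY GroupName) with no ORDER BY. With ORDER BY Id, as written in the answer, SQL Server 2012 and later default to a running frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), so the per-row values differ and DISTINCT keeps more rows; SQL Server 2008 does not accept ORDER BY in an aggregate's OVER clause at all. windowed_min_max is a hypothetical name.

```python
# Sample rows from the question: (ID, GroupName, StartingPoint, EndingPoint).
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def windowed_min_max(rows):
    """One output row per input row, like MIN/MAX(...) OVER (PARTITION BY GroupName)."""
    first_start, last_end = {}, {}
    for _id, group, start, end in rows:
        first_start[group] = min(first_start.get(group, start), start)
        last_end[group] = max(last_end.get(group, end), end)
    # Windowed aggregates do not collapse rows: every input row gets its
    # whole partition's values attached.
    return [(group, first_start[group], last_end[group]) for _id, group, _s, _e in rows]


per_row = windowed_min_max(ROWS)
distinct_rows = sorted(set(per_row))  # SELECT DISTINCT
print(len(per_row))    # 5
print(distinct_rows)   # [('Grp1', '2014-01-06', '2014-01-10'), ('Grp2', '2014-01-08', '2014-01-11')]
```

With whole-partition windows, the partition is the entire GroupName, so the two separate Grp1 runs end up in a single row.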

2014-01-09 15:42:23

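This is much easier in SQL Server 2012, with the lag() function. The way I approach these problems is to find where each group starts, giving every row a flag of 1 or 0, and then take a cumulative sum of the 1s to get a new group ID.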
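
A Python stand-in for the LAG()-based version of this idea, assuming rows ordered by StartingPoint. The SQL it imitates would presumably use LAG(GroupName) OVER (ORDER BY StartingPoint) to set the flag and SUM(flag) OVER (ORDER BY StartingPoint) for the running total, both available from SQL Server 2012. The function names are hypothetical.

```python
from itertools import accumulate

# Sample rows from the question: (ID, GroupName, StartingPoint, EndingPoint).
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def island_ids_with_lag(rows):
    """Flag rows whose GroupName differs from the previous row's, then running-sum the flags."""
    ordered = sorted(rows, key=lambda r: r[2])        # ORDER BY StartingPoint
    previous = [None] + [r[1] for r in ordered[:-1]]  # LAG(GroupName)
    flags = [1 if prev != row[1] else 0 for prev, row in zip(previous, ordered)]
    running = accumulate(flags)                       # running SUM(flag)
    return [(row[0], row[1], island) for row, island in zip(ordered, running)]


def summarize(rows):
    """MIN(StartingPoint) and MAX(EndingPoint) per island ID."""
    by_id = {r[0]: r for r in rows}
    spans = {}
    for row_id, group, island in island_ids_with_lag(rows):
        _id, _g, start, end = by_id[row_id]
        if island in spans:
            _group, first, last = spans[island]
            spans[island] = (group, min(first, start), max(last, end))
        else:
            spans[island] = (group, start, end)
    return [(island,) + spans[island] for island in sorted(spans)]


print([island for _, _, island in island_ids_with_lag(ROWS)])  # [1, 1, 2, 3, 4]
for row in summarize(ROWS):
    print(row)
```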

In SQL Server 2008 you can do this with correlated subqueries (or joins):

with table1_flag as (
     -- groupstartflag = 1 when no row of the same group ends where this row
     -- starts, i.e. this row begins a new run; 0 when it continues one
     select t1.*,
      case when exists (select 1
                        from table1 t2
                        where t2.groupname = t1.groupname and
                          t2.endingpoint = t1.startingpoint)
           then 0 else 1
      end as groupstartflag
     from table1 t1
    ),
    table1_flag_cum as (
     -- running total of start flags within each group = run number
     select tf.*,
      (select sum(groupstartflag)
       from table1_flag tf2
       where tf2.groupname = tf.groupname and
        tf2.startingpoint <= tf.startingpoint
      ) as groupnum
     from table1_flag tf
    )
select groupnum, groupname,
     min(startingpoint) as startingpoint, max(endingpoint) as endingpoint
from table1_flag_cum
group by groupnum, groupname;
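
The two CTEs can be mirrored in Python on the sample rows, following the explanation above: a row gets groupstartflag = 1 when no row of the same group ends at its starting point (it begins a run), and groupnum is the running total of those flags within the group. This is a sketch assuming small in-memory data (the nested sum is quadratic, much like the correlated subquery); group_boundaries is a hypothetical name.

```python
from collections import defaultdict

# Sample rows from the question: (ID, GroupName, StartingPoint, EndingPoint).
ROWS = [
    (1, "Grp1", "2014-01-06", "2014-01-07"),
    (2, "Grp1", "2014-01-07", "2014-01-08"),
    (3, "Grp2", "2014-01-08", "2014-01-09"),
    (4, "Grp1", "2014-01-09", "2014-01-10"),
    (5, "Grp2", "2014-01-10", "2014-01-11"),
]


def group_boundaries(rows):
    # groupstartflag: 1 if no row of the same group ends where this row starts
    ends_by_group = defaultdict(set)
    for _id, group, _start, end in rows:
        ends_by_group[group].add(end)
    flagged = [
        (group, start, end, 0 if start in ends_by_group[group] else 1)
        for _id, group, start, end in rows
    ]
    # groupnum: sum of flags of same-group rows starting at or before this one
    # (mirrors the correlated SUM with tf2.startingpoint <= tf.startingpoint)
    spans = defaultdict(list)
    for group, start, end, _flag in flagged:
        groupnum = sum(f for g, s, _e, f in flagged if g == group and s <= start)
        spans[(groupnum, group)].append((start, end))
    # GROUP BY groupnum, groupname
    return sorted(
        (groupnum, group, min(s for s, _ in pairs), max(e for _, e in pairs))
        for (groupnum, group), pairs in spans.items()
    )


for row in group_boundaries(ROWS):
    print(row)
```

groupnum restarts for each GroupName, so only the pair (groupnum, groupname) identifies a run, which is why the final GROUP BY uses both columns.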

2014-01-09 15:48:04

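Thanks for your help. I tested the query on SQLFiddle (http://sqlfiddle.com/#!3/87a45/2), but I could not adapt it to my requirements. Your query returns 07-10 for Grp1 and 08-11 for Grp2, which means Grp2 is contained within Grp1.

@Kiril . . . It includes groupname in every comparison, including the final group by. The groups should not interfere with each other.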

Hmm. That is what I expected; however, I am still dealing with multiple groups associated with the same date.
