2012-09-05

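How to group continuous ranges using MySQL

I have a table containing categories, dates and rates. Each category can have different rates on different dates, and a category can only have one rate on a given date.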

Id      CatId  Date        Rate
------  -----  ----------  ----
000001  12     2009-07-07  1
000002  12     2009-07-08  1
000003  12     2009-07-09  1
000004  12     2009-07-10  2
000005  12     2009-07-15  1
000006  12     2009-07-16  1
000007  13     2009-07-08  1
000008  13     2009-07-09  1
000009  14     2009-07-07  2
000010  14     2009-07-08  1
000010  14     2009-07-10  1

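Unique index: (CatId, Date, Rate). For each category I want to group all consecutive date ranges that share the same rate, keeping only the beginning and the end of each range. For the example above, we would get: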

CatId  Begin       End         Rate
-----  ----------  ----------  ----
12     2009-07-07  2009-07-09  1
12     2009-07-10  2009-07-10  2
12     2009-07-15  2009-07-16  1
13     2009-07-08  2009-07-09  1
14     2009-07-07  2009-07-07  2
14     2009-07-08  2009-07-08  1
14     2009-07-10  2009-07-10  1

I found a similar solution on the forum, but it does not give exactly the expected result:

WITH q AS 
     (
     SELECT *, 
       ROW_NUMBER() OVER (PARTITION BY CatId, Rate ORDER BY [Date]) AS rnd, 
       ROW_NUMBER() OVER (PARTITION BY CatId ORDER BY [Date]) AS rn 
     FROM my_table 
     ) 
SELECT CatId AS catidd, MIN([Date]) as beginn, MAX([Date])as endd, Rate 
FROM q 
GROUP BY CatId, rnd - rn, Rate 

See the SQL FIDDLE. How can I do the same thing in MySQL? Please help!
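A sketch for checking the forum query against the sample rows without a server. It runs in SQLite through Python's standard sqlite3 module, and assumes the bundled SQLite is 3.25 or newer (the version that added window functions). `rnd - rn` only changes when the rate changes, so a date gap inside one rate's run goes unnoticed: category 14's 2009-07-08 and 2009-07-10 rows come back as a single range. The second query is a swapped-in variant of the common gaps-and-islands trick, subtracting the row number from the day number inside each (CatId, Rate) partition. `julianday()` is SQLite-specific. MySQL 8.0 and later also support ROW_NUMBER(), but the date arithmetic there would need MySQL's own functions, which this sketch does not show. The names ROWS, RN_DIFF, DATE_DIFF and run are hypothetical.

```python
import sqlite3

# Sample rows from the question: (Id, CatId, Date, Rate), dates as ISO strings.
ROWS = [
    ("000001", 12, "2009-07-07", 1), ("000002", 12, "2009-07-08", 1),
    ("000003", 12, "2009-07-09", 1), ("000004", 12, "2009-07-10", 2),
    ("000005", 12, "2009-07-15", 1), ("000006", 12, "2009-07-16", 1),
    ("000007", 13, "2009-07-08", 1), ("000008", 13, "2009-07-09", 1),
    ("000009", 14, "2009-07-07", 2), ("000010", 14, "2009-07-08", 1),
    ("000010", 14, "2009-07-10", 1),
]

# The forum query: SQL Server [Date] brackets removed, ORDER BY added.
RN_DIFF = """
WITH q AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY CatId, Rate ORDER BY Date) AS rnd,
        ROW_NUMBER() OVER (PARTITION BY CatId ORDER BY Date) AS rn
    FROM my_table
)
SELECT CatId AS catidd, MIN(Date) AS beginn, MAX(Date) AS endd, Rate
FROM q
GROUP BY CatId, rnd - rn, Rate
ORDER BY catidd, beginn
"""

# Gaps-and-islands variant: within one (CatId, Rate) partition,
# day number minus row number stays constant only over consecutive days.
DATE_DIFF = """
WITH q AS (
    SELECT *,
        julianday(Date)
          - ROW_NUMBER() OVER (PARTITION BY CatId, Rate ORDER BY Date) AS grp
    FROM my_table
)
SELECT CatId AS catidd, MIN(Date) AS beginn, MAX(Date) AS endd, Rate
FROM q
GROUP BY CatId, Rate, grp
ORDER BY catidd, beginn
"""


def run(query):
    """Load the sample rows into an in-memory SQLite table and run `query`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (Id TEXT, CatId INTEGER, Date TEXT, Rate INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for label, query in (("rnd - rn", RN_DIFF), ("date - rn", DATE_DIFF)):
        print(label)
        for row in run(query):
            print("   ", row)
```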

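Why does your example show a result range from '2009-07-08' to '2009-07-10' for '(CatId, Rate) = (14, 1)' when there is no '2009-07-09' in the base table? Compare '(CatId, Rate) = (12, 1)', which produces two result ranges because of its discontinuity. – eggyal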

Thanks eggyal, it's corrected now. – Fouzi

Answers

MySQL doesn't support analytic functions, but you can emulate that behaviour with user-defined variables:

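SELECT CatID, Begin, MAX(Date) AS End, Rate 
FROM (
    SELECT my_table.*, 
      -- Keep the current range's start (@f) if this row continues the previous
      -- one (same CatId and Rate, exactly one day later); otherwise start a
      -- new range at this row's Date.
      @f:=CONVERT(
        IF(@c<=>CatId AND @r<=>Rate AND DATEDIFF(Date, @d)=1, @f, Date), DATE
      ) AS Begin, 
      -- Remember this row's values for the next row.
      @c:=CatId, @d:=Date, @r:=Rate 
    FROM  my_table JOIN (SELECT @c:=NULL) AS init 
    ORDER BY CatId, Rate, Date 
) AS t 
GROUP BY CatID, Begin, Rate 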

See it on sqlfiddle.
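To see what the user variables do, here is a rough Python emulation of the same row-by-row scan. It is not MySQL code, and the names ranges_by_start and SAMPLE are hypothetical. Rows are visited in CatId, Rate, Date order. Each row either continues the previous row's range (same CatId and Rate, exactly one day later) and inherits its start date, or starts a new range at its own date. Grouping by that start date then yields one row per range.

```python
from datetime import date

# Sample rows from the question as (CatId, Date, Rate).
SAMPLE = [
    (12, date(2009, 7, 7), 1), (12, date(2009, 7, 8), 1), (12, date(2009, 7, 9), 1),
    (12, date(2009, 7, 10), 2), (12, date(2009, 7, 15), 1), (12, date(2009, 7, 16), 1),
    (13, date(2009, 7, 8), 1), (13, date(2009, 7, 9), 1),
    (14, date(2009, 7, 7), 2), (14, date(2009, 7, 8), 1), (14, date(2009, 7, 10), 1),
]


def ranges_by_start(rows):
    """Emulate the user-variable query on (CatId, Date, Rate) rows."""
    c = d = r = f = None  # stand-ins for @c, @d, @r, @f; unset variables are NULL
    tagged = []
    # ORDER BY CatId, Rate, Date
    for cat, day, rate in sorted(rows, key=lambda row: (row[0], row[2], row[1])):
        # IF(@c<=>CatId AND @r<=>Rate AND DATEDIFF(Date, @d)=1, @f, Date)
        continues = c == cat and r == rate and d is not None and (day - d).days == 1
        f = f if continues else day
        tagged.append((cat, f, rate, day))
        c, d, r = cat, day, rate  # @c:=CatId, @d:=Date, @r:=Rate
    # GROUP BY CatID, Begin, Rate keeping MAX(Date) AS End
    ends = {}
    for cat, begin, rate, day in tagged:
        ends[(cat, begin, rate)] = max(ends.get((cat, begin, rate), day), day)
    # Sorted by CatId, Begin for readability.
    return sorted((cat, begin, end, rate) for (cat, begin, rate), end in ends.items())


if __name__ == "__main__":
    for row in ranges_by_start(SAMPLE):
        print(row)
```

The MySQL manual states that the evaluation order of expressions involving user variables is undefined. A query like this therefore depends on behaviour that MySQL does not guarantee, which is one reason to keep the plain row-by-row logic in mind when checking its output.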

Seems to work as expected! Thanks a lot! – Fouzi

What does '<=>' mean? –

@vanabel: It's MySQL's [NULL-safe equal operator](http://dev.mysql.com/doc/en/comparison-operators.html#operator_equal-to). – eggyal
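As a quick illustration of those semantics: MySQL's <=> returns 1 rather than NULL when both operands are NULL, and 0 rather than NULL when only one of them is. In the accepted answer, that means comparing CatId against @c while @c is still NULL gives a definite 0. MySQL needs a server to try this. The sketch below therefore uses SQLite's IS operator as a stand-in, since it has the same NULL-safe behaviour and runs from Python's sqlite3 module. It is not MySQL itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite's IS works like MySQL's <=>: a NULL operand gives 1 or 0, never NULL.
# Plain = with a NULL operand gives NULL (None in Python).
row = con.execute("SELECT NULL IS NULL, 12 IS NULL, 12 IS 12, 12 = NULL").fetchone()
con.close()
print(row)  # (1, 0, 1, None)
```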

SELECT catid,min(ddate),max(ddate),rate 
FROM (
    SELECT 
     Catid, 
     Ddate, 
     rate, 
     -- Start a new group number when the rate changes or the gap to the
     -- previous row is more than one day.
     @rn := CASE WHEN (@prev <> rate 
      or DATEDIFF(ddate, @prev_date)>1) THEN @rn+1 ELSE @rn END AS rn, 
     -- Remember this row's values for the next row.
     @prev := rate, 
     @prev_id := catid , 
     @prev_date :=ddate 
    FROM (
     SELECT CatID,Ddate,rate 
     FROM rankdate 
     ORDER BY CatID, Ddate) AS a , 
     (SELECT @prev := -1, @rn := 0, @prev_id:=0 ,@prev_date:=-1) AS vars  
) T1 group by catid,rn 

Note: the line (SELECT @prev := -1, @rn := 0, @prev_id := 0, @prev_date := -1) AS vars is not needed in MySQL Workbench, but it is needed with PHP's mysql_query function.

SQL FIDDLE HERE
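Here is a rough Python emulation of this counter approach. The names tag_rows, ranges_by_counter and SAMPLE are hypothetical, and rows are (CatId, Date, Rate) tuples. The counter goes up when the rate changes or when the gap to the previous row is more than one day. Rows are then grouped by (CatId, counter). On the sample data this returns the expected ranges, including the split for category 14. With the 000004 row removed, it also gives two separate ranges for category 12. Tracing it shows that the counter itself ignores CatId: category 13's first row gets the same counter value as category 12's last row. The two only end up in different groups because the outer GROUP BY includes catid.

```python
from datetime import date

# Sample rows from the question as (CatId, Date, Rate).
SAMPLE = [
    (12, date(2009, 7, 7), 1), (12, date(2009, 7, 8), 1), (12, date(2009, 7, 9), 1),
    (12, date(2009, 7, 10), 2), (12, date(2009, 7, 15), 1), (12, date(2009, 7, 16), 1),
    (13, date(2009, 7, 8), 1), (13, date(2009, 7, 9), 1),
    (14, date(2009, 7, 7), 2), (14, date(2009, 7, 8), 1), (14, date(2009, 7, 10), 1),
]


def tag_rows(rows):
    """Assign the @rn group counter to each row, returning (CatId, Date, Rate, rn)."""
    prev_rate, prev_date, rn = -1, None, 0  # like @prev := -1, @rn := 0
    tagged = []
    # ORDER BY CatID, Ddate
    for cat, day, rate in sorted(rows, key=lambda row: (row[0], row[1])):
        # Like (@prev <> rate OR DATEDIFF(ddate, @prev_date) > 1);
        # there is no previous date yet on the first row.
        gap = prev_date is not None and (day - prev_date).days > 1
        if prev_rate != rate or gap:
            rn += 1
        tagged.append((cat, day, rate, rn))
        prev_rate, prev_date = rate, day
    return tagged


def ranges_by_counter(rows):
    """GROUP BY catid, rn with MIN/MAX of the dates, sorted by CatId and start."""
    groups = {}
    for cat, day, rate, rn in tag_rows(rows):
        first, last, _ = groups.get((cat, rn), (day, day, rate))
        groups[(cat, rn)] = (min(first, day), max(last, day), rate)
    return sorted((cat, first, last, rate) for (cat, _rn), (first, last, rate) in groups.items())


if __name__ == "__main__":
    for row in ranges_by_counter(SAMPLE):
        print(row)
```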

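If we delete the record with ID = '000004', your query returns (Begin: 2009-07-07, End: 2009-07-16, Rate: 1). That is incorrect because there is a gap; it should return (Begin: 2009-07-07, End: 2009-07-09, Rate: 1) and (Begin: 2009-07-15, End: 2009-07-16, Rate: 1). [SQL FIDDLE HERE](http://sqlfiddle.com/#!2/513b2/1) – Fouzi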

@BoussahelBachir, I edited the answer to include the case you mentioned, so it fits your situation. – sel

You don't seem to test '@prev_id' anywhere... What happens if two consecutive dates with the same 'Rate' have different 'CatId' values? – eggyal

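I know I'm very late, but I'm still posting a solution that worked for me. I had the same problem, and this is how I solved it.

I found a solution that uses variables: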

SELECT MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, COUNT(*) cnt 
FROM (
    -- gn goes up by 1 whenever state differs from the previous row's state
    -- (or on the first row, while @state is still NULL).
    SELECT @r := @r + (@state != state OR @state IS NULL) AS gn, 
      @state := state AS sn, 
      s.id, s.date, s.state 
    FROM (
      SELECT @r := 0, 
        @state := NULL 
      ) vars, 
      t_range s 
    ORDER BY 
      date, state 
    ) q 
GROUP BY gn 

More details on a nice solution: https://explainextended.com/2009/07/24/mysql-grouping-continuous-ranges/
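Here is a rough Python analogue of this query. The function state_runs is hypothetical, and rows are (id, date, state) tuples. itertools.groupby starts a new group whenever the key changes, which is what adding 1 to @r on a state change does in the SQL. As posted, the query compares only state, so it neither splits a run at a date gap nor separates categories. For the question's table you would still need the CatId and DATEDIFF conditions used in the other answers. In the usage example, the last row is six days after the previous one but is still absorbed into the same run.

```python
from itertools import groupby


def state_runs(rows):
    """Group (id, date, state) rows into runs of consecutive equal state.

    Returns (MIN(id), MIN(date), state, COUNT(*)) per run, like the query above.
    """
    ordered = sorted(rows, key=lambda row: (row[1], row[2]))  # ORDER BY date, state
    runs = []
    # groupby starts a new group whenever the key changes, mirroring the
    # @r counter being bumped on a state change.
    for state, group in groupby(ordered, key=lambda row: row[2]):
        group = list(group)
        runs.append((min(row[0] for row in group),
                     min(row[1] for row in group),
                     state,
                     len(group)))
    return runs


if __name__ == "__main__":
    demo = [(1, "2009-07-01", "A"), (2, "2009-07-02", "A"), (3, "2009-07-03", "B"),
            (4, "2009-07-04", "A"), (5, "2009-07-10", "A")]
    for run in state_runs(demo):
        print(run)
```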