2013-04-15 140 views

SQL return consecutive records

A simple table:

ForumPost 
-------------- 
ID (int PK) 
UserID (int FK) 
Date (datetime) 

I am looking to get back how many times a particular user has achieved at least 1 post a day for n consecutive days.

Example:

User 15844 has posted at least 1 post a day for 30 consecutive days 10 times 

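I've tagged this question as LINQ/lambda as well, since a solution there would also be great. I know I can solve this by iterating over all of a user's records, but that is slow.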

+1

Which DBMS are you using? Postgres? Oracle? –

+0

SQL Server 2008 r2 –

+0

Use a subquery to take all posts from the last 30 days, group by date and count them, then check whether the count is 30? –

Answers

4

There is a handy trick for finding consecutive entries using ROW_NUMBER(). Imagine the following set of dates, with their row number (starting at 0):

Date  RowNumber 
20130401 0 
20130402 1 
20130403 2 
20130404 3 
20130406 4 
20130407 5 

For any dates that are consecutive, subtracting the row number from the date gives the same result. For example:

Date      RowNumber  Date - RowNumber
20130401  0          20130401
20130402  1          20130401
20130403  2          20130401
20130404  3          20130401
20130406  4          20130402
20130407  5          20130402

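You can then group by date - row_number to get the sets of consecutive days (i.e. the first 4 records and the last 2 records).

To apply this to your example, you could use: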
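The same grouping idea can be sketched outside SQL. The snippet below is only an illustration of the date - row_number key in Python, using the six sample dates from the table above; the function name `consecutive_runs` is made up for this example.

```python
from datetime import date, timedelta
from itertools import groupby


def consecutive_runs(days):
    """Group dates into runs of consecutive days via the date - row_number key."""
    ordered = sorted(set(days))  # one entry per calendar day, ascending
    # Consecutive dates share the same (date - row index) value.
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    return [[d for _, d in grp] for _, grp in groupby(keyed, key=lambda kd: kd[0])]


# The sample dates from the table: 1-4 April, then 6-7 April 2013.
sample = [date(2013, 4, n) for n in (1, 2, 3, 4, 6, 7)]
runs = consecutive_runs(sample)
print([len(r) for r in runs])  # [4, 2]
```

The two runs correspond to the first 4 records and the last 2 records.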

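WITH Posts AS
(   -- Date minus (row number - 1) is constant within each run of consecutive days,
    -- so FirstPost acts as the grouping key for a run.
    SELECT  FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
            UserID,
            [Date]
    FROM    (   -- Strip the time portion and keep one row per user per day.
                SELECT DISTINCT UserID, [Date] = CAST([Date] AS DATE)
                FROM ForumPost
            ) fp
), Posts2 AS
(   -- One row per user per run of consecutive days.
    SELECT  FirstPost,
            UserID,
            Days = COUNT(*),
            LastDate = MAX([Date])
    FROM    Posts
    GROUP BY FirstPost, UserID
)
-- Longest run of consecutive days for each user.
SELECT  UserID, ConsecutiveDates = MAX(Days)
FROM    Posts2
GROUP BY UserID;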

Example on SQL Fiddle (simple with just most consecutive days per user)

Further example to show how to get all consecutive periods

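Edit

I don't think the above quite answered the question. This will give the number of times a user has posted on n or more consecutive days: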

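WITH Posts AS
(   -- Date minus (row number - 1) is constant within each run of consecutive days,
    -- so FirstPost acts as the grouping key for a run.
    SELECT  FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
            UserID,
            [Date]
    FROM    (   -- Strip the time portion and keep one row per user per day.
                SELECT DISTINCT UserID, [Date] = CAST([Date] AS DATE)
                FROM ForumPost
            ) fp
), Posts2 AS
(   -- One row per user per run of consecutive days.
    SELECT  FirstPost,
            UserID,
            Days = COUNT(*),
            FirstDate = MIN([Date]),
            LastDate = MAX([Date])
    FROM    Posts
    GROUP BY FirstPost, UserID
)
-- Number of runs lasting at least N days (N = 30 here) for each user.
SELECT  UserID, [Times Over N Days] = COUNT(*)
FROM    Posts2
WHERE   Days >= 30
GROUP BY UserID;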

Example on SQL Fiddle
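For readers who want to check the logic without a database, here is a rough Python equivalent of the query above. It is a sketch, not a translation guaranteed to match SQL Server in every edge case; the function name `times_over_n_days` and the sample rows are invented for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def times_over_n_days(posts, n):
    """posts: iterable of (user_id, date). Returns {user_id: runs with >= n days}."""
    days_by_user = defaultdict(set)
    for user_id, day in posts:
        days_by_user[user_id].add(day)  # like SELECT DISTINCT UserID, Date

    result = {}
    for user_id, days in days_by_user.items():
        run_lengths = defaultdict(int)
        for i, day in enumerate(sorted(days)):
            # Same key as the SQL: date minus row index is constant within a run.
            run_lengths[day - timedelta(days=i)] += 1
        qualifying = sum(1 for length in run_lengths.values() if length >= n)
        if qualifying:  # like WHERE Days >= n: users with no such run are dropped
            result[user_id] = qualifying
    return result


# Hypothetical data: user 1 has runs of 3, 3 and 1 days; user 2 never posts two days in a row.
posts = [(1, date(2013, 1, d)) for d in (1, 2, 3, 5, 6, 7, 9)]
posts += [(2, date(2013, 1, d)) for d in (1, 3, 5)]
print(times_over_n_days(posts, 3))  # {1: 2}
```

Note that, as in the query, one long run counts once no matter how far it exceeds n days.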

1

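Your particular application makes this pretty simple, I think. If you have 'n' distinct dates in an 'n'-day interval, those 'n' distinct dates must be consecutive.

Scroll to the bottom for a general solution that requires only common table expressions and switching to PostgreSQL. (Kidding. I implemented it in PostgreSQL because I'm short of time.)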

create table ForumPost (
    ID integer primary key, 
    UserID integer not null, 
    post_date date not null 
); 

insert into forumpost values 
(1, 1, '2013-01-15'), 
(2, 1, '2013-01-16'), 
(3, 1, '2013-01-17'), 
(4, 1, '2013-01-18'), 
(5, 1, '2013-01-19'), 
(6, 1, '2013-01-20'), 
(7, 1, '2013-01-21'), 

(11, 2, '2013-01-15'), 
(12, 2, '2013-01-16'), 
(13, 2, '2013-01-17'), 
(16, 2, '2013-01-17'), 
(14, 2, '2013-01-18'), 
(15, 2, '2013-01-19'), 

(21, 3, '2013-01-17'), 
(22, 3, '2013-01-17'), 
(23, 3, '2013-01-17'), 
(24, 3, '2013-01-17'), 
(25, 3, '2013-01-17'), 
(26, 3, '2013-01-17'), 
(27, 3, '2013-01-17'); 

Now, let's look at the output of this query. For brevity, I'm looking at 5-day intervals rather than 30-day intervals.

select userid, count(distinct post_date) distinct_dates 
from forumpost 
where post_date between '2013-01-15' and '2013-01-19' 
group by userid; 

USERID DISTINCT_DATES 
1  5 
2  5 
3  1 

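For users who fit the criteria, the number of distinct dates in that 5-day interval has to be 5, right? So we just need to add that logic to a HAVING clause.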

select userid, count(distinct post_date) distinct_dates 
from forumpost 
where post_date between '2013-01-15' and '2013-01-19' 
group by userid 
having count(distinct post_date) = 5; 

USERID DISTINCT_DATES 
1  5 
2  5 
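The reasoning behind the HAVING clause (n distinct dates inside an n-day window can only be n consecutive days) can be checked with a small Python sketch. The rows below mirror the INSERT statement above; the function name `users_posting_every_day` is an assumption made for this example.

```python
from datetime import date


def users_posting_every_day(posts, start, end):
    """Return users whose distinct post dates in [start, end] fill the whole window."""
    window_days = (end - start).days + 1  # BETWEEN is inclusive on both ends
    seen = {}
    for user_id, day in posts:
        if start <= day <= end:
            seen.setdefault(user_id, set()).add(day)
    # Equivalent of HAVING COUNT(DISTINCT post_date) = window length.
    return sorted(u for u, days in seen.items() if len(days) == window_days)


# Same sample rows as the INSERT above, as (userid, post_date) pairs.
posts = (
    [(1, date(2013, 1, d)) for d in range(15, 22)]
    + [(2, date(2013, 1, d)) for d in (15, 16, 17, 17, 18, 19)]
    + [(3, date(2013, 1, 17))] * 7
)
print(users_posting_every_day(posts, date(2013, 1, 15), date(2013, 1, 19)))  # [1, 2]
```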

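A more general solution

It doesn't really make sense to say that, if you post every day from 2013-01-01 to 2013-01-31, you have posted on 30 consecutive days 2 times. Instead, I'd expect the clock to start over on 2013-01-31. My apologies for implementing this in PostgreSQL; I'll try to implement it in T-SQL later.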

with first_posts as (
    select userid, min(post_date) first_post_date 
    from forumpost 
    group by userid 
), 
period_intervals as (
    select userid, first_post_date period_start, 
     (first_post_date + interval '4' day)::date period_end 
    from first_posts 
), user_specific_intervals as (
    select 
    userid, 
    (period_start + (n || ' days')::interval)::date as period_start, 
    (period_end + (n || ' days')::interval)::date as period_end 
    from period_intervals, generate_series(0, 30, 5) n 
) 
select usi.userid, usi.period_start, usi.period_end,
       (select count(distinct fp.post_date)
        from forumpost fp
        where fp.post_date between usi.period_start and usi.period_end
          -- Qualify both sides: an unqualified userid here would resolve to forumpost.userid.
          and fp.userid = usi.userid) distinct_dates
from user_specific_intervals usi
order by usi.userid, usi.period_start;
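The "clock starts over" reading can also be sketched in Python. This is not a translation of the query above, which uses fixed windows counted from each user's first post; it is an alternative sketch that assumes a run of L consecutive days contains L // n complete, non-overlapping n-day periods. The function name `completed_periods` is hypothetical.

```python
from datetime import date, timedelta


def completed_periods(days, n):
    """Count non-overlapping n-day periods inside runs of consecutive days."""
    total = 0
    run_length = 0
    previous = None
    for day in sorted(set(days)):
        if previous is not None and day - previous == timedelta(days=1):
            run_length += 1  # the current run continues
        else:
            total += run_length // n  # close the previous run, if any
            run_length = 1
        previous = day
    return total + run_length // n  # close the final run


january = [date(2013, 1, d) for d in range(1, 32)]  # 31 consecutive days
print(completed_periods(january, 30))  # 1
print(completed_periods(january, 5))   # 6
```

Under this assumption, posting every day from 2013-01-01 to 2013-01-31 counts as one 30-day period, with the 31st starting a new count.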