2010-05-18 86 views
2

Get the latest row, grouped by a column

I have a large data set of sent emails and their status codes.

ID Recipient   Date  Status 
1 [email protected] 01/01/2010  1 
2 [email protected] 02/01/2010  1 
3 [email protected] 01/01/2010  1 
4 [email protected] 02/01/2010  2 
5 [email protected] 03/01/2010  1 
6 [email protected] 01/01/2010  1 
7 [email protected] 02/01/2010  2 

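In this example:

  • every email sent to one recipient has status 1
  • for another recipient, the middle email (by date) has status 2, but the latest has status 1
  • the last email sent to the third recipient has status 2

I need to retrieve the number of emails sent to each person, and what the latest status code was.

The first part is simple enough: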

SELECT Recipient, Count(*) EmailCount 
FROM Messages 
GROUP BY Recipient 
ORDER BY Recipient 

This gives me:

Recipient   EmailCount 
[email protected] 2 
[email protected] 3 
[email protected] 2 

How can I also get the most recent status code?

The end result should be:

Recipient   EmailCount LastStatus 
[email protected]   2   1 
[email protected]    3   1 
[email protected]   2   2 

Thanks.

(The server is Microsoft SQL Server 2008; the query is run through an OleDbConnection in .NET.)

+1

Can multiple emails be received at the same time? How do you want to handle two emails that have the same date but different statuses? – 2010-05-18 16:18:28

+0

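The timestamp actually has a high enough resolution that this won't be a problem, and even if it were, "whatever SQL returns from the ORDER BY" is good enough. – Cylindric 2010-05-18 16:44:27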

Answers

4

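This is an example of a "greatest-n-per-group" query. I think it is easiest to understand by breaking it into two subqueries and joining the results.

The first subquery is the one you already have.

The second subquery uses the window function ROW_NUMBER to number each recipient's emails, starting at 1 for the most recent, then 2, 3, and so on.

The results of the first query are then joined to the rows from the second query whose row number is 1, i.e. the most recent ones. Doing it this way guarantees that you get only one row per recipient, even when there are ties.

Here is the query: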

SELECT T1.Recipient, T1.EmailCount, T2.Status FROM 
(
    SELECT Recipient, COUNT(*) AS EmailCount 
    FROM Messages 
    GROUP BY Recipient 
) T1 
JOIN 
(
    SELECT 
     Recipient, 
     Status, 
     ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date Desc) AS rn 
    FROM Messages 
) T2 
ON T1.Recipient = T2.Recipient AND T2.rn = 1 

This gives the following results:

Recipient   EmailCount Status 
[email protected] 2   2  
[email protected] 2   1  
[email protected]  3   1  
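The query above can be tried out without a SQL Server instance. The following is only a sketch: it runs the same SQL against an in-memory SQLite database from Python (window functions need SQLite 3.25 or later), so T-SQL specifics are not exercised. The recipient addresses are hypothetical placeholders because the addresses in the question are redacted, the dates are rewritten as ISO strings so that they sort correctly as text, and an ORDER BY is added so the output order is stable.

```python
import sqlite3

# Sample rows shaped like the question's table. Addresses are hypothetical
# placeholders; dates are ISO strings (an assumption) so text ordering works.
ROWS = [
    (1, "a@example.com", "2010-01-01", 1),
    (2, "a@example.com", "2010-01-02", 1),
    (3, "b@example.com", "2010-01-01", 1),
    (4, "b@example.com", "2010-01-02", 2),
    (5, "b@example.com", "2010-01-03", 1),
    (6, "c@example.com", "2010-01-01", 1),
    (7, "c@example.com", "2010-01-02", 2),
]

# Same two-subquery join as in the answer, plus an ORDER BY for stable output.
QUERY = """
SELECT T1.Recipient, T1.EmailCount, T2.Status
FROM (
    SELECT Recipient, COUNT(*) AS EmailCount
    FROM Messages
    GROUP BY Recipient
) T1
JOIN (
    SELECT Recipient, Status,
           ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC) AS rn
    FROM Messages
) T2
ON T1.Recipient = T2.Recipient AND T2.rn = 1
ORDER BY T1.Recipient
"""


def latest_status_per_recipient(rows):
    """Load rows into a throwaway SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Messages "
            "(ID INTEGER PRIMARY KEY, Recipient TEXT, Date TEXT, Status INTEGER)"
        )
        conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_status_per_recipient(ROWS):
        print(row)
```

With these placeholder rows, each recipient comes back once, with the email count and the status of the most recent message.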

+0

Excellent! Thank you very much. – Cylindric 2010-05-18 17:06:57

0

You can use a ranking function for this. Something like (untested):

WITH MyResults AS 
(
    SELECT Recipient, Status, ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [date] DESC) AS [row_number] 
    FROM Messages 
) 
SELECT MyResults.Recipient, MyCounts.EmailCount, MyResults.Status 
FROM (
    SELECT Recipient, Count(*) EmailCount 
    FROM Messages 
    GROUP BY Recipient 
) MyCounts 
INNER JOIN MyResults 
ON MyCounts.Recipient = MyResults.Recipient 
WHERE MyResults.[row_number] = 1 

2

It's not very pretty, but I'd probably just use a couple of subqueries:

SELECT Recipient, 
    COUNT(*) EmailCount, 
    (SELECT Status 
    FROM Messages M2 
    WHERE Recipient = M.Recipient 
     AND Date = (SELECT MAX(Date) 
        FROM Messages 
        WHERE Recipient = M2.Recipient)) AS LastStatus 
FROM Messages M 
GROUP BY Recipient 
ORDER BY Recipient 

2

SELECT 
    M.Recipient, 
    C.EmailCount, 
    M.Status 
FROM 
    (
    SELECT Recipient, Count(*) EmailCount 
    FROM Messages 
    GROUP BY Recipient 
    ) C 
    JOIN 
    (
    SELECT Recipient, MAX(Date) AS LastDate 
    FROM Messages 
    GROUP BY Recipient 
    ) MD ON C.Recipient = MD.Recipient 
    JOIN 
    Messages M ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date 
ORDER BY 
    Recipient 

I find that aggregates mostly scale better than ranking functions.
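One difference between the two approaches is worth seeing on data with a tie. This is a minimal sketch, again using in-memory SQLite from Python as a stand-in for SQL Server, with a hypothetical recipient whose two newest messages share the same Date. The MAX(Date) join returns one row per tied message, while the ROW_NUMBER join returns a single row (which of the tied rows it picks is not defined).

```python
import sqlite3

# Hypothetical rows: IDs 2 and 3 share the latest Date but differ in Status.
TIED_ROWS = [
    (1, "a@example.com", "2010-01-01", 1),
    (2, "a@example.com", "2010-01-02", 1),
    (3, "a@example.com", "2010-01-02", 2),
]

# Aggregate approach: join back to Messages on MAX(Date).
MAX_JOIN = """
SELECT M.Recipient, C.EmailCount, M.Status
FROM (SELECT Recipient, COUNT(*) AS EmailCount
      FROM Messages GROUP BY Recipient) C
JOIN (SELECT Recipient, MAX(Date) AS LastDate
      FROM Messages GROUP BY Recipient) MD
  ON C.Recipient = MD.Recipient
JOIN Messages M
  ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date
ORDER BY M.Recipient, M.Status
"""

# Ranking approach: keep only the row numbered 1 per recipient.
ROW_NUMBER_JOIN = """
SELECT T1.Recipient, T1.EmailCount, T2.Status
FROM (SELECT Recipient, COUNT(*) AS EmailCount
      FROM Messages GROUP BY Recipient) T1
JOIN (SELECT Recipient, Status,
             ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC) AS rn
      FROM Messages) T2
  ON T1.Recipient = T2.Recipient AND T2.rn = 1
"""


def run(query, rows):
    """Run one query against a fresh in-memory Messages table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Messages "
            "(ID INTEGER PRIMARY KEY, Recipient TEXT, Date TEXT, Status INTEGER)"
        )
        conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("MAX(Date) join:", run(MAX_JOIN, TIED_ROWS))
    print("ROW_NUMBER join:", run(ROW_NUMBER_JOIN, TIED_ROWS))
```

If ties cannot occur, or any of the tied rows is acceptable, both forms give one row per recipient and the choice comes down to performance and readability.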

+0

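+1 That's my experience as well. In order of decreasing readability and increasing performance: ranking functions -> aggregates -> cross apply with CTE. – Andomar 2010-05-18 16:59:27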

1

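You can't easily do this in a single query, because COUNT(*) is a group function, whereas the latest status comes from a specific row. Here is the query to get the latest status for each user: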

SELECT M.Recipient, M.Status FROM Messages M 
WHERE M.Date = (SELECT MAX(SUB.Date) FROM MESSAGES SUB 
    WHERE SUB.Recipient = M.Recipient)
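For comparison, one possible way to get the count and the latest status in one pass is to compute both as window functions and keep the newest row per recipient. This is a sketch rather than a tested SQL Server solution: it runs on in-memory SQLite from Python (SQLite 3.25 or later), the recipient addresses are hypothetical placeholders, the dates are ISO strings, and the alias `ranked` and the column name `LastStatus` are chosen here for illustration.

```python
import sqlite3

# Placeholder rows shaped like the question's table (addresses are hypothetical).
ROWS = [
    (1, "a@example.com", "2010-01-01", 1),
    (2, "a@example.com", "2010-01-02", 1),
    (3, "b@example.com", "2010-01-01", 1),
    (4, "b@example.com", "2010-01-02", 2),
    (5, "b@example.com", "2010-01-03", 1),
    (6, "c@example.com", "2010-01-01", 1),
    (7, "c@example.com", "2010-01-02", 2),
]

# COUNT(*) OVER gives every row its recipient's total; ROW_NUMBER marks the
# newest row per recipient; the outer query keeps only that newest row.
SINGLE_PASS = """
SELECT Recipient, EmailCount, Status AS LastStatus
FROM (
    SELECT Recipient, Status,
           COUNT(*) OVER (PARTITION BY Recipient) AS EmailCount,
           ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC) AS rn
    FROM Messages
) ranked
WHERE rn = 1
ORDER BY Recipient
"""


def counts_and_latest_status(rows):
    """Return (Recipient, EmailCount, LastStatus) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Messages "
            "(ID INTEGER PRIMARY KEY, Recipient TEXT, Date TEXT, Status INTEGER)"
        )
        conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?)", rows)
        return conn.execute(SINGLE_PASS).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in counts_and_latest_status(ROWS):
        print(row)
```

Whether this performs better than the join-based answers depends on the data and indexes, so it is worth comparing query plans on the real table.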