2016-04-06 39 views
6

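Select a subset of rows that exceed a percentage of total values

I have a table of customers, users and revenue similar to the one below (in reality, thousands of records):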

Customer User Revenue 
001  James 500 
002  James 750 
003  James 450 
004  Sarah 100 
005  Sarah 500 
006  Sarah 150 
007  Sarah 600 
008  James 150 
009  James 100 

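What I want to do is return only the highest-spending customers that make up 80% of a user's total revenue.

To do this manually, I would order James's customers by their revenue, work out the percentage of the total and a running total percentage, then return only the records up to the point where the running total reaches 80%: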

Customer User Revenue  % of total Running Total % 
002   James 750   0.38  0.38 
001   James 500   0.26  0.64 
003   James 450   0.23  0.87 <- Greater than 80%, last record 
008   James 150   0.08  0.95 
009   James 100   0.05  1.00 

I have tried using CTEs but so far have come up blank. Is there a way to do this with a single query rather than doing it manually in an Excel sheet?
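For anyone who wants to reproduce the sample, a minimal setup script might look like the following. The table name `t` and the column types are assumptions (the question does not show the real schema), and Customer is stored as an integer here.

```sql
-- Hypothetical schema: the name t and the column types are assumed, not taken from the question.
CREATE TABLE t
(
    Customer INT         NOT NULL PRIMARY KEY,
    [User]   VARCHAR(50) NOT NULL,   -- User is a reserved word, so it is bracketed
    Revenue  INT         NOT NULL
);

-- Sample rows copied from the table in the question.
INSERT INTO t (Customer, [User], Revenue)
VALUES (1, 'James', 500),
       (2, 'James', 750),
       (3, 'James', 450),
       (4, 'Sarah', 100),
       (5, 'Sarah', 500),
       (6, 'Sarah', 150),
       (7, 'Sarah', 600),
       (8, 'James', 150),
       (9, 'James', 100);
```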

Answers

6

SQL Server 2012+ only

You can use a windowed SUM:

WITH cte AS 
(
    SELECT *, 
      1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY [User]) AS percentile, 
      1.0 * SUM(Revenue) OVER(PARTITION BY [User] ORDER BY [Revenue] DESC) 
       /SUM(Revenue) OVER(PARTITION BY [User]) AS running_percentile 
    FROM tab 
) 
SELECT * 
FROM cte 
WHERE running_percentile <= 0.8; 

LiveDemo
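The query above stops before the row that takes a user past 80%. A sketch of a SQL Server 2012+ variant that also keeps that crossing row: a row is returned when the running total before it is still below 80% of the user's total. The table name `t`, the tie-breaker on Customer and the explicit ROWS frame are assumptions added for illustration, not part of the original answer.

```sql
WITH cte AS
(
    SELECT Customer, [User], Revenue,
           -- Total revenue per user.
           SUM(Revenue) OVER (PARTITION BY [User]) AS total_revenue,
           -- Running revenue per user, highest revenue first.
           -- Customer breaks ties so the order is deterministic.
           SUM(Revenue) OVER (PARTITION BY [User]
                              ORDER BY Revenue DESC, Customer
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_revenue
    FROM t
)
SELECT Customer, [User], Revenue,
       1.0 * Revenue / NULLIF(total_revenue, 0)         AS percentile,
       1.0 * running_revenue / NULLIF(total_revenue, 0) AS running_percentile
FROM cte
-- running_revenue - Revenue is the running total before this row.
WHERE running_revenue - Revenue < 0.8 * total_revenue
ORDER BY [User], running_revenue;
```

For the sample data this should keep customers 2, 1 and 3 for James and customers 7 and 5 for Sarah. Comparing the raw sums instead of the divided percentages avoids rounding differences at the threshold.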


SQL Server 2008:

WITH cte AS 
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn 
    FROM t  
), cte2 AS 
(
    SELECT c.Customer, c.[User], c.[Revenue] 
      ,percentile   = 1.0 * Revenue/NULLIF(c3.s,0) 
      ,running_percentile = 1.0 * c2.s /NULLIF(c3.s,0) 
    FROM cte c 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM cte c2 
      WHERE c.[User] = c2.[User] 
      AND c2.rn <= c.rn) c2 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM cte c2 
      WHERE c.[User] = c2.[User]) AS c3 
) 
SELECT * 
FROM cte2 
WHERE running_percentile <= 0.8; 

LiveDemo2

Output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗ 
║ Customer ║ User ║ Revenue ║ percentile ║ running_percentile ║ 
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣ 
║  2 ║ James ║  750 ║ 0,384615384615 ║ 0,384615384615  ║ 
║  1 ║ James ║  500 ║ 0,256410256410 ║ 0,641025641025  ║ 
║  7 ║ Sarah ║  600 ║ 0,444444444444 ║ 0,444444444444  ║ 
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝ 

EDIT 2:

"That looks almost there; the only niggle is that it's missing the last row. The third row for James takes him over 0.80 but needs to be included."

WITH cte AS 
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn 
    FROM t  
), cte2 AS 
(
    SELECT c.Customer, c.[User], c.[Revenue] 
      ,percentile   = 1.0 * Revenue/NULLIF(c3.s,0) 
      ,running_percentile = 1.0 * c2.s /NULLIF(c3.s,0) 
    FROM cte c 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM cte c2 
      WHERE c.[User] = c2.[User] 
      AND c2.rn <= c.rn) c2 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM cte c2 
      WHERE c.[User] = c2.[User]) AS c3 
) 
SELECT a.* 
FROM cte2 a 
CROSS APPLY (SELECT MIN(running_percentile) AS rp 
      FROM cte2 
      WHERE running_percentile >= 0.8 
       AND cte2.[User] = a.[User]) AS s 
WHERE a.running_percentile <= s.rp; 

LiveDemo3

Output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗ 
║ Customer ║ User ║ Revenue ║ percentile ║ running_percentile ║ 
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣ 
║  2 ║ James ║  750 ║ 0,384615384615 ║ 0,384615384615  ║ 
║  1 ║ James ║  500 ║ 0,256410256410 ║ 0,641025641025  ║ 
║  3 ║ James ║  450 ║ 0,230769230769 ║ 0,871794871794  ║ 
║  7 ║ Sarah ║  600 ║ 0,444444444444 ║ 0,444444444444  ║ 
║  5 ║ Sarah ║  500 ║ 0,370370370370 ║ 0,814814814814  ║ 
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝ 

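"Looks perfect, translated to my larger table and it returns what I need. Spent a good 5 minutes working through it and still can't follow what you've done!"

SQL Server 2008 does not support everything in the OVER() clause (for example, ORDER BY with an aggregate such as SUM, which a running total needs), but it does support ROW_NUMBER.

The first CTE just calculates each row's position within its group: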

╔═══════════╦════════╦══════════╦════╗ 
║ Customer ║ User ║ Revenue ║ rn ║ 
╠═══════════╬════════╬══════════╬════╣ 
║  2 ║ James ║  750 ║ 1 ║ 
║  1 ║ James ║  500 ║ 2 ║ 
║  3 ║ James ║  450 ║ 3 ║ 
║  8 ║ James ║  150 ║ 4 ║ 
║  9 ║ James ║  100 ║ 5 ║ 
║  7 ║ Sarah ║  600 ║ 1 ║ 
║  5 ║ Sarah ║  500 ║ 2 ║ 
║  6 ║ Sarah ║  150 ║ 3 ║ 
║  4 ║ Sarah ║  100 ║ 4 ║ 
╚═══════════╩════════╩══════════╩════╝ 

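The second CTE:

  • the c2 subquery calculates the running total based on the rank from ROW_NUMBER
  • the c3 subquery calculates the full sum per user

In the final query, the s subquery finds the lowest running_percentile that reaches or exceeds 80% for each user; every row up to and including that one is returned.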

EDIT 3:

Using ROW_NUMBER is actually redundant.

WITH cte AS 
(
    SELECT c.Customer, c.[User], c.[Revenue] 
      ,percentile   = 1.0 * Revenue/NULLIF(c3.s,0) 
      ,running_percentile = 1.0 * c2.s /NULLIF(c3.s,0) 
    FROM t c 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM t c2 
      WHERE c.[User] = c2.[User] 
      AND c2.Revenue >= c.Revenue) c2 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM t c2 
      WHERE c.[User] = c2.[User]) AS c3 
) 
SELECT a.* 
FROM cte a 
CROSS APPLY (SELECT MIN(running_percentile) AS rp 
      FROM cte c2 
      WHERE running_percentile >= 0.8 
       AND c2.[User] = a.[User]) AS s 
WHERE a.running_percentile <= s.rp 
ORDER BY [User], Revenue DESC; 

LiveDemo4

+1

@bendataclear See the update – lad2025

+0

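Looks close; the only niggle is that it's missing the last row. The third row for James takes him over 0.80 but needs to be included. If this isn't possible, it's not a disaster. – bendataclear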

+1

@bendataclear Added :) – lad2025

0

In SQL Server 2012+, you would use a cumulative sum, which is much more efficient. In SQL Server 2008, you can do this with a correlated subquery or cross apply:

select t.*, 
     sum(t.Revenue*1.0)/sum(t.Revenue) over (partition by user) as [% of Total], 
     sum(RunningRevenue*1.0)/sum(t.Revenue) over (partition by user) as [Running Total %] 
from t cross apply 
    (select sum(Revenue) as RunningRevenue 
     from t t2 
     where t2.Revenue >= t.Revenue and t2.user = t.user 
    ) t2; 

Note: the *1.0 is just in case Revenue is stored as an integer. SQL Server does integer division, which would return 0 for both columns on almost all rows.
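A quick way to see the integer-division effect, using literal values taken from James's rows (500 out of a total of 1950):

```sql
-- Both operands are integers, so the result is truncated to 0.
SELECT 500 / 1950       AS integer_division,
       -- Multiplying by 1.0 first makes the expression decimal: roughly 0.2564.
       1.0 * 500 / 1950 AS decimal_division;
```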

EDIT:

Add where [User] = 'James' if you want results for James only.

+0

The '[% of Total]' column seems to work, but only for a single user; the running total seems to be all over the place. – bendataclear

+0

@bendataclear . . . Your original question had only one user. Adjusting the running total for a single user is trivial. Much simpler than lad's answer. –

+0

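The first 'sum' around 't.Revenue' is unnecessary. It won't work because there is no 'GROUP BY' (or I am missing something). Second, 'user' should be quoted as '[user]', otherwise you get an error. Third, 'SUM OVER()' calculates the percentage of the overall total rather than per user. And there is no filtering. – lad2025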
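Taking the points in the comment above into account, a corrected form of this answer's query might look like the sketch below. It is not the answer author's code: the alias `r` is illustrative, the table name `t` follows the answer, and filtering is still left out, as in the original.

```sql
SELECT t.*,
       -- No outer SUM around t.Revenue: there is no GROUP BY, only window aggregates.
       1.0 * t.Revenue / NULLIF(SUM(t.Revenue) OVER (PARTITION BY t.[User]), 0) AS [% of Total],
       1.0 * r.RunningRevenue / NULLIF(SUM(t.Revenue) OVER (PARTITION BY t.[User]), 0) AS [Running Total %]
FROM t
CROSS APPLY
    -- Running revenue: everything for the same user with revenue at least as high.
    (SELECT SUM(t2.Revenue) AS RunningRevenue
     FROM t AS t2
     WHERE t2.Revenue >= t.Revenue
       AND t2.[User] = t.[User]   -- [User] is bracketed because USER is reserved
    ) AS r
ORDER BY t.[User], t.Revenue DESC;
```

Because the subquery compares with >=, rows with the same Revenue for the same user get the same running total.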