2017-06-07

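SQL Server row_number() over partition, but ignoring repeated category values

I am trying to trace the distinct paths that customers take through several categories. A simplified view of my table looks like this: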

Table: customer_category 

CustomerID | Category | Date 
11111  | A   | 2016-01-01 
11111  | B   | 2016-02-01 
11111  | C   | 2016-03-01 
22222  | A   | 2016-01-01 
22222  | A   | 2016-02-01 
22222  | A   | 2016-03-01 
22222  | C   | 2016-04-01 
33333  | A   | 2016-01-01 
33333  | B   | 2016-02-01 
33333  | C   | 2016-03-01 
33333  | C   | 2016-04-01 

I can find the absolute paths with this query:

with cat_order as (
    -- number each customer's rows in date order
    select CustomerID
          ,Category
          ,row_number() over (partition by CustomerID order by Date) as rnk
    from customer_category
), pivoted as (
    -- one row per customer, one column per position in the path
    select CustomerID
          ,max(case when rnk = 1 then Category end) as category_1
          ,max(case when rnk = 2 then Category end) as category_2
          ,max(case when rnk = 3 then Category end) as category_3
          ,max(case when rnk = 4 then Category end) as category_4
    from cat_order
    group by CustomerID
)
-- count how many customers share each path
select category_1, category_2, category_3, category_4, count(*) as count
from pivoted
group by category_1, category_2, category_3, category_4;

This gives me the following:

category_1 | category_2 | category_3 | category_4 | count 
A   | B   | C   |    | 1 
A   | A   | A   | C   | 1 
A   | B   | C   | C   | 1 

What I want, though, is to ignore repeated categories, so that I see:

category_1 | category_2 | category_3 | category_4 | count 
A   | B   | C   |    | 2 
A   | C   |    |    | 1 

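Off the top of my head, I think I would need to:

  1. Omit any record where Category = LAG(Category)
  2. Rank over the partition...
  3. Pivot with case statements
  4. Aggregate the results

That feels overly complicated. Is there a simpler way to do this?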
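
As a sanity check on the expected output, the four steps can be traced outside the database. The following is a minimal Python sketch rather than T-SQL; the `rows` list simply mirrors the sample table, and `path_counts` is a hypothetical helper name introduced only for illustration.

```python
from collections import Counter
from itertools import groupby

# Sample rows copied from customer_category: (CustomerID, Category, Date)
rows = [
    (11111, "A", "2016-01-01"), (11111, "B", "2016-02-01"), (11111, "C", "2016-03-01"),
    (22222, "A", "2016-01-01"), (22222, "A", "2016-02-01"), (22222, "A", "2016-03-01"),
    (22222, "C", "2016-04-01"),
    (33333, "A", "2016-01-01"), (33333, "B", "2016-02-01"), (33333, "C", "2016-03-01"),
    (33333, "C", "2016-04-01"),
]

def path_counts(rows, width=4):
    """Count distinct category paths, collapsing consecutive repeats (hypothetical helper)."""
    counts = Counter()
    # ISO-formatted dates sort correctly as plain strings
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, customer_rows in groupby(ordered, key=lambda r: r[0]):
        # Step 1: keep one entry per run of identical consecutive categories
        path = [category for category, _ in groupby(r[1] for r in customer_rows)]
        # Steps 2-3: list position acts as the rank; pad to a fixed width like the pivot
        padded = tuple(path[:width] + [None] * (width - len(path)))
        # Step 4: aggregate identical paths
        counts[padded] += 1
    return counts

for path, n in path_counts(rows).items():
    print(path, n)
```

On the sample data this yields the path A, B, C twice and the path A, C once, which matches the desired result table above.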

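What do you mean by ignoring repeated categories: across all of 1, 2, 3, 4? In your expected result there is a C in category_2, but the base data has none there.

When I say "repeated categories", I am looking at how customer 22222 goes through the sequence A, A, A, C. I don't care that they had three separate measurements in category A, only that they were in A and then in C (without passing through B), whereas the other two progressed A → B → C.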

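Answer

As far as I can tell (based on your data and the output you want), there is no simple way to do this. To get the result you want, you essentially need to perform the four steps you listed (or some variation of them). That said, you can "simplify" it by writing it without CTEs. For example: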

SELECT category_1 = P.[1],
       category_2 = P.[2],
       category_3 = P.[3],
       category_4 = P.[4],
       [Count]    = COUNT(*)
FROM
(
    -- rnk: running total of checkprev, so consecutive repeats share one rank
    SELECT CustomerID,
           Category,
           rnk = SUM(checkprev) OVER (PARTITION BY CustomerID ORDER BY [Date])
    FROM
    (
        -- checkprev: 0 when the category repeats the previous row's, otherwise 1
        SELECT *,
               checkprev = CASE WHEN LAG(Category) OVER (PARTITION BY CustomerID ORDER BY [Date]) = Category
                                THEN 0 ELSE 1 END
        FROM customer_category
    ) AS Flagged
) AS Ranked
PIVOT
(
    MAX(Category) FOR rnk IN ([1], [2], [3], [4])
) AS P
GROUP BY P.[1], P.[2], P.[3], P.[4];
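
The way the `checkprev` flag and the running `SUM` interact can be traced by hand for a single customer. The Python sketch below only illustrates that logic and is not part of the query; `dense_positions` is a hypothetical name, and the input is assumed to be one customer's categories already sorted by date.

```python
from itertools import accumulate

def dense_positions(categories):
    """Mimic checkprev + running SUM for one customer's date-ordered categories."""
    # checkprev: 1 when the category differs from the previous row
    # (or there is no previous row), otherwise 0
    flags = [
        0 if i > 0 and categories[i - 1] == cat else 1
        for i, cat in enumerate(categories)
    ]
    # rnk: running total of the flags, so rows in the same run share one rank
    return list(accumulate(flags))

# Customer 22222 from the sample data: A, A, A, C
print(dense_positions(["A", "A", "A", "C"]))  # [1, 1, 1, 2]
```

Because the repeated rows share a rank, `MAX(Category)` in the pivot sees a single distinct value per rank, which is why no separate filtering step is needed.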