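Here is what might be a "good enough" approach to the problem.
Along each of the three dimensions, find the minimum row id for that dimension (with special handling for NULL). The overall customer identifier is then the smallest of these three ids. To make the identifiers sequential with no gaps, use dense_rank().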
with ids as (
      -- Per dimension: the smallest RowId sharing this value (NULL values stay NULL).
      select t.*,
             (case when SSN is not null
                   then min(RowId) over (partition by SSN)
              end) as SSN_id,
             (case when License is not null
                   then min(RowId) over (partition by License)
              end) as License_id,
             (case when SystemId is not null
                   then min(RowId) over (partition by SystemId)
              end) as SystemId_id
      from t
     ),
     leastid as (
      -- Smallest of the three ids, ignoring NULLs.
      select ids.*,
             (case when SSN_Id <= coalesce(License_Id, SSN_Id) and
                        SSN_Id <= coalesce(SystemId_id, SSN_Id)
                   then SSN_Id
                   when License_Id <= coalesce(SystemId_id, License_Id)
                   then License_Id
                   else SystemId_id
              end) as LeastId
      from ids
     )
select Source, RowID, SSN, License, SystemID,
       dense_rank() over (order by LeastId) as MapCustomerId
from leastid;
This is not a complete solution, but it works for your data. It does not work in a situation such as:
A |1 |SSN1|Lic111 | |1
A |2 |SSN1| |Sys666 |2
A |3 | | |Sys666 |2
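because this requires two "hops": row 3 reaches row 2 through Sys666, and row 2 reaches row 1 through SSN1, so a single pass never connects row 3 to row 1.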
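When I have faced this situation in the past, I created extra columns in the table and repeatedly used update to get the minimum id along the different dimensions. This kind of iteration connects the different pieces quickly. It is probably possible to write a recursive CTE that does the same thing. However, the simpler solution above may solve your problem.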
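To make the iterative idea concrete, here is a minimal sketch of it in Python rather than SQL. It is an illustration under assumptions, not the update statements actually used: the function name `assign_customer_ids` and the tuple layout `(RowId, SSN, License, SystemId)` are made up for this example, and `None` stands in for NULL. Each pass plays the role of one round of updates: for every non-NULL value in a dimension, all rows sharing it take the smallest label currently seen among them, and the passes repeat until nothing changes.

```python
def assign_customer_ids(rows):
    """rows: list of (row_id, ssn, license, system_id); None means NULL.

    Returns {row_id: customer_id} with gap-free customer ids.
    """
    # Every row starts out labelled with its own row id.
    label = {row[0]: row[0] for row in rows}
    changed = True
    while changed:
        changed = False
        # One "update" per dimension: SSN (1), License (2), SystemId (3).
        for dim in (1, 2, 3):
            # Smallest current label among the rows sharing each non-NULL value.
            smallest = {}
            for row in rows:
                key = row[dim]
                if key is not None:
                    current = label[row[0]]
                    smallest[key] = min(smallest.get(key, current), current)
            # Pull every row down to the smallest label in its group.
            for row in rows:
                key = row[dim]
                if key is not None and smallest[key] < label[row[0]]:
                    label[row[0]] = smallest[key]
                    changed = True
    # Renumber the surviving labels without gaps, like dense_rank().
    rank = {lab: i + 1 for i, lab in enumerate(sorted(set(label.values())))}
    return {row_id: rank[lab] for row_id, lab in label.items()}


# The three-row case that needs two hops.
two_hop_rows = [
    (1, "SSN1", "Lic111", None),
    (2, "SSN1", None, "Sys666"),
    (3, None, None, "Sys666"),
]
print(assign_customer_ids(two_hop_rows))  # {1: 1, 2: 1, 3: 1}
```

Because labels only ever decrease and the loop stops once a full pass changes nothing, rows linked through any chain of shared values end up with the same customer id, however many hops the chain has.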
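EDIT:
Because I have faced this problem before, I wanted to come up with a single-query solution (rather than iterating through updates). This is possible using a recursive CTE. Here is code that seems to work: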
with t as (
      -- Sample data
      select 'A' as source, 1 as RowId, 'SSN1' as SSN, 'Lic111' as License, 'ABC' as SystemId union all
      select 'A', 2, 'SSN1', NULL, 'Sys666' union all
      select 'A', 3, NULL, NULL, 'Sys666' union all
      select 'A', 4, NULL, 'Lic222', 'Sys666' union all
      select 'A', 5, NULL, 'Lic222', NULL union all
      select 'A', 6, NULL, 'Lic444', NULL
     ),
     first as (
      -- One hop: the smallest RowId sharing any of the three values with this row.
      select t.*,
             (select min(RowId)
              from t t2
              where t2.SSN = t.SSN or
                    t2.License = t.License or
                    t2.SystemId = t.SystemId
             ) as minrowid
      from t
     ),
     cte as (
      select rowid, minrowid
      from first
      union all
      -- Further hops: follow minrowid while it still leads to a smaller id.
      select cte.rowid, first.minrowid
      from cte join
           first
           on cte.minrowid = first.rowid and
              cte.minrowid > first.minrowid
     ),
     lookup as (
      -- Keep the smallest id reached for each row and number the groups without gaps.
      select rowid, min(minrowid) as minrowid,
             dense_rank() over (order by min(minrowid)) as MapCustomerId
      from cte
      group by rowid
     )
select t.*, lookup.MapCustomerId
from t join
     lookup
     on t.rowid = lookup.rowid;
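Thanks Gordon. This gives me a good starting point. I do have the "two hop" cases you mentioned, and some with even more hops. Could you explain a bit more? I'm not sure what you mean by getting the minimum id along the different dimensions. Would I compare the minimum id with the MapCustID column? Thank you very much for your help. – user2793572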