Optimized aggregation of tree branch data (recursion) in SQL Server 2008

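I have a table that contains the stages and sub-stages of certain projects, and a table with specific tasks and their estimated costs.
I need some way to aggregate each level (stage/sub-stage) to see how much it costs, and to do it at the lowest possible performance cost.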

To illustrate this, I will use the following data structure:

CREATE TABLE stage 
(
    id int not null, 
    fk_parent int 
) 

CREATE TABLE task 
(
    id int not null, 
    fk_stage int not null, 
    cost decimal(18,2) not null default 0 
)

With the following data:

==stage== 
id fk_parent 
1 null 
2 1 
3 1 

==task== 
id fk_stage cost 
1 2   100 
1 2   200 
1 3   600

I want to get a table containing the total cost of each branch. Something like this:

Stage ID  Total Cost 
1    900 
2    300 
3    600

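But I also want it to be efficient. I don't want to end up with an extremely bad solution like The worst algorithm in the world. What I mean is this: if I request the data (total cost) for all items in the stage table, each total cost will be evaluated D times, where D is the depth (level) of that item in the tree. I'm afraid performance will be very poor on large amounts of data with many levels.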
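For reference, the straightforward set-based version of this rollup could look like the sketch below. It is only an illustration of the concern above, not a recommended solution: the recursive CTE (the name cteClosure is hypothetical) pairs every stage with itself and each of its descendants, so a stage at depth D appears in D rows and its task costs are summed D times.

```sql
-- Sketch only: cteClosure is a hypothetical name, not part of the question.
-- Each stage is paired with itself and with every descendant stage.
;with cteClosure as
(
    select s.id as ancestor_id,
           s.id as descendant_id
    from stage as s
    union all
    select c.ancestor_id,
           s.id
    from cteClosure as c
    inner join stage as s
        on s.fk_parent = c.descendant_id
)
-- Sum the task costs of all descendants per ancestor stage.
select c.ancestor_id as [Stage ID],
       isnull(sum(t.cost), 0) as [Total Cost]
from cteClosure as c
left outer join task as t
    on t.fk_stage = c.descendant_id
group by c.ancestor_id
order by c.ancestor_id
```

On the sample data this produces the 900 / 300 / 600 table shown above, but the intermediate row count grows with the depth of the tree.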

So I decided to do something, which is what made me ask this question here.
I decided to add 2 more columns to the stage table, for caching.

... 
calculated_cost decimal(18,2), 
date_calculated_cost datetime 
...

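What I wanted to do is pass another variable through the code: a datetime value equal to the time when the process started (practically unique). That way, if a stage row already has a date_calculated_cost equal to the value I'm carrying, I don't calculate it again and just return its calculated_cost.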
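A minimal sketch of that cache check, assuming the two columns are added with ALTER TABLE. The variable names @run_started and @stage_id are hypothetical and only illustrate the idea; this is not a complete solution.

```sql
-- Assumption: the cache columns are added exactly as listed above.
alter table stage add
    calculated_cost decimal(18,2) null,
    date_calculated_cost datetime null
go  -- batch separator (SSMS/sqlcmd), so the new columns are visible below

declare @run_started datetime  -- hypothetical: timestamp carried through one run
declare @stage_id int          -- hypothetical: stage being evaluated
set @run_started = getdate()
set @stage_id = 1

-- Cache hit: a row comes back only if this stage was already
-- calculated during the current run.
select s.calculated_cost
from stage as s
where s.id = @stage_id
  and s.date_calculated_cost = @run_started
```

If the query returns no row, the cost would have to be calculated and the row stamped with @run_started, which is the part that requires an update to stage.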

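I couldn't do it with functions (the stage table needs to be updated once the costs are calculated).
I couldn't do it with procedures (recursion inside running cursors is a no-go).
I'm not sure temporary tables are suitable, because they wouldn't allow concurrent requests to the same procedure (which is very unlikely, but I want to do it the right way anyway).
I couldn't figure out any other way.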

I'm not expecting a definitive answer to my question, but I will reward any good idea, and the best one will be chosen as the answer.

Source

2011-08-10 AlexanderMP

1. A way to query the tables to get the aggregated cost.

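Calculate the cost for each stage.
Use a recursive CTE to get the level of each stage.
Store the result in a temp table.
Add a couple of indexes to the temp table.
Update the cost in the temp table in a loop, once for each level.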

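The first three steps are combined into one statement. It might be good for performance to run the first calculation, cteCost, into a temp table of its own and use that temp table in the recursive cteLevel.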

;with cteCost as 
(
    select s.id, 
     s.fk_parent, 
     isnull(sum(t.cost), 0) as cost 
    from stage as s 
    left outer join task as t 
     on s.id = t.fk_stage 
    group by s.id, s.fk_parent 
), 
cteLevel as 
(
    select cc.id, 
     cc.fk_parent, 
     cc.cost, 
     1 as lvl 
    from cteCost as cc 
    where cc.fk_parent is null 
    union all 
    select cc.id, 
     cc.fk_parent, 
     cc.cost, 
     lvl+1 
    from cteCost as cc 
    inner join cteLevel as cl 
     on cc.fk_parent = cl.id  
) 
select * 
into #task 
from cteLevel 

create clustered index IX_id on #task (id) 
create index IX_lvl on #task (lvl, fk_parent) 

declare @lvl int 
select @lvl = max(lvl) 
from #task 

while @lvl > 0 
begin 

    update T1 set 
    T1.cost = T1.cost + T2.cost 
    from #task as T1 
    inner join (select fk_parent, sum(cost) as cost 
       from #task 
       where lvl = @lvl 
       group by fk_parent) as T2 
     on T1.id = T2.fk_parent 

    set @lvl = @lvl - 1 
end 

select id as [Stage ID], 
     cost as [Total Cost] 
from #task 

drop table #task
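The variation mentioned above, materializing cteCost before the recursion, could be sketched as follows. The table name #cost and the index name IX_cost_parent are hypothetical; whether this is actually faster depends on the data and should be measured.

```sql
-- Step 1: direct (not yet rolled-up) cost per stage, in its own temp table.
select s.id,
       s.fk_parent,
       isnull(sum(t.cost), 0) as cost
into #cost
from stage as s
left outer join task as t
    on s.id = t.fk_stage
group by s.id, s.fk_parent

-- Index chosen to support the join on fk_parent in the recursion (an assumption).
create clustered index IX_cost_parent on #cost (fk_parent, id)

-- Step 2: the recursive CTE now reads #cost instead of re-aggregating task.
;with cteLevel as
(
    select c.id,
           c.fk_parent,
           c.cost,
           1 as lvl
    from #cost as c
    where c.fk_parent is null
    union all
    select c.id,
           c.fk_parent,
           c.cost,
           cl.lvl + 1
    from #cost as c
    inner join cteLevel as cl
        on c.fk_parent = cl.id
)
select *
into #task
from cteLevel

drop table #cost
```

From here the indexes on #task and the per-level update loop stay the same as in the code above. Both #cost and #task are local temporary tables, which are visible only to the session that created them.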

2. A trigger on table task that maintains a calculated_cost field in stage.

create trigger tr_task 
on task 
after insert, update, delete 
as 
    -- Table to hold the updates 
    declare @T table 
    (
        id int not null, 
        cost decimal(18,2) not null default 0 
    ) 

    -- Get the updates from inserted and deleted tables 
    insert into @T (id, cost) 
    select fk_stage, sum(cost) 
    from (
      select fk_stage, cost 
      from inserted 
      union all 
      select fk_stage, -cost 
      from deleted 
     ) as T 
    group by fk_stage 

    declare @id int 
    select @id = min(id) 
    from @T 

    -- For each updated row 
    while @id is not null 
    begin 

    -- Recursive update of stage 
    with cte as 
    (
     select s.id, 
      s.fk_parent 
     from stage as s 
     where id = @id 
     union all 
     select s.id, 
      s.fk_parent 
     from stage as s 
     inner join cte as c 
      on s.id = c.fk_parent  
    ) 
    update s set 
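     -- isnull guards against a calculated_cost that has never been set (NULL + cost is NULL)
     calculated_cost = isnull(s.calculated_cost, 0) + t.cost 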
    from stage as s 
     inner join cte as c 
     on s.id = c.id 
     cross apply (select cost 
        from @T 
        where id = @id) as t 

    -- Get the next id 
    select @id = min(id) 
    from @T 
    where id > @id 
    end
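The trigger only applies deltas, so calculated_cost needs a starting value for tasks that already exist. A possible one-time seed followed by a quick check is sketched below. It assumes stage already has the calculated_cost column from the question; cteClosure and the new task row (id 4, cost 50) are hypothetical.

```sql
-- One-time seed: total cost of every stage including all of its descendants.
;with cteClosure as
(
    select s.id as ancestor_id,
           s.id as descendant_id
    from stage as s
    union all
    select c.ancestor_id,
           s.id
    from cteClosure as c
    inner join stage as s
        on s.fk_parent = c.descendant_id
)
update s set
    calculated_cost = x.total
from stage as s
inner join (select c.ancestor_id,
                   isnull(sum(t.cost), 0) as total
            from cteClosure as c
            left outer join task as t
                on t.fk_stage = c.descendant_id
            group by c.ancestor_id) as x
    on x.ancestor_id = s.id

-- Hypothetical new task on stage 3: the trigger should add 50
-- to stage 3 and to its parent, stage 1.
insert into task (id, fk_stage, cost) values (4, 3, 50)

select id, calculated_cost
from stage
order by id
```

With the sample data from the question, the final select should show 950 for stage 1, 300 for stage 2 and 650 for stage 3.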

Source

2011-08-11 06:47:11

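While waiting for an answer, I did solve my problem (I think). I added a couple of fields, I calculate the 'level' of each stage in a trigger, and then I run a cursor over all stages ordered by level descending, which gives the desired result. All of this is done inside a transaction that locks all the resources, so that no leaf in the tree can be modified. It seems to be working, but I need to finish the integration part to get some real data and test it, and then I will post it here. Your answer seems correct and is very interesting to me. Thank you very much for your time. – AlexanderMP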
