11

我使用PostgreSQL 9.1查询由连接节点的边(或元素)组成的分层树结构数据。数据实际上是用于流式网络,但是我已经将问题抽象为简单的数据类型。考虑示例tree表。每条边都有长度和面积属性,这些属性用于确定网络中的一些有用的指标。如何使用递归查询向后遍历分层树结构结构

CREATE TEMP TABLE tree (
    edge text PRIMARY KEY, 
    from_node integer UNIQUE NOT NULL, -- can also act as PK 
    to_node integer REFERENCES tree (from_node), 
    mode character varying(5), -- redundant, but illustrative 
    length numeric NOT NULL, 
    area numeric NOT NULL, 
    fwd_path text[], -- optional ordered sequence, useful for debugging 
    fwd_search_depth integer, 
    fwd_length numeric, 
    rev_path text[], -- optional unordered set, useful for debugging 
    rev_search_depth integer, 
    rev_length numeric, 
    rev_area numeric 
); 
CREATE INDEX ON tree (to_node); 
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES 
('A', 1, 4, 'start', 1.1, 0.9), 
('B', 2, 4, 'start', 1.2, 1.3), 
('C', 3, 5, 'start', 1.8, 2.4), 
('D', 4, 5, NULL, 1.2, 1.3), 
('E', 5, NULL, 'end', 1.1, 0.9); 

可以在下面说明,其中由A-E表示的边与节点1-5连接。 NULL to_node(Ø)表示末端节点。 from_node始终是唯一的,所以它可以充当PK。如果该网络像drainage basin流动,这将是从顶部到底部,其中起始支流边缘是A,B,C和结束流出边缘E.

tree

documentation for WITH提供了一个很好例子如何在递归查询中使用搜索图。因此,为了得到“转发”信息,查询开始的结束,倒过来:

WITH RECURSIVE search_graph AS (
    -- Begin at ending nodes 
    SELECT E.from_node, 1 AS search_depth, E.length 
    , ARRAY[E.edge] AS path -- optional 
    FROM tree E WHERE E.to_node IS NULL 

    UNION ALL 
    -- Accumulate each edge, working backwards (upstream) 
    SELECT o.from_node, sg.search_depth + 1, sg.length + o.length 
    , o.edge|| sg.path -- optional 
    FROM tree o, search_graph sg 
    WHERE o.to_node = sg.from_node 
) 
UPDATE tree SET 
    fwd_path = sg.path, 
    fwd_search_depth = sg.search_depth, 
    fwd_length = sg.length 
FROM search_graph AS sg WHERE sg.from_node = tree.from_node; 

SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length 
FROM tree ORDER BY edge; 

edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length 
------+-----------+---------+----------+------------------+------------ 
A |   1 |  4 | {A,D,E} |    3 |  3.4 
B |   2 |  4 | {B,D,E} |    3 |  3.5 
C |   3 |  5 | {C,E} |    2 |  2.9 
D |   4 |  5 | {D,E} |    2 |  2.3 
E |   5 |   | {E}  |    1 |  1.1 

以上是有道理的,并且很好地扩展用于大型网络。例如,我可以看到边缘B从末端开始是3条边,正向路径是{B,D,E},从末端到末尾的总长度为3.5。

但是,我找不出一个好方法来构建反向查询。也就是说,从每个边缘,累积的“上游”边缘,长度和面积是多少。使用WITH RECURSIVE,我有最好是:

WITH RECURSIVE search_graph AS (
    -- Begin at starting nodes 
    SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area 
    , ARRAY[S.edge] AS path -- optional 
    FROM tree S WHERE from_node IN (
    -- Starting nodes have a from_node without any to_node 
    SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree) 

    UNION ALL 
    -- Accumulate edges, working forwards 
    SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area 
     , c.edge || sg.path -- optional 
    FROM tree c, search_graph sg 
    WHERE c.from_node = sg.to_node 
) 
UPDATE tree SET 
    rev_path = sg.path, 
    rev_search_depth = sg.search_depth, 
    rev_length = sg.length, 
    rev_area = sg.area 
FROM search_graph AS sg WHERE sg.from_node = tree.from_node; 

SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area 
FROM tree ORDER BY edge; 

edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area 
------+-----------+---------+----------+------------------+------------+---------- 
A |   1 |  4 | {A}  |    1 |  1.1 |  0.9 
B |   2 |  4 | {B}  |    1 |  1.2 |  1.3 
C |   3 |  5 | {C}  |    1 |  1.8 |  2.4 
D |   4 |  5 | {D,A} |    2 |  2.3 |  2.2 
E |   5 |   | {E,C} |    2 |  2.9 |  3.3 

我想建立聚集到递归查询的第二项中,由于每个下游边缘连接到1或许多上游的边缘,但aggregates are not allowed with recursive queries。另外,我知道这个连接是马虎,因为with recursive结果有edge的多个连接条件。

的反向预期结果/向后查询:

edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area 
------+-----------+---------+-------------+------------------+------------+---------- 
A |   1 |  4 | {A}   |    1 |  1.1 |  0.9 
B |   2 |  4 | {B}   |    1 |  1.2 |  1.3 
C |   3 |  5 | {C}   |    1 |  1.8 |  2.4 
D |   4 |  5 | {A,B,D}  |    3 |  3.5 |  3.5 
E |   5 |   | {A,B,C,D,E} |    5 |  6.4 |  6.8 

如何我写这篇文章的更新查询?

请注意,我最终更关心积累准确的长度和面积总计,并且路径属性用于调试。在我的现实案例中,转发路径高达数百,我预计大型和复杂流域的成千上万的反向路径。

+0

你的数据模型试图边和顶点合并成一个表。拆分它们将产生更清洁的模型,IMO。 – wildplasser 2015-02-25 14:08:41

+0

@wildplasser如果你觉得有帮助,很容易分裂它。你也可以忽略'edge',因为'from_node'是唯一的。 from_node在逻辑上是 – 2015-02-26 23:20:23

+1

的主键。 to_node应该是外键引用树(from_node)边在功能上依赖于from_node。通常情况下,根的to_node设置为NULL(根没有父项),这意味着'E'段消失或者您需要额外的节点{from_node = 6,to_node = NULL}'mode'字段更多或较少冗余:它仅指示存在指向/来自“this”节点的记录。 – wildplasser 2015-02-26 23:33:33

回答

5

UPDATE 2: 我重写了原始的递归查询,以便所有的累积/聚合都在递归部分之外完成。它应该比此答案的以前版本表现更好。 对于类似的问题,这与@a_horse_with_no_name的answer非常相似。

WITH 
    RECURSIVE search_graph(edge, from_node, to_node, length, area, start_node) AS 
    (
     SELECT edge, from_node, to_node, length, area, from_node AS "start_node" 
     FROM tree 
     UNION ALL 
     SELECT o.edge, o.from_node, o.to_node, o.length, o.area, p.start_node 
     FROM tree o 
    JOIN search_graph p ON p.from_node = o.to_node 
    ) 
    SELECT array_agg(edge) AS "edges" 
     -- ,array_agg(from_node) AS "nodes" 
      ,count(edge) AS "edge_count" 
      ,sum(length) AS "length_sum" 
      ,sum(area) AS "area_sum" 
    FROM search_graph 
    GROUP BY start_node 
    ORDER BY start_node 
; 

结果正如所料:

start_node | edges  | edge_count | length_sum | area_sum 
------------+-------------+------------+------------+------------ 
    1   | {A}   |   1 |  1.1 |  0.9 
    2   | {B}   |   1 |  1.2 |  1.3 
    3   | {C}   |   1 |  1.8 |  2.4 
    4   | {D,B,A}  |   3 |  3.5 |  3.5 
    5   | {E,D,C,B,A} |   5 |  6.4 |  6.8