我使用PostgreSQL 9.1查询由连接节点的边(或元素)组成的分层树结构数据。数据实际上是用于流式网络,但是我已经将问题抽象为简单的数据类型。考虑示例tree
表。每条边都有长度和面积属性,这些属性用于确定网络中的一些有用的指标。如何使用递归查询向后遍历分层树结构结构
CREATE TEMP TABLE tree (
edge text PRIMARY KEY,
from_node integer UNIQUE NOT NULL, -- can also act as PK
to_node integer REFERENCES tree (from_node),
mode character varying(5), -- redundant, but illustrative
length numeric NOT NULL,
area numeric NOT NULL,
fwd_path text[], -- optional ordered sequence, useful for debugging
fwd_search_depth integer,
fwd_length numeric,
rev_path text[], -- optional unordered set, useful for debugging
rev_search_depth integer,
rev_length numeric,
rev_area numeric
);
CREATE INDEX ON tree (to_node);
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES
('A', 1, 4, 'start', 1.1, 0.9),
('B', 2, 4, 'start', 1.2, 1.3),
('C', 3, 5, 'start', 1.8, 2.4),
('D', 4, 5, NULL, 1.2, 1.3),
('E', 5, NULL, 'end', 1.1, 0.9);
可以在下面说明,其中由A-E表示的边与节点1-5连接。 NULL to_node
(Ø)表示末端节点。 from_node
始终是唯一的,所以它可以充当PK。如果该网络像drainage basin流动,这将是从顶部到底部,其中起始支流边缘是A,B,C和结束流出边缘E.
的documentation for WITH
提供了一个很好例子如何在递归查询中使用搜索图。因此,为了得到“转发”信息,查询开始的结束,倒过来:
WITH RECURSIVE search_graph AS (
-- Begin at ending nodes
SELECT E.from_node, 1 AS search_depth, E.length
, ARRAY[E.edge] AS path -- optional
FROM tree E WHERE E.to_node IS NULL
UNION ALL
-- Accumulate each edge, working backwards (upstream)
SELECT o.from_node, sg.search_depth + 1, sg.length + o.length
, o.edge|| sg.path -- optional
FROM tree o, search_graph sg
WHERE o.to_node = sg.from_node
)
UPDATE tree SET
fwd_path = sg.path,
fwd_search_depth = sg.search_depth,
fwd_length = sg.length
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length
FROM tree ORDER BY edge;
edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------+-----------+---------+----------+------------------+------------
A | 1 | 4 | {A,D,E} | 3 | 3.4
B | 2 | 4 | {B,D,E} | 3 | 3.5
C | 3 | 5 | {C,E} | 2 | 2.9
D | 4 | 5 | {D,E} | 2 | 2.3
E | 5 | | {E} | 1 | 1.1
以上是有道理的,并且很好地扩展用于大型网络。例如,我可以看到边缘B
从末端开始是3条边,正向路径是{B,D,E}
,从末端到末尾的总长度为3.5。
但是,我找不出一个好方法来构建反向查询。也就是说,从每个边缘,累积的“上游”边缘,长度和面积是多少。使用WITH RECURSIVE
,我有最好是:
WITH RECURSIVE search_graph AS (
-- Begin at starting nodes
SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area
, ARRAY[S.edge] AS path -- optional
FROM tree S WHERE from_node IN (
-- Starting nodes have a from_node without any to_node
SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree)
UNION ALL
-- Accumulate edges, working forwards
SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area
, c.edge || sg.path -- optional
FROM tree c, search_graph sg
WHERE c.from_node = sg.to_node
)
UPDATE tree SET
rev_path = sg.path,
rev_search_depth = sg.search_depth,
rev_length = sg.length,
rev_area = sg.area
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area
FROM tree ORDER BY edge;
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+----------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {D,A} | 2 | 2.3 | 2.2
E | 5 | | {E,C} | 2 | 2.9 | 3.3
我想建立聚集到递归查询的第二项中,由于每个下游边缘连接到1或许多上游的边缘,但aggregates are not allowed with recursive queries。另外,我知道这个连接是马虎,因为with recursive
结果有edge
的多个连接条件。
的反向预期结果/向后查询:
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+-------------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {A,B,D} | 3 | 3.5 | 3.5
E | 5 | | {A,B,C,D,E} | 5 | 6.4 | 6.8
如何我写这篇文章的更新查询?
请注意,我最终更关心积累准确的长度和面积总计,并且路径属性用于调试。在我的现实案例中,转发路径高达数百,我预计大型和复杂流域的成千上万的反向路径。
你的数据模型试图边和顶点合并成一个表。拆分它们将产生更清洁的模型,IMO。 – wildplasser 2015-02-25 14:08:41
@wildplasser如果你觉得有帮助,很容易分裂它。你也可以忽略'edge',因为'from_node'是唯一的。 from_node在逻辑上是 – 2015-02-26 23:20:23
的主键。 to_node应该是外键引用树(from_node)边在功能上依赖于from_node。通常情况下,根的to_node设置为NULL(根没有父项),这意味着'E'段消失或者您需要额外的节点{from_node = 6,to_node = NULL}'mode'字段更多或较少冗余:它仅指示存在指向/来自“this”节点的记录。 – wildplasser 2015-02-26 23:33:33