2010-11-02 64 views
2

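Recursive query problem: trying not to select the same connection multiple times

I tried to retrieve the cluster in the following way, using a CHARINDEX() = 0 condition: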

WITH Cluster(calling_party, called_party, link_strength, Path) 
AS 
(SELECT 
    calling_party, 
    called_party, 
    link_strength, 
    CONVERT(varchar(max), calling_party + '.' + called_party) AS Path 
FROM 
    monthly_connections_test 
WHERE 
    link_strength > 0.1 AND 
    calling_party = 'b' 
UNION ALL 
SELECT 
    mc.calling_party, 
    mc.called_party, 
    mc.link_strength, 
    CONVERT(varchar(max), cl.Path + '.' + mc.calling_party + '.' + mc.called_party) AS Path 
FROM 
    monthly_connections_test mc 
INNER JOIN Cluster cl ON 
    (
     mc.called_party = cl.called_party OR 
     mc.called_party = cl.calling_party 
    ) AND 
    (
     CHARINDEX(cl.called_party + '.' + mc.calling_party, Path) = 0 AND 
     CHARINDEX(cl.called_party + '.' + mc.called_party, Path) = 0 
    ) 
WHERE 
    mc.link_strength > 0.1 
) 
SELECT 
    calling_party, 
    called_party, 
    link_strength, 
    Path 
FROM 
    Cluster OPTION (maxrecursion 30000) 

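However, the condition does not serve its purpose, because the same rows are still selected multiple times.

The actual goal here is to retrieve the whole cluster of connections to which the selected user (user b in the example) belongs.

EDIT1:

I tried to modify the query as follows: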

With combined_users AS 
(SELECT calling_party CALLING, called_party CALLED, link_strength FROM dbo.monthly_connections_test WHERE link_strength > 0.1), 
related_users1 AS 
(
SELECT c.CALLING, c.CALLED, c.link_strength, CONVERT(varchar(max), '.' + c.CALLING + '.' + c.CALLED + '.') path from combined_users c where CALLING = 'a1' 
UNION ALL 
SELECT c.CALLING, c.CALLED, c.link_strength, 
    convert(varchar(max),r.path + c.CALLED + '.') path 
     from combined_users c 
     join related_users1 r 
     ON (c.CALLING = r.CALLED) and CHARINDEX(c.CALLING + '.' + c.CALLED + '.', r.path)= 0 

     ), 
related_users2 AS 
(
SELECT c.CALLING, c.CALLED, c.link_strength, CONVERT(varchar(max), '.' + c.CALLING + '.' + c.CALLED + '.') path from combined_users c where CALLED = 'a1' 
UNION ALL 
SELECT c.CALLING, c.CALLED, c.link_strength, 
    convert(varchar(max),r.path + c.CALLING + '.') path 
     from combined_users c 
     join related_users2 r 
     ON c.CALLED = r.CALLING and CHARINDEX('.' + c.CALLING + '.' + c.CALLED, r.path)= 0 
) 
     SELECT CALLING, CALLED, link_strength, path FROM 
     (SELECT CALLING, CALLED, link_strength, path FROM related_users1 UNION SELECT CALLING, CALLED, link_strength, path FROM related_users2) r OPTION (MAXRECURSION 30000) 

For testing I created the following cluster:

[image: diagram of the test cluster]

The query above returned the following table:

a1 a2 1.0000000 .a1.a2. 
a11 a13 1.0000000 .a12.a1.a13.a11. 
a12 a1 1.0000000 .a12.a1. 
a13 a12 1.0000000 .a12.a1.a13. 
a14 a13 1.0000000 .a12.a1.a13.a14. 
a15 a14 1.0000000 .a12.a1.a13.a14.a15. 
a2 a10 1.0000000 .a1.a2.a10. 
a2 a3 1.0000000 .a1.a2.a3. 
a3 a4 1.0000000 .a1.a2.a3.a4. 
a3 a6 1.0000000 .a1.a2.a3.a6. 
a4 a8 1.0000000 .a1.a2.a3.a4.a8. 
a4 a9 1.0000000 .a1.a2.a3.a4.a9. 

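The query evidently returns the connections that lead toward the selected node and the connections that lead away from it. The problem is a change of direction: for example, the connections a7, a4 and a11, a10 are not selected, because the direction changes relative to the starting node.

Does anyone know how to modify the query so that it includes all connections?

Thanks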

+0

Can you give some sample data and what you expect to see? – 2010-11-02 15:26:16

Answers

1

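OK, there are a few things to discuss here.

Zerothly, I have PostgreSQL, so that is what all of this was done with; I have tried to use only standard SQL, so it should work on SQL Server too.

First, if you are only interested in calls with a link strength greater than 0.1, let's say so: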

-- like calls, but only strong enough to be interesting 
create view strong_calls (calling_party, called_party, link_strength) 
as (
    select calling_party, called_party, link_strength 
    from monthly_connections_test 
    where link_strength > 0.1 
); 

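From now on, we'll talk in terms of this view.

Second, you say:

The actual goal here is to retrieve the whole cluster of connections to which the selected user (user b in the example) belongs.

If that's true, why do you want to compute the paths? If you just want to know the set of connections, you can do this: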

with recursive cluster (calling_party, called_party, link_strength) 
as (
    (
    select calling_party, called_party, link_strength 
    from strong_calls 
    where calling_party = 'b' 
) 
    union 
    (
    select c.calling_party, c.called_party, c.link_strength 
    from cluster this, strong_calls c 
    where c.calling_party = this.called_party 
    or c.called_party = this.calling_party 
) 
) 
select * 
from cluster; 
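
The recursive step above joins on c.calling_party = this.called_party or c.called_party = this.calling_party, so a call is followed only while the direction stays the same. If a call should join the cluster whenever it shares either party with a call already in it, the join condition can be widened. The following is a minimal sketch of that idea, not the query from this answer: it assumes SQLite (driven from Python's sqlite3 module so that it runs standalone) rather than PostgreSQL or SQL Server, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table monthly_connections_test (
        calling_party text, called_party text, link_strength real
    );
    -- hypothetical sample rows, not the asker's data
    insert into monthly_connections_test values
        ('b', 'c', 1.0),
        ('c', 'd', 1.0),
        ('e', 'd', 1.0),   -- direction changes at d
        ('e', 'f', 1.0),
        ('g', 'c', 0.05),  -- too weak, filtered out by the view
        ('x', 'y', 1.0);   -- belongs to a separate cluster
    create view strong_calls as
        select calling_party, called_party, link_strength
        from monthly_connections_test
        where link_strength > 0.1;
""")

rows = conn.execute("""
    with recursive cluster (calling_party, called_party, link_strength) as (
        select calling_party, called_party, link_strength
        from strong_calls
        where calling_party = 'b' or called_party = 'b'
        union
        -- a call joins the cluster if it shares either party with a call in it
        select c.calling_party, c.called_party, c.link_strength
        from cluster this
        join strong_calls c
          on c.calling_party in (this.calling_party, this.called_party)
          or c.called_party in (this.calling_party, this.called_party)
    )
    select calling_party, called_party
    from cluster
    order by calling_party, called_party
""").fetchall()

print(rows)  # [('b', 'c'), ('c', 'd'), ('e', 'd'), ('e', 'f')]
```

Because the two halves are combined with union rather than union all, a call that is reached a second time is discarded, which is what lets the recursion stop without any path bookkeeping.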

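Third, perhaps you don't really want to find the connections in the cluster, but rather to find out which people are in the cluster and what the shortest path from the target to each of them is. In that case, you can do this: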

with recursive cluster (party, path) 
as (
    select cast('b' as character varying), cast('b' as character varying) 
    union 
    (
    select (case 
     when this.party = c.calling_party then c.called_party 
     when this.party = c.called_party then c.calling_party 
    end), (this.path || '.' || (case 
     when this.party = c.calling_party then c.called_party 
     when this.party = c.called_party then c.calling_party 
    end)) 
    from cluster this, strong_calls c 
    where (this.party = c.calling_party and position(c.called_party in this.path) = 0) 
    or (this.party = c.called_party and position(c.calling_party in this.path) = 0) 
) 
) 
select party, path 
from cluster 
where not exists (
    select * 
    from cluster c2 
    where cluster.party = c2.party 
    and (
    char_length(cluster.path) > char_length(c2.path) 
    or (char_length(cluster.path) = char_length(c2.path)) and (cluster.path > c2.path) 
) 
) 
order by party, path; 

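As you can see, you have to pay close attention to getting the directions right.

If you really do need a list of all the calls in the cluster along with their paths, then, er, I'll get back to you!

EDIT: Bear in mind that the queries that don't build paths will have very different performance characteristics from the ones that do. Roughly speaking, the non-path queries do O(n) work (possibly in O(log n) iteration steps), because they visit each node in the cluster once. The path-building queries do more work, possibly far more, because they have to visit every path through the graph. If your clusters are about as large as the one in the example, you'll be fine, but if they are much bigger you may find that the run time becomes too long.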
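
To make the difference concrete, here is a rough sketch that counts the rows each style of query produces on a small, densely connected group. It is an illustration under stated assumptions, not a benchmark: it uses SQLite through Python's sqlite3 module, a hypothetical strong_calls table in which each of five parties calls every other party, and a simplified forward-only walk.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")
conn.execute("create table strong_calls (calling_party text, called_party text)")
parties = ["a", "b", "c", "d", "e"]
# hypothetical data: every party calls every other party (20 rows)
conn.executemany("insert into strong_calls values (?, ?)", permutations(parties, 2))

# Without paths: UNION discards rows already seen, so each party appears once.
members = conn.execute("""
    with recursive cluster (party) as (
        select 'a'
        union
        select c.called_party
        from cluster this
        join strong_calls c on c.calling_party = this.party
    )
    select count(*) from cluster
""").fetchone()[0]

# With paths: every distinct loop-free path starting at 'a' is its own row.
paths = conn.execute("""
    with recursive cluster (party, path) as (
        select 'a', '.a.'
        union all
        select c.called_party, this.path || c.called_party || '.'
        from cluster this
        join strong_calls c on c.calling_party = this.party
        where instr(this.path, '.' || c.called_party || '.') = 0
    )
    select count(*) from cluster
""").fetchone()[0]

print(members, paths)  # 5 65
```

With five parties the path-building query already produces 65 rows against 5, and the gap widens quickly as a densely connected cluster grows.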

0

CHARINDEX('b.d', 'b.c.d.b') = 0, because there is a 'c' in between.

Here is the query in a form that is easier to read:

WITH cluster(calling_party, called_party, link_strength, PATH) 
    AS (SELECT calling_party, 
       called_party, 
       link_strength, 
       CONVERT(VARCHAR(MAX), calling_party + '.' + called_party) AS 
       PATH 
     FROM monthly_connections_test 
     WHERE link_strength > 0.1 
       AND calling_party = 'b' 
     UNION ALL 
     SELECT mc.calling_party, 
       mc.called_party, 
       mc.link_strength, 
       CONVERT(VARCHAR(MAX), cl.PATH + '.' + mc.calling_party + '.' + 
       mc.called_party) 
       AS PATH 
     FROM monthly_connections_test mc 
       INNER JOIN cluster cl 
        ON (mc.called_party = cl.called_party 
         OR mc.called_party = cl.calling_party) 
        AND (Charindex(cl.called_party + '.' + mc.calling_party, 
          PATH) 
          = 0 
          AND Charindex(cl.called_party + '.' + 
           mc.called_party, 
           PATH) 
           = 
           0) 
     WHERE mc.link_strength > 0.1) 
SELECT calling_party, 
     called_party, 
     link_strength, 
     PATH 
FROM cluster 
OPTION (MAXRECURSION 30000) 
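
The substring point can be checked in isolation. The sketch below uses SQLite's instr() through Python's sqlite3 module as a stand-in for CHARINDEX (an assumption made so the example runs standalone; the two functions take their arguments in opposite order). The delimited edge tokens in the second half are a hypothetical alternative encoding, not something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def position(needle, haystack):
    # SQLite's instr(haystack, needle) gives the 1-based position of needle,
    # or 0 when it is absent (CHARINDEX takes the arguments the other way round).
    return conn.execute("select instr(?, ?)", (haystack, needle)).fetchone()[0]

dotted_path = "b.c.d.b"
miss = position("b.d", dotted_path)  # 0: b and d are on the path, but never adjacent
hit = position("c.d", dotted_path)   # 3: 'c.d' starts at the third character

# Hypothetical alternative: record each call as one delimited token,
# so that a lookup matches a whole call or nothing.
edge_path = "|b>c|c>d|d>b|"
seen = position("|c>d|", edge_path) > 0      # True: this call was already walked
not_seen = position("|b>d|", edge_path) > 0  # False: this call was not

print(miss, hit, seen, not_seen)  # 0 3 True False
```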
+0

This is the same as the query above. – 2010-11-02 15:11:15

+0

It is now, yes, but when I first read it, it was all on 4 lines, so I reformatted it to help everyone – smirkingman 2010-11-02 16:41:52

0

To address the problem in your edit: if you want to ignore the direction of the links, try:

create view symmetric_users (calling_party, called_party, link_strength) 
as (
    select calling_party, called_party, link_strength from monthly_connections_test 
    union 
    select called_party , calling_party, link_strength from monthly_connections_test 
) 

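Then query against that view.

If you have users who call each other, there will be two rows for each ordered pair of users. You should be able to filter those out by selecting the stronger one.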
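
One way to keep only the stronger row is to group the symmetric rows by ordered pair and take the maximum strength. This is a minimal sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, the sample rows are hypothetical, and the subquery alias both_directions is introduced purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table monthly_connections_test (
        calling_party text, called_party text, link_strength real
    );
    -- hypothetical rows: a and b call each other with different strengths
    insert into monthly_connections_test values
        ('a', 'b', 0.4),
        ('b', 'a', 0.9),
        ('b', 'c', 0.5);
    create view symmetric_users as
        select calling_party, called_party,
               max(link_strength) as link_strength
        from (
            select calling_party, called_party, link_strength
            from monthly_connections_test
            union all
            select called_party, calling_party, link_strength
            from monthly_connections_test
        ) as both_directions
        group by calling_party, called_party;
""")

rows = conn.execute("""
    select calling_party, called_party, link_strength
    from symmetric_users
    order by calling_party, called_party
""").fetchall()

print(rows)
# [('a', 'b', 0.9), ('b', 'a', 0.9), ('b', 'c', 0.5), ('c', 'b', 0.5)]
```

Each ordered pair now appears exactly once, carrying the stronger of the two strengths, so a recursive query over this view sees one row per direction.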