2013-07-22 27 views
1

SQL PageRank implementation

Is there a good SQL PageRank implementation out there? I looked at http://www.databasedevelop.com/197517/, but it lacks readability and correct (T-SQL) syntax.

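While we're at it, does anyone know what kind of SQL the link above is using? What SQL dialect uses 'IS' in random places, 'WHERE' with nothing after it, a strange AT keyword, and so on?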

+0

The linked T-SQL is horrible. A cursor is not the right tool here. PageRank maps perfectly onto map-reduce style SQL queries. – usr

+0

What is map-reduce style, please? – NargothBond

+2

Map-reduce is the modern buzzword for "GROUP BY" queries. Google "page rank map reduce" to learn more. – usr

Answers

0

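Depending on your SQL Server version, you could look at OFFSET_FETCH, the OFFSET ... FETCH clause of ORDER BY. It has many applications for page ranking. Of course, this requires SQL Server 2012 or later.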

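I have also used SSIS with a temp table split by NTILE() to get paging where OFFSET_FETCH is not available. I usually pass something like the record count divided by the maximum number of rows I want per page as the argument to NTILE.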
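
The NTILE approach can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is needed for window functions), and the `Items` table and the `page_numbers` helper are hypothetical names, not part of the answer.

```python
import math
import sqlite3


def page_numbers(row_count, page_size):
    """Return (id, page) pairs, assigning rows to pages with NTILE."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table holding the rows to be paged.
    con.execute("CREATE TABLE Items(id INTEGER PRIMARY KEY)")
    con.executemany(
        "INSERT INTO Items VALUES (?)",
        [(i,) for i in range(1, row_count + 1)],
    )
    # Record count divided by the maximum rows per page, rounded up.
    buckets = max(1, math.ceil(row_count / page_size))
    # buckets is a plain int, so formatting it into the SQL text is safe.
    rows = con.execute(
        f"SELECT id, NTILE({buckets}) OVER (ORDER BY id) AS page "
        "FROM Items ORDER BY id"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row_id, page in page_numbers(10, 4):
        print(row_id, page)
```

NTILE balances the bucket sizes, so 10 rows with a page size of 4 come out as pages of 4, 3 and 3 rows rather than 4, 4 and 2. The page size passed in is therefore an upper bound, not an exact size.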

For whatever reason I can't even open your link, so hopefully this is what you were asking for.

MSDN - OFFSET_FETCH

MSDN - NTILE

0

I just implemented the PageRank algorithm in SQL. It works in the following steps.

1. Compute the initial PageRank values;

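2. Join the PageRank table with the Edge table to emit rank values to adjacent nodes, and use the aggregate function SUM to "collect" the received values. Then save the result to the temp table TmpRank;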

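3. Swap the contents of PageRank and TmpRank, and go to step 2 until the convergence condition is met or the maximum number of iterations is reached.

Here is the code: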

    -- The graph data and algorithm are taken from the book "Mining of Massive Datasets", p. 175, http://infolab.stanford.edu/~ullman/mmds/book.pdf 
    -- This script has been verified on SQL Server 2017 (Linux version). 
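    -- IF EXISTS (SQL Server 2016+) lets the script run on a fresh database; TmpRank must be dropped too, or its CREATE fails on a rerun 
    DROP TABLE IF EXISTS Node; 
    DROP TABLE IF EXISTS Edge; 
    DROP TABLE IF EXISTS OutDegree; 
    DROP TABLE IF EXISTS PageRank; 
    DROP TABLE IF EXISTS TmpRank; 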
    CREATE TABLE Node(id int PRIMARY KEY); 
    CREATE TABLE Edge(src int,dst int, PRIMARY KEY (src, dst)); 
    CREATE TABLE OutDegree(id int PRIMARY KEY, degree int); 
    CREATE TABLE PageRank(id int PRIMARY KEY, rank float); 
    CREATE TABLE TmpRank(id int PRIMARY KEY, rank float); 
    
    --delete all records 
    DELETE FROM Node; 
    DELETE FROM Edge; 
    DELETE FROM OutDegree; 
    DELETE FROM PageRank; 
    DELETE FROM TmpRank; 
    
    --init basic tables 
    INSERT INTO Node VALUES (0); 
    INSERT INTO Node VALUES (1); 
    INSERT INTO Node VALUES (2); 
    INSERT INTO Node VALUES (3); 
    
    INSERT INTO Edge VALUES (0, 1); 
    INSERT INTO Edge VALUES (0, 2); 
    INSERT INTO Edge VALUES (0, 3); 
    INSERT INTO Edge VALUES (1, 0); 
    INSERT INTO Edge VALUES (1, 3); 
    INSERT INTO Edge VALUES (2, 2); 
    INSERT INTO Edge VALUES (3, 1); 
    INSERT INTO Edge VALUES (3, 2); 
    
    --compute out-degree 
    INSERT INTO OutDegree 
    SELECT Node.id, COUNT(Edge.src) --COUNT(Edge.src) rather than COUNT(*), so nodes with no out-edges get degree 0 
    FROM Node LEFT OUTER JOIN Edge 
    ON Node.id = Edge.src 
    GROUP BY Node.id; 
    
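    --WARN: Nodes with no out-edges (dead ends) get no special handling, which may give wrong results. 
    --  Nodes with no in-edges get no row in TmpRank, so they drop out of PageRank after the first iteration. 
    --  Make sure every node in the graph has at least one out-edge and one in-edge. 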
    
    DECLARE @ALPHA float = 0.8; 
    DECLARE @Node_Num int; 
    SELECT @Node_Num = COUNT(*) FROM Node; 
    
    --PageRank Init Value 
    INSERT INTO PageRank 
    SELECT Node.id, rank = (1 - @ALPHA)/@Node_Num 
    FROM Node INNER JOIN OutDegree 
    ON Node.id = OutDegree.id 
    
    /* 
    --For Debugging 
    SELECT * FROM Node; 
    SELECT * FROM Edge; 
    SELECT * FROM OutDegree; 
    SELECT * FROM PageRank; 
    SELECT * FROM TmpRank; 
    */ 
    
    DECLARE @Iteration int = 0; 
    
    WHILE @Iteration < 50 
    BEGIN 
    --Iteration Style 
        SET @Iteration = @Iteration + 1 
    
        INSERT INTO TmpRank 
        SELECT Edge.dst, rank = SUM(@ALPHA * PageRank.rank/OutDegree.degree) + (1 - @ALPHA)/@Node_Num 
        FROM PageRank 
        INNER JOIN Edge ON PageRank.id = Edge.src 
        INNER JOIN OutDegree ON PageRank.id = OutDegree.id 
        GROUP BY Edge.dst 
    
        DELETE FROM PageRank; 
        INSERT INTO PageRank 
        SELECT * FROM TmpRank; 
        DELETE FROM TmpRank; 
    END 
    
    SELECT * FROM PageRank;
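
Step 3 mentions a convergence condition, while the script above simply runs 50 iterations. A minimal sketch of one way to add such a check is shown here, run through Python's sqlite3 module rather than SQL Server so that it can be executed anywhere. Several things are assumptions and not part of the original script: the `pagerank` helper, the `EPSILON` threshold, the L1-distance stopping rule, the larger `MAX_ITER` cap, and the 1/N starting value. It shares the script's limitation that every node needs at least one in-edge and one out-edge.

```python
import sqlite3

ALPHA = 0.8       # same damping value as @ALPHA in the script
EPSILON = 1e-10   # assumed convergence threshold (not in the original)
MAX_ITER = 200    # assumed cap; the original loop stops at 50

# Same sample graph as the script.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 0), (1, 3), (2, 2), (3, 1), (3, 2)]


def pagerank(edges, alpha=ALPHA, epsilon=EPSILON, max_iter=MAX_ITER):
    """Return (ranks, iterations); stops once the L1 change drops below epsilon."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Edge(src INT, dst INT, PRIMARY KEY (src, dst));
        CREATE TABLE OutDegree(id INT PRIMARY KEY, degree INT);
        CREATE TABLE PageRank(id INT PRIMARY KEY, rank REAL);
        CREATE TABLE TmpRank(id INT PRIMARY KEY, rank REAL);
    """)
    con.executemany("INSERT INTO Edge VALUES (?, ?)", edges)
    con.execute("INSERT INTO OutDegree SELECT src, COUNT(*) FROM Edge GROUP BY src")
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    # Assumed starting point: uniform 1/N for every node.
    con.executemany("INSERT INTO PageRank VALUES (?, ?)", [(i, 1.0 / n) for i in nodes])

    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Step 2: emit rank/degree along each edge and collect with SUM.
        con.execute("""
            INSERT INTO TmpRank
            SELECT e.dst, ? * SUM(p.rank / d.degree) + ?
            FROM PageRank p
            JOIN Edge e ON p.id = e.src
            JOIN OutDegree d ON p.id = d.id
            GROUP BY e.dst
        """, (alpha, (1 - alpha) / n))
        # Convergence measure: total absolute change between old and new ranks.
        (delta,) = con.execute("""
            SELECT SUM(ABS(t.rank - p.rank))
            FROM TmpRank t JOIN PageRank p ON p.id = t.id
        """).fetchone()
        # Step 3: move the new ranks into PageRank.
        con.executescript("""
            DELETE FROM PageRank;
            INSERT INTO PageRank SELECT * FROM TmpRank;
            DELETE FROM TmpRank;
        """)
        if delta < epsilon:
            break

    ranks = dict(con.execute("SELECT id, rank FROM PageRank"))
    con.close()
    return ranks, iterations


if __name__ == "__main__":
    ranks, iterations = pagerank(EDGES)
    print(iterations, ranks)
```

For this graph with ALPHA = 0.8, the fixed point of the update rule is 15/148, 19/148, 95/148 and 19/148 for nodes 0 to 3, which gives a quick way to sanity-check the output of either version. In T-SQL the same idea would be a `SELECT @Delta = SUM(ABS(...))` before the swap and a `BREAK` inside the `WHILE` loop, where `@Delta` is again a hypothetical name.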