2016-01-29

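Find groups of exact matches in a many-to-many relationship table

The table below represents a many-to-many relationship between courses and students.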

CREATE TABLE CourseStudents
(
    CourseId INT NOT NULL,
    StudentId INT NOT NULL,
    PRIMARY KEY (CourseId, StudentId)
);

INSERT INTO CourseStudents VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 2),
(4, 3), (4, 2), (5, 1);

Sample data

| CourseId | StudentId | 
|----------|-----------| 
|  1 |   1 | 
|  1 |   2 | 
|  2 |   1 | 
|  2 |   2 | 
|  3 |   2 | 
|  3 |   3 | 
|  4 |   2 | 
|  4 |   3 | 
|  5 |   1 | 

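I am looking for a query that returns all courses that have exactly the same set of students. I was able to come up with the query shown below.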

WITH CourseGroups AS
(
    SELECT c.CourseId,
        STUFF((
            SELECT ',' + CAST(c2.StudentId AS VARCHAR)
            FROM CourseStudents c2
            WHERE c2.CourseId = c.CourseId
            ORDER BY c2.StudentId
            FOR XML PATH ('')), 1, 1, '') AS StudentList
    FROM CourseStudents c
    GROUP BY c.CourseId
)
SELECT cg.StudentList,
    STUFF((
        SELECT ',' + CAST(cg2.CourseId AS VARCHAR(10))
        FROM CourseGroups cg2
        WHERE cg2.StudentList = cg.StudentList
        FOR XML PATH ('')), 1, 1, '') AS ExactMatchCourseList
FROM CourseGroups cg
GROUP BY cg.StudentList
HAVING COUNT(*) > 1;

The query returns:

| StudentList | ExactMatchCourseList | 
|-------------|----------------------| 
|   1,2 |     1,2 | 
|   2,3 |     3,4 | 

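The result above is correct, but I only need ExactMatchCourseList. The table I am working with has over a billion rows, so I need an efficient query that can find all matching courses within a few minutes of run time. Any help is appreciated. SqlFiddle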

Answers

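This makes only 2 passes over your CourseStudents table, rather than the 4 that your query makes. If you add an index on CourseId to the CourseStudents table, the first pass will be just an index scan. It also runs the original STUFF only once per course, rather than once per student followed by a GROUP BY on course. I have left out the final STUFF because I wasn't sure whether you wanted it or whether it was just a byproduct of how you calculated the result.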


-- One row per distinct course (first pass over CourseStudents).
CREATE TABLE #Course
(
    CourseId INT NOT NULL PRIMARY KEY
);

INSERT INTO #Course
SELECT
    CourseId
FROM
    CourseStudents s
GROUP BY
    CourseId
ORDER BY
    CourseId;

-- One row per course with its ordered, comma-separated student list.
CREATE TABLE #CourseStudentList
(
    CourseId INT NOT NULL PRIMARY KEY,
    StudentList VARCHAR(MAX) NOT NULL
);

INSERT INTO #CourseStudentList
SELECT
    c.CourseId,
    STUFF((
        SELECT ',' + CAST(c2.StudentId AS VARCHAR)
        FROM CourseStudents c2
        WHERE c2.CourseId = c.CourseId
        ORDER BY c2.StudentId
        FOR XML PATH ('')), 1, 1, '') AS StudentList
FROM
    #Course c
ORDER BY
    c.CourseId;

-- Keep only courses whose student list is shared with at least one other course.
SELECT *
FROM
(
    SELECT
        l.CourseId,
        l.StudentList,
        COUNT(*) OVER (PARTITION BY l.StudentList) AS [Count]
    FROM
        #CourseStudentList l
) l
WHERE
    l.[Count] > 1
ORDER BY
    l.StudentList;
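
The same idea (build each course's student list once, then group on that list) can also be sketched as a single statement. This is only a sketch under one loud assumption: the server is SQL Server 2017 or later, where STRING_AGG is available. It swaps STRING_AGG in for the STUFF / FOR XML PATH technique used in this thread and returns only the course list asked for in the question. It has not been timed against a billion-row table.

```sql
-- Assumption: SQL Server 2017+ (STRING_AGG does not exist in earlier versions).
WITH CourseGroups AS
(
    -- One ordered, comma-separated student list per course.
    SELECT
        CourseId,
        STRING_AGG(CAST(StudentId AS VARCHAR(MAX)), ',')
            WITHIN GROUP (ORDER BY StudentId) AS StudentList
    FROM CourseStudents
    GROUP BY CourseId
)
-- One row per student list that is shared by more than one course.
SELECT
    STRING_AGG(CAST(CourseId AS VARCHAR(MAX)), ',')
        WITHIN GROUP (ORDER BY CourseId) AS ExactMatchCourseList
FROM CourseGroups
GROUP BY StudentList
HAVING COUNT(*) > 1;
```

With the sample data from the question, this should yield the two lists 1,2 and 3,4. The cast to VARCHAR(MAX) is there so that long student lists are not limited to 8000 bytes.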

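I am marking this as the answer because I can retrieve the duplicate courses in an acceptable amount of time. However, I had to modify the last query to output the list of duplicate courses along with the student list. Thanks. – ziddarth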
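
The modified final query mentioned in this comment is not shown in the thread. The following is a minimal sketch of what such a query might look like, not the commenter's actual code. It assumes the temp table #CourseStudentList from the answer has already been populated, and it reuses the STUFF / FOR XML PATH pattern from the question to list the courses that share each student list.

```sql
-- Assumption: #CourseStudentList (CourseId, StudentList) exists and is populated
-- as in the answer above. The alias g and the derived table are illustrative.
SELECT
    g.StudentList,
    STUFF((
        SELECT ',' + CAST(l.CourseId AS VARCHAR(11))
        FROM #CourseStudentList l
        WHERE l.StudentList = g.StudentList
        ORDER BY l.CourseId
        FOR XML PATH ('')), 1, 1, '') AS ExactMatchCourseList
FROM
(
    -- Student lists that belong to more than one course.
    SELECT StudentList
    FROM #CourseStudentList
    GROUP BY StudentList
    HAVING COUNT(*) > 1
) g
ORDER BY
    g.StudentList;
```

With the sample data, this should return the same two rows as the query in the question: 1,2 with courses 1,2, and 2,3 with courses 3,4. Dropping g.StudentList from the select list leaves only ExactMatchCourseList.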

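This will give you a list of course pairs, but if you have triplicates (or more), you will end up with some extra results. I don't have time to play with this further to correct that, but maybe it points you in the right direction: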

WITH CTE_CourseMatches AS (
    SELECT
        CS1.CourseId AS CourseId_1,
        CS2.CourseId AS CourseId_2,
        COUNT(*) AS cnt
    FROM
        CourseStudents CS1
    INNER JOIN CourseStudents CS2 ON CS2.StudentId = CS1.StudentId AND CS2.CourseId > CS1.CourseId
    GROUP BY
        CS1.CourseId,
        CS2.CourseId
),
CTE_CourseCounts AS (SELECT CourseId, COUNT(*) AS cnt FROM CourseStudents GROUP BY CourseId)
SELECT
    CM.CourseId_1,
    CM.CourseId_2
FROM
    CTE_CourseMatches CM
INNER JOIN CTE_CourseCounts CC1 ON CC1.CourseId = CM.CourseId_1 AND CC1.cnt = CM.cnt
INNER JOIN CTE_CourseCounts CC2 ON CC2.CourseId = CM.CourseId_2 AND CC2.cnt = CM.cnt;

Thanks, I will try it out. When there is more than one match, the result set keeps growing, but I can figure out a way to handle that. – ziddarth
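
One possible way to keep the pair list from growing with triplicates is to map every duplicate course to the smallest CourseId it matches exactly. Because "has exactly the same students" is an equivalence relation, three identical courses A < B < C produce the pairs (A,B), (A,C) and (B,C), and taking MIN(CourseId_1) per CourseId_2 collapses them to B → A and C → A. The sketch below assumes the pair query from the answer above. The names CTE_ExactPairs and GroupCourseId are hypothetical, and the self-join may still be expensive on a very large table.

```sql
WITH CTE_CourseCounts AS (
    -- Number of students in each course.
    SELECT CourseId, COUNT(*) AS cnt
    FROM CourseStudents
    GROUP BY CourseId
),
CTE_CourseMatches AS (
    -- Number of students shared by each pair of courses.
    SELECT
        CS1.CourseId AS CourseId_1,
        CS2.CourseId AS CourseId_2,
        COUNT(*) AS cnt
    FROM CourseStudents CS1
    INNER JOIN CourseStudents CS2
        ON CS2.StudentId = CS1.StudentId AND CS2.CourseId > CS1.CourseId
    GROUP BY CS1.CourseId, CS2.CourseId
),
CTE_ExactPairs AS (
    -- Pairs where the shared count equals both course sizes, i.e. exact matches.
    SELECT CM.CourseId_1, CM.CourseId_2
    FROM CTE_CourseMatches CM
    INNER JOIN CTE_CourseCounts CC1 ON CC1.CourseId = CM.CourseId_1 AND CC1.cnt = CM.cnt
    INNER JOIN CTE_CourseCounts CC2 ON CC2.CourseId = CM.CourseId_2 AND CC2.cnt = CM.cnt
)
-- One row per duplicate course, labelled with the smallest matching CourseId.
SELECT
    CourseId_2 AS CourseId,
    MIN(CourseId_1) AS GroupCourseId
FROM CTE_ExactPairs
GROUP BY CourseId_2
ORDER BY GroupCourseId, CourseId;
```

With the sample data, this should return two rows: course 2 in group 1 and course 4 in group 3. The group representative itself (the smallest CourseId) does not get its own row.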