2012-12-22

SELECT COUNT of distinct and common rows from two tables

I have two tables, T1 and T2, as follows:

CATEGORY  ID 
1   1100 
1   1200 
1   1300 
1   1500 
2   2000 
2   2100 
2   2300 
2   2500 

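I need to know:

  • how many rows are similar between T1 and T2 (same CATEGORY and ID)
  • how many rows from T2 are not in T1
  • how many rows from T1 are not in T2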

I've been racking my brain over this since this morning, and tried the following to get the similar rows:

select count(*) from T1, T2 WHERE 
T1.CATEGORY = T2.CATEGORY AND T1.ID = T2.ID; 

But I can't figure out how to get the unique rows (those only in T1 or only in T2).

Answers

Question 1

SELECT COUNT(*) totalCount 
FROM T1 a 
     INNER JOIN T2 b 
      ON a.Category = b.Category AND 
       a.ID = b.ID 

Question 2 (using LEFT JOIN)

SELECT COUNT(*) totalCount 
FROM T2 a 
     LEFT JOIN T1 b 
      ON a.Category = b.Category AND 
       a.ID = b.ID 
WHERE b.Category IS NULL 

Question 3 (using LEFT JOIN)

SELECT COUNT(*) totalCount 
FROM T1 a 
     LEFT JOIN T2 b 
      ON a.Category = b.Category AND 
       a.ID = b.ID 
WHERE b.Category IS NULL 
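
The sketch below checks the three queries side by side. It runs them unchanged through Python's built-in sqlite3 module against an in-memory database. The T1 contents are the rows listed in the question. The T2 contents are invented for illustration, and so are the helper names build_db and answer_all.

```python
import sqlite3

# T1: the rows listed in the question. T2: hypothetical rows that share
# some (Category, ID) pairs with T1 and add a few of their own.
T1_ROWS = [(1, 1100), (1, 1200), (1, 1300), (1, 1500),
           (2, 2000), (2, 2100), (2, 2300), (2, 2500)]
T2_ROWS = [(1, 1100), (1, 1300), (1, 1400),
           (2, 2000), (2, 2200), (2, 2500)]

# Question 1: rows with the same Category and ID in both tables.
COMMON = """
SELECT COUNT(*) FROM T1 a
INNER JOIN T2 b ON a.Category = b.Category AND a.ID = b.ID
"""

# Question 2: rows of T2 with no matching row in T1 (LEFT JOIN anti-join).
T2_ONLY = """
SELECT COUNT(*) FROM T2 a
LEFT JOIN T1 b ON a.Category = b.Category AND a.ID = b.ID
WHERE b.Category IS NULL
"""

# Question 3: rows of T1 with no matching row in T2.
T1_ONLY = """
SELECT COUNT(*) FROM T1 a
LEFT JOIN T2 b ON a.Category = b.Category AND a.ID = b.ID
WHERE b.Category IS NULL
"""


def build_db():
    """Create T1 and T2 in an in-memory SQLite database and fill them."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("T1", T1_ROWS), ("T2", T2_ROWS)):
        conn.execute(f"CREATE TABLE {name} (Category INTEGER NOT NULL, ID INTEGER NOT NULL)")
        conn.executemany(f"INSERT INTO {name} (Category, ID) VALUES (?, ?)", rows)
    return conn


def answer_all(conn):
    """Run the three counting queries and return their results by label."""
    queries = (("common", COMMON), ("t2_only", T2_ONLY), ("t1_only", T1_ONLY))
    return {label: conn.execute(sql).fetchone()[0] for label, sql in queries}


if __name__ == "__main__":
    print(answer_all(build_db()))
    # {'common': 4, 't2_only': 2, 't1_only': 4}
```

The `LEFT JOIN ... WHERE b.Category IS NULL` pattern keeps only the outer-table rows for which no partner was found. That is why the same shape answers both Question 2 and Question 3, with the tables swapped.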

Experts answer fast :-) – vels4j


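If you can't assume the rows are distinct, you need a slightly different approach. Here is a method that answers all three questions at once and takes duplicate rows into account: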

select (case when isT1 = 1 and isT2 = 1 then 'BOTH'
             when isT1 = 1 then 'T1-Only'
             else 'T2-Only'
        end) as WhereRow,
       count(*) as NumDistinctRows,
       sum(cnt) as NumTotalRows
from (select category, id,
             max(isT1) as isT1,  -- 1 if the (category, id) pair occurs in T1
             max(isT2) as isT2,  -- 1 if it occurs in T2
             count(*) as cnt     -- copies of the pair across both tables
      from (select category, id, 1 as isT1, 0 as isT2 from t1
            union all
            select category, id, 0 as isT1, 1 as isT2 from t2
           ) u
      group by category, id
     ) t
group by isT1, isT2;
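
-- PostgreSQL scratch setup: ONE table, in which category 1 and category 2
-- stand in for the two sets being compared.
DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp;
SET search_path = tmp;

CREATE TABLE lutser
     ( id INTEGER NOT NULL
     , category INTEGER NOT NULL
     );
INSERT INTO lutser(category, id) VALUES
 (1,1100), (1,1200), (1,1300), (1,1500)
,(2,2000), (2,2100), (2,2300), (2,2500)
,(1,3500) -- added these two rows so that id 3500
,(2,3500) -- appears in both categories
     ;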

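These queries build a "bitmask": category 1 contributes 1, category 2 contributes 2, and the two are added. So when an id exists in both sets the mask is 3; when it is only in the first set it is 1, and when it is only in the second set it is 2. A FULL OUTER JOIN plus aggregation does the trick here.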

--
-- CTE version
--
WITH flags AS (
     WITH one AS (SELECT category AS flag , id FROM lutser WHERE category = 1) 
     , two AS (SELECT category AS flag , id FROM lutser WHERE category = 2) 
     SELECT COALESCE(one.flag, 0) + COALESCE(two.flag, 0) AS flag 
     FROM one 
     FULL OUTER JOIN two ON two.id = one.id 
     ) 
SELECT flag, COUNT(*) 
FROM flags 
GROUP BY flag; 

--
-- Non-CTE version
--
SELECT COALESCE(one.flag, 0) + COALESCE(two.flag, 0) AS flags 
     , COUNT(*) 
FROM (
     SELECT category AS flag , id 
     FROM lutser WHERE category = 1 
     ) one 
FULL OUTER JOIN (
     SELECT category AS flag , id 
     FROM lutser WHERE category = 2 
     ) two ON two.id = one.id 
GROUP BY flags; 

Result (for both queries ;-):

flags | count 
-------+------- 
    1 |  4 
    2 |  4 
    3 |  1 
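
The same flags table can also be reproduced without a FULL OUTER JOIN. That matters in SQLite, which only added FULL OUTER JOIN in version 3.39.0. The sketch below runs through Python's sqlite3 module and swaps in a different technique: group by id and take SUM(DISTINCT category) as the mask. This works only because the categories are exactly 1 and 2, which are distinct bit values. The helper name mask_counts is hypothetical.

```python
import sqlite3

# The rows from the lutser setup script: ids 1100-1500 in category 1,
# ids 2000-2500 in category 2, and id 3500 in both.
LUTSER_ROWS = [(1, 1100), (1, 1200), (1, 1300), (1, 1500),
               (2, 2000), (2, 2100), (2, 2300), (2, 2500),
               (1, 3500), (2, 3500)]

# Swapped-in technique: instead of a FULL OUTER JOIN, group by id and add up
# the distinct category values, giving a mask of 1, 2, or 1 + 2 = 3.
MASK_COUNTS = """
SELECT mask, COUNT(*) AS cnt
FROM (SELECT id, SUM(DISTINCT category) AS mask
      FROM lutser
      GROUP BY id) m
GROUP BY mask
ORDER BY mask
"""


def mask_counts(rows):
    """Load (category, id) rows into lutser and return [(mask, count), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lutser (id INTEGER NOT NULL, category INTEGER NOT NULL)")
    conn.executemany("INSERT INTO lutser (category, id) VALUES (?, ?)", rows)
    return conn.execute(MASK_COUNTS).fetchall()


if __name__ == "__main__":
    for mask, cnt in mask_counts(LUTSER_ROWS):
        print(mask, cnt)
```

With the sample data this yields 4 ids with mask 1, 4 with mask 2, and 1 with mask 3, matching the table above.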

I think MySQL doesn't support FULL JOIN. –
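
Where FULL JOIN is unavailable, a common workaround combines two pieces with UNION ALL: a LEFT JOIN from one side, and an anti-join (LEFT JOIN ... IS NULL) from the other side. The sketch below applies that rewrite to the non-CTE bitmask query using only derived tables, LEFT JOIN and UNION ALL. It is executed here through Python's sqlite3 module, not against MySQL. The aliases c1/c2 and the helper name flag_counts are hypothetical.

```python
import sqlite3

LUTSER_ROWS = [(1, 1100), (1, 1200), (1, 1300), (1, 1500),
               (2, 2000), (2, 2100), (2, 2300), (2, 2500),
               (1, 3500), (2, 3500)]

FLAG_COUNTS = """
SELECT flags, COUNT(*) AS cnt
FROM (
    -- Left half: every category-1 id, plus its category-2 partner if any.
    SELECT COALESCE(c1.flag, 0) + COALESCE(c2.flag, 0) AS flags
    FROM (SELECT category AS flag, id FROM lutser WHERE category = 1) c1
    LEFT JOIN (SELECT category AS flag, id FROM lutser WHERE category = 2) c2
           ON c2.id = c1.id
    UNION ALL
    -- Right half (anti-join): category-2 ids with no category-1 partner.
    SELECT c2.flag AS flags
    FROM (SELECT category AS flag, id FROM lutser WHERE category = 2) c2
    LEFT JOIN (SELECT category AS flag, id FROM lutser WHERE category = 1) c1
           ON c1.id = c2.id
    WHERE c1.id IS NULL
) f
GROUP BY flags
ORDER BY flags
"""


def flag_counts(rows):
    """Load (category, id) rows and return [(flags, count), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lutser (id INTEGER NOT NULL, category INTEGER NOT NULL)")
    conn.executemany("INSERT INTO lutser (category, id) VALUES (?, ?)", rows)
    return conn.execute(FLAG_COUNTS).fetchall()


if __name__ == "__main__":
    print(flag_counts(LUTSER_ROWS))
```

The anti-join half carries the `WHERE c1.id IS NULL` filter so that ids present in both categories are not counted twice.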


Thanks for your answer, it's really good. But we have 10 million rows of data, and it really eats up memory. – madkitty


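This is the only way to answer your three questions in a single query. 10M rows is not the issue here; *every* solution would benefit from some kind of index on id. – wildplasser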
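
The sketch below is a rough illustration of the indexing point, using Python's sqlite3 module. It creates composite indexes on (Category, ID) for both tables, matching the join predicate in the LEFT JOIN answers. EXPLAIN QUERY PLAN then shows whether the planner looks up the inner table through that index. The table layout follows the question; the index names and the plan helper are hypothetical, and the exact plan text varies between SQLite versions.

```python
import sqlite3

QUERY = """
SELECT COUNT(*) FROM T2 a
LEFT JOIN T1 b ON a.Category = b.Category AND a.ID = b.ID
WHERE b.Category IS NULL
"""


def build_indexed_db():
    """Create empty T1/T2 tables, each with a hypothetical (Category, ID) index."""
    conn = sqlite3.connect(":memory:")
    for name in ("T1", "T2"):
        conn.execute(f"CREATE TABLE {name} (Category INTEGER NOT NULL, ID INTEGER NOT NULL)")
        # Composite index matching the join predicate.
        conn.execute(f"CREATE INDEX {name.lower()}_cat_id ON {name} (Category, ID)")
    return conn


def plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    for line in plan(build_indexed_db(), QUERY):
        print(line)
```

On the inner side of the join, the plan should report a SEARCH on T1 using t1_cat_id rather than a full scan for every T2 row.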