2016-03-18

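I have a view, cnst_prsn_nm. I want to find records that share the same cnst_mstr_id and the same last name but have different first names. So, in Teradata SQL, I do a join followed by GROUP BY and HAVING: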

SELECT TOP 20 prsn_nm_a.cnst_mstr_id FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a 
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b 
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id 
GROUP BY prsn_nm_a.cnst_mstr_id 
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1 
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm 

Then, for the cnst_mstr_id values of those records, I want to check another table, cnst_mstr. Basically, I want to do a LEFT JOIN there and test for IS NULL:

LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new 
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id 
WHERE mstr_new.new_cnst_mstr_id IS NULL 

So my query basically becomes:

SELECT TOP 20 prsn_nm_a.cnst_mstr_id FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a 
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b 
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id 
GROUP BY prsn_nm_a.cnst_mstr_id 
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1 
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm 
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new 
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id 
WHERE mstr_new.new_cnst_mstr_id IS NULL 

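But now there are two WHERE clauses, and the LEFT JOIN cannot go directly after HAVING either. When the filter is tied to the grouping, how can I do a left join after the GROUP BY and HAVING clauses?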

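I removed the MySQL tag because the syntax is clearly not MySQL. –

Makes sense. But I was hoping this would be a general SQL question rather than one specific to any RDBMS. – StrugglingCoder

TOP is product specific. – jarlh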

Answers

1

Your original query is not correct (WHERE has to come before GROUP BY), so let me assume you mean this:

SELECT TOP 20 prsn_nm_a.cnst_mstr_id 
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN 
    arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b 
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id 
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm 
GROUP BY prsn_nm_a.cnst_mstr_id 
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1; 

A LEFT JOIN that keeps only the unmatched rows is equivalent to NOT EXISTS, so you can do this:

SELECT TOP 20 prsn_nm_a.cnst_mstr_id 
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN 
    arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b 
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id 
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm 
GROUP BY prsn_nm_a.cnst_mstr_id 
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1 AND 
    NOT EXISTS (SELECT 1 
                FROM arc_mdm_vws.bzal_cnst_mstr mstr_new 
                WHERE prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id 
               ); 
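
To see the equivalence between an unmatched LEFT JOIN and NOT EXISTS on a small scale, here is a minimal sketch using Python's built-in sqlite3 module rather than Teradata. The table names (prsn_nm, cnst_mstr), the shortened column names, and the sample rows are all made up for illustration; only the join pattern comes from the queries above.

```python
import sqlite3

# In-memory toy database; table and column names are simplified stand-ins
# for the views in the question, and the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
    CREATE TABLE cnst_mstr (new_cnst_mstr_id INTEGER);
    INSERT INTO prsn_nm VALUES
        (1, 'Ann', 'Lee'), (1, 'Anne', 'Lee'),
        (2, 'Bob', 'Ray'), (2, 'Rob', 'Ray'),
        (3, 'Cy', 'Orr');
    INSERT INTO cnst_mstr VALUES (2);
""")

# Anti-join written as LEFT JOIN ... IS NULL.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT p.cnst_mstr_id
    FROM prsn_nm p
    LEFT JOIN cnst_mstr m ON p.cnst_mstr_id = m.new_cnst_mstr_id
    WHERE m.new_cnst_mstr_id IS NULL
    ORDER BY p.cnst_mstr_id
""")]

# The same anti-join written as NOT EXISTS.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT p.cnst_mstr_id
    FROM prsn_nm p
    WHERE NOT EXISTS (SELECT 1
                      FROM cnst_mstr m
                      WHERE p.cnst_mstr_id = m.new_cnst_mstr_id)
    ORDER BY p.cnst_mstr_id
""")]

print(left_join_ids, not_exists_ids)
```

Both lists contain the ids that have no row in cnst_mstr. This only checks the result sets on toy data; it says nothing about how Teradata plans either form.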
2

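The clauses in a SQL statement always appear in a specific order: first SELECT, then FROM, then the JOINs, then WHERE, then GROUP BY, then HAVING. You cannot deviate from this order, and you neither need nor can have a second WHERE clause. Put all the conditions you need into your single WHERE clause.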

SELECT TOP 20 prsn_nm_a.cnst_mstr_id 
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a 
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b 
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id 
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new 
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id 
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm 
    AND mstr_new.new_cnst_mstr_id IS NULL 
GROUP BY prsn_nm_a.cnst_mstr_id 
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1 
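
The clause order described in this answer can be tried out on toy data. The sketch below uses Python's sqlite3 module, not Teradata, so it uses LIMIT where the original uses TOP; the table names, shortened column names, and sample rows are hypothetical. It first runs the reordered query with both joins ahead of a single WHERE, then shows that a WHERE placed after HAVING is rejected by the parser.

```python
import sqlite3

# Toy data; names are simplified stand-ins for the views in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
    CREATE TABLE cnst_mstr (new_cnst_mstr_id INTEGER);
    INSERT INTO prsn_nm VALUES
        (1, 'Ann', 'Lee'), (1, 'Anne', 'Lee'),   -- same last name, 2 first names
        (2, 'Bob', 'Ray'), (2, 'Rob', 'Ray'),    -- same, but present in cnst_mstr
        (3, 'Cy', 'Orr'),                        -- only one first name
        (4, 'Dee', 'Fox'), (4, 'Di', 'Fay');     -- two different last names
    INSERT INTO cnst_mstr VALUES (2);
""")

# Both joins first, then ONE WHERE with all row-level conditions,
# then GROUP BY and HAVING. LIMIT replaces Teradata's TOP here.
ordered_ids = [row[0] for row in conn.execute("""
    SELECT a.cnst_mstr_id
    FROM prsn_nm a
    INNER JOIN prsn_nm b
        ON a.cnst_mstr_id = b.cnst_mstr_id
    LEFT JOIN cnst_mstr m
        ON a.cnst_mstr_id = m.new_cnst_mstr_id
    WHERE a.first_nm <> b.first_nm
        AND m.new_cnst_mstr_id IS NULL
    GROUP BY a.cnst_mstr_id
    HAVING COUNT(DISTINCT a.last_nm) = 1
    ORDER BY a.cnst_mstr_id
    LIMIT 20
""")]

# A WHERE after HAVING, as in the question, does not parse.
try:
    conn.execute("""
        SELECT a.cnst_mstr_id
        FROM prsn_nm a
        GROUP BY a.cnst_mstr_id
        HAVING COUNT(DISTINCT a.last_nm) = 1
        WHERE a.first_nm <> 'x'
    """)
    misplaced_where_rejected = False
except sqlite3.OperationalError:
    misplaced_where_rejected = True

print(ordered_ids, misplaced_where_rejected)
```

With this sample data only id 1 survives: id 2 has a match in cnst_mstr, id 3 has no second first name, and id 4 has two last names.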
1

Your task can be written like this, without a self join:

SELECT * 
FROM 
(
    SELECT TOP 20 -- why TOP? 
     cnst_mstr_id, bz_cnst_prsn_last_nm 
    FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a 
    GROUP BY cnst_mstr_id, bz_cnst_prsn_last_nm  -- same customer & name 
    HAVING COUNT(DISTINCT bz_cnst_prsn_first_nm) > 1 -- different first_names 
) AS prsn_nm 
WHERE NOT EXISTS 
(
    SELECT * 
    FROM arc_mdm_vws.bzal_cnst_mstr mstr_new 
    WHERE prsn_nm.cnst_mstr_id = mstr_new.new_cnst_mstr_id 
) 

Depending on the existing indexes, this might be faster than the self join.

And as Gordon already mentioned, LEFT JOIN ... IS NULL is the same as NOT EXISTS, and in Teradata the latter is usually more efficient.
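
As a rough check of the self-join-free rewrite, the sketch below runs it next to the self-join form on invented data, using Python's sqlite3 module instead of Teradata (so TOP is left out). Table names, shortened column names, and rows are assumptions made for illustration. One thing the toy data suggests: the two forms are not guaranteed to return the same ids when a cnst_mstr_id carries more than one last name, because grouping by (id, last name) tests each last name separately, while the self join requires a single last name across all joined rows for that id.

```python
import sqlite3

# Toy data; names are simplified stand-ins for the views in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
    CREATE TABLE cnst_mstr (new_cnst_mstr_id INTEGER);
    INSERT INTO prsn_nm VALUES
        (1, 'Ann', 'Lee'), (1, 'Anne', 'Lee'),
        (2, 'Bob', 'Ray'), (2, 'Rob', 'Ray'),
        (3, 'Cy', 'Orr'),
        (5, 'Ed', 'Kim'), (5, 'Eddy', 'Kim'), (5, 'Ed', 'Poe');
    INSERT INTO cnst_mstr VALUES (2);
""")

# Rewrite without a self join: group by id and last name,
# keep groups with more than one distinct first name.
group_by_ids = [row[0] for row in conn.execute("""
    SELECT g.cnst_mstr_id
    FROM (
        SELECT cnst_mstr_id, last_nm
        FROM prsn_nm
        GROUP BY cnst_mstr_id, last_nm
        HAVING COUNT(DISTINCT first_nm) > 1
    ) AS g
    WHERE NOT EXISTS (SELECT 1
                      FROM cnst_mstr m
                      WHERE g.cnst_mstr_id = m.new_cnst_mstr_id)
    ORDER BY g.cnst_mstr_id
""")]

# Self-join form for comparison, with the anti-join moved into WHERE.
self_join_ids = [row[0] for row in conn.execute("""
    SELECT a.cnst_mstr_id
    FROM prsn_nm a
    INNER JOIN prsn_nm b
        ON a.cnst_mstr_id = b.cnst_mstr_id
    WHERE a.first_nm <> b.first_nm
        AND NOT EXISTS (SELECT 1
                        FROM cnst_mstr m
                        WHERE a.cnst_mstr_id = m.new_cnst_mstr_id)
    GROUP BY a.cnst_mstr_id
    HAVING COUNT(DISTINCT a.last_nm) = 1
    ORDER BY a.cnst_mstr_id
""")]

print(group_by_ids, self_join_ids)
```

Id 5 has two first names under 'Kim' plus a third row under 'Poe', so the grouped form returns it and the self-join form does not. Which behavior is wanted depends on what "same last name" should mean for ids with mixed last names.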