Fast query to perform normalization on SQL data

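I have some data that I am normalizing. Specifically, I am normalizing it so that I can work with the normalized part without having to worry about duplicates. What I am doing is: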

INSERT INTO new_table (a, b, c) 
    SELECT DISTINCT a,b,c 
    FROM old_table; 

UPDATE old_table 
SET abc_id = new_table.id 
FROM new_table 
WHERE new_table.a = old_table.a 
    AND new_table.b = old_table.b 
    AND new_table.c = old_table.c;

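First, it seems like there should be a better way to do this. The process of finding the distinct rows should inherently be able to produce the list of members belonging to each distinct row. Second, and more importantly, the INSERT is comparatively quick, while the UPDATE takes FOREVER (I don't actually have a figure for how long, because it is still running). I am using PostgreSQL. Is there a better way to do this (maybe all in one query)?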

2016-06-09 CrazyCasta

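If the UPDATE takes _FOREVER_, could that be because a unique index on new_table (a, b, c) is missing?

Did you run `ANALYZE old_table; ANALYZE new_table;` first?

@EgorRogov I had not, but I did it just now and it only printed "ANALYZE". Then I read some documentation saying that it "collects statistics", but I'm not sure I fully understand what it is supposed to do. Am I supposed to get some information out of it, or is it just a magic way of making queries faster? – CrazyCasta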
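To make the two suggestions in the comments above concrete: `ANALYZE` returns no result set; it refreshes the planner statistics, which can then be inspected through the `pg_stats` view. The following is a minimal sketch, assuming the `old_table` and `new_table` names from the question. The index name `new_table_abc_key` is made up for illustration.

```sql
-- Unique index on the lookup columns, as suggested in the first comment.
-- The index name is hypothetical.
CREATE UNIQUE INDEX new_table_abc_key ON new_table (a, b, c);

-- Refresh planner statistics; psql only echoes "ANALYZE".
ANALYZE old_table;
ANALYZE new_table;

-- Inspect what was collected for the join columns.
SELECT tablename, attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename IN ('old_table', 'new_table')
  AND attname IN ('a', 'b', 'c');

-- Show the plan of the slow UPDATE without executing it.
EXPLAIN
UPDATE old_table
SET abc_id = new_table.id
FROM new_table
WHERE new_table.a = old_table.a
  AND new_table.b = old_table.b
  AND new_table.c = old_table.c;
```

Comparing the `EXPLAIN` output before and after creating the index and running `ANALYZE` shows whether either step changes the join strategy.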

Here is my other answer, extended to three columns:

 -- Some test data 
CREATE TABLE the_table 
     (id SERIAL NOT NULL PRIMARY KEY 
     , name varchar 
     , a INTEGER 
     , b varchar 
     , c varchar 
     ); 
INSERT INTO the_table(name, a,b,c) VALUES 
('Chimpanzee' , 1, 'mammals', 'apes') 
,('Urang Utang' , 1, 'mammals', 'apes') 
,('Homo Sapiens' , 1, 'mammals', 'apes') 
,('Mouse' , 2, 'mammals', 'rodents') 
,('Rat' , 2, 'mammals', 'rodents') 
,('Cat' , 3, 'mammals', 'felix') 
,('Dog' , 3, 'mammals', 'canae') 
     ; 

     -- [empty] table to contain the "squeezed out" domain {a,b,c} 
CREATE TABLE abc_table 
     (id SERIAL NOT NULL PRIMARY KEY 
     , a INTEGER 
     , b varchar 
     , c varchar 
     , UNIQUE (a,b,c) 
     ); 

     -- The original table needs a "link" to the new table 
ALTER TABLE the_table 
     ADD column abc_id INTEGER -- NOT NULL 
     REFERENCES abc_table(id) 
     ; 
     -- FK constraints are helped a lot by a supportive index. 
CREATE INDEX abc_table_fk ON the_table (abc_id); 

     -- Chained query to: 
     -- * populate the domain table 
     -- * initialize the FK column in the original table 
WITH ins AS (
     INSERT INTO abc_table(a,b,c) 
     SELECT DISTINCT a,b,c 
     FROM the_table a 
     RETURNING * 
     ) 
UPDATE the_table ani 
SET abc_id = ins.id 
FROM ins 
WHERE ins.a = ani.a 
AND ins.b = ani.b 
AND ins.c = ani.c 
     ; 

     -- Now that we have the FK pointing to the new table, 
     -- we can drop the redundant columns. 
ALTER TABLE the_table DROP COLUMN a, DROP COLUMN b, DROP COLUMN c; 

SELECT * FROM the_table; 
SELECT * FROM abc_table; 

     -- show it to the world 
SELECT a.* 
     , c.a, c.b, c.c 
FROM the_table a 
JOIN abc_table c ON c.id = a.abc_id 
     ;

Result:

CREATE TABLE 
INSERT 0 7 
CREATE TABLE 
ALTER TABLE 
CREATE INDEX 
UPDATE 7 
ALTER TABLE 
 id |     name     | abc_id
----+--------------+--------
  1 | Chimpanzee   |      4
  2 | Urang Utang  |      4
  3 | Homo Sapiens |      4
  4 | Mouse        |      3
  5 | Rat          |      3
  6 | Cat          |      1
  7 | Dog          |      2
(7 rows)

 id | a |    b    |    c
----+---+---------+---------
  1 | 3 | mammals | felix
  2 | 3 | mammals | canae
  3 | 2 | mammals | rodents
  4 | 1 | mammals | apes
(4 rows)

 id |     name     | abc_id | a |    b    |    c
----+--------------+--------+---+---------+---------
  1 | Chimpanzee   |      4 | 1 | mammals | apes
  2 | Urang Utang  |      4 | 1 | mammals | apes
  3 | Homo Sapiens |      4 | 1 | mammals | apes
  4 | Mouse        |      3 | 2 | mammals | rodents
  5 | Rat          |      3 | 2 | mammals | rodents
  6 | Cat          |      1 | 3 | mammals | felix
  7 | Dog          |      2 | 3 | mammals | canae
(7 rows)
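One caveat with the chained query above: `=` never matches NULL, so any row where a, b, or c is NULL would be left with a NULL abc_id. The test data contains no NULLs, so it is unaffected. The sketch below is not part of the original answer. It shows a NULL-safe variant using `IS NOT DISTINCT FROM`, intended to run in place of the chained query (before the redundant columns are dropped), followed by a check for unlinked rows. The planner may be unable to use a hash or merge join for this condition, so it can be slower on large tables.

```sql
-- NULL-safe variant of the chained INSERT + UPDATE.
WITH ins AS (
    INSERT INTO abc_table (a, b, c)
    SELECT DISTINCT a, b, c
    FROM the_table
    RETURNING *
)
UPDATE the_table ani
SET abc_id = ins.id
FROM ins
WHERE ins.a IS NOT DISTINCT FROM ani.a
  AND ins.b IS NOT DISTINCT FROM ani.b
  AND ins.c IS NOT DISTINCT FROM ani.c;

-- Count rows that were not linked to a domain row.
SELECT count(*) AS unlinked
FROM the_table
WHERE abc_id IS NULL;
```

If the count is zero, the `-- NOT NULL` that is commented out in the `ALTER TABLE` above could be enforced afterwards.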

Edit: this seems to work well enough, and I hate seeing the downvote I put on it, hence this otherwise useless edit (CrazyCasta).

2016-06-09 19:00:12 wildplasser

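Per my comment above: http://pastebin.com/P7wtCxYx. This doesn't seem to be any better than my original query with a unique constraint on the new table. – CrazyCasta

With a primary key, a foreign key, and supporting indexes, it is different. And probably better. – wildplasser

OK, after looking at your whole script: you do get a hash join if you add the new column before the insert. I'm not entirely sure why, but it seems to be very picky about the order in which things are done, even though the result is the same. (I can't just create the table with the foreign key already in place.) That may be a problem in my case, but it does look like it could work in some situations. Sadly, SO won't let me undo my downvote :( – CrazyCasta

I figured out a way to do this myself: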

BEGIN; 

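-- INCLUDING DEFAULTS copies the id default so that ids are generated on insert
-- (this assumes new_table.id is a serial column; new_table is not shown here)
CREATE TEMPORARY TABLE new_table_temp (
    LIKE new_table INCLUDING DEFAULTS, 
    old_ids integer[] 
) 
ON COMMIT DROP; 

-- One row per distinct (a, b, c), carrying the ids of all old rows in that group
INSERT INTO new_table_temp (a, b, c, old_ids) 
    SELECT a, b, c, array_agg(id) AS old_ids 
    FROM old_table 
    GROUP BY a, b, c; 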

INSERT INTO new_table (id, a, b, c) 
    SELECT id, a, b, c 
    FROM new_table_temp; 

UPDATE old_table 
SET abc_id = new_table_temp.id 
FROM new_table_temp 
WHERE old_table.id = ANY(new_table_temp.old_ids); 

COMMIT;

This is at least what I was looking for. I will update with whether it runs quickly. The EXPLAIN output looks like a sensible plan, so I'm hopeful.

2016-06-09 18:24:00 CrazyCasta

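See my answer here: http://stackoverflow.com/a/29879536/905902 (replace {category, subcategory} with your {a, b, c} columns. Don't forget the {a, b, c} and FK/PK constraints!) – wildplasser

OK, I just took a look. I put a unique index on the first table, and it does a lot of work. http://pastebin.com/P7wtCxYx According to that EXPLAIN, it has to pull up the table, do a sequential scan, and sort on the keys. Mine ends up doing a hash lookup on id. – CrazyCasta

The hash -> index switch is determined by the statistics and by the absence of indexes. Without additional information, the planner will usually choose the hash solution, unless the hash table is not expected to fit in memory. BTW: for small test data, my solution also produces a hash table. – wildplasser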
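To see how the planner's choice described in the last comment plays out on a given data set, one option is to compare plans with hash joins discouraged. This is a rough sketch using the question's table names. `enable_hashjoin` and `work_mem` are standard PostgreSQL settings, but whether the alternative plan is actually faster depends on the data.

```sql
BEGIN;

-- Discourage hash joins for this transaction only.
SET LOCAL enable_hashjoin = off;

-- Plan only; the UPDATE is not executed.
EXPLAIN
UPDATE old_table
SET abc_id = new_table.id
FROM new_table
WHERE new_table.a = old_table.a
  AND new_table.b = old_table.b
  AND new_table.c = old_table.c;

ROLLBACK;

-- Memory budget the planner assumes for each hash or sort operation.
SHOW work_mem;
```

Because `SET LOCAL` is scoped to the transaction, the `ROLLBACK` restores the default planner behaviour.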
