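I want to find any tables or fields that exist in database AA but are missing from database BB. I'm using INFORMATION_SCHEMA.columns to get the information, so I wrote a "missing records" query to find them. For testing, I used two databases where I know BB is missing one table and, in another table, one field.
Here is my first attempt at determining the differences between the two MySQL database schemas: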
SELECT AA.table_name,
AA.column_name,
BB.table_name,
BB.column_name
FROM information_schema.columns AS AA
LEFT JOIN information_schema.columns AS BB
ON (AA.table_name = BB.table_name)
AND (AA.column_name = BB.column_name)
WHERE AA.table_schema = 'wireless-2015-05'
AND BB.table_schema = 'wireless-2015-04'
AND BB.column_name IS NULL
This returned 0 records. So then I tried:
SELECT AA.table_name,
AA.column_name
FROM information_schema.columns AS AA
WHERE AA.table_schema = 'wireless-2015-04'
AND NOT EXISTS(SELECT BB.table_name,
BB.column_name
FROM information_schema.columns AS BB
WHERE BB.table_schema = 'wireless-2015-05')
Again I got 0 records. Finally, I tried this:
SELECT table_name,
column_name
FROM (SELECT DISTINCT table_name,
column_name
FROM information_schema.columns
WHERE table_schema = 'wireless-2015-04'
UNION ALL
SELECT DISTINCT table_name,
column_name
FROM information_schema.columns
WHERE table_schema = 'wireless-2015-05') AS tbl
GROUP BY table_name,
column_name
HAVING Count(*) = 1
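This produced the expected results.
While I don't mind using the third query, I can't figure out why the first two don't work, and I'd like to know for future reference. Can anyone spot the problem?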
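One way to see why the first two attempts return nothing is to replay them against a tiny stand-in table. The sketch below is not MySQL: it uses Python's built-in sqlite3 module, and the table `cols` and the schema names 'new' and 'old' are made up for illustration, standing in for information_schema.columns and the two wireless databases. Under those assumptions, the first attempt fails because the WHERE clause tests BB.table_schema, which is NULL on exactly the unmatched rows the query is looking for, and the second fails because its subquery never refers to the outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for information_schema.columns.
conn.execute(
    "CREATE TABLE cols (table_schema TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO cols VALUES (?, ?, ?)",
    [
        ("new", "users", "id"),
        ("new", "users", "email"),  # column that 'old' lacks
        ("new", "orders", "id"),    # table that 'old' lacks
        ("old", "users", "id"),
    ],
)

# Shape of attempt 1: the filter on BB sits in WHERE, so any row where the
# join found no match (BB.* is NULL) fails BB.table_schema = 'old'.
attempt1 = conn.execute("""
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    LEFT JOIN cols AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    WHERE AA.table_schema = 'new'
      AND BB.table_schema = 'old'
      AND BB.column_name IS NULL
""").fetchall()

# Shape of attempt 2: the subquery is not correlated with AA, so EXISTS is
# true for every outer row as long as 'old' has any columns at all.
attempt2 = conn.execute("""
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    WHERE AA.table_schema = 'new'
      AND NOT EXISTS (SELECT BB.table_name, BB.column_name
                      FROM cols AS BB
                      WHERE BB.table_schema = 'old')
""").fetchall()

# One possible repair of attempt 1: move the BB schema filter into ON, so
# unmatched AA rows survive with NULLs and can be picked out in WHERE.
fixed = conn.execute("""
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    LEFT JOIN cols AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
     AND BB.table_schema = 'old'
    WHERE AA.table_schema = 'new'
      AND BB.column_name IS NULL
    ORDER BY AA.table_name, AA.column_name
""").fetchall()

print(attempt1)  # []
print(attempt2)  # []
print(fixed)     # [('orders', 'id'), ('users', 'email')]
```

The same reasoning applies to the second attempt: correlating the subquery on table_name and column_name is what makes NOT EXISTS evaluate per row rather than once for the whole schema.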
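Update:
For those interested, here are 4 queries that work, along with the time each one took to run. They are listed fastest first, with the time shown below each query.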
SELECT AA.table_name,
AA.column_name
FROM information_schema.columns AS AA
LEFT JOIN (SELECT table_name,
column_name
FROM information_schema.columns
WHERE table_schema = 'wireless-2015-04') BB
ON AA.table_name = BB.table_name
AND AA.column_name = BB.column_name
WHERE AA.table_schema = 'wireless-2015-05'
AND BB.table_name IS NULL;
0.047 seconds
SELECT table_name,
column_name
FROM (SELECT DISTINCT table_name,
column_name
FROM information_schema.columns
WHERE table_schema = 'wireless-2015-04'
UNION ALL
SELECT DISTINCT table_name,
column_name
FROM information_schema.columns
WHERE table_schema = 'wireless-2015-05') AS tbl
GROUP BY table_name,
column_name
HAVING Count(*) = 1;
0.078 seconds
SELECT DISTINCT table_name,
column_name,
Concat(table_name, '--', column_name) AS tc
FROM information_schema.columns
WHERE table_schema = 'wireless-2015-05'
HAVING tc NOT IN(SELECT DISTINCT Concat(table_name, '--', column_name)
FROM information_schema.columns
WHERE table_schema = 'wireless-2015-04');
0.125 seconds (a new solution I thought of this morning)
SELECT aa.table_name,
aa.column_name
FROM information_schema.columns aa
WHERE table_schema = 'wireless-2015-05'
AND NOT EXISTS (SELECT 1
FROM information_schema.columns
WHERE table_schema = 'wireless-2015-04'
AND table_name = aa.table_name
AND column_name = aa.column_name);
44.382 seconds. Clearly not a good real-world solution.
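Worth noting when comparing these: the four queries do not all answer quite the same question. The LEFT JOIN, NOT IN, and NOT EXISTS forms are one-directional (columns in one schema that are absent from the other), while the UNION ALL / HAVING Count(*) = 1 form reports columns that appear in only one schema, whichever one that is. The sketch below illustrates this with Python's built-in sqlite3 module rather than MySQL; the table `cols`, the schema names 'new' and 'old', and the sample columns are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for information_schema.columns.
conn.execute(
    "CREATE TABLE cols (table_schema TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO cols VALUES (?, ?, ?)",
    [
        ("new", "users", "id"),
        ("new", "users", "email"),        # only in 'new'
        ("old", "users", "id"),
        ("old", "users", "legacy_flag"),  # only in 'old'
    ],
)

# One-directional: columns in 'new' that have no counterpart in 'old'.
one_way = conn.execute("""
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    LEFT JOIN (SELECT table_name, column_name
               FROM cols
               WHERE table_schema = 'old') AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    WHERE AA.table_schema = 'new'
      AND BB.table_name IS NULL
    ORDER BY AA.table_name, AA.column_name
""").fetchall()

# Two-directional: columns that occur in exactly one of the two schemas.
both_ways = conn.execute("""
    SELECT table_name, column_name
    FROM (SELECT DISTINCT table_name, column_name
          FROM cols
          WHERE table_schema = 'old'
          UNION ALL
          SELECT DISTINCT table_name, column_name
          FROM cols
          WHERE table_schema = 'new') AS tbl
    GROUP BY table_name, column_name
    HAVING COUNT(*) = 1
    ORDER BY table_name, column_name
""").fetchall()

print(one_way)    # [('users', 'email')]
print(both_ways)  # [('users', 'email'), ('users', 'legacy_flag')]
```

If the two schemas differ in only one direction, as in the test setup described in the question, both forms return the same rows, which may be why the difference did not show up in the results.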
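information_schema is relatively expensive to query, because its tables are not real tables, and queries against it often examine more internal structures than the query actually needs. This helps explain why the first query is faster: "LEFT JOIN (SELECT ...) BB" actually creates a temporary table "BB" *first*, so the second table in the query is fully populated before the outer query runs. Contrast that with the very slow variant shown last, which probably issues a request to i_s for every column.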