2013-06-02

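Optimize multiple self-JOINs or redesign the database?

I am looking for advice on optimizing multiple self-joins, or on a better table/database design.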

One of the tables looks like this (relevant columns only):

CREATE TABLE IF NOT EXISTS CountryData (
    countryDataID INT PRIMARY KEY AUTO_INCREMENT, 
    dataID INT NOT NULL REFERENCES DataSources (dataID), 
    dataCode VARCHAR(30) NULL, 
    countryID INT NOT NULL REFERENCES Countries (countryID), 
    year INT NOT NULL , 
    data DEC(20,4) NULL, 
    INDEX countryDataYear (dataID, countryID, year)); 

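The data column holds values for a few hundred indicators, 90 countries, and 30 years, for roughly 1 million rows in total. A standard query selects N indicators for a particular year and C countries, yielding a C x N table with at most 90 rows.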

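With all the values in a single column, self-joins seemed to be the way to go. I have tried various suggestions for speeding them up, including indexing and creating new (temporary) tables. With 9 self-joins, the query takes just under 1 minute. Beyond that, it spins forever.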

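The new table that the self-joins run against has only about 1000 rows and is indexed on what seem to be the relevant columns. Creating it takes about 0.5 seconds: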

CREATE TABLE Growth 
    SELECT dataID, countryID, year, data 
    FROM CountryData 
    WHERE dataID > 522 AND year = 2017; 

CREATE INDEX growth_ix 
    ON Growth (dataID, countryID); 

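The SELECT query then arranges the results into a table with XX indicator columns, where XX is unfortunately limited to fewer than 10: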

SELECT 
    Countries.countryName AS Country, 
    em01.em, 
    em02.em, 
    em03.em 
    ... 
    emXX.em 
FROM  
    (SELECT 
     em1.data AS em, 
     em1.countryID 
    FROM Growth AS em1 
    WHERE 
    em1.dataID = 523) as em01 
    JOIN 
    (SELECT 
     em2.data AS em, 
     em2.countryID 
    FROM Growth AS em2 
    WHERE 
    em2.dataID = 524) as em02 
    USING (countryID) 
    JOIN 
    (SELECT 
     em3.data AS em, 
     em3.countryID 
    FROM Growth AS em3 
    WHERE 
    em3.dataID = 525) as em03 
    USING (countryID) 
    ... 
    JOIN 
    (SELECT 
     emX.data AS em, 
     emX.countryID 
    FROM Growth AS emX 
    WHERE 
    emX.dataID = 527) as emXX 
    USING (countryID) 
    JOIN Countries 
    USING (countryID) 

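In fact, I would like to retrieve several more variables, and possibly join other tables as well. Now I am wondering whether there is a way to run this more efficiently, or whether I should take a completely different approach, such as using wide tables with the indicators in separate columns to avoid the self-joins.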
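To make the wide-table alternative concrete, here is a minimal sketch. It uses Python's `sqlite3` with an in-memory database as a stand-in for MySQL, and it assumes each (dataID, countryID, year) combination holds at most one value. The table name `GrowthWide`, the column names `em523` and `em524`, and the sample rows are hypothetical and only there for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Long format, as in CountryData: one row per (indicator, country, year).
    CREATE TABLE CountryData (
        dataID INT NOT NULL, countryID INT NOT NULL,
        year INT NOT NULL, data NUMERIC);
    INSERT INTO CountryData VALUES
        (523, 1, 2017, 1.5), (524, 1, 2017, 2.5),
        (523, 2, 2017, 3.5), (524, 2, 2017, 4.5);

    -- Hypothetical wide format: one row per (country, year),
    -- one column per indicator.
    CREATE TABLE GrowthWide (
        countryID INT NOT NULL, year INT NOT NULL,
        em523 NUMERIC, em524 NUMERIC,
        PRIMARY KEY (countryID, year));
""")

# Hypothetical mapping from indicator ID to wide-table column name.
INDICATORS = {523: "em523", 524: "em524"}

# Create one wide row per (country, year) found in the long table.
conn.execute(
    "INSERT INTO GrowthWide (countryID, year) "
    "SELECT DISTINCT countryID, year FROM CountryData"
)

# Fill each indicator column from the matching long-format rows.
for data_id, col in INDICATORS.items():
    conn.execute(
        f"UPDATE GrowthWide SET {col} = ("
        "  SELECT cd.data FROM CountryData AS cd"
        "  WHERE cd.dataID = ?"
        "    AND cd.countryID = GrowthWide.countryID"
        "    AND cd.year = GrowthWide.year)",
        (data_id,),
    )

# Reading N indicators is now a plain column list with no self-joins.
wide = conn.execute(
    "SELECT countryID, em523, em524 FROM GrowthWide "
    "WHERE year = 2017 ORDER BY countryID"
).fetchall()
print(wide)
```

The trade-off in this sketch is that every new indicator needs a new column, so with a few hundred indicators the schema has to change whenever the indicator list does.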

This might belong on [dba.se].

Answers

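Is dataID unique for a given countryID and year, or can the same dataID appear multiple times with different values? If it is unique, you could try something like this: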

SELECT countryID, year 
    ,MAX(CASE WHEN dataID = 523 THEN data ELSE NULL END) AS em0 
    ,MAX(CASE WHEN dataID = 524 THEN data ELSE NULL END) AS em1 
    ,... 
FROM CountryData 
GROUP BY countryID, year
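The conditional-aggregation idea above can be tried end to end with the following minimal sketch. It uses Python's `sqlite3` with an in-memory database as a stand-in for MySQL, and it assumes at most one value per (dataID, countryID, year). The helper name `pivot_indicators`, the simplified `Countries` table, and the sample rows are hypothetical and not part of the original schema.

```python
import sqlite3

def pivot_indicators(conn, data_ids, year):
    """Return one row per country with one column per requested indicator."""
    # One MAX(CASE ...) column per indicator; the IDs are bound as parameters.
    cols = ",\n               ".join(
        f"MAX(CASE WHEN cd.dataID = ? THEN cd.data END) AS em{i:02d}"
        for i, _ in enumerate(data_ids, start=1)
    )
    placeholders = ", ".join("?" for _ in data_ids)
    sql = f"""
        SELECT c.countryName AS Country,
               {cols}
        FROM CountryData AS cd
        JOIN Countries AS c ON c.countryID = cd.countryID
        WHERE cd.year = ? AND cd.dataID IN ({placeholders})
        GROUP BY c.countryID, c.countryName
        ORDER BY c.countryName
    """
    # Parameter order follows the order of the ? marks in the SQL text.
    params = [*data_ids, year, *data_ids]
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Countries (countryID INTEGER PRIMARY KEY, countryName TEXT);
    CREATE TABLE CountryData (
        countryDataID INTEGER PRIMARY KEY AUTOINCREMENT,
        dataID INT NOT NULL, countryID INT NOT NULL,
        year INT NOT NULL, data NUMERIC);
    CREATE INDEX countryDataYear ON CountryData (dataID, countryID, year);
    INSERT INTO Countries VALUES (1, 'Chile'), (2, 'Ghana');
    INSERT INTO CountryData (dataID, countryID, year, data) VALUES
        (523, 1, 2017, 1.5), (524, 1, 2017, 2.5),
        (523, 2, 2017, 3.5), (524, 2, 2016, 9.9);
""")

rows = pivot_indicators(conn, [523, 524], 2017)
for row in rows:
    print(row)
```

In this sketch the table is scanned once regardless of how many indicators are requested, so adding an indicator adds a column expression rather than another join. Unlike the chain of inner joins, a country that has some but not all of the requested indicators for the year still appears, with NULL in the missing columns.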