Slow MySQL query is filling up my disk space

This is the query I am currently running (28 hours have passed so far!):

drop table if exists temp_codes; 
create temporary table temp_codes 
    select distinct CODE from Table1; 
alter table temp_codes 
    add primary key (CODE); 

drop table if exists temp_ids; 
create temporary table temp_ids 
    select distinct ID from Table1; 
alter table temp_ids 
    add primary key (ID); 

drop table if exists temp_ids_codes; 
create temporary table temp_ids_codes 
    select ID, CODE 
    from temp_ids, temp_codes; 

alter table temp_ids_codes 
    add index idx_id(ID), 
    add index idx_code(CODE); 

insert into Table2(ID,CODE,cnt) 
select 
    a.ID, a.CODE, coalesce(count(t1.ID), 0) 
from 
    temp_ids_codes as a 
    left join Table1 as t1 on (a.ID = t1.ID and a.CODE=t1.CODE) 
group by 
    a.ID, a.CODE;

My table looks like this (Table1):

ID   CODE 
----------------- 
0001  345 
0001  345 
0001  120 
0002  567 
0002  034 
0002  567 
0003  567 
0004  533 
0004  008 
...... 
(millions of rows)

I run the query above to get this (Table2):

ID CODE CNT 
1 008  0 
1 034  0 
1 120  1 
1 345  2 
1 533  0 
1 567  0 
2 008  0 
2 034  1 
...

CNT is the count of each code for each ID. What is the best way to achieve this with better performance and without using up disk space? Thanks

Source

2013-08-06 user2578185

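Are you sure there are only 6 codes? I suspect the cross join produces far more data than you think. –

No, I have thousands of codes... this is just a sample – user2578185

What is wrong with first running the query with LIMIT 1000 and looking at the results? – jaczes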

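You are multiplying millions of IDs by thousands of codes and wondering why you are running out of disk space. You are generating billions of rows. That is going to take a long time.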
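To see the scale before running anything, the size of the cross join can be estimated from the distinct counts alone. This is a minimal sketch that uses only the Table1 columns from the question; the aliases n_ids, n_codes, and estimated_rows are made up for illustration.

```sql
-- Rows the cross join of all IDs and all codes would produce.
select n_ids, n_codes, n_ids * n_codes as estimated_rows
from (select count(distinct ID)   as n_ids,
             count(distinct CODE) as n_codes
      from Table1) as s;
```

As a purely hypothetical illustration, 2 million IDs times 5,000 codes would already be 10 billion rows.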

I can make a few suggestions (in case you restart the process, or have the resources to run something in parallel).

First, save the intermediate results in real tables, perhaps in another database ("myTmp"), so you can monitor progress.
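A rough sketch of that first suggestion, assuming you are allowed to create a database called myTmp; the table names codes and ids are placeholders, not anything from the question.

```sql
create database if not exists myTmp;

-- Real (non-temporary) tables survive the session and are visible to other sessions.
create table myTmp.codes as select distinct CODE from Table1;
create table myTmp.ids   as select distinct ID   from Table1;

-- From any session: which intermediate steps exist so far, and how large are they?
-- For InnoDB, table_rows is only an estimate.
select table_name, table_rows, data_length
from information_schema.tables
where table_schema = 'myTmp';
```

Temporary tables, by contrast, are visible only to the session that created them, so nothing outside that session can inspect them.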

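Second, do the aggregation before the join in the final query. In fact, since you are already using temporary tables, put this into a table first: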

select t1.ID, t1.CODE, count(*) as cnt 
from Table1 as t1 
group by t1.ID, t1.CODE;

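As it stands, you multiply the original data by including all the extra codes, and only then group.

Then join from the full table of combinations to this table (a left join, so that the zero counts are kept).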
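Put together, the two steps might look like the following. It is only a sketch: temp_counts and idx_id_code are hypothetical names, while temp_ids_codes and Table2 are the tables from the question.

```sql
-- Aggregate first: one row per (ID, CODE) pair that actually occurs in Table1.
drop temporary table if exists temp_counts;
create temporary table temp_counts
    select t1.ID, t1.CODE, count(*) as cnt
    from Table1 as t1
    group by t1.ID, t1.CODE;
alter table temp_counts
    add index idx_id_code (ID, CODE);

-- Then left join from the full set of combinations; missing pairs get a count of 0.
insert into Table2(ID, CODE, cnt)
select a.ID, a.CODE, coalesce(c.cnt, 0)
from temp_ids_codes as a
    left join temp_counts as c on (a.ID = c.ID and a.CODE = c.CODE);
```

This avoids the final group by over the multiplied rows, but Table2 itself still receives one row per ID/CODE combination.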

Another approach is to put an index on the original table and try this:

insert into Table2(ID,CODE,cnt) 
select a.ID, a.CODE, 
     (select count(*) from Table1 t1 where a.ID = t1.ID and a.CODE=t1.CODE) as cnt 
from temp_ids_codes a 
group by a.ID, a.CODE;

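This may look a bit counterintuitive, but the correlated subquery will use the index on Table1. I do not like playing games like this with SQL, but it may let the query finish in our lifetimes.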
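The index this relies on is presumably a composite one on both columns. A sketch, where the index name idx_id_code is an assumption:

```sql
-- Composite index so each (ID, CODE) lookup can be answered from the index.
alter table Table1
    add index idx_id_code (ID, CODE);

-- Check the plan on a small slice before committing to the full insert.
explain
select a.ID, a.CODE,
       (select count(*) from Table1 t1 where a.ID = t1.ID and a.CODE = t1.CODE) as cnt
from temp_ids_codes a
limit 1000;
```

If the plan for the subquery shows the new index being used, the full insert is more likely to be worth starting.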

Source

2013-08-06 12:08:52

Where is the WHERE clause here?

create temporary table temp_ids_codes 
select ID, CODE 
from temp_ids, temp_codes;

The table should have a PK on the columns ID, CODE.
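In terms of the question's script, that suggestion would amount to something like the statement below. It assumes neither column contains NULLs; the combinations are already unique because they come from a cross join of two distinct lists.

```sql
-- Composite primary key on the combinations table.
alter table temp_ids_codes
    add primary key (ID, CODE);
```

Building a key on billions of rows is itself expensive, so this only helps if the table is small enough to build in the first place.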

Source

2013-08-06 12:03:42 jaczes

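I don't have one... I just want the count of every code for every ID (including zero counts) – user2578185

Would my query be faster with a PK on these columns? – user2578185

Yes, but @Gordon Linoff gave a better solution. You can add the PK for his solution as well – jaczes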

You could try something along these lines (untested query):

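select a.ID, 
     a.CODE, 
     coalesce(b.countvalue, 0) as cnt 
from temp_ids_codes as a 
-- the subquery groups only the rows that exist in Table1 
left join (select t1.ID, t1.CODE, count(*) as countvalue 
      from Table1 as t1 
      group by t1.ID, t1.CODE 
      ) as b on a.ID = b.ID and a.CODE = b.CODE;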

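Now your group by only runs on the records that need grouping (rather than on all the zero-count records). The right indexes can also make a huge difference.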

Source

2013-08-06 12:11:20 Sam
