2017-12-27 112 views
2

CRM table example: I am using NOT IN, but it is slow

`crm` example:
+----+--------+---------------------+-------------------+
| id | name   | date                | status            |
+----+--------+---------------------+-------------------+
|  1 | john   | 2017-12-27 10:58:10 | A status          |
|  2 | steve  | 2017-12-27 10:58:08 | A status          |
|  3 | eric   | 2017-12-27 10:58:04 | Delivery Arranged |
|  4 | phil   | 2017-12-27 10:57:55 | A status          |
|  5 | bob    | 2017-12-27 10:57:52 | A status          |
|  6 | foo    | 2017-12-27 10:57:50 | A status          |
|  7 | steven | 2017-12-27 10:57:48 | Delivery Arranged |
|  8 | paul   | 2017-12-27 10:57:43 | A status          |
|  9 | alex   | 2017-12-27 10:57:31 | Delivery Arranged |
+----+--------+---------------------+-------------------+

The goal of my query is to return the number of crm rows whose status is Delivery Arranged and whose date falls between 2017-12-01 and 2018-01-01.

So, here is my main query:

SET @from = '2017-12-01';
SET @to = '2018-01-01';

SELECT
    COUNT(*) AS `delivery_arranged`
FROM
    `crm` a
WHERE
    a.`status` = 'Delivery Arranged'
    AND DATE(a.`date`) BETWEEN @from AND @to

Result:

+-------------------+
| delivery_arranged |
+-------------------+
|                30 |
+-------------------+

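All good. But I want to discount rows that have ever been set to Delivery Arranged before (in effect, outside this date range). I have a statuslog table that I can use for this:

STATUSLOG table example: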

`statuslog` example:
+--------+-------+---------------------+-----------+--------------------+
| id     | crmid | date                | user      | status             |
+--------+-------+---------------------+-----------+--------------------+
| 818572 |     1 | 2017-12-27 10:58:10 | johnsmith | Some status change |
| 818571 |     2 | 2017-12-27 10:58:08 | johnsmith | Some status change |
| 818570 |     3 | 2017-12-27 10:58:04 | another   | Delivery Arranged  |
| 818569 |     4 | 2017-12-27 10:57:55 | another   | Delivery Arranged  |
| 818568 |     5 | 2017-12-27 10:57:52 | johnsmith | Some status change |
| 818567 |     6 | 2017-12-27 10:57:50 | another   | Some status change |
| 818566 |     7 | 2017-12-27 10:57:48 | johnsmith | Delivery Arranged  |
| 818565 |     8 | 2017-12-27 10:57:43 | another   | Some status change |
| 818564 |     9 | 2017-12-27 10:57:31 | johnsmith | Some status change |
+--------+-------+---------------------+-----------+--------------------+

So with this table, I can get the statuslog rows that fall outside the date range and then do a NOT IN:

SELECT
    COUNT(*) AS `delivery_arranged`
FROM
    `crm` a
WHERE
    a.`status` = 'Delivery Arranged'
    AND DATE(a.`date`) BETWEEN @from AND @to
    AND a.`id` NOT IN (
        SELECT
            a.crmid AS `crmid`
        FROM
            statuslog a
        WHERE
            a.status = 'Delivery Arranged'
            AND DATE(a.`date`) NOT BETWEEN @from AND @to
        GROUP BY a.crmid
        ORDER BY a.`date` DESC
    )

This works, but depending on the size of the date range it can take a very long time! statuslog has more than 2,000,000 rows.

How can I make this query faster?

+2

The first thing to look at is indexes; have you done that? After that, SETs are faster than VARCHARs, which in turn are faster than TEXTs. –

+0

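Get to the bottom of the query execution. Find out the query's plan and cost, and look at which join method it uses. Then, accordingly, you can dig into which kind of indexes are needed to filter out the data, and possibly rewrite the query as @Gordon Linoff suggests below. But you need to do the homework first. – Rizwan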
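As a minimal sketch of what inspecting the plan could look like, the question's NOT IN query can be prefixed with EXPLAIN. The table and column names come from the question; the inner alias `s` is only a rename for readability, and the GROUP BY and ORDER BY are dropped on the assumption that they do not affect an IN list.

```sql
SET @from = '2017-12-01';
SET @to   = '2018-01-01';

-- Show the execution plan instead of running the query.
EXPLAIN
SELECT COUNT(*) AS delivery_arranged
FROM crm a
WHERE a.status = 'Delivery Arranged'
  AND DATE(a.`date`) BETWEEN @from AND @to
  AND a.id NOT IN (
        SELECT s.crmid
        FROM statuslog s
        WHERE s.status = 'Delivery Arranged'
          AND DATE(s.`date`) NOT BETWEEN @from AND @to
      );
```

In the output, the `type`, `key`, `rows` and `Extra` columns are the ones to read first: `type = ALL` on statuslog with `key = NULL` would indicate a full scan of the 2,000,000+ rows.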

Answers

0

A LEFT JOIN may perform better than the NOT IN subquery:

SELECT
    COUNT(*) AS `delivery_arranged`
FROM
    `crm` a
LEFT OUTER JOIN
    (
     SELECT
      a.crmid AS `crmid`
     FROM
      statuslog a
     WHERE
      a.status = 'Delivery Arranged'
       AND DATE(a.`date`) NOT BETWEEN @from AND @to
     GROUP BY a.crmid
     -- ORDER BY a.`date` DESC  -- removed: ordering inside this subquery makes no sense
    ) b
    ON a.`id` = b.crmid
WHERE
    b.crmid IS NULL AND  -- NOT IN translated to LEFT JOIN ... IS NULL
    a.`status` = 'Delivery Arranged'
    AND DATE(a.`date`) BETWEEN @from AND @to

Also, remember to use the right indexes.

0

This is usually faster if you use LEFT JOIN/WHERE:

SELECT COUNT(*) AS delivery_arranged
FROM crm c LEFT JOIN
     statuslog sl
     ON sl.crmid = c.id AND
        sl.status = 'Delivery Arranged' AND
        sl.date >= @from AND
        sl.date < @to + INTERVAL 1 DAY
WHERE c.status = 'Delivery Arranged' AND
      c.date >= @from AND
      c.date < @to + INTERVAL 1 DAY AND
      sl.crmid IS NULL;

For this version, you want indexes on crm(status, date, id) and statuslog(crmid, status, date).
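A sketch of how those two indexes might be created. The column lists are the ones named above; the index names are hypothetical, and the statements assume `status` is a VARCHAR (a TEXT column would need a prefix length in the index definition).

```sql
-- Hypothetical index names; column order follows the answer.
CREATE INDEX idx_crm_status_date_id
    ON crm (status, `date`, id);

CREATE INDEX idx_statuslog_crmid_status_date
    ON statuslog (crmid, status, `date`);
```

Putting the equality columns first and the range column (`date`) after them lets the range condition narrow the scan within the matching `status` (and `crmid`) entries.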

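Note that this changes the date comparisons to avoid calling a function on the column. That makes it more feasible to use an index that includes the date column.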
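The subquery in the question filters statuslog rows that lie outside the date range (`NOT BETWEEN`). A hedged sketch of how that condition could be kept while still avoiding `DATE()` on the column is shown here, written with NOT EXISTS; the aliases `c` and `sl` follow the answer, everything else comes from the question.

```sql
SET @from = '2017-12-01';
SET @to   = '2018-01-01';

SELECT COUNT(*) AS delivery_arranged
FROM crm c
WHERE c.status = 'Delivery Arranged'
  AND c.`date` >= @from
  AND c.`date` <  @to + INTERVAL 1 DAY
  AND NOT EXISTS (
        SELECT 1
        FROM statuslog sl
        WHERE sl.crmid  = c.id
          AND sl.status = 'Delivery Arranged'
          -- Outside the range: same rows as
          -- DATE(sl.`date`) NOT BETWEEN @from AND @to,
          -- but without wrapping the column in a function.
          AND (sl.`date` < @from OR sl.`date` >= @to + INTERVAL 1 DAY)
      );
```

Unlike NOT IN, NOT EXISTS is not affected by NULL values in `statuslog.crmid`, so the two forms only agree when that column contains no NULLs. Whether this runs faster depends on the MySQL version and the available indexes, so it is worth comparing plans before adopting it.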

+0

Haha, a few seconds =) – danihp
