2012-06-22 23 views
14

Question: Why does MySQL performance degrade when queries that join nearly empty tables are executed in parallel?

Below is a more detailed explanation of the problem I am facing. I have two tables in MySQL:

CREATE TABLE first (
    num int(10) NOT NULL, 
    UNIQUE KEY key_num (num) 
) ENGINE=InnoDB;

CREATE TABLE second (
    num int(10) NOT NULL, 
    num2 int(10) NOT NULL, 
    UNIQUE KEY key_num (num, num2) 
) ENGINE=InnoDB;

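The first table contains about a thousand records. The second is empty or contains very few records. It also has a composite (two-column) index, which is relevant to the problem: with a single-column index the problem disappears. Now I am trying to run a large number of identical queries against these tables. Each query looks like this: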

SELECT first.num 
FROM first 
LEFT JOIN second AS second_1 ON second_1.num = -1 # non-existent key 
LEFT JOIN second AS second_2 ON second_2.num = -2 # non-existent key 
LEFT JOIN second AS second_3 ON second_3.num = -3 # non-existent key 
LEFT JOIN second AS second_4 ON second_4.num = -4 # non-existent key 
LEFT JOIN second AS second_5 ON second_5.num = -5 # non-existent key 
LEFT JOIN second AS second_6 ON second_6.num = -6 # non-existent key 
WHERE second_1.num IS NULL 
    AND second_2.num IS NULL 
    AND second_3.num IS NULL 
    AND second_4.num IS NULL 
    AND second_5.num IS NULL 
    AND second_6.num IS NULL 

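The problem is that, instead of an almost linear performance increase on an 8-core machine, I actually get a decrease. With one process I get about 200 queries per second. With two processes, instead of the expected increase to 300-400 queries per second, I actually drop to 150. With 10 processes I get only 70 queries per second. The Perl test code I use is shown below: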

#!/usr/bin/perl 

use strict; 
use warnings; 

use DBI; 
use Parallel::Benchmark; 
use SQL::Abstract; 
use SQL::Abstract::Plugin::InsertMulti; 

my $children_dbh; 

foreach my $second_table_row_count (0, 1, 1000) { 
    print '#' x 80, "\nsecond_table_row_count = $second_table_row_count\n"; 
    create_and_fill_tables(1000, $second_table_row_count); 
    foreach my $concurrency (1, 2, 3, 4, 6, 8, 10, 20) { 
     my $bm = Parallel::Benchmark->new(
      'benchmark' => sub { 
       _run_sql(); 
       return 1; 
      }, 
      'concurrency' => $concurrency, 
      'time' => 3, 
     ); 
     my $result = $bm->run(); 
    } 
} 

sub create_and_fill_tables { 
    my ($first_table_row_count, $second_table_row_count) = @_; 
    my $dbh = dbi_connect(); 
    { 
     $dbh->do(q{DROP TABLE IF EXISTS first}); 
     $dbh->do(q{ 
      CREATE TABLE first (
       num int(10) NOT NULL, 
       UNIQUE KEY key_num (num) 
      ) ENGINE=InnoDB 
     }); 
     if ($first_table_row_count) { 
      my ($stmt, @bind) = SQL::Abstract->new()->insert_multi(
       'first', 
       ['num'], 
       [map {[$_]} 1 .. $first_table_row_count], 
      ); 
      $dbh->do($stmt, undef, @bind); 
     } 
    } 
    { 
     $dbh->do(q{DROP TABLE IF EXISTS second}); 
     $dbh->do(q{ 
      CREATE TABLE second (
       num int(10) NOT NULL, 
       num2 int(10) NOT NULL, 
       UNIQUE KEY key_num (num, num2) 
      ) ENGINE=InnoDB 
     }); 
     if ($second_table_row_count) { 
      my ($stmt, @bind) = SQL::Abstract->new()->insert_multi(
       'second', 
       # num2 is NOT NULL without a default, so fill it too: rows are [1,1], [2,2], ...
       ['num', 'num2'], 
       [map {[$_, $_]} 1 .. $second_table_row_count], 
      ); 
      $dbh->do($stmt, undef, @bind); 
     } 
    } 
} 

sub _run_sql { 
    $children_dbh ||= dbi_connect(); 
    $children_dbh->selectall_arrayref(q{ 
     SELECT first.num 
     FROM first 
     LEFT JOIN second AS second_1 ON second_1.num = -1 
     LEFT JOIN second AS second_2 ON second_2.num = -2 
     LEFT JOIN second AS second_3 ON second_3.num = -3 
     LEFT JOIN second AS second_4 ON second_4.num = -4 
     LEFT JOIN second AS second_5 ON second_5.num = -5 
     LEFT JOIN second AS second_6 ON second_6.num = -6 
     WHERE second_1.num IS NULL 
      AND second_2.num IS NULL 
      AND second_3.num IS NULL 
      AND second_4.num IS NULL 
      AND second_5.num IS NULL 
      AND second_6.num IS NULL 
    }); 
} 

sub dbi_connect { 
    return DBI->connect(
     'dbi:mysql:' 
      . 'database=tmp' 
      . ';host=localhost' 
      . ';port=3306', 
     'root', 
     '', 
    ); 
} 

For comparison, queries like the following one do run faster as concurrency increases:

SELECT first.num 
FROM first 
LEFT JOIN second AS second_1 ON second_1.num = 1 # existent key 
LEFT JOIN second AS second_2 ON second_2.num = 2 # existent key 
LEFT JOIN second AS second_3 ON second_3.num = 3 # existent key 
LEFT JOIN second AS second_4 ON second_4.num = 4 # existent key 
LEFT JOIN second AS second_5 ON second_5.num = 5 # existent key 
LEFT JOIN second AS second_6 ON second_6.num = 6 # existent key 
WHERE second_1.num IS NOT NULL 
    AND second_2.num IS NOT NULL 
    AND second_3.num IS NOT NULL 
    AND second_4.num IS NOT NULL 
    AND second_5.num IS NOT NULL 
    AND second_6.num IS NOT NULL 

Here are the test results, together with the CPU and disk usage measurements:

 
* table `first` has 1000 rows 
* table `second` has 6 rows: `[1,1],[2,2],..[6,6]` 

For query: 
    SELECT first.num 
    FROM first 
    LEFT JOIN second AS second_1 ON second_1.num = -1 # non-existent key 
    LEFT JOIN second AS second_2 ON second_2.num = -2 # non-existent key 
    LEFT JOIN second AS second_3 ON second_3.num = -3 # non-existent key 
    LEFT JOIN second AS second_4 ON second_4.num = -4 # non-existent key 
    LEFT JOIN second AS second_5 ON second_5.num = -5 # non-existent key 
    LEFT JOIN second AS second_6 ON second_6.num = -6 # non-existent key 
    WHERE second_1.num IS NULL 
     AND second_2.num IS NULL 
     AND second_3.num IS NULL 
     AND second_4.num IS NULL 
     AND second_5.num IS NULL 
     AND second_6.num IS NULL 

Results: 
    concurrency: 1,  speed: 162.910/sec 
    concurrency: 2,  speed: 137.818/sec 
    concurrency: 3,  speed: 130.728/sec 
    concurrency: 4,  speed: 107.387/sec 
    concurrency: 6,  speed: 90.513/sec 
    concurrency: 8,  speed: 80.445/sec 
    concurrency: 10, speed: 80.381/sec 
    concurrency: 20, speed: 84.069/sec 

System usage for the last 60 minutes of running the query in 6 processes: 
    $ iostat -cdkx 60 

    avg-cpu: %user %nice %system %iowait %steal %idle 
       74.82 0.00 0.08 0.00 0.08 25.02 

    Device:   rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util 
    sda1    0.00  0.00 0.00 0.12  0.00  0.80 13.71  0.00 1.43 1.43 0.02 
    sdf10    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdf4    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 30.00 15.00 0.05 
    sdm    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf8    0.00  0.00 0.00 0.37  0.00  1.24  6.77  0.00 5.00 3.18 0.12 
    sdf6    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdf9    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 0.00 0.00 0.00 
    sdf    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf3    0.00  0.00 0.00 0.08  0.00  1.33 32.00  0.00 4.00 4.00 0.03 
    sdf2    0.00  0.00 0.00 0.17  0.00  1.37 16.50  0.00 3.00 3.00 0.05 
    sdf15    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf14    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf1    0.00  0.00 0.00 0.05  0.00  0.40 16.00  0.00 0.00 0.00 0.00 
    sdf13    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdf5    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 50.00 25.00 0.08 
    sdm2    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdm1    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf12    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdf11    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdf7    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    md0    0.00  0.00 0.00 0.97  0.00 13.95 28.86  0.00 0.00 0.00 0.00 

################################################################################ 

For query: 
    SELECT first.num 
    FROM first 
    LEFT JOIN second AS second_1 ON second_1.num = 1 # existent key 
    LEFT JOIN second AS second_2 ON second_2.num = 2 # existent key 
    LEFT JOIN second AS second_3 ON second_3.num = 3 # existent key 
    LEFT JOIN second AS second_4 ON second_4.num = 4 # existent key 
    LEFT JOIN second AS second_5 ON second_5.num = 5 # existent key 
    LEFT JOIN second AS second_6 ON second_6.num = 6 # existent key 
    WHERE second_1.num IS NOT NULL 
     AND second_2.num IS NOT NULL 
     AND second_3.num IS NOT NULL 
     AND second_4.num IS NOT NULL 
     AND second_5.num IS NOT NULL 
     AND second_6.num IS NOT NULL 

Results: 
    concurrency: 1,  speed: 875.973/sec 
    concurrency: 2,  speed: 944.986/sec 
    concurrency: 3,  speed: 1256.072/sec 
    concurrency: 4,  speed: 1401.657/sec 
    concurrency: 6,  speed: 1354.351/sec 
    concurrency: 8,  speed: 1110.100/sec 
    concurrency: 10, speed: 1145.251/sec 
    concurrency: 20, speed: 1142.514/sec 

System usage for the last 60 minutes of running the query in 6 processes: 
    $ iostat -cdkx 60 

    avg-cpu: %user %nice %system %iowait %steal %idle 
       74.40 0.00 0.53 0.00 0.06 25.01 

    Device:   rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util 
    sda1    0.00  0.00 0.00 0.02  0.00  0.13 16.00  0.00 0.00 0.00 0.00 
    sdf10    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdf4    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdm    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf8    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdf6    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 0.00 0.00 0.00 
    sdf9    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdf    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf3    0.00  0.00 0.00 0.13  0.00  2.67 40.00  0.00 3.75 2.50 0.03 
    sdf2    0.00  0.00 0.00 0.23  0.00  2.72 23.29  0.00 2.14 1.43 0.03 
    sdf15    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf14    0.00  0.00 0.00 0.98  0.00  0.54  1.10  0.00 2.71 2.71 0.27 
    sdf1    0.00  0.00 0.00 0.08  0.00  1.47 35.20  0.00 8.00 6.00 0.05 
    sdf13    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf5    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    sdm2    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdm1    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf12    0.00  0.00 0.00 0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00 
    sdf11    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 0.00 0.00 0.00 
    sdf7    0.00  0.00 0.00 0.03  0.00  1.07 64.00  0.00 10.00 5.00 0.02 
    md0    0.00  0.00 0.00 1.70  0.00 15.92 18.74  0.00 0.00 0.00 0.00 

################################################################################ 

And this server has lots of free memory. Example of top: 
    top - 19:02:59 up 4:23, 4 users, load average: 4.43, 3.03, 2.01 
    Tasks: 218 total, 1 running, 217 sleeping, 0 stopped, 0 zombie 
    Cpu(s): 72.8%us, 0.7%sy, 0.0%ni, 26.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.1%st 
    Mem: 71701416k total, 22183980k used, 49517436k free,  284k buffers 
    Swap:  0k total,  0k used,  0k free, 1282768k cached 

     PID USER  PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
    2506 mysql  20 0 51.7g 17g 5920 S 590 25.8 213:15.12 mysqld 
    9348 topadver 20 0 72256 11m 1428 S 2 0.0 0:01.45 perl 
    9349 topadver 20 0 72256 11m 1428 S 2 0.0 0:01.44 perl 
    9350 topadver 20 0 72256 11m 1428 S 2 0.0 0:01.45 perl 
    9351 topadver 20 0 72256 11m 1428 S 1 0.0 0:01.44 perl 
    9352 topadver 20 0 72256 11m 1428 S 1 0.0 0:01.44 perl 
    9353 topadver 20 0 72256 11m 1428 S 1 0.0 0:01.44 perl 
    9346 topadver 20 0 19340 1504 1064 R 0 0.0 0:01.89 top 

Does anyone have an idea why performance degrades for the query with non-existent keys?
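The two result sets can be compared by dividing each throughput figure by the single-process figure. The following is a minimal Python sketch that uses only the numbers listed above; the function name `speedup` and the variable names are illustrative choices, not part of the benchmark.

```python
# Throughput (queries/sec) per concurrency level, copied from the results above.
non_existent = {1: 162.910, 2: 137.818, 3: 130.728, 4: 107.387,
                6: 90.513, 8: 80.445, 10: 80.381, 20: 84.069}
existent = {1: 875.973, 2: 944.986, 3: 1256.072, 4: 1401.657,
            6: 1354.351, 8: 1110.100, 10: 1145.251, 20: 1142.514}


def speedup(results):
    """Return throughput at each concurrency level relative to one process."""
    base = results[1]
    return {n: round(qps / base, 2) for n, qps in results.items()}


for label, results in (("non-existent key", non_existent),
                       ("existent key", existent)):
    print(label)
    for n, ratio in speedup(results).items():
        # A ratio equal to n would be perfect linear scaling.
        print(f"  concurrency {n:>2}: {ratio:.2f}x of single-process throughput")
```

With these figures the non-existent key query falls to roughly half of its single-process throughput at 8 processes, while the existent key query peaks at about 1.6 times at 4 processes. Neither comes close to linear scaling, but only the first one gets slower than a single process.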

+0

If your CREATE TABLE already has a NOT NULL constraint, why use 'where num .... is null'? – jcho360

+0

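@jcho360 The left join will produce such NULL values. This looks like a configuration issue. Awayka, can you provide some information about your MySQL server? Does it have multiple processors? – Twelfth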

+0

@Twelfth mysql Ver 14.14 Distrib 5.1.59, for debian-linux-gnu (x86_64) using readline 5.1, on an m2.4xlarge EC2 instance, which has 8 cores – awayka

Answers

1

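I would suggest trying an approach where each fork uses its own connection (it looks to me as if $children_dbh, which holds the DB connection, is currently a shared variable). Or, even better, implement a so-called connection pool, from which each client process takes a connection when it needs one and 'returns' it when it no longer does.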

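See this answer for more details: the thread it was given in is about Java, but it is really about some general principles of how MySQL is organized. This answer may also be useful.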

P.S. A somewhat similar situation (I think) is described here, along with a detailed explanation of how to organize a connection pool.

+0

And what does the line '$children_dbh ||= dbi_connect()' in '_run_sql()' do? – raina77ow

+3

It looks to me like there are no threads here: a single thread per process. Where do you see threads? [Parallel::Benchmark](http://search.cpan.org/~fujiwara/Parallel-Benchmark-0.04/lib/Parallel/Benchmark.pm) uses [Parallel::ForkManager](http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.9/lib/Parallel/ForkManager.pm). – nab

+0

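My point is that there is one shared connection, which would explain why performance actually gets worse with each new process. And, I repeat, it is easy to check how many connections are actually in use. There is no point in saying 'it looks like' and debating theory: either a single connection is used, or it is not. – raina77ow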

8

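Well-written question, and it shows some research.

Out of curiosity, I tried MySQL 5.6 to see what the tooling has to say about these queries.

First, note that the queries differ in two ways: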

  • changing the value from "1" to "-1" for the existent/non-existent key case is one thing
  • changing "second_1.num IS NOT NULL" to "second_1.num IS NULL" in the WHERE clause is another.

Using EXPLAIN gives different plans:

EXPLAIN SELECT `first`.num 
FROM `first` 
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key 
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key 
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key 
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key 
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key 
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key 
WHERE second_1.num IS NULL 
AND second_2.num IS NULL 
AND second_3.num IS NULL 
AND second_4.num IS NULL 
AND second_5.num IS NULL 
AND second_6.num IS NULL 
; 
id  select_type  table     type   possible_keys  key      key_len  ref    rows  Extra
1   SIMPLE       first     index  NULL           key_num  4        NULL   1000  Using index
1   SIMPLE       second_1  ref    key_num        key_num  4        const  1     Using where; Not exists; Using index
1   SIMPLE       second_2  ref    key_num        key_num  4        const  1     Using where; Not exists; Using index
1   SIMPLE       second_3  ref    key_num        key_num  4        const  1     Using where; Not exists; Using index
1   SIMPLE       second_4  ref    key_num        key_num  4        const  1     Using where; Not exists; Using index
1   SIMPLE       second_5  ref    key_num        key_num  4        const  1     Using where; Not exists; Using index
1   SIMPLE       second_6  ref    key_num        key_num  4        const  1     Using where; Not exists; Using index

as opposed to

EXPLAIN SELECT `first`.num 
FROM `first` 
LEFT JOIN `second` AS second_1 ON second_1.num = 1 # existent key 
LEFT JOIN `second` AS second_2 ON second_2.num = 2 # existent key 
LEFT JOIN `second` AS second_3 ON second_3.num = 3 # existent key 
LEFT JOIN `second` AS second_4 ON second_4.num = 4 # existent key 
LEFT JOIN `second` AS second_5 ON second_5.num = 5 # existent key 
LEFT JOIN `second` AS second_6 ON second_6.num = 6 # existent key 
WHERE second_1.num IS NOT NULL 
AND second_2.num IS NOT NULL 
AND second_3.num IS NOT NULL 
AND second_4.num IS NOT NULL 
AND second_5.num IS NOT NULL 
AND second_6.num IS NOT NULL 
; 
id  select_type  table     type   possible_keys  key      key_len  ref    rows  Extra
1   SIMPLE       second_1  ref    key_num        key_num  4        const  1     Using index
1   SIMPLE       second_2  ref    key_num        key_num  4        const  1     Using index
1   SIMPLE       second_3  ref    key_num        key_num  4        const  1     Using index
1   SIMPLE       second_4  ref    key_num        key_num  4        const  1     Using index
1   SIMPLE       second_5  ref    key_num        key_num  4        const  1     Using index
1   SIMPLE       second_6  ref    key_num        key_num  4        const  1     Using index
1   SIMPLE       first     index  NULL           key_num  4        NULL   1000  Using index; Using join buffer (Block Nested Loop)
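One way to read the two plans is by join order. The following Python sketch is a toy model, not MySQL's actual executor: it assumes that every lookup on a second_N alias costs one index probe and hands exactly one row to the next join step (either the single match or a NULL-complemented row). The names `count_probes`, `plan_is_null` and `plan_is_not_null` are made up for illustration.

```python
def count_probes(join_order, first_rows=1000):
    """Count index probes into `second` for a simple nested-loop join.

    Toy model: a lookup on any `second_N` alias is one probe and yields
    exactly one row for the next step of the join.
    """
    rows_so_far = 1  # row combinations produced by the tables joined so far
    probes = 0
    for table in join_order:
        if table == "first":
            rows_so_far *= first_rows  # index scan over `first`
        else:
            probes += rows_so_far      # one probe per incoming row combination
    return probes


aliases = [f"second_{i}" for i in range(1, 7)]
plan_is_null = ["first"] + aliases      # `first` is the outermost table
plan_is_not_null = aliases + ["first"]  # constant lookups come first

print("IS NULL plan probes:    ", count_probes(plan_is_null))
print("IS NOT NULL plan probes:", count_probes(plan_is_not_null))
```

Under these assumptions the first plan performs 6000 probes into second (6 for each row of first), while the second plan performs 6. The join buffer used by the second plan is not modelled, so the absolute numbers are only indicative; they point in the same direction as the table IO counts measured further down.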

Using the JSON format, we have:

EXPLAIN FORMAT=JSON SELECT `first`.num 
FROM `first` 
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key 
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key 
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key 
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key 
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key 
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key 
WHERE second_1.num IS NULL 
AND second_2.num IS NULL 
AND second_3.num IS NULL 
AND second_4.num IS NULL 
AND second_5.num IS NULL 
AND second_6.num IS NULL 
; 
EXPLAIN 
{ 
    "query_block": { 
    "select_id": 1, 
    "nested_loop": [ 
     { 
     "table": { 
      "table_name": "first", 
      "access_type": "index", 
      "key": "key_num", 
      "key_length": "4", 
      "rows": 1000, 
      "filtered": 100, 
      "using_index": true 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_1", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "not_exists": true, 
      "using_index": true, 
      "attached_condition": "<if>(found_match(second_1), isnull(`test`.`second_1`.`num`), true)" 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_2", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "not_exists": true, 
      "using_index": true, 
      "attached_condition": "<if>(found_match(second_2), isnull(`test`.`second_2`.`num`), true)" 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_3", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "not_exists": true, 
      "using_index": true, 
      "attached_condition": "<if>(found_match(second_3), isnull(`test`.`second_3`.`num`), true)" 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_4", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "not_exists": true, 
      "using_index": true, 
      "attached_condition": "<if>(found_match(second_4), isnull(`test`.`second_4`.`num`), true)" 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_5", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "not_exists": true, 
      "using_index": true, 
      "attached_condition": "<if>(found_match(second_5), isnull(`test`.`second_5`.`num`), true)" 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_6", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "not_exists": true, 
      "using_index": true, 
      "attached_condition": "<if>(found_match(second_6), isnull(`test`.`second_6`.`num`), true)" 
     } 
     } 
    ] 
    } 
} 

as opposed to

EXPLAIN FORMAT=JSON SELECT `first`.num 
FROM `first` 
LEFT JOIN `second` AS second_1 ON second_1.num = 1 # existent key 
LEFT JOIN `second` AS second_2 ON second_2.num = 2 # existent key 
LEFT JOIN `second` AS second_3 ON second_3.num = 3 # existent key 
LEFT JOIN `second` AS second_4 ON second_4.num = 4 # existent key 
LEFT JOIN `second` AS second_5 ON second_5.num = 5 # existent key 
LEFT JOIN `second` AS second_6 ON second_6.num = 6 # existent key 
WHERE second_1.num IS NOT NULL 
AND second_2.num IS NOT NULL 
AND second_3.num IS NOT NULL 
AND second_4.num IS NOT NULL 
AND second_5.num IS NOT NULL 
AND second_6.num IS NOT NULL 
; 
EXPLAIN 
{ 
    "query_block": { 
    "select_id": 1, 
    "nested_loop": [ 
     { 
     "table": { 
      "table_name": "second_1", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "using_index": true 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_2", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "using_index": true 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_3", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "using_index": true 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_4", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "using_index": true 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_5", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "using_index": true 
     } 
     }, 
     { 
     "table": { 
      "table_name": "second_6", 
      "access_type": "ref", 
      "possible_keys": [ 
      "key_num" 
      ], 
      "key": "key_num", 
      "key_length": "4", 
      "ref": [ 
      "const" 
      ], 
      "rows": 1, 
      "filtered": 100, 
      "using_index": true 
     } 
     }, 
     { 
     "table": { 
      "table_name": "first", 
      "access_type": "index", 
      "key": "key_num", 
      "key_length": "4", 
      "rows": 1000, 
      "filtered": 100, 
      "using_index": true, 
      "using_join_buffer": "Block Nested Loop" 
     } 
     } 
    ] 
    } 
} 

Looking at the table IO recorded by the performance schema instrumentation at run time, we have:

truncate table performance_schema.objects_summary_global_by_type; 
select * from performance_schema.objects_summary_global_by_type 
where OBJECT_NAME in ("first", "second"); 
OBJECT_TYPE  OBJECT_SCHEMA  OBJECT_NAME  COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE        test           first        0           0               0               0               0
TABLE        test           second       0           0               0               0               0
SELECT `first`.num 
FROM `first` 
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key 
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key 
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key 
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key 
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key 
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key 
WHERE second_1.num IS NULL 
AND second_2.num IS NULL 
AND second_3.num IS NULL 
AND second_4.num IS NULL 
AND second_5.num IS NULL 
AND second_6.num IS NULL 
; 
(...) 
select * from performance_schema.objects_summary_global_by_type 
where OBJECT_NAME in ("first", "second"); 
OBJECT_TYPE  OBJECT_SCHEMA  OBJECT_NAME  COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE        test           first        1003        5705014442      1026171         5687889         87356557
TABLE        test           second       6012        271786533972    537266          45207298        1123939292

as opposed to:

select * from performance_schema.objects_summary_global_by_type 
where OBJECT_NAME in ("first", "second"); 
OBJECT_TYPE  OBJECT_SCHEMA  OBJECT_NAME  COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE        test           first        1003        5211074603      969338          5195454         61066176
TABLE        test           second       24          458656783       510085          19110361        66229860

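The query that scales does almost no table IO on table second. The query that does not scale performs about 6K table IOs on table second, or 6 times the size of table first.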

This is because the query plans are different, which in turn is because the queries are different (IS NOT NULL versus IS NULL).
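To put the two measurements for table second side by side: the Performance Schema reports its timer columns in picoseconds, so the SUM_TIMER_WAIT values can be converted to seconds and compared directly. A minimal Python sketch, using only the values from the output above (the variable and function names are illustrative):

```python
PICOSECONDS_PER_SECOND = 10**12

# (COUNT_STAR, SUM_TIMER_WAIT) for table `second`, copied from the output above.
second_is_null = (6012, 271786533972)   # query that does not scale
second_is_not_null = (24, 458656783)    # query that scales


def to_seconds(picoseconds):
    """Convert a Performance Schema timer value to seconds."""
    return picoseconds / PICOSECONDS_PER_SECOND


count_ratio = second_is_null[0] / second_is_not_null[0]
wait_ratio = second_is_null[1] / second_is_not_null[1]

print(f"IS NULL:     {second_is_null[0]} table IOs, "
      f"{to_seconds(second_is_null[1]):.6f} s total wait")
print(f"IS NOT NULL: {second_is_not_null[0]} table IOs, "
      f"{to_seconds(second_is_not_null[1]):.6f} s total wait")
print(f"table IO count ratio: {count_ratio:.1f}")
print(f"total wait ratio:     {wait_ratio:.1f}")
```

For a single execution, the non-scaling query spends roughly 0.27 seconds waiting on table second, against well under a millisecond for the scaling one. These are figures from one run in my test setup, so treat the exact ratios as indicative only.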

I think this answers the performance part of the question.

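Note that both queries returned 1000 rows in my tests, which may not be what you want. Before tuning a query to make it faster, make sure it works as expected.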
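The row counts can be reproduced without a MySQL server. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for MySQL (an assumption: only the LEFT JOIN semantics matter here, not the engine or the plan), with 1000 rows in first and the six rows [1,1]..[6,6] in second. The helper `build_query` is hypothetical and only assembles the two queries from the question.

```python
import sqlite3


def build_query(keys, null_check):
    """Assemble the question's query for the given join keys and NULL test."""
    joins = "\n".join(
        f'LEFT JOIN "second" AS second_{i} ON second_{i}.num = {key}'
        for i, key in enumerate(keys, start=1)
    )
    conditions = "\n  AND ".join(
        f"second_{i}.num {null_check}" for i in range(1, len(keys) + 1)
    )
    return f'SELECT "first".num\nFROM "first"\n{joins}\nWHERE {conditions}'


conn = sqlite3.connect(":memory:")  # in-memory SQLite, standing in for MySQL
conn.execute('CREATE TABLE "first" (num INTEGER NOT NULL UNIQUE)')
conn.execute(
    'CREATE TABLE "second" ('
    "num INTEGER NOT NULL, num2 INTEGER NOT NULL, UNIQUE (num, num2))"
)
conn.executemany('INSERT INTO "first" (num) VALUES (?)',
                 [(n,) for n in range(1, 1001)])
conn.executemany('INSERT INTO "second" (num, num2) VALUES (?, ?)',
                 [(n, n) for n in range(1, 7)])

rows_is_null = conn.execute(
    build_query([-1, -2, -3, -4, -5, -6], "IS NULL")).fetchall()
rows_is_not_null = conn.execute(
    build_query([1, 2, 3, 4, 5, 6], "IS NOT NULL")).fetchall()

print("non-existent keys, IS NULL:", len(rows_is_null), "rows")
print("existent keys, IS NOT NULL:", len(rows_is_not_null), "rows")
```

Both queries return every row of first, because the join conditions compare second_N.num with a constant and never reference first at all. If the intent was to filter first by what exists in second, the ON clauses would need to relate the two tables.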