MySQL的执行计划与代价模型详细解析

发表于 2020-02-28 更新于 2025-06-25 分类于 MySQL

背景

为了后续的沟通方便，在20200213的早上创建了一个**《简栈-Java技术交流群》**，也方便大家通过扫二维码积极的参与进来。

如果是二维码已经过期，大家可以添加简栈文化-小助手的微信号（lastpass4u），然后让他拉大家进群进群。我们保持着小而美的精神，宁缺毋滥。

然后早上群里就有人提了一个问题：

执行计划里面的扫描函数跟执行时间不匹配，比如查询优化器发现，扫描a索引行数更多，所以更慢，因此优化器选择了索引b, 但实际上走b索引的时候比a更慢，走a索引大概是4秒左右，b是8秒。

这个问题激发起了大家的讨论，有的人建议说：

1、这种可以强制指定索引执行的吧

2、这个扫描行数都是预估的不一定准的，能操作shell的话执行analyse table看看。

3、看一下你的index，DDL，explain等等

但提问者明显这些都是已经自己搞清楚了的，他关心的是底层的优化器成本规则等。这类我才意识到EXPLAIN出来的是结果，其实数据库底层本身是有优化器的，而最终选择谁，是否过索引等都是有它的规则的。这其中都涉及到效率与成本问题。

Explain执行计划详解

使用explain关键字可以模拟优化器执行SQL查询语句，从而知道MySQL是如何处理你的SQL语句的，分析你的查询语句或是表结构的性能瓶颈。

Explain执行计划包含的信息

![http://static.cyblogs.com/SouthEast (6).png](http://static.cyblogs.com/SouthEast (6).png)

其中最重要的字段为：id、type、key、rows、Extra。

id字段

select查询的序列号，包含一组数字，表示查询中执行select子句或操作表的顺序

三种情况：
- 1、id相同：执行顺序由上至下

![http://static.cyblogs.com/SouthEast (4).png](http://static.cyblogs.com/SouthEast (4).png)

2、id不同：如果是子查询，id的序号会递增，id值越大优先级越高，越先被执行

![http://static.cyblogs.com/SouthEast (8).png](http://static.cyblogs.com/SouthEast (8).png)

3、id相同又不同（两种情况同时存在）：id如果相同，可以认为是一组，从上往下顺序执行；在所有组中，id值越大，优先级越高，越先执行

![http://static.cyblogs.com/SouthEast (5).png](http://static.cyblogs.com/SouthEast (5).png)

select_type字段

查询的类型，主要是用于区分普通查询、联合查询、子查询等复杂的查询

1、SIMPLE：简单的select查询，查询中不包含子查询或者union
2、PRIMARY：查询中包含任何复杂的子部分，最外层查询则被标记为primary
3、SUBQUERY：在select 或 where列表中包含了子查询
4、DERIVED：在from列表中包含的子查询被标记为derived（衍生），mysql或递归执行这些子查询，把结果放在零时表里
5、UNION：若第二个select出现在union之后，则被标记为union；若union包含在from子句的子查询中，外层select将被标记为derived
6、UNION RESULT：从union表获取结果的select

![http://static.cyblogs.com/SouthEast (9).png](http://static.cyblogs.com/SouthEast (9).png)

type字段

访问类型，sql查询优化中一个很重要的指标，结果值从好到坏依次是：

system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

一般来说，好的sql查询至少达到range级别，最好能达到ref

1、system：表只有一行记录（等于系统表），这是const类型的特例，平时不会出现，可以忽略不计

2、const：表示通过索引一次就找到了，const用于比较primary key 或者 unique索引。因为只需匹配一行数据，所有很快。如果将主键置于where列表中，mysql就能将该查询转换为一个const。

3、eq_ref：唯一性索引扫描，对于每个索引键，表中只有一条记录与之匹配。常见于主键或唯一索引扫描。

注意：ALL全表扫描的表记录最少的表如t1表

![http://static.cyblogs.com/SouthEast (2).png](http://static.cyblogs.com/SouthEast (2).png)

4、ref：非唯一性索引扫描，返回匹配某个单独值的所有行。本质是也是一种索引访问，它返回所有匹配某个单独值的行，然而他可能会找到多个符合条件的行，所以它应该属于查找和扫描的混合体

![http://static.cyblogs.com/SouthEast (3).png](http://static.cyblogs.com/SouthEast (3).png)

5、range：只检索给定范围的行，使用一个索引来选择行。key列显示使用了那个索引。一般就是在where语句中出现了bettween、<、>、in等的查询。这种索引列上的范围扫描比全索引扫描要好。只需要开始于某个点，结束于另一个点，不用扫描全部索引

![http://static.cyblogs.com/SouthEast (7).png](http://static.cyblogs.com/SouthEast (7).png)

6、index：Full Index Scan，index与ALL区别为index类型只遍历索引树。这通常为ALL块，应为索引文件通常比数据文件小。（Index与ALL虽然都是读全表，但index是从索引中读取，而ALL是从硬盘读取）

![http://static.cyblogs.com/SouthEast (1).png](http://static.cyblogs.com/SouthEast (1).png)

7、ALL：Full Table Scan，遍历全表以找到匹配的行

![http://static.cyblogs.com/SouthEast (10).png](http://static.cyblogs.com/SouthEast (10).png)

possible_keys字段

查询涉及到的字段上存在索引，则该索引将被列出，但不一定被查询实际使用

key字段

实际使用的索引，如果为NULL，则没有使用索引。
查询中如果使用了覆盖索引，则该索引仅出现在key列表中

![http://static.cyblogs.com/SouthEast (11).png](http://static.cyblogs.com/SouthEast (11).png)

![http://static.cyblogs.com/SouthEast (2).png](http://static.cyblogs.com/SouthEast (12).png)

key_len字段

表示索引中使用的字节数，查询中使用的索引的长度（最大可能长度），并非实际使用长度，理论上长度越短越好。key_len是根据表定义计算而得的，不是通过表内检索出的

ref字段

显示索引的那一列被使用了，如果可能，是一个常量const。

rows字段

根据表统计信息及索引选用情况，大致估算出找到所需的记录所需要读取的行数

Extra字段

不适合在其他字段中显示，但是十分重要的额外信息

1、Using filesort

mysql对数据使用一个外部的索引排序，而不是按照表内的索引进行排序读取。也就是说mysql无法利用索引完成的排序操作成为“文件排序”

由于索引是先按email排序、再按address排序，所以查询时如果直接按address排序，索引就不能满足要求了，mysql内部必须再实现一次“文件排序”

![http://static.cyblogs.com/SouthEast (13).png](http://static.cyblogs.com/SouthEast (13).png)

2、Using temporary

使用临时表保存中间结果，也就是说mysql在对查询结果排序时使用了临时表，常见于order by 和 group by

![http://static.cyblogs.com/SouthEast (14).png](http://static.cyblogs.com/SouthEast (14).png)

3、Using index

表示相应的select操作中使用了覆盖索引（Covering Index），避免了访问表的数据行，效率高
如果同时出现Using where，表明索引被用来执行索引键值的查找（参考上图）
如果没用同时出现Using where，表明索引用来读取数据而非执行查找动作

覆盖索引（Covering Index）：也叫索引覆盖。就是select列表中的字段，只用从索引中就能获取，不必根据索引再次读取数据文件，换句话说查询列要被所建的索引覆盖。
注意：
a、如需使用覆盖索引，select列表中的字段只取出需要的列，不要使用select *
b、如果将所有字段都建索引会导致索引文件过大，反而降低crud性能

![http://static.cyblogs.com/SouthEast (15).png](http://static.cyblogs.com/SouthEast (15).png)

4、Using where

使用了where过滤

5、Using join buffer

使用了链接缓存

6、Impossible WHERE

where子句的值总是false，不能用来获取任何元祖

7、select tables optimized away

在没有group by子句的情况下，基于索引优化MIN/MAX操作或者对于MyISAM存储引擎优化COUNT（*）操作，不必等到执行阶段在进行计算，查询执行计划生成的阶段即可完成优化

8、distinct

优化distinct操作，在找到第一个匹配的元祖后即停止找同样值得动作

优化器代价模型

指标

代价模型：RBO(基于规则的优化)、CBO(基于成本的优化)
SQL的每一种执行路径，均可计算一个对应的执行代价，代价越小，执行效率越高

CBO方式成本的计算

1	Total cost = CPU cost + IO cost

CPU cost计算模型

CPU cost = rows/5 + (rows/10 if comparing key)

CPU cost:

MySQL上层，处理返回记录所花开销

CPU Cost=records/TIME_FOR_COMPARE=Records/5

每5条记录处的时间，作为1 Cost

IO cost计算模型

IO cost以聚集索引叶子节点的数量进行计算

全扫描
IO Cost = table stat_clustered_index_size
聚簇索引page总数，一个page作为1 cost

范围扫描
IO Cost = [(ranges+rows)/total_rows]*全扫描IO Cost
聚簇索引范围扫描与返回记录成比率

若需要回表，则IO cost以预估的记录数量进行计算，开销相当巨大

二级索引之索引覆盖扫描

索引覆盖扫描，减少返回聚簇索引的IO代价
keys_per_block=(stats_block_size/2)/(key_info[keynr].key_lenth+ref_length+1)
stats_block_size/2 = 索引页半满

IO Cost：(records+keys_per_block-1)/keys_per_block

计算range占用多少个二级索引页面，既为索引覆盖扫描的IO Cost

二级索引之索引非覆盖扫描

索引非覆盖扫描，需要回聚簇索引读取完整记录，增加IO代价

IO Cost = （range+rows）

range:多少个范围
对于IN查询，就会转换为多个索引范围查询

row:为范围中一共有多少记录
由于每一条记录都需要返回聚簇索引，因此每一条记录都会产生1 Cost

Cost模型分析

聚簇索引扫描代价为索引页面总数量
二级索引覆盖扫描代价较小
二级索引非覆盖扫描，代价巨大
Cost模型的计算，需要统计信息的支持
- stat_clustered_index_size
- ranges
- records/rows
- stats_block_size
- key_info[keynr].key_length
- rec_per_key
- ……

实战如何看日志确定Cost的选择

drop database lyj;
create database lyj;
use lyj;

create table t1 (
    c1 int(11) not null default '0',
    c2 varchar(128) default null,
    c3 varchar(64) default null,
    c4 int(11) default null,
    primary key (c1),
    key ind_c2 (c2),
    key ind_c4 (c4));

insert into t1 values(1,'a','A',10);
insert into t1 values(2,'b','B',20);
insert into t1 values(3,'b','BB',20);
insert into t1 values(4,'b','BBB',30);
insert into t1 values(5,'b','BBB',40);
insert into t1 values(6,'c','C',50);
insert into t1 values(7,'d','D',60);
commit;

select * from t1;
+----+------+------+------+
| c1 | c2   | c3   | c4   |
+----+------+------+------+
|  1 | a    | A    |   10 |
|  2 | b    | B    |   20 |
|  3 | b    | BB   |   20 |
|  4 | b    | BBB  |   30 |
|  5 | b    | BBB  |   40 |
|  6 | c    | C    |   50 |
|  7 | d    | D    |   60 |
+----+------+------+------+

执行以下SQL为什么不走索引ind_c2？

mysql> explain select * from t1 where c4=20 and c2='b'\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c4
      key_len: 5
          ref: const
         rows: 2
        Extra: Using where

set optimizer_trace='enabled=on';
set optimizer_trace_max_mem_size=1000000;
set end_markers_in_json=on;

select * from t1 where c4=20 and c2='b';

mysql> select * from information_schema.optimizer_trace\G;
*************************** 1. row ***************************
                            QUERY: select * from t1 where c4=20 and c2='b'
                            TRACE: {
......
                  "potential_range_indices": [ # 列出备选索引
                    {
                      "index": "PRIMARY", 
                      "usable": false,         # 本行表明主键索引不可用
                      "cause": "not_applicable"
                    },
                    {
                      "index": "ind_c2",
                      "usable": true,
                      "key_parts": [
                        "c2",
                        "c1"
                      ] /* key_parts */
                    },
                    {
                      "index": "ind_c4",
                      "usable": true,
                      "key_parts": [
                        "c4",
                        "c1"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indices */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {   # 开始计算每个索引做范围扫描的花费
                    "range_scan_alternatives": [
                      {
                        "index": "ind_c2",
                        "ranges": [
                          "b <= c2 <= b"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 4,         # c2=b的结果有4行
                        "cost": 5.81,
                        "chosen": false,   # 这个索引没有被选中，原因是cost
                        "cause": "cost"   
                      },
                      {
                        "index": "ind_c4",
                        "ranges": [
                          "20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": true     # 这个索引的代价最小,被选中
                      }
......
                  "chosen_range_access_summary": {  # 总结：因为cost最小选择了ind_c4
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "ind_c4",
                      "rows": 2,
                      "ranges": [
                        "20 <= c4 <= 20"
                      ] /* ranges */
                    } /* range_access_plan */,
                    "rows_for_plan": 2,
                    "cost_for_plan": 3.41,
                    "chosen": true
                  } /* chosen_range_access_summary */
......

因为ind_c4范围扫描的cost要小于ind_c2，所以索引不走ind_c2

where条件中字段c2和c4换个位置，索引还是不走ind_c2？为什么？

explain select * from t1 where  c2='b' and c4=20\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c4
      key_len: 5
          ref: const
         rows: 2
        Extra: Using where

因为ind_c4范围扫描的cost要小于ind_c2，所以索引不走ind_c2，跟c2和c4的位置无关。验证方法同上。

如下语句，换个条件c2=’c’，为什么可以走索引ind_c2？

mysql> explain select * from t1 where c2='c' and c4=20\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c2
      key_len: 387
          ref: const
         rows: 1
        Extra: Using index condition; Using where

mysql> select * from information_schema.optimizer_trace\G;
*************************** 1. row ***************************
                            QUERY: select * from t1 where c2='c' and c4=20
                            TRACE: {
......
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "ind_c2",
                        "ranges": [
                          "c <= c2 <= c"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 1,           # c2=c 的结果集有1行
                        "cost": 2.21,
                        "chosen": true       # 这个索引的代价最小,被选中
                      },
                      {
                        "index": "ind_c4",
                        "ranges": [
                          "20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": false,   # 这个索引没有被选中，原因是cost
                        "cause": "cost"
                      }
                  ......

创建复合索引

1	ALTER TABLE t1 ADD KEY ind_c2_c4(c2,c4);

下面语句为什么不走复合索引ind_c2_c4？

explain select * from t1 where  c2='b' and c4=20\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c4
      key_len: 5
          ref: const
         rows: 2
        Extra: Using where

set optimizer_trace='enabled=on';
set optimizer_trace_max_mem_size=1000000;
set end_markers_in_json=on;
select * from t1 where  c2='b' and c4=20;
select * from information_schema.optimizer_trace\G;
*************************** 1. row ***************************
                            QUERY: select * from t1 where  c2='b' and c4=20
                            TRACE: {
......
                  "potential_range_indices": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "ind_c2",
                      "usable": true,
                      "key_parts": [
                        "c2",
                        "c1"
                      ] /* key_parts */
                    },
                    {
                      "index": "ind_c4",
                      "usable": true,
                      "key_parts": [
                        "c4",
                        "c1"
                      ] /* key_parts */
                    },
                    {
                      "index": "ind_c2_c4",
                      "usable": true,
                      "key_parts": [
                        "c2",
                        "c4",
                        "c1"
                      ] /* key_parts */
                    }
                  ] /* potential_range_indices */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "ind_c2",
                        "ranges": [
                          "b <= c2 <= b"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 4,
                        "cost": 5.81,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "ind_c4",
                        "ranges": [
                          "20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": true
                      },
                      {
                        "index": "ind_c2_c4",
                        "ranges": [
                          "b <= c2 <= b AND 20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": false,
                        "cause": "cost"
                      }
......
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "ind_c4",
                      "rows": 2,
                      "ranges": [
                        "20 <= c4 <= 20"
                      ] /* ranges */
                    } /* range_access_plan */,
                    "rows_for_plan": 2,
                    "cost_for_plan": 3.41,
                    "chosen": true
                  } /* chosen_range_access_summary */
                } /* range_analysis */
              }
 ......

索引ind_c4和ind_c2_c4都是非覆盖扫描，而ind_c4和ind_c2_c4的cost是一样的，mysql会选择叶子块数量较少的那个索引，很明显ind_c4叶子块数量较少。

下面语句为什么又可以走复合索引ind_c2_c4？

explain select c2,c4 from t1 where c2='b' and c4=20\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: ind_c2,ind_c4,ind_c2_c4
          key: ind_c2_c4
      key_len: 392
          ref: const,const
         rows: 2
        Extra: Using where; Using index

set optimizer_trace='enabled=on';
set optimizer_trace_max_mem_size=1000000;
set end_markers_in_json=on;
select c2,c4 from t1 where c2='b' and c4=20;
select * from information_schema.optimizer_trace\G;

*************************** 1. row ***************************
                            QUERY: select c2,c4 from t1 where c2='b' and c4=20
                            TRACE: {
......
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "ind_c2",
                        "ranges": [
                          "b <= c2 <= b"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 4,
                        "cost": 5.81,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "ind_c4",
                        "ranges": [
                          "20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "ind_c2_c4",
                        "ranges": [
                          "b <= c2 <= b AND 20 <= c4 <= 20"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": true,    # 索引覆盖扫描
                        "rows": 2,
                        "cost": 3.41,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "intersecting_indices": [
                        {
                          "index": "ind_c2_c4",
                          "index_scan_cost": 1.0476,
                          "cumulated_index_scan_cost": 1.0476,
                          "disk_sweep_cost": 0,
                          "cumulated_total_cost": 1.0476,
                          "usable": true,
                          "matching_rows_now": 2,
                          "isect_covering_with_this_index": true,
                          "chosen": true
                        }
                      ] /* intersecting_indices */,
                      "clustered_pk": {
                        "clustered_pk_added_to_intersect": false,
                        "cause": "no_clustered_pk_index"
                      } /* clustered_pk */,
                      "chosen": false,
                      "cause": "too_few_indexes_to_merge"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`t1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "ref",
                      "index": "ind_c2",
                      "rows": 4,
                      "cost": 2.8,
                      "chosen": true
                    },
                    {
                      "access_type": "ref",
                      "index": "ind_c4",
                      "rows": 2,
                      "cost": 2.4,
                      "chosen": true
                    },
                    {
                      "access_type": "ref",
                      "index": "ind_c2_c4",
                      "rows": 2,
                      "cost": 1.4476,
                      "chosen": true
                    },
                    {
                      "access_type": "scan",
                      "cause": "covering_index_better_than_full_scan",  
                      "chosen": false
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "cost_for_plan": 1.4476,
                "rows_for_plan": 2,
                "chosen": true
              }
......

因为语中ind_c2_c4是索引覆盖扫描，不需要回表，代价较小。

总结

**我们看执行计划（Explain）仅仅只是结果，而看代价模型（Cost）才是过程。**如果我们真的想了解数据库是如何优化我们的SQL或者真的是如何执行的，需要深入深入的理解底层才行。

背景