04-index

clustered 英 [ˈklʌstəd] 美 [ˈklʌstərd] adj. 成群的;成簇的, v. 群聚;聚集。cluster的过去分词和过去式

cardinality 英 [kɑːdɪˈnælɪti] n. 基数;集的势

straight 英 [streɪt] adj. 直的;直筒型(非紧身)的;准的;正中目标的; adv. 直;直接;径直;笔直地

  1. explain_官网地址open in new window
  2. 83_explain.md
  3. 数据结构学习网站open in new window
  4. Java数据结构之红黑树解析open in new window
  5. 闲聊MySQL:(七)InnoDB之索引结构open in new window

  1. 数据结构和算法动态可视化open in new window
  2. 印度学习网站open in new window
  3. 95_sakila说明.md
  4. 探索HyperLogLog算法(含Java实现)open in new window
  5. join官网——算法open in new window
  6. join官网——优化open in new window
  7. Use of Index Extensionsopen in new window
  8. Server Status Variablesopen in new window

1. explain

  1. explain_官网地址open in new window
  2. 83_explain.md
  3. 重要属性
    1. id:执行顺序
    2. select_type:查询类型
    3. table:表
    4. type:访问类型。一般情况下,得保证查询至少达到range级别,最好能达到ref
    5. key:实际使用的索引
    6. key_len:长度越短越好
    7. extra:额外信息
image-20220528170634489
image-20220528170644137
image-20220528170651670

2. Structure

image-20220528170823671

1. hash表

哈希表可以完成索引的存储,每次在添加索引的时候,需要计算指定列的hash值,取模运算后计算出下标,将元素插入下标位置即可

  1. 优点:
    • 等值查询很快
  2. 缺点:
    1. 所有data需要加载到内存,比较耗费内存空间(Memory存储引擎用Hash_index)
    2. data无序,范围查询要全遍历,效率低。企业或者实际工作环境中范围查询更多,而不是等值查询
image-20220528171025175

2. 二叉树

  • 数据倾斜

3. BST

  1. 二叉搜索树(Binary_Search_Tree),又称为二叉排序树,属于树的一种,通过二叉树将数据组织起来
  2. 内置排序,支持二分查找
  3. 数据倾斜

特点:

  1. 左右子树也分别是二叉搜索树
  2. 左子树的所有节点key值都小于它的根节点的key值
  3. 右子树的所有节点key值都大于他的根节点的key值
  4. 二叉搜索树可以为一棵空树
  5. 一般来说,树中的每个节点的key值都不相等,但根据需要也可以将相同的key值插入树中

4. AVL树

  1. 平衡BST,AVL是其发明者姓名简写
  2. 通过一定机制保证AVL平衡,查询效率更高

1. 特点

  1. 为BST
  2. 左右子节点也是AVL树
  3. 子树高度差<= 1,即平衡因子为范围为[-1,1]
  4. 维持平衡:左右旋转。消耗性能,删、插慢

2. 左右旋

image-20220625102919440

5. Red_Blank树

  1. AVL树
  2. 子树高度差<= 2倍
  3. O(logn) 时间内做查找、插入和删除
  4. 相较于AVL树,牺牲了部分平衡性以换取插入、删除操作时少量的旋转操作,整体来说性能要优于AVL树
  5. 维持平衡:左右旋转、颜色反转
  6. 变化规律总结:根节点必黑,新增是红色,只能黑连黑,不能红连红; 爸叔通红就变色,爸红叔黑就旋转,哪边黑往哪边转

1. 特点

  1. 节点分红、黑色,用以减少旋转次数
  2. 根节点是黑色,每个叶子节点(NIL节点)是黑色
  3. 每个红色节点的两个子节点都为黑色(从每个叶子到根的所有路径上不能有两个连续的红色节点)
  4. 任一节点到其每个叶子的所有路径都包含相同数目的黑色节点
  5. 新插入节点默认为红色,插入后需要校验红黑树是否符合规则,不符合则需要进行平衡
image-20220625102944201

2. 缺点

无论是二叉树还是红黑树

  1. 深度不可控。导致IO次数变多,读取性能低
  2. 维持平衡。左右旋,插入、删除性能低
  3. 不支持index
image-20220623192817098

6. B树

1. 特点

  1. 所有键值分布在整颗树中
  2. 搜索有可能在非叶子结点结束,在关键字全集内做一次查找,性能逼近二分查找
  3. 每个节点最多拥有m个子树
  4. 根节点至少有2个子树,分支节点至少拥有(m/2)颗子树(除根节点和叶子节点外都是分支节点)
  5. 所有叶子节点都在同一层,每个节点最多可以有(m-1)个key,并且以升序排列

2. 缺点

  1. 每个节点都有key,同时也包含data,而每个页存储空间是有限的,如果data比较大的话会导致每个节点存储的key数量变小
  2. 当存储的数据量很大的时候会导致深度较大,增大查询时磁盘IO次数,进而影响查询性能
  • MySQL磁盘预读方式,按页读4K的整数倍。InnoDB,默认读取(4页 * 4K = 16K)
  • eg:一条数据1K,一个磁盘块只能装16条记录,一共才能装(16 * 16 * 16)数据
image-20220531120737606

7. B+树

  1. B+Tree每个节点可以包含更多的节点
    1. 为了降低树的高度
    2. 将数据范围变为多个区间,区间越多,数据检索越快
  2. 非叶子节点存储key,叶子节点存储key+data
  3. 叶子节点两两指针相互连接(符合磁盘的预读特性),顺序查找性能更高
  4. B+Tree有两个头指针。可以进行两种查找运算
    1. 一个指向根节点 => 从根节点开始,进行随机查找
    2. 另一个指向关键字最小的叶子节点,所有叶子节点(即数据节点)之间是一种链式环结构 => 对应主键的范围查找和分页查找
image-20220531141341468
image-20220531135955784

1. 存储数据量

  1. 默认的话,一个数据页的大小是(4 * 4K = 16K)
  2. 单个叶子节点(数据页)中的记录数:(16K / 1K=16)。(假设一行记录的数据大小为1k,实际上很多互联网业务数据记录大小通常就是1K左右)
  3. 非叶子节点
    • 假设主键ID为bigint类型,长度为8B,而指针大小在InnoDB源码中为6B,这样一个判断指针为14B
    • 一个数据页判断指针数为(16K * 1024) / 14B = 1170
    • 高度为2的B+Tree,能存放(1170 * 16 = 18720)条记录。高度为3的B+树可以存放:(1170 * 1170 * 16 = 21902400)条记录
  4. InnoDB中B+树高度一般为1-3层,能满足千万级的数据存储
    • 一次数据页的查找代表一次 IO,所以通过primary查询通常只需要1-3次 IO 操作即可查找到数据

2. why不用B*树?

  • 非叶子节点间也建立了指针,非叶子节点没有数据块,所以没必要

3. Engine

  1. Mysql8已经没有查询缓存了
  2. 一个index对应一个index_tree,index_tree一般三层,四层数据量就太大了
  3. table尽量都有主键

1. InnoDB

  1. InnoDB的数据是面向主键索引进行数据存储
  2. InnoDB中,B+Tree索引又可以分为聚簇索引与辅助索引
    1. 聚簇索引:通过primary来构造一棵B+Tree,同时叶子节点中存储的是整张表的行记录信息。数据也是索引的一部分
    2. 辅助索引:非主键索引。回表
  3. B+Tree结构对主键创建索引,然后叶子节点中存储data
    1. 如果没有主键,那么会选择唯一键
    2. 如果没有唯一键,那么会生成一个6位的row_id来作为主键
  4. 回表:辅助索引的叶子节点中,存储的是该row的primary,通过primary找到对应的row
image-20220531141018344

2. MyISAM

  1. data和index分开文件存储。primary叶子节点存一整行data的指针
  2. normal_index叶子节点存primary。进行回表
image-20220531141523923

4. 索引

1. 优点

  1. 大大减少了服务器需要扫描的数据量
  2. 帮助服务器避免排序和临时表
    • order by会进行全排序,index是排好序的
  3. 将随机IO变成顺序IO

2. 用处

  1. 快速查找匹配where子句的行
  2. 从consideration中消除行,如果多个索引之间进行选择,Mysql通常会使用最少行的索引
  3. 表具有多列索引,则优化器可以使用索引的任何最左前缀来查找行
  4. 当有表连接的时候,从其他表检索行数据
  5. 查找特定索引列的min或max值
  6. 排序或分组时在可用索引的最左前缀上完成的,则对表进行排序和分组
  7. 在某些情况下,可以优化查询以检索值而无需查询数据行

3. 分类

  1. 主键索引(Primary)
  2. 唯一索引(Unique_Index)
  3. 普通索引(Normal_Index)
  4. 组合索引(Combined_Index)
  5. 全文索引(Fulltext_Index)

4. Concept

  1. 回表:非聚簇索引,在叶子节点中存储的是该row的primary值,通过primary找到对应的row
  2. 覆盖索引:通过非聚簇索引查询primary时,没有回表的过程
  3. 最左匹配
    1. 组合索引(name_age_index)name查询启用索引,age索引无效
    2. (name_age_index + age_index)优于(age_name_index + name_index),age索引更短
    3. age_index + name_index:涉及索引合并(了解)
    4. 页分裂、页合并(了解)
  4. 索引下推
    1. 组合索引(name_age_index)前提
    2. 在通过name取数据的时,直接把age过滤掉,减少回表次数。低版本,name过滤,再age过滤
# name_age_index
EXPLAIN SELECT * from t_user where name like '李%' and age = 22;
  1. 谓词下推
# 谓词下推(减少IO量)
# 把2张表(删除只剩4个字段再关联)比(先关联表再删字段)效率高
select t1.name, t2.name
from t1, t2
where t1.id = t2.id;


select count(0) from A Join B on A.id = B.id
where A.a > 10 and B.b < 100;

# 谓词下推,优化后的sql。先过滤再join
select count(0) from (select * from A where a>10) A1
Join (
    select * from B where b<100
)B1 on A1.id = B1.id;

5. sakila_db

# .sql文件导入库
source /Users/listao/mca/05_mysql/sakila-db/sakila-schema.sql
image-20220530231848502
image-20220530231858263
image-20220531160415569

6. 匹配方式

image-20220530231840086
  1. 全值匹配
  2. 最左前缀
  3. 范围
  4. 等值 + 范围
  5. 列前缀
  6. 覆盖索引
create table mysql_index (
    id       int auto_increment primary key,
    name     varchar(24) default ''                not null comment '姓名',
    age      int         default 0                 not null comment '年龄',
    pos      varchar(20) default ''                not null comment '职位',
    upd_time datetime    default CURRENT_TIMESTAMP not null comment '入职时间'
)
    comment 'mysql索引匹配表';

create index idx_nap on mysql_index (name, age, pos);
# 查看组合索引
show index from mysql_index;

# 1. 全值匹配
explain select * from mysql_index where name = 'July' and age = '23' and pos = 'dev';
+--+-----------+----+-------------+-------+-------+-----------------+-----+
|id|select_type|type|possible_keys|key    |key_len|ref              |Extra|
+--+-----------+----+-------------+-------+-------+-----------------+-----+
|1 |SIMPLE     |ref |idx_nap      |idx_nap|140    |const,const,const|NULL |
+--+-----------+----+-------------+-------+-------+-----------------+-----+

# 2. 最左前缀
explain select * from mysql_index where name = 'July' and age = '23';
+--+-----------+----+-------------+-------+-------+-----------+-----+
|id|select_type|type|possible_keys|key    |key_len|ref        |Extra|
+--+-----------+----+-------------+-------+-------+-----------+-----+
|1 |SIMPLE     |ref |idx_nap      |idx_nap|78     |const,const|NULL |
+--+-----------+----+-------------+-------+-------+-----------+-----+

# ref只有name列,没有age列
explain select * from mysql_index where name = 'July' and pos = '23';
+--+-----------+----+-------------+-------+-------+-----+---------------------+
|id|select_type|type|possible_keys|key    |key_len|ref  |Extra                |
+--+-----------+----+-------------+-------+-------+-----+---------------------+
|1 |SIMPLE     |ref |idx_nap      |idx_nap|74     |const|Using index condition|
+--+-----------+----+-------------+-------+-------+-----+---------------------+

explain select * from mysql_index where name = 'July';
+--+-----------+----+-------------+-------+-------+-----+-----+
|id|select_type|type|possible_keys|key    |key_len|ref  |Extra|
+--+-----------+----+-------------+-------+-------+-----+-----+
|1 |SIMPLE     |ref |idx_nap      |idx_nap|74     |const|NULL |
+--+-----------+----+-------------+-------+-------+-----+-----+

# 3. 匹配范围值
explain select * from mysql_index where name > 'Mary';
+--+-----------+-----+-------------+-------+-------+----+---------------------+
|id|select_type|type |possible_keys|key    |key_len|ref |Extra                |
+--+-----------+-----+-------------+-------+-------+----+---------------------+
|1 |SIMPLE     |range|idx_nap      |idx_nap|74     |NULL|Using index condition|
+--+-----------+-----+-------------+-------+-------+----+---------------------+

# 4. 等值 + 范围
explain select * from mysql_index where name = 'July' and age > 25;
+--+-----------+-----+-------------+-------+-------+----+---------------------+
|id|select_type|type |possible_keys|key    |key_len|ref |Extra                |
+--+-----------+-----+-------------+-------+-------+----+---------------------+
|1 |SIMPLE     |range|idx_nap      |idx_nap|78     |NULL|Using index condition|
+--+-----------+-----+-------------+-------+-------+----+---------------------+

# 5. 匹配列前缀
explain select * from mysql_index where name like 'J%';
+--+-----------+-----+-------------+-------+-------+----+---------------------+
|id|select_type|type |possible_keys|key    |key_len|ref |Extra                |
+--+-----------+-----+-------------+-------+-------+----+---------------------+
|1 |SIMPLE     |range|idx_nap      |idx_nap|74     |NULL|Using index condition|
+--+-----------+-----+-------------+-------+-------+----+---------------------+

explain select * from mysql_index where name like '%y';
+--+-----------+----+-------------+----+-------+----+-----------+
|id|select_type|type|possible_keys|key |key_len|ref |Extra      |
+--+-----------+----+-------------+----+-------+----+-----------+
|1 |SIMPLE     |ALL |NULL         |NULL|NULL   |NULL|Using where|
+--+-----------+----+-------------+----+-------+----+-----------+

# 6. 覆盖索引
explain select name, age, pos from mysql_index where name = 'July' and age = 25 and pos = 'dev';
+--+-----------+----+-------------+-------+-------+-----------------+-----------+
|id|select_type|type|possible_keys|key    |key_len|ref              |Extra      |
+--+-----------+----+-------------+-------+-------+-----------------+-----------+
|1 |SIMPLE     |ref |idx_nap      |idx_nap|140    |const,const,const|Using index|
+--+-----------+----+-------------+-------+-------+-----------------+-----------+

# *
explain select * from mysql_index where name = 'July' and age = 25 and pos = 'dev';
+--+-----------+----+-------------+-------+-------+-----------------+-----+
|id|select_type|type|possible_keys|key    |key_len|ref              |Extra|
+--+-----------+----+-------------+-------+-------+-----------------+-----+
|1 |SIMPLE     |ref |idx_nap      |idx_nap|140    |const,const,const|NULL |
+--+-----------+----+-------------+-------+-------+-----------------+-----+

5. Hash_index

  1. 基于哈希表实现,只有精确匹配索引所有列的查询才有效
  2. 只有Memory的存储引擎显式支持Hash_index
  3. Hash_index自身只需存储对应的hash值,所以索引的结构十分紧凑,查找的速度非常快

1. 限制

  1. 无Covering_index
  2. 数据并不是按照索引值顺序存储的,所以无法进行排序
  3. 不支持部分列匹配查找(最左前缀),使用索引列的全部内容来计算哈希值
  4. 支持等值比较查询,不支持范围查询
  5. 访问哈希索引的数据非常快。除非有很多哈希冲突,哈希冲突时,存储引擎必须遍历链表中的所有行指针,逐行比较,直到找到所有符合条件的行
  6. 哈希冲突比较多的话,维护的代价会很高

2. CRC32

# 当需要存储大量的URL,并且根据URL进行搜索查找。如果使用B+树,存储的内容就会很大
select id from url where url = "";

# 将url进行CRC32哈希,可以使用以下查询方式
select id from url where url = "" and url_crc = crc32("url");

# 此查询性能较高:使用体积很小的索引来完成查找

6. Combined_index

  • 《匹配最左前缀》
image-20220530232321309

7. 聚簇, 非聚簇索引

数据存储方式

  1. InnoDB:.ibd = 数据文件 + 索引文件
  2. MyISAM:.MYD数据文件;.MYI索引文件
image-20220530232408110

1. clustered_index

  1. InnoDB:index和data在一个文件里,叶子节点存row_data
    1. primary是clustered_index
    2. 非primary是non_clustered_index
  2. 优点:
    1. index和data保存在同一个树中,访问更快
    2. covering_index
  3. 缺点:
    1. 插入速度严重依赖插入顺序(最快primary顺序)
    2. 索引块大小是固定的,增、删、改代价高。涉及《页分裂》《页合并》
    3. 导致全表扫描变慢
  4. 进行数据迁移,可以先把index关掉,迁移结束了,再打开。避免index一直进行更新

2. non_clustered_index

  • MyISAM:index和data不在一个文件里,叶子节点存row_data指针

8. covering_index

  1. index包含了要查询的数据
  2. InnoDB、MyISAM支持,Memory不支持
image-20220530232637506
# Extra: Using index
# 二级index的叶子节点,存储primary的值
explain
select empno, deptno from emp where deptno = '10';
+--+-----------+-----+----+-------------+----------+-------+-----+-----------+
|id|select_type|table|type|possible_keys|key       |key_len|ref  |Extra      |
+--+-----------+-----+----+-------------+----------+-------+-----+-----------+
|1 |SIMPLE     |emp  |ref |idx_deptno   |idx_deptno|5      |const|Using index|
+--+-----------+-----+----+-------------+----------+-------+-----+-----------+

9. optimize_detail

image-20220627213616001

1. 不要表达式

  • 使用索引列进行查询不要使用表达式,把计算放到业务层而不是数据库层
explain select *
from emp where deptno = 20;
+--+-----------+-----+----+-------------+----------+-------+-----+----+-----+
|id|select_type|table|type|possible_keys|key       |key_len|ref  |rows|Extra|
+--+-----------+-----+----+-------------+----------+-------+-----+----+-----+
|1 |SIMPLE     |emp  |ref |idx_deptno   |idx_deptno|5      |const|5   |NULL |
+--+-----------+-----+----+-------------+----------+-------+-----+----+-----+

# 表达式,index失效
explain select *
from emp where deptno + 1 = 20;
+--+-----------+-----+----+-------------+----+-------+----+----+-----------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra      |
+--+-----------+-----+----+-------------+----+-------+----+----+-----------+
|1 |SIMPLE     |emp  |ALL |NULL         |NULL|NULL   |NULL|14  |Using where|
+--+-----------+-----+----+-------------+----+-------+----+----+-----------+

2. 尽量使用主键

  • 尽量使用主键查询,不会触发回表

3. 列前缀优化

  1. 很长字符串,索引变得大且慢。使用列开始的部分字符串,大大的节约索引空间,提高索引效率,降低索引的选择性
    • 索引的选择性:不重复的索引值和数据表记录总数的比值,范围从(1/#T)~ 1 之间。选择性越高则查询效率越高,因为可以让Mysql过滤掉更多的行
  2. BLOB,TEXT,VARCHAR类型的列,必须要使用前缀索引。mysql不允许索引这些列的完整长度,要选择足够长的前缀以保证较高的选择性,又不能太长
  3. 无法使用前缀索引做order bygroup by
use sakila;

# 创建数据表
create table citydemo(city varchar(50) not null);
insert into citydemo(city) select city from city;

# 重复执行5次
insert into citydemo(city) select city from citydemo;

# 更新城市表的名称
# `order by rand() limit 1` 随机排序并取出一个
update citydemo set city = (select city from city order by rand() limit 1);

# 查找最常见的城市列表,发现每个值都出现45-65次
select count(1) as cnt, city from citydemo
group by city order by cnt desc limit 10;

# 查找最频繁出现的城市前缀,先从3个前缀字母开始,发现比原来出现的次数更多,可以分别截取多个字符查看城市出现的次数
select count(1)       as cnt,
       left(city, 3)  as pref
from citydemo
group by pref order by cnt desc limit 10;

# 列前缀的选择性接近于完整列的选择性
select count(1)       as cnt,
       left(city, 7)  as pref
from citydemo
group by pref order by cnt desc limit 10;

# 另一种方式计算列前缀选择性。当前缀长度到达7之后,再增加前缀长度,选择性提升的幅度已经很小
select count(distinct left(city, 3)) / count(*)  as sel3,
       count(distinct left(city, 4)) / count(*)  as sel4,
       count(distinct left(city, 5)) / count(*)  as sel5,
       count(distinct left(city, 6)) / count(*)  as sel6,
       count(distinct left(city, 7)) / count(*)  as sel7,
       count(distinct left(city, 8)) / count(*)  as sel8,
       count(distinct left(city, 9)) / count(*)  as sel9,
       count(distinct left(city, 10)) / count(*) as sel10,
       count(distinct left(city, 11)) / count(*) as sel11,
       count(distinct left(city, 12)) / count(*) as sel12,
       count(distinct left(city, 13)) / count(*) as sel13,
       count(distinct left(city, 14)) / count(*) as sel14
from citydemo;

# 通过city列前7个字符创建索引
alter table citydemo add key(city(7));

show index from citydemo;
+----------+--------+------------+-----------+---------+-----------+--------+------+----+----------+
|Non_unique|Key_name|Seq_in_index|Column_name|Collation|Cardinality|Sub_part|Packed|Null|Index_type|
+----------+--------+------------+-----------+---------+-----------+--------+------+----+----------+
|1         |city    |1           |city       |A        |596        |7       |NULL  |    |BTREE     |
+----------+--------+------------+-----------+---------+-----------+--------+------+----+----------+

# 和Cardinality大体一致 => 599
select count(distinct city) from citydemo;

Cardinality:基数。该列的唯一值有多少个

  1. 如果唯一值多,索引的检索就效率越高
  2. 如果唯一值越少,效率越低。重复的值越多,那么索引匹配的时候就会有更多的重复数据,这个是不利于索引的检索的
  3. hyperloglog去重算法,kylin也用到了,底层的思想都是一样的 —— 计算统计
  4. 探索HyperLogLog算法(含Java实现)open in new window

4. order by

explain Extra列

  1. 有《Using filesort》文件排序
  2. 没有《Using filesort》index_scan

1. sort_buffer

  1. 每个查询线程分配一块小内存,用于排序,称为sort_buffer
# 256KB
show variables like 'sort_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| sort_buffer_size | 262144 |
+------------------+--------+
  1. 临时文件排序(归并排序)

    sort_buffer装不下

image-20220603084731954

2. 全字段排序

一次回表:普通索引where过滤行。回表取行全字段,放到sort_buffer进行排序

  1. 优点
    • 只需要一次回表读取数据,而减少随机IO
  2. 缺点
    • 查询列特别多的时候,分多次sort_buffer,再进行归并排序
image-20220603084444909

3. rowId排序

  1. 两次回表:普通索引where过滤行。回表取行(rowId + order by)字段,放到sort_buffer进行排序,再回表
  2. combined_index取消了第一次回表操作,如果是covering_index,第二次回表也取消了
# 排序的列的总大小超过`max_length_for_sort_data`定义byte,选择rowId排序,反之使用全字段排序
show variables like 'max_length_for_sort_data';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| max_length_for_sort_data | 1024  |
+--------------------------+-------+
  1. 优点
    • sort_buffer尽可能多的容纳行数来进行排序操作
  2. 缺点
    • 效率低,第二次回表需要更多的随机IO
image-20220603084950687

4. index_scan, filesort

  1. 两种方式排序
    1. filesort(文件排序)Extra 有 Using filesort
      • full_table_scan(table_data):全表扫描
    2. index_scan(索引扫描)Extra 没有 Using filesort
      • full_index_scan(index_tree):索引扫描
  2. 查询需要关联多张表,当order by子句引用的字段全部为第一张表时,才能使用索引做排序
  3. (where + order by)=>《最左前缀》
  4. 没有where
    1. 默认filesort,mysql优化器认为普通二级索引,回表成本比全表扫描排序更高
    2. index_scan比filesort慢:不能covering_index,就回表,随机IO
    3. 表数据量大时limit n,会进行index_scan。有限次的回表成本 << table_scan
# 《name, age, address》联合索引
create table mysql_orderBy (
        id       varchar(36)  not null primary key,
        name     varchar(50)  null,
        age      int          null,
        address  varchar(255) null,
        upd_time datetime     null
)
    comment 'mysql_orderBy表';

create index idx_naa on mysql_orderBy (name, age, address);

# ------------------------------- covering_index ----------------------------------
# 1. 使用covering_index,必经过index_tree,type至少index,不会回表

# type: ref; Extra: Using index
explain select id, name, age from mysql_orderBy where name = '';
# type: index; Extra: Using where; Using index
explain select id, name, age from mysql_orderBy where age = '';

# type: index; Extra: Using index
explain select id, name, age from mysql_orderBy order by name, age;
# type: index; Extra: Using index
explain select id, name, age from mysql_orderBy order by name;
# type: index; Extra: Using index; Using filesort
explain select id, name, age from mysql_orderBy order by age;

# type: ref; Extra: Using where; Using index
explain select id, name, age from mysql_orderBy where name = '' order by name, age;
# type: ref; Extra: Using index
explain select id, name, age from mysql_orderBy where name = '' order by name;
# (where + order by)==>《最左前缀》
# type: ref; Extra: Using where; Using index
explain select id, name, age from mysql_orderBy where name = '' order by age;

# type: index; Extra: Using where; Using index; Using filesort
explain select id, name, age from mysql_orderBy where age = '' order by name, age;
# type: index; Extra: Using where; Using index(name应用了index)
explain select id, name, age from mysql_orderBy where age = '' order by name;
# type: index; Extra: Using where; Using index; Using filesort
explain select id, name, age from mysql_orderBy where age = '' order by age;

# ------------------------------- 全字段查询 ------------------------------------
# 2. 必经过table_data,经过index_tree的,需要回表

# type: ref; Extra: null
explain select * from mysql_orderBy where name = '';
# type: ALL; Extra: Using where
explain select * from mysql_orderBy where age = '';

# 单纯的`order by`并不使用index,filesort比index_scan性能高
# type: ALL; Extra: Using filesort
explain select * from mysql_orderBy order by name, age;

# 强制使用index
explain select * from mysql_orderBy force index(idx_naa) order by name, age;
+--+-----------+-------------+-----+-------------+-------+-------+----+----+--------+-----+
|id|select_type|table        |type |possible_keys|key    |key_len|ref |rows|filtered|Extra|
+--+-----------+-------------+-----+-------------+-------+-------+----+----+--------+-----+
|1 |SIMPLE     |mysql_orderBy|index|NULL         |idx_naa|1231   |NULL|1   |100     |NULL |
+--+-----------+-------------+-----+-------------+-------+-------+----+----+--------+-----+

# type: ALL; Extra: Using filesort
explain select * from mysql_orderBy order by name;
# type: ALL; Extra: Using filesort
explain select * from mysql_orderBy order by age;


# type: ref; Extra: Using index condition
explain select * from mysql_orderBy where name = '' order by name, age;
# type: ref; Extra: null
explain select * from mysql_orderBy where name = '' order by name;

# type: ref; Extra: Using index condition(最左前缀)
explain select * from mysql_orderBy where name = '' order by age, address;
# 1. desc asc,filesort
# type: ref; Extra: Using index condition; Using filesort
explain select * from mysql_orderBy where name = '' order by age desc, address;

# type: ref; Extra: Using where
explain select * from mysql_orderBy where name = '' order by age desc;
# type: range; Extra: Using index condition
explain select * from mysql_orderBy where name > '' order by name, age ;

# 2. order by不再索引中的列,filesort
# type: ref; Extra: Using index condition; Using filesort
explain select * from mysql_orderBy where name = '' order by age, upd_time;


# type: ALL; Extra: Using where; Using filesort
explain select * from mysql_orderBy where age = '' order by name, age;
# type: ALL; Extra: Using where; Using filesort
explain select * from mysql_orderBy where age = '' order by name;
# type: ALL; Extra: Using where; Using filesort
explain select * from mysql_orderBy where age = '' order by age;

5. Extra

# 1. Using index
# 使用covering_index,不回表
explain select id, name, age from mysql_orderBy;

# 2. Using where; Using index
explain select id, name, age from mysql_orderBy where age > '';

# 3. Using where
# 回表(使用index)
explain select id, address from mysql_orderBy where id > '';
# 不回表(没用index)不进行index_scan,直接table_scan
explain select id from mysql_orderBy where address > '';

# 4. Using index condition
# 首先index_scan,然后Using index condition,最后回表
explain select * from mysql_orderBy where name = '1';

5. union all, in, or

  1. union all代替union。union会去重,会创建临时表Using temporary。影响性能
  2. union all,in,or都能够使用index,推荐使用in
image-20220530233000171
explain
select * from mysql_standard ms1 where id = '1'
union
select * from mysql_standard ms2 where id = '2';
+----+------------+----------+-----+-------------+-------+-------+-----+----+--------+---------------+
|id  |select_type |table     |type |possible_keys|key    |key_len|ref  |rows|filtered|Extra          |
+----+------------+----------+-----+-------------+-------+-------+-----+----+--------+---------------+
|1   |PRIMARY     |ms1       |const|PRIMARY      |PRIMARY|146    |const|1   |100     |NULL           |
|2   |UNION       |ms2       |const|PRIMARY      |PRIMARY|146    |const|1   |100     |NULL           |
|NULL|UNION RESULT|<union1,2>|ALL  |NULL         |NULL   |NULL   |NULL |NULL|NULL    |Using temporary|
+----+------------+----------+-----+-------------+-------+-------+-----+----+--------+---------------+

explain
select * from mysql_standard ms1 where id = '1'
union all
select * from mysql_standard ms2 where id = '2';
+--+-----------+-----+-----+-------------+-------+-------+-----+----+--------+-----+
|id|select_type|table|type |possible_keys|key    |key_len|ref  |rows|filtered|Extra|
+--+-----------+-----+-----+-------------+-------+-------+-----+----+--------+-----+
|1 |PRIMARY    |ms1  |const|PRIMARY      |PRIMARY|146    |const|1   |100     |NULL |
|2 |UNION      |ms2  |const|PRIMARY      |PRIMARY|146    |const|1   |100     |NULL |
+--+-----------+-----+-----+-------------+-------+-------+-----+----+--------+-----+

explain
select * from mysql_standard ms where id in ('1', '2');
+--+-----------+-----+-----+-------------+-------+-------+----+----+--------+-----------+
|id|select_type|table|type |possible_keys|key    |key_len|ref |rows|filtered|Extra      |
+--+-----------+-----+-----+-------------+-------+-------+----+----+--------+-----------+
|1 |SIMPLE     |ms   |range|PRIMARY      |PRIMARY|146    |NULL|2   |100     |Using where|
+--+-----------+-----+-----+-------------+-------+-------+----+----+--------+-----------+

explain
select * from mysql_standard ms where id = '1' or id = '2';
+--+-----------+-----+-----+-------------+-------+-------+----+----+--------+-----------+
|id|select_type|table|type |possible_keys|key    |key_len|ref |rows|filtered|Extra      |
+--+-----------+-----+-----+-------------+-------+-------+----+----+--------+-----------+
|1 |SIMPLE     |ms   |range|PRIMARY      |PRIMARY|146    |NULL|2   |100     |Using where|
+--+-----------+-----+-----+-------------+-------+-------+----+----+--------+-----------+

1. exists

  1. 子查询有结果外层即可显示
  2. 查询结果不能出现子查询的字段
  3. 效率很高比in()都高
# exists相当于外层for(),子查询只要有结果外层即可显示
# 所以会查询出外层的全部
explain
select * from emp
where exists(select deptno from dept where deptno = 20 or deptno = 30);
+--+-----------+-----+-----+-------+-------+----+----+--------+------------------------+
|id|select_type|table|type |key    |key_len|ref |rows|filtered|Extra                   |
+--+-----------+-----+-----+-------+-------+----+----+--------+------------------------+
|1 |PRIMARY    |emp  |ALL  |NULL   |NULL   |NULL|14  |100     |NULL                    |
|2 |SUBQUERY   |dept |index|PRIMARY|4      |NULL|4   |50      |Using where; Using index|
+--+-----------+-----+-----+-------+-------+----+----+--------+------------------------+

# 优先级:and > or
explain
select * from emp e
where exists(select deptno from dept d where (deptno = 20 or deptno = 30) and e.deptno = d.deptno);
+--+------------------+-----+------+-------+-------+------------+----+--------+------------------------+
|id|select_type       |table|type  |key    |key_len|ref         |rows|filtered|Extra                   |
+--+------------------+-----+------+-------+-------+------------+----+--------+------------------------+
|1 |PRIMARY           |e    |ALL   |NULL   |NULL   |NULL        |14  |100     |Using where             |
|2 |DEPENDENT SUBQUERY|d    |eq_ref|PRIMARY|4      |mca.e.deptno|1   |100     |Using where; Using index|
+--+------------------+-----+------+-------+-------+------------+----+--------+------------------------+

6. 范围列索引

  • index最多用于一个范围列
image-20220530233159037
# > >= < <= between,使用index
explain select * from t_temp where name >= '';

# age不使用index
explain select * from t_temp where name >= '' and age = '';

7. 不要类型转换

  1. java.int = BD.varchar(index失效)
  2. java.String = BD.int(index有效)
# 1. java.String = BD.int(index有效)
explain
select * from mysql_index where id = '1';
+--+-----------+-----------+-----+-------------+-------+-------+-----+----+--------+-----+
|id|select_type|table      |type |possible_keys|key    |key_len|ref  |rows|filtered|Extra|
+--+-----------+-----------+-----+-------------+-------+-------+-----+----+--------+-----+
|1 |SIMPLE     |mysql_index|const|PRIMARY      |PRIMARY|4      |const|1   |100     |NULL |
+--+-----------+-----------+-----+-------------+-------+-------+-----+----+--------+-----+


# 2. java.int = BD.varchar(index失效)
explain
select * from mysql_index where name = 1;
+--+-----------+-----------+----+-------------+----+-------+----+----+--------+-----------+
|id|select_type|table      |type|possible_keys|key |key_len|ref |rows|filtered|Extra      |
+--+-----------+-----------+----+-------------+----+-------+----+----+--------+-----------+
|1 |SIMPLE     |mysql_index|ALL |idx_nap      |NULL|NULL   |NULL|1   |100     |Using where|
+--+-----------+-----------+----+-------------+----+-------+----+----+--------+-----------+

8. 更新频繁,区分度小

  1. upd频繁不建
    1. 更新会变更B+树,更新频繁的字段建立index会大大降低数据库性能
  2. 区分度小不建
    1. 类似于性别这类区分不大的属性,建立index是没有意义的,不能有效的过滤数据,
    2. 一般区分度在80%以上的时候就建立index,区分度可以使用count(distinct(列名)) / count(*)来计算

9. index不为null

  • 创建index列,不允许为null,可能得到不合预期的结果,null自己都不等于null

10. join < 3, 类型一致

  1. join官网——算法open in new window
  2. join官网——优化open in new window
  3. 小表(10)join大表(1000),效率高。大表匹配到了就结束loop,小表下一个,减少开销
# straight_join左表始终在右表前读取
select * from emp e straight_join dept d on e.deptno = d.deptno;

# 256K
show variables like '%join_buffer%';

# 以下两个sql等价
select * from emp e join dept d on e.deptno = d.deptno;
select * from emp e inner join dept d on e.deptno = d.deptno;

1. on, where

# 1. 内接连,一样
explain
select * from emp e inner join dept d on d.deptno = e.deptno and e.deptno = 20;
+--+-----------+-----+-----+-------------+----------+-------+-----+----+--------+-----+
|id|select_type|table|type |possible_keys|key       |key_len|ref  |rows|filtered|Extra|
+--+-----------+-----+-----+-------------+----------+-------+-----+----+--------+-----+
|1 |SIMPLE     |d    |const|PRIMARY      |PRIMARY   |4      |const|1   |100     |NULL |
|1 |SIMPLE     |e    |ref  |idx_deptno   |idx_deptno|5      |const|5   |100     |NULL |
+--+-----------+-----+-----+-------------+----------+-------+-----+----+--------+-----+

explain
select * from emp e inner join dept d on d.deptno = e.deptno where e.deptno = 20;
+--+-----------+-----+-----+-------------+----------+-------+-----+----+--------+-----+
|id|select_type|table|type |possible_keys|key       |key_len|ref  |rows|filtered|Extra|
+--+-----------+-----+-----+-------------+----------+-------+-----+----+--------+-----+
|1 |SIMPLE     |d    |const|PRIMARY      |PRIMARY   |4      |const|1   |100     |NULL |
|1 |SIMPLE     |e    |ref  |idx_deptno   |idx_deptno|5      |const|5   |100     |NULL |
+--+-----------+-----+-----+-------------+----------+-------+-----+----+--------+-----+

# 2. left join:会显示左表的全部数据
#   and:条件对右表进行限制
#   where:对join的结果集进行过滤
select * from emp e left join dept d on e.deptno = d.deptno and e.deptno = 20;
select * from emp e left join dept d on e.deptno = d.deptno where e.deptno = 20;

select * from emp e join dept d on 1 != 1;
select * from emp e left join dept d on 1 != 1;

11. limit提高效率

  1. 能使用limit,尽量使用
  2. limit不应该叫分页,应该要限制输出

12. 单表索引 <= 5

  • 增加IO

13. 组合索引字段 <= 5

  • Combined_index

14. 错误概念

  1. index越多越好
  2. 过早优化。在不了解系统的情况下进行优化

10. index_monitor

image-20220603153102203
# Handler_read_key、Handler_read_rnd_next:越大越好,意味着index生效
show status like 'Handler_read%';
+---------------------+-----+
|Variable_name        |Value|
+---------------------+-----+
|Handler_read_first   |4    | -- 读取索引第一个条目的次数
|Handler_read_key     |10   | -- 1. 通过index获取数据的次数
|Handler_read_last    |0    | -- 读取索引最后一个条目的次数
|Handler_read_next    |5    | -- 通过索引读取下一条数据的次数
|Handler_read_prev    |0    | -- 通过索引读取上一条数据的次数
|Handler_read_rnd     |0    | -- 从固定位置读取数据的次数
|Handler_read_rnd_next|1079 | -- 2. 从数据节点读取下一条数据的次数
+---------------------+-----+