05-sel_optimize

2. Optimizing data access

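1. Reduce the amount of data accessed
1. Is the application retrieving far more data than it needs?
2. Is the MySQL server analyzing far more rows than it needs?
Typical cases of requesting unneeded data from the database:
1. Fetching rows that are not needed
2. Returning all columns from a multi-table join
3. Always fetching all columns
4. Querying the same data repeatedly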

2. Reduce the amount of response data (I/O)

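Fetching rows that are not needed
- Add a limit clause to the query
For multi-table joins, return only the needed columns and give the tables aliases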

# Return only the needed columns
select actor.*
from actor
         inner join film_actor using (actor_id)
         inner join film using (film_id)
where film.title = 'Academy Dinosaur';

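Always fetching all columns
- Avoid select *: it hurts performance, since it can prevent covering-index optimizations and adds I/O, memory, and network overhead
Querying the same data repeatedly
- Cache this data to speed up repeated lookups, e.g. in Redis with an LRU eviction policy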
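
The caching advice above can be sketched in a few lines. This is only an illustration under loud assumptions: Python's in-process `functools.lru_cache` stands in for Redis with an LRU eviction policy, an in-memory SQLite database stands in for MySQL, and the `actor` rows and the `get_first_name` helper are made up for the example.

```python
import sqlite3
from functools import lru_cache

# Assumption: in-memory SQLite stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("create table actor (actor_id integer primary key, first_name text)")
conn.executemany("insert into actor values (?, ?)", [(1, "PENELOPE"), (2, "NICK")])

db_hits = 0  # counts how often the database is actually queried


@lru_cache(maxsize=128)  # when full, the least recently used entry is evicted
def get_first_name(actor_id):
    """Hypothetical lookup helper; repeated calls are served from the cache."""
    global db_hits
    db_hits += 1
    row = conn.execute(
        "select first_name from actor where actor_id = ?", (actor_id,)
    ).fetchone()
    return row[0] if row else None


# The second call for actor 1 never reaches the database.
names = [get_first_name(1), get_first_name(1), get_first_name(2)]
print(names, db_hits)
```

A real cache also needs an invalidation or expiry strategy when the underlying rows change, which this sketch leaves out.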

Variable_name	Value
optimizer_switch	index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on


2. straight_join

# The join order chosen by the MySQL optimizer usually performs better than a manually forced one
# 1. MySQL's default join order
explain
select a.film_id,
       a.title,
       a.release_year,
       c.actor_id,
       c.first_name,
       c.last_name
from film a
         inner join film_actor b using (film_id)
         inner join actor c using (actor_id);
+--+-----------+-----+------+
|id|select_type|table|type  |
+--+-----------+-----+------+
|1 |SIMPLE     |c    |ALL   |
|1 |SIMPLE     |b    |ref   |
|1 |SIMPLE     |a    |eq_ref|
+--+-----------+-----+------+

# 2. Force the join order
explain
select straight_join a.film_id,
                     a.title,
                     a.release_year,
                     c.actor_id,
                     c.first_name,
                     c.last_name
from film a
         inner join film_actor b using (film_id)
         inner join actor c using (actor_id);
+--+-----------+-----+------+
|id|select_type|table|type  |
+--+-----------+-----+------+
|1 |SIMPLE     |a    |ALL   |
|1 |SIMPLE     |b    |ref   |
|1 |SIMPLE     |c    |eq_ref|
+--+-----------+-----+------+

# 3. Check the estimated cost of the last query
show status like 'last_query_cost';

6. order by optimization

See 04_index => 《9.4. order by》

4. Optimizing specific types of queries

1. count()

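Performance: count(column) < count(primary key) < count(1) ≈ count(*)
Use approximate values
- Some use cases do not need an exact count, so an approximate value can be used instead
- e.g. the rows estimate in explain output can serve as an approximate count. Many OLAP applications need the cardinality of a column, and HyperLogLog is an approximation algorithm for exactly that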
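
To make the HyperLogLog idea concrete, here is a minimal sketch of the estimator. It is not how any particular database implements it: the function name `hll_estimate`, the precision parameter `p`, and the use of SHA-1 as the hash are all assumptions chosen for illustration.

```python
import hashlib
import math


def hll_estimate(items, p=12):
    """Approximate the number of distinct items using 2**p small registers."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Assumption: the first 8 bytes of SHA-1 serve as a 64-bit hash.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                 # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)    # remaining 64 - p bits
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)        # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting
        return m * math.log(m / zeros)
    return raw


estimate = hll_estimate(range(50000))
print(round(estimate))
```

With `p=12` the sketch keeps 4096 registers regardless of input size, and the expected relative error is roughly 1.04 / sqrt(4096), i.e. under 2%.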

# The three statements below are executed identically (rental_id is the primary key, so it is never NULL)
explain select count(*) from rental;
explain select count(1) from rental;
explain select count(rental_id) from rental;

# Query cost
show status like 'last_query_cost';

1. count(*)

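The Alibaba Java Development Manual (《阿里巴巴Java开发手册》) mandates count(*): it is the standard row-counting syntax defined by SQL92, independent of the database and unaffected by NULLs
Databases optimize it heavily
1. MyISAM: uses table-level locks and stores the total row count separately, so count(*) without a where clause returns it directly
2. InnoDB: traverses the smallest available secondary index (non-clustered), which is cheaper than scanning the clustered index; since MySQL 8.0.13, count(*) without where or group by is optimized further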

2. count(1)

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

MySQL optimizes COUNT(1) and COUNT(*) in exactly the same way; neither is faster than the other

3. count(expr)

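Scans the rows (a full table scan when no suitable index exists) and returns the number of retrieved rows in which expr is not NULL. The result is a BIGINT value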
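
The NULL handling described above is easy to check with a tiny example. This sketch assumes an in-memory SQLite database as a stand-in for MySQL (the counting semantics shown are standard SQL), and the three `rental` rows are made up.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table rental (rental_id integer primary key, return_date text)")
conn.executemany(
    "insert into rental values (?, ?)",
    [(1, "2005-05-26"), (2, None), (3, "2005-05-28")],
)

# count(*) and count(1) count rows; count(column) skips rows where the column is NULL.
counts = conn.execute(
    "select count(*), count(1), count(rental_id), count(return_date) from rental"
).fetchone()
print(counts)
```

Only `count(return_date)` differs, because one row has a NULL `return_date`; counting the NOT NULL primary key gives the same number as `count(*)`.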

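2. Join queries

Make sure the columns in on or using clauses are indexed, and take the join order into account when creating indexes
- When tables A and B are joined on column C and the optimizer joins them in the order B, A, there is no need to index that column on table B; an unused index is pure overhead
- In general, an index is only needed on the corresponding column of the second table in the join order
An index can only be used for group by and order by when their expressions refer to columns of a single table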

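3. Subqueries

Prefer a join where possible, because a subquery may be materialized into a temporary table, which adds I/O

4. group by, distinct (of limited value)

Indexes are the most effective optimization. MySQL optimizes group by and distinct queries in the same way
- When no index can be used, it groups with a temporary table or a filesort
When grouping a join by a column of the lookup table, grouping by the lookup table's identifier column is usually more efficient than grouping by its other columns (of little practical significance)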

select a.first_name, a.last_name, count(*)
from film_actor fa
         inner join actor a using (actor_id)
group by a.first_name, a.last_name;

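# group by special case
# The selected columns do not include the group by column, yet no error is raised
# If the names in table a are unique, switching the group by column changes nothing
# If the names in table a are not unique, this query and the one above return different results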
select a.first_name, a.last_name, count(*)
from film_actor fa
         inner join actor a using (actor_id)
group by a.actor_id;

5. limit

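Pagination usually combines limit with order by. With a supporting index this is generally efficient, but two problems are common
- order by: without a usable index it requires a lot of filesort work
- limit 10000, 10: with a very large offset, most of the preceding rows are read and then thrown away, which is expensive
Either cap the number of pages offered in the UI, or optimize the performance of large offsets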

# 26s: full table scan with filesort
explain
select * from oox_ooxxxx_state
order by crt_time desc limit 10000000, 5;
+--+-----------+----------------+----+-------------+----+-------+----+--------+--------+--------------+
|id|select_type|table           |type|possible_keys|key |key_len|ref |rows    |filtered|Extra         |
+--+-----------+----------------+----+-------------+----+-------+----+--------+--------+--------------+
|1 |SIMPLE     |oox_ooxxxx_state|ALL |NULL         |NULL|NULL   |NULL|21429734|100     |Using filesort|
+--+-----------+----------------+----+-------------+----+-------+----+--------+--------+--------------+


# 2.65s: deferred join, the subquery reads only the idx_crt_time index
explain
select * from oox_ooxxxx_state t1
    inner join (select id from oox_ooxxxx_state order by crt_time desc limit 10000000, 5) t2 using (id);
+--+-----------+----------------+------+-------------+------------+-------+-----+--------+--------+-----------+
|id|select_type|table           |type  |possible_keys|key         |key_len|ref  |rows    |filtered|Extra      |
+--+-----------+----------------+------+-------------+------------+-------+-----+--------+--------+-----------+
|1 |PRIMARY    |<derived2>      |ALL   |NULL         |NULL        |NULL   |NULL |10000005|100     |NULL       |
|1 |PRIMARY    |t1              |eq_ref|PRIMARY      |PRIMARY     |146    |t2.id|1       |100     |NULL       |
|2 |DERIVED    |oox_ooxxxx_state|index |NULL         |idx_crt_time|6      |NULL |10000005|100     |Using index|
+--+-----------+----------------+------+-------------+------------+-------+-----+--------+--------+-----------+


# Error: This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
select * from oox_ooxxxx_state
where id in (select id from oox_ooxxxx_state order by crt_time desc limit 10000000, 5);
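
The two ways of taming a large offset can be compared side by side. This is a sketch under assumptions: in-memory SQLite stands in for MySQL, the `state` table with 1000 made-up rows is a small stand-in for the large table above, and keyset (seek) pagination is shown as an additional alternative that requires the sort column to be unique (or combined with a tie-breaker such as the primary key).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table state (id integer primary key, crt_time integer, payload text)")
conn.execute("create index idx_crt_time on state (crt_time)")
conn.executemany(
    "insert into state values (?, ?, ?)",
    [(i, 1000 + i, "row-%d" % i) for i in range(1, 1001)],
)

# Deferred join: page through the narrow index first, then fetch full rows by primary key.
deferred = conn.execute("""
    select t1.id, t1.crt_time, t1.payload
    from state t1
             inner join (select id from state order by crt_time desc limit 900, 5) t2
                        on t1.id = t2.id
    order by t1.crt_time desc
""").fetchall()

# Keyset pagination: remember the last crt_time of the previous page instead of an offset.
last_seen_crt_time = 1101  # crt_time of the 900th row in descending order
keyset = conn.execute("""
    select id, crt_time, payload
    from state
    where crt_time < ?
    order by crt_time desc
    limit 5
""", (last_seen_crt_time,)).fetchall()

print([row[0] for row in deferred])
print(deferred == keyset)
```

The keyset variant never scans the skipped rows, but it only supports moving to the next or previous page, not jumping to an arbitrary page number.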

6. union

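union: creates and fills a temporary table to remove duplicates; prefer union all whenever possible
union all: keeps duplicate rows
intersect: intersection
minus: difference (Oracle syntax; MySQL uses except instead, and supports intersect and except only since 8.0.31)

MySQL executes a union by creating and filling a temporary table, so many optimization strategies do not work well for it. You often have to push where, limit, order by, and similar clauses down into each subquery by hand so that the optimizer can make full use of them
Unless the server really has to eliminate duplicate rows, always use union all. A plain union adds the distinct option to the temporary table, which forces a uniqueness check on every row and is expensive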
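
The difference in results between the two operators can be seen with a small example. This sketch assumes in-memory SQLite as a stand-in for MySQL, and the two single-column tables are made up; it shows only the semantics, not the temporary-table cost.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table actor (first_name text)")
conn.execute("create table customer (first_name text)")
conn.executemany("insert into actor values (?)", [("ADAM",), ("ALAN",)])
conn.executemany("insert into customer values (?)", [("ADAM",), ("MARY",)])

# union removes the duplicate ADAM, which requires a uniqueness check on every row.
union_rows = conn.execute(
    "select first_name from actor union select first_name from customer"
).fetchall()

# union all simply appends the second result set to the first.
union_all_rows = conn.execute(
    "select first_name from actor union all select first_name from customer"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

If the application can tolerate or filter the duplicate itself, `union all` avoids the deduplication work entirely.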

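1. Rows to columns (pivot)

See 90_practice.md

7. User-defined variables

Similar to rownum in Oracle
Window functions are also worth a look (supported since MySQL 8.0)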

# @: user-defined variable, @@: system variable
select @@autocommit;

set @one := 1;
select @one;

set @i := 1;
select @i := @i + 1;

set @max_actor := (select max(actor_id)
                   from actor);
select @max_actor;

# One week ago
set @last_week := current_date - interval 1 week;
select @last_week;

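1. Limitations

Queries that use them cannot use the query cache (which was removed entirely in MySQL 8.0)
They cannot be used where a literal or an identifier is required, e.g. table names, column names, the limit clause
A user-defined variable lives for the duration of one connection, so it cannot be used for communication between connections. It is unrelated to transactions
Their types cannot be declared explicitly
The MySQL optimizer may optimize these variables away in some situations, so the code may not behave as expected
The assignment operator := has very low precedence, so assignment expressions should be wrapped in explicit parentheses
Using an undefined variable does not raise any syntax error

2. Ranking statements

# 1. Assign to a variable and read it in the same statement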
set @rowNum := 0;
select actor_id, @rowNum := @rowNum + 1 as rownum
from actor
limit 10;

# 2. The 10 actors who appear in the most films, ranked in descending order of film count
set @actor_number := 0;
select actor_id, cnt, @actor_number := @actor_number + 1
from (select actor_id, count(*) as cnt
      from film_actor
      group by actor_id
      order by cnt desc
      limit 10) t;
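
Since window functions are mentioned above as the modern alternative, the same ranking can be sketched with `row_number()`. Assumptions: in-memory SQLite (3.25 or later, which added window functions) stands in for MySQL 8.0, and the `film_actor` rows are made up.

```python
import sqlite3

# Assumption: in-memory SQLite (>= 3.25) stands in for MySQL 8.0; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table film_actor (actor_id integer, film_id integer)")
conn.executemany(
    "insert into film_actor values (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (3, 10)],
)

# row_number() numbers the rows in the window's order, so no session variable is needed.
ranked = conn.execute("""
    select actor_id, cnt, row_number() over (order by cnt desc) as rnk
    from (select actor_id, count(*) as cnt
          from film_actor
          group by actor_id) t
    order by rnk
    limit 10
""").fetchall()

print(ranked)
```

Unlike the variable-based version, the numbering here is defined by the `over (order by ...)` clause rather than by the order in which rows happen to be evaluated.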

3. Reading back data that was just updated

update ooxx set upd_time = now() where id = 'id1';
select upd_time from ooxx where id = 'id1';

# Update the timestamp efficiently and capture its value at the same time
update ooxx set upd_time = now() where id = 'id1' and @now := now();
select @now;

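4. Evaluation order pitfalls

# where and select run at different stages of the query, so two rows are returned instead of the expected one
# 1. Rows are processed one at a time: where => select => where => select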
set @rowNum := 0;
select actor_id, @rowNum := @rowNum + 1 as cnt
from actor
where @rowNum <= 1;
+--------+------+
|actor_id|cnt   |
+--------+------+
|58      |1     |
|92      |2     |
+--------+------+

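# 2. After adding order by, every row is returned, because order by performs a filesort
# All 200 rows are shown
# The result set is processed as a whole: where => order by => select
set @rowNum := 0;
select first_name, actor_id, @rowNum := @rowNum + 1 as cnt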
from actor
where @rowNum <= 1
order by first_name;
+----------+--------+---+
|first_name|actor_id|cnt|
+----------+--------+---+
|ADAM      |71      |1  |
|ADAM      |132     |2  |
|AL        |165     |3  |
|ALAN      |173     |4  |
|ALBERT    |125     |5  |
|ALBERT    |146     |6  |
|ALEC      |29      |7  |
|...       |...     |...|
+----------+--------+---+

# 3. The fix is to make the assignment and the read of the variable happen in the same stage of query execution
# Rows are processed one at a time: where => select
set @rowNum := 0;
select actor_id, @rowNum as cnt
from actor
where (@rowNum := @rowNum + 1) <= 1;
+--------+------+
|actor_id|cnt   |
+--------+------+
|58      |1     |
+--------+------+
