MySQL 8.0 执行COUNT()很慢原因分析

费尔默 · 发表于 2024-11-6 15:16:56

MySQL 8.0 执行COUNT()很慢原因分析

1.1 问题描述

线上 MySQL8.0.32 环境在执行 SELECT COUNT (1) FROM t0 获取表行数很慢，同样场景下该 SQL 在 MySQL5.7 环境很快就能拿到结果
1.2 问题复现

测试版本：8.0.25 MySQL Community Server - GPL 和 5.7.21-log MySQL Community Server (GPL)
1.2.1 复现准备

创建表并初始化数据

greatsql> DROP TABLE if EXISTS t0;
Query OK, 0 rows affected (0.05 sec)
greatsql> CREATE TABLE `t0` (
`id` int NOT NULL AUTO_INCREMENT,
`i1` int NOT NULL DEFAULT '0',
`c1` varchar(300) NOT NULL DEFAULT 'fander',
`c2` varchar(300) NOT NULL DEFAULT 'fander',
`c3` varchar(300) NOT NULL DEFAULT 'fander',
`c4` varchar(300) NOT NULL DEFAULT 'fander',
`c5` varchar(300) NOT NULL DEFAULT 'fander',
`c6` varchar(300) NOT NULL DEFAULT 'fander',
`c7` varchar(300) NOT NULL DEFAULT 'fander',
PRIMARY KEY (`id`) USING BTREE,
KEY `idx_i1` (`i1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.05 sec)
greatsql> INSERT INTO t0 VALUES(1,0,REPEAT('a', 100),REPEAT('b', 100),REPEAT('c', 100),REPEAT('d', 100),REPEAT('e', 100),REPEAT('f', 100),REPEAT('g', 100));
Query OK, 1 row affected (0.02 sec)
greatsql> SELECT * FROM t0\G
*************************** 1. row ***************************
id: 1
i1: 0
c1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
c2: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
c3: cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c4: dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
c5: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
c6: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
c7: gggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg
1 row in set (0.00 sec)
greatsql> INSERT INTO t0(i1,c1,c2,c3,c4,c5,c6,c7) SELECT i1,c1,c2,c3,c4,c5,c6,c7 FROM t0;
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0
Repeatedly execute the forementioned SQL 21 times, until:
greatsql> INSERT INTO t0(i1,c1,c2,c3,c4,c5,c6,c7) SELECT i1,c1,c2,c3,c4,c5,c6,c7 FROM t0;
Query OK, 1048576 rows affected (29.15 sec)
Records: 1048576 Duplicates: 0 Warnings: 0
greatsql> SELECT COUNT(1) FROM t0;
+----------+
| count(1) |
+----------+
| 2097152 |
+----------+
1 row in set (6.72 sec)

复制代码

修改配置文件，设置 innodb_buffer_pool_load_at_startup=OFF
重启数据库，确保下次查询时从磁盘加载，systemctl restart mysql3307

1.2.2 8.0.25的测试结果

执行计划显示走的是二级索引

greatsql> EXPLAIN SELECT COUNT(1) FROM t0;
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | t0 | NULL | index | NULL | idx_i1 | 4 | NULL | 1963965 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

复制代码

执行很慢，需要8秒

greatsql> SELECT COUNT(1) FROM t0;
+----------+
| count(1) |
+----------+
| 2097152 |
+----------+
1 row in set (8.07 sec)

复制代码

执行期间的top显示CPU冲高到200%+，磁盘I/O也很高，说明扫描了聚簇索引树，启用了并行查询

CPU监控

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20094 mysql 20 0 4977160 2.5g 17936 S 240.0 16.4 0:34.02 mysqld

复制代码

磁盘监控

----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ------memory-usage----- ----swap--- sda- sr1-
time |usr sys idl wai hiq siq| read writ| recv send| in out | int csw | used buff cach free| used free|util:util
30-08 10:32:05| 1 0 99 0 0 0| 0 0 | 12k 4344B| 0 0 |1391 1842 |3116M 265M 11.3G 933M| 0 0 | 0: 0
30-08 10:32:06| 1 0 99 0 0 0| 0 0 |9125B 214B| 0 0 |1598 2051 |3117M 265M 11.3G 932M| 0 0 | 0: 0
30-08 10:32:07| 7 10 83 0 0 0| 233M 0 |8856B 556B| 0 0 | 49k 59k|3347M 265M 11.3G 701M| 0 0 |95.5: 0
30-08 10:32:08| 5 9 82 4 0 0| 211M 68k|9500B 1187B| 0 0 | 42k 53k|3559M 265M 11.3G 490M| 0 0 |98.4: 0
30-08 10:32:09| 8 10 82 0 0 1| 210M 0 |9042B 15k| 0 0 | 43k 52k|3771M 265M 11.3G 277M| 0 0 |98.4: 0
30-08 10:32:10| 6 18 76 0 0 1| 181M 0 |8685B 476B| 0 0 | 40k 47k|3953M 264M 11.2G 181M| 0 0 |93.3: 0
30-08 10:32:11| 7 11 82 0 0 1| 182M 0 |8696B 13k| 0 0 | 39k 48k|4133M 263M 11.0G 176M| 0 0 |98.0: 0
30-08 10:32:12| 8 13 78 0 0 1| 171M 0 |8648B 2130B| 0 0 | 34k 42k|4302M 261M 10.9G 179M| 0 0 |97.2: 0
30-08 10:32:13| 5 10 84 0 0 1| 161M 0 | 13k 778B| 0 0 | 34k 41k|4462M 253M 10.7G 162M| 0 0 |95.3: 0
30-08 10:32:14| 6 11 76 6 0 1| 180M 56k| 10k 15k| 0 0 | 37k 45k|4642M 252M 10.6G 183M| 0 0 |97.8: 0
30-08 10:32:15| 4 6 90 0 0 0| 111M 0 | 12k 4410B| 0 0 | 23k 29k|4753M 251M 10.5G 170M| 0 0 |28.0: 0
30-08 10:32:16| 1 1 99 0 0 0| 876k 0 |8976B 66B| 0 0 |1860 2390 |4756M 251M 10.5G 167M| 0 0 |7.30: 0
30-08 10:32:17| 0 0 99 0 0 0| 0 0 | 10k 278B| 0 0 |1108 1443 |4756M 251M 10.5G 167M| 0 0 | 0: 0

复制代码

1.2.3 5.7.21的测试结果

执行计划显示走的是二级索引

greatsql> EXPLAIN SELECT COUNT(1) FROM t0;
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | t0 | NULL | index | NULL | idx_i1 | 4 | NULL | 1992321 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

复制代码

执行很快，0.81秒就执行完成

greatsql> SELECT COUNT(1) FROM t0;
+----------+
| count(1) |
+----------+
| 2097152 |
+----------+
1 row in set (0.81 sec)

复制代码

执行期间的top显示CPU只有20%+，磁盘I/O也很低，说明根本没通过聚簇索引

CPU监控

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28155 mysql 20 0 5238280 2.5g 17788 S 20.7 16.3 0:35.20 mysqld

复制代码

磁盘监控

----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ------memory-usage----- ----swap--- sda- sr1-
time |usr sys idl wai hiq siq| read writ| recv send| in out | int csw | used buff cach free| used free|util:util
30-08 10:41:37| 1 1 99 0 0 0| 0 0 |9820B 16k| 0 0 |2078 2877 |4340M 204M 8434M 2907M| 0 0 | 0: 0
30-08 10:41:38| 0 0 99 0 0 0| 0 64k|9320B 344B| 0 0 |1125 1579 |4340M 204M 8434M 2907M| 0 0 |0.30: 0
30-08 10:41:39| 2 1 96 1 0 0|9808k 0 |9206B 7890B| 0 0 |2650 3146 |4350M 204M 8434M 2897M| 0 0 |9.30: 0
30-08 10:41:40| 4 0 94 1 0 0| 18M 0 |8579B 344B| 0 0 |4197 4183 |4368M 204M 8434M 2879M| 0 0 |12.2: 0
30-08 10:41:41| 1 1 99 0 0 0| 0 0 | 10k 14k| 0 0 |2218 3058 |4369M 204M 8434M 2878M| 0 0 | 0: 0

复制代码

1.2.4 复现结论

通过以上8.0.25和5.7.21的对比测试，我们发现尽管两者 explain 的执行计划中都声明采用的是二级索引 idx_i1 ，但是实际执行中，8.0.25还是用的聚簇索引，资源占用高并且执行慢；而5.7.21真实的走二级索引，资源占用低并且执行很快
这带来了两个缺陷：

实际的执行计划和 explain 的结果不一致，会给SQL排查带来干扰。需要将 explain 的 key 列改成 PRIMARY
采用的索引不是最优，导致执行得很慢

2. 问题分析

在8.0.17版本中引入了 records_from_index(ha_rows *num_rows, uint) 函数，该函数忽略了上层传入的index参数，直接调用InnoBase::records()让InnoDB自己计算行数并返回，并且强制写了走主键索引的逻辑，导致的结果是无法选择最小索引树来实现遍历，实际执行中只能用到主键索引，即使SQL中加了使用二级索引的hint也不行。当然，等二级索引支持并行查询后就可以在调用records_from_index时实际用到传入的index，但是在8.0.17至8.0.36之间的版本执行select count都会造成很大的执行代价，并且执行计划还会误导DBA以为执行器是用二级索引树执行的扫描。

MySQL 8.0.37中做了优化，解决方式是在 sql/handler.cc 中添加handler::records_from_index(ha_rows *num_rows, uint index) 使用具体的二级索引来执行查询，详细结果见 https://gitee.com/mirrors/mysql-server/commit/22768a0f830c5be769bea0c464a8721ec266beef

commit 22768a0f830c5be769bea0c464a8721ec266beef
tree 4fca26e08bdacb88c31588110f3f614a08b2ebc6
parent 76eeb8ffbf4eb7cf927715a98fe2af5333d8e360
author Sreeharsha Ramanavarapu <sreeharsha.ramanavarapu@oracle.com> 1526702382 +0530
committer Sreeharsha Ramanavarapu <sreeharsha.ramanavarapu@oracle.com> 1526702382 +0530
WL#10398: Improve SELECT COUNT(*) performance by using
handler::records_from_index(*num_rows, index)
in execution phase

复制代码

同时在 MySQL 8.0.37 的changelog https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-37.html 中有这样的描述：

InnoDB: MySQL no longer ignores the optimizer hint to use a secondary index scan, which instead forced a clustered (parallel) index scan. (Bug #100597, Bug #112767, Bug #31791868, Bug #35952353)

因此，从 MySQL 8.0.37 及以后的版本中，不再强制使用聚集索引的并行查询，而是遵循 hint/优化器的建议可以使用二级索引扫描。
3. 解决方案和优化建议

最直接的建议是升级到MySQL 8.0.37，但是也要注意不要使用MySQL 8.0.38/8.4.1/9.0.0版本，因为这三个版本中存在致命 Bug #36808732 (当创建表超过 8000 以后启动失败)，不过这三个版本已经下载不到了，只是tag还保留着。
4. 参考文章

MySQL 8.0.37的发布文档 https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-37.html
INDEX hint does not affect count(*) execution https://bugs.mysql.com/bug.php?id=100597
The performance of version 8.0 when using count(1) is significantly lower compar https://bugs.mysql.com/bug.php?id=111969

Enjoy GreatSQL
来源:https://www.cnblogs.com/greatsql/p/18530449
免责声明：由于采集信息均来自互联网，如果侵犯了您的权益，请联系我们【E-Mail:cb@itdo.tech】我们会及时删除侵权内容，谢谢合作！

MySQL 8.0 执行COUNT()很慢原因分析

本帖子中包含更多资源