翼度科技»论坛 编程开发 mysql 查看内容

MySQL 8.0 Reference Manual(读书笔记53节--Optimizing SQL Statements)

3

主题

3

帖子

9

积分

新手上路

Rank: 1

积分
9
1.Optimizing INFORMATION_SCHEMA Queries

Applications that monitor databases may make frequent【ˈfriːkwənt , friˈkwent 频繁的;经常发生的;】 use of INFORMATION_SCHEMA tables. To write queries for these tables most efficiently, use the following general【ˈdʒenrəl 一般的;总的;普遍的;大致的,大概的(方向或地区);常规的;大体的;整体的;正常的;全体的;首席的;笼统的;概括性的;非专门的;】 guidelines:
• Try to query only INFORMATION_SCHEMA tables that are views on data dictionary tables.
• Try to query only for static metadata. Selecting columns or using retrieval conditions for dynamic metadata along with static metadata adds overhead to process the dynamic metadata.
These INFORMATION_SCHEMA tables are implemented【ˈɪmplɪmentɪd 实施;执行;贯彻;使生效;】 as views on data dictionary tables, so queries on them retrieve information from the data dictionary:
  1. CHARACTER_SETS
  2. CHECK_CONSTRAINTS
  3. COLLATIONS
  4. COLLATION_CHARACTER_SET_APPLICABILITY
  5. COLUMNS
  6. EVENTS
  7. FILES
  8. INNODB_COLUMNS
  9. INNODB_DATAFILES
  10. INNODB_FIELDS
  11. INNODB_FOREIGN
  12. INNODB_FOREIGN_COLS
  13. INNODB_INDEXES
  14. INNODB_TABLES
  15. INNODB_TABLESPACES
  16. INNODB_TABLESPACES_BRIEF
  17. INNODB_TABLESTATS
  18. KEY_COLUMN_USAGE
  19. PARAMETERS
  20. PARTITIONS
  21. REFERENTIAL_CONSTRAINTS
  22. RESOURCE_GROUPS
  23. ROUTINES
  24. SCHEMATA
  25. STATISTICS
  26. TABLES
  27. TABLE_CONSTRAINTS
  28. TRIGGERS
  29. VIEWS
  30. VIEW_ROUTINE_USAGE
  31. VIEW_TABLE_USAGE
复制代码
Some types of values, even for a non-view INFORMATION_SCHEMA table, are retrieved by lookups from the data dictionary. This includes values such as database and table names, table types, and storage engines.
Some INFORMATION_SCHEMA tables contain columns that provide table statistics:
  1. STATISTICS.CARDINALITY
  2. TABLES.AUTO_INCREMENT
  3. TABLES.AVG_ROW_LENGTH
  4. TABLES.CHECKSUM
  5. TABLES.CHECK_TIME
  6. TABLES.CREATE_TIME
  7. TABLES.DATA_FREE
  8. TABLES.DATA_LENGTH
  9. TABLES.INDEX_LENGTH
  10. TABLES.MAX_DATA_LENGTH
  11. TABLES.TABLE_ROWS
  12. TABLES.UPDATE_TIME
复制代码
Those columns represent【ˌreprɪˈzent 代表,表示;(在竞赛或体育赛事中)代表(国家或地区);(标志或符号)意味着,代表着,标志着;描绘,(形象地)表现,描写;代理(个人或团体);代表(个人或团体)出席;使】 dynamic table metadata; that is, information that changes as table contents change.
By default, MySQL retrieves cached values for those columns from the mysql.index_stats and mysql.table_stats dictionary tables when the columns are queried, which is more efficient than retrieving statistics directly from the storage engine. If cached statistics are not available or have expired, MySQL retrieves the latest statistics from the storage engine and caches them in the mysql.index_stats and mysql.table_stats dictionary tables. Subsequent【ˈsʌbsɪkwənt 随后的;之后的;后来的;接后的;】 queries retrieve the cached statistics until the cached statistics expire. A server restart or the first opening of the mysql.index_stats and mysql.table_stats tables do not update cached statistics automatically.
The information_schema_stats_expiry session variable defines the period of time before cached statistics expire. The default is 86400 seconds (24 hours), but the time period can be extended to as much as one year.
To update cached values at any time for a given table, use ANALYZE TABLE.
Querying statistics columns does not store or update statistics in the mysql.index_stats and mysql.table_stats dictionary tables under these circumstances【ˈsɜrkəmˌstænsəz , ˈsɜrkəmˌstænsɪz 环境;条件;状况;境况;境遇;(尤指)经济状况;命运;客观环境;】:
• When cached statistics have not expired.
• When information_schema_stats_expiry is set to 0.
• When the server is in read_only, super_read_only, transaction_read_only, or innodb_read_only mode.
• When the query also fetches Performance Schema data.
information_schema_stats_expiry is a session variable, and each client session can define its own expiration value. Statistics that are retrieved from the storage engine and cached by one session are available to other sessions.
【If the innodb_read_only system variable is enabled, ANALYZE TABLE may fail because it cannot update statistics tables in the data dictionary, which use InnoDB. For ANALYZE TABLE operations that update the key distribution, failure may occur even if the operation updates the table itself (for example, if it is a MyISAM table). To obtain the updated distribution statistics, set information_schema_stats_expiry=0.】
For INFORMATION_SCHEMA tables implemented as views on data dictionary tables, indexes on the underlying data dictionary tables permit the optimizer to construct efficient query execution plans. To see the choices made by the optimizer, use EXPLAIN. To also see the query used by the server to execute an INFORMATION_SCHEMA query, use SHOW WARNINGS immediately following EXPLAIN.
Consider this statement, which identifies【aɪˈdentɪfaɪz 确认;发现;鉴定;显示;找到;认出;说明身份;】 collations for the utf8mb4 character set:
  1. mysql> SELECT COLLATION_NAME
  2. FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
  3. WHERE CHARACTER_SET_NAME = 'utf8mb4';
  4. +----------------------------+
  5. | COLLATION_NAME             |
  6. +----------------------------+
  7. | utf8mb4_general_ci         |
  8. | utf8mb4_bin                |
  9. | utf8mb4_unicode_ci         |
  10. | utf8mb4_icelandic_ci       |
  11. | utf8mb4_latvian_ci         |
  12. | utf8mb4_romanian_ci        |
  13. | utf8mb4_slovenian_ci       |
  14. ...
复制代码
How does the server process that statement? To find out, use EXPLAIN:
  1. mysql> EXPLAIN SELECT COLLATION_NAME
  2. FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
  3. WHERE CHARACTER_SET_NAME = 'utf8mb4'\G
  4. *************************** 1. row ***************************
  5. id: 1
  6. select_type: SIMPLE
  7. table: cs
  8. partitions: NULL
  9. type: const
  10. possible_keys: PRIMARY,name
  11. key: name
  12. key_len: 194
  13. ref: const
  14. rows: 1
  15. filtered: 100.00
  16. Extra: Using index
  17. *************************** 2. row ***************************
  18. id: 1
  19. select_type: SIMPLE
  20. table: col
  21. partitions: NULL
  22. type: ref
  23. possible_keys: character_set_id
  24. key: character_set_id
  25. key_len: 8
  26. ref: const
  27. rows: 68
  28. filtered: 100.00
  29. Extra: NULL
  30. 2 rows in set, 1 warning (0.01 sec)
复制代码
To see the query used to satisfy that statement, use SHOW WARNINGS:
  1. mysql> SHOW WARNINGS\G
  2. *************************** 1. row ***************************
  3. Level: Note
  4. Code: 1003
  5. Message: /* select#1 */ select `mysql`.`col`.`name` AS `COLLATION_NAME`
  6. from `mysql`.`character_sets` `cs`
  7. join `mysql`.`collations` `col`
  8. where ((`mysql`.`col`.`character_set_id` = '45')
  9. and ('utf8mb4' = 'utf8mb4'))
复制代码
As indicated by SHOW WARNINGS, the server handles the query on COLLATION_CHARACTER_SET_APPLICABILITY as a query on the character_sets and collations data dictionary tables in the mysql system database.
2. Optimizing Performance Schema Queries

Applications that monitor databases may make frequent use of Performance Schema tables. To write queries for these tables most efficiently, take advantage of their indexes. For example, include a WHERE clause that restricts retrieved rows based on comparison to specific values in an indexed column.
Most Performance Schema tables have indexes. Tables that do not are those that normally contain few rows or are unlikely to be queried frequently. Performance Schema indexes give the optimizer access to execution plans other than full table scans. These indexes also improve performance for related objects, such as sys schema views that use those tables.
To see whether a given Performance Schema table has indexes and what they are, use SHOW INDEX or SHOW CREATE TABLE:
  1. mysql> SHOW INDEX FROM performance_schema.accounts\G
  2. *************************** 1. row ***************************
  3. Table: accounts
  4. Non_unique: 0
  5. Key_name: ACCOUNT
  6. Seq_in_index: 1
  7. Column_name: USER
  8. Collation: NULL
  9. Cardinality: NULL
  10. Sub_part: NULL
  11. Packed: NULL
  12. Null: YES
  13. Index_type: HASH
  14. Comment:
  15. Index_comment:
  16. Visible: YES
  17. *************************** 2. row ***************************
  18. Table: accounts
  19. Non_unique: 0
  20. Key_name: ACCOUNT
  21. Seq_in_index: 2
  22. Column_name: HOST
  23. Collation: NULL
  24. Cardinality: NULL
  25. Sub_part: NULL
  26. Packed: NULL
  27. Null: YES
  28. Index_type: HASH
  29. Comment:
  30. Index_comment:
  31. Visible: YES
  32. mysql> SHOW CREATE TABLE performance_schema.rwlock_instances\G
  33. *************************** 1. row ***************************
  34. Table: rwlock_instances
  35. Create Table: CREATE TABLE `rwlock_instances` (
  36. `NAME` varchar(128) NOT NULL,
  37. `OBJECT_INSTANCE_BEGIN` bigint(20) unsigned NOT NULL,
  38. `WRITE_LOCKED_BY_THREAD_ID` bigint(20) unsigned DEFAULT NULL,
  39. `READ_LOCKED_BY_COUNT` int(10) unsigned NOT NULL,
  40. PRIMARY KEY (`OBJECT_INSTANCE_BEGIN`),
  41. KEY `NAME` (`NAME`),
  42. KEY `WRITE_LOCKED_BY_THREAD_ID` (`WRITE_LOCKED_BY_THREAD_ID`)
  43. ) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
复制代码
To see the execution plan for a Performance Schema query and whether it uses any indexes, use EXPLAIN:
  1. mysql> EXPLAIN SELECT * FROM performance_schema.accounts
  2. WHERE (USER,HOST) = ('root','localhost')\G
  3. *************************** 1. row ***************************
  4. id: 1
  5. select_type: SIMPLE
  6. table: accounts
  7. partitions: NULL
  8. type: const
  9. possible_keys: ACCOUNT
  10. key: ACCOUNT
  11. key_len: 278
  12. ref: const,const
  13. rows: 1
  14. filtered: 100.00
  15. Extra: NULL
复制代码
The EXPLAIN output indicates that the optimizer uses the accounts table ACCOUNT index that comprises the USER and HOST columns.
Performance Schema indexes are virtual: They are a construct of the Performance Schema storage engine and use no memory or disk storage. The Performance Schema reports index information to the optimizer so that it can construct efficient execution plans. The Performance Schema in turn uses optimizer information about what to look for (for example, a particular key value), so that it can perform efficient lookups without building actual index structures. This implementation provides two important benefits:
• It entirely avoids the maintenance cost normally incurred for tables that undergo frequent updates.
• It reduces at an early stage of query execution the amount of data retrieved. For conditions on the indexed columns, the Performance Schema efficiently returns only table rows that satisfy the query conditions. Without an index, the Performance Schema would return all rows in the table, requiring that the optimizer later evaluate the conditions against each row to produce the final result.
Performance Schema indexes are predefined and cannot be dropped, added, or altered.
Performance Schema indexes are similar to hash indexes. For example:
• They are used only for equality comparisons that use the = or  operators.
• They are unordered. If a query result must have specific row ordering characteristics, include an ORDER BY clause.
3.Optimizing Data Change Statements

This section explains how to speed up data change statements: INSERT, UPDATE, and DELETE. Traditional OLTP applications and modern web applications typically do many small data change operations, where concurrency is vital【ˈvaɪtl 至关重要的,必不可少的;生命的;生命统计的,生死统计的;生气勃勃的,充满生机的;致命的,生死攸关的;对…极重要的;维持生命所必需的;热情洋溢的;】. Data analysis and reporting applications typically run data change operations that affect many rows at once, where the main considerations is the I/O to write large amounts of data and keep indexes up-to-date. For inserting and updating large volumes of data (known in the industry as ETL, for “extract-transform-load”), sometimes you use other SQL statements or external commands, that mimic the effects of INSERT, UPDATE, and DELETE statements.
3.1 Optimizing INSERT Statements

To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.
The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:
• Connecting: (3)
• Sending query to server: (2)
• Parsing query: (2)
• Inserting row: (1 × size of row)
• Inserting indexes: (1 × number of indexes)
• Closing: (1)
This does not take into consideration the initial【ɪˈnɪʃl 始的;最初的;第一的;】 overhead to open tables, which is done once for each concurrently running query.
The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.
You can use the following methods to speed up inserts:
• If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.
• When loading a table from a text file, use LOAD DATA. This is usually 20 times faster than using INSERT statements.
• Take advantage of the fact that columns have default values. Insert values explicitly【ɪkˈsplɪsətli 明确地;明白地;】 only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
3.2 Optimizing UPDATE Statements

An update statement is optimized like a SELECT query with the additional overhead of a write. The speed of the write depends on the amount of data being updated and the number of indexes that are updated. Indexes that are not changed do not get updated.
Another way to get fast updates is to delay updates and then do many updates in a row later. Performing multiple updates together is much quicker than doing one at a time if you lock the table.
For a MyISAM table that uses dynamic row format, updating a row to a longer total length may split the row. If you do this often, it is very important to use OPTIMIZE TABLE occasionally【əˈkeɪʒnəli 偶尔;偶然;有时候;偶尔地;】.
3.3 Optimizing DELETE Statements

The time required to delete individual rows in a MyISAM table is exactly proportional【prəˈpɔːrʃənl 成比例的;相称的;均衡的;】 to the number of indexes. To delete rows more quickly, you can increase the size of the key cache by increasing the key_buffer_size system variable.
To delete all rows from a MyISAM table, TRUNCATE TABLE tbl_name is faster than DELETE FROM tbl_name. Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock.
4.Optimizing Database Privileges

The more complex your privilege setup, the more overhead【ˌoʊvərˈhed , ˈoʊvərhed 高架的;管理费用的;经费的;头上方的;地面以上的;】 applies to all SQL statements. Simplifying the privileges established by GRANT statements enables MySQL to reduce permission-checking overhead【ˌoʊvərˈhed , ˈoʊvərhed 开销;经常费用;经常开支;(尤指飞机的)顶舱;用于高射投影器的幻灯片;】 when clients execute statements. For example, if you do not grant any table-level or column-level privileges, the server need not ever check the contents of the tables_priv and columns_priv tables. Similarly, if you place no resource limits on any accounts, the server does not have to perform resource counting. If you have a very high statement-processing load, consider using a simplified grant structure【ˈstrʌktʃər 结构;构造;体系;(尤指)建筑物;结构体;精心组织;周密安排;】 to reduce permission-checking overhead.
5.Other Optimization Tips

This section lists a number of miscellaneous【ˌmɪsəˈleɪniəs 各种各样的;混杂的;】 tips for improving query processing speed:
• If your application makes several database requests to perform related updates, combining the statements into a stored routine can help performance. Similarly, if your application computes a single result based on several column values or large volumes of data, combining the computation into a loadable function can help performance. The resulting fast database operations are then available to be reused by other queries, applications, and even code written in different programming languages.
• To fix any compression issues that occur with ARCHIVE tables, use OPTIMIZE TABLE.
• If possible, classify【ˈklæsɪfaɪ 分类;划分;将…分类;界定;】 reports as “live” or as “statistical”, where data needed for statistical reports is created only from summary tables that are generated periodically from the live data.
• If you have data that does not conform well to a rows-and-columns table structure, you can pack【pæk 包装,装(箱);(把…)打包,(把…)装箱;收拾(行李);堆积,压实;挤满,塞满;备有,含有;(用某物)保存,保藏;佩带,携带(枪支);停止,结束;加工包装 (食品),把】 and store data into a BLOB column. In this case, you must provide code in your application to pack and unpack information, but this might save I/O operations to read and write the sets of related values.
• With Web servers, store images and other binary assets as files, with the path name stored in the database rather than the file itself. Most Web servers are better at caching files than database contents, so using files is generally faster. (Although you must handle backups and storage issues yourself in this case.)
• If you need really high speed, look at the low-level MySQL interfaces. For example, by accessing the MySQL InnoDB or MyISAM storage engine directly, you could get a substantial speed increase compared to using the SQL interface.
Similarly, for databases using the NDBCLUSTER storage engine, you may wish to investigate possible use of the NDB API.
• Replication can provide a performance benefit for some operations. You can distribute【dɪˈstrɪbjuːt 分配;使分布;分发;分销;分散;使散开;】 client retrievals among replicas to split up【splɪt ʌp (使)分手;分开;(使)离婚;拆分;(使)离散;(把…)分成若干较小部分;】 the load. To avoid slowing down the source while making backups, you can make backups using a replica.
 

来源:https://www.cnblogs.com/xuliuzai/p/18200914
免责声明:由于采集信息均来自互联网,如果侵犯了您的权益,请联系我们【E-Mail:cb@itdo.tech】 我们会及时删除侵权内容,谢谢合作!

举报 回复 使用道具