|
Queries, in the form of SELECT statements, perform all the lookup operations in the database. Tuning these statements is a top priority, whether to achieve sub-second response times for dynamic web pages, or to chop hours off the time to generate huge overnight reports.
Besides SELECT statements, the tuning techniques for queries also apply to constructs such as CREATE TABLE...AS SELECT, INSERT INTO...SELECT, and WHERE clauses in DELETE statements. Those statements have additional performance considerations because they combine write operations with the read-oriented query operations.
NDB Cluster supports a join pushdown optimization whereby a qualifying join is sent in its entirety to NDB Cluster data nodes, where it can be distributed among them and executed in parallel.
The main considerations for optimizing queries are:
• To make a slow SELECT ... WHERE query faster, the first thing to check is whether you can add an index. Set up indexes on columns used in the WHERE clause, to speed up evaluation, filtering, and the final retrieval of results. To avoid wasted disk space, construct a small set of indexes that speed up many related queries used in your application.
Indexes are especially important for queries that reference different tables, using features such as joins and foreign keys. You can use the EXPLAIN statement to determine which indexes are used for a SELECT.
• Isolate and tune any part of the query, such as a function call, that takes excessive【ɪkˈsesɪv 过度的;过分的;】 time. Depending on how the query is structured, a function could be called once for every row in the result set, or even once for every row in the table, greatly magnifying any inefficiency.
• Minimize the number of full table scans in your queries, particularly for big tables.
• Keep table statistics up to date by using the ANALYZE TABLE statement periodically, so the optimizer has the information needed to construct an efficient execution plan.
• Learn the tuning techniques, indexing techniques, and configuration parameters that are specific to the storage engine for each table. Both InnoDB and MyISAM have sets of guidelines for enabling and sustaining high performance in queries.
• You can optimize single-query transactions for InnoDB tables.
• Avoid transforming the query in ways that make it hard to understand, especially if the optimizer does some of the same transformations automatically.
• If a performance issue is not easily solved by one of the basic guidelines, investigate the internal details of the specific query by reading the EXPLAIN plan and adjusting your indexes, WHERE clauses, join clauses, and so on. (When you reach a certain level of expertise, reading the EXPLAIN plan might be your first step for every query.)
• Adjust the size and properties of the memory areas that MySQL uses for caching. With efficient use of the InnoDB buffer pool, MyISAM key cache, and the MySQL query cache, repeated queries run faster because the results are retrieved from memory the second and subsequent times.
• Even for a query that runs fast using the cache memory areas, you might still optimize further so that they require less cache memory, making your application more scalable. Scalability means that your application can handle more simultaneous users, larger requests, and so on without experiencing a big drop in performance.
• Deal with locking issues, where the speed of your query might be affected by other sessions accessing the tables at the same time.
1. WHERE Clause Optimization
This section discusses optimizations that can be made for processing WHERE clauses. The examples use SELECT statements, but the same optimizations apply for WHERE clauses in DELETE and UPDATE statements.
You might be tempted【ˈtemptɪd 想做…的;希望做的;】 to rewrite your queries to make arithmetic operations faster, while sacrificing【ˈsækrɪfaɪsɪŋ 牺牲;献出;以(人或动物)作祭献;】 readability【ˌridəˈbɪlɪti 可读性;易读性;可阅读性;】. Because MySQL does similar optimizations automatically, you can often avoid this work, and leave the query in a more understandable and maintainable form. Some of the optimizations performed by MySQL follow:
• Removal of unnecessary parentheses【pəˈrɛnθəˌsiz 插入语;】:- ((a AND b) AND c OR (((a AND b) AND (c AND d))))
- -> (a AND b AND c) OR (a AND b AND c AND d)
复制代码 • Constant folding:- (a<b AND b=c) AND a=5
- -> b>5 AND b=c AND a=5
复制代码 • Constant condition removal:- (b>=5 AND b=5) OR (b=6 AND 5=5) OR (b=7 AND 5=6)
- -> b=5 OR b=6
复制代码 In MySQL 8.0.14 and later, this takes place during preparation rather than during the optimization phase, which helps in simplification of joins.
• Constant expressions used by indexes are evaluated【ɪˈvæljueɪtɪd 评价;评估;估计;】 only once.
• Beginning with MySQL 8.0.16, comparisons of columns of numeric types with constant values are checked and folded or removed for invalid or out-of-rage values:- # CREATE TABLE t (c TINYINT UNSIGNED NOT NULL);
- SELECT * FROM t WHERE c ≪ 256;
- -≫ SELECT * FROM t WHERE 1;
复制代码 • COUNT(*) on a single table without a WHERE is retrieved directly from the table information for MyISAM and MEMORY tables. This is also done for any NOT NULL expression when used with only one table.
• Early detection of invalid constant expressions. MySQL quickly detects that some SELECT statements are impossible and returns no rows.
• HAVING is merged with WHERE if you do not use GROUP BY or aggregate functions (COUNT(), MIN(), and so on).
• For each table in a join, a simpler WHERE is constructed to get a fast WHERE evaluation for the table and also to skip rows as soon as possible.
• All constant tables are read first before any other tables in the query. A constant table is any of the following:
- An empty table or a table with one row.
- A table that is used with a WHERE clause on a PRIMARY KEY or a UNIQUE index, where all index parts are compared to constant expressions and are defined as NOT NULL.
All of the following tables are used as constant tables:- SELECT * FROM t WHERE primary_key=1;
- SELECT * FROM t1,t2
- WHERE t1.primary_key=1 AND t2.primary_key=t1.id;
复制代码 • The best join combination for joining the tables is found by trying all possibilities. If all columns in ORDER BY and GROUP BY clauses come from the same table, that table is preferred first when joining.
• If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
• If you use the SQL_SMALL_RESULT modifier, MySQL uses an in-memory temporary table.
• Each table index is queried, and the best index is used unless the optimizer believes that it is more efficient to use a table scan. At one time, a scan was used based on whether the best index spanned【spænd 跨越;包括(广大地区);横跨;持续;涵盖(多项内容);贯穿;】 more than 30% of the table, but a fixed percentage no longer determines the choice between using an index or a scan. The optimizer now is more complex and bases its estimate on additional factors such as table size, number of rows, and I/O block size.
• In some cases, MySQL can read rows from the index without even consulting the data file. If all columns used from the index are numeric, only the index tree is used to resolve the query.
• Before each row is output, those that do not match the HAVING clause are skipped.
Some examples of queries that are very fast:- SELECT COUNT(*) FROM tbl_name;
- SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;
- SELECT MAX(key_part2) FROM tbl_name
- WHERE key_part1=constant;
- SELECT ... FROM tbl_name
- ORDER BY key_part1,key_part2,... LIMIT 10;
- SELECT ... FROM tbl_name
- ORDER BY key_part1 DESC, key_part2 DESC, ... LIMIT 10;
复制代码 MySQL resolves the following queries using only the index tree, assuming that the indexed columns are numeric:- SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;
- SELECT COUNT(*) FROM tbl_name
- WHERE key_part1=val1 AND key_part2=val2;
- SELECT MAX(key_part2) FROM tbl_name GROUP BY key_part1;
复制代码 The following queries use indexing to retrieve the rows in sorted order without a separate sorting pass:- SELECT ... FROM tbl_name
- ORDER BY key_part1,key_part2,... ;
- SELECT ... FROM tbl_name
- ORDER BY key_part1 DESC, key_part2 DESC, ... ;
复制代码 2.Range Optimization
The range access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals【ˈɪntərvəlz (时间上的)间隔,间隙,间歇;(戏剧、电影或音乐会的)幕间休息,休息时间;(其他事情)穿插出现的间隙;】. It can be used for a single-part or multiple-part index. The following sections describe conditions under which the optimizer uses range access.
2.1 Range Access Method for Single-Part Indexes
For a single-part index, index value intervals can be conveniently【kən'viniəntli 方便地;通常;便利地;顺利;】 represented by corresponding【ˌkɔːrəˈspɑːndɪŋ 相应的;相关的;符合的;】 conditions in the WHERE clause, denoted as range conditions rather than “intervals.”
The definition of a range condition for a single-part index is as follows:
• For both BTREE and HASH indexes, comparison【kəmˈpærɪsn 比较;对比;相比;】 of a key part with a constant value is a range condition when using the =, , IN(), IS NULL, or IS NOT NULL operators.
• Additionally, for BTREE indexes, comparison of a key part with a constant value is a range condition when using the >, =, 1 AND key_col < 10;SELECT * FROM t1 WHERE key_col = 1 OR key_col IN (15,18,20);SELECT * FROM t1 WHERE key_col LIKE 'ab%' OR key_col BETWEEN 'bar' AND 'foo';[/code]Some nonconstant values may be converted to constants during the optimizer constant propagation phase.
MySQL tries to extract range conditions from the WHERE clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.
Consider the following statement, where key1 is an indexed column and nonkey is not indexed:- SELECT * FROM t1
- WHERE key_col > 1
- AND key_col < 10;
- SELECT * FROM t1
- WHERE key_col = 1
- OR key_col IN (15,18,20);
- SELECT * FROM t1
- WHERE key_col LIKE 'ab%'
- OR key_col BETWEEN 'bar' AND 'foo';
复制代码 The extraction process for key key1 is as follows:
1. Start with original WHERE clause:- SELECT * FROM t1 WHERE
- (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
- (key1 < 'bar' AND nonkey = 4) OR
- (key1 < 'uux' AND key1 > 'z');
复制代码 2. Remove nonkey = 4 and key1 LIKE '%b' because they cannot be used for a range scan. The correct way to remove them is to replace them with TRUE, so that we do not miss any matching rows when doing the range scan. Replacing them with TRUE yields:- (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
- (key1 < 'bar' AND nonkey = 4) OR
- (key1 < 'uux' AND key1 > 'z')
复制代码 3. Collapse conditions that are always true or false:
• (key1 LIKE 'abcde%' OR TRUE) is always true
• (key1 < 'uux' AND key1 > 'z') is always false
Replacing these conditions with constants yields:- (key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR
- (key1 < 'bar' AND TRUE) OR
- (key1 < 'uux' AND key1 > 'z')
复制代码 Removing unnecessary TRUE and FALSE constants yields:- (key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)
复制代码 4. Combining overlapping intervals into one yields the final condition to be used for the range scan:- (key1 < 'abc') OR (key1 < 'bar')
复制代码 In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive【rɪˈstrɪktɪv 限制性的;约束的;】 than the WHERE clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE clause.
The range condition extraction algorithm can handle nested AND/OR constructs of arbitrary【ˈɑːrbɪtreri 任意的;武断的;专横的;专制的;随心所欲的;】 depth, and its output does not depend on the order in which conditions appear in WHERE clause.
MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.
2.2 Range Access Method for Multiple-Part Indexes
Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.
For example, consider a multiple-part index defined as key1(key_part1, key_part2, key_part3), and the following set of key tuples listed in key order:The condition key_part1 = 1 defines this interval【ˈɪntərvl 间隔;(时间上的)间隙;间歇;音程;休息时间;(戏剧、电影或音乐会的)幕间休息;(其他事情)穿插出现的间隙;】:- key_part1 key_part2 key_part3
- NULL 1 'abc'
- NULL 1 'xyz'
- NULL 2 'foo'
- 1 1 'abc'
- 1 1 'xyz'
- 1 2 'abc'
- 2 1 'aaa'
复制代码 The single interval is:- (1,-inf,-inf) <= (key_part1,key_part2,key_part3) < (1,+inf,+inf)
复制代码 It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value ('foo', 11, 0), which does not satisfy the original condition.
• If conditions that cover sets of rows contained within intervals are combined with OR, they form a condition that covers a set of rows contained within the union of their intervals. If the conditions are combined with AND, they form a condition that covers a set of rows contained within the intersection of their intervals. For example, for this condition on a two-part index:- key_part1 cmp const1
- AND key_part2 cmp const2
- AND ...
- AND key_partN cmp constN;
复制代码 The intervals are:- key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'
复制代码 In this example, the interval on the first line uses one key part for the left bound【baʊnd 跳跃;蹦跳;】 and two key parts for the right bound. The interval on the second line uses only one key part. The key_len column in the EXPLAIN output indicates the maximum length of the key prefix used.
In some cases, key_len may indicate that a key part was used, but that might be not what you would expect. Suppose that key_part1 and key_part2 can be NULL. Then the key_len column displays two key part lengths for the following condition:- key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10
复制代码 But, in fact, the condition is converted to this:- ('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)
复制代码 2.3 Equality Range Optimization of Many-Valued Comparisons
Consider these expressions, where col_name is an indexed column:- (key_part1 = 1 AND key_part2 < 2) OR (key_part1 > 5)
复制代码 Each expression is true if col_name is equal to any of several values. These comparisons【kəmˈpɛrəsənz 比较;对比;相比;】 are equality【iˈkwɑːləti 平等;相等;均等;】 range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:
• If there is a unique index on col_name, the row estimate for each range is 1 because at most one row can have the given value.
• Otherwise, any index on col_name is nonunique and the optimizer can estimate the row count for each range using dives into the index or index statistics.
With index dives【daɪvz (戴呼吸装备)潜水;跳水(头和两臂先入水);下潜;潜到更深的水下;】, the optimizer makes a dive【(dive 的复数)下落;暴跌;跳水;假摔;】 at each end of a range and uses the number of rows in the range as the estimate. For example, the expression col_name IN (10, 20, 30) has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields【jiːldz 产生(收益、效益等);屈服;提供;让步;放弃;出产(作物);缴出;】 an estimate of the number of rows that have the given value.
ndex dives provide accurate【ˈækjərət 精确的;准确的(掷、射、击等);正确无误的;】 row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.
The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. To disable use of statistics and always use index dives regardless of N, set eq_range_index_dive_limit to 0.
To update table index statistics for best estimates【ˈestɪmeɪts (对数量、成本等的)估计;估价;估计的成本;】, use ANALYZE TABLE.
Prior to MySQL 8.0, there is no way of skipping the use of index dives to estimate index usefulness, except by using the eq_range_index_dive_limit system variable. In MySQL 8.0, index dive skipping is possible for queries that satisfy all these conditions:
• The query is for a single table, not a join on multiple tables.
• A single-index FORCE INDEX index hint is present. The idea is that if index use is forced, there is nothing to be gained from the additional overhead of performing dives into the index.
• The index is nonunique and not a FULLTEXT index.
• No subquery is present.
• No DISTINCT, GROUP BY, or ORDER BY clause is present.
For EXPLAIN FOR CONNECTION, the output changes as follows if index dives are skipped:
• For traditional output, the rows and filtered values are NULL.
• For JSON output, rows_examined_per_scan and rows_produced_per_join do not appear, skip_index_dive_due_to_force is true, and cost calculations are not accurate.
Without FOR CONNECTION, EXPLAIN output does not change when index dives are skipped.
After execution of a query for which index dives are skipped, the corresponding row in the Information Schema OPTIMIZER_TRACE table contains an index_dives_for_range_access value of skipped_due_to_force_index.
2.4 Skip Scan Range Access Method
Consider the following scenario:- (1,-inf) < (key_part1,key_part2) < (1,2)
- (5,-inf) < (key_part1,key_part2)
复制代码 To execute this query, MySQL can choose an index scan to fetch all rows (the index includes all columns to be selected), then apply the f2 > 40 condition from the WHERE clause to produce the final result set.
A range scan is more efficient than a full index scan, but cannot be used in this case because there is no condition on f1, the first index column. However, as of MySQL 8.0.13, the optimizer can perform multiple range scans, one for each value of f1, using a method called Skip Scan that is similar to Loose Index Scan:
1. Skip between distinct values of the first index part, f1 (the index prefix).
2. Perform a subrange【子范围;子域;子界类型;子界;】 scan on each distinct prefix value for the f2 > 40 condition on the remaining index part.
For the data set shown earlier, the algorithm operates like this:
1. Get the first distinct value of the first key part (f1 = 1).
2. Construct【kənˈstrʌkt , ˈkɑːnstrʌkt 建筑;建造;修建;(按照数学规则)编制,绘制;组成;创建;】 the range based on the first and second key parts (f1 = 1 AND f2 > 40).
3. Perform a range scan.
4. Get the next distinct value of the first key part (f1 = 2).
5. Construct the range based on the first and second key parts (f1 = 2 AND f2 > 40).
6. Perform a range scan.
Using this strategy decreases【dɪˈkriːsɪz (使大小、数量等)减少,减小,降低;】 the number of accessed rows because MySQL skips the rows that do not qualify for each constructed range. This Skip Scan access method is applicable under the following conditions:
• Table T has at least one compound【ˈkɑːmpaʊnd , kəmˈpaʊnd 复合的;】 index with key parts of the form ([A_1, ..., A_k,] B_1, ..., B_m, C [, D_1, ..., D_n]). Key parts A and D may be empty, but B and C must be nonempty.
• The query references only one table.
• The query does not use GROUP BY or DISTINCT.
• The query references only columns in the index.
• The predicates【ˈpredɪkeɪts】 on A_1, ..., A_k must be equality predicates and they must be constants. This includes the IN() operator.
• The query must be a conjunctive【连接的;联合的,连接(着)的;契合的;合取的;】 query; that is, an AND of OR conditions: (cond1(key_part1) OR cond2(key_part1)) AND (cond1(key_part2) OR ...) AND ...
• There must be a range condition on C.
• Conditions on D columns are permitted. Conditions on D must be in conjunction with the range condition on C.
Use of Skip Scan is indicated in EXPLAIN output as follows:
• Using index for skip scan in the Extra column indicates that the loose index Skip Scan access method is used.
• If the index can be used for Skip Scan, the index should be visible in the possible_keys column.
Use of Skip Scan is indicated in optimizer trace output by a "skip scan" element of this form:- key_part1 >= 1 AND key_part2 < 2
复制代码 You may also see a "best_skip_scan_summary" element. If Skip Scan is chosen as the best range access variant, a "chosen_range_access_summary" is written. If Skip Scan is chosen as the overall best access method, a "best_access_path" element is present.
Use of Skip Scan is subject to the value of the skip_scan flag of the optimizer_switch system variable. By default, this flag is on. To disable it, set skip_scan to off.
In addition to using the optimizer_switch system variable to control optimizer use of Skip Scan session-wide, MySQL supports optimizer hints to influence the optimizer on a per-statement basis.
2.5 Range Optimization of Row Constructor Expressions
The optimizer is able to apply the range scan access method to queries of this form:- key_part1 >= 1 AND key_part2 IS NOT NULL
复制代码 Previously, for range scans to be used, it was necessary to write the query as:- col_name IN(val1, ..., valN)
- col_name = val1 OR ... OR col_name = valN
复制代码 For the optimizer to use a range scan, queries must satisfy these conditions:
• Only IN() predicates are used, not NOT IN().
• On the left side of the IN() predicate, the row constructor contains only column references.
• On the right side of the IN() predicate, row constructors contain only runtime constants, which are either literals or local column references that are bound to constants during execution.
• On the right side of the IN() predicate, there is more than one row constructor.
2.6 Limiting Memory Use for Range Optimization
To control the memory available to the range optimizer, use the range_optimizer_max_mem_size system variable:
• A value of 0 means “no limit.”
• With a value greater than 0, the optimizer tracks the memory consumed when considering the range access method. If the specified limit is about to be exceeded, the range access method is abandoned and other methods, including a full table scan, are considered instead. This could be less optimal. If this happens, the following warning occurs (where N is the current range_optimizer_max_mem_size value):- CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL, PRIMARY KEY(f1, f2));
- INSERT INTO t1 VALUES
- (1,1), (1,2), (1,3), (1,4), (1,5),
- (2,1), (2,2), (2,3), (2,4), (2,5);
- INSERT INTO t1 SELECT f1, f2 + 5 FROM t1;
- INSERT INTO t1 SELECT f1, f2 + 10 FROM t1;
- INSERT INTO t1 SELECT f1, f2 + 20 FROM t1;
- INSERT INTO t1 SELECT f1, f2 + 40 FROM t1;
- ANALYZE TABLE t1;
- EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40;
复制代码 • For UPDATE and DELETE statements, if the optimizer falls back to a full table scan and the sql_safe_updates system variable is enabled, an error occurs rather than a warning because, in effect, no key is used to determine which rows to modify.
For individual queries that exceed the available range optimization memory and for which the optimizer falls back to less optimal plans, increasing the range_optimizer_max_mem_size value may improve performance.
To estimate the amount of memory needed to process a range expression, use these guidelines:
• For a simple query such as the following, where there is one candidate key for the range access method, each predicate combined with OR uses approximately 230 bytes:- "skip_scan_range": {
- "type": "skip_scan",
- "index": index_used_for_skip_scan,
- "key_parts_used_for_access": [key_parts_used_for_access],
- "range": [range]
- }
复制代码 • Similarly for a query such as the following, each predicate combined with AND uses approximately 125 bytes:- SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));
复制代码 • For a query with IN() predicates:- SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' )
- OR ( col_1 = 'c' AND col_2 = 'd' );
复制代码 Each literal value in an IN() list counts as a predicate combined with OR. If there are two IN() lists, the number of predicates combined with OR is the product of the number of literal values in each list. Thus, the number of predicates combined with OR in the preceding case is M × N.
来源:https://www.cnblogs.com/xuliuzai/p/18190242
免责声明:由于采集信息均来自互联网,如果侵犯了您的权益,请联系我们【E-Mail:cb@itdo.tech】 我们会及时删除侵权内容,谢谢合作! |
|