9 Comparison of B-Tree and Hash Indexes
Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine, which lets you choose B-tree or hash indexes.
9.1 B-Tree Index Characteristics
A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators.
12 Invisible Indexes
The IS_VISIBLE column of the INFORMATION_SCHEMA.STATISTICS table shows whether each index on a table is visible or invisible:
- SELECT INDEX_NAME, IS_VISIBLE
- FROM INFORMATION_SCHEMA.STATISTICS
- WHERE TABLE_SCHEMA = 'db1' AND TABLE_NAME = 't1';
- +------------+------------+
- | INDEX_NAME | IS_VISIBLE |
- +------------+------------+
- | i_idx      | YES        |
- | j_idx      | NO         |
- | k_idx      | NO         |
- +------------+------------+
Invisible indexes make it possible to test the effect of removing an index on query performance, without making a destructive change that must be undone should the index turn out to be required. Dropping and re-adding an index can be expensive for a large table, whereas making it invisible and visible are fast, in-place operations.
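As a minimal sketch of toggling visibility, assume the table t1 and the index i_idx that appear in the STATISTICS output above. The statements only change whether the optimizer may use the index; they do not rebuild it.

```sql
-- Hide an existing index from the optimizer without dropping it.
ALTER TABLE t1 ALTER INDEX i_idx INVISIBLE;

-- Make it usable again if queries turn out to need it.
ALTER TABLE t1 ALTER INDEX i_idx VISIBLE;
```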
If an index made invisible actually is needed or used by the optimizer, there are several ways to notice the effect of its absence on queries for the table:
• Errors occur for queries that include index hints that refer to the invisible index.
• Performance Schema data shows an increase in workload for affected queries.
• Queries have different EXPLAIN execution plans.
• Queries appear in the slow query log that did not appear there previously.
The use_invisible_indexes flag of the optimizer_switch system variable controls whether the optimizer uses invisible indexes for query execution plan construction. If the flag is off (the default), the optimizer ignores invisible indexes (the same behavior as prior to the introduction of this flag). If the flag is on, invisible indexes remain invisible but the optimizer takes them into account for execution plan construction.
Using the SET_VAR optimizer hint to update the value of optimizer_switch temporarily, you can enable invisible indexes for the duration of a single query only, like this:
- SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
- SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';
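For the SET_VAR usage just described, a sketch might look like the following. It assumes that table t1 has columns i and j and an invisible index j_idx on j (the index name is taken from the STATISTICS output earlier; the column names and the predicate are illustrative assumptions).

```sql
-- Let the optimizer consider invisible indexes for this statement only.
-- The session value of optimizer_switch is left unchanged.
EXPLAIN SELECT /*+ SET_VAR(optimizer_switch = 'use_invisible_indexes=on') */
    i, j
FROM t1
WHERE j >= 50;
```

Running the same EXPLAIN without the hint gives a plan to compare against.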
Index visibility does not affect index maintenance. For example, an index continues to be updated per changes to table rows, and a unique index prevents insertion of duplicates into a column, regardless of whether the index is visible or invisible.
A table with no explicit primary key may still have an effective implicit primary key if it has any UNIQUE indexes on NOT NULL columns. In this case, the first such index places the same constraint on table rows as an explicit primary key and that index cannot be made invisible. Consider the following table definition:
- SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
- SELECT * FROM tbl_name WHERE key_col LIKE other_col;
The definition includes no explicit primary key, but the index on NOT NULL column j places the same constraint on rows as a primary key and cannot be made invisible:
- ... WHERE index_part1=1 AND index_part2=2 AND other_column=3
- /* index = 1 OR index = 2 */
- ... WHERE index=1 OR A=10 AND index=2
- /* optimized like "index_part1='hello'" */
- ... WHERE index_part1='hello' AND index_part3=5
- /* Can use index on index1 but not on index2 or index3 */
- ... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;
Now suppose that an explicit primary key is added to the table:
- /* index_part1 is not used */
- ... WHERE index_part2=1 AND index_part3=2
- /* Index is not used in both parts of the WHERE clause */
- ... WHERE index=1 OR A=10
- /* No index spans all rows */
- ... WHERE index_part1=1 OR index_part2=10
The explicit primary key cannot be made invisible. In addition, the unique index on j no longer acts as an implicit primary key and as a result can be made invisible:
- CREATE TABLE t1 (
- i1 INT NOT NULL DEFAULT 0,
- i2 INT NOT NULL DEFAULT 0,
- d DATE DEFAULT NULL,
- PRIMARY KEY (i1, i2),
- INDEX k_d (d)
- ) ENGINE = InnoDB;
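The implicit-primary-key sequence described in the preceding paragraphs can be sketched as follows. The table name t2 and the column and index names are assumptions chosen for illustration, and the comments restate the behavior described in the text rather than captured server output.

```sql
-- No explicit primary key: the UNIQUE index on NOT NULL column j
-- acts as an implicit primary key.
CREATE TABLE t2 (
  i INT NOT NULL,
  j INT NOT NULL,
  UNIQUE j_idx (j)
) ENGINE = InnoDB;

-- Expected to be rejected: j_idx currently serves as the primary key.
ALTER TABLE t2 ALTER INDEX j_idx INVISIBLE;

-- Add an explicit primary key.
ALTER TABLE t2 ADD PRIMARY KEY (i);

-- Expected to be rejected: an explicit primary key cannot be invisible.
ALTER TABLE t2 ALTER INDEX PRIMARY INVISIBLE;

-- Expected to succeed: j_idx is now an ordinary unique index.
ALTER TABLE t2 ALTER INDEX j_idx INVISIBLE;
```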
13 Descending Indexes
MySQL supports descending indexes: DESC in an index definition is no longer ignored but causes storage of key values in descending order. Previously, indexes could be scanned in reverse order but at a performance penalty. A descending index can be scanned in forward order, which is more efficient. Descending indexes also make it possible for the optimizer to use multiple-column indexes when the most efficient scan order mixes ascending order for some columns and descending order for others.
Consider the following table definition, which contains two columns and four two-column index definitions for the various combinations of ascending and descending indexes on the columns:
- INSERT INTO t1 VALUES
- (1, 1, '1998-01-01'), (1, 2, '1999-01-01'),
- (1, 3, '2000-01-01'), (1, 4, '2001-01-01'),
- (1, 5, '2002-01-01'), (2, 1, '1998-01-01'),
- (2, 2, '1999-01-01'), (2, 3, '2000-01-01'),
- (2, 4, '2001-01-01'), (2, 5, '2002-01-01'),
- (3, 1, '1998-01-01'), (3, 2, '1999-01-01'),
- (3, 3, '2000-01-01'), (3, 4, '2001-01-01'),
- (3, 5, '2002-01-01'), (4, 1, '1998-01-01'),
- (4, 2, '1999-01-01'), (4, 3, '2000-01-01'),
- (4, 4, '2001-01-01'), (4, 5, '2002-01-01'),
- (5, 1, '1998-01-01'), (5, 2, '1999-01-01'),
- (5, 3, '2000-01-01'), (5, 4, '2001-01-01'),
- (5, 5, '2002-01-01');
The table definition results in four distinct indexes. The optimizer can perform a forward index scan for each of the ORDER BY clauses and need not use a filesort operation:
- EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
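A two-column table with four index definitions, as described above, could be sketched like this. The table name t, the column names c1 and c2, and the index names idx1 through idx4 are assumptions made for illustration. Each ORDER BY clause below has one index whose stored order matches it.

```sql
CREATE TABLE t (
  c1 INT,
  c2 INT,
  INDEX idx1 (c1 ASC,  c2 ASC),
  INDEX idx2 (c1 ASC,  c2 DESC),
  INDEX idx3 (c1 DESC, c2 ASC),
  INDEX idx4 (c1 DESC, c2 DESC)
);

SELECT c1, c2 FROM t ORDER BY c1 ASC,  c2 ASC;   -- order matches idx1
SELECT c1, c2 FROM t ORDER BY c1 ASC,  c2 DESC;  -- order matches idx2
SELECT c1, c2 FROM t ORDER BY c1 DESC, c2 ASC;   -- order matches idx3
SELECT c1, c2 FROM t ORDER BY c1 DESC, c2 DESC;  -- order matches idx4
```

Prefixing any of the SELECT statements with EXPLAIN shows which index the optimizer actually chooses for a given data set.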
Use of descending indexes is subject to these conditions:
• Descending indexes are supported only for the InnoDB storage engine, with these limitations:
- Change buffering is not supported for a secondary index if the index contains a descending index key column or if the primary key includes a descending index column.
- The InnoDB SQL parser does not use descending indexes. For InnoDB full-text search, this means that the index required on the FTS_DOC_ID column of the indexed table cannot be defined as a descending index.
• Descending indexes are supported for all data types for which ascending indexes are available.
• Descending indexes are supported for ordinary (nongenerated) and generated columns (both VIRTUAL and STORED).
• DISTINCT can use any index containing matching columns, including descending key parts.
• Indexes that have descending key parts are not used for MIN()/MAX() optimization of queries that invoke aggregate functions but do not have a GROUP BY clause.
• Descending indexes are supported for BTREE but not HASH indexes. Descending indexes are not supported for FULLTEXT or SPATIAL indexes.
Explicitly specified ASC and DESC designators for HASH, FULLTEXT, and SPATIAL indexes result in an error.
You can see in the Extra column of the output of EXPLAIN that the optimizer is able to use a descending index, as shown here:
- mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01'\G
- *************************** 1. row ***************************
- id: 1
- select_type: SIMPLE
- table: t1
- type: ref
- possible_keys: PRIMARY,k_d
- key: k_d
- key_len: 4
- ref: const
- rows: 5
- Extra: Using where; Using index
In EXPLAIN FORMAT=TREE output, use of a descending index is indicated by the addition of (reverse) following the name of the index, like this:
- mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01'\G
- *************************** 1. row ***************************
- id: 1
- select_type: SIMPLE
- table: t1
- type: ref
- possible_keys: PRIMARY,k_d
- key: k_d
- key_len: 8
- ref: const,const
- rows: 1
- Extra: Using index
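To check descending-index usage on your own data, a sketch along these lines can be used. The table name t1_desc and the index name a_desc_b_asc are hypothetical, and no output is shown because it depends on the server version and the data.

```sql
CREATE TABLE t1_desc (
  a INT,
  b INT,
  INDEX a_desc_b_asc (a DESC, b ASC)
);

-- Traditional format: inspect the key and Extra columns.
EXPLAIN SELECT * FROM t1_desc WHERE a = 1 ORDER BY a DESC, b ASC;

-- Tree format: inspect the index name shown in the plan.
EXPLAIN FORMAT=TREE SELECT * FROM t1_desc WHERE a = 1 ORDER BY a DESC, b ASC;
```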
14 Indexed Lookups from TIMESTAMP Columns
Temporal values are stored in TIMESTAMP columns as UTC values, and values inserted into and retrieved from TIMESTAMP columns are converted between the session time zone and UTC. (This is the same type of conversion performed by the CONVERT_TZ() function. If the session time zone is UTC, there is effectively no time zone conversion.)
Due to conventions for local time zone changes such as Daylight Saving Time (DST), conversions between UTC and non-UTC time zones are not one-to-one in both directions. UTC values that are distinct may not be distinct in another time zone. The following example shows distinct UTC values that become identical in a non-UTC time zone:
- FLUSH TABLE t1;
- FLUSH STATUS;
- SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
- SHOW STATUS LIKE 'handler_read%';
Note: To use named time zones such as 'MET' or 'Europe/Amsterdam', the time zone tables must be properly set up.
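The time zone example can be sketched as follows, assuming a table named tstable with a single TIMESTAMP column ts (the names this section uses later). The second inserted value is an assumption, chosen to fall one hour after the first, across the end of DST in the 'MET' zone on 2018-10-28.

```sql
CREATE TABLE tstable (ts TIMESTAMP);

SET time_zone = 'UTC';            -- insert the values as UTC
INSERT INTO tstable VALUES
  ('2018-10-28 00:30:00'),
  ('2018-10-28 01:30:00');
SELECT ts FROM tstable;           -- two distinct UTC values

SET time_zone = 'MET';            -- retrieve in a zone with a DST shift
SELECT ts FROM tstable;           -- both rows display as 2018-10-28 02:30:00
```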
You can see that the two distinct UTC values are the same when converted to the 'MET' time zone. This phenomenon can lead to different results for a given TIMESTAMP column query, depending on whether the optimizer uses an index to execute the query.
Suppose that a query selects values from the table shown earlier using a WHERE clause to search the ts column for a single specific value such as a user-provided timestamp literal:
- +-----------------------+-------+
- | Variable_name | Value |
- +-----------------------+-------+
- | Handler_read_first | 0 |
- | Handler_read_key | 1 |
- | Handler_read_last | 0 |
- | Handler_read_next | 5 |
- | Handler_read_prev | 0 |
- | Handler_read_rnd | 0 |
- | Handler_read_rnd_next | 0 |
- +-----------------------+-------+
Suppose further that the query executes under these conditions:
• The session time zone is not UTC and has a DST shift. For example:
- +-----------------------+-------+
- | Variable_name | Value |
- +-----------------------+-------+
- | Handler_read_first | 0 |
- | Handler_read_key | 1 |
- | Handler_read_last | 0 |
- | Handler_read_next | 1 |
- | Handler_read_prev | 0 |
- | Handler_read_rnd | 0 |
- | Handler_read_rnd_next | 0 |
- +-----------------------+-------+
• Unique UTC values stored in the TIMESTAMP column are not unique in the session time zone due to DST shifts. (The example shown earlier illustrates how this can occur.)
• The query specifies a search value that is within the hour of entry into DST in the session time zone.
Under those conditions, the comparison in the WHERE clause occurs in different ways for nonindexed and indexed lookups and leads to different results:
• If there is no index or the optimizer cannot use it, comparisons occur in the session time zone. The optimizer performs a table scan in which it retrieves each ts column value, converts it from UTC to the session time zone, and compares it to the search value (also interpreted in the session time zone):
- SET optimizer_switch = 'use_index_extensions=off';
Because the stored ts values are converted to the session time zone, it is possible for the query to return two timestamp values that are distinct as UTC values but equal in the session time zone: one value that occurs before the DST shift when clocks are changed, and one value that occurs after the DST shift.
• If there is a usable index, comparisons occur in UTC. The optimizer performs an index scan, first converting the search value from the session time zone to UTC, then comparing the result to the UTC index entries:
- CREATE TABLE t1 (f1 INT, gc INT AS (f1 + 1) STORED, INDEX (gc));
In this case, the (converted) search value is matched only to index entries, and because the index entries for the distinct stored UTC values are also distinct, the search value can match only one of them.
Due to different optimizer operation for nonindexed and indexed lookups, the query produces different results in each case. The result from the nonindexed lookup returns all values that match in the session time zone. The indexed lookup cannot do so:
• It is performed within the storage engine, which knows only about UTC values.
• For the two distinct session time zone values that map to the same UTC value, the indexed lookup matches only the corresponding UTC index entry and returns only a single row.
In the preceding discussion, the data set stored in tstable happens to consist of distinct UTC values. In such cases, all index-using queries of the form shown match at most one index entry.
If the index is not UNIQUE, it is possible for the table (and the index) to store multiple instances of a given UTC value. For example, the ts column might contain multiple instances of the UTC value '2018-10-28 00:30:00'. In this case, the index-using query would return each of them (converted to the MET value '2018-10-28 02:30:00' in the result set). It remains true that index-using queries match the converted search value to a single value in the UTC index entries, rather than matching multiple UTC values that convert to the search value in the session time zone.
If it is important to return all ts values that match in the session time zone, the workaround is to suppress use of the index with an IGNORE INDEX hint:
- SELECT * FROM t1 WHERE gc > 9;
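Putting the pieces together, the difference between nonindexed and indexed lookups might be observed with a sketch like the following. It assumes that tstable (ts TIMESTAMP) already holds the UTC values '2018-10-28 00:30:00' and '2018-10-28 01:30:00'; the second value is an assumption. The comments restate the behavior described above and are not captured output.

```sql
SET time_zone = 'MET';

-- No index on ts yet: the comparison happens in the session time zone,
-- so both stored values can match the search value.
SELECT ts FROM tstable WHERE ts = '2018-10-28 02:30:00';

-- Add an index. If the optimizer uses it, the search value is converted
-- to UTC first and compared with the UTC index entries.
ALTER TABLE tstable ADD INDEX (ts);
SELECT ts FROM tstable WHERE ts = '2018-10-28 02:30:00';

-- Suppress the index to return to session-time-zone matching.
-- An index created without an explicit name takes the column name, here ts.
SELECT ts FROM tstable IGNORE INDEX (ts) WHERE ts = '2018-10-28 02:30:00';
```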
The same lack of one-to-one mapping for time zone conversions in both directions occurs in other contexts as well, such as conversions performed with the FROM_UNIXTIME() and UNIX_TIMESTAMP() functions.
Source: https://www.cnblogs.com/xuliuzai/p/18205146