20120428

MySQL's four index types: key, unique, primary, fulltext

MySQL supports four types of indexes:


A nonunique index is one in which any key value may occur multiple times. This type of index is defined with the keyword INDEX or KEY.

A UNIQUE index is unique-valued; that is, every key value is required to be different from all others. (The exception is that NULL values may occur multiple times.)

A PRIMARY KEY is also a unique-valued index. It is similar to a UNIQUE index, but has additional restrictions:

  • A table may have multiple UNIQUE indexes, but at most one PRIMARY KEY.
  • A UNIQUE index can contain NULL values, whereas a PRIMARY KEY cannot.

A FULLTEXT index is specially designed for text searching.
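
As a rough illustration of the four index types, the sketch below declares one of each on a single table. The table name `article`, its columns, and the index names are hypothetical and chosen only for this example. It also assumes a storage engine that supports FULLTEXT indexes (MyISAM, or InnoDB from MySQL 5.6 onward).

```sql
-- Hypothetical table: all names here are illustrative only.
CREATE TABLE article (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    slug   VARCHAR(100) NOT NULL,
    author VARCHAR(50)  NOT NULL,
    body   TEXT         NOT NULL,
    PRIMARY KEY (id),                 -- unique, NOT NULL, at most one per table
    UNIQUE INDEX uq_slug (slug),      -- unique-valued index
    INDEX idx_author (author),        -- nonunique index (KEY is a synonym)
    FULLTEXT INDEX ft_body (body)     -- text-searching index
);
```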

To define indexes when you're initially creating a table, use CREATE TABLE. To add indexes to an already existing table, use ALTER TABLE or CREATE INDEX. To drop indexes, use ALTER TABLE or DROP INDEX.

ALTER TABLE can add or drop several indexes in the same statement, which is faster than processing each one separately. CREATE INDEX and DROP INDEX allow only one index to be added or dropped at a time.
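
A minimal sketch of the difference, assuming a hypothetical `member` table that starts out without indexes. ALTER TABLE handles several indexes in one statement, while CREATE INDEX and DROP INDEX each handle one. Note that CREATE INDEX cannot create a PRIMARY KEY, so that one has to go through ALTER TABLE (or CREATE TABLE).

```sql
-- Hypothetical table used only for this example.
CREATE TABLE member (
    id    INT UNSIGNED NOT NULL,
    email VARCHAR(100) NOT NULL,
    name  VARCHAR(50)  NOT NULL,
    city  VARCHAR(50)  NOT NULL
);

-- One ALTER TABLE statement adds three indexes at once.
ALTER TABLE member
    ADD PRIMARY KEY (id),
    ADD UNIQUE INDEX uq_email (email),
    ADD INDEX idx_name (name);

-- CREATE INDEX adds exactly one index per statement.
CREATE INDEX idx_city ON member (city);

-- One ALTER TABLE statement drops two indexes at once.
ALTER TABLE member
    DROP INDEX idx_name,
    DROP INDEX idx_city;

-- DROP INDEX removes exactly one index per statement.
DROP INDEX uq_email ON member;
```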

Things to note when designing indexes


Declare an indexed column NOT NULL if possible. Although NULL values can be indexed, NULL is a special value that requires additional decisions when performing comparisons on key values. An index without NULL can be processed more simply and thus faster.

Avoid overindexing; don't index a column just because you can. If you never refer to a column in comparisons (such as in WHERE, ORDER BY, or GROUP BY clauses), there's no need to index it. Another reason to avoid unnecessary indexing is that every index you create slows down table updates. If you insert or delete a row, an entry must be added to or removed from each of the table's indexes. If you update a row, any change to indexed columns requires the appropriate indexes to be updated as well.

The MySQL optimizer estimates how many records an index lookup would return. If it appears that an index will return a large percentage of the records in the table, scanning the whole table is just as fast as incurring the overhead required to process the index, so the optimizer may not use the index at all. As a consequence, an index on a column that has very few distinct values is unlikely to do much good. Suppose that a column is declared as ENUM('Y','N') and the values are roughly evenly distributed such that a search for either value returns about half of the records. In this case, an index on the column is unlikely to result in faster queries.
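
One way to check this on your own data is to look at the index statistics and at the plan the optimizer picks. The sketch below uses a hypothetical `subscriber` table with a two-valued column. The plan EXPLAIN reports depends on the actual data, the storage engine, and the MySQL version, so no particular output is assumed here.

```sql
-- Hypothetical table with a low-cardinality indexed column.
CREATE TABLE subscriber (
    id     INT UNSIGNED  NOT NULL PRIMARY KEY,
    active ENUM('Y','N') NOT NULL,
    INDEX idx_active (active)
);

-- Refresh the statistics the optimizer relies on.
ANALYZE TABLE subscriber;

-- The Cardinality column estimates the number of distinct values per index.
SHOW INDEX FROM subscriber;

-- Shows whether the optimizer chooses idx_active or a full table scan.
EXPLAIN SELECT * FROM subscriber WHERE active = 'Y';
```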

Choose unique and nonunique indexes appropriately. The choice might be influenced by the type of a column. If the column is declared as an ENUM, there is a fixed number of distinct column values that can be stored in it. This number is equal to the number of enumeration elements, plus one for the '' (empty string) element that is used when you attempt to store an illegal value. Should you choose to index an ENUM column, you likely should create a nonunique index. A PRIMARY KEY allows only as many rows as the number of distinct enumeration values. A UNIQUE index enforces a similar restriction, except that unless the column is declared NOT NULL, the index allows NULL values.
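
To make the row limit concrete, here is a small sketch with a hypothetical `flag` table. With a UNIQUE index on a NOT NULL ENUM('Y','N') column, each enumeration value can be stored only once. The final INSERT is left commented out because MySQL would reject it with a duplicate-key error.

```sql
-- Hypothetical table: a UNIQUE index on a two-element ENUM column.
CREATE TABLE flag (
    answer ENUM('Y','N') NOT NULL,
    UNIQUE INDEX uq_answer (answer)
);

INSERT INTO flag (answer) VALUES ('Y');
INSERT INTO flag (answer) VALUES ('N');

-- A second 'Y' would violate uq_answer and fail:
-- INSERT INTO flag (answer) VALUES ('Y');
```

A nonunique index (INDEX instead of UNIQUE INDEX) removes that limit.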

Index a column prefix rather than the entire column. MySQL caches index blocks in memory so that, whenever possible, it avoids reading them from disk repeatedly. Shortening the length of indexed values can improve performance by reducing the amount of disk I/O needed to read the index and by increasing the number of key values that fit into the key cache. This technique is discussed further in section 13.1.3.1, "Indexing Column Prefixes."
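
A prefix index is declared by putting a length after the column name. The sketch below assumes a hypothetical `customer` table. The length 10 is an arbitrary placeholder, not a recommendation.

```sql
-- Hypothetical table: index only the first 10 characters of name.
CREATE TABLE customer (
    id   INT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    INDEX idx_name_prefix (name(10))
);

-- The same syntax works when adding the index to an existing table.
ALTER TABLE customer
    DROP INDEX idx_name_prefix,
    ADD INDEX idx_name_prefix (name(20));
```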

Avoid creating multiple indexes that overlap (have the same initial columns). This is wasteful because MySQL can use a multiple-column index even when a query uses just the initial columns for lookups.
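
For example, in the hypothetical `person` table sketched below, a separate index on last_name alone would add maintenance cost without helping, because the two-column index already serves lookups on its leading column. Whether the optimizer actually uses the index for a given query still depends on the data.

```sql
-- Hypothetical table with a single two-column index.
CREATE TABLE person (
    id         INT UNSIGNED NOT NULL PRIMARY KEY,
    last_name  VARCHAR(50)  NOT NULL,
    first_name VARCHAR(50)  NOT NULL,
    INDEX idx_last_first (last_name, first_name)
    -- A second INDEX (last_name) here would overlap idx_last_first.
);

-- Both queries can use idx_last_first.
EXPLAIN SELECT * FROM person WHERE last_name = 'Smith';
EXPLAIN SELECT * FROM person WHERE last_name = 'Smith' AND first_name = 'Ann';
```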

Choosing a prefix length


It might require some testing to find the optimal balance between long index values that provide good uniqueness versus shorter values that compare more quickly but have more duplicates. To determine the number of records in the table, the number of distinct values in the column, and the number of duplicates, use this query:

SELECT
COUNT(*) AS 'Total Rows',
COUNT(DISTINCT name) AS 'Distinct Values',
COUNT(*) - COUNT(DISTINCT name) AS 'Duplicate Values'
FROM t;

That gives you an estimate of the amount of uniqueness in the name values. Then run a similar query on the prefix values:

SELECT
COUNT(DISTINCT LEFT(name,n)) AS 'Distinct Prefix Values',
COUNT(*) - COUNT(DISTINCT LEFT(name,n)) AS 'Duplicate Prefix Values'
FROM t;

That tells you how the uniqueness characteristics change when you use an n-character prefix of the name values (LEFT() counts characters, not bytes). Run the query with different values of n to determine an acceptable prefix length.
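
Rather than rerunning the query once per value of n, several candidate lengths can be compared in a single pass. This sketch assumes the same table t and column name as the queries above. The lengths 4, 8, and 12 are arbitrary starting points.

```sql
-- Compare distinct counts for the full column and three prefix lengths.
SELECT
    COUNT(*)                       AS total_rows,
    COUNT(DISTINCT name)           AS distinct_full,
    COUNT(DISTINCT LEFT(name, 4))  AS distinct_prefix_4,
    COUNT(DISTINCT LEFT(name, 8))  AS distinct_prefix_8,
    COUNT(DISTINCT LEFT(name, 12)) AS distinct_prefix_12
FROM t;
```

The shortest prefix whose distinct count comes close to distinct_full is usually a reasonable candidate.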

Note that when an index on a full column is a PRIMARY KEY or UNIQUE index, you might have to change the index to be nonunique if you decide to index prefix values instead. If you index partial column values and require the index to be unique, that means the prefix values must be unique, too.
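
The sketch below shows why, using a hypothetical `product` table. A UNIQUE index on a four-character prefix enforces uniqueness on those four characters only, so two values that differ later in the string still collide. The second INSERT is commented out because MySQL would reject it with a duplicate-key error.

```sql
-- Hypothetical table: UNIQUE index on the first 4 characters of code.
CREATE TABLE product (
    code VARCHAR(40) NOT NULL,
    UNIQUE INDEX uq_code_prefix (code(4))
);

INSERT INTO product (code) VALUES ('ABCD-001');

-- Same 4-character prefix 'ABCD', so this would violate uq_code_prefix:
-- INSERT INTO product (code) VALUES ('ABCD-002');
```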

Note that a leftmost prefix of an index and an index on a column prefix are two different things. A leftmost prefix of an index consists of leading columns in a multiple-column index. An index on a column prefix indexes only the leading part of each value in the column: leading characters for nonbinary string types, leading bytes for binary string types.
