While dealing with slow MySQL queries, my go-to approach is to quickly run an
EXPLAIN on the query and check whether the indices I expect to be used are actually being used. A quick check is to look at the “Extra” column of the EXPLAIN output to see if it contains the term “Using index”.
However, I found out that “Using index” does not mean what I thought it meant.
I picked a simple N:M relationship table of orders and items as an example.
```sql
CREATE TABLE order_items (
  order_id int(11) NOT NULL,
  item_id int(11) NOT NULL,
  PRIMARY KEY (order_id, item_id)
) ENGINE=InnoDB;
```
It must support two lookup operations:
- Get items for a given order
- Get orders that an item appears on
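In SQL, the two lookups might look like this (a sketch against the table above; the literal `1` stands in for any concrete id):

```sql
-- 1) items for a given order
SELECT item_id FROM order_items WHERE order_id = 1;

-- 2) orders that a given item appears on
SELECT order_id FROM order_items WHERE item_id = 1;
```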
Let’s focus on the latter:
```sql
EXPLAIN SELECT order_id FROM order_items WHERE item_id = 1;
```
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | SIMPLE | order_items | index | NULL | PRIMARY | 8 | NULL | 1021772 | Using where; Using index |
“Using index” just means that the data is coming from the index, but it doesn’t mean that the lookup is efficient. The more important bit is possible_keys, which is NULL in this case.
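For contrast, checking the other lookup direction should show a non-NULL possible_keys, since order_id is the leftmost column of the primary key (a sketch; the exact row estimate will vary):

```sql
EXPLAIN SELECT item_id FROM order_items WHERE order_id = 1;
-- possible_keys and key should both show PRIMARY here,
-- and type should be ref rather than a full index scan
```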
Let’s test it with a simple benchmark on MariaDB 10.2.32:
We fill the table with data:
```sql
INSERT IGNORE INTO order_items (
  SELECT
    FLOOR(RAND() * 100000) AS order_id,
    FLOOR(RAND() * 1000000) AS item_id
  FROM seq_1_to_1000000
  ORDER BY order_id, item_id
);
```
This is a quick way to populate the table, although the query is quite peculiar:
- seq_1_to_1000000 is a virtual table from the SEQUENCE engine that generates a sequence of integers from 1 to 1,000,000
- The random data is ordered before inserting so it matches the primary index order – this makes the insert faster, since InnoDB stores rows in primary key order
- INSERT IGNORE is used because the combinations of random numbers are not guaranteed to be unique
The result is quite realistic – we get around 10 items per order on average:
```sql
SELECT MAX(cnt), MIN(cnt), AVG(cnt)
FROM (
  SELECT order_id, COUNT(item_id) AS cnt
  FROM order_items
  GROUP BY order_id
) t;
```
And now for the actual benchmark. To avoid venturing out of SQL and to present more realistic usage, the lookup is done by JOIN-ing against a table of 100 random items:
```sql
SELECT lookup.item_id AS lookup_id, order_items.order_id
FROM (
  SELECT FLOOR(RAND() * 1000000) AS item_id
  FROM seq_1_to_100
) lookup
LEFT JOIN order_items ON order_items.item_id = lookup.item_id;
```
Runtime after 4 repeated runs: 14 s 1 ms
That’s an awful lot for 100 items!
The Extra column of the EXPLAIN output for this query additionally shows “Using join buffer (flat, BNL join)”. “Using index” is still there, but possible_keys is NULL (as expected).
Let’s create another index, just on item_id:
```sql
CREATE INDEX order_items_item_id_index ON order_items (item_id);
```
The EXPLAIN output now shows that it’s being used:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | PRIMARY | order_items | ref | order_items_item_id_index | order_items_item_id_index | 4 | lookup.item_id | 1 | Using where; Using index |
The runtime drops to 51 ms.
It’s clear that seeing “Using index” in the EXPLAIN output does not guarantee good performance.
“Using index” just means that the data is fetched from the index structure; it does not mean that the lookup itself is efficient. I’d advise looking at possible_keys, key, and rows, and checking whether the lookup columns are a prefix of any of the available indices. WHERE conditions that don’t include the first column(s) of an index will not be able to use that index efficiently.
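The prefix rule can be checked directly: with a composite index, only conditions on a leftmost prefix of its columns can use it. A sketch using the tables defined above:

```sql
-- can use the primary key (order_id, item_id): order_id is a prefix
EXPLAIN SELECT item_id FROM order_items WHERE order_id = 1;

-- before the extra index existed, this could not use the primary key:
-- item_id alone is not a leftmost prefix of (order_id, item_id)
EXPLAIN SELECT order_id FROM order_items WHERE item_id = 1;
```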