"Using index" doesn't necessarily mean fast lookup in MySQL

While dealing with slow MySQL queries my go-to approach is to quickly run an an EXPLAIN on the query and check if the indices I expect to be used are actually being used. A quick check is to look at the “Extra” column of the explain output to see if it contains the term “Using index”

However, I found out that “Using index” does not mean what I thought it meant.

I picked a simple N:M relationship table of orders and items as an example.

CREATE TABLE order_items (  
  order_id int(11) NOT NULL,  
  item_id int(11) NOT NULL,  
  PRIMARY KEY (order_id, item_id)  
) ENGINE=InnoDB;

It must support two lookup operations:

Get items for a given order
Get orders that an item appears on

Let’s focus on the latter:

EXPLAIN SELECT order_id FROM order_items WHERE item_id = 1;

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	order_items	index	NULL	PRIMARY	8	NULL	1021772	Using where; Using index

Using_index just means that the data is coming from the index, but it doesn’t mean that the lookup is efficient. The more important bit is possible_keys which is NULL in this case.

Let’s test it with a simple benchmark on MariaDB 10.2.32:

We fill the table with data:

INSERT IGNORE INTO order_items (
    SELECT FLOOR(RAND() * 100000) AS order_id, FLOOR(RAND() * 1000000) as item_id
    FROM seq_1_to_1000000
    ORDER BY order_id, item_id
);

This is quck way to populate the table, although the query is quite peculiar:

seq_1_to_1000000 is a virtual table from the SEQUENCE engine that generates a sequence of integers from 1 to 10M
The random data is ordered before inserting so it matches the primary index order – this makes
INSERT IGNORE is used, because the combinations of random numbers are not guaranteed to be unique

The results is quite realistic – we get around 10 items per order on average:

SELECT MAX(cnt), MIN(cnt), AVG(cnt) FROM (
    SELECT order_id, COUNT(item_id) as cnt FROM order_items GROUP BY order_id
) t;

MAX(cnt)	MIN(cnt)	AVG(cnt)
26	1	10.0005

And now for the actual benchmark. To avoid venturing out of SQL and to present a more realistic usage, the lookup is done by JOIN-in from a table of 100 random items:

SELECT lookup.item_id AS lookup_id, order_items.order_id  
FROM (
	SELECT FLOOR(RAND() * 1000000) AS item_id 
	FROM seq_1_to_100
) lookup  
LEFT JOIN order_items ON order_items.item_id = lookup.item_id;

Runtime after 4 repeated runs: 14s 1 ms
That’s an awful lot for 100 items!

Explain:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	<derived2>	ALL	NULL	NULL	NULL	NULL	100
1	PRIMARY	order_items	index	NULL	PRIMARY	8	NULL	1021772	Using where; Using index; Using join buffer (flat, BNL join)
2	DERIVED	seq_1_to_100	index	NULL	PRIMARY	8	NULL	100	Using index

Using index is still there, but possible_keys is NULL (as expected).

Let’s create another index just on item_id:

CREATE INDEX order_items_item_id_index ON order_items (item_id);

the explain output now shows that it’s being used:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	<derived2>	ALL	NULL	NULL	NULL	NULL	100
1	PRIMARY	order_items	ref	order_items_item_id_index	order_items_item_id_index	4	lookup.item_id	1	Using where; Using index
2	DERIVED	seq_1_to_100	index	NULL	PRIMARY	8	NULL	100	Using index

The runtime drops to 51 ms.

It’s clear that seeing “using index” in the explain output does not guarantee good performance.

Using index in the explain output just means that the data is fetched from the index structure. It does not mean that the lookup itself is efficient. I’d advise looking at “possible keys”, “key”, “rows” and checking whether the lookup columns are a prefix of any of the available indices. Where conditions that don’t include the first column(s) of an index will not be efficient.

“Using index” doesn’t necessarily mean fast lookup in MySQL

Add comment

Cancel reply

Pages