Too Many Afterthoughts The greatest ideas are often far too late

“Using index” doesn’t necessarily mean fast lookup in MySQL

While dealing with slow MySQL queries my go-to approach is to quickly run an an EXPLAIN on the query and check if the indices I expect to be used are actually being used. A quick check is to look at the “Extra” column of the explain output to see if it contains the term “Using index”

However, I found out that “Using index” does not mean what I thought it meant.

I picked a simple N:M relationship table of orders and items as an example.

CREATE TABLE order_items (  
  order_id int(11) NOT NULL,  
  item_id int(11) NOT NULL,  
  PRIMARY KEY (order_id, item_id)  
) ENGINE=InnoDB;

It must support two lookup operations:

  1. Get items for a given order
  2. Get orders that an item appears on

Let’s focus on the latter:

EXPLAIN SELECT order_id FROM order_items WHERE item_id = 1;
idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1SIMPLEorder_itemsindexNULLPRIMARY8NULL1021772Using where; Using index

Using_index just means that the data is coming from the index, but it doesn’t mean that the lookup is efficient. The more important bit is possible_keys which is NULL in this case.

Let’s test it with a simple benchmark on MariaDB 10.2.32:

We fill the table with data:

INSERT IGNORE INTO order_items (
    SELECT FLOOR(RAND() * 100000) AS order_id, FLOOR(RAND() * 1000000) as item_id
    FROM seq_1_to_1000000
    ORDER BY order_id, item_id
);

This is quck way to populate the table, although the query is quite peculiar:

  1. seq_1_to_1000000 is a virtual table from the SEQUENCE engine that generates a sequence of integers from 1 to 10M
  2. The random data is ordered before inserting so it matches the primary index order – this makes
  3. INSERT IGNORE is used, because the combinations of random numbers are not guaranteed to be unique

The results is quite realistic – we get around 10 items per order on average:

SELECT MAX(cnt), MIN(cnt), AVG(cnt) FROM (
    SELECT order_id, COUNT(item_id) as cnt FROM order_items GROUP BY order_id
) t;
MAX(cnt)MIN(cnt)AVG(cnt)
26110.0005

And now for the actual benchmark. To avoid venturing out of SQL and to present a more realistic usage, the lookup is done by JOIN-in from a table of 100 random items:

SELECT lookup.item_id AS lookup_id, order_items.order_id  
FROM (
	SELECT FLOOR(RAND() * 1000000) AS item_id 
	FROM seq_1_to_100
) lookup  
LEFT JOIN order_items ON order_items.item_id = lookup.item_id;

Runtime after 4 repeated runs: 14s 1 ms
That’s an awful lot for 100 items!

Explain:

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1PRIMARY<derived2>ALLNULLNULLNULLNULL100
1PRIMARYorder_itemsindexNULLPRIMARY8NULL1021772Using where;
Using index;
Using join buffer
(flat, BNL join)
2DERIVEDseq_1_to_100indexNULLPRIMARY8NULL100Using index

Using index is still there, but possible_keys is NULL (as expected).

Let’s create another index just on item_id:

CREATE INDEX order_items_item_id_index ON order_items (item_id);

the explain output now shows that it’s being used:

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1PRIMARY<derived2>ALLNULLNULLNULLNULL100
1PRIMARYorder_itemsreforder_items_item_id_indexorder_items_item_id_index4lookup.item_id1Using where; Using index
2DERIVEDseq_1_to_100indexNULLPRIMARY8NULL100Using index

The runtime drops to 51 ms.

It’s clear that seeing “using index” in the explain output does not guarantee good performance.

Using index in the explain output just means that the data is fetched from the index structure. It does not mean that the lookup itself is efficient. I’d advise looking at “possible keys”, “key”, “rows” and checking whether the lookup columns are a prefix of any of the available indices. Where conditions that don’t include the first column(s) of an index will not be efficient.

Add comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Too Many Afterthoughts The greatest ideas are often far too late

Pages