TransWikia.com

MariaDB uses "wrong"/irrelevant Keys on large table

Database Administrators Asked by J-F on November 19, 2021

We have migrated our database from a local hosting company to AWS.
Now when we run a particular query on AWS it finishes in about a minute (processlist says “sending data”) where on the old server the query ran for only 0.5 seconds. Server specs are about the same.

EXPLAIN on both servers says that the queries use different indexes.
Old server where the query runs fast, MariaDB 5.5.56:

EXPLAIN SELECT COUNT(*) AS layer_impressions, DATE_FORMAT(i.created, "%H") AS the_date FROM layer_impression i INNER JOIN layer l ON l.id = i.layer_id WHERE i.created BETWEEN '2018-11-15' AND '2018-11-15 23:59:59' GROUP BY the_date;
+------+-------------+-------+--------+------------------+---------+---------+--------------------+--------+--------------------------------------------------------+
| id   | select_type | table | type   | possible_keys    | key     | key_len | ref                | rows   | Extra                                                  |
+------+-------------+-------+--------+------------------+---------+---------+--------------------+--------+--------------------------------------------------------+
|    1 | SIMPLE      | i     | range  | layer_id,created | created | 8       | NULL               | 249116 | Using index condition; Using temporary; Using filesort |
|    1 | SIMPLE      | l     | eq_ref | PRIMARY          | PRIMARY | 4       | getback.i.layer_id |      1 | Using index                                            |
+------+-------------+-------+--------+------------------+---------+---------+--------------------+--------+--------------------------------------------------------+

New server (AWS where the query runs slow), MariaDB 10.3.8:

EXPLAIN SELECT COUNT(*) AS layer_impressions, DATE_FORMAT(i.created, "%H") AS the_date FROM layer_impression i INNER JOIN layer l ON l.id = i.layer_id WHERE i.created BETWEEN '2018-11-15' AND '2018-11-15 23:59:59' GROUP BY the_date;
+------+-------------+-------+-------+------------------+------------+---------+--------------+------+----------------------------------------------+
| id   | select_type | table | type  | possible_keys    | key        | key_len | ref          | rows | Extra                                        |
+------+-------------+-------+-------+------------------+------------+---------+--------------+------+----------------------------------------------+
|    1 | SIMPLE      | l     | index | PRIMARY          | voucher_id | 4       | NULL         |  652 | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | i     | ref   | layer_id,created | layer_id   | 4       | getback.l.id |  545 | Using where                                  |
+------+-------------+-------+-------+------------------+------------+---------+--------------+------+----------------------------------------------+

What surprises me is that “possible_keys” lists PRIMARY but uses “voucher_id”, a column which has nothing to do with our query.

If we add USE INDEX or make a STRAIGHT_JOIN the query runs fast and the correct indexes are being used:

EXPLAIN SELECT COUNT(*) AS layer_impressions, DATE_FORMAT(i.created, "%H") AS the_date FROM layer_impression i STRAIGHT_JOIN layer l ON l.id = i.layer_id WHERE i.created BETWEEN '2018-11-15' AND '2018-11-15 23:59:59' GROUP BY the_date;
+------+-------------+-------+--------+------------------+---------+---------+--------------------+--------+--------------------------------------------------------+
| id   | select_type | table | type   | possible_keys    | key     | key_len | ref                | rows   | Extra                                                  |
+------+-------------+-------+--------+------------------+---------+---------+--------------------+--------+--------------------------------------------------------+
|    1 | SIMPLE      | i     | range  | layer_id,created | created | 5       | NULL               | 272230 | Using index condition; Using temporary; Using filesort |
|    1 | SIMPLE      | l     | eq_ref | PRIMARY          | PRIMARY | 4       | getback.i.layer_id |      1 | Using index                                            |
+------+-------------+-------+--------+------------------+---------+---------+--------------------+--------+--------------------------------------------------------+

What behaviour is that and how can we prevent this without modifying all our queries throughout our application?

It’s more a general question about how this can happen and less about this particular query. We tried so far: Switching it off and on again, OPTIMIZE and ANALYZE, FLUSH, setting extended_keys=off (in the optimizer_switch variable).

Schema (omitted some additional columns):

CREATE TABLE IF NOT EXISTS `layer` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`shop_id` int(10) unsigned NOT NULL,
`voucher_id` int(10) unsigned NOT NULL,
`templ_id` int(10) unsigned NOT NULL COMMENT 'Mailtemplate ID',
`doi_templ_id` int(10) unsigned NOT NULL,
`key` char(5) CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,
`status` enum('active','inactive','deleted') NOT NULL DEFAULT 'active',
`created` datetime NOT NULL,
`createdby` int(10) unsigned NOT NULL,
`modified` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
PRIMARY KEY (`id`),
KEY `shop_id` (`shop_id`,`status`) USING BTREE,
KEY `key` (`key`),
KEY `voucher_id` (`voucher_id`),
KEY `templ_id` (`templ_id`),
KEY `doi_templ_id` (`doi_templ_id`)
) ENGINE=InnoDB AUTO_INCREMENT=656 DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `layer_impression` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`layer_id` int(10) unsigned NOT NULL,
`session_id` bigint(20) unsigned NOT NULL,
`device` enum('desktop','mobile','tablet') NOT NULL DEFAULT 'desktop',
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `layer_id` (`layer_id`),
KEY `session_id` (`session_id`),
KEY `created` (`created`,`device`) USING BTREE,
CONSTRAINT `layer_impression_ibfk_1` FOREIGN KEY (`layer_id`) REFERENCES `layer` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=34184266 DEFAULT CHARSET=utf8;

layer has 652 rows, layer_impression has ~34M rows

The cardinality on the layer_id index is also quite different:

show index from layer_impression;
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table            | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| layer_impression |          0 | PRIMARY    |            1 | id          | A         |    34062000 |     NULL | NULL   |      | BTREE      |         |               |
| layer_impression |          1 | layer_id   |            1 | layer_id    | A         |       62499 |     NULL | NULL   |      | BTREE      |         |               |
| layer_impression |          1 | session_id |            1 | session_id  | A         |    34062000 |     NULL | NULL   |      | BTREE      |         |               |
| layer_impression |          1 | created    |            1 | created     | A         |    34062000 |     NULL | NULL   |      | BTREE      |         |               |
| layer_impression |          1 | created    |            2 | device      | A         |    34062000 |     NULL | NULL   |      | BTREE      |         |               |
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

On AWS (above) it’s 62499 while on our old server it’s 18:

show index from layer_impression;
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table            | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| layer_impression |          0 | PRIMARY    |            1 | id          | A         |    34295989 |     NULL | NULL   |      | BTREE      |         |               |
| layer_impression |          1 | layer_id   |            1 | layer_id    | A         |          18 |     NULL | NULL   |      | BTREE      |         |               |
| layer_impression |          1 | session_id |            1 | session_id  | A         |    34295989 |     NULL | NULL   |      | BTREE      |         |               |
| layer_impression |          1 | created    |            1 | created     | A         |    34295989 |     NULL | NULL   |      | BTREE      |         |               |
| layer_impression |          1 | created    |            2 | device      | A         |    34295989 |     NULL | NULL   |      | BTREE      |         |               |
+------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Thanks for any advice!

EDIT:

Most of the time we have additional WHERE clauses, eg.

WHERE l.shop_id = 1 # or 
WHERE l.shop_id IN (1,2,3)

I left this part out for simplicity because I was more interested in why one server would handle it completely different than the other.

Here’s a typical query which uses the layer table.
On AWS it runs for ~18sec, on the old server for ~2sec.

# AWS
EXPLAIN SELECT COUNT(*) AS layer_impressions, DATE_FORMAT(i.created, "%H") AS the_date FROM layer_impression i LEFT JOIN layer l ON l.id = i.layer_id WHERE i.created BETWEEN '2018-11-10' AND '2018-11-15 23:59:59' AND l.shop_id IN (42,36,103,63) GROUP BY the_date;
+------+-------------+-------+-------+------------------+----------+---------+--------------+------+-----------------------------------------------------------+
| id   | select_type | table | type  | possible_keys    | key      | key_len | ref          | rows | Extra                                                     |
+------+-------------+-------+-------+------------------+----------+---------+--------------+------+-----------------------------------------------------------+
|    1 | SIMPLE      | l     | range | PRIMARY,shop_id  | shop_id  | 4       | NULL         |   11 | Using where; Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | i     | ref   | layer_id,created | layer_id | 4       | getback.l.id |  545 | Using where                                               |
+------+-------------+-------+-------+------------------+----------+---------+--------------+------+-----------------------------------------------------------+


# Old server
EXPLAIN SELECT COUNT(*) AS layer_impressions, DATE_FORMAT(i.created, "%H") AS the_date FROM layer_impression i LEFT JOIN layer l ON l.id = i.layer_id WHERE i.created BETWEEN '2018-11-10' AND '2018-11-15 23:59:59' AND l.shop_id IN (42,36,103,63) GROUP BY the_date;
+------+-------------+-------+--------+------------------+---------+---------+--------------------+---------+--------------------------------------------------------+
| id   | select_type | table | type   | possible_keys    | key     | key_len | ref                | rows    | Extra                                                  |
+------+-------------+-------+--------+------------------+---------+---------+--------------------+---------+--------------------------------------------------------+
|    1 | SIMPLE      | i     | range  | layer_id,created | created | 8       | NULL               | 1615766 | Using index condition; Using temporary; Using filesort |
|    1 | SIMPLE      | l     | eq_ref | PRIMARY,shop_id  | PRIMARY | 4       | getback.i.layer_id |       1 | Using where                                            |
+------+-------------+-------+--------+------------------+---------+---------+--------------------+---------+--------------------------------------------------------+

One Answer

I have not located a reason for the change over time. May I suggest you file a bug report with MariaDB as a "regressions".

Meanwhile, performance (perhaps in all servers) might be helped by these two indexes:

layer_impression: INDEX(created, layer_id)
layer_impression: INDEX(layer_id, created)

The first index should help the first EXPLAIN by avoiding extra effort to get layer_id. The second index should help the second EXPLAIN by being able to use a range over created after going to the layer_id.

The only reason for mentioning the table layer seems to be to validate that a row exists there for each layer_id. If you don't need that check, remove it.

Answered by Rick James on November 19, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP