TransWikia.com

Searching large dataset, best way to speed this up

Database Administrators Asked on December 31, 2021

I'm querying a table full of vehicles; however, each of these vehicles MUST have a matching row in another table. This query is currently taking far too long!

Do I need to index the table, or is there another way I can do the query to improve this?

SELECT *
FROM vehicles v
WHERE v.group = 0
AND EXISTS (
    SELECT count(id) FROM quotes quote
    WHERE quote.vehicle_id = v.id
    AND quote.term = 60 
    AND quote.annual_mileage = 8000
    AND quote.deposit = 250
    AND quote.credit_rating = "Excellent"
) IS NOT NULL

vehicles table:

CREATE TABLE `vehicles` (
    `id` INT(11) NOT NULL DEFAULT '0',
    `group_id` INT(11) NOT NULL DEFAULT '0',
    `title` VARCHAR(8) NOT NULL DEFAULT '' COLLATE 'latin1_swedish_ci',
    PRIMARY KEY (`id`) USING BTREE
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;

quotes table:

CREATE TABLE `quotes` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `vehicle_id` INT(11) NOT NULL,
    `credit_rating` VARCHAR(20) NOT NULL COLLATE 'latin1_swedish_ci',
    `deposit` DECIMAL(7,2) NOT NULL,
    `term` TINYINT(2) NOT NULL,
    `annual_mileage` INT(11) NOT NULL,
    PRIMARY KEY (`id`) USING BTREE,
    INDEX `vehicle_id` (`vehicle_id`) USING BTREE,
    INDEX `covering_ndx1` (`term`, `annual_mileage`, `deposit`, `credit_rating`, `vehicle_id`) USING BTREE,
    CONSTRAINT `FK_quotes_vehicles` FOREIGN KEY (`vehicle_id`) REFERENCES `database`.`vehicles` (`id`) ON UPDATE RESTRICT ON DELETE RESTRICT
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=1;

Edit:

Updated the create SQL above to reflect the changes I have made. This reduced my query time from 0.516 to 0.032.

4 Answers

I'd do a JOIN instead of the inner query. That is still costly, but it's done once for all entries instead of once per entry.

The only problem is that you might get duplicates if multiple rows in quotes match the id of an entry in vehicles. In that case you can do the join to select just the distinct ids, and then join against that result again (assuming the id is a PK in vehicles) to get the actual entries.
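A minimal sketch of that two-step approach, using the tables from the question. The aliases `q` and `matched` are made up for illustration, and `group_id` is assumed to be the intended filter column, since the posted schema has no `group` column:

```sql
-- Step 1 (derived table): collect each matching vehicle id exactly once.
-- Step 2 (outer join): join back to vehicles on its primary key to fetch
-- the full rows, so no vehicle appears more than once.
SELECT v.*
FROM vehicles v
JOIN (
    SELECT DISTINCT q.vehicle_id
    FROM quotes q
    WHERE q.term = 60
      AND q.annual_mileage = 8000
      AND q.deposit = 250
      AND q.credit_rating = 'Excellent'
) matched ON matched.vehicle_id = v.id
WHERE v.group_id = 0;
```

Whether this beats the EXISTS form depends on the data and the MySQL version, so it is worth comparing both with EXPLAIN.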

Answered by Frank Hopkins on December 31, 2021

The other answers have almost gotten there.

SELECT  *
    FROM  vehicles v
    WHERE  v.group_id = 0  -- Note: Typo?  `group_id`, not `group`
      AND  EXISTS 
    (
        SELECT  1         -- Note: simply "1" (or almost anything)
            FROM  quotes quote
            WHERE  quote.vehicle_id = v.id
              AND  quote.term = 60
              AND  quote.annual_mileage = 8000
              AND  quote.deposit = 250
              AND  quote.credit_rating = "Excellent" 
                         -- Note: group by not needed
    )                    -- Note: NULL check not needed

And, yes:

quote:  INDEX(credit_rating, deposit, annual_mileage, term, vehicle_id)
        -- Your existing `covering_ndx1` is fine.
vehicles:  INDEX(group_id, id) -- (may not be used; depends on frequency of `group_id = 0`)
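As a rough sketch, the vehicles index suggested above could be added and the plan checked like this. The index name `group_id_ndx` is hypothetical; the tables and columns are those from the question:

```sql
-- Hypothetical index name; the column list follows the suggestion above.
ALTER TABLE vehicles ADD INDEX group_id_ndx (group_id, id);

-- EXPLAIN reports which indexes the optimizer actually picks.
EXPLAIN
SELECT *
FROM vehicles v
WHERE v.group_id = 0
  AND EXISTS (
      SELECT 1
      FROM quotes quote
      WHERE quote.vehicle_id = v.id
        AND quote.term = 60
        AND quote.annual_mileage = 8000
        AND quote.deposit = 250
        AND quote.credit_rating = 'Excellent'
  );
```

In the EXPLAIN output, the `key` column shows the index chosen for each table. If `group_id_ndx` does not appear there, the optimizer has probably decided that `group_id = 0` matches too many rows for the index to be worthwhile.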

Answered by Rick James on December 31, 2021

Yes, an index on quote.vehicle_id would help this greatly.

As a general rule it is a good idea to have an index on columns that are foreign keys, and you should define it as a foreign key with:

FOREIGN KEY (vehicle_id) REFERENCES vehicles(id)

This way the database will help verify your data and prevent a whole family of potential bugs.
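To illustrate the kind of bug this prevents, here is a small sketch against the quotes table from the question. It assumes no vehicle with id 999999 exists; that value is made up purely for the example:

```sql
-- Assumes there is no row in vehicles with id = 999999.
-- With the foreign key in place, InnoDB rejects this insert with a
-- foreign key constraint error instead of storing an orphaned quote.
INSERT INTO quotes (vehicle_id, credit_rating, deposit, term, annual_mileage)
VALUES (999999, 'Excellent', 250.00, 60, 8000);
```

Without the constraint, the same statement would succeed and leave a quote that points at nothing.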

You will also find that MySQL's query planner produces a more efficient plan if you use WHERE EXISTS. Your sub-select with GROUP BY forces it to run that part once per outer row when more efficient strategies may be available. EXISTS can also exit early, as soon as the first matching row is found, whereas a subquery with an aggregate as above prevents that, so more rows may be read and considered than necessary:

SELECT *
FROM vehicles v
WHERE v.group_id = 0
AND EXISTS (
    SELECT id FROM quotes quote
    WHERE quote.vehicle_id = v.id
    AND quote.term = 60 
    AND quote.annual_mileage = 8000
    AND quote.deposit = 250
    AND quote.credit_rating = "Excellent"
    ) 

You still want the index on quote.vehicle_id here, to avoid a full table scan.

There are valid exceptions to the rule about indexing foreign key columns, which is part of why most database engines don't create such an index automatically, but they are fairly rare.

Answered by David Spillett on December 31, 2021

Create an index to support this

ALTER TABLE quotes ADD INDEX covering_ndx1
(term,annual_mileage,deposit,credit_rating,vehicle_id);

The first four (4) columns support the JOIN, and the last column supports the GROUP BY.

I would also experiment with using the EXISTS clause

SELECT *
FROM vehicles v
WHERE v.group_id = 0
AND EXISTS (
    SELECT count(id) FROM quotes quote
    WHERE quote.vehicle_id = v.id
    AND quote.term = 60 
    AND quote.annual_mileage = 8000
    AND quote.deposit = 250
    AND quote.credit_rating = "Excellent"
    GROUP BY quote.vehicle_id
);

Answered by RolandoMySQLDBA on December 31, 2021
