
Indexes not used for join

Database Administrators. Asked by Dmitriy Grankin on September 11, 2020

I join the listings_tail table and the materialized view cals_reviewed_wide_scaled on the listing_id column, which is of type integer.

The table has around 1.5 billion rows and 2 million unique listing ids. The materialized view has been tested at sizes from 500 to 500,000 rows. Both join columns are indexed with btree, hash, and BRIN indexes.
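
(For reference, indexes of those three types would have been created with statements along these lines; the names match the \d output below:)

    CREATE INDEX listing_id_tail      ON listings_tail USING btree (listing_id);
    CREATE INDEX tail_listing_id_hash ON listings_tail USING hash  (listing_id);
    CREATE INDEX tail_listing_id_brin ON listings_tail USING brin  (listing_id);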

EXPLAIN always returns a Seq Scan, despite the available indexes.

How can I optimize the indexing so that the indexes are used?

query

SELECT * from listings_tail L
INNER JOIN cals_reviewed_wide_scaled R 
ON L.listing_id = R.listing_id;

tables

\d listings_tail

 id                   | bigint                      | not null default nextval('listings_tail_id_seq'::regclass)
 listing_id           | integer                     | not null
 parsing_timestamp    | timestamp without time zone | not null
 value                | character varying           | not null
 homes_details_inside | integer                     | not null
 tags                 | character varying[]         | not null
Indexes:
    "listings_tail_pkey" PRIMARY KEY, btree (id)
    "listings_tail_dd_unique" UNIQUE CONSTRAINT, btree (listing_id, value, homes_details_inside, tags)
    "listing_id_tail" btree (listing_id)
    "tail_listing_id_brin" brin (listing_id)
    "tail_listing_id_hash" hash (listing_id)

\d cals_reviewed_wide
 id                | integer                     | not null default nextval('cals_reviewed_wide_id_seq'::regclass)
 listing_id        | integer                     | not null
 date              | timestamp without time zone | not null
 price             | double precision            |
 parsing_timestamp | timestamp without time zone | not null
 min_nights        | double precision            |
 max_nights        | double precision            |
 available         | integer                     | not null
 created_at        | timestamp without time zone | not null
 revs_id           | integer                     | not null
Indexes:
    "cals_reviewed_wide_pkey" PRIMARY KEY, btree (id)
    "cals_reviewed_wide_unique" UNIQUE CONSTRAINT, btree (listing_id, date)
    "cals_reviewed_wide_revs_id_ixs" btree (revs_id)
    "ix_cals_reviewed_wide_listing_id" btree (listing_id)
Foreign-key constraints:
    "cals_reviewed_wide_revs_id_fkey" FOREIGN KEY (revs_id) REFERENCES reviews(id)

\d random_cals_reviewed_wide_scaled

View "public.random_cals_reviewed_wide_scaled"
 Column  |  Type  | Modifiers
---------+--------+-----------
 revs_id | bigint |

\d cals_reviewed_wide_scaled

Materialized view "public.cals_reviewed_wide_scaled"

      Column       |            Type             | Modifiers
-------------------+-----------------------------+-----------
 listing_id        | integer                     |
 date              | timestamp without time zone |
 price             | double precision            |
 parsing_timestamp | timestamp without time zone |
 min_nights        | double precision            |
 max_nights        | double precision            |
 created_at        | timestamp without time zone |
 revs_id           | integer                     |
Indexes:
    "cals_reviewed_wide_scaled_listing_id_brin" brin (listing_id)
    "cals_reviewed_wide_scaled_listing_id_hash" hash (listing_id)
    "cals_reviewed_wide_scaled_listing_id_ixs" btree (listing_id)

materialized view

CREATE MATERIALIZED VIEW cals_reviewed_wide_scaled
      as 
      SELECT 
       C.listing_id, 
       C.date, 
       C.price,
       C.parsing_timestamp,
       C.min_nights,
       C.max_nights,
       C.created_at,
       C.revs_id
       FROM cals_reviewed_wide as C
       JOIN random_cals_reviewed_wide_scaled as V
       ON C.revs_id = V.revs_id;

explain

EXPLAIN (ANALYZE, BUFFERS):

[The EXPLAIN (ANALYZE, BUFFERS) output was attached as an image; it shows sequential scans on both relations.]

One Answer

Neither table is small, and there is no WHERE condition that restricts the number of rows from either table.

work_mem is big enough to contain a hash table of cals_reviewed_wide_scaled, so the most efficient plan is a hash join.
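
You can check the current limit with a plain SHOW:

    SHOW work_mem;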

With a hash join, no index can be used, and a sequential scan is the most efficient access method.
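
For contrast, an index on listing_id does get used when a query fetches only a small fraction of the table, for example a single-listing lookup (the value 42 is purely illustrative):

    EXPLAIN SELECT * FROM listings_tail WHERE listing_id = 42;
    -- expect an Index Scan or Bitmap Heap Scan on listing_id_tail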


You can test whether a nested loop join might be faster.

For that, make sure that there is an index on listings_tail.listing_id (the existing listing_id_tail btree index covers this), then try the following to nudge PostgreSQL in the right direction:

  • Try increasing effective_cache_size to the amount of RAM you have got.

  • Try lowering random_page_cost to 1.0.

  • If all else fails, experiment with setting the following in your database session:

    SET enable_hashjoin = off;
    SET enable_mergejoin = off;
    

    Then try again, and you should see a nested loop join. That way you can test whether the nested loop is actually faster and whether PostgreSQL's choice was wrong; a complete test session is sketched below.
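
Putting the suggestions together, a test session could look like the following sketch (the effective_cache_size value is only an example and should match your machine's RAM; all of these SETs affect only the current session):

    SET effective_cache_size = '64GB';  -- example value: set to your RAM
    SET random_page_cost = 1.0;         -- treat random I/O as cheap
    SET enable_hashjoin = off;          -- discourage the hash join
    SET enable_mergejoin = off;         -- discourage a merge join too

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * from listings_tail L
    INNER JOIN cals_reviewed_wide_scaled R
    ON L.listing_id = R.listing_id;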

Answered by Laurenz Albe on September 11, 2020
