TransWikia.com

Unable to do a parallel INSERT using Postgres 9.6.0 & PostGIS 2.3.0

Geographic Information Systems Asked by minus34 on September 28, 2021

The following PostGIS query executes in parallel using Postgres 9.6’s new parallel processing feature:

SELECT pnts.mb_code11, bdys.mb_code11
FROM testing.mb_random_points AS pnts
INNER JOIN testing.mb_2011_aust AS bdys ON ST_Contains(bdys.geom, pnts.geom);

Here’s the query plan:

enter image description here

Run the same query as an insert, and it only runs on a single CPU:

INSERT INTO test
SELECT pnts.mb_code11, bdys.mb_code11
FROM testing.mb_random_points AS pnts
INNER JOIN testing.mb_2011_aust AS bdys ON ST_Contains(bdys.geom, pnts.geom);

Here’s it’s query plan:

enter image description here

How can I get the INSERT to run in parallel?

2 Answers

Writing data and reading data is a different story. Modifying data from different parallel resources always needs to be serialized and prepared for changes in source data while writing. It is therefore more complicated and needs to be treated carefully while reading data in parallel is no trouble.

Also see the wiki page for parallel queries.

Even when parallel query is enabled in general, the query planner will never generate a parallel plan if any of the following are true:

  • The query writes data. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. This is a limitation of the current implementation which could be lifted in a future release.
  • [...]

Update for PostgreSQL >= 11

The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE ... AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan."

https://postgresql.org/docs/11/when-can-parallel-query-be-used.html

Thanks @chrismarx

Correct answer by Matthias Kuhn on September 28, 2021

It took me a while to figure this out and I want to share the solution so that others can real the benefits. Indeed, @chrismarx mentioned the basis for the answer in the last comment above: CREATE TABLE AS

Here is the process that has worked out very well for me:

  1. Verify that the base query is parallel-compliant.
EXPLAIN
-- Insert query
SELECT
...
FROM...
;

Do whatever you need to do to get that to trigger parallel workers (doing that is a different subject).

  1. Build the parallel query in TEMP space:
CREATE TEMP TABLE temp_table_name
-- Insert query
SELECT
...
FROM...
;
  1. Insert the TEMP table into the destination table:
INSERT INTO destination_table_name
SELECT * FROM temp_table_name
;
-- optionally:
DROP TABLE temp_table_name;

I have been using this on a large data warehousing project with great success.

Answered by Alexi Theodore on September 28, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP