TransWikia.com

PostGIS query used to run very quickly on Postgres 9.6 / PostGIS 2.4, now extremely slow on Postgres 12.4 / PostGIS 3.0

Geographic Information Systems Asked on June 30, 2021

We have two virtual servers on the same hardware (identical specs, 6 core Intel Xeon E5-2643), one is running Postgres 9.6 with PostGIS 2.4, and the other is running Postgres 12.4 with PostGIS 3.0. Both are using GEOS.

We are using a custom PostGIS function which can be found here: https://github.com/iboates/ST_RayCast

Using a simple dataset, we can compute the result instantly on the 9.6/2.4 server, but it takes 8 seconds or more on the 12.4/3.0 server.

You can test it out using this data:

Points

CREATE TABLE public.canyon_points (
    id integer NOT NULL,
    geom public.geometry(Point,23947)
);

INSERT INTO public.canyon_points (id, geom) VALUES (1, '01010000208B5D00007AE68ED83ED111412DDA68187C884141');
INSERT INTO public.canyon_points (id, geom) VALUES (2, '01010000208B5D0000D7D006A6E5D1114171D906BB81884141');
INSERT INTO public.canyon_points (id, geom) VALUES (3, '01010000208B5D0000371DD89F68D211419981959D85884141');
INSERT INTO public.canyon_points (id, geom) VALUES (4, '01010000208B5D00002CC0B22166D21141C656944C96884141');
INSERT INTO public.canyon_points (id, geom) VALUES (5, '01010000208B5D0000C2C09A9F61D2114109D4646EAB884141');
INSERT INTO public.canyon_points (id, geom) VALUES (6, '01010000208B5D0000E1BFBE6268D21141DB50C7F1C3884141');
INSERT INTO public.canyon_points (id, geom) VALUES (7, '01010000208B5D00002CC0B22166D21141A34D4BBDDC884141');
INSERT INTO public.canyon_points (id, geom) VALUES (8, '01010000208B5D00000DC18E5E5FD21141594A1219F6884141');
INSERT INTO public.canyon_points (id, geom) VALUES (9, '01010000208B5D0000EDC16A9B58D2114151C7EE7B0D894141');
INSERT INTO public.canyon_points (id, geom) VALUES (10, '01010000208B5D0000D1B63243AED2114195C68C1E13894141');
INSERT INTO public.canyon_points (id, geom) VALUES (11, '01010000208B5D0000E3A0B65157D31141D746A22511894141');

ALTER TABLE ONLY public.canyon_points
    ADD CONSTRAINT canyon_points_pkey PRIMARY KEY (id);

Polygons

CREATE TABLE public.canyon_building (
    id integer NOT NULL,
    geom public.geometry(Polygon,23947),
    height double precision
);

INSERT INTO public.canyon_building (id, geom, height) VALUES (1, '01030000208B5D0000010000000500000093B51104F2D01141088E0BF70A894141FB185D700AD21141695CE44F0B89414111FFE9FD0FD21141E3F796FEE9884141A99B9E91F7D011416931FA61EB88414193B51104F2D01141088E0BF70A894141', 7);
INSERT INTO public.canyon_building (id, geom, height) VALUES (2, '01030000208B5D00000100000005000000A99B9E91F7D011416931FA61EB884141A99B9E91F7D01141B1DC2489CD884141FB185D700AD21141CAD4E8CCCB88414111FFE9FD0FD21141E3F796FEE9884141A99B9E91F7D011416931FA61EB884141', 6);
INSERT INTO public.canyon_building (id, geom, height) VALUES (3, '01030000208B5D00000100000005000000A99B9E91F7D01141B1DC2489CD88414193B51104F2D0114136EB9DFEAE88414175B40F1FE9D11141ED1489E9AC884141FB185D700AD21141CAD4E8CCCB884141A99B9E91F7D01141B1DC2489CD884141', 8);
INSERT INTO public.canyon_building (id, geom, height) VALUES (4, '01030000208B5D000001000000050000007CCF8476ECD011411FDA26838988414193B51104F2D0114136EB9DFEAE88414175B40F1FE9D11141ED1489E9AC884141535BBCCAE0D11141115529068E8841417CCF8476ECD011411FDA268389884141', 8);
INSERT INTO public.canyon_building (id, geom, height) VALUES (5, '01030000208B5D00000100000005000000DCC186AFC1D21141CB2ABDA80B894141709B175BA5D3114145F159450A8941418681A4E8AAD311418A8A4DB6F3884141098EA0CACCD21141D36062CBF5884141DCC186AFC1D21141CB2ABDA80B894141', 4);
INSERT INTO public.canyon_building (id, geom, height) VALUES (6, '01030000208B5D00000100000005000000098EA0CACCD21141D36062CBF5884141E7344D76C4D21141A01A0668B9884141385C377997D3114101E9DEC0B98841418681A4E8AAD311418A8A4DB6F3884141098EA0CACCD21141D36062CBF5884141', 5);
INSERT INTO public.canyon_building (id, geom, height) VALUES (7, '01030000208B5D00000100000005000000E7344D76C4D21141A01A0668B9884141F2A7133DC7D21141E8C5308F9B884141DFC3034381D3114101BEF4D299884141385C377997D3114101E9DEC0B9884141E7344D76C4D21141A01A0668B9884141', 8);

ALTER TABLE ONLY public.canyon_building
    ADD CONSTRAINT canyon_building_pkey PRIMARY KEY (id);

ST_RayCast function

CREATE OR REPLACE FUNCTION ST_RayCast(
    in_point GEOMETRY,
    in_boundaries GEOMETRY,
    out_geom_type TEXT,
    num_rays INTEGER,
    max_ray_dist FLOAT
)

RETURNS GEOMETRY AS

$$
DECLARE
    adj FLOAT;
    opp FLOAT;
    theta FLOAT = 0;
    srid INTEGER = ST_SRID(in_point);
    candidate_geom GEOMETRY;
    return_geom GEOMETRY;
    
BEGIN

    /* Do preliminary checks for common problems */
    IF ST_SRID(in_boundaries) != srid THEN
        RAISE EXCEPTION 'SRID of input points (%) does not match input boundaries SRID (%)', ST_SRID(in_point), ST_SRID(in_boundaries);
    END IF;

    IF out_geom_type NOT IN ('MULTIPOINT', 'MULTILINESTRING') THEN
        RAISE EXCEPTION 'Output geometry type (''%'') must be one of (''%'', ''%'')', out_geom_type, 'MULTIPOINT', 'MULTILINESTRING';
    END IF;

    /* Cast rays over the specified angle window */
    WHILE theta < 2*pi() LOOP

        candidate_geom = (
            /* Make a CTE for the casted ray endpoint so we only have to query it once */
            WITH
                ray
            AS MATERIALIZED (
                SELECT 
                    /* Make a ray */
                    ST_MakeLine(
                        ST_Transform(
                            ST_Project(
                                /* PostGIS only allows projecting points in geographical CRS, so we have to do some transforming here */
                                ST_Transform(
                                    in_point,
                                    4326
                                )::geography,
                                max_ray_dist,
                                theta
                            )::geometry,
                            srid
                        ),
                        in_point
                    ) AS geom
            )
            SELECT
                /* Intersect this ray with the input boundaries, ignore empty results (no ray intersection) */
                ST_Intersection(
                    ST_MakeLine(
                        ray.geom,
                        in_point
                    ),
                    in_boundaries
                ) AS geom
            FROM
                ray
            WHERE
                NOT ST_IsEmpty(
                    ST_Intersection(
                        ST_MakeLine(
                            ray.geom,
                            in_point
                        ),
                        in_boundaries
                    )
                )
        );

        /* In the case of multiple ray intersections, take the closest one */
        IF ST_NumGeometries(candidate_geom) > 1 THEN
            candidate_geom = (
                SELECT
                    dp.geom
                FROM
                    ST_DumpPoints(candidate_geom) AS dp
                ORDER BY
                    ST_Distance(
                        in_point,
                        dp.geom
                    ) ASC
                LIMIT 1
            );
        END IF;

        /* Either prep the point for return or make a line out of it for return depending on user input */
        IF out_geom_type = 'MULTIPOINT' THEN
            return_geom = ST_Collect(ST_CollectionExtract(return_geom, 1), candidate_geom);
        ELSIF out_geom_type = 'MULTILINESTRING' THEN
            return_geom = ST_Collect(
                ST_CollectionExtract(
                    return_geom,
                    2
                ),
                ST_MakeLine(in_point, candidate_geom)
            );
        END IF;

        theta = theta + 2*pi() / num_rays;

    END LOOP;

    /* Return all the points or lines created */
    IF out_geom_type = 'MULTIPOINT' THEN
        RETURN ST_Multi(ST_CollectionExtract(return_geom, 1));
    ELSIF out_geom_type = 'MULTILINESTRING' THEN
        RETURN ST_Multi(ST_CollectionExtract(return_geom, 2));
    END IF;

END
$$
LANGUAGE plpgsql;
COMMENT ON FUNCTION ST_RayCast(
    GEOMETRY,
    GEOMETRY,
    TEXT,
    INTEGER,
    FLOAT
) IS 'Created by Isaac Boates. Use of this software is at the user''s own risk, and no responsibility is claimed by the creator in the event of damages, whether tangible or financial caused directly or indirectly by the use of this software.';

Query

with buildings as (
    select height, geom as geom from canyon_building
),

street_points as (
    select
        s.id as id,
        s.geom as geom
    from
        canyon_points as s
)


select
    ST_RayCast(
        sp.geom,
        ST_CollectionExtract(ST_Collect(ST_ExteriorRing(b.geom)), 2),
        out_geom_type := 'MULTIPOINT',
        num_rays := 16,
        max_ray_dist := 50
    ) as geom
from
    street_points as sp,
    buildings as b
where
    ST_DWithin(sp.geom, b.geom, 50)
group by
    sp.geom

As I said, in Postgres 9.6 / PostGIS 2.4, it executes almost instantaneously. In Postgres 12.4 / PostGIS 3.0, it takes 8 seconds or more.

I am actually the author of this Raycast function and I was able to run it on my home computer with much lower specs (back then on 9.6), and it was also fast.

I have verified that there is no VACCUM or other process happening, and the CPU usage stays close to 0 (until we execute the query)

Does anybody have any idea what is going on?

If it helps, here is the result of EXPLAIN ANALYZE (can be visualized at https://tatiyants.com/pev):

9.6/2.4:

[
  {
    "Plan": {
      "Node Type": "Aggregate",
      "Strategy": "Sorted",
      "Partial Mode": "Simple",
      "Parallel Aware": false,
      "Startup Cost": 28419.14,
      "Total Cost": 28891.15,
      "Plan Rows": 200,
      "Plan Width": 64,
      "Actual Startup Time": 404.149,
      "Actual Total Time": 4241.192,
      "Actual Rows": 11,
      "Actual Loops": 1,
      "Group Key": ["s.geom"],
      "Plans": [
        {
          "Node Type": "Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Startup Cost": 28419.14,
          "Total Cost": 28422.73,
          "Plan Rows": 1435,
          "Plan Width": 64,
          "Actual Startup Time": 0.391,
          "Actual Total Time": 0.426,
          "Actual Rows": 77,
          "Actual Loops": 1,
          "Sort Key": ["s.geom"],
          "Sort Method": "quicksort",
          "Sort Space Used": 45,
          "Sort Space Type": "Memory",
          "Plans": [
            {
              "Node Type": "Nested Loop",
              "Parent Relationship": "Outer",
              "Parallel Aware": false,
              "Join Type": "Inner",
              "Startup Cost": 25.88,
              "Total Cost": 28343.90,
              "Plan Rows": 1435,
              "Plan Width": 64,
              "Actual Startup Time": 0.220,
              "Actual Total Time": 0.297,
              "Actual Rows": 77,
              "Actual Loops": 1,
              "Inner Unique": false,
              "Plans": [
                {
                  "Node Type": "Nested Loop",
                  "Parent Relationship": "Outer",
                  "Parallel Aware": false,
                  "Join Type": "Inner",
                  "Startup Cost": 25.88,
                  "Total Cost": 28308.50,
                  "Plan Rows": 1,
                  "Plan Width": 32,
                  "Actual Startup Time": 0.205,
                  "Actual Total Time": 0.233,
                  "Actual Rows": 7,
                  "Actual Loops": 1,
                  "Inner Unique": false,
                  "Join Filter": "st_dwithin((st_union(s_1.geom)), canyon_building.geom, '50'::double precision)",
                  "Rows Removed by Join Filter": 0,
                  "Plans": [
                    {
                      "Node Type": "Aggregate",
                      "Strategy": "Plain",
                      "Partial Mode": "Simple",
                      "Parent Relationship": "Outer",
                      "Parallel Aware": false,
                      "Startup Cost": 25.88,
                      "Total Cost": 25.89,
                      "Plan Rows": 1,
                      "Plan Width": 32,
                      "Actual Startup Time": 0.185,
                      "Actual Total Time": 0.186,
                      "Actual Rows": 1,
                      "Actual Loops": 1,
                      "Plans": [
                        {
                          "Node Type": "Seq Scan",
                          "Parent Relationship": "Outer",
                          "Parallel Aware": false,
                          "Relation Name": "canyon_points",
                          "Alias": "s_1",
                          "Startup Cost": 0.00,
                          "Total Cost": 22.70,
                          "Plan Rows": 1270,
                          "Plan Width": 32,
                          "Actual Startup Time": 0.009,
                          "Actual Total Time": 0.012,
                          "Actual Rows": 11,
                          "Actual Loops": 1
                        }
                      ]
                    },
                    {
                      "Node Type": "Seq Scan",
                      "Parent Relationship": "Inner",
                      "Parallel Aware": false,
                      "Relation Name": "canyon_building",
                      "Alias": "canyon_building",
                      "Startup Cost": 0.00,
                      "Total Cost": 21.30,
                      "Plan Rows": 1130,
                      "Plan Width": 32,
                      "Actual Startup Time": 0.006,
                      "Actual Total Time": 0.007,
                      "Actual Rows": 7,
                      "Actual Loops": 1
                    }
                  ]
                },
                {
                  "Node Type": "Seq Scan",
                  "Parent Relationship": "Inner",
                  "Parallel Aware": false,
                  "Relation Name": "canyon_points",
                  "Alias": "s",
                  "Startup Cost": 0.00,
                  "Total Cost": 22.70,
                  "Plan Rows": 1270,
                  "Plan Width": 32,
                  "Actual Startup Time": 0.003,
                  "Actual Total Time": 0.005,
                  "Actual Rows": 11,
                  "Actual Loops": 7
                }
              ]
            }
          ]
        }
      ]
    },
    "Planning Time": 0.261,
    "Triggers": [
    ],
    "Execution Time": 4241.271
  }
]

12.4/3.0:

[
  {
    "Plan": {
      "Node Type": "Aggregate",
      "Strategy": "Sorted",
      "Partial Mode": "Simple",
      "Parallel Aware": false,
      "Startup Cost": 435.72,
      "Total Cost": 435.99,
      "Plan Rows": 1,
      "Plan Width": 64,
      "Actual Startup Time": 7.349,
      "Actual Total Time": 57.570,
      "Actual Rows": 11,
      "Actual Loops": 1,
      "Group Key": ["sp.geom"],
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "InitPlan",
          "Subplan Name": "CTE buildings",
          "Parallel Aware": false,
          "Relation Name": "canyon_building",
          "Alias": "canyon_building",
          "Startup Cost": 0.00,
          "Total Cost": 21.30,
          "Plan Rows": 1130,
          "Plan Width": 40,
          "Actual Startup Time": 0.006,
          "Actual Total Time": 0.009,
          "Actual Rows": 7,
          "Actual Loops": 1
        },
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "InitPlan",
          "Subplan Name": "CTE street_points",
          "Parallel Aware": false,
          "Relation Name": "canyon_points",
          "Alias": "s",
          "Startup Cost": 0.00,
          "Total Cost": 22.70,
          "Plan Rows": 1270,
          "Plan Width": 36,
          "Actual Startup Time": 0.002,
          "Actual Total Time": 0.004,
          "Actual Rows": 11,
          "Actual Loops": 1
        },
        {
          "Node Type": "Aggregate",
          "Strategy": "Plain",
          "Partial Mode": "Simple",
          "Parent Relationship": "InitPlan",
          "Subplan Name": "CTE street_points_union",
          "Parallel Aware": false,
          "Startup Cost": 25.88,
          "Total Cost": 25.89,
          "Plan Rows": 1,
          "Plan Width": 32,
          "Actual Startup Time": 0.216,
          "Actual Total Time": 0.217,
          "Actual Rows": 1,
          "Actual Loops": 1,
          "Plans": [
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Outer",
              "Parallel Aware": false,
              "Relation Name": "canyon_points",
              "Alias": "s_1",
              "Startup Cost": 0.00,
              "Total Cost": 22.70,
              "Plan Rows": 1270,
              "Plan Width": 32,
              "Actual Startup Time": 0.011,
              "Actual Total Time": 0.014,
              "Actual Rows": 11,
              "Actual Loops": 1
            }
          ]
        },
        {
          "Node Type": "Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Startup Cost": 365.83,
          "Total Cost": 365.83,
          "Plan Rows": 1,
          "Plan Width": 64,
          "Actual Startup Time": 0.416,
          "Actual Total Time": 0.436,
          "Actual Rows": 77,
          "Actual Loops": 1,
          "Sort Key": ["sp.geom"],
          "Sort Method": "quicksort",
          "Sort Space Used": 45,
          "Sort Space Type": "Memory",
          "Plans": [
            {
              "Node Type": "Nested Loop",
              "Parent Relationship": "Outer",
              "Parallel Aware": false,
              "Join Type": "Inner",
              "Startup Cost": 0.00,
              "Total Cost": 365.82,
              "Plan Rows": 1,
              "Plan Width": 64,
              "Actual Startup Time": 0.261,
              "Actual Total Time": 0.319,
              "Actual Rows": 77,
              "Actual Loops": 1,
              "Plans": [
                {
                  "Node Type": "Nested Loop",
                  "Parent Relationship": "Outer",
                  "Parallel Aware": false,
                  "Join Type": "Inner",
                  "Startup Cost": 0.00,
                  "Total Cost": 327.72,
                  "Plan Rows": 1,
                  "Plan Width": 32,
                  "Actual Startup Time": 0.258,
                  "Actual Total Time": 0.296,
                  "Actual Rows": 7,
                  "Actual Loops": 1,
                  "Join Filter": "((spu.geom && st_expand(b.geom, '50'::double precision)) AND (b.geom && st_expand(spu.geom, '50'::double precision)) AND _st_dwithin(spu.geom, b.geom, '50'::double precision))",
                  "Rows Removed by Join Filter": 0,
                  "Plans": [
                    {
                      "Node Type": "CTE Scan",
                      "Parent Relationship": "Outer",
                      "Parallel Aware": false,
                      "CTE Name": "street_points_union",
                      "Alias": "spu",
                      "Startup Cost": 0.00,
                      "Total Cost": 0.02,
                      "Plan Rows": 1,
                      "Plan Width": 32,
                      "Actual Startup Time": 0.218,
                      "Actual Total Time": 0.219,
                      "Actual Rows": 1,
                      "Actual Loops": 1
                    },
                    {
                      "Node Type": "CTE Scan",
                      "Parent Relationship": "Inner",
                      "Parallel Aware": false,
                      "CTE Name": "buildings",
                      "Alias": "b",
                      "Startup Cost": 0.00,
                      "Total Cost": 22.60,
                      "Plan Rows": 1130,
                      "Plan Width": 32,
                      "Actual Startup Time": 0.006,
                      "Actual Total Time": 0.012,
                      "Actual Rows": 7,
                      "Actual Loops": 1
                    }
                  ]
                },
                {
                  "Node Type": "CTE Scan",
                  "Parent Relationship": "Inner",
                  "Parallel Aware": false,
                  "CTE Name": "street_points",
                  "Alias": "sp",
                  "Startup Cost": 0.00,
                  "Total Cost": 25.40,
                  "Plan Rows": 1270,
                  "Plan Width": 32,
                  "Actual Startup Time": 0.000,
                  "Actual Total Time": 0.002,
                  "Actual Rows": 11,
                  "Actual Loops": 7
                }
              ]
            }
          ]
        }
      ]
    },
    "Planning Time": 0.249,
    "Triggers": [
    ],
    "Execution Time": 57.667
  }
]

Difference in some pg_settings params

![enter image description here

One Answer

I can't see your explains apparently, but one of the major changes that slow a lot of my queries in term of performance is the CTE optimization.

In postgres 12, the CTE is now, if possible, treated as a subquery, and not as a temporary table. You should try to change WITH ray AS ( to WITH ray AS MATERIALIZED ( to be sure that the behaviour is similar as the old version (compute the CTE first), otherwise the planner may want to compute only on demand during the rest of the query.

You should also do that to the query that you use for your tests.


EDIT

I reproduced the example and I don't have any issue with it (postgresql 12.3 / 3.0):

enter image description here

Maybe a problem with the install, or the parameters of postgres?

EDIT2

I forgot that I also have an old postgres (10.10/2.4) running, so I tested this on it too and still no problem (150ms)

EDIT3

I don't really understand what's goin on, but as I said, on a big server with comfortable parameter for postgres there is no performance issue, no matter the version. So I tried to look with default postgres (I just ran some dockers) with differents versions of postgres and postgis and here is what I got:

  • 12/3 : problem (>10s)
  • 10/3 : problem (9s)
  • 9.6/3 : no problem (150ms)

My guess is that there is a link to the parallelization (lot of things parallelized in 10) and the memory bottlenecks that are easy to come by when you manipulate relatively big data (geom) in parallel. The defaults parameters of postgres are really too much conservatives for postgis know that a lot of things are parallelized and the typical computer have 8 thread at its disposal. You can look at this answer that I made where I list the parameters (more explanation for the parameters here) that I usually change (some needs to restart the server).

EDIT4

I checked to set all the parameters that I had set for the big database that worked on all the postgis docker, It didn't make any difference.

So I checked the libs in the differents versions, here is what I have:

Table of comparison of libs

My guess is that there is something with the GEOS version (I don't think proj have something to do with this problem). I could be wrong and have missed something in the parameters. I don't have time to test to change the GEOS lib on the 12 docker, but I think it is worth to check.

EDIT5

I've seen that ther was a docker postgis with postgresql 10 and postgis 2.5, so I tested with it

PostGIS_full_version: POSTGIS="2.5.5" [EXTENSION] PGSQL="100" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.2, released 2016/10/24" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" TOPOLOGY RASTER

To launch the docker: docker run --name test_postgis_10_st_25 -e POSTGRES_PASSWORD=mysecretpassword --shm-size=40g --memory="100g" --memory-swap="150g" --memory-swappiness=0 --ulimit memlock=81920000000:81920000000 -d postgis/postgis:10-2.5 -c shared_buffers='25GB' -c effective_cache_size='70GB' -c work_mem='256MB' -c maintenance_work_mem='5GB' -c autovacuum_work_mem='5GB' -c max_connections='1000' -c effective_io_concurrency='200' -c max_worker_processes='24' -c max_parallel_workers='24' -c max_parallel_workers_per_gather='12' -c wal_buffers='16MB' -c min_wal_size='1GB' -c max_wal_size='2GB'

It didn't have the bug (it works in 150ms). Same request with docker postgis/postgis:10-3.0-alpine finished in 10s

I really think that there is a problem with GEOS 3.8.1 for this request

Answered by robin loche on June 30, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP