TransWikia.com

Performance regression: fiona.transform much slower since GDAL 3

Geographic Information Systems Asked on March 29, 2021

I ran into severe performance degradation when upgrading an environment to GDAL 3. I tracked the issue down to fiona.transform, which is now about 15 (!) times slower than it was with GDAL 2.4.

The issue can be illustrated with this line, which transforms only a single point (the actual script transforms a geometry):

python -m timeit -s "from fiona.transform import transform" "transform('EPSG:31287', 'EPSG:4236', [419908], [333400])"

These are my performance measurements using fiona 1.8.13 on the images from perrygeo/gdal-base (the first column is the image tag):

latest  Python 3.8.5 | GDAL 3.1.3 | GEOS 3.8.1 | PROJ 7.1.1 | 20 loops, best of 5: 20 msec per loop
20181219-6f5f6a29   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 1000 loops, best of 3: 675 usec per loop
20181219-f379ec62   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 1000 loops, best of 3: 705 usec per loop
20181221-40f73e30   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 1000 loops, best of 3: 698 usec per loop
20181221-bc2d4bbd   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 1000 loops, best of 3: 688 usec per loop
20181221-f7a0a299   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 1000 loops, best of 3: 703 usec per loop
20190312-f69f8699   Python 3.6.8 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 6.0.0 | 1000 loops, best of 3: 634 usec per loop
20190322-800eed8a   Python 3.6.8 | GDAL 2.4.1 | GEOS 3.7.1 | PROJ 6.0.0 | 1000 loops, best of 3: 589 usec per loop
20190509-da2e635a   Python 3.6.8 | GDAL 3.0.0 | GEOS 3.7.2 | PROJ 6.0.0 | 100 loops, best of 3: 10.4 msec per loop
20191110-6cc84c7e   Python 3.6.9 | GDAL 3.0.2 | GEOS 3.8.0 | PROJ 6.2.1 | 100 loops, best of 3: 11.4 msec per loop
20200301-8437abbb   Python 3.8.2 | GDAL 3.0.4 | GEOS 3.8.0 | PROJ 7.0.0 | 20 loops, best of 5: 10.5 msec per loop
20200509-50546ca8   Python 3.8.2 | GDAL 3.1.0 | GEOS 3.8.1 | PROJ 7.0.1 | 20 loops, best of 5: 10.2 msec per loop
20200907-c7ec91bc   Python 3.8.5 | GDAL 3.1.3 | GEOS 3.8.1 | PROJ 7.1.1 | 20 loops, best of 5: 10.8 msec per loop

One can clearly see that the line takes about 0.7 msec before GDAL 3, and more than 10 msec from GDAL 3 onwards.

Does anyone have a hint as to what the root cause could be and how to fix it?

2 Answers

I would recommend using pyproj as it has dealt with this issue already: https://pyproj4.github.io/pyproj/stable/advanced_examples.html#optimize-transformations

The creation of the transformer has more overhead in PROJ 6+. That is why pyproj added the Transformer class. See: https://github.com/pyproj4/pyproj/issues/187
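If the CRS pair is not known up front, one way to pay the construction cost only once per pair is to cache the transformer. The following is only a sketch of that idea, not something taken from the pyproj docs: make_cached_factory and get_transformer are hypothetical helper names, and only Transformer.from_crs is actual pyproj API.

```python
from functools import lru_cache


def build_pyproj_transformer(src_crs, dst_crs):
    """Construct a pyproj Transformer (the expensive step with PROJ 6+)."""
    # Imported lazily so the caching helper below can be used and tested
    # without pyproj being installed.
    from pyproj import Transformer

    return Transformer.from_crs(src_crs, dst_crs)


def make_cached_factory(builder, maxsize=32):
    """Wrap builder(src_crs, dst_crs) so each CRS pair is built only once.

    Hypothetical helper: it simply applies functools.lru_cache, so the
    CRS arguments must be hashable (e.g. strings or integers).
    """
    return lru_cache(maxsize=maxsize)(builder)


# Hypothetical module-level accessor: one Transformer per (src, dst) pair.
get_transformer = make_cached_factory(build_pyproj_transformer)

# Usage (requires pyproj):
# transformer = get_transformer("EPSG:31287", "EPSG:4236")
# x, y = transformer.transform(419908, 333400)
```

Note that pyproj follows the axis order declared by each CRS unless always_xy=True is passed to Transformer.from_crs, so results should be checked against what fiona returned before swapping one for the other.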

Correct answer by snowman2 on March 29, 2021

Indeed, as @snowman2 points out, using pyproj fixes the performance issue. The relevant command would look like this (for more complex geometries use shapely.ops.transform):

python -m timeit -s "from pyproj import Transformer" -s "transform = Transformer.from_crs(31287, 4236).transform" "transform(419908, 333400)"

The setup step creates a pyproj.Transformer once, and every timed call then reuses it.
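For a plain list of coordinates, the same reuse idea can be combined with passing all points in a single call instead of looping point by point. This is a minimal sketch under that assumption. transform_points is a hypothetical helper, and it accepts any callable that maps (xs, ys) to (new_xs, new_ys), which is how Transformer.transform behaves when given lists.

```python
def transform_points(transform, points):
    """Reproject a sequence of (x, y) pairs with one call to `transform`.

    `transform` is any callable taking two coordinate lists (xs, ys) and
    returning (new_xs, new_ys), for example the bound method
    Transformer.from_crs(...).transform from pyproj.
    """
    points = list(points)
    if not points:
        return []
    # Split the pairs into separate x and y sequences.
    xs, ys = zip(*points)
    # One call for the whole batch, so any per-call overhead is paid once.
    new_xs, new_ys = transform(list(xs), list(ys))
    # Re-pair the transformed coordinates.
    return list(zip(new_xs, new_ys))


# Usage (requires pyproj):
# from pyproj import Transformer
# to_target = Transformer.from_crs(31287, 4236).transform
# transform_points(to_target, [(419908, 333400), (420000, 333500)])
```

The same helper should also work with fiona by passing functools.partial(fiona.transform.transform, 'EPSG:31287', 'EPSG:4236'), since that function takes and returns coordinate lists too. I have not benchmarked that variant.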

Running the timeit command above on the same images gives this benchmark:

latest  Python 3.8.5 | GDAL 3.1.3 | GEOS 3.8.1 | PROJ 7.1.1 | 10000 loops, best of 5: 20.7 usec per loop
20181219-6f5f6a29   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 10000 loops, best of 3: 29.8 usec per loop
20181219-f379ec62   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 10000 loops, best of 3: 31.2 usec per loop
20181221-40f73e30   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 10000 loops, best of 3: 56.9 usec per loop
20181221-bc2d4bbd   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 10000 loops, best of 3: 33.3 usec per loop
20181221-f7a0a299   Python 3.6.7 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 5.2.0 | 10000 loops, best of 3: 56.4 usec per loop
20190312-f69f8699   Python 3.6.8 | GDAL 2.4.0 | GEOS 3.7.1 | PROJ 6.0.0 | 10000 loops, best of 3: 24.7 usec per loop
20190322-800eed8a   Python 3.6.8 | GDAL 2.4.1 | GEOS 3.7.1 | PROJ 6.0.0 | 10000 loops, best of 3: 83.9 usec per loop
20190509-da2e635a   Python 3.6.8 | GDAL 3.0.0 | GEOS 3.7.2 | PROJ 6.0.0 | 10000 loops, best of 3: 43.7 usec per loop
20191110-6cc84c7e   Python 3.6.9 | GDAL 3.0.2 | GEOS 3.8.0 | PROJ 6.2.1 | 10000 loops, best of 3: 58.6 usec per loop
20200301-8437abbb   Python 3.8.2 | GDAL 3.0.4 | GEOS 3.8.0 | PROJ 7.0.0 | 10000 loops, best of 5: 12 usec per loop
20200509-50546ca8   Python 3.8.2 | GDAL 3.1.0 | GEOS 3.8.1 | PROJ 7.0.1 | 20000 loops, best of 5: 10.5 usec per loop
20200907-c7ec91bc   Python 3.8.5 | GDAL 3.1.3 | GEOS 3.8.1 | PROJ 7.1.1 | 20000 loops, best of 5: 11.2 usec per loop

PS: With GDAL 3/PROJ 7 this is roughly a thousand (!) times faster than the fiona approach from the question (about 11 usec versus about 10 msec per call).

Answered by Stefan on March 29, 2021
