How to efficiently access the features returned by QgsSpatialIndex?

Question

The PyQGIS Cookbook explains how to set up the spatial index but it only explains half of its usage:

create spatial index — the following code creates an empty index
index = QgsSpatialIndex()

add features to index — index takes QgsFeature object and adds it to the internal data structure. You can create the object manually or use one from previous call to provider’s nextFeature()
index.insertFeature(feat)

once spatial index is filled with some values, you can do some queries
# returns array of feature IDs of five nearest features
nearest = index.nearestNeighbor(QgsPoint(25.4, 12.7), 5)

What's the most efficient step to get the actual features belonging to the returned feature IDs?

gsherman · Accepted Answer

# assume a list of feature ids returned from index and a QgsVectorLayer 'lyr' fids = [1, 2, 4] request = QgsFeatureRequest() request.setFilterFids(fids) features = lyr.getFeatures(request) # can now iterate and do fun stuff: for feature in features: print feature.id(), feature 1 2 4

Snorfalorpagus · Answer

In a blog post on the subject, Nathan Woodrow (@Nathan W) provides the following code:
layer = qgis.utils.iface.activeLayer()

# Select all features along with their attributes
allAttrs = layer.pendingAllAttributesList()
layer.select(allAttrs)
# Get all the features to start
allfeatures = {feature.id(): feature for (feature) in layer}

def noindex():
    for feature in allfeatures.values():
        for f in allfeatures.values():
            touches = f.geometry().touches(feature.geometry())
            # It doesn't matter if we don't return anything it's just an example

def withindex():
    # Build the spatial index for faster lookup.
    index = QgsSpatialIndex()
    map(index.insertFeature, allfeatures.values())

# Loop each feature in the layer again and get only the features that are going to touch.
    for feature in allfeatures.values():
        ids = index.intersects(feature.geometry().boundingBox())
        for id in ids:
            f = allfeatures[id]
            touches = f.geometry().touches(feature.geometry())
            # It doesn't matter if we don't return anything it's just an example

import timeit
print "With Index: %s seconds " % timeit.timeit(withindex,number=1)
print "Without Index: %s seconds " % timeit.timeit(noindex,number=1)

This creates a dictionary allowing you to quickly look up a QgsFeature using it's FID.
I've found that for very large layers this isn't especially practical as it requires a lot of memory. However, the alternative (random access of the desired feature) using layer.getFeatures(QgsFeatureRequest().setFilterFid(fid)) seems to be comparatively very slow. I'm not sure why this is, as the equivalent call using the SWIG OGR bindings layer.GetFeature(fid) seems much faster than this.

gene · Answer

For comparisons, look at More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc. The solution presented use the Python modules Fiona, Shapely and  rtree (Spatial Index).

With PyQGIS and the same example two layers, point and polygon:

1) Without a spatial index:

polygons = [feature for feature in polygon.getFeatures()]
points = [feature for feature in point.getFeatures()]
for pt in points: 
    point = pt.geometry()
    for pl  in polygons:
        poly = pl.geometry()
        if poly.contains(point):
            print point.asPoint(), poly.asPolygon()
(184127,122472) [[(183372,123361), (184078,123130), (184516,122631),   (184516,122265), (183676,122144), (183067,122570), (183128,123105), (183372,123361)]]
(183457,122850) [[(183372,123361), (184078,123130), (184516,122631), (184516,122265), (183676,122144), (183067,122570), (183128,123105), (183372,123361)]]
(184723,124043) [[(184200,124737), (185368,124372), (185466,124055), (185515,123714), (184955,123580), (184675,123471), (184139,123787), (184200,124737)]]
(182179,124067) [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]

2) With the R-Tree PyQGIS spatial index:

# build the spatial index with all the polygons and not only a bounding box
index = QgsSpatialIndex()
for poly in polygons:
     index.insertFeature(poly)

# intersections with the index 
# indices of the index for the intersections
for pt in points:
    point = pt.geometry()
    for id in index.intersects(point.boundingBox()):
    print id
0
0
1
2

What does these indices mean ?

for i, pt in enumerate(points):
     point = pt.geometry()
     for id in index.intersects(point.boundingBox()):
        print "Point ", i, points[i].geometry().asPoint(), "is in Polygon ", id, polygons[id].geometry().asPolygon()
Point  1 (184127,122472) is in Polygon  0 [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]
Point  2 (183457,122850) is in Polygon  0 [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]
Point  4 (184723,124043) is in Polygon  1 [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]
Point  6 (182179,124067) is in Polygon  2 [[(182520,125175), (183348,124286), (182605,123714), (182252,123544), (181753,123799), (181740,124627), (182520,125175)]]

Same conclusions as in More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc:

Without and index, you must iterate through all the geometries (polygons and points).  
With a bounding spatial index (QgsSpatialIndex()), you iterate only through the geometries which have a chance to intersect with your current geometry ('filter' which can save a considerable amount of calculations and time...). 
You can also use other spatial index Python modules (rtree, Pyrtree or Quadtree) with PyQGIS as in Using a QGIS spatial index to speed up your code (with QgsSpatialIndex() and rtree)
but a Spatial Index is not a magic wand. When a very large part of the dataset has to be retrieved, a Spatial Index cannot give any speed benefit.

Other example in GIS se: How to find the nearest line to a point in QGIS? [duplicate]

evod · Answer

Apparently the only method to get good performance is to avoid or bundle calls to layer.getFeatures(), even if the filter is as simple as an fid.

Now, here’s the trap: calling getFeatures is expensive. If you call it
  on a vector layer, QGIS will be required to setup an new connection to
  the data store (the layer provider), create some query to return data,
  and parse each result as it is returned from the provider. This can be
  slow, especially if you’re working with some type of remote layer,
  such as a PostGIS table over a VPN connection.

source: http://nyalldawson.net/2016/10/speeding-up-your-pyqgis-scripts/

How to efficiently access the features returned by QgsSpatialIndex?

4 Answers

Add your own answers!

Ask a Question