TransWikia.com

Problem with spatial join of large shapefiles

Geographic Information Systems Asked by Lucas_Edenhofer on January 1, 2021

I want to combine the attributes of five shapefiles into one shapefile using QGIS. To prepare the data, I clipped the five shapefiles to the boundaries of my investigation area with Vector > Geoprocessing Tools > Clip and ran Fix Geometries to avoid invalid polygons. The shapefiles differ in polygon shape and polygon count, but they all have the same CRS.

I used Vector > Data Management Tools > Join Attributes by Location to join the shapefiles.
After I joined the third shapefile, the file size had already reached around 1 GB. I have read that the maximum shapefile size is 2 GB. While joining the fourth shapefile, the computation took a very long time and my computer crashed about halfway through. I tried a second time, but it happened again.

My three shapefiles look like this:

1. Temperature_Precipation_Slope with 40,423 features (three shapefiles already joined)

Field       Type      Length     Precision
Precipitat  Integer   4          0
Temperatur  Integer   2          0
Slope       Integer   1          0

2. BUEK200 with 101 features

Field       Type      Length     Precision
TKLE_NR     Integer   6          0

3. CLC10 with 2675 features

Field       Type      Length     Precision
clc         Integer   3          0

I have already reduced and aggregated my data by grouping attribute values into ranges, which lowers the number of polygons in each shapefile. I also changed the type and size of the fields with Refactor Fields.

I was able to add the fourth shapefile and save the result as a GeoPackage, but the GeoPackage was already 17 GB.
While adding the fifth shapefile and saving the result as a GeoPackage, my laptop crashed again. I guess I don't have enough memory to save such a large file.

I am now trying PostGIS (which I haven't used before) and have managed to load my shapefiles into two tables, Temperature_Precipation_Slope and Buek200_CLC10.

If I understand correctly, I can't use ST_Union, because the shapefiles don't share a common attribute.

SELECT * FROM "Temperature_Precipation_Slope";
SELECT * FROM "Buek200_CLC10";

However, I don't understand how to perform the spatial join by location.

2 Answers

As I didn't have enough memory available and couldn't aggregate my data any further, I used PostGIS to solve my problem. I loaded the partly joined shapefiles into two PostGIS tables, Temperature_Precipation_Slope and Buek200_CLC10. There I used the following query to perform a spatial join of the tables:

-- One output row per pair of intersecting polygons; only attributes are kept, no geometry.
CREATE TABLE gis_data AS
SELECT tps.precipitat, tps.temperatur, tps.slope, bc.tkle_nr, bc.clc
FROM "Temperature_Precipation_Slope" AS tps
JOIN "Buek200_CLC10" AS bc
  ON ST_Intersects(tps.geom, bc.geom);
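
The query above stores only the joined attributes. If the overlapping polygons themselves are also needed, one possible variant is sketched below. It is not part of the original answer: the index names and the output table gis_data_geom are hypothetical, and it assumes both tables hold polygons in a column named geom with the same SRID. If the shapefile loader already created spatial indexes, the two CREATE INDEX statements are redundant and can be skipped.

```sql
-- GiST indexes let ST_Intersects prefilter candidate pairs by bounding box.
CREATE INDEX IF NOT EXISTS tps_geom_idx
    ON "Temperature_Precipation_Slope" USING GIST (geom);
CREATE INDEX IF NOT EXISTS bc_geom_idx
    ON "Buek200_CLC10" USING GIST (geom);
ANALYZE "Temperature_Precipation_Slope";
ANALYZE "Buek200_CLC10";

-- Keep the shared area of each intersecting pair as a (multi)polygon.
CREATE TABLE gis_data_geom AS
SELECT *
FROM (
    SELECT tps.precipitat, tps.temperatur, tps.slope, bc.tkle_nr, bc.clc,
           -- 3 = polygons: drops points/lines left by pairs that only touch
           ST_Multi(ST_CollectionExtract(
               ST_Intersection(tps.geom, bc.geom), 3)) AS geom
    FROM "Temperature_Precipation_Slope" AS tps
    JOIN "Buek200_CLC10" AS bc
      ON ST_Intersects(tps.geom, bc.geom)
) AS pairs
WHERE NOT ST_IsEmpty(geom);
```

Pairs that merely share a boundary satisfy ST_Intersects but have no common area, which is why the empty results are filtered out at the end.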

Correct answer by Lucas_Edenhofer on January 1, 2021

You don't have to use PostGIS. There are other ways to get around the file size limit. A GeoPackage is probably the easiest option.

The size of the output file is driven by:

  • Number of records
  • Number of attributes per record

Then, to a lesser extent:

  • Complexity of attributes (e.g. 10 integers use more space than 5 integers, and a 255-character string field uses more than a 20-character one)
  • Geometry complexity (number of vertices)
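
For readers who already have the layers in PostGIS, the record count that drives the output size can be checked before anything is written. This is a minimal sketch that assumes the table names and the geom column used in the accepted answer on this page; it is not something this answer proposed.

```sql
-- Each intersecting pair becomes one record in the joined result,
-- so this count is the number of rows the join would produce.
SELECT count(*) AS intersecting_pairs
FROM "Temperature_Precipation_Slope" AS tps
JOIN "Buek200_CLC10" AS bc
  ON ST_Intersects(tps.geom, bc.geom);

-- Once a result table exists, its on-disk size (including indexes) can be read with:
SELECT pg_size_pretty(pg_total_relation_size('gis_data')) AS table_size;
```

If the pair count is far larger than either input, the layers overlap many-to-many and aggregating first is likely to pay off.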

Perhaps add the following to your question:

  • number of records
  • number of attributes per record.
  • geometry type

At the end of the day, if you have 1 million records, then you have 1 million records. If you don't need that many, think about aggregating your data (Dissolve by attribute) before processing it any further.
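
In QGIS this aggregation is the Dissolve tool. For data that is already in PostGIS, a rough equivalent is ST_Union used as an aggregate with GROUP BY. The sketch below assumes the Buek200_CLC10 table and its tkle_nr, clc and geom columns from the accepted answer; the output table name is hypothetical.

```sql
-- Dissolve: merge all polygons that share the same attribute values
-- into one (multi)polygon per combination.
CREATE TABLE buek200_clc10_dissolved AS
SELECT tkle_nr, clc, ST_Union(geom) AS geom
FROM "Buek200_CLC10"
GROUP BY tkle_nr, clc;
```

Used this way, ST_Union groups features within a single table by their own attributes, so it is an aggregation step rather than a way to join two tables.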

Answered by nr_aus on January 1, 2021
