TransWikia.com

How to select classes from a GeoTIFF world map and compute the area by regions defined in a shapefile

Geographic Information Systems Asked by Antonello on May 18, 2021

I have two files:

  • fileLUse: a GeoTIFF world map of land use classes
  • fileRegions: a shapefile of region boundaries

Both files are EPSG:4326, which, if I understood it well, is the WGS84 datum with geographic (not projected) coordinates, and both cover the whole world.

I need to select the land use classes from fileLUse whose value is in a certain range ([40-130]) and compute the area and the area ratio of this aggregate for each region in fileRegions.
My "output" would then be a CSV with the region name (from fileRegions), the total area, and the share of the aggregate land use.

My main problem is that the files are not projected, so I don’t really know how to give them a unit, as all the "equal area" projections I have seen are specific to a certain area of the world.

Which approach should I take? I could use QGIS, GRASS, R, Python… (I’m on Linux).

[EDIT]

What I have done so far:

  • Computed a 0/1 binary raster of the land cover classes in the range of interest:
gdal_calc.py -A fileLUse.tif --outfile=fileNatVegAreas.tif --calc="logical_and(A>=40,A<=140)"
  • Used rasterstats to get the mean/count for each polygon:
from rasterstats import zonal_stats
stats = zonal_stats("fileRegions.shp", "fileNatVegAreas.tif")

What I am left with:

  • stats is a list of dictionaries with the various statistics, e.g.
>>> stats[0]
{'min': 0.0, 'max': 1.0, 'mean': 0.5842002754571467, 'count': 739861}

How do I link my stats back to the field "RegionName" in my shapefile, and possibly export everything as a CSV?
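For reference, zonal_stats returns one dict per feature in the same order as the features in the shapefile, so a positional join is enough. A minimal sketch with stand-in values (instead of a real zonal_stats call and a real shapefile):

```python
import pandas as pd

# Stand-ins: in practice these come from zonal_stats(...) and
# gpd.read_file("fileRegions.shp") respectively.
stats = [{'mean': 0.58, 'count': 739861},
         {'mean': 0.31, 'count': 120345}]
regions = pd.DataFrame({'RegionName': ['RegionA', 'RegionB']})

# zonal_stats preserves feature order, so joining by position lines up
# each stats dict with its region.
out = regions.join(pd.DataFrame(stats))
out.to_csv('stats_by_region.csv', index=False)
print(list(out.columns))  # → ['RegionName', 'mean', 'count']
```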

One Answer

In the end I did it like this:

  1. Computed the 0/1 indicator variable of the range:
gdal_calc.py -A fileLUse.tif --outfile=natVegIndicator2019.tif --calc="logical_and(A>=40,A<=140)"

For some reason this created an 8 GB map out of the 300 MB original TIFF.
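The size blowup likely comes from gdal_calc.py defaulting to uncompressed Float64 output; a sketch of the same step forcing a byte data type and LZW compression (using gdal_calc.py's --type and --co options):

```shell
# --type=Byte stores the 0/1 mask in 1 byte per cell instead of 8 (Float64),
# and --co="COMPRESS=LZW" compresses the output tiles.
gdal_calc.py -A fileLUse.tif --outfile=natVegIndicator2019.tif \
    --calc="logical_and(A>=40,A<=140)" --type=Byte --co="COMPRESS=LZW"
```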

  2. Used the following simple Python script:
# Compute per-region statistics of the 0/1 indicator raster
from rasterstats import zonal_stats
import geopandas as gpd
import pandas as pd

vectorBoundariesFile = "gadm36_1.shp"
rasterDataFile       = "natVegIndicator2019.tif"

sqKmPerCell          = (300 * 300) / (1000 * 1000)  # 300 m cells -> km^2

stats   = zonal_stats(vectorBoundariesFile, rasterDataFile, stats=['mean', 'count', 'sum'])
regdf   = gpd.read_file(vectorBoundariesFile)

countCells  = pd.Series([i['count'] for i in stats])  # all cells in the region
sqKm        = countCells * sqKmPerCell

means   = pd.Series([i['mean'] for i in stats])  # share of cells in the range
sums    = pd.Series([i['sum'] for i in stats])   # number of cells in the range

data    = [regdf['GID_0'], regdf['NAME_0'], regdf['GID_1'], regdf['NAME_1'], means, sqKm, countCells, sums]
headers = ["GID_0", "NAME_0", "GID_1", "NAME_1", "natVegRatio", "SqKm", "countAllCells", "countForCell"]
out     = pd.concat(data, axis=1, keys=headers)

out.to_csv("vegIndex2019.csv")

Another (and perhaps better) approach is to just use step #2, passing the parameter categorical=True to the zonal_stats call (see the rasterstats user manual), and then compute the ratio of the desired range ex-post from the dataframe in Python.
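A sketch of that ex-post computation, using a stand-in for the {pixel_value: count} dicts that zonal_stats(..., categorical=True) returns per region:

```python
# Hypothetical categorical output for one region: pixel value -> cell count.
cat_stats = [{10: 50, 40: 120, 130: 30, 200: 100}]

ratios = []
for counts in cat_stats:
    total = sum(counts.values())
    # Cells whose land use class falls in the range of interest.
    in_range = sum(n for v, n in counts.items() if 40 <= v <= 140)
    ratios.append(in_range / total)

print(ratios)  # → [0.5]
```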

Note that the final SqKm could be biased and differ from official country-level statistics. I learned that even just "measuring an area" over a large part of the Earth is not a trivial task. However, using the counts of cells to obtain the ratio of different land classes should be a largely unbiased method.
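The bias mentioned above comes from the fact that a fixed degrees-per-cell grid covers less ground at high latitudes. A rough spherical-Earth sketch (the 0.0027° cell size, roughly 300 m at the equator, is an assumption for illustration):

```python
import math

def cell_area_km2(lat_deg, res_deg):
    """Approximate area of a res_deg x res_deg cell centred at lat_deg on a
    spherical Earth: the east-west side shrinks with cos(latitude)."""
    R = 6371.0  # mean Earth radius in km
    side = R * math.radians(res_deg)                      # north-south side, km
    return side * (side * math.cos(math.radians(lat_deg)))  # x east-west side

equator = cell_area_km2(0.0, 0.0027)   # ~0.09 km^2
lat60   = cell_area_km2(60.0, 0.0027)  # half of the equatorial area
print(round(lat60 / equator, 3))  # → 0.5
```

So a single "km² per cell" constant overstates the area of high-latitude regions, while ratios of cell counts within a region largely cancel this effect out.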

Answered by Antonello on May 18, 2021
