TransWikia.com

Raster to GeoPandas

Geographic Information Systems Asked by Gregor D on March 31, 2021

I have written a little script to raster to GeoPandas dataframe:

with rasterio.open("INPUT FILE", 'r') as raster: 
    df = pd.DataFrame()
    puntos_calados_combinados=[]
    puntos=[]
    xmin,ymax=raster.transform* (0, 0)
    arr = raster.read(1)  # read all raster values
    for row in range(0,raster.width ):
        for col in range(0,raster.height):
            x=xmin+ (0.5*raster.transform[0] + row*raster.transform[0])
            y=ymax+ (0.5*raster.transform[4] + col*raster.transform[4])
            valor= arr[col,row]
            if arr[col,row]>0:
                x_y=[x,y]
                puntos_calados_combinados.append(x_y)

But it’s extremely slow, any idea to improve it?

2 Answers

Rather than looping through each pixel, you could try creating numpy arrays of X & Y coordinates then dumping those (plus the array of the raster values) into a DataFrame, then converting to a GeoDataFrame.

Below is a runnable example (Note, you do not want to do this with large rasters, you can run out of memory...):

import geopandas as gpd
import numpy as np
import pandas as pd
import rasterio as rio

with rio.Env():
    with rio.open('https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/float32.tif') as src:
        crs = src.crs

        # create 1D coordinate arrays (coordinates of the pixel center)
        xmin, ymax = np.around(src.xy(0.00, 0.00), 9)  # src.xy(0, 0)
        xmax, ymin = np.around(src.xy(src.height-1, src.width-1), 9)  # src.xy(src.width-1, src.height-1)
        x = np.linspace(xmin, xmax, src.width)
        y = np.linspace(ymax, ymin, src.height)  # max -> min so coords are top -> bottom



        # create 2D arrays
        xs, ys = np.meshgrid(x, y)
        zs = src.read(1)

        # Apply NoData mask
        mask = src.read_masks(1) > 0
        xs, ys, zs = xs[mask], ys[mask], zs[mask]

data = {"X": pd.Series(xs.ravel()),
        "Y": pd.Series(ys.ravel()),
        "Z": pd.Series(zs.ravel())}

df = pd.DataFrame(data=data)
geometry = gpd.points_from_xy(df.X, df.Y)
gdf = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)

print(gdf.head())

Output:

          X          Y      Z                        geometry
0  440750.0  3751290.0  107.0  POINT (440750.000 3751290.000)
1  440810.0  3751290.0  123.0  POINT (440810.000 3751290.000)
2  440870.0  3751290.0  132.0  POINT (440870.000 3751290.000)
3  440930.0  3751290.0  115.0  POINT (440930.000 3751290.000)
4  440990.0  3751290.0  132.0  POINT (440990.000 3751290.000)

Correct answer by user2856 on March 31, 2021

Based on: https://gis.stackexchange.com/a/358057/144357

rds = rioxarray.open_rasterio("input.tif")
rds.name = "data"
df = rds.squeeze().to_dataframe().reset_index()
geometry = gpd.points_from_xy(df.x, df.y)
gdf = gpd.GeoDataFrame(df, crs=rds.rio.crs, geometry=geometry)

Answered by snowman2 on March 31, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP