TransWikia.com

Finding closest point to shapefile coastline Python

Geographic Information Systems Asked by DannyTG on August 3, 2020

I have a dataset consisting of 2,2 million houses in a Norway, and I want to find the distance to the coast for each dwelling. The location of the house is made up by latitude and longitude coordinates, and I have downloaded a shapefile of Europe with all the coastlines from (Could not find a shapefile for only Norway):
https://www.eea.europa.eu/data-and-maps/data/eea-coastline-for-analysis-1/gis-data/europe-coastline-shapefile

Which have the coastline in both MULTILINESTRING and POLYGON format.

Then I try using the nearest point function from the shapely library, but it only assign the same value for each point, and it takes too long to run.

import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points

def findClosestCoastline(df):

   #assign Point from the coordinates of dwelling
   df['point'] = [Point(x, y) for x, y in zip(df['coord_x'].astype(float), 
                                              df['coord_y'].astype(float))]
   #Load shapefile with MULTILINESTRING format
   map_2= gpd.read_file("Europe_coastline.shp")

   for i in range(df)
      #TODO: Fix this, get the closest point from the MULTISTRINGLINE
      np = nearest_points(map_2.iloc[0]['geometry'], df.iloc[i]['point'])[0]
      df.loc[i, 'closestPoint_Coast'] = np

      #TODO generate the distance in KM
      df.loc[i, 'distance_To_Coast'] =

return df

One Answer

The fastest way to do this is to create an image where the value of each pixel is the distance to the coastline, then pluck values from that image for each point. Of course, that means the precision of the distance is only going to be as good as the resolution of the image, so if you want want less than a few meters accuracy, the image would have to be huge.

Using GeoPandas, you can map across the points in parallel calling the shapely distance method to the coastline file. Since there is one feature in the coastline file point.distance(coastlines) will give you the distance to to coastline for each point. This is still fairly slow though, on a 4 core machine it runs around 340 features per second for me. That'll take some time to rip through 2.5MM.

import geopandas as gpd
from shapely.geometry import Point, box
from random import uniform
from concurrent.futures import ThreadPoolExecutor
from tqdm.notebook import tqdm

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

#single geom for Norway
norway = world[world["name"]=="Norway"].dissolve(by='name').iloc[0].geometry

#single geom for the coastline
coastline = gpd.clip(gpd.read_file("Europe_coastline.shp").to_crs('EPSG:4326'),
                     norway.buffer(0.25)).iloc[0].geometry

def make_point(id):
    point = None
    while point is None or not norway.contains(point):
        point = Point(uniform(norway.bounds[0],norway.bounds[2]),
                      uniform(norway.bounds[1],norway.bounds[3]))
    return {"id": id, "geometry": point}

def compute_distance(point):
    point['dist_to_coastline'] = point['geometry'].distance(coastline)
    return point

with ThreadPoolExecutor(max_workers=4) as tpe:
    points = list(tqdm(tpe.map(make_point, range(10000)), desc="generating points", total=10000))
    result = list(tqdm(tpe.map(compute_distance, points), desc="computing distances", total=len(points)))

gdf = gpd.GeoDataFrame.from_records(result)

ax = gpd.GeoDataFrame.from_records([{"geometry":coastline}]).plot(figsize=(18,18))
ax = gdf.plot(ax=ax, column='dist_to_coastline', cmap='rainbow')
ax

enter image description here

Correct answer by Jeremy Malczyk on August 3, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP