TransWikia.com

Issue using the S2 cloud probability mask on GEE

Geographic Information Systems Asked on September 29, 2021

I want to use the recently released Sentinel-2 cloud probability dataset to extract some reduced time series. I have tried porting the example JS code over to Python.

import datetime as dt
import warnings
import pandas as pd

# Import the Earth Engine API

import ee
ee.Initialize()

print(ee.__version__)

year = 2019
start_time = dt.datetime(year, 6, 1)
end_time = dt.datetime(year, 9, 1)

region = ee.Geometry.Point((-2.3, 39.48))
location = ee.Geometry.Polygon(coords=[[
                [ -2.3435211181640625, 39.48827731619338 ],
                [ -2.3369979858398438, 39.48827731619338 ],
                [ -2.3369979858398438, 39.493576333300965 ],
                [ -2.3435211181640625, 39.493576333300965 ],
                [ -2.3435211181640625, 39.48827731619338 ]]],
                proj="EPSG:4326", geodesic=False)

def maskClouds(img, MAX_CLOUD_PROBABILITY=65):
    clouds = ee.Image(img.get('cloud_mask')).select('probability')
    isNotCloud = clouds.lt(MAX_CLOUD_PROBABILITY)
    return img.updateMask(isNotCloud)

def maskEdges(s2_img):
    return s2_img.updateMask(
      s2_img.select('B8A').mask()).updateMask(s2_img.select('B9').mask())

s2Sr = (ee.ImageCollection('COPERNICUS/S2_SR')
        .filterBounds(region)
        .filterDate(start_time, end_time))

s2Clouds = (ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
            .filterBounds(region)
            .filterDate(start_time, end_time))


s2Sr = s2Sr.map(maskEdges)


## Join S2 SR with cloud probability dataset to add cloud mask.
s2SrWithCloudMask = ee.Join.saveFirst('cloud_mask').apply(
    s2Sr, s2Clouds,
    ee.Filter.equals(leftField='system:index', rightField='system:index'))

s2CloudMasked = ee.ImageCollection(s2SrWithCloudMask).map(maskClouds)

#dd = s2CloudMasked.getInfo()

# Reducer over a geometry
reduce_me = lambda image: ee.Feature(None,
                                     image.reduceRegion(
                                         ee.Reducer.mean(), location, 10)
                                    )

data = s2CloudMasked.map(reduce_me).getInfo()
# Convert to pandas data frame
df = pd.DataFrame([dict(xx['properties'], **{'id':xx['id']}) 
                for xx in data['features']]) 
df['time'] = pd.to_datetime([x.split("_")[0] for x in df.id]) 
df = df.drop_duplicates('time') 
df = df.set_index('time')
df

This appears to work, returning ~18 columns of data (including some obviously cloudy observations). If I extend the temporal period (e.g. change start_time to dt.datetime(year, 3, 1)), the processing just returns None instead of the average reflectance over the polygon, even for the same dates that worked fine with the shorter time period. The code works OK in JS, even with larger temporal windows:

var s2Sr = ee.ImageCollection('COPERNICUS/S2_SR');
var s2Clouds = ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY');

var START_DATE = ee.Date('2018-12-01');
var END_DATE = ee.Date('2019-12-01');
var MAX_CLOUD_PROBABILITY = 65;
var region = ee.Geometry.Point({coords:[-2.3, 39.48]});
var location = ee.Geometry.Polygon({coords:[[
                [ -2.3435211181640625, 39.48827731619338 ],
                [ -2.3369979858398438, 39.48827731619338 ],
                [ -2.3369979858398438, 39.493576333300965 ],
                [ -2.3435211181640625, 39.493576333300965 ],
                [ -2.3435211181640625, 39.48827731619338 ]]],
                geodesic:false})

Map.centerObject(region, 12);

function maskClouds(img) {
  var clouds = ee.Image(img.get('cloud_mask')).select('probability');
  var isNotCloud = clouds.lt(MAX_CLOUD_PROBABILITY);
  return img.updateMask(isNotCloud);
}

// Filter input collections by desired data range and region.
var criteria = ee.Filter.and(
    ee.Filter.bounds(region), ee.Filter.date(START_DATE, END_DATE));
s2Sr = s2Sr.filter(criteria);
s2Clouds = s2Clouds.filter(criteria);

// Join S2 SR with cloud probability dataset to add cloud mask.
var s2SrWithCloudMask = ee.Join.saveFirst('cloud_mask').apply({
  primary: s2Sr,
  secondary: s2Clouds,
  condition:
      ee.Filter.equals({leftField: 'system:index', rightField: 'system:index'})
});

var s2CloudMasked = ee.ImageCollection(
  s2SrWithCloudMask).map(maskClouds);


print(ui.Chart.image.series(
    s2CloudMasked, location, ee.Reducer.median()))

The Python API version is 0.1.225, and I had to downgrade from 0.1.228 as it was giving all sorts of weird errors. This could well be a bug, but since I’m a newbie with GEE, I thought I’d ask in case someone can advise 😉
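For reference, this is how I separate the dates whose reduction came back empty from the ones that worked, once the results are in pandas (a minimal sketch with made-up band values; the 'B4'/'B8' column names and dates are placeholders):

```python
import pandas as pd

# Hypothetical reduced output: one row per acquisition; None means
# reduceRegion() returned no value for that image (e.g. fully masked).
df = pd.DataFrame(
    {"B4": [0.12, None, 0.10], "B8": [0.30, None, 0.28]},
    index=pd.to_datetime(["2019-06-03", "2019-06-08", "2019-06-13"]),
)

# Dates where every band is null point at images the reducer skipped.
empty_dates = df.index[df.isna().all(axis=1)]

# Keep only rows with at least one valid band value.
valid = df.dropna(how="all")
```

Inspecting `empty_dates` at least tells you which acquisitions to look at in the Code Editor.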

One Answer

This will (eventually) work in Colab, but it is very, VERY slow, and I had to click run many times before I could see a result. If you are running in any other environment, make sure to update to the latest version of the client library. Here's a trick: run the Code Editor example and wait for tiles to start loading, so you know the cache is hot. Then run the Colab notebook at the same scale (I just eyeballed it). First, authenticate:

import ee
ee.Authenticate()
ee.Initialize()

Now you can run the rest:

# Sentinel-2 Level 1C data.  Bands B7, B8, B8A and B10 from this
# dataset are needed as input to CDI and the cloud mask function.
s2 = ee.ImageCollection('COPERNICUS/S2')
# Cloud probability dataset.  The probability band is used in
# the cloud mask function.
s2c = ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
# Sentinel-2 surface reflectance data for the composite.
s2Sr = ee.ImageCollection('COPERNICUS/S2_SR')

roi = ee.Geometry.Point([-122.4431, 37.7498])

# Dates over which to create a median composite.
start = ee.Date('2019-03-01')
end = ee.Date('2019-09-01')

# S2 L1C for Cloud Displacement Index (CDI) bands.
s2 = s2.filterBounds(roi).filterDate(start, end).select(['B7', 'B8', 'B8A', 'B10'])
# S2Cloudless for the cloud probability band.
s2c = s2c.filterDate(start, end).filterBounds(roi)
# S2 L2A for surface reflectance bands.
s2Sr = s2Sr.filterDate(start, end).filterBounds(roi).select(['B2', 'B3', 'B4', 'B5'])

# Join two collections on their 'system:index' property.
# The propertyName parameter is the name of the property
# that references the joined image.
def indexJoin(collectionA, collectionB, propertyName):
  joined = ee.ImageCollection(ee.Join.saveFirst(propertyName).apply(
    primary = collectionA,
    secondary = collectionB,
    condition = ee.Filter.equals(
      leftField = 'system:index',
      rightField = 'system:index'
    )
  ))
  # Merge the bands of the joined image.
  return joined.map(lambda image: image.addBands(ee.Image(image.get(propertyName))))

# Aggressively mask clouds and shadows.
def maskImage(image):
  # Compute the cloud displacement index from the L1C bands.
  cdi = ee.Algorithms.Sentinel2.CDI(image)
  s2c = image.select('probability')
  cirrus = image.select('B10').multiply(0.0001)

  # Assume low-to-mid atmospheric clouds to be pixels where probability
  # is greater than 65%, and CDI is less than -0.5. For higher atmosphere
  # cirrus clouds, assume the cirrus band is greater than 0.01.
  # The final cloud mask is one or both of these conditions.
  isCloud = s2c.gt(65).And(cdi.lt(-0.5)).Or(cirrus.gt(0.01))

  # Reproject is required to perform spatial operations at 20m scale.
  # 20m scale is for speed, and assumes clouds don't require 10m precision.
  isCloud = isCloud.focal_min(3).focal_max(16)
  isCloud = isCloud.reproject(crs = cdi.projection(), scale = 20)

  # Project shadows from clouds we found in the last step. This assumes we're working in
  # a UTM projection.
  shadowAzimuth = ee.Number(90).subtract(ee.Number(image.get('MEAN_SOLAR_AZIMUTH_ANGLE')))

  # With the following reproject, the shadows are projected 5km.
  isCloud = isCloud.directionalDistanceTransform(shadowAzimuth, 50)
  isCloud = isCloud.reproject(crs = cdi.projection(), scale = 100)

  isCloud = isCloud.select('distance').mask()
  return image.select(['B2', 'B3', 'B4']).updateMask(isCloud.Not())

# Join the cloud probability dataset to surface reflectance.
withCloudProbability = indexJoin(s2Sr, s2c, 'cloud_probability')
# Join the L1C data to get the bands needed for CDI.
withS2L1C = indexJoin(withCloudProbability, s2, 'l1c')

# Map the cloud masking function over the joined collection.
masked = ee.ImageCollection(withS2L1C.map(maskImage))

# Take the median, specifying a tileScale to avoid memory errors.
median = masked.reduce(ee.Reducer.median(), 8)

check = median.reduceRegion(ee.Reducer.first(), roi, 30)
# print(check.getInfo()) # OK

# Display the results.
viz = {'bands': ['B4_median', 'B3_median', 'B2_median'], 'min': 0, 'max': 3000}

map_id = median.getMapId(viz)

import folium

map = folium.Map(location=[37.6413, -122.2582])
folium.TileLayer(
  tiles=map_id['tile_fetcher'].url_format,
  attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
  overlay=True,
  name='median demo',
).add_to(map)
map.add_child(folium.LayerControl())
map

Note that I put a reduceRegion() check in there, which you should also do for a point in your study area. You can do the same thing with size() etc. to verify that there is a non-zero number of images in your region.

Answered by Nicholas Clinton on September 29, 2021
