Working with surftemp-sst data¶

  • Funding: ESA and AWS Public Dataset Program

Credits: Tutorial development¶

  • Niall McCarroll
  • Dr. Chelle Gentemann - Twitter - Farallon Institute

Credits: Creating of the Zarr CCI SST dataset.¶

  • Niall McCarroll
  • Dr. Chris Merchant

Tutorial 2 - Worked Example - Marine Heatwaves¶

This tutorial provides a more in-depth example of working with the Sea Surface Temperature Daily Analysis dataset: European Space Agency Climate Change Initiative product version 2.1, hosted by AWS opendata.

A brief summary of this dataset is:

Global daily-mean sea surface temperatures, presented on a 0.05° latitude-longitude grid, with gaps between available daily observations filled by statistical means, spanning late 1981 to recent time.

Synthesised from multiple Earth orbiting satellites carrying infrared imagers from thousands of billions of individual measurements. Underlying observation resolution ranges from 1 to 20 km, and after gap filling the typical feature resolution is ~20 km. Suitable for large-scale oceanographic meteorological and climatological applications, such as evaluating or constraining environmental models or case-studies of marine heat wave events.

Adhering to community data standards and names. Includes temperature uncertainty information and auxiliary information about land-sea fraction and sea-ice coverage. To understand the data for your application, read the paper [1] using www.nature.com/articles/s41597-019-0236-x to cite in any published usage.

The v2.1 record is known to have biases associated with desert dust aerosol and erratic calibration of early-record sensors [1]. Adjustments to reduce these biases and include additional uncertainty in these effects have been developed, as described in [2] and are applied to this data. These adjustments operate on monthly and >5 degree time-space scales.

[1] Merchant, C.J., Embury, O., Bulgin, C.E., Block, T., Corlett, G.K., Fiedler, E., Good, S.A., Mittaz, J., Rayner, N.A., Berry, D., Eastwood, S., Taylor, M., Tsushima, Y., Waterfall, A., Wilson, R. and Donlon, C. (2019), Satellite-based time-series of sea-surface temperature since 1981 for climate applications. Scientific Data 6, 223, doi:10.1038/s41597-019-0236-x

[2] Merchant, C.J. and Embury, O. (2020) Adjusting for desert-dust-related biases in a climate data record of sea surface temperature. Remote Sensing, 12 (16). 2554. ISSN 2072-4292 doi:10.3390/rs12162554

Although the entire datset is over 300Gb in size, a subset of the data can be downloaded selectively to inspect particular areas, locations or time periods. This tutorial will provide a worked example of doing this in order to investigate the occurence of marine heatwaves.

1.0 Importing packages¶

In [1]:
import fsspec
import xarray as xr
import matplotlib
import numpy as np

1.1 Opening the SST dataset¶

The following code will load the data and create an SST dataset based on the data stored on AWS opendata

In [2]:
# if you are running on AWS cloud you can use the line below and it will be faster to read and access the data
#sst_ds = xr.open_zarr(fsspec.get_mapper("s3://surftemp-sst/data/sst.zarr"),consolidated=True)

# if you are running anywhere other than AWS cloud (eg. your local computer or mybinder.org) this is the line that will work
sst_ds = xr.open_zarr('https://surftemp-sst.s3.us-west-2.amazonaws.com/data/sst.zarr')

sst_ds
Out[2]:
<xarray.Dataset>
Dimensions:                   (time: 14367, lat: 3600, lon: 7200, bnds: 2)
Coordinates:
  * lat                       (lat) float32 -89.97 -89.93 -89.88 ... 89.93 89.97
  * lon                       (lon) float32 -180.0 -179.9 -179.9 ... 179.9 180.0
  * time                      (time) datetime64[ns] 1981-09-01T12:00:00 ... 2...
Dimensions without coordinates: bnds
Data variables:
    analysed_sst              (time, lat, lon) float32 dask.array<chunksize=(50, 360, 720), meta=np.ndarray>
    analysed_sst_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(50, 360, 720), meta=np.ndarray>
    lat_bnds                  (lat, bnds) float32 dask.array<chunksize=(360, 2), meta=np.ndarray>
    lon_bnds                  (lon, bnds) float32 dask.array<chunksize=(720, 2), meta=np.ndarray>
    mask                      (time, lat, lon) float32 dask.array<chunksize=(50, 360, 720), meta=np.ndarray>
    sea_ice_fraction          (time, lat, lon) float32 dask.array<chunksize=(50, 360, 720), meta=np.ndarray>
Attributes: (12/61)
    Conventions:                     CF-1.5, Unidata Observation Dataset v1.0
    Metadata_Conventions:            Unidata Dataset Discovery v1.0
    acknowledgment:                  Funded by the Copernicus Climate Change ...
    cdm_data_type:                   grid
    comment:                         These data were produced by the Met Offi...
    contact:                         http://copernicus-support.ecmwf.int
    ...                              ...
    time_coverage_resolution:        P1D
    time_coverage_start:             20201215T000000Z
    title:                           C3S SST L4 product
    tracking_id:                     7fdf2639-26e5-4d4f-a60e-0bcfc9744204
    uuid:                            7fdf2639-26e5-4d4f-a60e-0bcfc9744204
    westernmost_longitude:           -180.0
xarray.Dataset
    • time: 14367
    • lat: 3600
    • lon: 7200
    • bnds: 2
    • lat
      (lat)
      float32
      -89.97 -89.93 ... 89.93 89.97
      axis :
      Y
      bounds :
      lat_bnds
      comment :
      Latitude geographical coordinates,WGS84 projection
      long_name :
      Latitude
      reference_datum :
      geographical coordinates, WGS84 projection
      standard_name :
      latitude
      units :
      degrees_north
      valid_max :
      90.0
      valid_min :
      -90.0
      array([-89.975, -89.925, -89.875, ...,  89.875,  89.925,  89.975],
            dtype=float32)
    • lon
      (lon)
      float32
      -180.0 -179.9 ... 179.9 180.0
      axis :
      X
      bounds :
      lon_bnds
      comment :
      Longitude geographical coordinates,WGS84 projection
      long_name :
      Longitude
      reference_datum :
      geographical coordinates, WGS84 projection
      standard_name :
      longitude
      units :
      degrees_east
      valid_max :
      180.0
      valid_min :
      -180.0
      array([-179.975, -179.925, -179.875, ...,  179.875,  179.925,  179.975],
            dtype=float32)
    • time
      (time)
      datetime64[ns]
      1981-09-01T12:00:00 ... 2020-12-...
      axis :
      T
      bounds :
      time_bnds
      comment :
      long_name :
      reference time of sst field
      standard_name :
      time
      array(['1981-09-01T12:00:00.000000000', '1981-09-02T12:00:00.000000000',
             '1981-09-03T12:00:00.000000000', ..., '2020-12-29T12:00:00.000000000',
             '2020-12-30T12:00:00.000000000', '2020-12-31T12:00:00.000000000'],
            dtype='datetime64[ns]')
    • analysed_sst
      (time, lat, lon)
      float32
      dask.array<chunksize=(50, 360, 720), meta=np.ndarray>
      depth :
      20 cm
      long_name :
      analysed sea surface temperature
      source :
      ATSR<1,2>-ESACCI-L3U-v2.0, AATSR-ESACCI-L3U-v2.0, AVHRR<07,09,11,12,14,15,16,17,18,19>_G-ESACCI-L3U-v2.0, AVHRRMTA_G-ESACCI-L3U-v2.0
      standard_name :
      sea_water_temperature
      units :
      kelvin
      valid_max :
      4500
      valid_min :
      -300
      Array Chunk
      Bytes 1.35 TiB 49.44 MiB
      Shape (14367, 3600, 7200) (50, 360, 720)
      Count 28801 Tasks 28800 Chunks
      Type float32 numpy.ndarray
      7200 3600 14367
    • analysed_sst_uncertainty
      (time, lat, lon)
      float32
      dask.array<chunksize=(50, 360, 720), meta=np.ndarray>
      long_name :
      estimated error standard deviation of analysed_sst
      standard_name :
      sea_water_temperature standard_error
      units :
      kelvin
      valid_max :
      32767
      valid_min :
      0
      Array Chunk
      Bytes 1.35 TiB 49.44 MiB
      Shape (14367, 3600, 7200) (50, 360, 720)
      Count 28801 Tasks 28800 Chunks
      Type float32 numpy.ndarray
      7200 3600 14367
    • lat_bnds
      (lat, bnds)
      float32
      dask.array<chunksize=(360, 2), meta=np.ndarray>
      comment :
      Contains the northern and southern boundaries of the grid cells.
      long_name :
      Latitude cell boundaries
      reference_datum :
      geographical coordinates, WGS84 projection
      valid_max :
      90.0
      valid_min :
      -90.0
      Array Chunk
      Bytes 28.12 kiB 2.81 kiB
      Shape (3600, 2) (360, 2)
      Count 11 Tasks 10 Chunks
      Type float32 numpy.ndarray
      2 3600
    • lon_bnds
      (lon, bnds)
      float32
      dask.array<chunksize=(720, 2), meta=np.ndarray>
      comment :
      Contains the eastern and western boundaries of the grid cells.
      long_name :
      Longitude cell boundaries
      reference_datum :
      geographical coordinates, WGS84 projection
      valid_max :
      180.0
      valid_min :
      -180.0
      Array Chunk
      Bytes 56.25 kiB 5.62 kiB
      Shape (7200, 2) (720, 2)
      Count 11 Tasks 10 Chunks
      Type float32 numpy.ndarray
      2 7200
    • mask
      (time, lat, lon)
      float32
      dask.array<chunksize=(50, 360, 720), meta=np.ndarray>
      comment :
      b0: 1=grid cell is open sea water b1: 1=grid cell is land b2: 1=grid cell is lake surface b3: 1=grid cell is sea ice b4-b7: reserved for future grid mask data
      flag_masks :
      [1, 2, 4, 8, 16]
      flag_meanings :
      water land optional_lake_surface sea_ice optional_river_surface
      long_name :
      sea/land/lake/ice field composite mask
      source :
      NAVOCEANO_landmask_v1.0 EUMETSAT_OSI-SAF_icemask ARCLake_lakemask
      valid_max :
      31
      valid_min :
      1
      Array Chunk
      Bytes 1.35 TiB 49.44 MiB
      Shape (14367, 3600, 7200) (50, 360, 720)
      Count 28801 Tasks 28800 Chunks
      Type float32 numpy.ndarray
      7200 3600 14367
    • sea_ice_fraction
      (time, lat, lon)
      float32
      dask.array<chunksize=(50, 360, 720), meta=np.ndarray>
      comment :
      Sea ice area fraction
      long_name :
      sea ice area fraction
      source :
      EUMETSAT_OSI-SAF-ICE-OSI-450-v2.0, EUMETSAT_OSI-SAF-ICE-OSI-430-v1.2
      standard_name :
      sea_ice_area_fraction
      units :
      1
      valid_max :
      100
      valid_min :
      0
      Array Chunk
      Bytes 1.35 TiB 49.44 MiB
      Shape (14367, 3600, 7200) (50, 360, 720)
      Count 28801 Tasks 28800 Chunks
      Type float32 numpy.ndarray
      7200 3600 14367
  • Conventions :
    CF-1.5, Unidata Observation Dataset v1.0
    Metadata_Conventions :
    Unidata Dataset Discovery v1.0
    acknowledgment :
    Funded by the Copernicus Climate Change Service. Use of these data should acknowledge the Copernicus Climate Change Service
    cdm_data_type :
    grid
    comment :
    These data were produced by the Met Office as part of the C3S project. WARNING Some applications are unable to properly handle signed byte values. If values are encountered > 127, please subtract 256 from this reported value
    contact :
    http://copernicus-support.ecmwf.int
    corrections :
    Post-hoc corrections have been applied from Merchant and Embury (2020) https://doi.org/10.3390/rs12162554
    creation_date :
    2021-02-10T10:53:38Z
    creator_email :
    science.leader@esa-sst-cci.org
    creator_name :
    Copernicus Climate Change Service (C3S)
    creator_processing_institution :
    These data were produced on the JASMIN infrastructure at STFC as part of the C3S project
    creator_url :
    https://climate.copernicus.eu/
    date_created :
    20210210T105338Z
    easternmost_longitude :
    180.00001525878906
    file_quality_level :
    3
    gds_version_id :
    2.0
    geospatial_lat_max :
    90.0
    geospatial_lat_min :
    -90.0
    geospatial_lat_resolution :
    0.05000000074505806
    geospatial_lat_units :
    degrees_north
    geospatial_lon_max :
    180.0
    geospatial_lon_min :
    -180.0
    geospatial_lon_resolution :
    0.05000000074505806
    geospatial_lon_units :
    degrees_east
    geospatial_vertical_max :
    -0.20000000298023224
    geospatial_vertical_min :
    -0.20000000298023224
    history :
    Created using OSTIA reanalysis system v3.0
    id :
    OSTIA-C3S-L4-GLOB-v2.0
    institution :
    C3S
    keywords :
    Oceans > Ocean Temperature > Sea Surface Temperature
    keywords_vocabulary :
    NASA Global Change Master Directory (GCMD) Science Keywords
    license :
    Creative Commons Licence by attribution (https://creativecommons.org/licenses/by/4.0/)
    metadata_link :
    http://www.esa-cci.org
    naming_authority :
    org.ghrsst
    netcdf_version_id :
    4.2.1.1 of Oct 19 2012 14:25:16
    northernmost_latitude :
    90.0
    platform :
    NOAA-19, MetOpA, Sentinel-3A
    processing_level :
    L4
    product_specification_version :
    SST_CCI-PSD-UKMO-201-Issue-H
    product_version :
    2.0
    project :
    Copernicus Climate Change Service (C3S)
    publisher_email :
    copernicus-support@ecmwf.int
    publisher_name :
    Copernicus Climate Data Store
    publisher_url :
    https://climate.copernicus.eu/
    references :
    http://www.esa-sst-cci.org
    sensor :
    AVHRR_GAC, SLSTR
    source :
    AVHRR19_G-C3S-L3U-ICDR-v2.0 AVHRRMTA_G-C3S-L3U-ICDR-v2.0 SLSTRA-C3S-L3U-ICDR-v2.0, OSI-430
    southernmost_latitude :
    -90.0
    spatial_resolution :
    0.05 degree
    standard_name_vocabulary :
    NetCDF Climate and Forecast (CF) Metadata Convention
    start_time :
    20201215T000000Z
    stop_time :
    20201216T000000Z
    summary :
    OSTIA L4 product from the C3S project, produced using OSTIA reanalysis sytem v3.0
    time_coverage_duration :
    P1D
    time_coverage_end :
    20201216T000000Z
    time_coverage_resolution :
    P1D
    time_coverage_start :
    20201215T000000Z
    title :
    C3S SST L4 product
    tracking_id :
    7fdf2639-26e5-4d4f-a60e-0bcfc9744204
    uuid :
    7fdf2639-26e5-4d4f-a60e-0bcfc9744204
    westernmost_longitude :
    -180.0

1.2 Select an area to study¶

We'll look at an area of the great barrier reef, near Heron Island

In [3]:
lat_min = -25
lat_max = -21
lon_min = 150
lon_max = 154

Lets look at the sea surface temperatures in this region on a specific day

In [4]:
sst_ds["analysed_sst"].sel(time="2000-01-01").sel(lat=slice(lat_min,lat_max),lon=slice(lon_min,lon_max)) \
   .plot(cmap="coolwarm")
Out[4]:
<matplotlib.collections.QuadMesh at 0x7fa096f8ba90>

1.2 Create a climatology¶

We will choose a 28-year period for our climatology (the years 1982-2009) and compute the 90th percentile SSTs for each day of the year using an 11 day sliding window, over the given region only. An extra "binomial smoothing" function is also applied on the time axis to the 90th percentiles.

In [5]:
climatology_start = "1982-01-01"
climatology_end = "2009-12-31"
In [6]:
climatology_raw_data = sst_ds["analysed_sst"] \
     .sel(time=slice(climatology_start,climatology_end)) \
     .sel(lat=slice(lat_min,lat_max),lon=slice(lon_min,lon_max))

Compute the 90th percentile for each day of the year over the 11 day window

In [7]:
climatology_rolling = climatology_raw_data.chunk({"lat": 10, "lon": 10, "time":-1}) \
    .pad(pad_width={"time":5},mode="wrap") \
    .rolling(time=11,center=True) \
    .construct("window") \
    .isel(time=slice(5,-5))
In [8]:
climatology_90th_percentile = climatology_rolling.chunk({"time":-1}) \
    .groupby("time.dayofyear") \
    .quantile(0.9,dim=["time","window"]).load()
/Users/cv922550/miniconda3/envs/sstenv/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1396: RuntimeWarning: All-NaN slice encountered
  overwrite_input, interpolation)

Perform the binomial smoothing

In [9]:
binomial_coefficients = (np.poly1d([0.5, 0.5]) ** 30).coeffs
climatology_90th_percentile_smoothed = (climatology_90th_percentile \
    .pad(pad_width={"dayofyear":15},mode="wrap")                                    
    .rolling(dayofyear=31,center=True) \
    .construct("window") \
    .isel(dayofyear=slice(15,-15)) \
    * binomial_coefficients).sum(axis=-1,skipna=False)

Display a time series for a cell in the middle of the climatology region

In [10]:
climatology_90th_percentile_smoothed.sel(lat=(lat_min+lat_max)/2,lon=(lon_min+lon_max)/2,method="nearest").plot()
Out[10]:
[<matplotlib.lines.Line2D at 0x7fa08edd2550>]

1.3 Apply the climatology to SST data over a given period to look for marine heatwaves¶

We can now compare sea surface temperatures against our climatology. We'll use a simple way to identify marine heatwaves if the temperature is above the 90th percentile for 5 consecutive days, and search our region over a particular time period.

In [11]:
# define the period of interest

period = ("2018-01-01","2018-12-31")
In [12]:
import datetime

time_window_days = 5

start_date = datetime.datetime.strptime(period[0],"%Y-%m-%d")
end_date = datetime.datetime.strptime(period[1],"%Y-%m-%d")
# need to widen the time range by +/- time_window_days to get accurate heatwave information for the entire period
data_start_date = start_date - datetime.timedelta(days=time_window_days)
data_end_date = end_date +  datetime.timedelta(days=time_window_days)

sst_data = sst_ds["analysed_sst"].sel(time=slice(start_date,end_date)) \
     .sel(lat=slice(lat_min,lat_max),lon=slice(lon_min,lon_max)) \
     .load()
In [13]:
# create a new dataarray with dayofyear instead of time 
sst_data["dayofyear"] = sst_data.time.dt.dayofyear
sst_data_doy = xr.DataArray(data=sst_data.data, dims=["dayofyear","lat", "lon"], coords=[sst_data.dayofyear,
                                                                                     sst_data.lat,
                                                                                     sst_data.lon])
In [14]:
n_lats = len(sst_data_doy["lat"])
n_lons = len(sst_data_doy["lon"])
n_days = len(sst_data_doy["dayofyear"])
In [15]:
# create an array to hold a count of the numbers of consecutive hot days
# we will step through the time period and update this array
hot_day_count = np.zeros(shape=(n_lats,n_lons))
In [16]:
mhw_flags = np.zeros(shape=(n_days,n_lats,n_lons))
for day in range(n_days):
    doy = sst_data_doy["dayofyear"].values[day]
    # update the hot_day_count for this day
    hot_day_count = xr.where(sst_data_doy.sel(dayofyear=doy) > climatology_90th_percentile_smoothed \
                             .sel(dayofyear=doy),hot_day_count+1,0).values
    # wherever there are at least time_window_days of hot weather at a cell, flag the current day and the 
    # preceeding (time_window_days-1) days as heat wave days
    mhw = hot_day_count >= time_window_days

    for prev in range(max(0,day-time_window_days+1),day+1):
        mhw_flags[prev,:,:] = np.where(mhw,1,mhw_flags[prev,:,:])
    
mhw_flags = np.where(np.isnan(sst_data),np.nan,mhw_flags)

mhw = xr.DataArray(data=mhw_flags,dims=["time","lat", "lon"], 
                   coords=[sst_data.time, sst_data.lat, sst_data.lon])

# clip the dataset back to the original time period
mhw = mhw.sel(time=slice(period[0],period[1]))

Check the fraction of marine heatwave cells, firstly over time, then over space

In [17]:
mhw.mean(dim=["lat","lon"],skipna=True).plot()
Out[17]:
[<matplotlib.lines.Line2D at 0x7fa082e7ded0>]
In [18]:
mhw.mean(dim=["time"],skipna=True).plot()
Out[18]:
<matplotlib.collections.QuadMesh at 0x7fa083358290>

Plot marine heatwave areas in red for particular dates

In [19]:
# check which areas are experiencing a marine heatwave on a particular date
mhw.sel(time="2018-07-01").plot(cmap="coolwarm",add_colorbar=False)
Out[19]:
<matplotlib.collections.QuadMesh at 0x7fa083664690>
In [20]:
# plot the development of a marine heatwave "spike" over a two month period

mhw.sel(time=slice("2018-07-01","2018-08-31")) \
   .plot(cmap="coolwarm", col="time",col_wrap=4,add_colorbar=False)
   
Out[20]:
<xarray.plot.facetgrid.FacetGrid at 0x7fa08381f910>
In [ ]: