In this tutorial we will download C3S Satellite Soil Moisture data from the Copernicus Climate Data Store (CDS), read data stacks in python using xarray and perform simple analyses of soil moisture anomalies over selected study areas.
The tutorial consists of the following main steps:
Run the tutorial via free cloud platforms.
Note: Cells in this notebook should be executed in order (from top to bottom). Some of the later examples depend on previous ones!
%%capture --no-display
!pip install cdsapi
In the next cell we import all libraries necessary to run the code in this notebook. Some of them are Python standard libraries that are installed by default. If you encounter any errors during import, you can install missing packages using the conda package manager via:
!conda install -y -c conda-forge <PACKAGE>
A full list of dependencies required to run this notebook is available in the file environment.yml
at https://github.com/TUW-GEO/c3s_sm-tutorials. If you are on Binder or any other cloud platform, all necessary dependencies are already installed.
import os
import cdsapi
from pathlib import Path
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
import xarray as xr
import shutil
import pandas as pd
import numpy as np
import zipfile
from collections import OrderedDict
from matplotlib.lines import Line2D
import matplotlib.colors as colors
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
We also define a color map that we will use later for plotting the soil moisture data.
CM_SM = colors.LinearSegmentedColormap.from_list(
    "BrownBlue",
    np.array(
        [[134, 80, 16],
         [164, 117, 13],
         [219, 190, 24],
         [250, 249, 156],
         [144, 202, 240],
         [4, 145, 251],
         [8, 83, 211],
         [13, 37, 161]]) / 255.0
)
Satellite sensors are capable of observing the water content in the top soil layer from space. Various satellite missions from different space agencies provide measurements of radiation from the Earth's surface across different (microwave) frequency domains (e.g. Ku-, X-, C-, and L-band). These (raw) measurements are closely linked to the amount of water stored in the soil. There are two types of sensors that are used to measure this information: passive systems (radiometer) and active systems (radar).
For a detailed description, please see the C3S Soil Moisture Algorithm Theoretical Baseline Document, which is available along with the data on the Copernicus Climate Data Store.
Brightness temperature is the observable of passive sensors (expressed in Kelvin $[K]$). It is a function of kinetic temperature and emissivity. Wet soils have a lower emissivity than dry soils and therefore a lower brightness temperature. Passive soil moisture retrieval uses this relationship between kinetic temperature and brightness temperature to model the amount of water available in the soil of the observed area, while taking into account factors such as the water held by vegetation.
NASA's SMAP and ESA's SMOS satellites are examples of L-band radiometer missions. They are suitable for retrieving soil moisture globally, even when vegetation is present in a scene.
Different models exist to retrieve soil moisture from brightness temperature measurements. One of them is the Land Parameter Retrieval Model (LPRM; Owe et al., 2001; Owe et al., 2008; van der Schalie et al., 2016). This model is used to derive soil moisture for all passive sensors in C3S.
The PASSIVE product of C3S Soil Moisture contains merged observations from passive systems only. It is given in volumetric units $[m^3 / m^3]$.
Active systems emit radiation in the microwave domain (C-band in C3S). As the energy pulses emitted by the radar hit the Earth's surface, a scattering effect occurs and part of the energy is reflected back, containing information on the surface state of the observed scene. The received energy is called “backscatter”, with rough and wet surfaces producing stronger signals than smooth or dry surfaces. Backscatter comprises reflections from the soil surface layer (“surface scatter”), vegetation (“volume scatter”) and interactions of the two.
ESA's ERS-1 and ERS-2, as well as EUMETSAT's Metop ASCAT sensors, are active systems used in C3S Soil Moisture. In the case of Metop ASCAT, C3S Soil Moisture uses the Surface Soil Moisture products directly provided by H SAF, based on the WARP algorithm (Wagner et al., 1999; Wagner et al., 2013).
The ACTIVE product of C3S Soil Moisture contains merged observations from active systems only. It is given in relative units $[\%$ $saturation]$.
Single-sensor products are limited by the lifetime of the satellite sensors. Climate change assessments, however, require long-term data records that span multiple decades and provide consistent and comparable observations. The C3S Soil Moisture record therefore merges the observations from more than 15 sensors into one harmonized record. The two main steps of the product generation are scaling all sensors to a common reference and subsequently merging them with a weighted average, where sensors with lower errors are assigned higher weights.
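The weighted-average merging step can be sketched as follows. This is only a minimal illustration of inverse-variance weighting with made-up error variances, not the actual C3S merging code:

```python
import numpy as np

# Hypothetical error variances of three overlapping sensors (made-up values)
err_var = np.array([0.0004, 0.0009, 0.0001])  # [(m^3/m^3)^2]
sm_obs = np.array([0.22, 0.25, 0.20])         # scaled soil moisture of each sensor

# Inverse-variance weighting: sensors with lower errors get higher weights
weights = (1.0 / err_var) / np.sum(1.0 / err_var)
sm_merged = np.sum(weights * sm_obs)  # merged estimate, dominated by the best sensor
```

In this toy case the third sensor has by far the smallest error variance, so it receives the largest weight and the merged value ends up close to its observation.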
The following figure shows all satellite sensors merged in the PASSIVE (only radiometers), ACTIVE (only scatterometers) and COMBINED (scatterometers and radiometers) C3S product (data set version v202212).
C3S Soil Moisture is based on the ESA CCI SM algorithm, which is described in Dorigo et al., 2017 and Gruber et al., 2019.
The COMBINED product is also given in volumetric units $[m^3 / m^3]$. However, the absolute values depend on the scaling reference, which is used to bring all sensors into the same dynamic range. In this case we use soil moisture simulations for the first 10 cm from the GLDAS Noah model (Rodell et al., 2004).
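As a simplified illustration of scaling a sensor to a common reference (the operational algorithm uses a more sophisticated matching approach), a mean/standard-deviation matching of a satellite series against a model reference could look like this; all numbers here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
sm_sat = rng.normal(50.0, 10.0, 365)  # synthetic satellite series (e.g. % saturation)
sm_ref = rng.normal(0.25, 0.05, 365)  # synthetic model reference [m3/m3]

# Linear rescaling: shift/stretch the satellite series so that its mean and
# standard deviation match those of the reference
sm_scaled = (sm_sat - sm_sat.mean()) / sm_sat.std() * sm_ref.std() + sm_ref.mean()
```

After this step the satellite series has the same dynamic range as the reference, so it can be compared and merged with other scaled sensors.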
Different products and versions of C3S Soil Moisture are available on the Copernicus Climate Data Store. In general, there are two types of data records: the CDR (Climate Data Record) and the ICDR (Interim Climate Data Record), which continuously extends the CDR.
There are different options to specify a valid C3S Soil Moisture CDS data download request. You can use the CDS GUI to generate a valid request (button `Show API Request`) and copy/paste it into your python script as shown below. To summarize the options:

- `variable`: Either `volumetric_surface_soil_moisture` (must be chosen to download the PASSIVE or COMBINED data) or `surface_soil_moisture` (required for the ACTIVE product).
- `type_of_sensor`: One of `active`, `passive` or `combined_passive_and_active` (must match the selected variable!).
- `time_aggregation`: `month_average`, `10_day_average`, or `day_average`. The original data is daily. Monthly and 10-daily averages are often required for climate analyses and are therefore provided for convenience.
- `year`: A list of years to download data for (COMBINED and PASSIVE data are available from 1978 onward, ACTIVE starts in 1991).
- `month`: A list of months to download data for.
- `day`: A list of days to download data for. Note that for the monthly data, `day` must always be `'01'`. For the 10-daily averages, valid days are `'01'`, `'11'` and `'21'` (the day always refers to the start of the period the data represents).
- `area`: (optional) Coordinates of a bounding box to download data for.
- `type_of_record`: `cdr` and/or `icdr`. It is recommended to select both, to use whichever data is available (there is no overlap between the ICDR and CDR of a major version).
- `version`: Data record version. Currently available: `v201706`, `v201812`, `v201912`, `v202012` (as of June 2023; new versions are added regularly). Sub-versions indicate new data that is meant to replace previous sub-versions (necessary e.g. due to processing errors). It is therefore recommended to pass all sub-versions and, in case of duplicate time stamps, use the file with the highest version.
- `format`: Either `zip` or `tgz`. The archive format that holds the individual netCDF images.

In order to download data from the Climate Data Store (CDS) via the API, you need a CDS account and your personal API credentials (URL and KEY).
If you do not provide a valid KEY in the next cell, the following API request will fail. However, you can still continue with the example data provided together with this notebook, which is the same data you would get with the unchanged query: monthly volumetric surface soil moisture from passive observations, version v202012, over Europe, from the CDR & ICDR. The example data is stored in the same repository as this notebook (`./DATA/sm_monthly_passive_v202012.zip`). It is recommended to use monthly data, as some of the examples in this notebook will not work with daily or 10-daily images!
URL = "https://cds.climate.copernicus.eu/api/v2"
# If you have a valid key, set it in the following line:
KEY = "#######################################"
In this cell we submit a data request. This will only succeed if a valid URL and KEY were set in the previous cell. Otherwise the data request will fail. However, this is not a problem for this tutorial: we will simply fall back to the data provided with this notebook.
# Specify some filename and directories to handle the data
DATADIR = "./data_dir"
os.makedirs(DATADIR, exist_ok=True)
# Filename for the zip file downloaded from the CDS
download_zip_file = os.path.join(DATADIR, "sm_monthly_passive_v202012.zip")
# Filename for the netCDF file which contain the merged contents of the monthly files.
merged_netcdf_file = os.path.join(DATADIR, "sm_monthly_passive_v202012.nc")
# If we have not downloaded the zip file, we make the data request via the cdsapi
if not os.path.isfile(download_zip_file):
    c = cdsapi.Client(url=URL, key=KEY)
    c.retrieve(
        "satellite-soil-moisture",
        {
            "variable": "volumetric_surface_soil_moisture",
            "type_of_sensor": "passive",
            "time_aggregation": "month_average",  # required for examples in this notebook
            "year": [str(y) for y in range(1991, 2023)],
            "month": [f"{m:02}" for m in range(1, 13)],
            "day": "01",
            "area": [72, -11, 34, 40],
            "type_of_record": ["cdr", "icdr"],
            "version": ["v202012"],
            "format": "zip",
        },
        download_zip_file,
    )
else:
    print(f"Using previously downloaded file: {download_zip_file}")
Using previously downloaded file: ./data_dir/sm_monthly_passive_v202012.zip
From the previous cell, we have a variable `download_zip_file` which points to a .zip archive (either newly downloaded or the one provided with this tutorial) containing the selected data from the CDS as individual images. We use the library xarray to read these data, but first we have to extract them. In the next cell we extract all files from the .zip archive into a new folder and merge them into a single netCDF stack using standard python libraries. This step might take up to a minute during the first execution!
if not os.path.isfile(merged_netcdf_file):
    # Unzip the data. The dataset is split in monthly files.
    with zipfile.ZipFile(download_zip_file, "r") as zip_ref:
        filelist = [os.path.join(DATADIR, f) for f in zip_ref.namelist()]
        zip_ref.extractall(DATADIR)
    # Ensure the filelist is in the correct (chronological) order:
    filelist = sorted(filelist)
    # Merge all unpacked files into one.
    # We do this in batches of 100 files to avoid issues with dask.
    new_filelist = []
    for i in range(int(len(filelist) / 100.0) + 1):
        temp_fname = os.path.join(DATADIR, f"temp_file_{i}.nc")
        new_filelist += [temp_fname]
        ds = xr.open_mfdataset(filelist[i * 100 : (i * 100) + 100])
        ds.to_netcdf(temp_fname)
    ds = xr.open_mfdataset(new_filelist)
    ds.to_netcdf(merged_netcdf_file)
    # Delete the unpacked and temporary files
    for f in filelist:
        os.remove(f)
    for f in new_filelist:
        os.remove(f)
    print(f"Preprocessing done. Netcdf stack now available at: {merged_netcdf_file}")
else:
    print(f"No preprocessing required. Netcdf stack already available at: {merged_netcdf_file}")
No preprocessing required. Netcdf stack already available at: ./data_dir/sm_monthly_passive_v202012.nc
We can then use the function xarray.open_mfdataset to load the merged netCDF stack. This gives us a 3-dimensional (longitude, latitude, time) data cube, which we store in a global variable `DS`. In addition, we extract the unit and valid range of the soil moisture variable from the netCDF metadata (`SM_UNIT` and `SM_RANGE`).
DS = xr.open_mfdataset(merged_netcdf_file)
SM_UNIT = DS["sm"].attrs["units"]
SM_RANGE = DS["sm"].attrs["valid_range"]
Finally, we display the contents of `DS`. You can see the dimensions of the data cube, the coordinates assigned to each dimension, and the data variables (each data point has a coordinate for each dimension). You can also explore the 'Attributes' field (i.e. the metadata assigned to the data cube) by clicking on it.
DS
<xarray.Dataset> Dimensions: (lat: 152, lon: 204, time: 384) Coordinates: * lat (lat) float32 71.88 71.62 71.38 71.12 ... 34.62 34.38 34.12 * lon (lon) float32 -10.88 -10.62 -10.38 -10.12 ... 39.38 39.62 39.88 * time (time) datetime64[ns] 1991-01-01 1991-02-01 ... 2022-12-01 Data variables: sm (time, lat, lon) float32 dask.array<chunksize=(384, 152, 204), meta=np.ndarray> sensor (time, lat, lon) float32 dask.array<chunksize=(384, 152, 204), meta=np.ndarray> freqbandID (time, lat, lon) float32 dask.array<chunksize=(384, 152, 204), meta=np.ndarray> nobs (time, lat, lon) float32 dask.array<chunksize=(384, 152, 204), meta=np.ndarray> Attributes: (12/40) title: C3S Surface Soil Moisture merged PASSIVE Product institution: EODC (AUT); TU Wien (AUT); VanderSat B.V. (NL) contact: C3S_SM_Science@eodc.eu source: LPRMv6/SMMR/Nimbus 7 L3 Surface Soil Moisture... platform: Nimbus 7, DMSP, TRMM, AQUA, Coriolis, GCOM-W1... sensor: SMMR, SSM/I, TMI, AMSR-E, WindSat, AMSR2, SMO... ... ... id: C3S-SOILMOISTURE-L3S-SSMV-PASSIVE-MONTHLY-199... history: 2021-03-29T13:46:57.630282 mean calculated date_created: 2021-03-29T13:46:57Z time_coverage_start: 1990-12-31T12:00:00Z time_coverage_end: 1991-01-31T12:00:00Z time_coverage_duration: P1M
First we define some potential study areas that we can use in the following applications. Below you can find a list of bounding boxes, plus one 'focus point' in each bounding box. You can add your own study area to the end of the list. If you do, make sure to pass the coordinates in the correct order: Each line consists of:
(<STUDY_AREA_NAME>, ([<BBOX min. Lon.>, <BBOX max. Lon.>, <BBOX min. Lat.>, <BBOX max. Lat.>], [<POINT Lon.>, <POINT Lat.>]))
BBOXES = OrderedDict(
    [
        # (Name, ([min Lon., max Lon., min Lat., max Lat.], [Lon, Lat])),
        ("Balkans", ([16, 29, 36, 45], [24, 42])),
        ("Central Europe", ([6, 22.5, 46, 51], [15, 49.5])),
        ("France", ([-4.8, 8.4, 42.3, 51], [4, 47])),
        ("Germany", ([6, 15, 47, 55], [9, 50])),
        ("Iberian Peninsula", ([-10, 3.4, 36, 44.4], [-5.4, 41.3])),
        ("Italy", ([7, 19.0, 36.7, 47.0], [12.8, 43.4])),
        ("S-UK & N-France", ([-5.65, 2.5, 48, 54], [0, 51])),
    ]
)
Now we select one of the study areas and store its name in the global `STUDY_AREA` variable. The chosen study area will be used in all of the following applications!
# Set global variable (to access in later examples). You can select any name from the list above.
# Don't forget to re-run the cell after selecting a different study area!
STUDY_AREA = "Germany"
print(f"Available study areas are: {list(BBOXES.keys())}")
print(f"The chosen study area for the following examples is: {STUDY_AREA}")
Available study areas are: ['Balkans', 'Central Europe', 'France', 'Germany', 'Iberian Peninsula', 'Italy', 'S-UK & N-France'] The chosen study area for the following examples is: Germany
Now that we have a data cube (DS
) to work with, we can start by visualizing some of the soil moisture data. In this first example we will visualize the absolute monthly soil moisture values for a certain date. In addition, we will show the previously selected study area together with the data on a map.
Our data cube DS
is an xarray.Dataset, which comes with a lot of functionalities. For example we can create a simple map visualization of soil moisture and the number of observations for a certain date (note that nobs
is only available for monthly and 10-daily averaged data downloaded from CDS). Here we extract an image from the data cube using the xarray.Dataset.sel method.
In the next cell, you can select one of the study areas (from the list above) and a date to plot.
We then select the correct soil moisture (and observation count) data and create a map visualization.
Try:
DATE = "2022-05-01" # Time stamp to create plot for, you can choose a different date!
# Check whether the above selected study area and date are available
if STUDY_AREA not in BBOXES:
    raise KeyError(f"Unknown STUDY_AREA: {STUDY_AREA}. Select one of {list(BBOXES.keys())}")

_dates = [str(pd.to_datetime(t).date()) for t in DS["time"].values]
if DATE not in _dates:
    raise KeyError(f"{DATE} not found in data. Select one of {_dates}")
# Create an empty figure with 2 subplots
fig, axs = plt.subplots(1, 2, figsize=(17, 5), subplot_kw={"projection": ccrs.PlateCarree()})
# Extract and plot soil moisture image for chosen date using xarray.Dataset.sel and Dataset.plot:
p_sm = DS["sm"].sel(time=DATE).plot(
    transform=ccrs.PlateCarree(),
    ax=axs[0],
    cmap=CM_SM,
    cbar_kwargs={"label": f"Soil Moisture [{SM_UNIT}]"},
)
axs[0].set_title(f"{DATE} - Soil Moisture")

# Extract and plot 'nobs' image for chosen date:
if "nobs" in DS.variables:
    # Note: nobs is only available for monthly and 10-daily data
    p_obs = DS["nobs"].sel(time=DATE).plot(
        transform=ccrs.PlateCarree(),
        ax=axs[1],
        vmax=31,
        vmin=0,
        cmap=plt.get_cmap("YlGnBu"),
        cbar_kwargs={"label": "Days with valid observations"},
    )
    axs[1].set_title(f"{DATE} - Data coverage")
else:
    p_obs = None
bbox = BBOXES[STUDY_AREA][0]
point = BBOXES[STUDY_AREA][1]
# Add basemap features:
for p in [p_sm, p_obs]:
    if p is None:
        continue
    p.axes.add_feature(cartopy.feature.LAND, zorder=0, facecolor="gray")
    p.axes.coastlines()
# Add focus point of study area to first map:
axs[0].plot(
    [point[0]],
    [point[1]],
    color="red",
    marker="X",
    markersize=10,
    transform=ccrs.PlateCarree(),
)

# Add study area bounding box to first map:
axs[0].plot(
    [bbox[0], bbox[0], bbox[1], bbox[1], bbox[0]],
    [bbox[2], bbox[3], bbox[3], bbox[2], bbox[2]],
    color="red",
    linewidth=3,
    transform=ccrs.PlateCarree(),
)
# Draw grid lines and labels for both maps:
for ax in axs:
    gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True, alpha=0.25)
    gl.top_labels, gl.right_labels = False, False

# Add study area legend (legend marker matches the plotted focus point marker "X")
axs[0].legend(
    handles=[
        Line2D([0], [0], color="r", lw=4, label=f"{STUDY_AREA} Study Area"),
        Line2D([0], [0], lw=0, marker="X", color="r", label=f"{STUDY_AREA} Focus Point"),
    ]
)
<matplotlib.legend.Legend at 0x7fc7315aac10>
When looking at images for different dates, you can see data gaps (gray areas; especially in winter, in mountainous regions, and at high latitudes) and areas of high (blue) and low (brown) soil moisture content (left map). The number of observations is much larger in later periods of the record than in earlier ones, due to the larger number of available satellites (right map).
In the following two examples we use the study area selected above. We use the chosen 'focus point' assigned to the study area (marked by the red X in the left map of the previous example) to extract a time series from the loaded stack at this location. We then compute the climatological mean (and standard deviation) for the chosen time series using the selected reference period (baseline). Finally, we subtract the climatology from the absolute soil moisture to derive a time series of anomalies. Anomalies therefore indicate the deviation of a single observation from the average (normal) conditions. A positive anomaly can be interpreted as "wetter than usual" soil moisture conditions, while a negative anomaly indicates "drier than usual" states.
There are different ways to express anomalies. For example:
Absolute Anomalies: Derived as the difference between the absolute value for a month ($i$) and year ($k$) ($SM_{k,i}$) and the climatology for the same month ($\overline{SM_i}$), i.e. $SM_{k,i} - \overline{SM_i}$. Same unit as the input data.
Relative Anomalies: The anomalies are expressed relative to the climatological mean for a month ($\overline{SM_i}$), i.e. $(SM_{k,i} - \overline{SM_i}) / \overline{SM_i} \cdot 100$, in % above / below the expected conditions.
Z-Scores: Z-scores are a way of standardizing values from different normal distributions. In the case of soil moisture, the spread of values between months can vary, e.g. due to differences in data coverage (i.e. a different distribution is expected for values in summer than for those in winter). This can affect the computed climatology and therefore the derived anomalies. By taking into account the standard deviation ($\sigma_i$) of the samples used to compute the climatologies, we can normalize the anomalies and exclude this effect: $(SM_{k,i} - \overline{SM_i}) / \sigma_i$. Z-scores therefore express the number of standard deviations a value is above / below the expected value for a month.
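As a quick numeric illustration of the three metrics (all values made up): suppose a single June observation of 0.18 m³/m³ against a June climatology of 0.24 ± 0.03 m³/m³:

```python
sm_obs = 0.18     # observed SM for one June [m3/m3] (made-up value)
clim_mean = 0.24  # climatological mean for June [m3/m3]
clim_std = 0.03   # climatological std. dev. for June [m3/m3]

abs_anomaly = sm_obs - clim_mean                      # -0.06 m3/m3
rel_anomaly = (sm_obs - clim_mean) / clim_mean * 100  # -25 % (drier than usual)
z_score = (sm_obs - clim_mean) / clim_std             # -2 standard deviations
```

All three metrics describe the same "drier than usual" June, just on different scales.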
The next cell contains the application.
In the first part, we extract a time series for the chosen 'focus point' of the study area (`STUDY_AREA` variable, see the location in the map of the previous example; try out a different study area!).
We then select data for the chosen baseline period from the extracted time series to compute the climatology. You can try a different reference period by changing the `BASELINE_YEARS` variable. The first value is the start year of the reference period, the second value is the last year considered when computing the climatological reference.
Using the formulas from above, we then compute the three anomaly metrics for the extracted time series and store them in a pandas DataFrame (columns 'abs_anomaly', 'rel_anomaly' and 'z-score').
In the last part of the application we create the plots. One to show the extracted soil moisture time series, one for the climatology - the mean and standard deviation for each month in the chosen baseline period, and one for each anomaly metric derived from the extracted time series and climatology.
Try:
BASELINE_YEARS = (1991, 2020) # first and last year to consider for climatology, can be changed.
# Extract data at location at the 'focus point' of the chosen study area:
lon, lat = float(BBOXES[STUDY_AREA][1][0]), float(BBOXES[STUDY_AREA][1][1])
ts = DS["sm"].sel(lon=lon, lat=lat, method="nearest").to_pandas()
# Compute all metrics (climatology, abs. / rel. anomalies, z-scores):
clim_data = ts.loc[f"{BASELINE_YEARS[0]}-01-01":f"{BASELINE_YEARS[1]}-12-31"]
clim_std = pd.Series(clim_data.groupby(clim_data.index.month).std(), name="climatology_std")
clim_mean = pd.Series(clim_data.groupby(clim_data.index.month).mean(), name="climatology")
ts = pd.DataFrame(ts, columns=["sm"]).join(on=ts.index.month, other=clim_mean)
ts["climatology_std"] = ts.join(on=ts.index.month, other=clim_std)["climatology_std"]
ts["abs_anomaly"] = ts["sm"] - ts["climatology"]
ts["rel_anomaly"] = (ts["sm"] - ts["climatology"]) / ts["climatology"] * 100
ts["z_score"] = (ts["sm"] - ts["climatology"]) / ts["climatology_std"]
# Generate five time series plots (absolute SM, climatology, all 3 anomaly metrics):
fig, axs = plt.subplots(5, 1, figsize=(10, 12))

ts["sm"].plot(
    ax=axs[0],
    title=f"Soil Moisture at focus point of `{STUDY_AREA}` study area "
          f"(Lon: {lon}° E, Lat: {lat}° N)",
    ylabel=f"SM $[{SM_UNIT}]$",
    xlabel="Time [year]",
)

for i, g in clim_data.groupby(clim_data.index.year):
    axs[1].plot(range(1, 13), g.values, alpha=0.2)

clim_mean.plot(
    ax=axs[1],
    color="blue",
    title=f"Soil Moisture Climatology at Lon: {lon}° E, Lat: {lat}° N",
    ylabel=f"SM $[{SM_UNIT}]$",
    label="mean",
)
clim_std.plot(ax=axs[1], label=r"std.dev. $\sigma$", xlabel="Time [month]")
axs[1].legend()

metrics = ["Absolute Anomalies", "Relative Anomalies", "Z-Scores"]
columns = ["abs_anomaly", "rel_anomaly", "z_score"]
ylabels = [f"Anomaly $[{SM_UNIT}]$", r"Anomaly $[\%]$", r"Z-score $[\sigma]$"]
for i, (metric, col, ylabel) in enumerate(zip(metrics, columns, ylabels), start=2):
    axs[i].axhline(0, color="k")
    axs[i].fill_between(ts[col].index, ts[col].values, where=ts[col].values >= 0, color="blue")
    axs[i].fill_between(ts[col].index, ts[col].values, where=ts[col].values < 0, color="red")
    axs[i].set_ylabel(ylabel)
    axs[i].set_xlabel("Time [year]")
    axs[i].set_title(f"Soil Moisture {metric} at Lon: {lon}° E, Lat: {lat}° N")
plt.tight_layout()
Choosing a different study area will change the input data and plot anomalies for a different 'focus point'. You can also change the baseline period and see how it affects the climatology (and anomaly) computation. While the three anomaly time series look similar at first glance, there are some differences, depending on the magnitude of the climatology and the intra-annual soil moisture variance at a point.
We now compute the anomalies for the whole image stack (not only on a time series basis as in the previous example). For this we use some of the grouping functions provided by xarray. For the climatology (reference) we use all data from 1991 to 2020 (`BASELINE_YEARS`). This is the standard baseline period, but you can of course try a different one. We then select all (absolute) soil moisture values in this period, group them by their month (i.e. all January, February, ... values of all years in the baseline period) and compute the mean of each group. This yields the `CLIM` stack, which consists of 12 images (one for each month).
BASELINE_YEARS = (1991, 2020)
baseline_slice = slice(f"{BASELINE_YEARS[0]}-01-01", f"{BASELINE_YEARS[1]}-12-31")
CLIM = DS.sel(time=baseline_slice)["sm"].groupby(DS.sel(time=baseline_slice).time.dt.month).mean()
CLIM
<xarray.DataArray 'sm' (month: 12, lat: 152, lon: 204)> dask.array<stack, shape=(12, 152, 204), dtype=float32, chunksize=(1, 152, 204), chunktype=numpy.ndarray> Coordinates: * lat (lat) float32 71.88 71.62 71.38 71.12 ... 34.88 34.62 34.38 34.12 * lon (lon) float32 -10.88 -10.62 -10.38 -10.12 ... 39.38 39.62 39.88 * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12 Attributes: dtype: float32 units: m3 m-3 valid_range: [0. 1.] long_name: Volumetric Soil Moisture _CoordinateAxes: time lat lon
We can now use the climatology stack and compute the difference between each image of absolute soil moisture and the climatological mean of the same month. This way we compute the anomalies for the whole data cube, without a time-consuming loop over all individual locations. We assign the result to a new variable in the DS
stack called sm_anomaly
.
DS["sm_anomaly"] = DS["sm"] - CLIM.sel(month=DS.time.dt.month).drop("month")
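The month-based lookup that `CLIM.sel(month=DS.time.dt.month)` performs can be illustrated with plain NumPy (toy numbers, not the real data): each time stamp's month is used as an index into the 12 climatology values, so no explicit loop over images is needed.

```python
import numpy as np

clim = np.linspace(0.1, 0.32, 12)   # one (made-up) climatology value per month
months = np.array([1, 2, 1, 6, 12]) # month of each image in a small stack
sm = np.array([0.15, 0.20, 0.05, 0.30, 0.28])

# Vectorized lookup of the matching climatology value for each time stamp
anomaly = sm - clim[months - 1]
```

xarray does the same thing, but additionally broadcasts each climatology image over the latitude/longitude dimensions of the stack.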
We can use the bounding box of the previously chosen study area to extract all soil moisture values in the area of interest. We then compute the mean over all locations in the bounding box to get a single time series for the study area (`MEAN_TS`). Note that the coverage of C3S Soil Moisture varies over time (see the previous examples), which can also affect the value range of the computed anomalies (the standard deviation $\sigma$ is usually larger for earlier periods).
subset = DS[["sm_anomaly"]].sel(
    lon=slice(BBOXES[STUDY_AREA][0][0], BBOXES[STUDY_AREA][0][1]),
    lat=slice(BBOXES[STUDY_AREA][0][3], BBOXES[STUDY_AREA][0][2]),
)
MEAN_TS = subset.mean(dim=["lat", "lon"]).to_pandas()
Now we can not only visualize the monthly anomalies from the processed anomaly stack, but also plot annual mean values using the time series averaged over the whole study area bounding box. This is done for the year assigned to the variable `YEAR` in the next cell.
Try:

- Plot the map for a different year (`YEAR` in the next cell).
- Change `BASELINE_YEARS` in this application and see how the chosen period affects the values in the map and time series.

YEAR = 2018  # choose a year to plot on the map
# Combine all data in the study area to an annual mean time series
mean_ts_annual = pd.Series(MEAN_TS["sm_anomaly"]).resample("A").mean()
mean_ts_annual.index = mean_ts_annual.index.year
# Create a new figure with 2 subplots (1:2 width ratio) using matplotlib GridSpec
fig = plt.figure(figsize=(15, 4), constrained_layout=True)
gs = fig.add_gridspec(1, 3)
map_ax = fig.add_subplot(gs[0, 0], projection=ccrs.PlateCarree())
ts_ax = fig.add_subplot(gs[0, 1:])
# Select all anomalies for a year and compute the average, then create a map plot
DS["sm_anomaly"].sel(time=slice(f"{YEAR}-01-01", f"{YEAR}-12-31")).mean("time").plot(
    transform=ccrs.PlateCarree(),
    ax=map_ax,
    cmap=plt.get_cmap("RdBu"),
    cbar_kwargs={"label": f"Anomaly [{SM_UNIT}]"},
)
# Add features to map
map_ax.axes.add_feature(cartopy.feature.LAND, zorder=0, facecolor="gray")
map_ax.axes.coastlines()
map_ax.add_feature(cartopy.feature.BORDERS)
map_ax.set_title(f"{YEAR} - Soil Moisture Anomaly")
# Plot study area bounding box on map
bbox = BBOXES[STUDY_AREA][0]
map_ax.plot(
    [bbox[0], bbox[0], bbox[1], bbox[1], bbox[0]],
    [bbox[2], bbox[3], bbox[3], bbox[2], bbox[2]],
    color="red",
    linewidth=3,
    transform=ccrs.PlateCarree(),
)
# Add grid lines to map
gl = map_ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True, alpha=0.25)
gl.right_labels, gl.top_labels = False, False
# Add time series of study area as bar plot to right subplot
bars = ts_ax.bar(mean_ts_annual.index, mean_ts_annual.values)
ts_ax.set_title(f"Annual conditions in `{STUDY_AREA}` study area")
ts_ax.set_xlabel("Year")
ts_ax.set_ylabel(f"Anomaly [{SM_UNIT}]")
ts_ax.axhline(y=0, color="black", linewidth=1)
# Highlight the bar in the right plot that contains the year chosen on the left side:
for i, bar in enumerate(bars.patches):
    if mean_ts_annual.values[i] > 0:
        bar.set_facecolor("blue")
    else:
        bar.set_facecolor("red")
    if mean_ts_annual.index.values[i] == YEAR:
        bar.set_edgecolor("k")
        bar.set_linewidth(3)
We see an overall trend towards drier conditions in most regions. You can look at a different study area by selecting it in the first application. Don't forget to re-run all relevant cells in this chapter to use the new study area and accordingly update the plots!