Overview of compatibility testing¶
This notebook walks you through a workflow to check compatibility of a TiTiler-CMR deployment for a given Earthdata CMR dataset.
In this notebook, you'll learn how to:
- Use earthaccess to authenticate to NASA Earthdata and query the CMR catalog
- Collect collection-level metadata (concept IDs, temporal range, spatial bounds)
- Run check_titiler_cmr_compatibility against your TiTiler-CMR endpoint to validate whether a dataset can be successfully visualized and accessed via TiTiler-CMR
Before you begin, you need:
- An Earthdata Login account: https://urs.earthdata.nasa.gov/
- A valid netrc file with your Earthdata credentials (format shown below), or interactive login
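If you go the netrc route, the file lives at ~/.netrc on Linux/macOS (or ~/_netrc on Windows) and needs a single entry for the Earthdata Login host. Replace the placeholders with your own credentials:

machine urs.earthdata.nasa.gov
    login <your_username>
    password <your_password>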
For this walkthrough, we will use the public instance hosted by Open VEDA.
import earthaccess
import xarray as xr
from datacube_benchmark.titiler import (
DatasetParams,
create_bbox_feature,
check_titiler_cmr_compatibility,
)
endpoint = "https://staging.openveda.cloud/api/titiler-cmr"
Introduction to TiTiler-CMR¶
TiTiler-CMR is a dynamic map tile server that provides on-demand access to Earth science data managed by NASA's Common Metadata Repository (CMR). It allows users to dynamically generate and serve map tiles from multidimensional data formats like NetCDF and HDF5.
To get started with TiTiler-CMR, you typically need to:
- Choose a TiTiler-CMR endpoint
- Pick a CMR dataset (by concept ID)
- Identify the assets/variables/bands you want to visualize
- Define a temporal interval (start/end ISO range) and, if needed, a time step (e.g., daily)
- Select a backend that matches your dataset's structure

TiTiler-CMR supports two different backends:
- xarray: for gridded/cloud-native datasets (e.g., NetCDF4/HDF5), typically exposed as variables
- rasterio: for COG/raster imagery-style datasets exposed as bands (optionally selected via a regex)
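To make those moving parts concrete, here is a minimal sketch of a TileJSON request against a TiTiler-CMR deployment. It assumes the standard TiTiler-style /WebMercatorQuad/tilejson.json route and reuses the query parameter names (concept_id, backend, variable, datetime) that appear in the requests later in this notebook; check your deployment's /docs for the authoritative signature.

import httpx

# Hedged sketch: fetch a TileJSON document for an xarray-backed dataset.
# Parameter names mirror the requests shown later in this notebook;
# the concept_id/variable values are the MUR SST ones discovered in step 1.
params = {
    "concept_id": "C1996881146-POCLOUD",
    "backend": "xarray",
    "variable": "analysed_sst",
    "datetime": "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z",
}
resp = httpx.get(f"{endpoint}/WebMercatorQuad/tilejson.json", params=params, timeout=60)
resp.raise_for_status()
print(resp.json()["tiles"])  # tile URL template(s)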
Here, we first explore a dataset using earthaccess to collect the necessary information (concept_id, backend, and variable), then run a compatibility check using the check_titiler_cmr_compatibility helper function. If you already know your dataset, you can skip the exploration and jump straight to step 2.
Step 1: Explore data with earthaccess¶
You can use earthaccess to search for datasets and inspect the individual granules returned by your query. This helps you validate which files were accessed, their sizes, and the temporal range.
First you need to authenticate to Earthdata.
# Authenticate to Earthdata
try:
auth = earthaccess.login(strategy="environment")
except Exception:
auth = earthaccess.login(strategy="interactive")
Next, you can search for datasets using a DOI, keywords, temporal range, and spatial bounds.
datasets = earthaccess.search_datasets(doi="10.5067/GHGMR-4FJ04")
ds = datasets[0]
concept_id = ds["meta"]["concept-id"]
print("Concept-Id: ", concept_id)
print("Abstract:", ds["umm"]["Abstract"])
Concept-Id: C1996881146-POCLOUD
Abstract: A Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis produced as a retrospective dataset (four day latency) and near-real-time dataset (one day latency) at the JPL Physical Oceanography DAAC using wavelets as basis functions in an optimal interpolation approach on a global 0.01 degree grid. The version 4 Multiscale Ultrahigh Resolution (MUR) L4 analysis is based upon nighttime GHRSST L2P skin and subskin SST observations from several instruments including the NASA Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the JAXA Advanced Microwave Scanning Radiometer 2 on GCOM-W1, the Moderate Resolution Imaging Spectroradiometers (MODIS) on the NASA Aqua and Terra platforms, the US Navy microwave WindSat radiometer, the Advanced Very High Resolution Radiometer (AVHRR) on several NOAA satellites, and in situ SST observations from the NOAA iQuam project. The ice concentration data are from the archives at the EUMETSAT Ocean and Sea Ice Satellite Application Facility (OSI SAF) High Latitude Processing Center and are also used for an improved SST parameterization for the high-latitudes. The dataset also contains additional variables for some granules including a SST anomaly derived from a MUR climatology and the temporal distance to the nearest IR measurement for each pixel. This dataset is funded by the NASA MEaSUREs program ( http://earthdata.nasa.gov/our-community/community-data-system-programs/measures-projects ), and created by a team led by Dr. Toshio M. Chin from JPL. It adheres to the GHRSST Data Processing Specification (GDS) version 2 format specifications. Use the file global metadata "history:" attribute to determine if a granule is near-realtime or retrospective.
Examine the granules¶
With a selected data collection, we'll now use earthaccess.search_data to find individual data granules within a specific temporal window.
time_range = ("2024-10-12", "2024-10-13")
results = earthaccess.search_data(
count=1,
concept_id=concept_id,
temporal=time_range,
)
print(f"Found {len(results)} granules between {time_range[0]} and {time_range[1]}")
for g in results:
start = g["umm"]["TemporalExtent"]["RangeDateTime"]["BeginningDateTime"]
size = float(g["size"]) # or use g["granule_size_mb"]
print(f"\n{start} ā {size:.2f} MB")
for link in g.data_links(access="external"):
print(" ", link)
Found 1 granules between 2024-10-12 and 2024-10-13

2024-10-11T21:00:00.000Z – 707.34 MB
  https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
From the output above, the returned link ends with .nc, indicating a NetCDF file. We can open it directly with xarray using the authenticated HTTPS session from earthaccess and quickly list the variables (plus a peek at dimensions and coordinates).
fs = earthaccess.get_fsspec_https_session()
ds = xr.open_dataset(
fs.open(results[0].data_links(access="external")[0]),
engine="h5netcdf",
decode_timedelta=True,
)
data_vars = ds.data_vars
data_vars
Data variables:
    analysed_sst      (time, lat, lon) float64 5GB ...
    analysis_error    (time, lat, lon) float64 5GB ...
    mask              (time, lat, lon) float32 3GB ...
    sea_ice_fraction  (time, lat, lon) float64 5GB ...
    dt_1km_data       (time, lat, lon) timedelta64[ns] 5GB ...
    sst_anomaly       (time, lat, lon) float64 5GB ...
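Before committing to a variable, it is worth a quick peek at its metadata (long name, units, fill value) to confirm it is the field you want TiTiler-CMR to serve; this is plain xarray:

# Inspect a candidate variable's metadata before using it as the
# TiTiler-CMR `variable` parameter
sst = ds["analysed_sst"]
print(sst.dims, sst.shape)
print(sst.attrs.get("long_name"), "|", sst.attrs.get("units"))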
Now that we know the concept_id, backend, and variable, we can run a quick compatibility check using the check_titiler_cmr_compatibility() helper function.
Step 2: Check compatibility¶
The check_titiler_cmr_compatibility() helper function performs the following steps:
- Validate the CMR collection and granule search
- Resolve collection/granule metadata and fetch TileJSON
- Determine how many time steps fall within the requested temporal range
- Query the /timeseries/statistics endpoint for a small, bounded preview window to check whether the dataset can be opened and processed with the selected backend
The result is a summary of compatibility, tiling parameters, and dataset statistics.
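For reference, the sketch below approximates what the helper's final step does under the hood. It assumes the /timeseries/statistics route accepts the query parameters visible in the request URLs the helper prints, and that the AOI is posted as a GeoJSON feature body; treat it as illustrative and confirm against your deployment's API docs.

import httpx

# Hedged sketch: query /timeseries/statistics directly for a small AOI.
# Whether the AOI travels as a POSTed GeoJSON body is an assumption here.
aoi = create_bbox_feature(-91.7, 47.8, -91.5, 47.9)  # deliberately tiny bbox
params = {
    "concept_id": "C2723754864-GES_DISC",  # same collection as the cell below
    "backend": "xarray",
    "variable": "precipitation",
    "datetime": "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z",
    "step": "P1D",
    "temporal_mode": "point",
}
resp = httpx.post(f"{endpoint}/timeseries/statistics", params=params, json=aoi, timeout=250)
print(resp.status_code)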
concept_id = "C2723754864-GES_DISC"
datetime_range = "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z"
variable = "precipitation"
ds_xarray = DatasetParams(
concept_id=concept_id,
backend="xarray",
datetime_range=datetime_range,
variable=variable,
step="P1D",
temporal_mode="point",
)
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_xarray,
timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 8 physical / 8 logical cores | RAM: 16.00 GiB
Dataset: C2723754864-GES_DISC (xarray)
Found 1 timesteps/granules from TileJSON
Using random bounds for compatibility check: [2.741770939582061, -86.93233148855214, 83.24021812957449, -46.68310789355593]
Statistics returned 1 timesteps
Compatibility: compatible
Now we can verify that the returned summary statistics look valid:
print(f"Statistics preview:\n{compat['statistics']}")
Statistics preview:
                       timestamp  min        max      mean         count  \
0  2024-10-12T00:00:00.000000000  0.0  36.904999  1.470654  324133.21875

            sum       std  median  majority  minority   unique  valid_percent  \
0  476687.84375  3.734399     0.0       0.0     0.065  14219.0          100.0

   masked_pixels  valid_pixels  percentile_2  percentile_98
0            0.0      325624.0           0.0      14.860001
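A couple of quick sanity checks on the preview can catch obviously broken reads, such as an all-masked window or inverted extremes. This assumes compat['statistics'] behaves like the pandas DataFrame printed above:

# Minimal sanity checks on the statistics preview (DataFrame assumed)
stats = compat["statistics"]
assert len(stats) >= 1, "no timesteps returned"
row = stats.iloc[0]
assert row["valid_percent"] > 0, "preview window is fully masked"
assert row["min"] <= row["median"] <= row["max"], "inconsistent extremes"
print("statistics preview looks sane")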
rasterio backend¶
Similar to the xarray example above, we can check compatibility for a CMR collection that is better suited to the rasterio backend.
ds_hls_day = DatasetParams(
concept_id="C2021957295-LPCLOUD",
backend="rasterio",
datetime_range="2024-07-01T00:00:00Z/2024-07-10T23:59:59Z",
bands=["B05", "B04"],
bands_regex="B[0-9][0-9]",
step="P1D",
temporal_mode="point",
)
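The bands_regex pattern is matched against the band identifiers in each granule's assets so the server knows which files belong together. A quick check with Python's re module shows the pattern above picks up two-digit HLS band names while skipping others (how TiTiler-CMR applies the regex internally is its own implementation detail):

import re

# The pattern from ds_hls_day matches two-digit band names only
pattern = re.compile(r"B[0-9][0-9]")
for name in ["B04", "B05", "B8A", "Fmask"]:
    print(name, "->", bool(pattern.fullmatch(name)))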
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 8 physical / 8 logical cores | RAM: 16.00 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON
Using random bounds for compatibility check: [-105.53889935418451, -46.63206063840639, -25.040452164192082, -6.3828370434101664]
~~~~~~~~~~~~~~~~ ERROR JSON REQUEST ~~~~~~~~~~~~~~~~
URL: https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point
Error: 400 Bad Request
Body: {"detail":"The AOI for this request is too large for the /statistics endpoint for this dataset. Try again with either a smaller AOI"}
Statistics request failed: HTTPStatusError: Client error '400 Bad Request' for url 'https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
Compatibility: issues_detected
⚠️ If your area of interest is too large, the API will return an "AOI is too large" error. Use the create_bbox_feature function to define a smaller bounding box before retrying.
gulf_geometry = create_bbox_feature(
-91.65432884883238, 47.86503396133904, -91.53842043960762, 47.9221313337365
)
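For scale, the bounding box above is deliberately tiny, roughly a tenth of a degree across, which keeps the statistics request comfortably under the AOI limit:

# The AOI is ~0.12° wide by ~0.06° tall
west, south, east, north = (
    -91.65432884883238, 47.86503396133904, -91.53842043960762, 47.9221313337365
)
print(f"{east - west:.4f}° x {north - south:.4f}°")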
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
geometry=gulf_geometry,
timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 8 physical / 8 logical cores | RAM: 16.00 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON
Statistics returned 0 timesteps
Compatibility: compatible
Alternatively, you can specify bounds_fraction to have the helper create a much smaller bounding box within the original bounds.
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
bounds_fraction=1e-5,
timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 8 physical / 8 logical cores | RAM: 16.00 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON
Using random bounds for compatibility check: [-129.466539636604, -10.179722642907745, -128.32811967894338, -9.610512664077437]
Statistics returned 0 timesteps
Compatibility: compatible
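Once a single dataset checks out, the same helper composes into a small batch screen across several collections. The sketch below uses only the pieces introduced above; it assumes DatasetParams instances expose concept_id as an attribute, and note that the helper's console output will interleave when checks run concurrently:

import asyncio

# Screen several datasets in one go (the list is illustrative; add your own)
candidates = [ds_xarray, ds_hls_day]
results = await asyncio.gather(
    *(
        check_titiler_cmr_compatibility(
            endpoint=endpoint,
            dataset=d,
            bounds_fraction=1e-5,
            timeout_s=300.0,
        )
        for d in candidates
    )
)
for d, r in zip(candidates, results):
    # assumes DatasetParams exposes concept_id as an attribute
    print(d.concept_id, "->", r["compatibility"])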