Overview of compatibility testing¶
This notebook walks you through a workflow to check compatibility of a TiTiler-CMR deployment for a given Earthdata CMR dataset.
In this notebook, you'll learn how to:
- Use earthaccess to authenticate to NASA Earthdata and query the CMR catalog
- Collect collection-level metadata (concept IDs, temporal range, spatial bounds)
- Run check_titiler_cmr_compatibility against your TiTiler-CMR endpoint to validate whether a dataset can be successfully visualized and accessed via TiTiler-CMR
Before you begin, you need:
- An Earthdata Login account: https://urs.earthdata.nasa.gov/
- A valid netrc file with your Earthdata credentials (format shown below), or interactive login
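If you go the netrc route, the file lives at ~/.netrc on Linux/macOS (or ~/_netrc on Windows) and needs a single entry for the Earthdata Login host. Replace the placeholders with your own credentials:

machine urs.earthdata.nasa.gov
    login <your_username>
    password <your_password>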
For this walkthrough, we will use the public instance hosted by Open VEDA.
import earthaccess
import xarray as xr
from datacube_benchmark.titiler import (
DatasetParams,
create_bbox_feature,
check_titiler_cmr_compatibility,
)
endpoint = "https://staging.openveda.cloud/api/titiler-cmr"
Introduction to TiTiler-CMR¶
TiTiler-CMR is a dynamic map tile server that provides on-demand access to Earth science data managed by NASA's Common Metadata Repository (CMR). It allows users to dynamically generate and serve map tiles from multidimensional data formats like NetCDF and HDF5.
To get started with TiTiler-CMR, you typically need to:
- Choose a TiTiler-CMR endpoint
- Pick a CMR dataset (by concept ID)
- Identify the assets/variables/bands you want to visualize
- Define a temporal interval (start/end ISO range) and, if needed, a time step (e.g., daily)
- Select a backend that matches your dataset's structure

TiTiler-CMR supports two different backends:
- xarray: for gridded/cloud-native datasets (e.g., NetCDF4/HDF5), typically exposed as variables
- rasterio: for COG/raster imagery-style datasets exposed as bands (optionally selected via a regex)
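To make those moving parts concrete, here is a minimal sketch of a TileJSON request against a TiTiler-CMR deployment. It assumes the standard TiTiler-style /WebMercatorQuad/tilejson.json route and reuses the query parameter names (concept_id, backend, variable, datetime) that appear in the requests later in this notebook; check your deployment's /docs for the authoritative signature.

import httpx

# Hedged sketch: fetch a TileJSON document for an xarray-backed dataset.
# Parameter names mirror the requests shown later in this notebook;
# the concept_id/variable values are the MUR SST ones discovered in step 1.
params = {
    "concept_id": "C1996881146-POCLOUD",
    "backend": "xarray",
    "variable": "analysed_sst",
    "datetime": "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z",
}
resp = httpx.get(f"{endpoint}/WebMercatorQuad/tilejson.json", params=params, timeout=60)
resp.raise_for_status()
print(resp.json()["tiles"])  # tile URL template(s)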
Here, we first explore a dataset using earthaccess to collect the necessary information (concept_id, backend, and variable), then run a compatibility check using the check_titiler_cmr_compatibility helper function. If you already know your dataset, you can skip the exploration and jump straight to step 2.
Step 1: Explore data with earthaccess¶
You can use earthaccess to search for datasets and inspect the individual granules returned by your query. This helps you validate which files were accessed, their sizes, and the temporal range.
First you need to authenticate to Earthdata.
# Authenticate to Earthdata
try:
auth = earthaccess.login(strategy="environment")
except Exception:
auth = earthaccess.login(strategy="interactive")
Next, you can search for datasets using a DOI, keywords, temporal range, and spatial bounds.
datasets = earthaccess.search_datasets(doi="10.5067/GHGMR-4FJ04")
ds = datasets[0]
concept_id = ds["meta"]["concept-id"]
print("Concept-Id: ", concept_id)
print("Abstract:", ds["umm"]["Abstract"])
Concept-Id: C1996881146-POCLOUD
Abstract: A Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis produced as a retrospective dataset (four day latency) and near-real-time dataset (one day latency) at the JPL Physical Oceanography DAAC using wavelets as basis functions in an optimal interpolation approach on a global 0.01 degree grid. The version 4 Multiscale Ultrahigh Resolution (MUR) L4 analysis is based upon nighttime GHRSST L2P skin and subskin SST observations from several instruments including the NASA Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the JAXA Advanced Microwave Scanning Radiometer 2 on GCOM-W1, the Moderate Resolution Imaging Spectroradiometers (MODIS) on the NASA Aqua and Terra platforms, the US Navy microwave WindSat radiometer, the Advanced Very High Resolution Radiometer (AVHRR) on several NOAA satellites, and in situ SST observations from the NOAA iQuam project. The ice concentration data are from the archives at the EUMETSAT Ocean and Sea Ice Satellite Application Facility (OSI SAF) High Latitude Processing Center and are also used for an improved SST parameterization for the high-latitudes. The dataset also contains additional variables for some granules including a SST anomaly derived from a MUR climatology and the temporal distance to the nearest IR measurement for each pixel. This dataset is funded by the NASA MEaSUREs program ( http://earthdata.nasa.gov/our-community/community-data-system-programs/measures-projects ), and created by a team led by Dr. Toshio M. Chin from JPL. It adheres to the GHRSST Data Processing Specification (GDS) version 2 format specifications. Use the file global metadata "history:" attribute to determine if a granule is near-realtime or retrospective.
Examine the granules¶
With a selected data collection, we'll now use earthaccess.search_data to find individual data granules within a specific temporal window.
time_range = ("2024-10-12", "2024-10-13")
results = earthaccess.search_data(
count=1,
concept_id=concept_id,
temporal=time_range,
)
print(f"Found {len(results)} granules between {time_range[0]} and {time_range[1]}")
for g in results:
start = g["umm"]["TemporalExtent"]["RangeDateTime"]["BeginningDateTime"]
size = float(g["size"]) # or use g["granule_size_mb"]
print(f"\n{start} ā {size:.2f} MB")
for link in g.data_links(access="external"):
print(" ", link)
Found 1 granules between 2024-10-12 and 2024-10-13

2024-10-11T21:00:00.000Z – 707.34 MB
  https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
From the output above, the returned link ends with .nc, indicating a NetCDF file. We can open it directly with xarray using the authenticated HTTPS session from earthaccess and quickly list the variables (plus a peek at dimensions and coordinates).
fs = earthaccess.get_fsspec_https_session()
ds = xr.open_dataset(
fs.open(results[0].data_links(access="external")[0]),
engine="h5netcdf",
decode_timedelta=True,
)
data_vars = ds.data_vars
data_vars
Data variables:
    analysed_sst      (time, lat, lon) float64 5GB ...
    analysis_error    (time, lat, lon) float64 5GB ...
    mask              (time, lat, lon) float32 3GB ...
    sea_ice_fraction  (time, lat, lon) float64 5GB ...
    dt_1km_data       (time, lat, lon) timedelta64[ns] 5GB ...
    sst_anomaly       (time, lat, lon) float64 5GB ...
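Before committing to a variable, it is worth a quick peek at its metadata (long name, units, fill value) to confirm it is the field you want TiTiler-CMR to serve; this is plain xarray:

# Inspect a candidate variable's metadata before using it as the
# TiTiler-CMR `variable` parameter
sst = ds["analysed_sst"]
print(sst.dims, sst.shape)
print(sst.attrs.get("long_name"), "|", sst.attrs.get("units"))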
Now that we know the concept_id, backend, and variable, we can run a quick compatibility check using the check_titiler_cmr_compatibility() helper function.
Step 2: Check compatibility¶
The check_titiler_cmr_compatibility() helper function performs the following steps:
- Validate the CMR collection and granule search
- Resolve collection/granule metadata and fetch TileJSON
- Determine how many time steps fall within the requested temporal range
- Query the /timeseries/statistics endpoint for a small, bounded preview window to check whether the dataset can be opened and processed with the selected backend
The result is a summary of compatibility, tiling parameters, and dataset statistics.
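For reference, the sketch below approximates what the helper's final step does under the hood. It assumes the /timeseries/statistics route accepts the query parameters visible in the request URLs the helper prints, and that the AOI is posted as a GeoJSON feature body; treat it as illustrative and confirm against your deployment's API docs.

import httpx

# Hedged sketch: query /timeseries/statistics directly for a small AOI.
# Whether the AOI travels as a POSTed GeoJSON body is an assumption here.
aoi = create_bbox_feature(-91.7, 47.8, -91.5, 47.9)  # deliberately tiny bbox
params = {
    "concept_id": "C2723754864-GES_DISC",  # same collection as the cell below
    "backend": "xarray",
    "variable": "precipitation",
    "datetime": "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z",
    "step": "P1D",
    "temporal_mode": "point",
}
resp = httpx.post(f"{endpoint}/timeseries/statistics", params=params, json=aoi, timeout=250)
print(resp.status_code)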
concept_id = "C2723754864-GES_DISC"
datetime_range = "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z"
variable = "precipitation"
ds_xarray = DatasetParams(
concept_id=concept_id,
backend="xarray",
datetime_range=datetime_range,
variable=variable,
step="P1D",
temporal_mode="point",
)
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_xarray,
timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 8 physical / 8 logical cores | RAM: 16.00 GiB
Dataset: C2723754864-GES_DISC (xarray)
Found 1 timesteps/granules from TileJSON
Using random bounds for compatibility check: [2.741770939582061, -86.93233148855214, 83.24021812957449, -46.68310789355593]
Statistics returned 1 timesteps
Compatibility: compatible
Now we can verify that the returned summary statistics look valid:
print(f"Statistics preview:\n{compat['statistics']}")
Statistics preview:
                       timestamp  min        max      mean         count  \
0  2024-10-12T00:00:00.000000000  0.0  36.904999  1.470654  324133.21875

            sum       std  median  majority  minority   unique  valid_percent  \
0  476687.84375  3.734399     0.0       0.0     0.065  14219.0          100.0

   masked_pixels  valid_pixels  percentile_2  percentile_98
0            0.0      325624.0           0.0      14.860001
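A couple of quick sanity checks on the preview can catch obviously broken reads, such as an all-masked window or inverted extremes. This assumes compat['statistics'] behaves like the pandas DataFrame printed above:

# Minimal sanity checks on the statistics preview (DataFrame assumed)
stats = compat["statistics"]
assert len(stats) >= 1, "no timesteps returned"
row = stats.iloc[0]
assert row["valid_percent"] > 0, "preview window is fully masked"
assert row["min"] <= row["median"] <= row["max"], "inconsistent extremes"
print("statistics preview looks sane")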
rasterio backend¶
Similar to the xarray example above, we can check compatibility for a CMR collection that is better suited to the rasterio backend.
ds_hls_day = DatasetParams(
concept_id="C2021957295-LPCLOUD",
backend="rasterio",
datetime_range="2024-07-01T00:00:00Z/2024-07-10T23:59:59Z",
bands=["B05", "B04"],
bands_regex="B[0-9][0-9]",
step="P1D",
temporal_mode="point",
)
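The bands_regex pattern is matched against the band identifiers in each granule's assets so the server knows which files belong together. A quick check with Python's re module shows the pattern above picks up two-digit HLS band names while skipping others (how TiTiler-CMR applies the regex internally is its own implementation detail):

import re

# The pattern from ds_hls_day matches two-digit band names only
pattern = re.compile(r"B[0-9][0-9]")
for name in ["B04", "B05", "B8A", "Fmask"]:
    print(name, "->", bool(pattern.fullmatch(name)))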
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 8 physical / 8 logical cores | RAM: 16.00 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON
Using random bounds for compatibility check: [-105.53889935418451, -46.63206063840639, -25.040452164192082, -6.3828370434101664]
~~~~~~~~~~~~~~~~ ERROR JSON REQUEST ~~~~~~~~~~~~~~~~
URL: https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point
Error: 400 Bad Request
Body: {"detail":"The AOI for this request is too large for the /statistics endpoint for this dataset. Try again with either a smaller AOI"}
Statistics request failed: HTTPStatusError: Client error '400 Bad Request' for url 'https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
Compatibility: issues_detected
⚠️ If your area of interest is too large, the API will return an "AOI is too large" error. Use the create_bbox_feature function to define a smaller bounding box before retrying.
gulf_geometry = create_bbox_feature(
-91.65432884883238, 47.86503396133904, -91.53842043960762, 47.9221313337365
)
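For scale, the bounding box above is deliberately tiny, roughly a tenth of a degree across, which keeps the statistics request comfortably under the AOI limit:

# The AOI is ~0.12° wide by ~0.06° tall
west, south, east, north = (
    -91.65432884883238, 47.86503396133904, -91.53842043960762, 47.9221313337365
)
print(f"{east - west:.4f}° x {north - south:.4f}°")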
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
geometry=gulf_geometry,
timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 8 physical / 8 logical cores | RAM: 16.00 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON
Statistics returned 0 timesteps
Compatibility: compatible
Alternatively, you can specify bounds_fraction to have the helper create a much smaller bounding box within the original bounds.
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
bounds_fraction=1e-5,
timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 8 physical / 8 logical cores | RAM: 16.00 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON
Using random bounds for compatibility check: [-129.466539636604, -10.179722642907745, -128.32811967894338, -9.610512664077437]
Statistics returned 0 timesteps
Compatibility: compatible
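Once a single dataset checks out, the same helper composes into a small batch screen across several collections. The sketch below uses only the pieces introduced above; it assumes DatasetParams instances expose concept_id as an attribute, and note that the helper's console output will interleave when checks run concurrently:

import asyncio

# Screen several datasets in one go (the list is illustrative; add your own)
candidates = [ds_xarray, ds_hls_day]
results = await asyncio.gather(
    *(
        check_titiler_cmr_compatibility(
            endpoint=endpoint,
            dataset=d,
            bounds_fraction=1e-5,
            timeout_s=300.0,
        )
        for d in candidates
    )
)
for d, r in zip(candidates, results):
    # assumes DatasetParams exposes concept_id as an attribute
    print(d.concept_id, "->", r["compatibility"])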