Compatibility Testing Tool¶
This notebook walks you through checking the compatibility of a given dataset with TiTiler-CMR.
See also How to use the Compatibility API.
In this notebook, you'll learn how to:
- Use earthaccess to authenticate to NASA Earthdata and query the CMR catalog
- Collect collection-level metadata (concept IDs, temporal range, spatial bounds)
- Run check_titiler_cmr_compatibility against your TiTiler-CMR endpoint to validate whether a dataset can be successfully accessed via TiTiler-CMR
Before you begin, you need:
- An Earthdata login account: https://urs.earthdata.nasa.gov/
- A valid .netrc file with your Earthdata credentials, or use interactive login.
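A .netrc entry for Earthdata looks like this (replace the placeholder values with your own credentials, and make the file readable only by you, e.g. `chmod 600 ~/.netrc`):

```
machine urs.earthdata.nasa.gov
    login YOUR_USERNAME
    password YOUR_PASSWORD
```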
For this walkthrough, we will use https://staging.openveda.cloud/api/titiler-cmr/.
import earthaccess
import xarray as xr
from datacube_benchmark import (
DatasetParams,
create_bbox_feature,
check_titiler_cmr_compatibility,
)
endpoint = "https://staging.openveda.cloud/api/titiler-cmr"
TiTiler-CMR supports two different backends:
- xarray — for gridded/cloud-native datasets (e.g., NetCDF4/HDF5), typically exposed as variables.
- rasterio — for COG/raster imagery-style datasets exposed as bands (optionally selected via a regex).
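As an illustrative rule of thumb (my own heuristic, not logic from the library), the file extension of a granule's data link often suggests which backend to try first:

```python
def suggest_backend(data_link: str) -> str:
    """Guess a TiTiler-CMR backend from a granule's file extension.

    This is a heuristic for exploration only; always confirm with an
    actual compatibility check.
    """
    link = data_link.lower()
    if link.endswith((".nc", ".nc4", ".h5", ".hdf5", ".hdf")):
        return "xarray"    # gridded NetCDF/HDF data, accessed by variable
    if link.endswith((".tif", ".tiff")):
        return "rasterio"  # COG/raster imagery, accessed by band
    return "unknown"

print(suggest_backend(
    "20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
))  # xarray
```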
Here, we first explore a dataset using earthaccess to collect the necessary information such as concept_id, backend, and variable, then run a compatibility check using the check_titiler_cmr_compatibility helper function. If you already know your dataset, you can skip the exploration step.
Step 1: Explore data with earthaccess¶
You can use earthaccess to search for datasets and inspect the individual granules used in your query. This helps you validate which files were accessed, their sizes, and the temporal range.
First you need to authenticate to Earthdata.
# Authenticate to Earthdata
try:
auth = earthaccess.login(strategy="environment")
except Exception:
auth = earthaccess.login(strategy="interactive")
Next, you can search for datasets using concept_id, keywords, temporal range, and spatial bounds.
datasets = earthaccess.search_datasets(concept_id="C1996881146-POCLOUD")
ds = datasets[0]
concept_id = ds["meta"]["concept-id"]
print("Concept-Id: ", concept_id)
print("Abstract:", ds["umm"]["Abstract"])
Concept-Id: C1996881146-POCLOUD Abstract: A Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis produced as a retrospective dataset (four day latency) and near-real-time dataset (one day latency) at the JPL Physical Oceanography DAAC using wavelets as basis functions in an optimal interpolation approach on a global 0.01 degree grid. The version 4 Multiscale Ultrahigh Resolution (MUR) L4 analysis is based upon nighttime GHRSST L2P skin and subskin SST observations from several instruments including the NASA Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the JAXA Advanced Microwave Scanning Radiometer 2 on GCOM-W1, the Moderate Resolution Imaging Spectroradiometers (MODIS) on the NASA Aqua and Terra platforms, the US Navy microwave WindSat radiometer, the Advanced Very High Resolution Radiometer (AVHRR) on several NOAA satellites, and in situ SST observations from the NOAA iQuam project. The ice concentration data are from the archives at the EUMETSAT Ocean and Sea Ice Satellite Application Facility (OSI SAF) High Latitude Processing Center and are also used for an improved SST parameterization for the high-latitudes. The dataset also contains additional variables for some granules including the SST anomaly (variable sst_anomaly) derived from a MUR climatology, and the temporal distance in hours to the nearest IR measurement for each pixel (variable dt_1km_data). Variable dt_1km_data first appears in the time series on October 4, 2015, while sst_anomaly starts July 23, 2019. This dataset was originally funded by the NASA MEaSUREs program (http://earthdata.nasa.gov/our-community/community-data-system-programs/measures-projects), and created by a team led by Dr. Toshio M. Chin from JPL. It adheres to the GHRSST Data Processing Specification (GDS) version 2 format specifications. Use the file global metadata "history:" attribute to determine if a granule is near-realtime or retrospective.
Examine the granules¶
With a selected data collection, we'll now use earthaccess.search_data to find individual data granules within a specific temporal window.
time_range = ("2024-10-12", "2024-10-13")
results = earthaccess.search_data(
    count=1,
    concept_id=concept_id,
    temporal=time_range,
)
print(f"Found {len(results)} granules between {time_range[0]} and {time_range[1]}")
for g in results:
start = g["umm"]["TemporalExtent"]["RangeDateTime"]["BeginningDateTime"]
size = float(g["size"]) # or use g["granule_size_mb"]
print(f"\n{start} — {size:.2f} MB")
for link in g.data_links(access="external"):
print(" ", link)
Found 1 granules between 2024-10-12 and 2024-10-13 2024-10-11T21:00:00.000Z — 707.34 MB https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
From the output above, the returned link ends with .nc, indicating a NetCDF file. We can open it directly with xarray using the authenticated HTTPS session from earthaccess and quickly list the variables (plus a peek at dimensions and coordinates).
fs = earthaccess.get_fsspec_https_session()
ds = xr.open_dataset(
fs.open(results[0].data_links(access="external")[0]),
engine="h5netcdf",
decode_timedelta=True,
)
data_vars = ds.data_vars
data_vars
Data variables:
analysed_sst (time, lat, lon) float64 5GB ...
analysis_error (time, lat, lon) float64 5GB ...
mask (time, lat, lon) float32 3GB ...
sea_ice_fraction (time, lat, lon) float64 5GB ...
dt_1km_data (time, lat, lon) timedelta64[ns] 5GB ...
sst_anomaly (time, lat, lon) float64 5GB ...
Now that we know the concept_id, backend, and variable, we can run a quick compatibility check using the check_titiler_cmr_compatibility() helper function.
Step 2: Check Compatibility¶
The check_titiler_cmr_compatibility() helper function performs the following steps:
- Validate the CMR collection and granule search
- Resolve collection/granule metadata and fetch TileJSON
- Determine how many time steps fall within the requested temporal range
- Query the /timeseries/statistics endpoint for a small, bounded preview window to check whether the dataset can be opened and processed with the selected backend.
The result is a summary of compatibility, tiling parameters, and dataset statistics.
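Based on the fields used later in this notebook (`compatibility` and `statistics`; the full schema is an assumption on my part), the returned dictionary can be inspected along these lines:

```python
# Hypothetical shape of the result, inferred from the fields this notebook uses.
result = {
    "compatibility": "compatible",  # or "issues_detected"
    "statistics": None,             # per-timestep statistics preview
}

def summarize(result: dict) -> str:
    """Turn a compatibility-check result into a one-line verdict."""
    if result["compatibility"] == "compatible":
        return "dataset can be accessed via TiTiler-CMR"
    return "issues detected; try a smaller AOI or a different backend"

print(summarize(result))
```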
concept_id = "C2723754864-GES_DISC"
datetime_range = "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z"
variable = "precipitation"
ds_xarray = DatasetParams(
concept_id=concept_id,
backend="xarray",
datetime_range=datetime_range,
variable=variable,
step="P1D",
temporal_mode="point",
)
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_xarray,
timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2723754864-GES_DISC (xarray)
Found 1 timesteps/granules from TileJSON Using random bounds for compatibility check: [93.37443877302621, -0.25278725006542757, 173.87288596301863, 39.996436344930785]
Statistics returned 1 timesteps Compatibility: compatible
Now we can inspect the statistics preview to confirm that the data summary is valid:
print(f"Statistics preview:\n{compat['statistics']}")
Statistics preview:
timestamp min max mean count \
0 2024-10-12T00:00:00.000000000 0.0 271.910065 5.126765 324133.25
sum std median majority minority unique valid_percent \
0 1661755.125 11.946534 0.44 0.0 0.07 31590.0 100.0
masked_pixels valid_pixels percentile_2 percentile_98
0 0.0 324818.0 0.0 44.679996
rasterio backend¶
Similar to the xarray example above, we can check compatibility for a CMR collection that is better suited for the rasterio backend.
ds_hls_day = DatasetParams(
concept_id="C2021957295-LPCLOUD",
backend="rasterio",
datetime_range="2024-07-01T00:00:00Z/2024-07-10T23:59:59Z",
bands=["B05", "B04"],
bands_regex="B[0-9][0-9]",
step="P1D",
temporal_mode="point",
)
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON Using random bounds for compatibility check: [-141.12568175339308, -69.82540680805161, -60.627234563400656, -29.576183213055398]
~~~~~~~~~~~~~~~~ ERROR JSON REQUEST ~~~~~~~~~~~~~~~~
URL: https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point
Error: 400 Bad Request
Body: {"detail":"The AOI for this request is too large for the /statistics endpoint for this dataset. Try again with either a smaller AOI"}
Statistics request failed: HTTPStatusError: Client error '400 Bad Request' for url 'https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
Compatibility: issues_detected
⚠️ If your area of interest is too large, the API will return an "AOI is too large" error. Use the create_bbox_feature function to define a smaller bounding box before retrying.
gulf_geometry = create_bbox_feature(
-91.65432884883238, 47.86503396133904, -91.53842043960762, 47.9221313337365
)
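For reference, a GeoJSON bbox Feature like the one the helper presumably builds can be constructed by hand. This sketch assumes a plain Polygon Feature; check the helper's actual return shape before substituting it:

```python
def bbox_feature(minx, miny, maxx, maxy):
    """Build a GeoJSON Feature with a rectangular Polygon from bbox corners."""
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            # Exterior ring, counter-clockwise and closed (first == last point).
            "coordinates": [[
                [minx, miny],
                [maxx, miny],
                [maxx, maxy],
                [minx, maxy],
                [minx, miny],
            ]],
        },
    }

feature = bbox_feature(-91.654, 47.865, -91.538, 47.922)
print(feature["geometry"]["type"])  # Polygon
```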
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
geometry=gulf_geometry,
timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2021957295-LPCLOUD (rasterio) Found 1 timesteps/granules from TileJSON
Statistics returned 0 timesteps Compatibility: compatible
Alternatively, you can specify bounds_fraction to create a much smaller bounding box within the original bounds.
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
bounds_fraction=1e-5,
timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON Using random bounds for compatibility check: [28.082308330028646, -88.91085211807454, 29.220728287689262, -88.34164213924423]
Statistics returned 0 timesteps Compatibility: compatible
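To build intuition for what a tiny bounds_fraction does, here is a sketch of shrinking a bounding box to a given fraction of its area around its center. These are my assumed semantics for the parameter, not the library's implementation:

```python
import math

def shrink_bounds(bounds, fraction):
    """Shrink a (minx, miny, maxx, maxy) box to `fraction` of its area,
    keeping the same center. Each side is scaled by sqrt(fraction)."""
    minx, miny, maxx, maxy = bounds
    cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
    scale = math.sqrt(fraction)
    half_w = (maxx - minx) / 2 * scale
    half_h = (maxy - miny) / 2 * scale
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# A 1e-5 fraction of the global extent yields a very small preview window.
print(shrink_bounds((-180, -90, 180, 90), 1e-5))
```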