Compatibility Testing Tool¶
This notebook walks you through checking the compatibility of a given dataset with TiTiler-CMR.
See also How to use the Compatibility API.
In this notebook, you'll learn how to:
- Use earthaccess to authenticate to NASA Earthdata and query the CMR catalog
- Collect collection-level metadata (concept IDs, temporal range, spatial bounds)
- Run check_titiler_cmr_compatibility against your TiTiler-CMR endpoint to validate whether a dataset can be successfully accessed via TiTiler-CMR
Before you begin, you need:
- An Earthdata login account: https://urs.earthdata.nasa.gov/
- A valid .netrc file with your Earthdata credentials, or use interactive login.
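A .netrc entry for Earthdata looks like this (replace the placeholder values with your own credentials, and make the file readable only by you, e.g. `chmod 600 ~/.netrc`):

```
machine urs.earthdata.nasa.gov
    login YOUR_USERNAME
    password YOUR_PASSWORD
```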
For this walkthrough, we will use https://staging.openveda.cloud/api/titiler-cmr/.
import earthaccess
import xarray as xr
from datacube_benchmark import (
DatasetParams,
create_bbox_feature,
check_titiler_cmr_compatibility,
)
endpoint = "https://staging.openveda.cloud/api/titiler-cmr"
TiTiler-CMR supports two different backends:
- xarray — for gridded/cloud-native datasets (e.g., NetCDF4/HDF5), typically exposed as variables.
- rasterio — for COG/raster imagery-style datasets exposed as bands (optionally selected via a regex).
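As an illustrative rule of thumb (my own heuristic, not logic from the library), the file extension of a granule's data link often suggests which backend to try first:

```python
def suggest_backend(data_link: str) -> str:
    """Guess a TiTiler-CMR backend from a granule's file extension.

    This is a heuristic for exploration only; always confirm with an
    actual compatibility check.
    """
    link = data_link.lower()
    if link.endswith((".nc", ".nc4", ".h5", ".hdf5", ".hdf")):
        return "xarray"    # gridded NetCDF/HDF data, accessed by variable
    if link.endswith((".tif", ".tiff")):
        return "rasterio"  # COG/raster imagery, accessed by band
    return "unknown"

print(suggest_backend(
    "20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
))  # xarray
```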
Here, we first explore a dataset using earthaccess to collect the necessary information such as concept_id, backend, and variable, then run a compatibility check using the check_titiler_cmr_compatibility helper function. If you already know your dataset, you can skip the exploration step.
Step 1: Explore data with earthaccess¶
You can use earthaccess to search for datasets and inspect the individual granules used in your query. This helps you validate which files were accessed, their sizes, and the temporal range.
First you need to authenticate to Earthdata.
# Authenticate to Earthdata
try:
auth = earthaccess.login(strategy="environment")
except Exception:
auth = earthaccess.login(strategy="interactive")
Next, you can search for datasets using concept_id, keywords, temporal range, and spatial bounds.
datasets = earthaccess.search_datasets(concept_id="C1996881146-POCLOUD")
ds = datasets[0]
concept_id = ds["meta"]["concept-id"]
print("Concept-Id: ", concept_id)
print("Abstract:", ds["umm"]["Abstract"])
Concept-Id: C1996881146-POCLOUD Abstract: A Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis produced as a retrospective dataset (four day latency) and near-real-time dataset (one day latency) at the JPL Physical Oceanography DAAC using wavelets as basis functions in an optimal interpolation approach on a global 0.01 degree grid. The version 4 Multiscale Ultrahigh Resolution (MUR) L4 analysis is based upon nighttime GHRSST L2P skin and subskin SST observations from several instruments including the NASA Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the JAXA Advanced Microwave Scanning Radiometer 2 on GCOM-W1, the Moderate Resolution Imaging Spectroradiometers (MODIS) on the NASA Aqua and Terra platforms, the US Navy microwave WindSat radiometer, the Advanced Very High Resolution Radiometer (AVHRR) on several NOAA satellites, and in situ SST observations from the NOAA iQuam project. The ice concentration data are from the archives at the EUMETSAT Ocean and Sea Ice Satellite Application Facility (OSI SAF) High Latitude Processing Center and are also used for an improved SST parameterization for the high-latitudes. The dataset also contains additional variables for some granules including the SST anomaly (variable sst_anomaly) derived from a MUR climatology, and the temporal distance in hours to the nearest IR measurement for each pixel (variable dt_1km_data). Variable dt_1km_data first appears in the time series on October 4, 2015, while sst_anomaly starts July 23, 2019. This dataset was originally funded by the NASA MEaSUREs program (http://earthdata.nasa.gov/our-community/community-data-system-programs/measures-projects), and created by a team led by Dr. Toshio M. Chin from JPL. It adheres to the GHRSST Data Processing Specification (GDS) version 2 format specifications. Use the file global metadata "history:" attribute to determine if a granule is near-realtime or retrospective.
Examine the granules¶
With a selected data collection, we'll now use earthaccess.search_data to find individual data granules within a specific temporal window.
time_range = ("2024-10-12", "2024-10-13")
results = earthaccess.search_data(
    count=1,
    concept_id=concept_id,
    temporal=time_range,
)
print(f"Found {len(results)} granules between {time_range[0]} and {time_range[1]}")
for g in results:
start = g["umm"]["TemporalExtent"]["RangeDateTime"]["BeginningDateTime"]
size = float(g["size"]) # or use g["granule_size_mb"]
print(f"\n{start} — {size:.2f} MB")
for link in g.data_links(access="external"):
print(" ", link)
Found 1 granules between 2024-10-12 and 2024-10-13 2024-10-11T21:00:00.000Z — 707.34 MB https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
From the output above, the returned link ends with .nc, indicating a NetCDF file. We can open it directly with xarray using the authenticated HTTPS session from earthaccess and quickly list the variables (plus a peek at dimensions and coordinates).
fs = earthaccess.get_fsspec_https_session()
ds = xr.open_dataset(
fs.open(results[0].data_links(access="external")[0]),
engine="h5netcdf",
decode_timedelta=True,
)
data_vars = ds.data_vars
data_vars
Data variables:
analysed_sst (time, lat, lon) float64 5GB ...
analysis_error (time, lat, lon) float64 5GB ...
mask (time, lat, lon) float32 3GB ...
sea_ice_fraction (time, lat, lon) float64 5GB ...
dt_1km_data (time, lat, lon) timedelta64[ns] 5GB ...
sst_anomaly (time, lat, lon) float64 5GB ...
Now that we know the concept_id, backend, and variable, we can run a quick compatibility check using the check_titiler_cmr_compatibility() helper function.
Step 2: Check Compatibility¶
The check_titiler_cmr_compatibility() helper function performs the following steps:
- Validate the CMR collection and granule search
- Resolve collection/granule metadata and fetch TileJSON
- Determine how many time steps fall within the requested temporal range
- Query the /timeseries/statistics endpoint for a small, bounded preview window to check whether the dataset can be opened and processed with the selected backend.
The result is a summary of compatibility, tiling parameters, and dataset statistics.
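Based on the fields used later in this notebook (`compatibility` and `statistics`; the full schema is an assumption on my part), the returned dictionary can be inspected along these lines:

```python
# Hypothetical shape of the result, inferred from the fields this notebook uses.
result = {
    "compatibility": "compatible",  # or "issues_detected"
    "statistics": None,             # per-timestep statistics preview
}

def summarize(result: dict) -> str:
    """Turn a compatibility-check result into a one-line verdict."""
    if result["compatibility"] == "compatible":
        return "dataset can be accessed via TiTiler-CMR"
    return "issues detected; try a smaller AOI or a different backend"

print(summarize(result))
```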
concept_id = "C2723754864-GES_DISC"
datetime_range = "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z"
variable = "precipitation"
ds_xarray = DatasetParams(
concept_id=concept_id,
backend="xarray",
datetime_range=datetime_range,
variable=variable,
step="P1D",
temporal_mode="point",
)
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_xarray,
timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2723754864-GES_DISC (xarray)
Found 1 timesteps/granules from TileJSON Using random bounds for compatibility check: [93.37443877302621, -0.25278725006542757, 173.87288596301863, 39.996436344930785]
Statistics returned 1 timesteps Compatibility: compatible
Now we can inspect the statistics preview to confirm that the data summary is valid:
print(f"Statistics preview:\n{compat['statistics']}")
Statistics preview:
timestamp min max mean count \
0 2024-10-12T00:00:00.000000000 0.0 271.910065 5.126765 324133.25
sum std median majority minority unique valid_percent \
0 1661755.125 11.946534 0.44 0.0 0.07 31590.0 100.0
masked_pixels valid_pixels percentile_2 percentile_98
0 0.0 324818.0 0.0 44.679996
rasterio backend¶
Similar to the xarray example above, we can check compatibility for a CMR collection that is better suited for the rasterio backend.
ds_hls_day = DatasetParams(
concept_id="C2021957295-LPCLOUD",
backend="rasterio",
datetime_range="2024-07-01T00:00:00Z/2024-07-10T23:59:59Z",
bands=["B05", "B04"],
bands_regex="B[0-9][0-9]",
step="P1D",
temporal_mode="point",
)
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON Using random bounds for compatibility check: [-141.12568175339308, -69.82540680805161, -60.627234563400656, -29.576183213055398]
~~~~~~~~~~~~~~~~ ERROR JSON REQUEST ~~~~~~~~~~~~~~~~
URL: https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point
Error: 400 Bad Request
Body: {"detail":"The AOI for this request is too large for the /statistics endpoint for this dataset. Try again with either a smaller AOI"}
Statistics request failed: HTTPStatusError: Client error '400 Bad Request' for url 'https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
Compatibility: issues_detected
⚠️ If your area of interest is too large, the API will return an "AOI is too large" error. Use the create_bbox_feature function to define a smaller bounding box before retrying.
gulf_geometry = create_bbox_feature(
-91.65432884883238, 47.86503396133904, -91.53842043960762, 47.9221313337365
)
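For reference, a GeoJSON bbox Feature like the one the helper presumably builds can be constructed by hand. This sketch assumes a plain Polygon Feature; check the helper's actual return shape before substituting it:

```python
def bbox_feature(minx, miny, maxx, maxy):
    """Build a GeoJSON Feature with a rectangular Polygon from bbox corners."""
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            # Exterior ring, counter-clockwise and closed (first == last point).
            "coordinates": [[
                [minx, miny],
                [maxx, miny],
                [maxx, maxy],
                [minx, maxy],
                [minx, miny],
            ]],
        },
    }

feature = bbox_feature(-91.654, 47.865, -91.538, 47.922)
print(feature["geometry"]["type"])  # Polygon
```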
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
geometry=gulf_geometry,
timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2021957295-LPCLOUD (rasterio) Found 1 timesteps/granules from TileJSON
Statistics returned 0 timesteps Compatibility: compatible
Alternatively, you can specify bounds_fraction to create a much smaller bounding box within the original bounds.
compat = await check_titiler_cmr_compatibility(
endpoint=endpoint,
dataset=ds_hls_day,
bounds_fraction=1e-5,
timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2021957295-LPCLOUD (rasterio)
Found 1 timesteps/granules from TileJSON Using random bounds for compatibility check: [28.082308330028646, -88.91085211807454, 29.220728287689262, -88.34164213924423]
Statistics returned 0 timesteps Compatibility: compatible
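To build intuition for what a tiny bounds_fraction does, here is a sketch of shrinking a bounding box to a given fraction of its area around its center. These are my assumed semantics for the parameter, not the library's implementation:

```python
import math

def shrink_bounds(bounds, fraction):
    """Shrink a (minx, miny, maxx, maxy) box to `fraction` of its area,
    keeping the same center. Each side is scaled by sqrt(fraction)."""
    minx, miny, maxx, maxy = bounds
    cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
    scale = math.sqrt(fraction)
    half_w = (maxx - minx) / 2 * scale
    half_h = (maxy - miny) / 2 * scale
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# A 1e-5 fraction of the global extent yields a very small preview window.
print(shrink_bounds((-180, -90, 180, 90), 1e-5))
```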