Compatibility Testing Tool
This notebook walks you through checking the compatibility of a given dataset with TiTiler-CMR.
See also How to use the Compatibility API.
In this notebook, you'll learn how to:
- Use earthaccess to authenticate to NASA Earthdata and query the CMR catalog
- Collect collection-level metadata (concept IDs, temporal range, spatial bounds)
- Run check_titiler_cmr_compatibility against your TiTiler-CMR endpoint to validate whether a dataset can be successfully accessed via TiTiler-CMR
Before you begin, you need:
- An Earthdata login account: https://urs.earthdata.nasa.gov/
- A valid netrc file with your Earthdata credentials, or the ability to log in interactively.
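Before running the notebook, it can help to confirm that your netrc file actually contains an Earthdata entry. The sketch below uses only the Python standard library; the helper name `has_earthdata_entry` is hypothetical, and the demonstration writes placeholder credentials to a temporary file rather than touching your real `~/.netrc`.

```python
import netrc
import tempfile
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_entry(netrc_path: Path) -> bool:
    """Return True if the netrc file holds a login and password for Earthdata."""
    try:
        entry = netrc.netrc(str(netrc_path)).authenticators(EDL_HOST)
    except (FileNotFoundError, netrc.NetrcParseError):
        return False
    # authenticators() returns (login, account, password), or None if no match
    return entry is not None and bool(entry[0]) and bool(entry[2])


# Demonstration with a throwaway file and placeholder credentials
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "netrc"
    demo.write_text(f"machine {EDL_HOST} login demo_user password demo_pass\n")
    found = has_earthdata_entry(demo)
    missing = has_earthdata_entry(Path(tmp) / "does_not_exist")

print(found, missing)
```

To check your own setup, call `has_earthdata_entry(Path.home() / ".netrc")`.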
For this walkthrough, we will use https://staging.openveda.cloud/api/titiler-cmr/.
import earthaccess
import xarray as xr
from datacube_benchmark import (
    DatasetParams,
    create_bbox_feature,
    check_titiler_cmr_compatibility,
)
endpoint = "https://staging.openveda.cloud/api/titiler-cmr"
titiler-cmr supports two different backends:
- xarray: for gridded/cloud-native datasets (e.g., NetCDF4/HDF5), typically exposed as variables.
- rasterio: for COG/raster imagery-style datasets exposed as bands (optionally via a regex).
Here, we first explore a dataset using earthaccess to collect the necessary information such as concept_id, backend, and variable, then run a compatibility check using the check_titiler_cmr_compatibility helper function. If you already know your dataset, you can skip the exploration step.
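As a rough aid for choosing a backend, you can look at the file extension of a granule link. The sketch below is a heuristic only: the helper `guess_backend` and the suffix sets are assumptions made for illustration, not part of TiTiler-CMR, and the `.tif` URL is a made-up example.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumed suffix-to-backend mapping; extend it for your own collections.
XARRAY_SUFFIXES = {".nc", ".nc4", ".h5", ".hdf5"}
RASTERIO_SUFFIXES = {".tif", ".tiff"}


def guess_backend(url: str) -> Optional[str]:
    """Guess a TiTiler-CMR backend from a granule URL's file extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in XARRAY_SUFFIXES:
        return "xarray"
    if suffix in RASTERIO_SUFFIXES:
        return "rasterio"
    return None  # unknown extension: inspect the file manually


netcdf_guess = guess_backend(
    "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/"
    "MUR-JPL-L4-GLOB-v4.1/20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
)
tiff_guess = guess_backend("https://example.com/data/scene.B04.tif")  # hypothetical URL
print(netcdf_guess, tiff_guess)
```

The extension is only a hint; the compatibility check in Step 2 remains the real test of whether a backend works for a collection.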
Step 1: Explore data with earthaccess
You can use earthaccess to search for datasets and inspect the individual granules returned by your query. This helps you validate which files were accessed, their sizes, and the temporal range.
First you need to authenticate to Earthdata.
# Authenticate to Earthdata
try:
    auth = earthaccess.login(strategy="environment")
except Exception:
    auth = earthaccess.login(strategy="interactive")
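Interactive login needs a frontend that can prompt for input, so in headless runs (CI, scheduled jobs) it is safer to decide on a strategy up front. The sketch below is one way to do that, assuming credentials are supplied through the `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` environment variables; `pick_login_strategy` is a hypothetical helper, not an earthaccess function.

```python
import os
from pathlib import Path
from typing import Mapping


def pick_login_strategy(env: Mapping[str, str], netrc_path: Path) -> str:
    """Choose an earthaccess login strategy from what is available locally."""
    if env.get("EARTHDATA_USERNAME") and env.get("EARTHDATA_PASSWORD"):
        return "environment"
    if netrc_path.is_file():
        return "netrc"
    # Last resort: requires a frontend that supports input prompts
    return "interactive"


strategy = pick_login_strategy(os.environ, Path.home() / ".netrc")
print(f"Selected login strategy: {strategy}")
# Then, in the notebook:
# auth = earthaccess.login(strategy=strategy)
```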
Next, you can search for datasets using concept_id, keywords, temporal range, and spatial bounds.
datasets = earthaccess.search_datasets(concept_id="C1996881146-POCLOUD")
ds = datasets[0]
concept_id = ds["meta"]["concept-id"]
print("Concept-Id: ", concept_id)
print("Abstract:", ds["umm"]["Abstract"])
Concept-Id: C1996881146-POCLOUD
Abstract: A Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis produced as a retrospective dataset (four day latency) and near-real-time dataset (one day latency) at the JPL Physical Oceanography DAAC using wavelets as basis functions in an optimal interpolation approach on a global 0.01 degree grid. The version 4 Multiscale Ultrahigh Resolution (MUR) L4 analysis is based upon nighttime GHRSST L2P skin and subskin SST observations from several instruments including the NASA Advanced Microwave Scanning Radiometer-EOS (AMSR-E), the JAXA Advanced Microwave Scanning Radiometer 2 on GCOM-W1, the Moderate Resolution Imaging Spectroradiometers (MODIS) on the NASA Aqua and Terra platforms, the US Navy microwave WindSat radiometer, the Advanced Very High Resolution Radiometer (AVHRR) on several NOAA satellites, and in situ SST observations from the NOAA iQuam project. The ice concentration data are from the archives at the EUMETSAT Ocean and Sea Ice Satellite Application Facility (OSI SAF) High Latitude Processing Center and are also used for an improved SST parameterization for the high-latitudes. The dataset also contains additional variables for some granules including the SST anomaly (variable sst_anomaly) derived from a MUR climatology, and the temporal distance in hours to the nearest IR measurement for each pixel (variable dt_1km_data). Variable dt_1km_data first appears in the time series on October 4, 2015, while sst_anomaly starts July 23, 2019. This dataset was originally funded by the NASA MEaSUREs program (http://earthdata.nasa.gov/our-community/community-data-system-programs/measures-projects), and created by a team led by Dr. Toshio M. Chin from JPL. It adheres to the GHRSST Data Processing Specification (GDS) version 2 format specifications. Use the file global metadata "history:" attribute to determine if a granule is near-realtime or retrospective.
Examine the granules
With a selected data collection, we'll now use earthaccess.search_data to find individual data granules within a specific temporal window.
time_range = ("2024-10-12", "2024-10-13")
results = earthaccess.search_data(
    count=1,
    concept_id=concept_id,
    temporal=time_range,
)
print(f"Found {len(results)} granules between {time_range[0]} and {time_range[1]}")
for g in results:
    start = g["umm"]["TemporalExtent"]["RangeDateTime"]["BeginningDateTime"]
    size = float(g["size"])  # or use g["granule_size_mb"]
    print(f"\n{start} → {size:.2f} MB")
    for link in g.data_links(access="external"):
        print("  ", link)
Found 1 granules between 2024-10-12 and 2024-10-13

2024-10-11T21:00:00.000Z → 707.34 MB
   https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
From the output above, the returned link ends with .nc, indicating a NetCDF file. We can open it directly with xarray using the authenticated HTTPS session from earthaccess and quickly list the variables (plus a peek at dimensions and coordinates).
fs = earthaccess.get_fsspec_https_session()
ds = xr.open_dataset(
    fs.open(results[0].data_links(access="external")[0]),
    engine="h5netcdf",
    decode_timedelta=True,
)
data_vars = ds.data_vars
data_vars
Now that we know the concept_id, backend, and variable, we can run a quick compatibility check using the check_titiler_cmr_compatibility() helper function.
Step 2: Check Compatibility
The check_titiler_cmr_compatibility() helper function performs the following steps:
- Validate the CMR collection and granule search
- Resolve collection/granule metadata and fetch TileJSON
- Determine how many time steps fall within the requested temporal range
- Query the /timeseries/statistics endpoint for a small, bounded preview window to check if the dataset can be opened and processed with the selected backend.
The result is a summary of compatibility, tiling parameters, and dataset statistics.
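To get a feel for the time-step count in the third step, you can estimate it locally from the datetime range and step. This is a minimal sketch that assumes a whole-day step (such as `P1D`) and counts points starting at the beginning of the range; the server may count differently, and `count_daily_steps` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp such as 2024-10-12T00:00:01Z as an aware UTC datetime."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def count_daily_steps(datetime_range: str, step_days: int = 1) -> int:
    """Count points spaced step_days apart, from the range start to its end (inclusive)."""
    start_text, end_text = datetime_range.split("/")
    start, end = parse_utc(start_text), parse_utc(end_text)
    if end < start:
        raise ValueError("datetime range ends before it starts")
    return (end - start) // timedelta(days=step_days) + 1


one_day = count_daily_steps("2024-10-12T00:00:01Z/2024-10-12T23:59:59Z")
ten_days = count_daily_steps("2024-07-01T00:00:00Z/2024-07-10T23:59:59Z")
print(one_day, ten_days)
```

A large count is a hint to shorten the range or increase the step before running the check.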
concept_id = "C2723754864-GES_DISC"
datetime_range = "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z"
variable = "precipitation"
ds_xarray = DatasetParams(
    concept_id=concept_id,
    backend="xarray",
    datetime_range=datetime_range,
    variable=variable,
    step="P1D",
    temporal_mode="point",
)
compat = await check_titiler_cmr_compatibility(
    endpoint=endpoint,
    dataset=ds_xarray,
    timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 2 physical / 4 logical cores | RAM: 15.61 GiB
Dataset: C2723754864-GES_DISC (xarray)
~~~~~~~~~~~~~~~~ ERROR JSON REQUEST ~~~~~~~~~~~~~~~~
URL: https://staging.openveda.cloud/api/titiler-cmr/WebMercatorQuad/tilejson.json?concept_id=C2723754864-GES_DISC&backend=xarray&datetime=2024-10-12T00%3A00%3A01Z%2F2024-10-12T23%3A59%3A59Z&variable=precipitation&step=P1D&temporal_mode=point
Error: 301 Moved Permanently
Body: HTTP 301 error during compatibility check
Statistics request failed: HTTP 301:
Compatibility: issues_detected
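When a check fails, it can be useful to rebuild the request URL yourself and try it in a browser or HTTP client. The sketch below reconstructs the TileJSON URL shown in the output above from the same parameters, using only the standard library. The query parameter names are copied from that URL; how the helper assembles them internally is not shown here, so treat this as an approximation.

```python
from urllib.parse import urlencode

endpoint = "https://staging.openveda.cloud/api/titiler-cmr"

# Parameter names taken from the request URL printed by the compatibility check
params = {
    "concept_id": "C2723754864-GES_DISC",
    "backend": "xarray",
    "datetime": "2024-10-12T00:00:01Z/2024-10-12T23:59:59Z",
    "variable": "precipitation",
    "step": "P1D",
    "temporal_mode": "point",
}

# urlencode percent-encodes ':' and '/' in the datetime interval
tilejson_url = f"{endpoint}/WebMercatorQuad/tilejson.json?{urlencode(params)}"
print(tilejson_url)
```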
Next, check whether the statistics summary contains valid data:
print(f"Statistics preview:\n{compat['statistics']}")
Statistics preview:
Empty DataFrame
Columns: []
Index: []
rasterio backend
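An empty statistics table together with an `issues_detected` status means the check did not succeed. A small guard like the one below makes that explicit in scripts. It is a sketch that assumes only the two keys used in this notebook (`compatibility` and `statistics`) and that the statistics object exposes an `.empty` attribute, as pandas DataFrames do; `summarize_compat` and the stand-in table class are hypothetical.

```python
from typing import Any, Mapping


def summarize_compat(compat: Mapping[str, Any]) -> str:
    """Turn a compatibility result into a one-line verdict."""
    status = compat.get("compatibility", "unknown")
    stats = compat.get("statistics")
    # Objects without an .empty attribute (or a missing table) count as empty
    has_stats = stats is not None and not getattr(stats, "empty", True)
    if status == "issues_detected" or not has_stats:
        table_state = "present" if has_stats else "empty"
        return f"not usable yet (status={status}, statistics={table_state})"
    return f"looks usable (status={status})"


class _EmptyTable:
    """Stand-in for an empty DataFrame so the example runs without pandas."""

    empty = True


verdict = summarize_compat({"compatibility": "issues_detected", "statistics": _EmptyTable()})
print(verdict)
```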
Similar to the xarray example above, we can check compatibility for a CMR collection that is better suited for the rasterio backend.
ds_hls_day = DatasetParams(
    concept_id="C2021957295-LPCLOUD",
    backend="rasterio",
    datetime_range="2024-07-01T00:00:00Z/2024-07-10T23:59:59Z",
    bands=["B05", "B04"],
    bands_regex="B[0-9][0-9]",
    step="P1D",
    temporal_mode="point",
)
compat = await check_titiler_cmr_compatibility(
    endpoint=endpoint,
    dataset=ds_hls_day,
    timeout_s=250.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 2 physical / 4 logical cores | RAM: 15.61 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
~~~~~~~~~~~~~~~~ ERROR JSON REQUEST ~~~~~~~~~~~~~~~~
URL: https://staging.openveda.cloud/api/titiler-cmr/WebMercatorQuad/tilejson.json?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point
Error: 301 Moved Permanently
Body: HTTP 301 error during compatibility check
Statistics request failed: HTTP 301:
Compatibility: issues_detected
⚠️ If your area of interest is too large, the API will return an "AOI is too large" error. Use the create_bbox_feature function to define a smaller bounding box before retrying.
gulf_geometry = create_bbox_feature(
    -91.65432884883238, 47.86503396133904, -91.53842043960762, 47.9221313337365
)
compat = await check_titiler_cmr_compatibility(
    endpoint=endpoint,
    dataset=ds_hls_day,
    geometry=gulf_geometry,
    timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 2 physical / 4 logical cores | RAM: 15.61 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
~~~~~~~~~~~~~~~~ ERROR JSON REQUEST ~~~~~~~~~~~~~~~~
URL: https://staging.openveda.cloud/api/titiler-cmr/WebMercatorQuad/tilejson.json?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point
Error: 301 Moved Permanently
Body: HTTP 301 error during compatibility check
Statistics request failed: HTTP 301:
Compatibility: issues_detected
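For reference, a bounding box like the one passed to create_bbox_feature can be written as a GeoJSON Feature by hand. The sketch below assumes the argument order (min lon, min lat, max lon, max lat) suggested by the call above; `bbox_to_feature` is a hypothetical stand-in, and the exact object returned by create_bbox_feature may differ.

```python
import json


def bbox_to_feature(minx: float, miny: float, maxx: float, maxy: float) -> dict:
    """Build a GeoJSON Feature with a rectangular Polygon from bounding-box corners."""
    if not (minx < maxx and miny < maxy):
        raise ValueError("expected minx < maxx and miny < maxy")
    # Exterior ring in counter-clockwise order, closed by repeating the first point
    ring = [
        [minx, miny],
        [maxx, miny],
        [maxx, maxy],
        [minx, maxy],
        [minx, miny],
    ]
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


feature = bbox_to_feature(
    -91.65432884883238, 47.86503396133904, -91.53842043960762, 47.9221313337365
)
print(json.dumps(feature["geometry"], indent=2))
```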
Alternatively, you can specify bounds_fraction to create a much smaller bounding box within the original bounds.
compat = await check_titiler_cmr_compatibility(
    endpoint=endpoint,
    dataset=ds_hls_day,
    bounds_fraction=1e-5,
    timeout_s=300.0,
)
print(f"Compatibility: {compat['compatibility']}")
=== TiTiler-CMR Compatibility Check ===
Client: 2 physical / 4 logical cores | RAM: 15.61 GiB
Dataset: C2021957295-LPCLOUD (rasterio)
~~~~~~~~~~~~~~~~ ERROR JSON REQUEST ~~~~~~~~~~~~~~~~
URL: https://staging.openveda.cloud/api/titiler-cmr/WebMercatorQuad/tilejson.json?concept_id=C2021957295-LPCLOUD&backend=rasterio&datetime=2024-07-01T00%3A00%3A00Z%2F2024-07-10T23%3A59%3A59Z&bands=B04&bands_regex=B%5B0-9%5D%5B0-9%5D&step=P1D&temporal_mode=point
Error: 301 Moved Permanently
Body: HTTP 301 error during compatibility check
Statistics request failed: HTTP 301:
Compatibility: issues_detected
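To build intuition for how small a fractional bounding box is, the sketch below shrinks a box around its center so that its area is a given fraction of the original. This is one plausible reading of bounds_fraction, not a description of how check_titiler_cmr_compatibility implements it; `shrink_bbox` is a hypothetical helper.

```python
import math
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def shrink_bbox(bounds: BBox, area_fraction: float) -> BBox:
    """Return a box centered on `bounds` that covers `area_fraction` of its area."""
    if not 0 < area_fraction <= 1:
        raise ValueError("area_fraction must be in (0, 1]")
    minx, miny, maxx, maxy = bounds
    center_x, center_y = (minx + maxx) / 2, (miny + maxy) / 2
    # Scaling each side by sqrt(fraction) scales the area by fraction
    scale = math.sqrt(area_fraction)
    half_width = (maxx - minx) / 2 * scale
    half_height = (maxy - miny) / 2 * scale
    return (
        center_x - half_width,
        center_y - half_height,
        center_x + half_width,
        center_y + half_height,
    )


quarter = shrink_bbox((-180.0, -90.0, 180.0, 90.0), 0.25)
tiny = shrink_bbox((-180.0, -90.0, 180.0, 90.0), 1e-5)
print(quarter)
print(tiny)
```

Under this interpretation, a fraction of 1e-5 applied to global bounds leaves a box only about one degree wide, which shows why a very small fraction keeps the preview request lightweight.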