Benchmarking tile generation¶
This notebook walks through benchmarking performance of TiTiler-CMR for a given Earthdata CMR dataset.
In this notebook, you'll learn:
- How to benchmark tile rendering performance across zoom levels
- What factors impact tile generation performance in TiTiler-CMR
import earthaccess
from matplotlib.lines import Line2D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datacube_benchmark import (
DatasetParams,
benchmark_viewport,
tiling_benchmark_summary,
)
TiTiler-CMR¶
For this walkthrough, we will use the TiTiler-CMR staging endpoint https://staging.openveda.cloud/api/titiler-cmr/.
TiTiler-CMR supports two different backends:
- xarray → for gridded/cloud-native datasets (e.g., NetCDF4/HDF5/GRIB), typically exposed as variables.
- rasterio → for COG/raster imagery-style datasets exposed as bands (optionally via a regex).
Tip: Explore data granules with earthaccess
You can use earthaccess to search and inspect the individual granules used in your query. This helps you validate which files were accessed, their sizes, and the temporal range.
concept_id = "C2723754864-GES_DISC"
time_range = ("2022-03-01T00:00:01Z", "2022-03-02T23:59:59Z")
# Authenticate if needed
earthaccess.login() # or use "interactive" if needed
results = earthaccess.search_data(concept_id=concept_id, temporal=time_range)
print(f"Found {len(results)} granules between {time_range[0]} and {time_range[1]}")
Found 2 granules between 2022-03-01T00:00:01Z and 2022-03-02T23:59:59Z
Tile Generation Benchmarking¶
We are going to measure the tile generation performance across different zoom levels using the datacube_benchmark.benchmark_viewport function.
This function simulates the load of a typical viewport render in a slippy map, where multiple adjacent tiles must be fetched in parallel to draw a single view.
Step 1: Specify API Parameters¶
First, we have to define the parameters for the CMR dataset we want to benchmark. The DatasetParams class encapsulates all the necessary information to interact with a specific dataset via TiTiler-CMR.
Note that this first example uses a dataset where each file has global coverage. This is important when evaluating the benchmarking results.
endpoint = "https://staging.openveda.cloud/api/titiler-cmr"
ds_xarray = DatasetParams(
concept_id="C2723754864-GES_DISC",
backend="xarray",
datetime_range="2022-04-01T00:00:01Z/2022-04-02T23:59:59Z",
variable="precipitation",
step="P1D",
temporal_mode="point",
)
Step 2: Specify Zoom Levels¶
Zoom levels determine the detail and extent of the area being rendered. At lower zoom levels, a single tile covers a large spatial area and may intersect many granules. This usually translates to more I/O, more resampling/mosaic work, higher latency, and a higher chance of timeout errors.
As you increase the zoom level, each tile covers a smaller area, reducing the number of intersecting granules and the amount of work per request.
We'll define a range of zoom levels to test to see how performance varies.
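To build intuition for why zoom level matters, here is a quick back-of-envelope sketch of how much ground a single XYZ tile covers at each zoom level. This is plain Web Mercator arithmetic, not part of datacube_benchmark, and the km-per-degree factor is an equatorial approximation:

```python
# Approximate east-west ground coverage of one XYZ tile at the equator.
# At zoom z there are 2**z tiles spanning 360 degrees of longitude.
for z in (3, 8, 13, 18):
    width_deg = 360 / 2**z
    width_km = width_deg * 111.32  # rough km per degree of longitude at the equator
    print(f"zoom {z:2d}: ~{width_deg:10.6f} deg (~{width_km:10.2f} km) per tile")
```

Each additional zoom level halves the tile width, so a tile at zoom 18 covers roughly 1/32,000th of the width of a tile at zoom 3.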
min_zoom = 3
max_zoom = 20
# Define the viewport parameters
viewport_width = 4
viewport_height = 4
lng = 25.0
lat = 29.0
Step 3: Run the Benchmark¶
Now, let's run the benchmark across the specified zoom levels and visualize the results.
Under the hood, benchmark_viewport computes the center tile for each zoom level, selects its neighboring tiles to approximate a viewport, and requests them concurrently from the TiTiler-CMR endpoint. This function returns a pandas.DataFrame containing the response times for each tile request.
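Conceptually, the viewport selection can be sketched as follows. This is a simplified stand-in for what benchmark_viewport does internally, using the standard Web Mercator tiling formula; the function names here are illustrative and not part of datacube_benchmark:

```python
import math

def center_tile(lng: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) XYZ tile containing a lng/lat point (Web Mercator)."""
    n = 2**zoom
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def viewport_tiles(lng, lat, zoom, width, height):
    """All tiles in a width x height grid centered on the given point."""
    cx, cy = center_tile(lng, lat, zoom)
    n = 2**zoom
    return [
        ((cx + dx) % n, min(max(cy + dy, 0), n - 1))  # wrap x, clamp y at the poles
        for dy in range(-(height // 2), height - height // 2)
        for dx in range(-(width // 2), width - width // 2)
    ]

tiles = viewport_tiles(25.0, 29.0, 3, 4, 4)
print(len(tiles), tiles[:4])
```

The real function then issues one HTTP request per tile concurrently and records the timing of each.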
df_viewport = await benchmark_viewport(
endpoint=endpoint,
dataset=ds_xarray,
lng=lng,
lat=lat,
viewport_width=viewport_width,
viewport_height=viewport_height,
min_zoom=min_zoom,
max_zoom=max_zoom,
timeout_s=60.0,
)
=== TiTiler-CMR Tile Benchmark === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2723754864-GES_DISC (xarray) Query params: 8 parameters concept_id: C2723754864-GES_DISC backend: xarray datetime: 2022-04-01T00:00:01Z/2022-04-02T23:59:59Z variable: precipitation step: P1D temporal_mode: point tile_format: png tile_scale: 1
Total execution time: 23.072s
df_viewport.head()
| | zoom | x | y | status_code | ok | no_data | is_error | response_time_sec | content_type | response_size_bytes | url | error_text | total_run_elapsed_s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 2 | 1 | 200 | True | False | False | 1.461705 | image/png | 694 | https://staging.openveda.cloud/api/titiler-cmr... | None | 23.071837 |
| 1 | 3 | 3 | 1 | 200 | True | False | False | 1.635859 | image/png | 694 | https://staging.openveda.cloud/api/titiler-cmr... | None | 23.071837 |
| 2 | 3 | 4 | 1 | 200 | True | False | False | 1.667570 | image/png | 694 | https://staging.openveda.cloud/api/titiler-cmr... | None | 23.071837 |
| 3 | 3 | 5 | 1 | 200 | True | False | False | 1.344237 | image/png | 694 | https://staging.openveda.cloud/api/titiler-cmr... | None | 23.071837 |
| 4 | 3 | 6 | 1 | 200 | True | False | False | 1.879269 | image/png | 694 | https://staging.openveda.cloud/api/titiler-cmr... | None | 23.071837 |
The output includes the following columns:
- zoom, x, y — XYZ tile indices
- status_code — HTTP code (200 = success, 204 = no-data, 4xx/5xx = errors)
- response_time_sec — wall time in seconds
- response_size_bytes — payload size
- ok, no_data, is_error — convenience flags
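The per-zoom summary can be approximated with a plain pandas groupby over these columns. The snippet below uses a small synthetic DataFrame with the same column names; the real tiling_benchmark_summary computes additional statistics (e.g. no-data and error percentages), so treat this only as a sketch of the idea:

```python
import pandas as pd

# Synthetic results with the same column names as the benchmark output.
df = pd.DataFrame({
    "zoom": [3, 3, 3, 4, 4, 4],
    "ok": [True, True, False, True, True, True],
    "response_time_sec": [1.2, 1.5, 2.0, 0.8, 0.9, 1.1],
})

# Aggregate per zoom level: tile count, success rate, and latency percentiles.
summary = df.groupby("zoom").agg(
    n_tiles=("ok", "size"),
    ok_pct=("ok", lambda s: 100 * s.mean()),
    median_latency_s=("response_time_sec", "median"),
    p95_latency_s=("response_time_sec", lambda s: s.quantile(0.95)),
).reset_index()
print(summary)
```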
Now, let's use a convenience function to summarize the benchmark results.
df_summary = tiling_benchmark_summary(df_viewport)
df_summary.head()
| | zoom | n_tiles | ok_pct | no_data_pct | error_pct | median_latency_s | p95_latency_s |
|---|---|---|---|---|---|---|---|
| 0 | 3 | 25.0 | 100.0 | 0.0 | 0.0 | 1.590145 | 1.885190 |
| 1 | 4 | 25.0 | 100.0 | 0.0 | 0.0 | 1.596415 | 1.993003 |
| 2 | 5 | 25.0 | 100.0 | 0.0 | 0.0 | 1.580377 | 1.997009 |
| 3 | 6 | 25.0 | 100.0 | 0.0 | 0.0 | 1.565605 | 1.922572 |
| 4 | 7 | 25.0 | 100.0 | 0.0 | 0.0 | 1.560524 | 1.712366 |
Step 4: Plot the results¶
You may notice there is little variation in performance across zoom levels. This collection's files have global extent, so regardless of the zoom level of the tile request, the same file or files must be opened and read (multiple files when the datetime parameter matches multiple granules).
This situation is in contrast to the next example which uses files without global extent.
def summarize_and_plot_tiles_from_df(
df: pd.DataFrame,
*,
jitter=0.08,
alpha=0.35,
figsize=(9, 5),
title_lines=None,
):
"""Generate summary and plot from tile benchmark DataFrame."""
summary = tiling_benchmark_summary(df)
fig, ax = plt.subplots(figsize=figsize)
fig.subplots_adjust(right=0.72, top=0.80)
zoom_levels = sorted(
int(z) for z in pd.to_numeric(df["zoom"], errors="coerce").dropna().unique()
)
ax.set_xticks(zoom_levels)
if zoom_levels:
ax.set_xlim(min(zoom_levels) - 0.6, max(zoom_levels) + 0.6)
for z in zoom_levels:
sub = df[df["zoom"] == z]
if sub.empty:
continue
x = np.random.normal(loc=z, scale=jitter, size=len(sub))
ok_mask = sub["ok"].astype(bool).values
err_mask = sub["is_error"].astype(bool).values
ax.scatter(
x[ok_mask],
sub.loc[ok_mask, "response_time_sec"],
alpha=alpha,
edgecolor="none",
label=None,
)
ax.scatter(
x[err_mask],
sub.loc[err_mask, "response_time_sec"],
marker="x",
alpha=min(0.85, alpha + 0.25),
label=None,
)
med = pd.to_numeric(sub["response_time_sec"], errors="coerce").median()
if np.isfinite(med):
ax.hlines(med, z - 0.45, z + 0.45, linestyles="--")
ax.set_xlabel("Zoom level")
ax.set_ylabel("Tile response time (s)")
ok_proxy = Line2D([], [], linestyle="none", marker="o", label="200 OK")
err_proxy = Line2D(
[], [], linestyle="none", marker="x", label="error (≥400 or failure)"
)
ax.legend(
[ok_proxy, err_proxy],
["200 OK", "error"],
frameon=False,
loc="upper left",
bbox_to_anchor=(1.02, 1.00),
)
if title_lines:
ax.set_title("\n".join(title_lines), fontsize=9, loc="left", pad=12)
ax.grid(True, axis="y", alpha=0.2)
plt.tight_layout()
return summary, (fig, ax)
summary, (fig, ax) = summarize_and_plot_tiles_from_df(
df_viewport,
title_lines=[
"concept_id: C2723754864-GES_DISC",
"endpoint: https://staging.openveda.cloud/api/titiler-cmr",
],
)
plt.show()
HLS Example¶
In this example, we will benchmark a CMR dataset that is structured as Cloud-Optimized GeoTIFFs (COGs) with individual bands. We will use the rasterio backend for this dataset.
In contrast to the first example, HLS has much higher spatial resolution (30 meters vs. 0.1 degrees) and each granule has a small spatial footprint. In general, the lower the zoom level (more zoomed out), the more files must be opened to render a tile, which can lead to increased latency.
A more in-depth HLS benchmark report is provided in the Harmonized Landsat Sentinel 2 (HLS): tiling configuration and rendering performance documentation.
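A rough estimate makes the granule-count effect concrete. HLS granules follow the MGRS grid, whose tiles are on the order of 110 km across (that size, and the equatorial km-per-degree factor, are assumptions for this sketch, not values taken from the benchmark library):

```python
import math

HLS_TILE_KM = 110.0  # approximate footprint of one HLS (MGRS) granule; assumed for this sketch
KM_PER_DEG = 111.32  # rough km per degree of longitude at the equator

# Rough upper bound on how many HLS granules one XYZ tile can intersect.
for z in (3, 6, 9, 12):
    tile_km = 360 / 2**z * KM_PER_DEG
    n_granules = max(1, math.ceil(tile_km / HLS_TILE_KM)) ** 2
    print(f"zoom {z:2d}: tile ~{tile_km:8.1f} km wide -> up to ~{n_granules} granules")
```

At zoom 3 a single tile can overlap thousands of granule footprints, while by zoom 12 it typically falls within one, which is consistent with the latency dropping as zoom increases in the results below.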
ds_hls_day = DatasetParams(
concept_id="C2021957295-LPCLOUD",
backend="rasterio",
datetime_range="2023-10-01T00:00:01Z/2023-10-07T00:00:01Z",
bands=["B04", "B03", "B02"],
bands_regex="B[0-9][0-9]",
step="P1D",
temporal_mode="point",
)
ds_hls_week = DatasetParams(
concept_id="C2021957657-LPCLOUD",
backend="rasterio",
datetime_range="2023-10-01T00:00:01Z/2023-10-20T00:00:01Z",
bands=["B04", "B03", "B02"],
bands_regex="B[0-9][0-9]",
step="P1W",
temporal_mode="point",
)
min_zoom = 3
max_zoom = 20
viewport_width = 3
viewport_height = 3
timeout_s = 60.0
df_viewport_day = await benchmark_viewport(
endpoint=endpoint,
dataset=ds_hls_day,
lng=lng,
lat=lat,
viewport_width=viewport_width,
viewport_height=viewport_height,
min_zoom=min_zoom,
max_zoom=max_zoom,
timeout_s=timeout_s,
)
df_viewport_day_summary = tiling_benchmark_summary(df_viewport_day)
df_viewport_day_summary.head()
=== TiTiler-CMR Tile Benchmark === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2021957295-LPCLOUD (rasterio) Query params: 11 parameters concept_id: C2021957295-LPCLOUD backend: rasterio datetime: 2023-10-01T00:00:01Z/2023-10-07T00:00:01Z bands: B04 bands: B03 bands: B02 bands_regex: B[0-9][0-9] step: P1D temporal_mode: point tile_format: png tile_scale: 1
Total execution time: 50.336s
| | zoom | n_tiles | ok_pct | no_data_pct | error_pct | median_latency_s | p95_latency_s |
|---|---|---|---|---|---|---|---|
| 0 | 3 | 9.0 | 100.0 | 0.0 | 0.0 | 22.699576 | 40.584356 |
| 1 | 4 | 9.0 | 100.0 | 0.0 | 0.0 | 16.600212 | 19.499651 |
| 2 | 5 | 9.0 | 100.0 | 0.0 | 0.0 | 14.605976 | 17.154000 |
| 3 | 6 | 9.0 | 100.0 | 0.0 | 0.0 | 11.028084 | 15.306074 |
| 4 | 7 | 9.0 | 100.0 | 0.0 | 0.0 | 4.970010 | 6.005689 |
df_viewport_week = await benchmark_viewport(
endpoint=endpoint,
dataset=ds_hls_week,
lng=lng,
lat=lat,
viewport_width=viewport_width,
viewport_height=viewport_height,
min_zoom=min_zoom,
max_zoom=max_zoom,
timeout_s=timeout_s,
)
df_viewport_week_summary = tiling_benchmark_summary(df_viewport_week)
df_viewport_week_summary.head()
=== TiTiler-CMR Tile Benchmark === Client: 2 physical / 4 logical cores | RAM: 15.62 GiB Dataset: C2021957657-LPCLOUD (rasterio) Query params: 11 parameters concept_id: C2021957657-LPCLOUD backend: rasterio datetime: 2023-10-01T00:00:01Z/2023-10-20T00:00:01Z bands: B04 bands: B03 bands: B02 bands_regex: B[0-9][0-9] step: P1W temporal_mode: point tile_format: png tile_scale: 1
Total execution time: 68.305s
| | zoom | n_tiles | ok_pct | no_data_pct | error_pct | median_latency_s | p95_latency_s |
|---|---|---|---|---|---|---|---|
| 0 | 3 | 9.0 | 88.888889 | 0.0 | 11.111111 | 21.704861 | 23.464815 |
| 1 | 4 | 9.0 | 100.000000 | 0.0 | 0.000000 | 14.372156 | 15.745402 |
| 2 | 5 | 9.0 | 100.000000 | 0.0 | 0.000000 | 14.365140 | 16.641927 |
| 3 | 6 | 9.0 | 100.000000 | 0.0 | 0.000000 | 14.022378 | 17.720544 |
| 4 | 7 | 9.0 | 100.000000 | 0.0 | 0.000000 | 8.535669 | 9.593534 |
summary, (fig, ax) = summarize_and_plot_tiles_from_df(
df_viewport_day,
title_lines=[
"concept_id: C2021957295-LPCLOUD",
"Viewport: 3x3 tiles -- daily",
"endpoint: https://staging.openveda.cloud/api/titiler-cmr",
],
)
plt.show()
summary, (fig, ax) = summarize_and_plot_tiles_from_df(
df_viewport_week,
title_lines=[
"concept_id: C2021957657-LPCLOUD",
"Viewport: 3x3 tiles -- weekly",
"endpoint: https://staging.openveda.cloud/api/titiler-cmr",
],
)
plt.show()
Conclusion¶
In this notebook, we explored how to benchmark tile rendering performance in TiTiler-CMR using different datasets and backends. We observed how factors such as zoom level, temporal interval, and dataset structure impact the latency of tile requests.
In general, performance depends on:
- zoom level and spatial resolution of the dataset
- the width of the datetime interval and the temporal resolution of the dataset
- how many granules intersect the tile footprint
Takeaways:
- Consider specifying a minimum zoom level (minzoom) and a maximum datetime interval appropriate for a specific dataset's temporal and spatial resolution.