datacube-benchmark
Utilities for benchmarking Zarr datacubes — generate synthetic stores with different chunking schemes, compressors, and dtypes, then measure read performance under realistic access patterns.
Companion package to the Datacube Guide, which documents common pitfalls when producing and consuming multi-dimensional data products.
Installation
pip install datacube-benchmark
Python 3.12+ is required.
Quickstart
Create a synthetic Zarr store on local disk and time a few random-access patterns against it:
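from pathlib import Path
import obstore as obs
import zarr
import datacube_benchmark
# Local directory that will hold the synthetic store
path = Path.cwd() / "data" / "test.zarr"
path.mkdir(parents=True, exist_ok=True)
store = obs.store.LocalStore(str(path))
# Write a synthetic Zarr store using the package defaults
zarr_store = datacube_benchmark.create_zarr_store(store)
# Open the array written under the "data" path (zarr_format replaces the deprecated zarr_version)
arr = zarr.open_array(zarr_store, zarr_format=3, path="data")
# Time 10 samples of each access pattern
results = datacube_benchmark.benchmark_access_patterns(arr, num_samples=10)
print(results)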
create_zarr_store takes target sizes and chunk shapes as strings or pint quantities (e.g. "1 GB", "10 MB") and writes through an obstore store, so the same call works against a local directory, S3, GCS, or Azure by swapping the store.
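To get a feel for what size strings like these imply, the arithmetic can be sketched in plain Python. This is only an illustration: parse_size and chunk_plan are hypothetical helpers, not part of datacube-benchmark, and the sketch assumes decimal (SI) units, uncompressed data, and a 4-byte dtype such as float32. How the package itself resolves sizes may differ.

```python
# Hypothetical helpers for illustration; not part of datacube-benchmark.
# Assumes decimal (SI) units: 1 MB = 10**6 bytes, 1 GB = 10**9 bytes.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a string such as "10 MB" into a byte count."""
    value, unit = text.split()
    return int(float(value) * UNITS[unit])


def chunk_plan(target: str, chunk: str, itemsize: int) -> tuple[int, int]:
    """Return (number of chunks, elements per chunk) for uncompressed data."""
    target_bytes = parse_size(target)
    chunk_bytes = parse_size(chunk)
    n_chunks = -(-target_bytes // chunk_bytes)  # ceiling division
    return n_chunks, chunk_bytes // itemsize


# A "1 GB" array split into "10 MB" chunks of 4-byte values (e.g. float32)
n_chunks, elements_per_chunk = chunk_plan("1 GB", "10 MB", itemsize=4)
print(n_chunks, elements_per_chunk)
```

Under these assumptions, a 1 GB target with 10 MB chunks works out to 100 chunks of 2.5 million elements each. Each chunk is typically a separate object in the store, so the chunk count is also the number of objects a full scan has to fetch.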
See the API reference for the full surface.