Skip to content

cng-formats-benchmark

A reusable, deployable benchmarking system for cloud-native geospatial formats (COG, GeoZarr v3, COPC, GeoParquet, and their baselines). It measures read, write, and display performance plus the object-size distribution that decides whether a layout fits a tiered object store.

The methodology is opinionated and reproducible: describe your datasets in config files, deploy the stack, and run. Nothing about a particular dataset or provider is baked into the code.

What it is (and is not)

It is a deployable component, not a CI job. The harness runs on real infrastructure (a workstation, a notebook environment, any Kubernetes cluster) against real data. Continuous integration only builds the image, unit-tests the harness, and proves the stack deploys — it never runs a benchmark.

Two layers:

  • Harness — the Python logic that runs the metrics against a dataset, packaged as a container image (a batch runner). Importable and unit-testable in isolation; no live services required.
  • Deployment — a stack bundling the runner and its service dependencies (notably TiTiler for the display metric, and S3-compatible object storage), deployable via docker-compose (local) and a Helm chart (Kubernetes).

Metrics

Per format, per dataset:

Metric What it captures
object size distribution + tier fitness — first-class, because a tiered object store makes object size a hard constraint, not a footnote
write conversion throughput (baseline → target), including the source read
read range-request-aware read latency / throughput (windowed /vsis3 reads)
display TiTiler tile latency per chunk-crossing scenario (tiles touching 1 / 2 / 4 / 9+ internal blocks), with a block-grid layout image

Processing benchmarks are out of scope, but the harness keeps an extension seam for them.

Where to go next

  • Architecture — the design and how the pieces fit together, with diagrams.
  • Getting started — run the synthetic stack end-to-end on your machine in a few minutes.
  • Configuration — describe datasets, benchmarks, and tier policies as data.
  • Deployment — docker-compose and Helm, local and lab.

Status

The deployable stack and the COG end-to-end path (convert → object size → read → display) are implemented and proven in CI against a synthetic fixture, including the two-provider (source ≠ sink) storage model for real runs. Bringing a real mission online is a deployment activity; extending to a second dataset/format is configuration. See Architecture › Status & roadmap.

Licence

MIT. See LICENSE.