API Documentation

datacube_benchmark.utils.array_storage_size

array_storage_size(array: Array) -> int

Calculate the total storage size of a Zarr array by summing the sizes of its chunks.
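The sketch below is a minimal version of the idea. It assumes the store can be modeled as a plain mapping from keys to encoded bytes. The helper storage_size_from_mapping and the set of metadata key names are hypothetical illustrations, not part of datacube_benchmark. If array_storage_size reads chunk sizes from storage, as the description suggests, the total reflects encoded (possibly compressed) bytes rather than the array's in-memory size.

```python
from collections.abc import Mapping

# Metadata documents that hold no chunk data (Zarr v2 and v3 names).
METADATA_KEYS = {".zarray", ".zattrs", ".zgroup", ".zmetadata", "zarr.json"}


def storage_size_from_mapping(store: Mapping[str, bytes]) -> int:
    """Sum the byte length of every chunk object in a key/value store.

    Hypothetical helper: models the store as a mapping from keys to
    encoded bytes and skips metadata documents.
    """
    total = 0
    for key, value in store.items():
        name = key.rsplit("/", 1)[-1]  # last path component of the key
        if name in METADATA_KEYS:
            continue
        total += len(value)
    return total


store = {"zarr.json": b"{}", "c/0/0": b"\x00" * 100, "c/0/1": b"\x00" * 60}
print(storage_size_from_mapping(store))  # 160
```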

datacube_benchmark.create_empty_dataarray

create_empty_dataarray(
    target_array_size: str | Quantity = "1 GB",
    target_spatial_resolution: str | Quantity = ".5 degrees",
    target_chunk_size: str | Quantity = "10 MB",
    target_chunk_shape: TARGET_SHAPES = "dumpling",
    dtype: dtype = dtype("float32"),
) -> DataArray

Create an empty xarray.DataArray with specified size, shape, and dtype.

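Parameters:

  • target_array_size (str | Quantity, default: '1 GB' ) –

    The total size of the xarray.DataArray, given as a string or a pint.Quantity. Strings must be convertible to a pint.Quantity.

  • target_spatial_resolution (str | Quantity, default: '.5 degrees' ) –

    The spatial resolution of the xarray.DataArray, given as a string or a pint.Quantity. Strings must be convertible to a pint.Quantity.

  • target_chunk_size (str | Quantity, default: '10 MB' ) –

    The target size of each chunk, given as a string or a pint.Quantity. Strings must be convertible to a pint.Quantity.

  • target_chunk_shape (TARGET_SHAPES, default: 'dumpling' ) –

    The target shape of each chunk: "pancake", "dumpling", or "churro".

  • dtype (dtype, default: dtype('float32') ) –

    The data type of the xarray.DataArray.

Returns:

  • DataArray

    The empty xarray.DataArray.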
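The size, resolution, and chunk-size parameters interact through simple arithmetic. The sketch below assumes a global (time, lat, lon) grid whose time dimension absorbs whatever size remains, and decimal byte units (pint reads "GB" as 10**9 bytes). The helpers parse_bytes, parse_degrees, and sketch_layout are hypothetical, and the library's actual layout may differ.

```python
import math

# Decimal (SI) byte prefixes, matching how pint interprets "MB" and "GB".
_BYTE_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_bytes(text: str) -> int:
    """Parse strings such as "1 GB" or "10 MB" into a byte count."""
    value, unit = text.split()
    return int(float(value) * _BYTE_UNITS[unit])


def parse_degrees(text: str) -> float:
    """Parse strings such as ".5 degrees" into a float."""
    value, unit = text.split()
    if unit not in ("degree", "degrees"):
        raise ValueError(f"unsupported unit: {unit}")
    return float(value)


def sketch_layout(array_size="1 GB", resolution=".5 degrees",
                  chunk_size="10 MB", itemsize=4):
    """Return an illustrative (time, lat, lon) shape and chunk count."""
    res = parse_degrees(resolution)
    n_lat = round(180 / res)  # cells from pole to pole
    n_lon = round(360 / res)  # cells around the globe
    bytes_per_step = n_lat * n_lon * itemsize
    # Fit as many whole time steps as the size budget allows.
    n_time = max(1, parse_bytes(array_size) // bytes_per_step)
    total = n_time * bytes_per_step
    n_chunks = math.ceil(total / parse_bytes(chunk_size))
    return (n_time, n_lat, n_lon), n_chunks


print(sketch_layout())  # ((964, 360, 720), 100)
```

With the defaults, a 0.5° grid has 360 × 720 cells, one float32 time step is 1,036,800 bytes, about 964 steps fit in 1 GB, and the result splits into roughly 100 chunks of 10 MB.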

datacube_benchmark.create_zarr_store

create_zarr_store(
    object_store: ObjectStore,
    target_array_size: str | Quantity = "1 GB",
    target_spatial_resolution: str | Quantity = ".5 degrees",
    target_chunk_size: str | Quantity = "10 MB",
    target_chunk_shape: TARGET_SHAPES = "dumpling",
    compressor: Codec | BytesBytesCodec | None = None,
    dtype: dtype = dtype("float32"),
    fill_method: Literal["random", "zeros", "ones", "arange"] = "arange",
    chunked_coords: bool = False,
    consolidated_metadata: bool = True,
) -> ObjectStore

Create a Zarr store in the specified object store with an empty dataset.

Parameters:

  • object_store (ObjectStore) –

    The object store to write the Zarr dataset to.

  • target_array_size (str | Quantity, default: '1 GB' ) –

    The total size of the xarray.DataArray, given as a string or a pint.Quantity. Strings must be convertible to a pint.Quantity.

  • target_spatial_resolution (str | Quantity, default: '.5 degrees' ) –

    The spatial resolution of the xarray.DataArray, given as a string or a pint.Quantity. Strings must be convertible to a pint.Quantity.

  • target_chunk_size (str | Quantity, default: '10 MB' ) –

    The target size of each chunk, given as a string or a pint.Quantity. Strings must be convertible to a pint.Quantity.

  • target_chunk_shape (TARGET_SHAPES, default: 'dumpling' ) –

    The target shape of each chunk: "pancake", "dumpling", or "churro" (see TARGET_SHAPES). Default is "dumpling".

  • compressor (Codec | BytesBytesCodec | None, default: None ) –

    The compressor to use for the Zarr store, default is None (no compression).

  • dtype (dtype, default: dtype('float32') ) –

    The data type of the xarray.DataArray, default is np.dtype("float32").

  • fill_method (Literal['random', 'zeros', 'ones', 'arange'], default: 'arange' ) –

    The method to use for filling the Zarr array. Options are:

    • "random": Fill with random values.
    • "zeros": Fill with zeros.
    • "ones": Fill with ones.
    • "arange": Fill with a range of values.
  • chunked_coords (bool, default: False ) –

    Whether coordinate arrays are chunked. If True, coordinates are stored with a chunk shape of (1,).

  • consolidated_metadata (bool, default: True ) –

    Whether to consolidate the Zarr metadata.

Returns:

  • ObjectStore

    A Zarr store with the specified parameters.
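The fill options can be sketched in plain Python. This hypothetical fill_values helper returns a flat, C-ordered list and assumes "arange" means consecutive values starting at 0, as numpy.arange produces; it is not the library's implementation.

```python
import random
from math import prod


def fill_values(shape, fill_method="arange", seed=None):
    """Return a flat, C-ordered list of values for an array of `shape`."""
    n = prod(shape)  # total number of elements
    if fill_method == "zeros":
        return [0.0] * n
    if fill_method == "ones":
        return [1.0] * n
    if fill_method == "arange":
        # Every element gets a distinct value equal to its flat index.
        return [float(i) for i in range(n)]
    if fill_method == "random":
        rng = random.Random(seed)  # seeded for reproducibility
        return [rng.random() for _ in range(n)]
    raise ValueError(f"unknown fill_method: {fill_method!r}")


print(fill_values((2, 3)))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

Because "arange" gives every element a distinct value, a read can be checked against its index. The fill also matters when a compressor is set: constant fills such as "zeros" or "ones" typically compress far better than "random" data, which changes the stored size and therefore the I/O a benchmark measures.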

datacube_benchmark.benchmark_zarr_array

benchmark_zarr_array(
    zarr_array: Array,
    access_pattern: Literal[
        "point", "time_series", "spatial_slice", "full"
    ] = "point",
    num_samples: int = 10,
    warmup_samples: int = 10,
) -> dict

Comprehensive benchmark of zarr array random access performance.

Returns detailed statistics about the performance.

Parameters:

  • zarr_array (Array) –

    The zarr array to benchmark

  • access_pattern (Literal['point', 'time_series', 'spatial_slice', 'full'], default: 'point' ) –

    Type of access pattern: "point", "time_series", "spatial_slice", "full"

  • num_samples (int, default: 10 ) –

    Number of random access operations to perform

  • warmup_samples (int, default: 10 ) –

    Number of warmup operations (not included in timing)

Returns:

  • dict

    A dictionary of performance statistics, including the mean, median, standard deviation, minimum, and maximum access times, plus details about the zarr array such as its shape, dtype, and size.
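One way the four access patterns could map onto indexing is sketched below, assuming the array's dimensions are ordered (time, lat, lon). The random_selection helper is hypothetical; the tuples it returns use ordinary integer and slice indexing, so they could be passed to zarr_array[...].

```python
import random


def random_selection(shape, access_pattern="point", rng=None):
    """Build an indexing tuple for one random read of a (time, lat, lon) array.

    Hypothetical mapping of the documented access patterns onto indices.
    """
    rng = rng or random.Random()
    n_time, n_lat, n_lon = shape
    t = rng.randrange(n_time)
    y = rng.randrange(n_lat)
    x = rng.randrange(n_lon)
    if access_pattern == "point":
        return (t, y, x)  # a single element
    if access_pattern == "time_series":
        return (slice(None), y, x)  # every time step at one location
    if access_pattern == "spatial_slice":
        return (t, slice(None), slice(None))  # the full grid at one time
    if access_pattern == "full":
        return (slice(None), slice(None), slice(None))  # the whole array
    raise ValueError(f"unknown access_pattern: {access_pattern!r}")


print(random_selection((5, 4, 3), "time_series", random.Random(0)))
```

Keep in mind that a "point" read still fetches and decodes a whole chunk, so its latency typically tracks chunk size more than the single value returned.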

datacube_benchmark.benchmark_access_patterns

benchmark_access_patterns(
    zarr_array: Array, num_samples: int = 10, warmup_samples: int = 10
) -> DataFrame

Benchmark all three access patterns and return combined results.

Parameters:

  • zarr_array (Array) –

    The zarr array to benchmark

  • num_samples (int, default: 10 ) –

    Number of random access operations to perform for each pattern

  • warmup_samples (int, default: 10 ) –

    Number of warmup operations (not included in timing)

Returns:

  • DataFrame

    The combined results of the access-pattern benchmarks.
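The warmup-then-measure loop and the per-pattern combination can be sketched with the standard library. The names time_reads and combine, and the statistic keys, are hypothetical; the library's own dictionary keys may differ. Each row in the combined list corresponds to one access pattern, and pandas.DataFrame(rows) would turn it into a table.

```python
import statistics
import time
from collections.abc import Callable


def time_reads(read: Callable[[], object], num_samples: int = 10,
               warmup_samples: int = 10) -> dict:
    """Time `read` num_samples times after warmup_samples untimed calls."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    for _ in range(warmup_samples):
        read()  # warm caches and connections; not recorded
    times = []
    for _ in range(num_samples):
        start = time.perf_counter()
        read()
        times.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        # stdev needs at least two samples.
        "std": statistics.stdev(times) if len(times) > 1 else 0.0,
        "min": min(times),
        "max": max(times),
    }


def combine(results_by_pattern: dict) -> list:
    """Flatten {pattern: stats} into one row per pattern (DataFrame-ready)."""
    return [{"access_pattern": name, **stats}
            for name, stats in results_by_pattern.items()]
```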

datacube_benchmark.types.TARGET_SHAPES module-attribute

TARGET_SHAPES = Literal['pancake', 'dumpling', 'churro']