API Documentation

datacube_benchmark.utils.array_storage_size

array_storage_size(array: Array) -> int

Calculate the total storage size of a Zarr array by summing the sizes of its chunks.
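To illustrate the idea of summing chunk sizes, here is a minimal sketch that uses a plain dictionary as a stand-in for a store. The key layout, the `fake_store` contents, and the helper name `summed_chunk_size` are assumptions made for illustration and are not the library's implementation.

```python
# Stand-in for a Zarr store: a mapping from object keys to raw bytes.
# The key names below are illustrative only.
fake_store = {
    "zarr.json": b'{"zarr_format": 3}',  # metadata, not counted as a chunk
    "c/0/0": bytes(1024),
    "c/0/1": bytes(1024),
    "c/1/0": bytes(512),
}


def summed_chunk_size(store: dict[str, bytes], chunk_prefix: str = "c/") -> int:
    """Sum the stored byte length of every object whose key starts with chunk_prefix."""
    return sum(len(value) for key, value in store.items() if key.startswith(chunk_prefix))


total_bytes = summed_chunk_size(fake_store)
print(total_bytes)  # 2560
```

Because the sum is taken over stored objects, the result reflects the size after any compression rather than the in-memory size of the array.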

datacube_benchmark.create_empty_dataarray

create_empty_dataarray(
    target_array_size: str | Quantity = "1 GB",
    target_spatial_resolution: str | Quantity = ".5 degrees",
    target_chunk_size: str | Quantity = "10 MB",
    target_chunk_shape: TARGET_SHAPES = "dumpling",
    dtype: dtype = dtype("float32"),
) -> DataArray

Create an empty xarray.DataArray with specified size, shape, and dtype.

Parameters:

target_array_size (str | Quantity, default: '1 GB' ) –

The size of the xarray.DataArray, given as a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
target_spatial_resolution (str | Quantity, default: '.5 degrees' ) –

The spatial resolution of the xarray.DataArray, given as a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
target_chunk_size (str | Quantity, default: '10 MB' ) –

The size of each chunk in the xarray.DataArray, given as a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
target_chunk_shape (TARGET_SHAPES, default: 'dumpling' ) –

The shape of the chunks in the xarray.DataArray: "pancake", "dumpling", or "churro". Default is "dumpling".
dtype (dtype, default: dtype('float32') ) –

The data type of the xarray.DataArray. Default is np.dtype("float32").

Returns:

DataArray –

An empty xarray.DataArray with the specified parameters.
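As a rough illustration of how a target size and a spatial resolution could translate into an array shape, the sketch below works through the arithmetic for the default arguments. It assumes a global grid, 1 GB = 10**9 bytes, and a (time, lat, lon) dimension order. None of these assumptions is confirmed by the documentation above, and `estimate_shape` is a hypothetical helper, not part of the library.

```python
def estimate_shape(target_bytes: int, resolution_deg: float, itemsize: int) -> tuple[int, int, int]:
    """Estimate a (time, lat, lon) shape that fits within target_bytes.

    Assumes a global grid: 180 degrees of latitude and 360 degrees of longitude.
    """
    n_lat = round(180 / resolution_deg)
    n_lon = round(360 / resolution_deg)
    bytes_per_time_step = n_lat * n_lon * itemsize
    # Use as many whole time steps as fit in the target, but at least one.
    n_time = max(1, target_bytes // bytes_per_time_step)
    return (n_time, n_lat, n_lon)


# Defaults from the signature: "1 GB", ".5 degrees", float32 (4 bytes per value).
shape = estimate_shape(10**9, 0.5, 4)
print(shape)  # (964, 360, 720)
```

Under these assumptions, one time step holds 360 * 720 * 4 = 1,036,800 bytes, so 964 steps fit in 10**9 bytes.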

datacube_benchmark.create_zarr_store

create_zarr_store(
    object_store: ObjectStore,
    target_array_size: str | Quantity = "1 GB",
    target_spatial_resolution: str | Quantity = ".5 degrees",
    target_chunk_size: str | Quantity = "10 MB",
    target_chunk_shape: TARGET_SHAPES = "dumpling",
    compressor: Codec | BytesBytesCodec | None = None,
    dtype: dtype = dtype("float32"),
    fill_method: Literal["random", "zeros", "ones", "arange"] = "arange",
    chunked_coords: bool = False,
    consolidated_metadata: bool = True,
) -> ObjectStore

Create a Zarr store in the specified object store, containing a dataset filled according to fill_method.

Parameters:

object_store (ObjectStore) –

The object store to write the Zarr dataset to.
target_array_size (str | Quantity, default: '1 GB' ) –

The size of the xarray.DataArray, given as a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
target_spatial_resolution (str | Quantity, default: '.5 degrees' ) –

The spatial resolution of the xarray.DataArray, given as a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
target_chunk_size (str | Quantity, default: '10 MB' ) –

The size of each chunk in the xarray.DataArray, given as a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
target_chunk_shape (TARGET_SHAPES, default: 'dumpling' ) –

The shape of the chunks in the xarray.DataArray: "pancake", "dumpling", or "churro". Default is "dumpling".
compressor (Codec | BytesBytesCodec | None, default: None ) –

The compressor to use for the Zarr store. Default is None (no compression).
dtype (dtype, default: dtype('float32') ) –

The data type of the xarray.DataArray. Default is np.dtype("float32").
fill_method (Literal['random', 'zeros', 'ones', 'arange'], default: 'arange' ) –

The method to use for filling the Zarr array. Options are:
- "random": Fill with random values.
- "zeros": Fill with zeros.
- "ones": Fill with ones.
- "arange": Fill with a range of values.
chunked_coords (bool, default: False ) –

Whether to chunk the coordinate arrays. When True, the coordinates use a chunk size of (1,).
consolidated_metadata (bool, default: True ) –

Whether to consolidate the Zarr metadata.

Returns:

ObjectStore –

A Zarr store with the specified parameters.
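The four fill_method options can be pictured with a small standard-library sketch. The helper `fill_values` and its `seed` argument are hypothetical, and the flat list of floats is only a stand-in for array data. How the library generates its values, for example the exact range used by "arange", is not stated above.

```python
import random


def fill_values(fill_method: str, n: int, seed: int = 0) -> list[float]:
    """Return n illustrative values for the given fill method."""
    if fill_method == "zeros":
        return [0.0] * n
    if fill_method == "ones":
        return [1.0] * n
    if fill_method == "arange":
        # Assumption: a simple 0, 1, 2, ... sequence.
        return [float(i) for i in range(n)]
    if fill_method == "random":
        rng = random.Random(seed)  # seeded so the sketch is reproducible
        return [rng.random() for _ in range(n)]
    raise ValueError(f"unknown fill_method: {fill_method!r}")


print(fill_values("arange", 4))  # [0.0, 1.0, 2.0, 3.0]
```

The choice matters mostly when a compressor is set: constant data compresses far better than random data, so the fill method can change the stored size of each chunk.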

datacube_benchmark.benchmark_zarr_array

benchmark_zarr_array(
    zarr_array: Array,
    access_pattern: Literal[
        "point", "time_series", "spatial_slice", "full"
    ] = "point",
    num_samples: int = 10,
    warmup_samples: int = 10,
) -> dict

Comprehensive benchmark of zarr array random access performance.

Returns detailed statistics about the performance.

Parameters:

zarr_array (Array) –

The zarr array to benchmark.
access_pattern (Literal['point', 'time_series', 'spatial_slice', 'full'], default: 'point' ) –

Type of access pattern: "point", "time_series", "spatial_slice", or "full".
num_samples (int, default: 10 ) –

Number of random access operations to perform.
warmup_samples (int, default: 10 ) –

Number of warmup operations (not included in timing).
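One plausible reading of the four access patterns is sketched below as index selections on a three-dimensional array. The (time, lat, lon) dimension order and the helper `random_selection` are assumptions made for illustration. The documentation above does not specify how each pattern is mapped to an index.

```python
import random


def random_selection(shape: tuple[int, int, int], access_pattern: str, rng: random.Random) -> tuple:
    """Build an index tuple for one random read, assuming (time, lat, lon) order."""
    n_time, n_lat, n_lon = shape
    t, y, x = rng.randrange(n_time), rng.randrange(n_lat), rng.randrange(n_lon)
    if access_pattern == "point":
        return (t, y, x)  # a single value
    if access_pattern == "time_series":
        return (slice(None), y, x)  # all time steps at one location
    if access_pattern == "spatial_slice":
        return (t, slice(None), slice(None))  # the whole grid at one time step
    if access_pattern == "full":
        return (slice(None), slice(None), slice(None))  # everything
    raise ValueError(f"unknown access_pattern: {access_pattern!r}")


rng = random.Random(0)
selection = random_selection((10, 4, 8), "time_series", rng)
print(selection)
```

An index tuple like this could be passed to an array as `array[selection]`. How many chunks a read touches then depends on how the chunk shape lines up with the pattern.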

Returns:

dict –

A dictionary of performance statistics (mean, median, standard deviation, minimum, and maximum access times) together with details about the zarr array such as shape, dtype, and size.
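The warmup-then-measure scheme described by num_samples and warmup_samples can be sketched with the standard library alone. The helper `time_operation` and the dictionary keys it returns are hypothetical. The keys actually returned by benchmark_zarr_array are not listed in this documentation.

```python
import statistics
import time


def time_operation(operation, num_samples: int = 10, warmup_samples: int = 10) -> dict:
    """Run operation untimed for warmup, then time num_samples calls and summarize."""
    for _ in range(warmup_samples):
        operation()  # warmup calls are excluded from the statistics

    timings = []
    for _ in range(num_samples):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        # stdev needs at least two samples
        "std_s": statistics.stdev(timings) if len(timings) > 1 else 0.0,
        "min_s": min(timings),
        "max_s": max(timings),
        "num_samples": len(timings),
    }


stats = time_operation(lambda: sum(range(1000)), num_samples=5, warmup_samples=2)
print(sorted(stats))
```

Warmup calls give caches and connections a chance to settle so the first slow reads do not skew the summary. Reporting the median next to the mean helps when a few outliers dominate.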

datacube_benchmark.benchmark_access_patterns

benchmark_access_patterns(
    zarr_array: Array, num_samples: int = 10, warmup_samples: int = 10
) -> DataFrame

Benchmark all three access patterns and return combined results.

Parameters:

zarr_array (Array) –

The zarr array to benchmark.
num_samples (int, default: 10 ) –

Number of random access operations to perform for each pattern.
warmup_samples (int, default: 10 ) –

Number of warmup operations (not included in timing).

Returns:

DataFrame –

A pandas.DataFrame with the results for each access pattern.

datacube_benchmark.types.TARGET_SHAPES `module-attribute`

TARGET_SHAPES = Literal['pancake', 'dumpling', 'churro']
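Because TARGET_SHAPES is a typing.Literal, static type checkers enforce it but nothing is checked at run time. The sketch below shows one way a caller could validate a user-supplied string against the alias with typing.get_args. The alias is redefined locally so the example stands alone, and `check_target_shape` is a hypothetical helper, not part of the library.

```python
from typing import Literal, get_args

# Local copy of the alias shown above, so the example runs on its own.
TARGET_SHAPES = Literal["pancake", "dumpling", "churro"]


def check_target_shape(value: str) -> str:
    """Return value unchanged if it is one of the allowed shape names."""
    allowed = get_args(TARGET_SHAPES)  # ('pancake', 'dumpling', 'churro')
    if value not in allowed:
        raise ValueError(f"target_chunk_shape must be one of {allowed}, got {value!r}")
    return value


print(check_target_shape("dumpling"))  # dumpling
```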
