API Documentation
datacube_benchmark.utils.array_storage_size
Calculate the total storage size of a Zarr array by summing the sizes of its chunks.
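Example (a minimal sketch; the parameters are not documented above, so the call assumes the function accepts an opened zarr.Array, and the store path is a placeholder):

```python
import zarr

from datacube_benchmark.utils import array_storage_size

# Assumption: the helper takes an opened zarr.Array and returns the summed
# size of its stored chunks. "data/example.zarr" is a placeholder path.
arr = zarr.open_array("data/example.zarr", mode="r")
print(array_storage_size(arr))
```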
datacube_benchmark.create_empty_dataarray
create_empty_dataarray(
target_array_size: str | Quantity = "1 GB",
target_spatial_resolution: str | Quantity = ".5 degrees",
target_chunk_size: str | Quantity = "10 MB",
target_chunk_shape: TARGET_SHAPES = "dumpling",
dtype: dtype = dtype("float32"),
) -> DataArray
Create an empty xarray.DataArray with specified size, shape, and dtype.
Parameters:
- target_array_size (str | Quantity, default: '1 GB') – The size of the xarray.DataArray; can be a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
- target_spatial_resolution (str | Quantity, default: '.5 degrees') – The spatial resolution of the xarray.DataArray; can be a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
- target_chunk_size (str | Quantity, default: '10 MB') – The size of the chunks in the xarray.DataArray; can be a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
- target_chunk_shape (TARGET_SHAPES, default: 'dumpling') – The shape of the xarray.DataArray.
- dtype (dtype, default: dtype('float32')) – The data type of the xarray.DataArray.
Returns:
- DataArray – An empty xarray.DataArray with the specified parameters.
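Example (illustrative argument values based on the documented signature):

```python
import numpy as np

from datacube_benchmark import create_empty_dataarray

# Build a ~100 MB empty DataArray at 1-degree resolution with ~5 MB chunks,
# using the "pancake" chunk-shape preset. All values here are illustrative.
da = create_empty_dataarray(
    target_array_size="100 MB",
    target_spatial_resolution="1 degree",
    target_chunk_size="5 MB",
    target_chunk_shape="pancake",
    dtype=np.dtype("float32"),
)
print(da.sizes, da.dtype)
```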
datacube_benchmark.create_zarr_store
create_zarr_store(
object_store: ObjectStore,
target_array_size: str | Quantity = "1 GB",
target_spatial_resolution: str | Quantity = ".5 degrees",
target_chunk_size: str | Quantity = "10 MB",
target_chunk_shape: TARGET_SHAPES = "dumpling",
compressor: Codec | BytesBytesCodec | None = None,
dtype: dtype = dtype("float32"),
fill_method: Literal["random", "zeros", "ones", "arange"] = "arange",
chunked_coords: bool = False,
consolidated_metadata: bool = True,
) -> ObjectStore
Create a Zarr store in the specified object store with an empty dataset.
Parameters:
- object_store (ObjectStore) – The object store to write the Zarr dataset to.
- target_array_size (str | Quantity, default: '1 GB') – The size of the xarray.DataArray; can be a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
- target_spatial_resolution (str | Quantity, default: '.5 degrees') – The spatial resolution of the xarray.DataArray; can be a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
- target_chunk_size (str | Quantity, default: '10 MB') – The size of the chunks in the xarray.DataArray; can be a string or a pint.Quantity. A string must be convertible to a pint.Quantity.
- target_chunk_shape (TARGET_SHAPES, default: 'dumpling') – The shape of the xarray.DataArray.
- compressor (Codec | BytesBytesCodec | None, default: None) – The compressor to use for the Zarr store; None means no compression.
- dtype (dtype, default: dtype('float32')) – The data type of the xarray.DataArray.
- fill_method (Literal['random', 'zeros', 'ones', 'arange'], default: 'arange') – The method used to fill the Zarr array. Options are:
  - "random": fill with random values.
  - "zeros": fill with zeros.
  - "ones": fill with ones.
  - "arange": fill with a range of values.
- chunked_coords (bool, default: False) – Whether the coordinates are chunked; if True, the coordinate chunk size is (1,).
- consolidated_metadata (bool, default: True) – Whether to consolidate the Zarr metadata.
Returns:
- ObjectStore – A Zarr store with the specified parameters.
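Example (a sketch; it assumes an in-memory store from the obstore package satisfies the ObjectStore parameter, so substitute your own store as appropriate):

```python
import numpy as np
from obstore.store import MemoryStore  # assumption: an obstore store is accepted as ObjectStore

from datacube_benchmark import create_zarr_store

# Write a small, uncompressed, arange-filled Zarr store into memory.
# The sizes and the in-memory store are illustrative choices.
store = MemoryStore()
store = create_zarr_store(
    store,
    target_array_size="100 MB",
    target_chunk_size="5 MB",
    target_chunk_shape="dumpling",
    compressor=None,
    dtype=np.dtype("float32"),
    fill_method="arange",
    chunked_coords=False,
    consolidated_metadata=True,
)
```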
datacube_benchmark.benchmark_zarr_array
benchmark_zarr_array(
zarr_array: Array,
access_pattern: Literal[
"point", "time_series", "spatial_slice", "full"
] = "point",
num_samples: int = 10,
warmup_samples: int = 10,
) -> dict
Comprehensive benchmark of zarr array random-access performance.
Returns detailed performance statistics.
Parameters:
- zarr_array (Array) – The zarr array to benchmark.
- access_pattern (Literal['point', 'time_series', 'spatial_slice', 'full'], default: 'point') – Type of access pattern: "point", "time_series", "spatial_slice", or "full".
- num_samples (int, default: 10) – Number of random access operations to perform.
- warmup_samples (int, default: 10) – Number of warmup operations (not included in timing).
Returns:
- dict – A dictionary of performance statistics, including the mean, median, standard deviation, minimum, and maximum access times, plus details about the zarr array such as its shape, dtype, and size.
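Example (the store path is a placeholder for however you obtain a zarr.Array, for instance from a store created with create_zarr_store):

```python
import zarr

from datacube_benchmark import benchmark_zarr_array

# Placeholder path: open the array to profile, then time 20 sampled
# time-series reads after 5 untimed warmup reads.
arr = zarr.open_array("data/benchmark.zarr", mode="r")
stats = benchmark_zarr_array(
    arr,
    access_pattern="time_series",
    num_samples=20,
    warmup_samples=5,
)
print(stats)
```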
datacube_benchmark.benchmark_access_patterns
benchmark_access_patterns(
zarr_array: Array, num_samples: int = 10, warmup_samples: int = 10
) -> DataFrame
Benchmark all three access patterns and return combined results.
Parameters:
- zarr_array (Array) – The zarr array to benchmark.
- num_samples (int, default: 10) – Number of random access operations to perform for each pattern.
- warmup_samples (int, default: 10) – Number of warmup operations (not included in timing).
Returns:
- DataFrame – A pandas.DataFrame with results for each access pattern.
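Example (placeholder path; runs the access patterns against the same array and collects the per-pattern results into a pandas.DataFrame):

```python
import zarr

from datacube_benchmark import benchmark_access_patterns

# Placeholder path: benchmark each access pattern on the same array.
arr = zarr.open_array("data/benchmark.zarr", mode="r")
df = benchmark_access_patterns(arr, num_samples=20, warmup_samples=5)
print(df)
```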
datacube_benchmark.types.TARGET_SHAPES
module-attribute
TARGET_SHAPES = Literal['pancake', 'dumpling', 'churro']
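Example (enumerating the accepted presets at runtime):

```python
from typing import get_args

from datacube_benchmark.types import TARGET_SHAPES

# The Literal alias can be introspected to list the valid chunk-shape presets.
print(get_args(TARGET_SHAPES))  # ('pancake', 'dumpling', 'churro')
```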