Utilities

lazycogs.align_bbox

align_bbox(
    affine: Affine | Sequence[float], bbox: tuple[float, float, float, float]
) -> tuple[float, float, float, float]

Snap a bounding box to the pixel grid defined by an affine transform.

Expands the bbox outward so that all four edges fall exactly on a grid line. Useful for aligning an AOI to the native grid of a COG collection (e.g. from a STAC item's proj:transform property) before calling lazycogs.open.

Parameters:

Name Type Description Default
affine Affine | Sequence[float]

Affine transform in row-major order, either 6-element (pixel_w, 0, x_origin, 0, pixel_h, y_origin) or 9-element (pixel_w, 0, x_origin, 0, pixel_h, y_origin, 0, 0, 1). Accepts an affine.Affine object or the list stored in a STAC item's proj:transform property.

required
bbox tuple[float, float, float, float]

(minx, miny, maxx, maxy) in the same CRS as the transform.

required

Returns:

Type Description
tuple[float, float, float, float]

(minx, miny, maxx, maxy) snapped to the nearest enclosing grid lines.
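
The outward snapping described above can be illustrated with a short standalone sketch. It is not the lazycogs implementation: the function name `snap_bbox_to_grid` is hypothetical, and the sketch assumes an unrotated transform given in the 6-element order listed under Parameters. It floors the lower grid index and ceils the upper one on each axis, so the result always encloses the input bbox.

```python
import math


def snap_bbox_to_grid(transform, bbox):
    """Expand bbox outward to the grid lines of an unrotated affine transform.

    Hypothetical helper for illustration only; assumes the layout
    (pixel_w, 0, x_origin, 0, pixel_h, y_origin) with zero rotation terms.
    """
    pixel_w, _, x_origin, _, pixel_h, y_origin = transform[:6]
    minx, miny, maxx, maxy = bbox

    # Fractional grid indices of the bbox edges along each axis.
    cols = [(x - x_origin) / pixel_w for x in (minx, maxx)]
    rows = [(y - y_origin) / pixel_h for y in (miny, maxy)]

    # Floor the smaller index and ceil the larger one so the box only grows.
    col0, col1 = math.floor(min(cols)), math.ceil(max(cols))
    row0, row1 = math.floor(min(rows)), math.ceil(max(rows))

    # Convert grid indices back to CRS coordinates.
    xs = [x_origin + c * pixel_w for c in (col0, col1)]
    ys = [y_origin + r * pixel_h for r in (row0, row1)]
    return (min(xs), min(ys), max(xs), max(ys))


# 10 m north-up grid with its origin at (500000, 4000000).
transform = (10.0, 0.0, 500000.0, 0.0, -10.0, 4000000.0)
snapped = snap_bbox_to_grid(transform, (500013.0, 3999871.0, 500127.0, 3999996.0))
```

Taking the min and max of the converted coordinates keeps the output in (minx, miny, maxx, maxy) order even when pixel_h is negative, as it is for north-up rasters. The sketch makes no attempt to absorb floating-point error for edges that already sit on a grid line.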

lazycogs.store_for

store_for(
    href: str,
    *,
    asset: str | None = None,
    duckdb_client: DuckdbClient | None = None,
    **kwargs: object,
) -> ObjectStore

Construct an ObjectStore by inspecting a stac-geoparquet sample asset.

Reads one sample item from href, derives the store root URL from a data asset HREF, and constructs an ObjectStore with obstore's own environment-based credential discovery. If the item carries STAC Storage Extension metadata (v1.0.0 or v2.0.0), region and requester_pays are also inferred automatically.

Caller-supplied kwargs override all inferred values; pass skip_signature=True for public buckets that do not require signed requests, or supply explicit credentials.

Parameters:

Name Type Description Default
href str

Path to a geoparquet file or hive-partitioned parquet directory.

required
asset str | None

Asset key to inspect when choosing a representative asset. Defaults to the first data asset (role "data" or media type "image/tiff"), falling back to the first asset in the item.

None
duckdb_client DuckdbClient | None

Optional DuckdbClient instance. When None (default), a plain DuckdbClient() is used. Pass a custom client to query hive-partitioned datasets.

None
**kwargs object

Forwarded to obstore.store.from_url, overriding any inferred values.

{}

Returns:

Type Description
ObjectStore

A freshly constructed ObjectStore (not cached).

Raises:

Type Description
ValueError

If no STAC items are found in href.

KeyError

If asset is specified but not present in the item.
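
The asset-selection rule and the store-root derivation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper names `pick_asset_key` and `store_root` are hypothetical, the media-type check is assumed to be a prefix match on "image/tiff", and the store root is assumed to be the scheme plus bucket (or host) of the asset HREF.

```python
from urllib.parse import urlparse


def pick_asset_key(assets, asset=None):
    """Choose a representative asset key from a STAC item's assets mapping."""
    if asset is not None:
        if asset not in assets:
            # Mirrors the documented KeyError for an unknown asset key.
            raise KeyError(asset)
        return asset
    for key, meta in assets.items():
        roles = meta.get("roles", [])
        media_type = meta.get("type", "")
        # Assumption: a data asset has role "data" or a TIFF media type.
        if "data" in roles or media_type.startswith("image/tiff"):
            return key
    # Fall back to the first asset in the item.
    return next(iter(assets))


def store_root(href):
    """Assumed derivation: keep only the scheme and bucket/host of an HREF."""
    parts = urlparse(href)
    return f"{parts.scheme}://{parts.netloc}"


# Made-up sample assets, for illustration only.
assets = {
    "thumbnail": {"href": "s3://example-bucket/scene/thumb.png",
                  "type": "image/png", "roles": ["thumbnail"]},
    "red": {"href": "s3://example-bucket/scene/red.tif",
            "type": "image/tiff; application=geotiff", "roles": ["data"]},
}
key = pick_asset_key(assets)
root = store_root(assets[key]["href"])
```

With the sample above, the thumbnail is skipped and the root is taken from the `red` asset's HREF.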

lazycogs.set_reproject_workers

set_reproject_workers(n: int) -> None

Set the number of threads each thread's event loop uses for reprojection.

Each thread (dask worker, Jupyter kernel callback thread, etc.) gets one persistent background event loop with one bounded reprojection ThreadPoolExecutor. All chunk reads on that thread share the same loop and executor, so dask tasks on different threads do not compete for a shared pool. The total number of reprojection threads at any moment is at most n x active_thread_count.

Reprojection is memory-bandwidth-bound rather than compute-bound, so values above 4 typically offer no benefit and can hurt throughput due to memory contention. The default is min(os.cpu_count(), 4).

To improve overall throughput, prefer adding time or band parallelism via dask (chunks={"time": 1}) over raising this value.

Parameters:

Name Type Description Default
n int

Number of worker threads per event loop. Must be >= 1.

required

Raises:

Type Description
ValueError

If n is less than 1.
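
The default value and the thread-count bound stated above reduce to simple arithmetic. The sketch below is illustrative only; both function names are hypothetical and are not part of lazycogs.

```python
import os


def default_reproject_workers():
    """Documented default: min(os.cpu_count(), 4).

    os.cpu_count() can return None, so this sketch falls back to 1.
    """
    return min(os.cpu_count() or 1, 4)


def max_reprojection_threads(n, active_thread_count):
    """Upper bound on concurrent reprojection threads: n x active threads."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return n * active_thread_count


# Example: 8 dask worker threads, each with a 4-thread reprojection pool.
upper_bound = max_reprojection_threads(4, 8)
```

In this example the bound is 32 threads, which shows how quickly raising n multiplies under a threaded dask scheduler.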

lazycogs.ExplainPlan dataclass

Complete dry-run read plan for a lazycogs query.

Attributes:

Name Type Description
href str

Path to the source geoparquet file.

crs str

String representation of the output CRS.

resolution float

Output pixel size in CRS units.

bands list[str]

Ordered list of band names included in the plan.

time_coords list[datetime64]

Time coordinate values for all explained time steps.

dst_width int

Output grid width in pixels (for the current DataArray extent).

dst_height int

Output grid height in pixels (for the current DataArray extent).

chunk_width int

Spatial chunk width in pixels.

chunk_height int

Spatial chunk height in pixels.

chunk_reads list[ChunkRead]

One entry per (band, time step, spatial tile).

fetch_headers bool

Whether COG headers were opened to populate overview and window fields on each CogRead.

empty_chunk_count property

empty_chunk_count: int

Number of chunks with zero matching COG files.

total_chunk_reads property

total_chunk_reads: int

Total number of (band, time, spatial) chunk reads.

total_cog_reads property

total_cog_reads: int

Total number of COG file reads across all chunks.
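
The three count properties relate to one another as in the following sketch. It uses a plain list of per-chunk COG lists as a stand-in for chunk_reads; the function name `summarise_counts` is hypothetical.

```python
def summarise_counts(cog_reads_per_chunk):
    """Compute the three documented counts from per-chunk lists of COG reads."""
    n_cogs = [len(reads) for reads in cog_reads_per_chunk]
    return {
        # One chunk read per (band, time, spatial) combination.
        "total_chunk_reads": len(n_cogs),
        # Chunks with zero matching COG files.
        "empty_chunk_count": sum(1 for n in n_cogs if n == 0),
        # COG file reads summed across all chunks.
        "total_cog_reads": sum(n_cogs),
    }


# Three chunks: two matching COGs, none, and one.
counts = summarise_counts([["a.tif", "b.tif"], [], ["c.tif"]])
```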

__repr__

__repr__() -> str

Return a compact single-line summary.

summary

summary() -> str

Return a multi-line human-readable summary of the explain plan.

to_dataframe

to_dataframe() -> DataFrame

Return a DataFrame with one row per (chunk x item) combination.

Empty chunks contribute one row with item fields set to None. When fetch_headers=False, the overview and window columns are all None.

Returns:

Type Description
DataFrame

A pandas.DataFrame with columns for chunk metadata, item metadata, and (when available) COG header details.

Raises:

Type Description
ImportError

If pandas is not installed.
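
The row layout described for to_dataframe (one row per chunk and item pair, with a single placeholder row for an empty chunk) can be sketched without pandas. The dictionary keys below are assumptions chosen to echo the documented attribute names; the real column names may differ.

```python
def flatten_chunk_reads(chunk_reads):
    """Flatten chunk records into one row per (chunk, item) combination."""
    rows = []
    for chunk in chunk_reads:
        chunk_meta = {
            "band": chunk["band"],
            "time_index": chunk["time_index"],
            "chunk_row": chunk["chunk_row"],
            "chunk_col": chunk["chunk_col"],
        }
        # An empty chunk still contributes one row, with item fields set to None.
        items = chunk["cog_reads"] or [None]
        for item in items:
            rows.append({
                **chunk_meta,
                "item_id": item["item_id"] if item else None,
                "href": item["href"] if item else None,
            })
    return rows


# Made-up records: one chunk with two items, one empty chunk.
rows = flatten_chunk_reads([
    {"band": "red", "time_index": 0, "chunk_row": 0, "chunk_col": 0,
     "cog_reads": [{"item_id": "A", "href": "s3://example-bucket/A_red.tif"},
                   {"item_id": "B", "href": "s3://example-bucket/B_red.tif"}]},
    {"band": "red", "time_index": 0, "chunk_row": 0, "chunk_col": 1,
     "cog_reads": []},
])
```

A list of dictionaries in this shape can be passed directly to `pandas.DataFrame(rows)` when pandas is available.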

lazycogs.ChunkRead dataclass

All reads required for one (band, time step, spatial tile).

Attributes:

Name Type Description
band str

Asset key for this chunk.

time_index int

Index of this time step in the full time axis.

date_filter str

rustac-compatible datetime filter string for this time step.

time_coord datetime64

Coordinate value for this time step.

chunk_row int

Tile row index within the spatial grid (0-indexed).

chunk_col int

Tile column index within the spatial grid (0-indexed).

chunk_affine Affine

Affine transform of the tile (top-left origin).

chunk_width int

Tile width in pixels.

chunk_height int

Tile height in pixels.

cog_reads list[CogRead]

Per-COG read details.

n_cog_reads int

Number of COG files matched (derived from cog_reads).

__post_init__

__post_init__() -> None

Derive n_cog_reads from the cog_reads list.
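
Deriving a field in `__post_init__` is a standard dataclass pattern. The reduced stand-in below shows it; `ChunkReadSketch` is a hypothetical class that keeps only two of the documented attributes plus the derived counter.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkReadSketch:
    """Reduced stand-in for a chunk record; not the lazycogs class."""

    band: str
    cog_reads: list = field(default_factory=list)
    # Excluded from __init__ so callers cannot set an inconsistent value.
    n_cog_reads: int = field(init=False)

    def __post_init__(self) -> None:
        # Runs after the generated __init__ has assigned the other fields.
        self.n_cog_reads = len(self.cog_reads)


chunk = ChunkReadSketch(band="red", cog_reads=["a.tif", "b.tif"])
```

The counter is computed once at construction, so it will not track later mutations of `cog_reads`.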

lazycogs.CogRead dataclass

Read details for one COG file within one chunk.

Attributes:

Name Type Description
item_id str

STAC item ID.

asset_key str

Asset key (band name) that would be read.

href str

Asset HREF.

overview_level int | None

Overview level that would be read. None means full resolution. Only populated when fetch_headers=True.

overview_resolution float | None

Pixel size of the selected level in source CRS units. Only populated when fetch_headers=True.

window_col_off int | None

Column offset of the read window in source pixels. Only populated when fetch_headers=True.

window_row_off int | None

Row offset of the read window in source pixels. Only populated when fetch_headers=True.

window_width int | None

Width of the read window in source pixels. Only populated when fetch_headers=True.

window_height int | None

Height of the read window in source pixels. Only populated when fetch_headers=True.

lazycogs.StacCogAccessor

xarray accessor adding explain functionality to lazycogs DataArrays.

Registered as the stac_cog namespace on all xr.DataArray objects. The explain method is only useful on DataArrays produced by lazycogs.open.

__init__

__init__(da: DataArray) -> None

Initialise the accessor.

Parameters:

Name Type Description Default
da DataArray

The DataArray this accessor is attached to.

required

explain

explain(*, fetch_headers: bool = False) -> ExplainPlan

Return a dry-run read plan without fetching any pixel data.

Runs the same DuckDB spatial queries that would fire during .compute(), but stops before any COG pixel I/O. With fetch_headers=True the COG IFD headers are also fetched (one small HTTP range request per matched item) to determine which overview level and pixel window would be read.

Parameters:

Name Type Description Default
fetch_headers bool

When True, open each matched COG header to populate CogRead.overview_level and the window fields. Requires network I/O. Defaults to False.

False

Returns:

Type Description
ExplainPlan

An ExplainPlan describing all (band, time step, spatial tile) reads for the current DataArray extent and chunking.

Raises:

Type Description
ValueError

If the DataArray was not produced by lazycogs.open() (missing explain metadata in attrs).
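
A typical way to use the accessor is to inspect the plan's counts before triggering a compute. The helper below is a sketch with a hypothetical name, `plan_overview`. It relies only on the documented `stac_cog.explain` call and the three documented count properties, and it expects a DataArray produced by lazycogs.open.

```python
def plan_overview(da, fetch_headers=False):
    """Collect headline counts from a dry-run plan without reading pixels.

    `da` is assumed to be a DataArray produced by lazycogs.open; for any
    other DataArray, explain() is documented to raise ValueError.
    """
    plan = da.stac_cog.explain(fetch_headers=fetch_headers)
    return {
        "total_chunk_reads": plan.total_chunk_reads,
        "total_cog_reads": plan.total_cog_reads,
        "empty_chunk_count": plan.empty_chunk_count,
    }
```

A large total_cog_reads relative to total_chunk_reads suggests that many source files overlap each chunk. Calling `plan.summary()` or `plan.to_dataframe()` on the same plan then gives a per-chunk breakdown.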