open

lazycogs.open

open(
    href: str,
    *,
    datetime: str | None = None,
    bbox: tuple[float, float, float, float],
    crs: str | CRS,
    resolution: float,
    filter: str | dict[str, Any] | None = None,
    ids: list[str] | None = None,
    bands: list[str] | None = None,
    chunks: dict[str, int] | None = None,
    sortby: str | list[str | dict[str, str]] | None = None,
    nodata: float | None = None,
    dtype: str | dtype | None = None,
    mosaic_method: type[MosaicMethodBase] | None = None,
    time_period: str | None = "P1D",
    store: Store | None = None,
    max_concurrent_reads: int = 32,
    path_from_href: Callable[[str], str] | None = None,
    duckdb_client: DuckdbClient | None = None,
) -> DataArray

Open a mosaic of STAC items as a lazy (band, time, y, x) DataArray.

href must be a path to a geoparquet file (.parquet or .geoparquet) or, when duckdb_client is provided, to a hive-partitioned parquet directory.
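A minimal usage sketch may help orient the parameter reference that follows. The file name, bounding box, CRS, and filter values below are hypothetical placeholders; only the keyword names come from the signature above. The import is deferred into the function so the snippet loads even where lazycogs is not installed.

```python
from typing import Any

# Hypothetical inputs: replace with a real geoparquet path and area of interest.
HREF = "items.parquet"

OPEN_KWARGS: dict[str, Any] = {
    "datetime": "2023-01-01/2023-12-31",  # RFC 3339 range used to pre-filter items
    "bbox": (500000.0, 4400000.0, 510000.0, 4410000.0),  # (minx, miny, maxx, maxy) in crs units
    "crs": "EPSG:32615",                  # target output CRS
    "resolution": 10.0,                   # output pixel size in crs units
    "filter": "eo:cloud_cover < 20",      # CQL2 text filter
    "time_period": "P1M",                 # one time step per calendar month
}


def open_mosaic():
    """Open the mosaic lazily; requires lazycogs to be installed."""
    import lazycogs  # deferred so the sketch can be loaded without the package

    return lazycogs.open(HREF, **OPEN_KWARGS)
```

Because chunks is left unset here, the result would be the lazily indexed variant described under the chunks parameter rather than a dask-backed array.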

Parameters:

Name Type Description Default
href str

Path to a geoparquet file (.parquet or .geoparquet) or a hive-partitioned parquet directory when duckdb_client is provided with use_hive_partitioning=True.

required
datetime str | None

RFC 3339 datetime or range (e.g. "2023-01-01/2023-12-31") used to pre-filter items from the parquet.

None
bbox tuple[float, float, float, float]

(minx, miny, maxx, maxy) in the target crs.

required
crs str | CRS

Target output CRS.

required
resolution float

Output pixel size in crs units.

required
filter str | dict[str, Any] | None

CQL2 filter expression (text string or JSON dict) forwarded to DuckDB queries, e.g. "eo:cloud_cover < 20".

None
ids list[str] | None

STAC item IDs to restrict the search to.

None
bands list[str] | None

Asset keys to include. If None, inferred from the first matching item's preferred data assets.

None
chunks dict[str, int] | None

Chunk sizes passed to DataArray.chunk(). If None (default), returns a LazilyIndexedArray-backed DataArray where only the requested pixels are fetched on each access, which suits point or small-region queries. Pass an explicit dict to convert to a dask-backed array for parallel computation over larger regions.

None
sortby str | list[str | dict[str, str]] | None

Sort keys forwarded to DuckDB queries.

None
nodata float | None

No-data fill value for output arrays. When omitted, lazycogs advertises a scalar nodata sentinel only when sampled bands agree on one.

None
dtype str | dtype | None

Output array dtype. When omitted, inferred from sampled asset dtypes on the first matching item. Float-only mosaic methods may auto-promote inferred integer outputs to float32. Passing an explicit integer dtype still raises an error for those methods.

None
mosaic_method type[MosaicMethodBase] | None

Mosaic method class (not instance) to use. Defaults to lazycogs._mosaic_methods.FirstMethod.

None
time_period str | None

Temporal grouping mode. Supported forms are None (one step per unique normalized timestamp), PnD (days), P1W (ISO calendar week), P1M (calendar month), P1Y (calendar year), and PTnH (fixed hour windows). Defaults to "P1D" (one step per calendar day), which preserves the previous behaviour. Multi-day and multi-hour windows are aligned to an epoch of 2000-01-01.

'P1D'
store Store | None

Pre-configured async_geotiff.Store, as accepted by GeoTIFF.open, to use for all asset reads. Useful when credentials, custom endpoints, or non-default options are needed without relying on automatic store resolution from each HREF. When None (default), each asset URL is parsed and an obstore-backed store is created for it or reused from a shared, lock-protected cache.

None
max_concurrent_reads int

Maximum number of COG reads to run concurrently per chunk. An asyncio.Semaphore enforces this limit, which caps peak in-flight memory when a chunk overlaps many files. Methods that support early exit (e.g. the default lazycogs._mosaic_methods.FirstMethod) stop reading once every output pixel is filled, so lower values also reduce unnecessary I/O on dense datasets. Defaults to 32.

32
path_from_href Callable[[str], str] | None

Optional callable (href: str) -> str that extracts the object path from an asset HREF. When provided, it replaces the default urlparse-based extraction used in lazycogs._store.resolve. Most useful when combined with a custom store whose root does not align with the URL path structure of the asset HREFs.

Example: NASA LP DAAC proxy HTTPS URL for an S3 asset:

import lazycogs
from obstore.store import S3Store
from urllib.parse import urlparse

store = S3Store(bucket="lp-prod-protected", ...)

def strip_bucket(href: str) -> str:
    # href: https://data.lpdaac.earthdatacloud.nasa.gov/
    #   lp-prod-protected/path/to/file.tif
    # store is rooted at the bucket, so the path is
    # just path/to/file.tif
    return (
        urlparse(href).path.lstrip("/").removeprefix("lp-prod-protected/")
    )

da = lazycogs.open(
    "items.parquet", ..., store=store, path_from_href=strip_bucket
)
None
duckdb_client DuckdbClient | None

Optional DuckdbClient instance. When None (default), a plain DuckdbClient() is created, which is equivalent to the previous rustac.search_sync behaviour. Pass a custom client to enable features such as hive-partitioned datasets:

import lazycogs
import rustac

# DuckdbClient is referenced through the rustac namespace imported above.
client = rustac.DuckdbClient(use_hive_partitioning=True)
da = lazycogs.open(
    "s3://bucket/stac/",
    duckdb_client=client,
    bbox=...,
    crs=...,
    resolution=...,
)
None
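The time_period description above says multi-day windows are aligned to an epoch of 2000-01-01. The sketch below illustrates what epoch-aligned n-day windows mean in practice. It is an illustration only, not the library's implementation; the helper name window_start is hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)  # alignment epoch stated in the time_period description


def window_start(day: date, n_days: int) -> date:
    """Return the first day of the epoch-aligned n-day window containing `day`."""
    if n_days < 1:
        raise ValueError("n_days must be a positive integer")
    offset = (day - EPOCH).days   # negative for dates before the epoch
    index = offset // n_days      # floor division keeps windows contiguous
    return EPOCH + timedelta(days=index * n_days)


# Two dates in the same 3-day window map to the same start day.
print(window_start(date(2000, 1, 4), 3), window_start(date(2000, 1, 5), 3))
```

Under this scheme, window boundaries depend only on the epoch and the window length, so two separate calls with the same time_period would group items identically regardless of the requested datetime range.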

Returns:

Type Description
DataArray

Lazy xr.DataArray with dimensions (band, time, y, x).
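To estimate how large the y and x dimensions will be before opening anything, the extent can be divided by the pixel size. This is a rough sketch: the helper grid_shape is hypothetical, and rounding up with ceil is an assumption, since the page does not state how lazycogs handles extents that are not an exact multiple of resolution.

```python
import math


def grid_shape(
    bbox: tuple[float, float, float, float], resolution: float
) -> tuple[int, int]:
    """Estimate (height, width) in pixels for a bbox given in crs units."""
    minx, miny, maxx, maxy = bbox
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    if maxx <= minx or maxy <= miny:
        raise ValueError("bbox must satisfy minx < maxx and miny < maxy")
    # Assumption: partial pixels at the edge are rounded up to a full pixel.
    width = math.ceil((maxx - minx) / resolution)
    height = math.ceil((maxy - miny) / resolution)
    return height, width


print(grid_shape((0.0, 0.0, 100.0, 50.0), 10.0))
```

An estimate like this can help decide between the default lazy indexing and explicit chunks for a given area of interest.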

Raises:

Type Description
ValueError

If href is not a .parquet or .geoparquet file and no duckdb_client is provided, if no matching items are found, or if time_period is not a recognised ISO 8601 duration.
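The max_concurrent_reads parameter above is described as bounding concurrency with an asyncio.Semaphore. The following self-contained sketch shows that general pattern, with asyncio.sleep standing in for a COG read. The function names are hypothetical and this is not the library's own code; it only demonstrates that the number of reads in flight never exceeds the limit.

```python
import asyncio


async def read_all(n_files: int, max_concurrent_reads: int) -> int:
    """Run n_files simulated reads and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_concurrent_reads)
    in_flight = 0
    peak = 0

    async def read_one(index: int) -> None:
        nonlocal in_flight, peak
        async with semaphore:  # at most max_concurrent_reads tasks pass this point
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # stand-in for a remote range request
            in_flight -= 1

    # All tasks are scheduled at once; the semaphore throttles the actual work.
    await asyncio.gather(*(read_one(i) for i in range(n_files)))
    return peak


peak = asyncio.run(read_all(n_files=100, max_concurrent_reads=8))
print(f"peak concurrent reads: {peak}")
```

Since each in-flight read holds its decoded data in memory, a lower limit trades throughput for a smaller peak footprint.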