lazycogs.open

open(
    href: str,
    *,
    datetime: str | None = None,
    bbox: tuple[float, float, float, float],
    crs: str | CRS,
    resolution: float,
    filter: str | dict[str, Any] | None = None,
    ids: list[str] | None = None,
    bands: list[str] | None = None,
    chunks: dict[str, int] | None = None,
    sortby: str | list[str | dict[str, str]] | None = None,
    nodata: float | None = None,
    dtype: str | dtype | None = None,
    mosaic_method: type[MosaicMethodBase] | None = None,
    time_period: str = "P1D",
    store: ObjectStore | None = None,
    max_concurrent_reads: int = 32,
    path_from_href: Callable[[str], str] | None = None,
    duckdb_client: DuckdbClient | None = None,
) -> DataArray

Open a mosaic of STAC items as a lazy (time, band, y, x) DataArray.

href must be a path to a geoparquet file (.parquet or .geoparquet) or, when duckdb_client is provided, to a hive-partitioned parquet directory.

Parameters:

Name Type Description Default
href str

Path to a geoparquet file (.parquet or .geoparquet) or a hive-partitioned parquet directory when duckdb_client is provided with use_hive_partitioning=True.

required
datetime str | None

RFC 3339 datetime or range (e.g. "2023-01-01/2023-12-31") used to pre-filter items from the parquet.

None
bbox tuple[float, float, float, float]

(minx, miny, maxx, maxy) in the target crs.
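Together with resolution, the bbox fixes the shape of the output raster. A back-of-the-envelope sketch (illustrative only; the library's own grid snapping may differ)::

```python
import math

def grid_shape(bbox, resolution):
    # (minx, miny, maxx, maxy) in CRS units -> (rows, cols) in pixels
    minx, miny, maxx, maxy = bbox
    width = math.ceil((maxx - minx) / resolution)
    height = math.ceil((maxy - miny) / resolution)
    return height, width

# e.g. a 10 km x 5 km UTM box at 10 m resolution
grid_shape((500000.0, 4100000.0, 510000.0, 4105000.0), 10.0)
```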

required
crs str | CRS

Target output CRS.

required
resolution float

Output pixel size in crs units.

required
filter str | dict[str, Any] | None

CQL2 filter expression (text string or JSON dict) forwarded to DuckDB queries, e.g. "eo:cloud_cover < 20".

None
ids list[str] | None

STAC item IDs to restrict the search to.

None
bands list[str] | None

Asset keys to include. If None, auto-detected from the first matching item.

None
chunks dict[str, int] | None

Chunk sizes passed to DataArray.chunk(). If None (default), returns a LazilyIndexedArray-backed DataArray where only the requested pixels are fetched on each access — ideal for point or small-region queries. Pass an explicit dict to convert to a dask-backed array for parallel computation over larger regions.

None
sortby str | list[str | dict[str, str]] | None

Sort keys forwarded to DuckDB queries.

None
nodata float | None

No-data fill value for output arrays.

None
dtype str | dtype | None

Output array dtype. Defaults to float32.

None
mosaic_method type[MosaicMethodBase] | None

Mosaic method class (not instance) to use. Defaults to `lazycogs._mosaic_methods.FirstMethod`.

None
time_period str

ISO 8601 duration string controlling how items are grouped into time steps. Supported forms: PnD (days), P1W (ISO calendar week), P1M (calendar month), P1Y (calendar year). Defaults to "P1D" (one step per calendar day), which preserves the previous behaviour. Multi-day windows such as "P16D" are aligned to an epoch of 2000-01-01.
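A minimal sketch of the epoch-aligned bucketing for multi-day "PnD" windows (illustrative; the library's own grouping code may differ in detail)::

```python
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)  # epoch used to align multi-day windows

def window_start(day: date, n_days: int) -> date:
    # Start of the n-day window containing `day`, aligned to EPOCH,
    # so e.g. "P16D" windows begin every 16 days from 2000-01-01.
    offset = (day - EPOCH).days
    return EPOCH + timedelta(days=(offset // n_days) * n_days)
```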

'P1D'
store ObjectStore | None

Pre-configured obstore ObjectStore instance to use for all asset reads. Useful when credentials, custom endpoints, or non-default options are needed without relying on automatic store resolution from each HREF. When None (default), each asset URL is parsed to create or reuse a per-thread cached store.

None
max_concurrent_reads int

Maximum number of COG reads to run concurrently per chunk. Items are processed in batches of this size, which bounds peak in-flight memory when a chunk overlaps many files. Methods that support early exit (e.g. the default `lazycogs._mosaic_methods.FirstMethod`) will stop reading once every output pixel is filled, so lower values also reduce unnecessary I/O on dense datasets. Defaults to 32.
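The batching-with-early-exit behaviour can be sketched in pure Python (the `batched_reads`, `read`, and `is_complete` names are hypothetical; in lazycogs the reads within a batch run concurrently)::

```python
def batched_reads(items, read, is_complete, max_concurrent_reads=32):
    # Read items in batches of max_concurrent_reads, stopping as soon
    # as the mosaic reports itself complete after a batch.
    reads = 0
    for i in range(0, len(items), max_concurrent_reads):
        for item in items[i : i + max_concurrent_reads]:
            read(item)  # at most max_concurrent_reads in flight per batch
            reads += 1
        if is_complete():  # early exit: remaining batches are never fetched
            break
    return reads
```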

32
path_from_href Callable[[str], str] | None

Optional callable (href: str) -> str that extracts the object path from an asset HREF. When provided, it replaces the default urlparse-based extraction used in `lazycogs._store.resolve`. Most useful when combined with a custom store whose root does not align with the URL path structure of the asset HREFs.

Example: NASA LPDAAC HTTPS proxy URL for an S3 asset::

import lazycogs
from obstore.store import S3Store
from urllib.parse import urlparse

store = S3Store(bucket="lp-prod-protected", ...)

def strip_bucket(href: str) -> str:
    # href: https://data.lpdaac.earthdatacloud.nasa.gov/
    #   lp-prod-protected/path/to/file.tif
    # store is rooted at the bucket, so the path is
    # just path/to/file.tif
    return (
        urlparse(href).path.lstrip("/").removeprefix("lp-prod-protected/")
    )

da = lazycogs.open(
    "items.parquet", ..., store=store, path_from_href=strip_bucket
)
None
duckdb_client DuckdbClient | None

Optional DuckdbClient instance. When None (default), a plain DuckdbClient() is created, which is equivalent to the previous rustac.search_sync behaviour. Pass a custom client to enable features such as hive-partitioned datasets::

import lazycogs
from rustac import DuckdbClient

client = DuckdbClient(use_hive_partitioning=True)
da = lazycogs.open(
    "s3://bucket/stac/",
    duckdb_client=client,
    bbox=...,
    crs=...,
    resolution=...,
)
None

Returns:

Type Description
DataArray

Lazy xr.DataArray with dimensions (time, band, y, x).

Raises:

Type Description
ValueError

If href is not a .parquet or .geoparquet file and no duckdb_client is provided, if no matching items are found, or if time_period is not a recognised ISO 8601 duration.