STAC + VirtualiZarr¶
End-to-end example: query a STAC API for Landsat COGs in California spanning
two adjacent UTM zones (10N and 11N), virtualize them into an Icechunk store
with zarr-datafusion-search metadata, and merge them into a single grid using
merge(datafusion=True).
The UTM zone boundary runs roughly along -120° longitude through the Sierra Nevada. We pick a bbox around Lake Tahoe that straddles both zones.
AWS credentials: This notebook reads from the public usgs-landsat bucket
(requester-pays) in us-west-2. You need valid AWS credentials configured.
import warnings
import matplotlib.pyplot as plt
import xarray as xr
import zarr
from affine import Affine
from rasterix import RasterIndex
from lazymerge.merge import merge
warnings.filterwarnings("ignore")
xr.set_options(display_expand_indexes=True)
<xarray.core.options.set_options at 0x11a3f5010>
1. Configure cloud storage¶
Landsat Collection 2 Level-2 data lives in the usgs-landsat S3 bucket
(requester-pays, us-west-2).
import boto3
from obspec_utils.registry import ObjectStoreRegistry
from obstore.auth.boto3 import Boto3CredentialProvider
from obstore.store import S3Store
scheme = "s3://"
bucket = "usgs-landsat"
region = "us-west-2"
session = boto3.Session()
object_store = S3Store(
bucket=bucket,
region=region,
request_payer=True,
credential_provider=Boto3CredentialProvider(session=session),
)
registry = ObjectStoreRegistry({f"{scheme}{bucket}": object_store})
2. Set up Icechunk repository¶
Icechunk provides a versioned zarr store that supports virtual chunk references back to the original COG files on S3.
import icechunk
location = "data/icechunk_california"
storage = icechunk.local_filesystem_storage(location)
config = icechunk.RepositoryConfig.default()
s3_chunk_store = icechunk.s3_store(
region=region,
requester_pays=True,
)
config.set_virtual_chunk_container(
icechunk.VirtualChunkContainer(f"{scheme}{bucket}/", s3_chunk_store),
)
credentials = icechunk.containers_credentials(
{f"{scheme}{bucket}/": icechunk.s3_credentials(from_env=True)},
)
repo = icechunk.Repository.open_or_create(
storage=storage,
config=config,
authorize_virtual_chunk_access=credentials,
)
ic_session = repo.writable_session("main")
2026-05-27T01:19:52.813823Z WARN icechunk::storage::object_store: The LocalFileSystem storage is not safe for concurrent commits. If more than one thread/process will attempt to commit at the same time, prefer using object stores. at icechunk/src/storage/object_store.rs:92
3. Ingest STAC search results¶
Query Earth Search for Landsat Collection 2 Level-2 scenes around Lake Tahoe. The bbox (-121.5, 38.5, -118.5, 40.5) spans the UTM 10N / 11N boundary at -120° longitude, so we get scenes in both zones.
We request 6 scenes and ingest the red, green, and blue asset hrefs.
from zarr_datafusion_search import ingest_stac_search
rows = await ingest_stac_search(
url="https://earth-search.aws.element84.com/v1",
session=ic_session,
collections=["landsat-c2-l2"],
bbox=(-121.5, 38.5, -118.5, 40.5),
max_items=6,
datetime="2025-10-01/2025-10-15",
asset_hrefs=["red", "green", "blue"],
)
rows
6
4. Query metadata with DataFusion¶
Use DataFusion SQL to inspect the ingested metadata and verify we have scenes in both UTM zones.
from datafusion import SessionContext
from geodatafusion import register_all
from zarr_datafusion_search import ZarrTable
ic_session = repo.writable_session("main")
zarr_table = await ZarrTable.from_icechunk(session=ic_session, group_path="/meta")
ctx = SessionContext()
register_all(ctx)
ctx.register_table_provider("meta", zarr_table)
df = ctx.sql(
"SELECT * FROM meta ORDER BY datetime",
)
df
| asset_blue | asset_green | asset_red | bbox | created | datetime | description | earthsearch:payload_id | eo:cloud_cover | grid:code | gsd | id | landsat:cloud_cover_land | landsat:collection_category | landsat:collection_number | landsat:correction | landsat:product_generated | landsat:scene_id | landsat:wrs_path | landsat:wrs_row | landsat:wrs_type | platform | proj:epsg | sci:doi | shape_x | shape_y | transform_0 | transform_1 | transform_2 | transform_3 | transform_4 | transform_5 | updated | view:off_nadir | view:sun_azimuth | view:sun_elevation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30.47 | WRS2-043032 | 30 | 30.47 | T1 | 02 | L2SP | 2025-10-12T16:19:40Z | LC90430322025284LGN00 | 043 | 032 | 2 | landsat-9 | 32611 | 10.5066/P9OGBGM6 | 7901 | 8011 | 30.0 | 0.0 | 150885.0 | 0.0 | -30.0 | 4587915.0 | 0 | 158.43982425 | 39.96946877 | ||||||||||
| 9.05 | WRS2-043033 | 30 | 9.05 | T1 | 02 | L2SP | 2025-10-12T16:32:40Z | LC90430332025284LGN00 | 043 | 033 | 2 | landsat-9 | 32610 | 10.5066/P9OGBGM6 | 7581 | 7701 | 30.0 | 0.0 | 632385.0 | 0.0 | -30.0 | 4425015.0 | 0 | 157.54660497 | 41.19354606 | ||||||||||
| 3.34 | WRS2-043034 | 30 | 3.35 | T1 | 02 | L2SP | 2025-10-12T16:22:19Z | LC90430342025284LGN00 | 043 | 034 | 2 | landsat-9 | 32610 | 10.5066/P9OGBGM6 | 7591 | 7721 | 30.0 | 0.0 | 598785.0 | 0.0 | -30.0 | 4265415.0 | 0 | 156.62566494 | 42.408338 | ||||||||||
| 0.42 | WRS2-042032 | 30 | 0.42 | T1 | 02 | L2SP | 2025-11-18T02:08:11Z | LC80420322025285LGN00 | 042 | 032 | 2 | landsat-8 | 32611 | 10.5066/P9OGBGM6 | 7811 | 7931 | 30.0 | 0.0 | 284685.0 | 0.0 | -30.0 | 4583715.0 | 0 | 158.64254999 | 39.62665708 | ||||||||||
| 0.12 | WRS2-042033 | 30 | 0.12 | T1 | 02 | L2SP | 2025-11-18T12:16:19Z | LC80420332025285LGN00 | 042 | 033 | 2 | landsat-8 | 32611 | 10.5066/P9OGBGM6 | 7811 | 7941 | 30.0 | 0.0 | 243885.0 | 0.0 | -30.0 | 4425915.0 | 0 | 157.76121163 | 40.85308421 | ||||||||||
| 0.57 | WRS2-042034 | 30 | 0.57 | T1 | 02 | L2SP | 2025-11-18T02:22:30Z | LC80420342025285LGN00 | 042 | 034 | 2 | landsat-8 | 32611 | 10.5066/P9OGBGM6 | 7821 | 7941 | 30.0 | 0.0 | 202485.0 | 0.0 | -30.0 | 4268115.0 | 0 | 156.85262094 | 42.07091531 |
5. Virtualize COGs and write conventions¶
For each scene, use VirtualiZarr to create virtual zarr references to the original COG overviews on S3, then write zarr conventions metadata (multiscales, spatial, proj) to the Icechunk store.
from collections.abc import MutableMapping
from typing import Any
from rasterio.warp import transform_bounds
from shapely import wkb
from virtual_tiff import VirtualTIFF
from virtualizarr import open_virtual_dataset
from lazymerge.conventions import ProjAttrs, SpatialAttrs, write_proj, write_spatial
def write_multiscales(overviews: int, attrs: MutableMapping[str, Any]) -> None:
layout = []
factor = 2
for overview in range(overviews):
if overview == 0:
config = {"asset": str(overview)}
else:
scale = float(factor**overview)
config = {
"asset": str(overview),
"derived_from": str(overview - 1),
"factors": [factor, factor],
"transform": {
"scale": [scale, scale],
"translation": [0.0, 0.0],
},
}
layout.append(config)
attrs["multiscales"] = {"layout": layout}
def compute_transform_from_bbox(
projected_bbox: tuple[float, float, float, float],
shape: tuple[int, int],
) -> tuple[float, float, float, float, float, float]:
"""Compute an affine transform from a projected bounding box and array shape.
Follows the north-up raster convention (y decreasing top-to-bottom),
similar to lazycogs' compute_output_grid.
"""
minx, miny, maxx, maxy = projected_bbox
height, width = shape
res_x = (maxx - minx) / width
res_y = -(maxy - miny) / height
return (res_x, 0.0, minx, 0.0, res_y, maxy)
def get_base_transform(
row: dict[str, Any],
projected_bbox: tuple[float, float, float, float],
shape: tuple[int, int],
) -> tuple[float, float, float, float, float, float]:
"""Get the affine transform from the row, falling back to bbox/shape calculation."""
if "transform_0" in row and row["transform_0"] is not None:
return (
row["transform_0"],
row["transform_1"],
row["transform_2"],
row["transform_3"],
row["transform_4"],
row["transform_5"],
)
return compute_transform_from_bbox(projected_bbox, shape)
def write_virtualizarr_data(
id: str,
assets: dict[str, str],
overviews: int,
session,
registry,
) -> None:
for key, asset in assets.items():
for overview in range(overviews):
group = f"{id}/{key}/{overview}"
ds = open_virtual_dataset(
url=asset,
registry=registry,
parser=VirtualTIFF(ifd=overview),
)
ds.vz.to_icechunk(store=session.store, group=group)
def write_conventions(row: dict[str, Any], overviews: int, session, bands: list[str]) -> None:
store = session.store
item_id = row["id"]
# Compute projected bbox once (same for all overviews)
geom = wkb.loads(row["bbox"])
bbox_4326 = [*geom.bounds]
proj_code = row["proj:epsg"]
projected_bbox = transform_bounds("EPSG:4326", proj_code, *bbox_4326)
base_shape = (row["shape_y"], row["shape_x"])
# Use transform from metadata if available, otherwise derive from bbox + shape
base_transform = get_base_transform(row, projected_bbox, base_shape)
proj_attrs = ProjAttrs(code=f"EPSG:{proj_code}")
# Write spatial+proj on the item group (for source discovery)
item_group = zarr.open_group(store, path=f"/{item_id}", zarr_format=3)
write_spatial(
item_group,
SpatialAttrs(
dimensions=["y", "x"],
transform=base_transform,
bbox=projected_bbox,
shape=base_shape,
),
)
write_proj(item_group, proj_attrs)
for band in bands:
# Write multiscales convention on each band group
band_group = zarr.open_group(store, path=f"/{item_id}/{band}", zarr_format=3)
write_multiscales(overviews=overviews, attrs=band_group.attrs)
# Write spatial+proj on each overview array
for level in range(overviews):
factor = 2**level
ovr_res_x = base_transform[0] * factor
ovr_res_y = base_transform[4] * factor
ovr_transform = (
ovr_res_x,
base_transform[1],
base_transform[2],
base_transform[3],
ovr_res_y,
base_transform[5],
)
# Open the VirtualiZarr array (group-wrapped: {level}/{level})
level_group = zarr.open_group(
store,
path=f"/{item_id}/{band}/{level}",
zarr_format=3,
)
for _name, member in level_group.members():
if isinstance(member, zarr.Array):
arr = member
break
else:
continue
ovr_shape = arr.shape
write_spatial(
arr,
SpatialAttrs(
dimensions=["y", "x"],
transform=ovr_transform,
bbox=projected_bbox,
shape=ovr_shape,
),
)
write_proj(arr, proj_attrs)
overviews = 7
bands = ["red", "green", "blue"]
for row in df.to_arrow_table().to_pylist():
assets = {
"red": row["asset_red"],
"green": row["asset_green"],
"blue": row["asset_blue"],
}
ic_session = repo.writable_session("main")
write_virtualizarr_data(
id=row["id"],
assets=assets,
overviews=overviews,
session=ic_session,
registry=registry,
)
write_conventions(row=row, overviews=overviews, session=ic_session, bands=bands)
message = ic_session.commit(row["id"])
6. Inspect the store¶
Verify the Icechunk store has the expected structure: /meta group with
columnar arrays, plus per-scene groups with band subgroups and overviews.
ic_session = repo.writable_session("main")
root = zarr.open(ic_session.store)
root.tree()
/
├── LC08_L2SP_042032_20251012_02_T1
│ ├── blue
│ │ ├── 0
│ │ │ └── 0 (7931, 7811) uint16
│ │ ├── 1
│ │ │ └── 1 (3966, 3906) uint16
│ │ ├── 2
│ │ │ └── 2 (1983, 1953) uint16
│ │ ├── 3
│ │ │ └── 3 (992, 977) uint16
│ │ ├── 4
│ │ │ └── 4 (496, 489) uint16
│ │ ├── 5
│ │ │ └── 5 (248, 245) uint16
│ │ └── 6
│ │ └── 6 (124, 123) uint16
│ ├── green
│ │ ├── 0
│ │ │ └── 0 (7931, 7811) uint16
│ │ ├── 1
│ │ │ └── 1 (3966, 3906) uint16
│ │ ├── 2
│ │ │ └── 2 (1983, 1953) uint16
│ │ ├── 3
│ │ │ └── 3 (992, 977) uint16
│ │ ├── 4
│ │ │ └── 4 (496, 489) uint16
│ │ ├── 5
│ │ │ └── 5 (248, 245) uint16
│ │ └── 6
│ │ └── 6 (124, 123) uint16
│ └── red
│ ├── 0
│ │ └── 0 (7931, 7811) uint16
│ ├── 1
│ │ └── 1 (3966, 3906) uint16
│ ├── 2
│ │ └── 2 (1983, 1953) uint16
│ ├── 3
│ │ └── 3 (992, 977) uint16
│ ├── 4
│ │ └── 4 (496, 489) uint16
│ ├── 5
│ │ └── 5 (248, 245) uint16
│ └── 6
│ └── 6 (124, 123) uint16
├── LC08_L2SP_042033_20251012_02_T1
│ ├── blue
│ │ ├── 0
│ │ │ └── 0 (7941, 7811) uint16
│ │ ├── 1
│ │ │ └── 1 (3971, 3906) uint16
│ │ ├── 2
│ │ │ └── 2 (1986, 1953) uint16
│ │ ├── 3
│ │ │ └── 3 (993, 977) uint16
│ │ ├── 4
│ │ │ └── 4 (497, 489) uint16
│ │ ├── 5
│ │ │ └── 5 (249, 245) uint16
│ │ └── 6
│ │ └── 6 (125, 123) uint16
│ ├── green
│ │ ├── 0
│ │ │ └── 0 (7941, 7811) uint16
│ │ ├── 1
│ │ │ └── 1 (3971, 3906) uint16
│ │ ├── 2
│ │ │ └── 2 (1986, 1953) uint16
│ │ ├── 3
│ │ │ └── 3 (993, 977) uint16
│ │ ├── 4
│ │ │ └── 4 (497, 489) uint16
│ │ ├── 5
│ │ │ └── 5 (249, 245) uint16
│ │ └── 6
│ │ └── 6 (125, 123) uint16
│ └── red
│ ├── 0
│ │ └── 0 (7941, 7811) uint16
│ ├── 1
│ │ └── 1 (3971, 3906) uint16
│ ├── 2
│ │ └── 2 (1986, 1953) uint16
│ ├── 3
│ │ └── 3 (993, 977) uint16
│ ├── 4
│ │ └── 4 (497, 489) uint16
│ ├── 5
│ │ └── 5 (249, 245) uint16
│ └── 6
│ └── 6 (125, 123) uint16
├── LC08_L2SP_042034_20251012_02_T1
│ ├── blue
│ │ ├── 0
│ │ │ └── 0 (7941, 7821) uint16
│ │ ├── 1
│ │ │ └── 1 (3971, 3911) uint16
│ │ ├── 2
│ │ │ └── 2 (1986, 1956) uint16
│ │ ├── 3
│ │ │ └── 3 (993, 978) uint16
│ │ ├── 4
│ │ │ └── 4 (497, 489) uint16
│ │ ├── 5
│ │ │ └── 5 (249, 245) uint16
│ │ └── 6
│ │ └── 6 (125, 123) uint16
│ ├── green
│ │ ├── 0
│ │ │ └── 0 (7941, 7821) uint16
│ │ ├── 1
│ │ │ └── 1 (3971, 3911) uint16
│ │ ├── 2
│ │ │ └── 2 (1986, 1956) uint16
│ │ ├── 3
│ │ │ └── 3 (993, 978) uint16
│ │ ├── 4
│ │ │ └── 4 (497, 489) uint16
│ │ ├── 5
│ │ │ └── 5 (249, 245) uint16
│ │ └── 6
│ │ └── 6 (125, 123) uint16
│ └── red
│ ├── 0
│ │ └── 0 (7941, 7821) uint16
│ ├── 1
│ │ └── 1 (3971, 3911) uint16
│ ├── 2
│ │ └── 2 (1986, 1956) uint16
│ ├── 3
│ │ └── 3 (993, 978) uint16
│ ├── 4
│ │ └── 4 (497, 489) uint16
│ ├── 5
│ │ └── 5 (249, 245) uint16
│ └── 6
│ └── 6 (125, 123) uint16
├── LC09_L2SP_043032_20251011_02_T1
│ ├── blue
│ │ ├── 0
│ │ │ └── 0 (8011, 7901) uint16
│ │ ├── 1
│ │ │ └── 1 (4006, 3951) uint16
│ │ ├── 2
│ │ │ └── 2 (2003, 1976) uint16
│ │ ├── 3
│ │ │ └── 3 (1002, 988) uint16
│ │ ├── 4
│ │ │ └── 4 (501, 494) uint16
│ │ ├── 5
│ │ │ └── 5 (251, 247) uint16
│ │ └── 6
│ │ └── 6 (126, 124) uint16
│ ├── green
│ │ ├── 0
│ │ │ └── 0 (8011, 7901) uint16
│ │ ├── 1
│ │ │ └── 1 (4006, 3951) uint16
│ │ ├── 2
│ │ │ └── 2 (2003, 1976) uint16
│ │ ├── 3
│ │ │ └── 3 (1002, 988) uint16
│ │ ├── 4
│ │ │ └── 4 (501, 494) uint16
│ │ ├── 5
│ │ │ └── 5 (251, 247) uint16
│ │ └── 6
│ │ └── 6 (126, 124) uint16
│ └── red
│ ├── 0
│ │ └── 0 (8011, 7901) uint16
│ ├── 1
│ │ └── 1 (4006, 3951) uint16
│ ├── 2
│ │ └── 2 (2003, 1976) uint16
│ ├── 3
│ │ └── 3 (1002, 988) uint16
│ ├── 4
│ │ └── 4 (501, 494) uint16
│ ├── 5
│ │ └── 5 (251, 247) uint16
│ └── 6
│ └── 6 (126, 124) uint16
├── LC09_L2SP_043033_20251011_02_T1
│ ├── blue
│ │ ├── 0
│ │ │ └── 0 (7701, 7581) uint16
│ │ ├── 1
│ │ │ └── 1 (3851, 3791) uint16
│ │ ├── 2
│ │ │ └── 2 (1926, 1896) uint16
│ │ ├── 3
│ │ │ └── 3 (963, 948) uint16
│ │ ├── 4
│ │ │ └── 4 (482, 474) uint16
│ │ ├── 5
│ │ │ └── 5 (241, 237) uint16
│ │ └── 6
│ │ └── 6 (121, 119) uint16
│ ├── green
│ │ ├── 0
│ │ │ └── 0 (7701, 7581) uint16
│ │ ├── 1
│ │ │ └── 1 (3851, 3791) uint16
│ │ ├── 2
│ │ │ └── 2 (1926, 1896) uint16
│ │ ├── 3
│ │ │ └── 3 (963, 948) uint16
│ │ ├── 4
│ │ │ └── 4 (482, 474) uint16
│ │ ├── 5
│ │ │ └── 5 (241, 237) uint16
│ │ └── 6
│ │ └── 6 (121, 119) uint16
│ └── red
│ ├── 0
│ │ └── 0 (7701, 7581) uint16
│ ├── 1
│ │ └── 1 (3851, 3791) uint16
│ ├── 2
│ │ └── 2 (1926, 1896) uint16
│ ├── 3
│ │ └── 3 (963, 948) uint16
│ ├── 4
│ │ └── 4 (482, 474) uint16
│ ├── 5
│ │ └── 5 (241, 237) uint16
│ └── 6
│ └── 6 (121, 119) uint16
├── LC09_L2SP_043034_20251011_02_T1
│ ├── blue
│ │ ├── 0
│ │ │ └── 0 (7721, 7591) uint16
│ │ ├── 1
│ │ │ └── 1 (3861, 3796) uint16
│ │ ├── 2
│ │ │ └── 2 (1931, 1898) uint16
│ │ ├── 3
│ │ │ └── 3 (966, 949) uint16
│ │ ├── 4
│ │ │ └── 4 (483, 475) uint16
│ │ ├── 5
│ │ │ └── 5 (242, 238) uint16
│ │ └── 6
│ │ └── 6 (121, 119) uint16
│ ├── green
│ │ ├── 0
│ │ │ └── 0 (7721, 7591) uint16
│ │ ├── 1
│ │ │ └── 1 (3861, 3796) uint16
│ │ ├── 2
│ │ │ └── 2 (1931, 1898) uint16
│ │ ├── 3
│ │ │ └── 3 (966, 949) uint16
│ │ ├── 4
│ │ │ └── 4 (483, 475) uint16
│ │ ├── 5
│ │ │ └── 5 (242, 238) uint16
│ │ └── 6
│ │ └── 6 (121, 119) uint16
│ └── red
│ ├── 0
│ │ └── 0 (7721, 7591) uint16
│ ├── 1
│ │ └── 1 (3861, 3796) uint16
│ ├── 2
│ │ └── 2 (1931, 1898) uint16
│ ├── 3
│ │ └── 3 (966, 949) uint16
│ ├── 4
│ │ └── 4 (483, 475) uint16
│ ├── 5
│ │ └── 5 (242, 238) uint16
│ └── 6
│ └── 6 (121, 119) uint16
└── meta
├── asset_blue (6,) StringDType()
├── asset_green (6,) StringDType()
├── asset_red (6,) StringDType()
├── bbox (6,) object
├── created (6,) datetime64[ms]
├── datetime (6,) datetime64[ms]
├── description (6,) StringDType()
├── earthsearch:payload_id (6,) StringDType()
├── eo:cloud_cover (6,) float64
├── grid:code (6,) StringDType()
├── gsd (6,) int64
├── id (6,) StringDType()
├── landsat:cloud_cover_land (6,) float64
├── landsat:collection_category (6,) StringDType()
├── landsat:collection_number (6,) StringDType()
├── landsat:correction (6,) StringDType()
├── landsat:product_generated (6,) StringDType()
├── landsat:scene_id (6,) StringDType()
├── landsat:wrs_path (6,) StringDType()
├── landsat:wrs_row (6,) StringDType()
├── landsat:wrs_type (6,) StringDType()
├── platform (6,) StringDType()
├── proj:epsg (6,) int64
├── sci:doi (6,) StringDType()
├── shape_x (6,) int64
├── shape_y (6,) int64
├── transform_0 (6,) float64
├── transform_1 (6,) float64
├── transform_2 (6,) float64
├── transform_3 (6,) float64
├── transform_4 (6,) float64
├── transform_5 (6,) float64
├── updated (6,) datetime64[ms]
├── view:off_nadir (6,) int64
├── view:sun_azimuth (6,) float64
└── view:sun_elevation (6,) float64
7. Lazy merge¶
Define a target grid in UTM 11N (EPSG:32611) covering the Lake Tahoe area. Sources in UTM 10N will be reprojected on the fly during merge.
We use sortby="id"` to control compositing order.
bands = ["red", "green", "blue"]
result_arr, result_spatial, result_proj, time_coords = merge(
store=ic_session.store,
crs="EPSG:32611",
bbox=(200000.0, 4260000.0, 340000.0, 4490000.0),
resolution=30.0,
chunk_size=(1024, 1024),
bands=bands,
datafusion=True,
sortby="datetime",
nodata=0,
)
7b. Explain plan (dry-run)¶
Before running compute, use explain() to see exactly what source regions
each target chunk would read — without fetching any pixel data. This helps
diagnose performance and verify overview selection.
from lazymerge.explain import explain
plan = explain(
store=ic_session.store,
crs="EPSG:32611",
bbox=(200000.0, 4260000.0, 340000.0, 4490000.0),
resolution=30.0,
chunk_size=(1024, 1024),
bands=bands,
datafusion=True,
sortby="datetime",
nodata=0,
)
plan.to_dataframe()
| chunk_row | chunk_col | chunk_height | chunk_width | n_source_reads | source_path | source_crs | overview_path | native_resolution | overview_resolution | region_row_start | region_row_end | region_col_start | region_col_end | region_height | region_width | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1024 | 1024 | 3 | LC09_L2SP_043032_20251011_02_T1/red/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 3263 | 4288 | 1637 | 2662 | 1025 | 1025 |
| 1 | 0 | 0 | 1024 | 1024 | 3 | LC09_L2SP_043032_20251011_02_T1/green/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 3263 | 4288 | 1637 | 2662 | 1025 | 1025 |
| 2 | 0 | 0 | 1024 | 1024 | 3 | LC09_L2SP_043032_20251011_02_T1/blue/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 3263 | 4288 | 1637 | 2662 | 1025 | 1025 |
| 3 | 0 | 1 | 1024 | 1024 | 3 | LC09_L2SP_043032_20251011_02_T1/red/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 3263 | 4288 | 2661 | 3686 | 1025 | 1025 |
| 4 | 0 | 1 | 1024 | 1024 | 3 | LC09_L2SP_043032_20251011_02_T1/green/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 3263 | 4288 | 2661 | 3686 | 1025 | 1025 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 304 | 7 | 4 | 499 | 571 | 9 | LC08_L2SP_042033_20251012_02_T1/green/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 5031 | 5531 | 2633 | 3205 | 500 | 572 |
| 305 | 7 | 4 | 499 | 571 | 9 | LC08_L2SP_042033_20251012_02_T1/blue/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 5031 | 5531 | 2633 | 3205 | 500 | 572 |
| 306 | 7 | 4 | 499 | 571 | 9 | LC08_L2SP_042034_20251012_02_T1/red/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 0 | 271 | 4013 | 4585 | 271 | 572 |
| 307 | 7 | 4 | 499 | 571 | 9 | LC08_L2SP_042034_20251012_02_T1/green/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 0 | 271 | 4013 | 4585 | 271 | 572 |
| 308 | 7 | 4 | 499 | 571 | 9 | LC08_L2SP_042034_20251012_02_T1/blue/0 | EPSG:32611 | 0 | 30.0 | 30.0 | 0 | 271 | 4013 | 4585 | 271 | 572 |
309 rows × 16 columns
8. Open the lazy Cubed array with xarray¶
da = xr.DataArray(result_arr, dims=["bands", "y", "x"])
affine = Affine(*result_spatial.transform)
raster_idx = RasterIndex.from_transform(
affine=affine,
width=da.sizes["x"],
height=da.sizes["y"],
x_dim="x",
y_dim="y",
crs=result_proj.code,
)
coords = xr.Coordinates.from_xindex(raster_idx)
da = da.assign_coords(coords)
da = da.proj.assign_crs(spatial_ref=result_proj.code, allow_override=True)
da = da.assign_coords({"bands": bands})
da
<xarray.DataArray 'array-004' (bands: 3, y: 7667, x: 4667)> Size: 429MB
cubed.Array<array-004, shape=(3, 7667, 4667), dtype=float32, chunks=((1, 1, 1), (1024, 1024, 1024, 1024, 1024, 1024, 1024, 499), (1024, 1024, 1024, 1024, 571))>
Coordinates:
* bands (bands) <U5 60B 'red' 'green' 'blue'
* y (y) float64 61kB 4.49e+06 4.49e+06 ... 4.26e+06 4.26e+06
* x (x) float64 37kB 2e+05 2e+05 2.001e+05 ... 3.4e+05 3.4e+05
* spatial_ref int64 8B 0
Indexes:
┌ x RasterIndex (crs=EPSG:32611)
└ y
spatial_ref CRSIndex (crs=EPSG:32611)9. Compute and visualize¶
Compute to materialize the array and then plot it.
da = da.compute()
da
<xarray.DataArray 'array-004' (bands: 3, y: 7667, x: 4667)> Size: 429MB
array([[[29472., 28274., 26576., ..., 12613., 12388., 12296.],
[24593., 23967., 20958., ..., 12663., 13021., 12416.],
[16734., 17361., 15448., ..., 12481., 13763., 13227.],
...,
[ 7672., 8198., 8366., ..., 11542., 11750., 11939.],
[ 7704., 8113., 8495., ..., 11527., 11862., 12243.],
[ 7887., 8010., 8582., ..., 11931., 11721., 12248.]],
[[28647., 27264., 25150., ..., 11548., 11376., 11334.],
[22637., 22411., 19189., ..., 11544., 11798., 11368.],
[14974., 15456., 13774., ..., 11404., 12367., 11998.],
...,
[ 7831., 8421., 8566., ..., 10844., 11005., 11142.],
[ 7855., 8397., 8793., ..., 10819., 11078., 11359.],
[ 7885., 8181., 8748., ..., 11144., 10981., 11372.]],
[[29795., 28273., 26883., ..., 10193., 10105., 10051.],
[26258., 23727., 20988., ..., 10325., 10346., 10143.],
[17353., 17270., 14127., ..., 10134., 10825., 10439.],
...,
[ 7531., 7805., 7914., ..., 9563., 9661., 9907.],
[ 7359., 7617., 7910., ..., 9689., 9828., 10043.],
[ 7508., 7582., 7984., ..., 9860., 9812., 10051.]]],
shape=(3, 7667, 4667), dtype=float32)
Coordinates:
* bands (bands) <U5 60B 'red' 'green' 'blue'
* y (y) float64 61kB 4.49e+06 4.49e+06 ... 4.26e+06 4.26e+06
* x (x) float64 37kB 2e+05 2e+05 2.001e+05 ... 3.4e+05 3.4e+05
* spatial_ref int64 8B 0
Indexes:
┌ x RasterIndex (crs=EPSG:32611)
└ y
spatial_ref CRSIndex (crs=EPSG:32611)da.plot.imshow(rgb="bands", robust=True, figsize=(12, 8))
plt.title("Landsat rgb — merged from UTM 10N + 11N sources")
plt.tight_layout();
10. Use additional SQL filters¶
We can also add additional SQL filters to narrow the criteria for source granules
result_arr, result_spatial, result_proj, time_coord = merge(
store=ic_session.store,
crs="EPSG:32611",
bbox=(200000.0, 4260000.0, 340000.0, 4490000.0),
resolution=30.0,
chunk_size=(1024, 1024),
bands=["red", "green", "blue"],
datafusion=True,
sortby="datetime",
nodata=0,
sql_filter='"eo:cloud_cover" < 30',
)
da = xr.DataArray(result_arr, dims=["bands", "y", "x"])
affine = Affine(*result_spatial.transform)
raster_idx = RasterIndex.from_transform(
affine=affine,
width=da.sizes["x"],
height=da.sizes["y"],
x_dim="x",
y_dim="y",
crs=result_proj.code,
)
coords = xr.Coordinates.from_xindex(raster_idx)
da = da.assign_coords(coords)
da = da.proj.assign_crs(spatial_ref=result_proj.code, allow_override=True)
da.plot.imshow(rgb="bands", robust=True, figsize=(12, 8))
plt.title("Landsat rgb — cloud_cover < 30")
plt.tight_layout();
11. Temporal grouping¶
Merge also supports a temporal_grouping parameter that allows aggregation into temporal bins and returns a coordinate array that can be used when constructing an Xarray DataArray.
bands = ["red", "green", "blue"]
result_arr, result_spatial, result_proj, time_coords = merge(
store=ic_session.store,
crs="EPSG:32611",
bbox=(200000.0, 4260000.0, 340000.0, 4490000.0),
resolution=30.0,
chunk_size=(1024, 1024),
bands=bands,
datafusion=True,
sortby="datetime",
nodata=0,
temporal_grouping="P1D",
)
da = xr.DataArray(result_arr, dims=["time", "bands", "y", "x"])
affine = Affine(*result_spatial.transform)
raster_idx = RasterIndex.from_transform(
affine=affine,
width=da.sizes["x"],
height=da.sizes["y"],
x_dim="x",
y_dim="y",
crs=result_proj.code,
)
coords = xr.Coordinates.from_xindex(raster_idx)
da = da.assign_coords(coords)
da = da.proj.assign_crs(spatial_ref=result_proj.code, allow_override=True)
da = da.assign_coords({"time": time_coords, "bands": bands})
da
<xarray.DataArray 'array-012' (time: 2, bands: 3, y: 7667, x: 4667)> Size: 859MB
cubed.Array<array-012, shape=(2, 3, 7667, 4667), dtype=float32, chunks=((1, 1), (1, 1, 1), (1024, 1024, 1024, 1024, 1024, 1024, 1024, 499), (1024, 1024, 1024, 1024, 571))>
Coordinates:
* time (time) datetime64[s] 16B 2025-10-11 2025-10-12
* bands (bands) <U5 60B 'red' 'green' 'blue'
* y (y) float64 61kB 4.49e+06 4.49e+06 ... 4.26e+06 4.26e+06
* x (x) float64 37kB 2e+05 2e+05 2.001e+05 ... 3.4e+05 3.4e+05
* spatial_ref int64 8B 0
Indexes:
┌ x RasterIndex (crs=EPSG:32611)
└ y
spatial_ref CRSIndex (crs=EPSG:32611)da = da.sel(time="2025-10-12")
da
<xarray.DataArray 'array-012' (bands: 3, y: 7667, x: 4667)> Size: 429MB
cubed.Array<array-014, shape=(3, 7667, 4667), dtype=float32, chunks=((1, 1, 1), (1024, 1024, 1024, 1024, 1024, 1024, 1024, 499), (1024, 1024, 1024, 1024, 571))>
Coordinates:
* bands (bands) <U5 60B 'red' 'green' 'blue'
* y (y) float64 61kB 4.49e+06 4.49e+06 ... 4.26e+06 4.26e+06
* x (x) float64 37kB 2e+05 2e+05 2.001e+05 ... 3.4e+05 3.4e+05
* spatial_ref int64 8B 0
time datetime64[s] 8B 2025-10-12
Indexes:
┌ x RasterIndex (crs=EPSG:32611)
└ y
spatial_ref CRSIndex (crs=EPSG:32611)da.plot.imshow(rgb="bands", robust=True, figsize=(12, 8))
plt.title("Landsat rgb 2025-10-12")
plt.tight_layout();