lazymerge conventions¶
Lazily merge geospatial Zarr arrays across CRS boundaries using Cubed.
This notebook creates synthetic source arrays in two different UTM zones (EPSG:32618 and EPSG:32617), merges them into a single target grid, and visualizes the result with xarray.
The store parameter accepts any zarr-compatible store (zarr Store, Icechunk Session, or a string path).
For DataFusion-based source discovery, see the demo_datafusion notebook.
import numpy as np
import xarray as xr
import zarr
from affine import Affine
from rasterix import RasterIndex
from lazymerge.conventions import (
ProjAttrs,
SpatialAttrs,
read_proj,
read_spatial,
write_proj,
write_spatial,
)
from lazymerge.merge import merge
from lazymerge.sources import scan_store
from lazymerge.target import to_zarr
1. Create synthetic source arrays¶
Four tiles arranged in a 2x2 grid. The left two tiles are in UTM 18N (EPSG:32618), and the right two are in UTM 17N (EPSG:32617) — covering adjacent geographic areas but stored in different coordinate reference systems.
store = zarr.storage.MemoryStore()
root = zarr.open_group(store, mode="w")
sources = [
# UTM 18N tiles (left side)
{
"name": "utm18n_tile_a",
"crs": "EPSG:32618",
"bbox": (500000.0, 5990000.0, 505000.0, 5995000.0),
"resolution": 10.0,
"fill": 100.0,
},
{
"name": "utm18n_tile_c",
"crs": "EPSG:32618",
"bbox": (500000.0, 5995000.0, 505000.0, 6000000.0),
"resolution": 10.0,
"fill": 300.0,
},
# UTM 17N tiles (right side — same geographic area as UTM 18N
# x=[503000, 508000], but expressed in UTM zone 17N coordinates)
{
"name": "utm17n_tile_b",
"crs": "EPSG:32617",
"bbox": (895094.0, 6006920.0, 900511.0, 6012337.0),
"resolution": 10.0,
"fill": 200.0,
},
{
"name": "utm17n_tile_d",
"crs": "EPSG:32617",
"bbox": (894669.0, 6011912.0, 900086.0, 6017329.0),
"resolution": 10.0,
"fill": 400.0,
},
]
for src in sources:
xmin, ymin, xmax, ymax = src["bbox"]
res = src["resolution"]
width = int((xmax - xmin) / res)
height = int((ymax - ymin) / res)
arr = root.create_array(src["name"], shape=(height, width), dtype="f4", chunks=(256, 256))
arr[:] = src["fill"]
write_spatial(
arr,
SpatialAttrs(
dimensions=["y", "x"],
transform=(res, 0.0, xmin, 0.0, -res, ymax),
bbox=(xmin, ymin, xmax, ymax),
shape=(height, width),
),
)
write_proj(arr, ProjAttrs(code=src["crs"]))
2. Build source index¶
scan_store walks the Zarr group, reads spatial: and proj: convention attributes
from each array, and builds a spatial index for fast intersection queries.
index = scan_store(root)
for _entry in index.entries:
pass
3. Lazy merge¶
Define the output grid in UTM 18N spanning all four source tiles and lazily merge.
merge returns a lazy Cubed array — no computation happens until .compute() is called.
Each chunk independently finds intersecting sources, reads their data, and warps it into
the target grid. The UTM 17N sources will be reprojected into this grid during merge.
result_arr, result_spatial, result_proj, time_coords = merge(
store=store,
crs="EPSG:32618",
bbox=(500000.0, 5990000.0, 508000.0, 6000000.0),
resolution=10.0,
chunk_size=(256, 256),
source_index=index,
)
4. Open the lazy Cubed array with xarray¶
da = xr.DataArray(result_arr, dims=["y", "x"])
# Build georeferenced coordinates using rasterix
affine = Affine(*result_spatial.transform)
index = RasterIndex.from_transform(
affine=affine,
width=da.sizes["x"],
height=da.sizes["y"],
x_dim="x",
y_dim="y",
crs=result_proj.code,
)
coords = xr.Coordinates.from_xindex(index)
da = da.assign_coords(coords)
da = da.proj.assign_crs(spatial_ref=result_proj.code, allow_override=True)
da
<xarray.DataArray 'array-006' (y: 1000, x: 800)> Size: 3MB
cubed.Array<array-006, shape=(1000, 800), dtype=float32, chunks=((256, 256, 256, 232), (256, 256, 256, 32))>
Coordinates:
* y (y) float64 8kB 6e+06 6e+06 6e+06 ... 5.99e+06 5.99e+06
* x (x) float64 6kB 5e+05 5e+05 5e+05 ... 5.08e+05 5.08e+05
* spatial_ref int64 8B 0
Indexes:
┌ x RasterIndex (crs=EPSG:32618)
└ y
spatial_ref CRSIndex (crs=EPSG:32618)5. Compute and visualize¶
Compute the result and plot it.
da = da.compute()
da
<xarray.DataArray 'array-006' (y: 1000, x: 800)> Size: 3MB
array([[300., 300., 300., ..., 400., 400., 400.],
[300., 300., 300., ..., 400., 400., 400.],
[300., 300., 300., ..., 400., 400., 400.],
...,
[100., 100., 100., ..., 200., 200., 200.],
[100., 100., 100., ..., 200., 200., nan],
[100., 100., 100., ..., 200., 200., nan]],
shape=(1000, 800), dtype=float32)
Coordinates:
* y (y) float64 8kB 6e+06 6e+06 6e+06 ... 5.99e+06 5.99e+06
* x (x) float64 6kB 5e+05 5e+05 5e+05 ... 5.08e+05 5.08e+05
* spatial_ref int64 8B 0
Indexes:
┌ x RasterIndex (crs=EPSG:32618)
└ y
spatial_ref CRSIndex (crs=EPSG:32618)da.plot(figsize=(12, 6), cmap="viridis", add_colorbar=True);
5. Materialize to Zarr and read back¶
Write the merged result to a new Zarr store with convention metadata, then read it back to verify the round-trip.
output_store = zarr.storage.MemoryStore()
to_zarr(result_arr, result_spatial, result_proj, output_store, path="mosaic")
out_root = zarr.open_group(output_store, mode="r")
out_arr = out_root["mosaic"]
out_spatial = read_spatial(out_arr)
out_proj = read_proj(out_arr)
# Show zarr_conventions metadata written by zarr-cm
conventions = out_arr.attrs.get("zarr_conventions", [])
if conventions:
for _conv in conventions:
pass
6. Per-source visualization¶
Plot each source tile's contribution individually. The four fill values (100, 200, 300, 400) correspond to the four quadrants of the mosaic. NaN regions are where a tile doesn't cover the target grid.
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(14, 10), sharex=True, sharey=True)
tiles = [
(100.0, "utm18n_tile_a (EPSG:32618, fill=100)"),
(200.0, "utm17n_tile_b (EPSG:32617, fill=200)"),
(300.0, "utm18n_tile_c (EPSG:32618, fill=300)"),
(400.0, "utm17n_tile_d (EPSG:32617, fill=400)"),
]
for ax, (fill_val, title) in zip(axes.flat, tiles, strict=False):
mask = xr.where(da == fill_val, da, np.nan)
mask.plot(ax=ax, cmap="viridis", vmin=100, vmax=400, add_colorbar=False)
ax.set_title(title)
ax.set_aspect("equal")
fig.suptitle("Individual source tile contributions", fontsize=14, y=1.01)
fig.tight_layout();