Converting a Cloud-Optimized GeoTIFF to Zarr¶
This notebook walks through a complete end-to-end workflow: opening a remote Cloud-Optimized GeoTIFF, extracting geospatial metadata, writing a multi-resolution Zarr V3 store with all three conventions (proj:, spatial:, multiscales), and validating the result.
We use the Sentinel-2 L2A TCI (true-color) band from the async-geotiff example.
Prerequisites: The proj: Convention | Composition
Step 1: Open the remote COG¶
We use async-geotiff to open the Cloud-Optimized GeoTIFF directly from S3. The GeoTIFF object exposes the geospatial properties we need — crs, transform, bounds, and shape — without reading any pixel data.
import json
from geozarr_toolkit import (
MultiscalesConventionMetadata,
ProjConventionMetadata,
SpatialConventionMetadata,
create_multiscales_layout,
create_proj_attrs,
create_spatial_attrs,
create_zarr_conventions,
)
# Set to True to write to S3, False to use a local store
USE_S3 = False
from async_geotiff import GeoTIFF
from obstore.store import S3Store
store = S3Store("sentinel-cogs", region="us-west-2", skip_signature=True)
path = "sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2B_12SUF_20220609_0_L2A/TCI.tif"
geotiff = await GeoTIFF.open(path, store=store)
print(f"CRS: {geotiff.crs}")
print(f"Transform: {geotiff.transform}")
print(f"Shape: {geotiff.shape}")
print(f"Bounds: {geotiff.bounds}")
print(f"Bands: {geotiff.count}")
print(f"Dtype: {geotiff.dtype}")
CRS: EPSG:32612
Transform: | 10.00, 0.00, 300000.00|
| 0.00,-10.00, 4100040.00|
| 0.00, 0.00, 1.00|
Shape: (10980, 10980)
Bounds: (300000.0, 3990240.0, 409800.0, 4100040.0)
Bands: 3
Dtype: uint8
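As an aside, the affine transform printed above maps pixel indices to projected coordinates. A minimal sketch, with the coefficients hard-coded from the output above (not using the async-geotiff API):

```python
# The affine transform maps pixel (col, row) to projected (x, y):
#   x = c + a * col,  y = f + e * row
# Coefficients taken from the transform printed above: a = 10 m pixel width,
# e = -10 m pixel height (north-up), (c, f) = upper-left corner.
def pixel_to_xy(col, row, a=10.0, c=300000.0, e=-10.0, f=4100040.0):
    return (c + a * col, f + e * row)

print(pixel_to_xy(0, 0))          # upper-left corner: (300000.0, 4100040.0)
print(pixel_to_xy(10980, 10980))  # lower-right corner: (409800.0, 3990240.0)
```

The two corners reproduce the bounds tuple above: (300000.0, 3990240.0, 409800.0, 4100040.0).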
Step 2: Build convention metadata from the COG¶
The GeoTIFF's properties map directly to convention attributes. The COG also contains internal overviews (reduced-resolution copies) which map naturally to the multiscales convention — each overview becomes a scale level.
- geotiff.crs.to_epsg() → proj:code
- geotiff.transform (Affine coefficients) → spatial:transform
- geotiff.shape → spatial:shape
- geotiff.bounds → spatial:bbox
- geotiff.overviews → multiscales layout
# Build proj: and spatial: attributes from the GeoTIFF's properties
t = geotiff.transform
geozarr_attrs = create_proj_attrs(code=f"EPSG:{geotiff.crs.to_epsg()}")
geozarr_attrs.update(
create_spatial_attrs(
dimensions=["Y", "X"],
bbox=list(geotiff.bounds),
)
)
# Build multiscales layout from the COG's overviews
# The base (full-resolution) image is level 0; each overview is a coarser level.
base_res = t.a # pixel width of the base level
levels = [
{"asset": "0", "transform": {"scale": [1.0, 1.0], "translation": [0.0, 0.0]}},
]
for i, overview in enumerate(geotiff.overviews):
ov_res = overview.transform.a
scale_factor = ov_res / base_res
levels.append(
{
"asset": str(i + 1),
"derived_from": "0",
"transform": {
"scale": [scale_factor, scale_factor],
"translation": [0.0, 0.0],
},
}
)
geozarr_attrs.update(create_multiscales_layout(levels))
geozarr_attrs["zarr_conventions"] = create_zarr_conventions(
MultiscalesConventionMetadata(),
ProjConventionMetadata(),
SpatialConventionMetadata(),
)
print(f"Base resolution: {base_res} m")
print(f"Overview levels: {len(geotiff.overviews)}")
for i, overview in enumerate(geotiff.overviews):
print(
f" Overview {i+1}: {overview.width}x{overview.height} px, {overview.transform.a:.1f} m/px"
)
print()
print(json.dumps(geozarr_attrs, indent=2))
Base resolution: 10.0 m
Overview levels: 4
Overview 1: 5490x5490 px, 20.0 m/px
Overview 2: 2745x2745 px, 40.0 m/px
Overview 3: 1373x1373 px, 80.0 m/px
Overview 4: 687x687 px, 159.8 m/px
{
"proj:code": "EPSG:32612",
"spatial:dimensions": [
"Y",
"X"
],
"spatial:bbox": [
300000.0,
3990240.0,
409800.0,
4100040.0
],
"spatial:transform_type": "affine",
"spatial:registration": "pixel",
"multiscales": {
"layout": [
{
"asset": "0",
"transform": {
"scale": [
1.0,
1.0
],
"translation": [
0.0,
0.0
]
}
},
{
"asset": "1",
"derived_from": "0",
"transform": {
"scale": [
2.0,
2.0
],
"translation": [
0.0,
0.0
]
}
},
{
"asset": "2",
"derived_from": "0",
"transform": {
"scale": [
4.0,
4.0
],
"translation": [
0.0,
0.0
]
}
},
{
"asset": "3",
"derived_from": "0",
"transform": {
"scale": [
7.997086671522214,
7.997086671522214
],
"translation": [
0.0,
0.0
]
}
},
{
"asset": "4",
"derived_from": "0",
"transform": {
"scale": [
15.982532751091702,
15.982532751091702
],
"translation": [
0.0,
0.0
]
}
}
]
},
"zarr_conventions": [
{
"uuid": "d35379db-88df-4056-af3a-620245f8e347",
"schema_url": "https://raw.githubusercontent.com/zarr-conventions/multiscales/refs/tags/v1/schema.json",
"spec_url": "https://github.com/zarr-conventions/multiscales/blob/v1/README.md",
"name": "multiscales",
"description": "Multiscale layout of zarr datasets"
},
{
"uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f",
"schema_url": "https://raw.githubusercontent.com/zarr-experimental/geo-proj/refs/tags/v1/schema.json",
"spec_url": "https://github.com/zarr-experimental/geo-proj/blob/v1/README.md",
"name": "proj:",
"description": "Coordinate reference system information for geospatial data"
},
{
"uuid": "689b58e2-cf7b-45e0-9fff-9cfc0883d6b4",
"schema_url": "https://raw.githubusercontent.com/zarr-conventions/spatial/refs/tags/v1/schema.json",
"spec_url": "https://github.com/zarr-conventions/spatial/blob/v1/README.md",
"name": "spatial:",
"description": "Spatial coordinate and transformation information"
}
]
}
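The non-integer scale factors for levels 3 and 4 may look surprising. A quick arithmetic check (widths taken from the overview shapes printed above) shows each scale factor is simply the ratio of base width to overview width:

```python
# Scale factor = base width / overview width. Levels 3 and 4 aren't exactly
# 8x and 16x because the COG rounds overview dimensions up:
# ceil(10980 / 8) = 1373 and ceil(10980 / 16) = 687.
base_width = 10980
for width in (5490, 2745, 1373, 687):
    print(base_width / width)
```

This reproduces the scale values in the multiscales layout above, including 7.997... and 15.982....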
Step 3: Read and write to Zarr V3 with multiscales¶
We read the full-resolution image and each overview, writing them as separate child arrays in a Zarr V3 store. Set USE_S3 above to control the output destination:
- USE_S3 = True: writes to a remote S3 bucket via obstore's S3Store
- USE_S3 = False: writes to a local directory via Zarr's LocalStore
import zarr
from zarr.storage import LocalStore, ObjectStore
bucket = "us-west-2.opendata.source.coop"
prefix = "pangeo/geozarr-examples/TCI.zarr"
local_path = "data/TCI.zarr"
if USE_S3:
output_store = S3Store(bucket, prefix=prefix, region="us-west-2")
zarr_store = ObjectStore(output_store)
else:
zarr_store = LocalStore(local_path)
root: zarr.Group = zarr.open_group(zarr_store, mode="w", zarr_format=3)
# Set convention attributes on the group
root.attrs.update(geozarr_attrs)
# Write the full-resolution image as level "0"
base_array = await geotiff.read()
root.create_array("0", data=base_array.data, chunks=(3, 512, 512))
print(f"Level 0 (base): shape={base_array.data.shape}, dtype={base_array.data.dtype}")
# Write each overview as a separate level
for i, overview in enumerate(geotiff.overviews):
ov_array = await overview.read()
root.create_array(str(i + 1), data=ov_array.data, chunks=(3, 512, 512))
print(f"Level {i+1} (overview): shape={ov_array.data.shape}")
location = f"s3://{bucket}/{prefix}" if USE_S3 else local_path
print(f"\nWrote Zarr V3 store to {location}")
Level 0 (base): shape=(3, 10980, 10980), dtype=uint8
Level 1 (overview): shape=(3, 5490, 5490)
Level 2 (overview): shape=(3, 2745, 2745)
Level 3 (overview): shape=(3, 1373, 1373)
Level 4 (overview): shape=(3, 687, 687)

Wrote Zarr V3 store to data/TCI.zarr
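A consumer can recover each level's ground resolution without touching the original GeoTIFF: multiply the base pixel size by the level's multiscales scale factor. A sketch with the values hard-coded from the metadata built in Step 2 (translation is zero here, so only scale matters):

```python
# Effective resolution per level = base pixel size * multiscales scale factor.
# Values hard-coded from the Step 2 metadata; not read from the store.
base_pixel_size = 10.0  # metres, the base transform's `a` coefficient
scales = [1.0, 2.0, 4.0, 7.997086671522214, 15.982532751091702]
for level, s in enumerate(scales):
    print(f"level {level}: {base_pixel_size * s:.1f} m/px")
```

This reproduces the 10.0, 20.0, 40.0, 80.0, and 159.8 m/px resolutions printed in Step 2.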
Step 4: Validate the Zarr store¶
We reopen the store and use validate_group to confirm the conventions are correctly applied.
from geozarr_toolkit import detect_conventions, validate_group
# Reopen and validate
if USE_S3:
read_store = S3Store(bucket, prefix=prefix, region="us-west-2", skip_signature=True)
zarr_store = ObjectStore(read_store)
else:
zarr_store = LocalStore(local_path)
root = zarr.open_group(zarr_store, mode="r")
detected = detect_conventions(dict(root.attrs))
print(f"Detected conventions: {detected}")
results = validate_group(root)
for conv, errors in results.items():
status = "PASS" if not errors else "FAIL"
print(f" [{status}] {conv}")
for err in errors:
print(f" {err}")
print("\nStore tree:")
root.tree()
Detected conventions: ['spatial', 'proj', 'multiscales']
  [PASS] spatial
  [PASS] proj
  [PASS] multiscales
  [PASS] zarr_conventions

Store tree:
/
├── 0 (3, 10980, 10980) uint8
├── 1 (3, 5490, 5490) uint8
├── 2 (3, 2745, 2745) uint8
├── 3 (3, 1373, 1373) uint8
└── 4 (3, 687, 687) uint8
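This layout is what makes multiscales worthwhile for readers: a quick-look preview only needs the coarsest level. Rough arithmetic from the shapes above (uint8, so one byte per sample):

```python
# Data volume per level for a uint8 array is just the product of its shape.
base_bytes = 3 * 10980 * 10980   # level 0
coarsest_bytes = 3 * 687 * 687   # level 4
print(f"base: {base_bytes / 1e6:.0f} MB, coarsest: {coarsest_bytes / 1e6:.1f} MB")
print(f"a preview reads ~{base_bytes // coarsest_bytes}x less data")
```

Reading level 4 instead of level 0 touches about 1.4 MB instead of roughly 362 MB.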
Summary¶
This notebook demonstrated the full workflow from COG to convention-compliant Zarr V3:
- Open a remote COG with async-geotiff (no pixel data read)
- Extract CRS, transform, bounds, and overview structure
- Map these properties to proj:, spatial:, and multiscales convention attributes
- Write the full image and all overview levels to a Zarr V3 store (locally, or on S3 with USE_S3 = True)
- Validate that the store conforms to all three conventions
The same pattern applies to any georeferenced raster — the convention attributes are derived from standard geospatial properties that every GeoTIFF provides.