The proj: Convention
The proj: convention encodes Coordinate Reference System (CRS) information for geospatial data stored in Zarr format. It answers the question "what coordinate system is this data in?" using one of three standard encodings.
This notebook covers:
- The three CRS encoding methods: EPSG code, WKT2, and PROJJSON
- Convention registration via zarr_conventions
- Validation
- Converting between CRS formats with pyproj
See also:
- Inheritance — how CRS metadata propagates from groups to arrays
- Composition — combining proj: with spatial: and multiscales
- COG to Zarr — end-to-end conversion of a Cloud-Optimized GeoTIFF
Example Dataset
Throughout this notebook we use a Sentinel-2 L2A scene from the sentinel-cogs bucket on AWS as our running example (following the async-geotiff demo).
The scene is tile 12/S/UF acquired on 2022-06-09. Its key geospatial properties are:
| Property | Value |
|---|---|
| CRS | EPSG:32612 (WGS 84 / UTM zone 12N) |
| Pixel size | 10 m (TCI band) |
| Origin | (300000.0, 4100040.0) |
| Dimensions | 10980 rows x 10980 columns |
| Bounding box | 300000.0, 3990240.0, 409800.0, 4100040.0 |
Sentinel-2 is a good example because it has bands at three native resolutions (10 m, 20 m, 60 m) that all share the same CRS — a natural fit for group-level inheritance and multiscale composition.
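As a quick consistency check on the table above, the bounding box can be derived from the origin, pixel size, and dimensions. This is a minimal sketch that assumes a north-up grid whose origin is the top-left corner; the variable names are illustrative only.

```python
# Values taken from the scene properties table
origin_x, origin_y = 300000.0, 4100040.0  # assumed to be the top-left corner
pixel_size = 10.0                         # metres (TCI band)
n_rows, n_cols = 10980, 10980

# For a north-up grid, x grows with the column index and y shrinks with the row index
x_min = origin_x
y_max = origin_y
x_max = origin_x + n_cols * pixel_size
y_min = origin_y - n_rows * pixel_size

bbox = (x_min, y_min, x_max, y_max)
print(bbox)  # (300000.0, 3990240.0, 409800.0, 4100040.0)
```

The result matches the bounding box row in the table, which confirms that the listed origin is the upper-left corner of the tile.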
import json
from pyproj import CRS
# The CRS for our Sentinel-2 scene
crs = CRS.from_epsg(32612)
print(crs)
EPSG:32612
Overview
The proj: convention defines three properties, all using the proj: namespace prefix:
| Property | Type | Description |
|---|---|---|
| proj:code | string | Authority:code identifier (e.g., EPSG:4326) |
| proj:wkt2 | string | WKT2 (ISO 19162) CRS representation |
| proj:projjson | object | PROJJSON CRS representation |
Exactly one of these must be provided. The convention can be applied to both Zarr groups and arrays.
Method 1: EPSG Code
The simplest way to specify a CRS is with an authority:code identifier. The proj:code string follows the pattern AUTHORITY:CODE and must match ^[A-Z]+:[0-9]+$.
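The pattern quoted above can be checked with the standard library before a value is written to metadata. This is a small sketch; the helper name looks_like_proj_code is hypothetical and not part of geozarr_toolkit.

```python
import re

# Pattern quoted in the text above
PROJ_CODE_PATTERN = re.compile(r"^[A-Z]+:[0-9]+$")


def looks_like_proj_code(value: str) -> bool:
    """Return True if value matches the AUTHORITY:CODE pattern."""
    return PROJ_CODE_PATTERN.fullmatch(value) is not None


for candidate in ["EPSG:32612", "epsg:32612", "EPSG:", "32612"]:
    print(candidate, looks_like_proj_code(candidate))
```

Only the first candidate passes: the pattern requires an upper-case authority, a colon, and a numeric code.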
Known projection authorities include:
| Authority | Description |
|---|---|
| EPSG | European Petroleum Survey Group |
| IAU | International Astronomical Union (e.g., IAU_2015:30100) |
| OGC | Open Geospatial Consortium |
| ESRI | Esri spatial references |
This is the preferred method when a well-known code exists for the CRS, because it's compact and unambiguous.
from geozarr_toolkit import create_proj_attrs
# Our Sentinel-2 scene uses UTM zone 12N
attrs = create_proj_attrs(code="EPSG:32612")
print(json.dumps(attrs, indent=2))
{
"proj:code": "EPSG:32612"
}
Method 2: WKT2
WKT2 (ISO 19162:2019) provides a full textual CRS representation. It is useful when:
- No valid authority code exists for the CRS
- You need the full CRS definition to be self-contained in the metadata
- The CRS uses custom parameters not captured by a registered code
Here we use pyproj to obtain the WKT2 string for the same Sentinel-2 CRS.
# The same UTM zone 12N CRS, expressed as WKT2
wkt2_string = crs.to_wkt()
attrs = create_proj_attrs(wkt2=wkt2_string)
print(json.dumps(attrs, indent=2))
{
"proj:wkt2": "PROJCRS[\"WGS 84 / UTM zone 12N\",BASEGEOGCRS[\"WGS 84\",ENSEMBLE[\"World Geodetic System 1984 ensemble\",MEMBER[\"World Geodetic System 1984 (Transit)\"],MEMBER[\"World Geodetic System 1984 (G730)\"],MEMBER[\"World Geodetic System 1984 (G873)\"],MEMBER[\"World Geodetic System 1984 (G1150)\"],MEMBER[\"World Geodetic System 1984 (G1674)\"],MEMBER[\"World Geodetic System 1984 (G1762)\"],MEMBER[\"World Geodetic System 1984 (G2139)\"],MEMBER[\"World Geodetic System 1984 (G2296)\"],ELLIPSOID[\"WGS 84\",6378137,298.257223563,LENGTHUNIT[\"metre\",1]],ENSEMBLEACCURACY[2.0]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433]],ID[\"EPSG\",4326]],CONVERSION[\"UTM zone 12N\",METHOD[\"Transverse Mercator\",ID[\"EPSG\",9807]],PARAMETER[\"Latitude of natural origin\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8801]],PARAMETER[\"Longitude of natural origin\",-111,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8802]],PARAMETER[\"Scale factor at natural origin\",0.9996,SCALEUNIT[\"unity\",1],ID[\"EPSG\",8805]],PARAMETER[\"False easting\",500000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8806]],PARAMETER[\"False northing\",0,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8807]]],CS[Cartesian,2],AXIS[\"(E)\",east,ORDER[1],LENGTHUNIT[\"metre\",1]],AXIS[\"(N)\",north,ORDER[2],LENGTHUNIT[\"metre\",1]],USAGE[SCOPE[\"Navigation and medium accuracy spatial referencing.\"],AREA[\"Between 114\u00b0W and 108\u00b0W, northern hemisphere between equator and 84\u00b0N, onshore and offshore. Canada - Alberta; Northwest Territories (NWT); Nunavut; Saskatchewan. Mexico. United States (USA).\"],BBOX[0,-114,84,-108]],ID[\"EPSG\",32612]]"
}
Method 3: PROJJSON
PROJJSON is a JSON encoding of CRS definitions following the PROJ specification. Since it's a native JSON object, it integrates naturally with Zarr's JSON-based metadata and can be validated against the PROJJSON schema.
# The same UTM zone 12N CRS, expressed as PROJJSON
projjson_obj = crs.to_json_dict()
attrs = create_proj_attrs(projjson=projjson_obj)
print(json.dumps(attrs, indent=2))
{
"proj:projjson": {
"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json",
"type": "ProjectedCRS",
"name": "WGS 84 / UTM zone 12N",
"base_crs": {
"name": "WGS 84",
"datum_ensemble": {
"name": "World Geodetic System 1984 ensemble",
"members": [
{
"name": "World Geodetic System 1984 (Transit)",
"id": {
"authority": "EPSG",
"code": 1166
}
},
{
"name": "World Geodetic System 1984 (G730)",
"id": {
"authority": "EPSG",
"code": 1152
}
},
{
"name": "World Geodetic System 1984 (G873)",
"id": {
"authority": "EPSG",
"code": 1153
}
},
{
"name": "World Geodetic System 1984 (G1150)",
"id": {
"authority": "EPSG",
"code": 1154
}
},
{
"name": "World Geodetic System 1984 (G1674)",
"id": {
"authority": "EPSG",
"code": 1155
}
},
{
"name": "World Geodetic System 1984 (G1762)",
"id": {
"authority": "EPSG",
"code": 1156
}
},
{
"name": "World Geodetic System 1984 (G2139)",
"id": {
"authority": "EPSG",
"code": 1309
}
},
{
"name": "World Geodetic System 1984 (G2296)",
"id": {
"authority": "EPSG",
"code": 1383
}
}
],
"ellipsoid": {
"name": "WGS 84",
"semi_major_axis": 6378137,
"inverse_flattening": 298.257223563
},
"accuracy": "2.0",
"id": {
"authority": "EPSG",
"code": 6326
}
},
"coordinate_system": {
"subtype": "ellipsoidal",
"axis": [
{
"name": "Geodetic latitude",
"abbreviation": "Lat",
"direction": "north",
"unit": "degree"
},
{
"name": "Geodetic longitude",
"abbreviation": "Lon",
"direction": "east",
"unit": "degree"
}
]
},
"id": {
"authority": "EPSG",
"code": 4326
}
},
"conversion": {
"name": "UTM zone 12N",
"method": {
"name": "Transverse Mercator",
"id": {
"authority": "EPSG",
"code": 9807
}
},
"parameters": [
{
"name": "Latitude of natural origin",
"value": 0,
"unit": "degree",
"id": {
"authority": "EPSG",
"code": 8801
}
},
{
"name": "Longitude of natural origin",
"value": -111,
"unit": "degree",
"id": {
"authority": "EPSG",
"code": 8802
}
},
{
"name": "Scale factor at natural origin",
"value": 0.9996,
"unit": "unity",
"id": {
"authority": "EPSG",
"code": 8805
}
},
{
"name": "False easting",
"value": 500000,
"unit": "metre",
"id": {
"authority": "EPSG",
"code": 8806
}
},
{
"name": "False northing",
"value": 0,
"unit": "metre",
"id": {
"authority": "EPSG",
"code": 8807
}
}
]
},
"coordinate_system": {
"subtype": "Cartesian",
"axis": [
{
"name": "Easting",
"abbreviation": "E",
"direction": "east",
"unit": "metre"
},
{
"name": "Northing",
"abbreviation": "N",
"direction": "north",
"unit": "metre"
}
]
},
"scope": "Navigation and medium accuracy spatial referencing.",
"area": "Between 114\u00b0W and 108\u00b0W, northern hemisphere between equator and 84\u00b0N, onshore and offshore. Canada - Alberta; Northwest Territories (NWT); Nunavut; Saskatchewan. Mexico. United States (USA).",
"bbox": {
"south_latitude": 0,
"west_longitude": -114,
"north_latitude": 84,
"east_longitude": -108
},
"id": {
"authority": "EPSG",
"code": 32612
}
}
}
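Because the PROJJSON output above carries a top-level id member with an authority and a code, an AUTHORITY:CODE string can be recovered from it with plain dictionary access. The sketch below works on a trimmed excerpt of that output; the helper name code_from_projjson is hypothetical, and a custom CRS without an id member would yield None.

```python
from typing import Optional

# Excerpt of the PROJJSON output above, trimmed to the members used here
projjson_excerpt = {
    "type": "ProjectedCRS",
    "name": "WGS 84 / UTM zone 12N",
    "id": {"authority": "EPSG", "code": 32612},
}


def code_from_projjson(pj: dict) -> Optional[str]:
    """Build an AUTHORITY:CODE string from the top-level id member, if present."""
    identifier = pj.get("id")
    if not identifier:
        return None
    return f"{identifier['authority']}:{identifier['code']}"


print(code_from_projjson(projjson_excerpt))  # EPSG:32612
print(code_from_projjson({"type": "ProjectedCRS", "name": "custom"}))  # None
```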
All three methods describe the same CRS — the choice depends on your use case:
| Method | When to use |
|---|---|
| proj:code | A well-known authority code exists (most common) |
| proj:wkt2 | Self-contained text representation needed, or no authority code exists |
| proj:projjson | JSON-native representation preferred, or detailed CRS structure needed |
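One way to apply this choice in code is to dispatch on the shape of the value you already hold. The helper below is a hypothetical sketch, not a geozarr_toolkit function: it assumes that a dict is PROJJSON, that a string matching the AUTHORITY:CODE pattern is a code, and that any other string is WKT2, without validating the WKT2 itself.

```python
import re

CODE_PATTERN = re.compile(r"^[A-Z]+:[0-9]+$")


def proj_attrs_for(crs_value):
    """Pick the proj: property that fits the given CRS value (illustrative only)."""
    if isinstance(crs_value, dict):
        return {"proj:projjson": crs_value}
    if isinstance(crs_value, str):
        if CODE_PATTERN.fullmatch(crs_value):
            return {"proj:code": crs_value}
        # Assumption: any other string is treated as WKT2 without further checks
        return {"proj:wkt2": crs_value}
    raise TypeError(f"Unsupported CRS value type: {type(crs_value).__name__}")


print(proj_attrs_for("EPSG:32612"))
print(list(proj_attrs_for({"type": "ProjectedCRS"})))
```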
Convention Registration
Every Zarr convention must be registered in the zarr_conventions array in the node's attributes. This array identifies which conventions are in use and provides links to their schemas and specifications.
A convention entry must include at least one of uuid, schema_url, or spec_url to be identifiable.
from geozarr_toolkit import ProjConventionMetadata, create_zarr_conventions
conventions = create_zarr_conventions(ProjConventionMetadata())
print(json.dumps(conventions, indent=2))
[
{
"uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f",
"schema_url": "https://raw.githubusercontent.com/zarr-experimental/geo-proj/refs/tags/v1/schema.json",
"spec_url": "https://github.com/zarr-experimental/geo-proj/blob/v1/README.md",
"name": "proj:",
"description": "Coordinate reference system information for geospatial data"
}
]
The convention entry contains:
- uuid (f17cb550-...): Permanent identifier for the proj: convention
- schema_url: Link to the JSON Schema used for machine validation
- spec_url: Link to the human-readable specification
- name: The namespace prefix (proj:)
- description: Brief summary of the convention's purpose
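The identifiability rule stated earlier (at least one of uuid, schema_url, or spec_url) is easy to check by hand. The following is a minimal sketch with a hypothetical helper name, is_identifiable; it treats empty strings as missing, which is an assumption rather than something the text specifies.

```python
IDENTIFYING_KEYS = ("uuid", "schema_url", "spec_url")


def is_identifiable(entry: dict) -> bool:
    """True if the entry carries at least one non-empty identifying key."""
    return any(entry.get(key) for key in IDENTIFYING_KEYS)


# Entry with only a uuid and a name: still identifiable
uuid_only = {"uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f", "name": "proj:"}

# Entry with a name and description but nothing to identify it by
name_only = {"name": "proj:", "description": "Coordinate reference system information"}

print(is_identifiable(uuid_only))  # True
print(is_identifiable(name_only))  # False
```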
Putting It Together
Here's what the complete Zarr V3 metadata looks like for a Sentinel-2 group using the proj: convention. This is the structure that would appear in the group's zarr.json file.
# Complete zarr.json metadata for the Sentinel-2 TCI group
full_attrs = create_proj_attrs(code="EPSG:32612")
full_attrs["zarr_conventions"] = create_zarr_conventions(ProjConventionMetadata())
zarr_metadata = {
    "zarr_format": 3,
    "node_type": "group",
    "attributes": full_attrs,
}
print(json.dumps(zarr_metadata, indent=2))
{
"zarr_format": 3,
"node_type": "group",
"attributes": {
"proj:code": "EPSG:32612",
"zarr_conventions": [
{
"uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f",
"schema_url": "https://raw.githubusercontent.com/zarr-experimental/geo-proj/refs/tags/v1/schema.json",
"spec_url": "https://github.com/zarr-experimental/geo-proj/blob/v1/README.md",
"name": "proj:",
"description": "Coordinate reference system information for geospatial data"
}
]
}
}
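On the reading side, a consumer can parse this document with the standard library, confirm that the proj: convention is registered, and then pick up the CRS. The sketch below uses a trimmed copy of the metadata above and a hypothetical helper, read_proj_code; matching on uuid alone is a simplification, since an entry may also be identified by its schema_url or spec_url.

```python
import json
from typing import Optional

PROJ_UUID = "f17cb550-5864-4468-aeb7-f3180cfb622f"  # uuid shown in the output above

# Trimmed copy of the group metadata shown above
zarr_json_text = """
{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "proj:code": "EPSG:32612",
    "zarr_conventions": [
      {"uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f", "name": "proj:"}
    ]
  }
}
"""


def read_proj_code(metadata: dict) -> Optional[str]:
    """Return proj:code if the proj: convention is registered, else None."""
    attrs = metadata.get("attributes", {})
    conventions = attrs.get("zarr_conventions", [])
    registered = any(entry.get("uuid") == PROJ_UUID for entry in conventions)
    if not registered:
        return None
    return attrs.get("proj:code")


metadata = json.loads(zarr_json_text)
print(read_proj_code(metadata))  # EPSG:32612
```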
Validation
The validate_proj helper checks that attributes conform to the convention and returns an (is_valid, errors) tuple. The key rule is that a CRS encoding must be present: as the second example shows, validation fails when none of proj:code, proj:wkt2, or proj:projjson is provided.
from geozarr_toolkit import validate_proj
# Valid: our Sentinel-2 scene's CRS
is_valid, errors = validate_proj({"proj:code": "EPSG:32612"})
print(f"Valid: {is_valid}, Errors: {errors}")
Valid: True, Errors: []
# Invalid: no CRS encoding provided
is_valid, errors = validate_proj({})
print(f"Valid: {is_valid}")
for error in errors:
    print(f" {error}")
Valid: False
{'type': 'value_error', 'loc': (), 'msg': 'Value error, At least one of proj:code, proj:wkt2, or proj:projjson must be provided', 'input': {}, 'ctx': {'error': ValueError('At least one of proj:code, proj:wkt2, or proj:projjson must be provided')}, 'url': 'https://errors.pydantic.dev/2.12/v/value_error'}
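The presence check exercised above can be approximated without the toolkit. This is a hand-rolled sketch, not the validate_proj implementation (which, judging from the error output, is built on pydantic); the function name check_proj_attrs is hypothetical, and only the "no encoding provided" case is covered.

```python
from typing import List, Tuple

PROJ_KEYS = ("proj:code", "proj:wkt2", "proj:projjson")


def check_proj_attrs(attrs: dict) -> Tuple[bool, List[str]]:
    """Return (is_valid, errors) based only on whether a CRS encoding is present."""
    errors: List[str] = []
    present = [key for key in PROJ_KEYS if key in attrs]
    if not present:
        errors.append(
            "At least one of proj:code, proj:wkt2, or proj:projjson must be provided"
        )
    return (not errors, errors)


print(check_proj_attrs({"proj:code": "EPSG:32612"}))  # (True, [])
print(check_proj_attrs({}))
```

A real validator would also check the value of each property, for example the AUTHORITY:CODE pattern for proj:code.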
# All three representations of the Sentinel-2 scene's CRS
print("proj:code")
print(f" EPSG:{crs.to_epsg()}")
print()
print("proj:wkt2 (truncated)")
print(f" {crs.to_wkt()[:80]}...")
print()
print("proj:projjson (summary)")
pj = crs.to_json_dict()
print(f" type: {pj['type']}")
print(f" name: {pj['name']}")
print(f" keys: {list(pj.keys())}")
proj:code
 EPSG:32612

proj:wkt2 (truncated)
 PROJCRS["WGS 84 / UTM zone 12N",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic Sy...

proj:projjson (summary)
 type: ProjectedCRS
 name: WGS 84 / UTM zone 12N
 keys: ['$schema', 'type', 'name', 'base_crs', 'conversion', 'coordinate_system', 'scope', 'area', 'bbox', 'id']
Summary
The proj: convention provides three methods for encoding CRS information in Zarr:
| Method | When to use |
|---|---|
| proj:code | A well-known authority code exists (most common) |
| proj:wkt2 | Self-contained text representation needed |
| proj:projjson | JSON-native representation preferred |
The convention is declared in the node's zarr_conventions array; the proj: entry shown in this notebook carries a UUID, schema URL, and spec URL.
Next: Inheritance | Composition | COG to Zarr
For the full specification, see the proj: convention README.