Cover image by Gus Becker

Authors: Max Jones and Davis Bennett

Davis Bennett is an independent software developer specializing in tools for processing large imaging datasets and is an active member of the Zarr community. Davis authored the specification and implementation of the new Zarr data types and the codec simplification.

Zarr represents a better way to work with massive datasets: open, scalable, and designed for the cloud. We’ve invested in developing Virtual Zarr, advancing Zarr v3 adoption, and convening community events because we believe this ecosystem is key to more accessible and sustainable data infrastructure.

Zarr is a cloud-native open standard for chunked array data. It’s widely used in Earth science and beyond because it makes massive datasets easier to store, share, and analyze in distributed environments. The format has grown quickly thanks to contributions from a broad open-source community, and it continues to evolve with new tools, specifications, and events.

Image: Zarr store. Credit: Gus Becker

Virtual Zarr

Virtual Zarr makes it possible to work with data stored in archival file formats as cloud-native datacubes. The idea of a virtual access layer isn’t new. OPeNDAP, GDAL, Kerchunk, and others have shown the power of this approach for years. What’s different now is the simplicity, scalability, and accessibility that Virtual Zarr brings to a wider community.
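To make the idea of a virtual access layer concrete, here is a minimal sketch in plain Python. It assumes a hypothetical chunk manifest that maps each chunk key to a file path, a byte offset, and a byte length inside an existing archival file. The function name `read_virtual_chunk`, the key format, and the stand-in file are illustrative assumptions, not the API of VirtualiZarr, Icechunk, or Kerchunk.

```python
import os
import tempfile


def read_virtual_chunk(manifest, key):
    """Return the raw bytes of one chunk by reading a byte range from its source file.

    `manifest` is a hypothetical mapping: chunk key -> (path, offset, length).
    """
    path, offset, length = manifest[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Build a stand-in "archival file": a 4-byte header followed by two 8-byte chunks.
tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "archive.bin")
with open(archive_path, "wb") as f:
    f.write(b"HEAD" + b"A" * 8 + b"B" * 8)

# The manifest only records where each chunk lives; no data is copied.
# The chunk keys here are illustrative, not a specific Zarr key encoding.
manifest = {
    "0.0": (archive_path, 4, 8),
    "0.1": (archive_path, 12, 8),
}

chunk = read_virtual_chunk(manifest, "0.1")
```

In this sketch the original file is never rewritten. Only the small manifest is new, which is why a virtual layer can expose existing archives as chunked arrays without duplicating the data.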

We’ve helped shape two key components: VirtualiZarr, a community-developed library that provides a user-friendly way to build virtual datacubes, and Icechunk, Earthmover's solution for scalable, version-controlled serialization. Both build on the open, cloud-native Zarr format. The impact is real: see the massive cost savings from our collaborative NASA pilot.

Advancing Zarr Python for Version 3 Adoption

Migrating from Zarr v2 to v3 has been a challenge for many. To smooth that transition, we’ve contributed directly to the Zarr Python implementation. We have added new extensible data types, simplified codecs, and improved performance by connecting Zarr with our new obstore library.

These improvements not only make adoption easier but also open up new possibilities for performance and interoperability. Check out Emmanuel's LinkedIn post to see obstore in action, powering snappy, dynamic visualizations on par with dynamic tiling using Cloud-Optimized GeoTIFFs (COGs).

Community Sprints and the Zarr Summit

This week we're leading two major community events: the Zarr-focused STAC Community Sprint and the Zarr Summit. Both are drawing wide interest, and we’re grateful to The Navigation Fund for supporting the Summit.

If you can’t join in person, the conversation doesn’t stop there. We’d love to hear how you’re using Zarr, where it could take your work, and what challenges you’re facing. Reach out if you’d like to connect.


Funding support for this work comes from the Data Systems Evolution team at NASA Marshall Space Flight Center's Office of Data Science and Informatics (ODSI). ODSI enables scientific exploration and discovery through innovative data visualization techniques and analysis capabilities that lower the barrier to entry for cloud-hosted data.

Zarr Summit is made possible through foundational funding support from The Navigation Fund, which offers grants to high-impact organizations and projects that are taking bold action and making significant changes in the key areas of open science, farm animal welfare, criminal justice reform, and climate change.
