We Needed Better Cloud Storage for Python so We Built Obstore


Kyle Barron on Aug 1, 2025

Obstore is a fast, lightweight Python library for working with object storage—backed by Rust and built for clarity, speed, and interoperability. It’s already being used across cloud-native geospatial tools and supports common workflows out of the box.

We work with terabytes of open environmental data in the cloud. And we’ve felt firsthand how frustrating it can be to access and work with that data quickly and reliably—especially across cloud providers.

From Cloud-Optimized GeoTIFFs (COGs) to Zarr arrays to GeoParquet datasets, today's cloud-native formats are designed for scalable, performant access. But the tools to interact with them don't always keep up.

That's why we built obstore, a fast, minimal interface for reading and writing data in object storage that supports all major cloud providers—Amazon S3 (and S3-compatible interfaces), Google Cloud Storage, and Azure Storage—directly from Python. Obstore is open source, Python dependency-free, and backed by Rust for performance.

It's designed to be not only faster but also easier to use and integrate. And it's already being used in projects across the cloud-native geospatial community.

Why This Matters

Many Python libraries want to support multiple cloud providers. But using vendor-specific SDKs requires developers to build their own abstraction layers to manage inconsistent APIs.

Because of this, some Python libraries have been created to manage these abstractions for you, most notably fsspec. But fsspec was designed around filesystems, and object storage is not a filesystem. Treating it like one leads to surprising behavior, poor performance, and integration complexity. For high-throughput, cloud-native use cases, especially with large arrays or tiled data, that overhead adds up.

Obstore takes a different approach. It exposes just the operations that object stores support natively, using a stateless HTTP-style API. No file handles. No caching surprises. Just predictable, fast calls that do exactly what they say.

Obstore brings high-throughput cloud storage access to Python.

Why We Built Obstore

Obstore supports all major cloud providers with a single, consistent interface.

Stateless, HTTP-style model
Minimal, predictable API (~12 methods)
Backed by Rust for performance
No required Python dependencies
Streaming support for uploads, downloads, and lists
Type hints for better developer experience
Easy, extensible authentication
Built for the 99% of use cases that matter

We didn't set out to build yet another storage library. We built Obstore because we kept hitting the same problems—across tools, formats, and projects—and needed something better.

Stateless by Design

Stateful APIs add uncertainty and unpredictability. Take the following example, which uses the stateful s3fs library (the canonical S3 implementation for fsspec). Is the list request cached? How many requests are made, one or two? What happens if the remote data changes? Will the second list automatically reflect new data?

from time import sleep
from s3fs import S3FileSystem

fs = S3FileSystem()
fs.ls("s3://mybucket")
sleep(5)
fs.ls("s3://mybucket")

The API documentation of S3FileSystem.ls doesn't say; you have to read the source code to find out that the default is refresh=False. That means the listing is cached: only one HTTP request is made, and the second call to ls will not reflect new data unless you explicitly pass refresh=True. When the bucket is changing, s3fs's stateful API gives unpredictable results.

Obstore avoids this. Its API is stateless by default, so every API call is independent, and you can always expect the same behavior. In the corresponding Obstore code, two list requests are made, and the second call will always reflect the current state of the bucket.

from time import sleep
from obstore.store import S3Store

store = S3Store("mybucket")
store.list().collect()
sleep(5)
store.list().collect()

Minimal API Surface

Obstore gives you just the operations you actually need: get, put, copy, delete, rename, list, list_with_delimiter (which interprets / in paths as directories). It also supports creating signed URLs and a file-like object API.

All of these operations have async equivalents, such as get_async. The entire API is around a dozen functions.

Fast and Rust-backed

Obstore wraps the Rust object_store library via PyO3, enabling:

9x higher throughput than fsspec and 2.8x higher throughput than aioboto3 for many concurrent, small get requests on S3 from an async Python context.
Full support for streaming uploads and downloads.
Efficient zero-copy data exchange between Rust and Python via Python's buffer protocol.

We plan to share more performance benchmarks in an upcoming post.

No Dependency Bloat

Because it's compiled with Rust, Obstore is fully self-contained. Access S3, GCS, and Azure storage without any additional dependencies: no boto3, no google-cloud-storage, no Azure SDKs. This means fewer version conflicts, faster cold starts, smaller deployment sizes, and less to understand, debug, and secure.

Authentication and request signing are handled by the underlying object_store Rust library.

Streaming Support

Obstore supports streaming downloads, uploads, and list calls out of the box, with your choice of synchronous or asynchronous behavior.

For downloads, you're presented with a synchronous or asynchronous iterator. You can operate on a byte stream before the entire file has downloaded while still only making one request.

For uploads, file objects and synchronous or asynchronous iterators of byte buffers are supported, allowing you to upload data from any byte source without materializing everything in memory. Obstore supports multipart uploads by default, with configurable concurrency.

As with downloads, Obstore's list calls return a synchronous or asynchronous iterator, which spares the user from handling pagination manually.

Type Hinting

Obstore includes full type hinting for all its methods and classes, making it easy to use in modern Python IDEs.

Flexible Authentication

Authentication is often the hardest part of working with cloud storage. Obstore supports both native and custom credential handling.

By "native" authentication we're referring to the authentication methods that are natively supported by the underlying Rust object_store library. These include the most common credential methods, like basic authentication, container credentials, or instance credentials.

But Obstore also supports "credential providers"—custom authentication callbacks. This means you can provide your own arbitrary Python function to handle authentication, whether it's for a custom cloud provider, a specific token refresh strategy, or any other use case.

These credential providers automatically refresh tokens before they expire, with no manual handling required. This is especially useful for long-running processes.
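A custom credential provider can be sketched as a plain callable that returns S3 credentials plus an expiry time. The class name, its fake token source, and the 15-minute lifetime are all hypothetical. The returned keys (access_key_id, secret_access_key, token, expires_at) and the credential_provider argument to S3Store are assumptions about obstore's interface and are worth checking against the docs.

```python
from datetime import datetime, timedelta, timezone


class RotatingTokenProvider:
    """Hypothetical provider that issues short-lived S3 credentials."""

    def __init__(self, ttl: timedelta = timedelta(minutes=15)) -> None:
        self.ttl = ttl
        self.calls = 0

    def fetch_from_vault(self) -> tuple[str, str, str]:
        # Placeholder for a real token service (hypothetical).
        self.calls += 1
        return (f"EXAMPLE-KEY-{self.calls}", "example-secret", "example-token")

    def __call__(self) -> dict:
        key, secret, token = self.fetch_from_vault()
        return {
            "access_key_id": key,
            "secret_access_key": secret,
            "token": token,
            # The expiry tells obstore when to call the provider again.
            "expires_at": datetime.now(timezone.utc) + self.ttl,
        }


def make_store(bucket: str):
    # Not called here: constructing an S3Store requires a real bucket.
    from obstore.store import S3Store

    return S3Store(bucket, region="us-east-1", credential_provider=RotatingTokenProvider())
```

With this design the refresh policy lives entirely in the callable, so a long-running process only needs to hand the provider to the store once.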

We support some of these out of the box in the obstore.auth module, such as providers that wrap boto3, google.auth, or azure.identity. We also provide built-in support for NASA Earthdata and the Microsoft Planetary Computer.

Alternatively, you can implement your own custom authentication provider to handle any specific authentication needs you have.

Fsspec Compatibility

We know that many existing Python tools and libraries rely on fsspec for cloud storage access. Obstore provides an fsspec-compatible interface so you can use it as a drop-in replacement without rewriting your code.

This should be considered a last resort. Using Obstore via its fsspec integration—especially through fsspec's synchronous API—is unlikely to yield performance improvements.

Obstore in Action

Here's a quick example of using Obstore to explore a public S3 bucket of Sentinel-2 imagery:

from obstore.store import S3Store

url = "s3://sentinel-cogs/sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2A_12SUF_20220601_0_L2A"
store = S3Store.from_url(url, region="us-west-2", skip_signature=True)

print([obj["path"] for obj in store.list_with_delimiter()["objects"]])
# ['AOT.tif', 'B01.tif', 'B02.tif', 'B03.tif', 'B04.tif', 'B05.tif', 'B06.tif', 'B07.tif', 'B08.tif', 'B09.tif', 'B11.tif', 'B12.tif', 'B8A.tif', 'L2A_PVI.tif', 'S2A_12SUF_20220601_0_L2A.json', 'SCL.tif', 'TCI.tif', 'WVP.tif', 'granule_metadata.xml', 'thumbnail.jpg', 'tileinfo_metadata.json']

thumbnail = store.get("thumbnail.jpg").bytes()
with open("thumbnail.jpg", "wb") as f:
    f.write(thumbnail)

Under the Hood

Obstore is part of a growing Rust-Python ecosystem we use at Development Seed. It's built on:

pyo3-object_store: Shared Rust logic for any Python library using the object_store Rust library.
pyo3-arrow: Zero-copy Apache Arrow integration between Rust and Python.
pyo3-bytes: A zero-copy interface to the Python buffer protocol for viewing Python buffers from Rust or exposing Rust-owned buffers to Python.

These pieces are designed to be reused across projects—from async TIFF readers to cloud-native GeoParquet tools.

Get Started

Install Obstore via pip or conda:

pip install obstore

conda install -c conda-forge obstore

Read the docs at developmentseed.org/obstore, or check out the getting started guide.

What's Next

Obstore was built in the open, with help from the Zarr community, the Rust database ecosystem, and contributors like Max Jones and Jeff Albrecht.

If you're using Obstore—or want to—reach out.

Let's build faster, leaner, and more open cloud data tools together. If you're building cloud-native workflows, especially for geospatial or scientific data, we'd love your feedback or your contributions.

Kyle's Obstore talk from the CNG conference earlier this year is also available to watch.
