Releasing obstore 0.5!

Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust.

This post gives an overview of what's new in obstore version 0.5.

Refer to the changelog for all updates.

Class method wrappers

Until now, obstore provided only a functional API with top-level functions exported from the obstore module. Now, obstore additionally provides these functions as methods on each store class.

Previously:

import obstore as obs
from obstore.store import AzureStore

store = AzureStore()
obs.put(store, ...)

Now:

from obstore.store import AzureStore

store = AzureStore()
store.put(...)

Note that this calls the method store.put on the store instance instead of the module-level function obstore.put.

This can also make the API easier to discover, since you can explore the available methods directly from the store object.

Credential providers

Authentication tends to be among the trickiest but most important elements of connecting to object storage. There are many ways to handle credentials, and supporting every one natively in obstore would impose a high maintenance burden.

Instead, this release supports custom credential providers: Python callbacks that allow for full control over credential generation.

We'll dive into a few salient points, but make sure to read the full authentication documentation in the user guide.

"Official" SDK credential providers

You can use the Boto3CredentialProvider to delegate credential handling to a boto3.Session.

from boto3 import Session
from obstore.auth.boto3 import Boto3CredentialProvider
from obstore.store import S3Store

session = Session(...)
credential_provider = Boto3CredentialProvider(session)
store = S3Store("bucket_name", credential_provider=credential_provider)

Custom credential providers

There's a long tail of possible authentication mechanisms. Obstore allows you to provide your own custom authentication callback.

You can provide either a synchronous or asynchronous custom authentication function.

The simplest custom credential provider can be just a function callback:

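from __future__ import annotations

from datetime import datetime, timedelta, UTC
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Needed only for the return annotation below; the import path is assumed.
    from obstore.store import S3Credential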

def get_credentials() -> S3Credential:
    return {
        "access_key_id": "...",
        "secret_access_key": "...",
        # Not always required
        "token": "...",
        "expires_at": datetime.now(UTC) + timedelta(minutes=30),
    }

Then just pass that function into credential_provider:

S3Store(..., credential_provider=get_credentials)
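The asynchronous form mentioned above can be sketched the same way. This is only a minimal sketch: fetch_secret is a hypothetical stand-in for whatever awaitable call returns your credentials, and the returned keys simply mirror the synchronous example.

```python
import asyncio
from datetime import datetime, timedelta, timezone


async def fetch_secret() -> dict[str, str]:
    """Hypothetical stand-in for an awaitable call to a secrets service."""
    await asyncio.sleep(0)  # placeholder for real network I/O
    return {"access_key_id": "...", "secret_access_key": "...", "token": "..."}


async def get_credentials_async() -> dict:
    """Async credential callback returning the same keys as the sync example."""
    secret = await fetch_secret()
    return {
        **secret,
        "expires_at": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
```

It would presumably be passed in the same way as the synchronous function, as credential_provider=get_credentials_async.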

More advanced credential providers, which may need to store state, can be class based. See the authentication user guide for more information.
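As a rough illustration of a class-based provider, the sketch below keeps state on a callable object. It assumes obstore accepts any callable, including an instance that defines __call__; the class name and its attributes are hypothetical, so check the authentication user guide for the exact protocol.

```python
from datetime import datetime, timedelta, timezone


class RotatingCredentialProvider:
    """Hypothetical callable provider that remembers how often it was asked."""

    def __init__(self, ttl: timedelta = timedelta(minutes=30)) -> None:
        self.ttl = ttl
        self.calls = 0  # state kept between invocations

    def __call__(self) -> dict:
        # A real provider would fetch fresh credentials here, for example
        # from a long-lived client object stored on self.
        self.calls += 1
        return {
            "access_key_id": "...",
            "secret_access_key": "...",
            "token": "...",
            "expires_at": datetime.now(timezone.utc) + self.ttl,
        }
```

An instance, rather than the class itself, would then be passed as credential_provider.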

Automatic token refresh

If the credential returned by the credential provider includes an expires_at key, obstore will automatically call the credential provider to refresh your token before the expiration time.

Your code doesn't need to think about token expiration times!

This makes it seamless to use something like the AWS Security Token Service (STS), which issues temporary credentials that expire after a set duration (one hour by default for assume_role). See StsCredentialProvider for an example of a credential provider that uses STS.Client.assume_role to automatically refresh tokens.
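To make the idea of refreshing before expiry concrete, here is a conceptual sketch in plain Python. It is not obstore's internal implementation: the CachedCredential class and its min_ttl safety margin are hypothetical, chosen only to illustrate calling the provider again shortly before expires_at.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class CachedCredential:
    """Illustrative cache that re-invokes a provider when expiry is near."""

    def __init__(
        self,
        provider: Callable[[], dict],
        min_ttl: timedelta = timedelta(minutes=5),  # hypothetical margin
    ) -> None:
        self.provider = provider
        self.min_ttl = min_ttl
        self._cached: Optional[dict] = None

    def get(self, now: Optional[datetime] = None) -> dict:
        now = now or datetime.now(timezone.utc)
        expires_at = self._cached.get("expires_at") if self._cached else None
        # Refresh on first use, or when the remaining lifetime is too short.
        # Credentials without an expires_at key are never refreshed.
        needs_refresh = self._cached is None or (
            expires_at is not None and expires_at - now <= self.min_ttl
        )
        if needs_refresh:
            self._cached = self.provider()
        return self._cached
```

The caller only ever asks for the current credential; the decision about when to call the provider stays inside the cache, which is the property that lets application code ignore expiration times.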

Improved fsspec integration

This release also significantly improves integration with the fsspec ecosystem.

You can now register obstore as the default handler for supported protocols, like s3, gs, and az. Then calling fsspec.filesystem or fsspec.open will automatically defer to obstore.fsspec.FsspecStore and obstore.fsspec.BufferedFile, respectively.
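A registration call might look like the sketch below. It assumes the entry point is a register function in obstore.fsspec that accepts a protocol name; treat that name and signature as an assumption and confirm them against the fsspec page in the user guide. The wrapper function itself is hypothetical.

```python
def use_obstore_for_s3():
    """Register obstore for the s3 protocol, then build a filesystem."""
    # Imported lazily so this sketch can be loaded without the optional
    # dependencies (obstore and fsspec) installed.
    import fsspec
    from obstore.fsspec import register  # assumed entry point

    register("s3")
    # Per the text above, this should now resolve to obstore's FsspecStore
    # rather than another s3 implementation.
    return fsspec.filesystem("s3")
```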

The fsspec integration is no longer tied to a specific bucket. Instead, FsspecStore will automatically handle multiple buckets within a single protocol.

As one example of this broader compatibility, obstore's fsspec integration is now tested with pyarrow.

For more information, read the fsspec page in the user guide.

Improved AWS type hinting

Type hinting has been improved for AWS enums, such as the AWS region. Now, when you're constructing an S3Store, editors that support type hints will offer suggestions based on them.

(Two screenshots from VS Code showing these suggestions appeared here.)

Benchmarking

We've continued work on benchmarking obstore.

New benchmarks run on an EC2 M5 instance indicate that obstore provides 2.8x higher throughput than aioboto3 when repeatedly fetching the first 16 KB of a file from an async context.

Improved documentation

All updates

Refer to the changelog for all updates.