Releasing obstore 0.5!
Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust.
This post gives an overview of what's new in obstore version 0.5.
Refer to the changelog for all updates.
Class method wrappers
Until now, obstore provided only a functional API, with top-level functions exported from the `obstore` module. Now, obstore also provides these functions as methods on each store class.
Previously:
```python
import obstore as obs
from obstore.store import AzureStore

store = AzureStore()
obs.put(store, ...)
```
Now:
```python
from obstore.store import AzureStore

store = AzureStore()
store.put(...)
```
Note that this calls the class method `store.put` instead of the module-level function `obstore.put`.
This can also make the API easier to understand, since you can explore the available methods directly from the store object.
Credential providers
Authentication tends to be among the trickiest but most important elements of connecting to object storage. There are many ways to handle credentials, and supporting every one of them natively in obstore would impose a high maintenance burden.
Instead, this release supports custom credential providers: Python callbacks that allow for full control over credential generation.
We'll dive into a few salient points, but make sure to read the full authentication documentation in the user guide.
"Official" SDK credential providers
You can use the `Boto3CredentialProvider` to delegate credential handling to a `boto3.Session`.
```python
from boto3 import Session
from obstore.auth.boto3 import Boto3CredentialProvider
from obstore.store import S3Store

session = Session(...)
credential_provider = Boto3CredentialProvider(session)
store = S3Store("bucket_name", credential_provider=credential_provider)
```
Custom credential providers
There's a long tail of possible authentication mechanisms. Obstore allows you to provide your own custom authentication callback.
You can provide either a synchronous or asynchronous custom authentication function.
The simplest custom credential provider can be just a function callback:
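```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from obstore.store import S3Credential  # type-only import


def get_credentials() -> S3Credential:
    return {
        "access_key_id": "...",
        "secret_access_key": "...",
        # Not always required
        "token": "...",
        "expires_at": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
```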
Then pass that function as the `credential_provider` argument:
```python
S3Store(..., credential_provider=get_credentials)
```
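As mentioned above, the callback can also be asynchronous. The sketch below assumes your credentials come from some awaitable source. `fetch_secret` and `get_credentials_async` are hypothetical names: `fetch_secret` only yields to the event loop and returns dummy values, so replace it with your own secrets manager or HTTP call.

```python
from __future__ import annotations

import asyncio
from datetime import datetime, timedelta, timezone
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from obstore.store import S3Credential  # type-only import


async def fetch_secret(name: str) -> str:
    """Hypothetical stand-in for an awaitable secrets lookup."""
    await asyncio.sleep(0)  # yield to the event loop, simulating I/O
    return f"dummy-{name}"


async def get_credentials_async() -> S3Credential:
    # Fetch both secrets concurrently
    access_key_id, secret_access_key = await asyncio.gather(
        fetch_secret("access_key_id"),
        fetch_secret("secret_access_key"),
    )
    return {
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "expires_at": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
```

It is passed in the same way as the synchronous version, e.g. `S3Store(..., credential_provider=get_credentials_async)`.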
More advanced credential providers, which may need to store state, can be class-based. See the authentication user guide for more information.
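A class-based provider is a callable object, meaning one that defines `__call__`. Here is a minimal sketch that assumes your keys live in the standard AWS environment variables. The class `EnvCredentialProvider` and its `lifetime` parameter are hypothetical, and reading the environment this way is our illustration rather than an obstore feature. The point is that the instance keeps state between calls: a configurable lifetime and a call counter.

```python
from __future__ import annotations

import os
from datetime import datetime, timedelta, timezone
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from obstore.store import S3Credential  # type-only import


class EnvCredentialProvider:
    """Hypothetical provider that re-reads AWS keys from the environment."""

    def __init__(self, lifetime: timedelta = timedelta(minutes=15)) -> None:
        self.lifetime = lifetime
        self.calls = 0  # state kept across calls: how often we were asked

    def __call__(self) -> S3Credential:
        self.calls += 1
        credential: S3Credential = {
            "access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
            "secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
            # Including expires_at lets obstore call us again before expiry
            "expires_at": datetime.now(timezone.utc) + self.lifetime,
        }
        # Session tokens are optional; include one only if it is set
        token = os.environ.get("AWS_SESSION_TOKEN")
        if token is not None:
            credential["token"] = token
        return credential
```

An instance would be passed as `credential_provider=EnvCredentialProvider()`.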
Automatic token refresh
If the credential returned by the credential provider includes an `expires_at` key, obstore will automatically call the credential provider again to refresh your token before the expiration time.
Your code doesn't need to track token expiration times!
This makes it seamless to use a service like the AWS Security Token Service (STS), which issues temporary credentials that expire after a set duration (one hour by default for `AssumeRole`). See `StsCredentialProvider` for an example of a credential provider that uses `STS.Client.assume_role` to automatically refresh tokens.
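To make the refresh flow concrete, here is a minimal sketch of an assume-role provider. It is not obstore's `StsCredentialProvider`. The class `AssumeRoleProvider` and its parameters are hypothetical. It accepts any object with a boto3-style `assume_role` method, such as `boto3.client("sts")`. It maps the `Credentials` block of the STS response onto obstore's credential keys and passes STS's `Expiration` through as `expires_at`, which is what tells obstore when to call it again.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from obstore.store import S3Credential  # type-only import


class AssumeRoleProvider:
    """Hypothetical provider that fetches temporary credentials via STS AssumeRole."""

    def __init__(
        self,
        sts_client: Any,  # e.g. boto3.client("sts")
        role_arn: str,
        session_name: str = "obstore-session",
        duration_seconds: int = 3600,
    ) -> None:
        self.sts_client = sts_client
        self.role_arn = role_arn
        self.session_name = session_name
        self.duration_seconds = duration_seconds

    def __call__(self) -> S3Credential:
        response = self.sts_client.assume_role(
            RoleArn=self.role_arn,
            RoleSessionName=self.session_name,
            DurationSeconds=self.duration_seconds,
        )
        creds = response["Credentials"]
        return {
            "access_key_id": creds["AccessKeyId"],
            "secret_access_key": creds["SecretAccessKey"],
            "token": creds["SessionToken"],
            # boto3 returns Expiration as a timezone-aware datetime
            "expires_at": creds["Expiration"],
        }
```

Because each call performs a fresh `AssumeRole` request, the provider itself stays stateless about expiry. Deciding when to call it is left to obstore.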
Improved Fsspec integration
This release also significantly improves integration with the fsspec ecosystem.
You can now register obstore as the default handler for supported protocols, like `s3`, `gs`, and `az`. Then, calling `fsspec.filesystem` or `fsspec.open` will automatically defer to `obstore.fsspec.FsspecStore` and `obstore.fsspec.BufferedFile`, respectively.
The fsspec integration is no longer tied to a specific bucket. Instead, `FsspecStore` automatically handles multiple buckets within a single protocol.
As one example of this broader compatibility, obstore's fsspec integration is now tested as working with pyarrow.
For more information, read the fsspec page in the user guide.
Improved AWS type hinting
Type hints have been improved for AWS enums, such as the AWS region. Now, when you're constructing an `S3Store`, you'll receive suggestions based on the type hints if your editor supports them.
Here are two examples from VS Code:
Benchmarking
We've continued work on benchmarking obstore.
New benchmarks run on an EC2 M5 instance indicate that obstore provides 2.8x higher throughput than aioboto3 when repeatedly fetching the first 16 KB of a file from an async context.
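If you want to run a similar comparison yourself, here is a minimal sketch of an async throughput harness. It is hypothetical and is not the harness behind the numbers above. `fetch` stands in for any coroutine function that returns the first `length` bytes of an object. Examples include a small wrapper around obstore's `get_range_async` or around an aioboto3 `get_object` call with a `Range` header. The `concurrency` default is an arbitrary choice.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical fetch signature: given a length, return that many bytes
# from the start of some object
Fetch = Callable[[int], Awaitable[bytes]]


async def measure_throughput(
    fetch: Fetch,
    n_requests: int,
    length: int = 16 * 1024,
    concurrency: int = 64,
) -> float:
    """Issue n_requests fetches (at most `concurrency` in flight); return requests/second."""
    semaphore = asyncio.Semaphore(concurrency)  # cap in-flight requests

    async def one_request() -> int:
        async with semaphore:
            data = await fetch(length)
            return len(data)

    start = time.perf_counter()
    sizes = await asyncio.gather(*(one_request() for _ in range(n_requests)))
    elapsed = time.perf_counter() - start
    # Guard against a fetch that silently returns the wrong amount of data
    if any(size != length for size in sizes):
        raise RuntimeError("a fetch returned an unexpected number of bytes")
    return n_requests / elapsed
```

Running the harness once per client with the same `n_requests`, `length`, and `concurrency`, then dividing the two results, yields a throughput ratio of the kind quoted above.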
Improved documentation
All updates
Refer to the changelog for all updates.