Releasing obstore 0.7!

Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust.

This post gives an overview of what's new in obstore version 0.7.

Refer to the changelog for all updates.

Anonymous connections to Google Cloud Storage

Obstore now supports anonymous connections to GCS. Pass skip_signature=True to configure an anonymous connection.

from obstore.store import GCSStore

store = GCSStore(
    "weatherbench2",
    prefix="datasets/era5/1959-2023_01_10-full_37-1h-0p25deg-chunk-1.zarr",
    # Anonymous connection
    skip_signature=True,
)
store.list_with_delimiter()["objects"]

Now prints:

[{'path': '.zattrs',
  'last_modified': datetime.datetime(2023, 11, 22, 9, 4, 54, 481000, tzinfo=datetime.timezone.utc),
  'size': 2,
  'e_tag': '"99914b932bd37a50b983c5e7c90ae93b"',
  'version': None},
 {'path': '.zgroup',
  'last_modified': datetime.datetime(2023, 11, 22, 9, 4, 53, 465000, tzinfo=datetime.timezone.utc),
  'size': 24,
  'e_tag': '"e20297935e73dd0154104d4ea53040ab"',
  'version': None},
 {'path': '.zmetadata',
  'last_modified': datetime.datetime(2023, 11, 22, 9, 4, 54, 947000, tzinfo=datetime.timezone.utc),
  'size': 46842,
  'e_tag': '"9d287796ca614bfec4f1bb20a4ac1ba3"',
  'version': None}]

Obspec v0.1 compatibility

Obstore provides an implementation for accessing Amazon S3, Google Cloud Storage, and Azure Storage, but some libraries may also want to support other backends, such as HTTP clients or less common ones like SFTP or HDFS filesystems.

Additionally, there's a bunch of useful behavior that could exist on top of Obstore: caching, metrics, globbing, bulk operations. While all of those operations are useful, we want to keep the core Obstore library as small as possible, tightly coupled with the underlying Rust object_store library.

Obspec exists to provide the abstractions for generic programming against object store backends. Obspec is essentially a formalization and generalization of the Obstore API, so if you're already using Obstore, very few changes are needed to use Obspec instead.

Downstream libraries can program against the Obspec API to stay fully generic over which underlying backend is used at runtime.
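To illustrate the idea of programming against an interface rather than a concrete store, here is a minimal sketch using Python's standard typing.Protocol. The names SupportsGetRange, InMemoryStore, and read_magic, as well as the exact get_range signature, are hypothetical stand-ins chosen for this example; they are not the Obspec API, so check the Obspec documentation for the real protocol definitions.

```python
from typing import Protocol


class SupportsGetRange(Protocol):
    """Hypothetical stand-in for an Obspec-style protocol (not the real Obspec API)."""

    def get_range(self, path: str, *, start: int, end: int) -> bytes: ...


class InMemoryStore:
    """Toy backend that satisfies the protocol structurally, without inheriting from it."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = objects

    def get_range(self, path: str, *, start: int, end: int) -> bytes:
        # Return the bytes in the half-open range [start, end).
        return self._objects[path][start:end]


def read_magic(store: SupportsGetRange, path: str, n: int = 4) -> bytes:
    """Read the first n bytes of an object from any conforming backend."""
    return store.get_range(path, start=0, end=n)


store = InMemoryStore({"data.bin": b"\x89PNG\r\n\x1a\n"})
magic = read_magic(store, "data.bin")
```

Because read_magic only depends on the protocol, any object with a matching get_range method could be passed in, whether it is backed by cloud storage, HTTP, or an in-memory dict as here.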

For further information, refer to the Obspec documentation and the Obspec announcement blog post.

Customize headers sent in requests

ClientConfig now accepts a default_headers key. This allows you to add additional headers that will be sent by the HTTP client on every request.
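As a rough sketch of how this might look, the example below builds the client configuration as a plain dict and passes it through the client_options keyword of a store constructor. The header names and values and the make_store helper are hypothetical, and the example assumes default_headers takes a mapping of header names to string values.

```python
# A plain dict in the shape of ClientConfig.
# The header names and values below are hypothetical placeholders.
client_options = {
    "default_headers": {
        "X-Request-Source": "nightly-etl",
        "X-Team": "data-platform",
    },
}


def make_store(bucket: str):
    """Build an S3Store whose HTTP client sends the headers above on every request."""
    # Imported lazily so the configuration can be inspected without obstore installed.
    from obstore.store import S3Store

    return S3Store(bucket, client_options=client_options)
```

Calling make_store("my-bucket") (a placeholder bucket name) would return a store configured with these headers; the same client_options dict can be reused across stores.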

Improvements to NASA Earthdata credential provider

The NASA Earthdata credential provider now lets users customize the host that handles authentication.

It also accepts credentials in more forms. Authentication information can be a NASA Earthdata token, a NASA Earthdata username/password tuple, or None. With None, environment variables or a ~/.netrc file are used, if set.
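The three forms of authentication information can be sketched as follows. This is an illustration under assumptions: the keyword names host and auth, the import path obstore.auth.earthdata, and the make_provider helper are not confirmed by this post, and every credential value is a placeholder. Consult the NASA Earthdata page for the exact signature.

```python
# The three forms of authentication information described above.
# Every value here is a placeholder, not a real credential.
AUTH_FORMS = {
    "token": "my-earthdata-token",
    "username_password": ("my-username", "my-password"),
    # None defers to environment variables or a ~/.netrc file, if set.
    "environment_or_netrc": None,
}


def make_provider(credentials_url: str, auth=None, host=None):
    """Build a NASA Earthdata credential provider (hypothetical helper).

    Assumes the provider takes `host` and `auth` keyword arguments.
    """
    # Imported lazily so the example can be read without obstore installed.
    from obstore.auth.earthdata import NasaEarthdataCredentialProvider

    return NasaEarthdataCredentialProvider(credentials_url, host=host, auth=auth)
```

The resulting provider would then typically be handed to a store via its credential_provider argument.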

See the updated documentation on the NASA Earthdata page.

Fixed creation of AzureStore from HTTPS URL

Previously, the following would create an incorrect AzureStore configuration:

from obstore.store import AzureStore

url = "https://overturemapswestus2.blob.core.windows.net/release"
store = AzureStore.from_url(url, skip_signature=True)

The URL parser interpreted release as part of the prefix within the container, when it should be interpreted as the container name.

This is now fixed, and the following test passes:

url = "https://overturemapswestus2.blob.core.windows.net/release"
store = AzureStore.from_url(url, skip_signature=True)

assert store.config.get("container_name") == "release"
assert store.config.get("account_name") == "overturemapswestus2"
assert store.prefix is None

Improved documentation

All updates

Refer to the changelog for all updates.