obstore¶
Simple, fast integration with object storage services like Amazon S3, Google Cloud Storage, Azure Blob Storage, and S3-compliant APIs like Cloudflare R2.
- Sync and async API.
- Streaming downloads with configurable chunking.
- Streaming list, with no need to paginate.
- Support for conditional put ("put if not exists"), as well as custom tags and attributes.
- Automatically supports multipart uploads under the hood for large file objects.
- Optionally return list results as Arrow, which is faster than materializing Python dict/list objects.
- Easy to install with no required Python dependencies.
- The underlying Rust library is production-quality and used in large-scale production systems, such as the Rust package registry crates.io.
- Support for zero-copy data exchange from Rust into Python in get_range and get_ranges.
- Simple API with static type checking.
- Helpers for constructing from environment variables and boto3.Session objects.
Installation¶
pip install obstore
Documentation¶
Full documentation is available on the website.
Usage¶
Constructing a store¶
Classes to construct a store are exported from the obstore.store submodule:
- S3Store: Configure a connection to Amazon S3.
- GCSStore: Configure a connection to Google Cloud Storage.
- AzureStore: Configure a connection to Microsoft Azure Blob Storage.
- HTTPStore: Configure a connection to a generic HTTP server.
- LocalStore: Local filesystem storage providing the same object store interface.
- MemoryStore: A fully in-memory implementation of ObjectStore.
Example¶
import boto3
from obstore.store import S3Store
session = boto3.Session()
store = S3Store.from_session(session, "bucket-name", config={"AWS_REGION": "us-east-1"})
Configuration¶
Each store class above has its own configuration, passed through the config named parameter. The available keys are covered in the docs and also appear as string literals in the type hints.
Additional HTTP client configuration is available via the client_options named parameter.
Interacting with a store¶
All methods for interacting with a store are exported as top-level functions (not methods on the store object):
- copy: Copy an object from one path to another in the same object store.
- delete: Delete the object at the specified location.
- get: Return the bytes that are stored at the specified location.
- head: Return the metadata for the specified location.
- list: List all the objects with the given prefix.
- put: Save the provided bytes to the specified location.
- rename: Move an object from one path to another in the same object store.
There are a few additional APIs useful for specific use cases:
- get_range: Get a specific byte range from a file.
- get_ranges: Get multiple byte ranges from a single file.
- list_with_delimiter: List objects within a specific directory.
- sign: Create a signed URL.
All methods have a comparable async method with the same name plus an _async suffix.
Example¶
import obstore as obs
store = obs.store.MemoryStore()
obs.put(store, "file.txt", b"hello world!")
response = obs.get(store, "file.txt")
response.meta
# {'path': 'file.txt',
# 'last_modified': datetime.datetime(2024, 10, 21, 16, 19, 45, 102620, tzinfo=datetime.timezone.utc),
# 'size': 12,
# 'e_tag': '0',
# 'version': None}
assert response.bytes() == b"hello world!"
byte_range = obs.get_range(store, "file.txt", offset=0, length=5)
assert byte_range == b"hello"
obs.copy(store, "file.txt", "other.txt")
assert obs.get(store, "other.txt").bytes() == b"hello world!"
All of these methods also have async counterparts, suffixed with _async.
import obstore as obs
store = obs.store.MemoryStore()
await obs.put_async(store, "file.txt", b"hello world!")
response = await obs.get_async(store, "file.txt")
response.meta
# {'path': 'file.txt',
# 'last_modified': datetime.datetime(2024, 10, 21, 16, 20, 36, 477418, tzinfo=datetime.timezone.utc),
# 'size': 12,
# 'e_tag': '0',
# 'version': None}
assert await response.bytes_async() == b"hello world!"
byte_range = await obs.get_range_async(store, "file.txt", offset=0, length=5)
assert byte_range == b"hello"
await obs.copy_async(store, "file.txt", "other.txt")
resp = await obs.get_async(store, "other.txt")
assert await resp.bytes_async() == b"hello world!"
Comparison to object-store-python¶
Read a detailed comparison to object-store-python, a previous Python library that also wraps the same Rust object_store crate.