obstore.fsspec

Integration with the fsspec library.

The fsspec integration is best effort and may not provide the same performance as the rest of obstore. If you find any bugs with this integration, please file an issue.

The underlying object_store Rust crate cautions against relying too strongly on stateful filesystem representations of object stores:

The ObjectStore interface is designed to mirror the APIs of object stores and not filesystems, and thus has stateless APIs instead of cursor based interfaces such as Read or Seek available in filesystems.

This design provides the following advantages:

  • All operations are atomic, and readers cannot observe partial and/or failed writes
  • Methods map directly to object store APIs, providing both efficiency and predictability
  • Abstracts away filesystem and operating system specific quirks, ensuring portability
  • Allows for functionality not native to filesystems, such as operation preconditions and atomic multipart uploads

Where possible, implementations should use the underlying obstore APIs directly. Only where this is not possible should users fall back to this fsspec integration.

SUPPORTED_PROTOCOLS module-attribute

SUPPORTED_PROTOCOLS: set[str] = {
    "abfs",
    "abfss",
    "adl",
    "az",
    "azure",
    "file",
    "gcs",
    "gs",
    "http",
    "https",
    "memory",
    "s3",
    "s3a",
}

All supported protocols.
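
One possible use of this set is to check a URL's scheme before routing it through obstore. The sketch below is illustrative only: it copies the values listed above into a local constant so that it runs without obstore installed (real code would presumably import SUPPORTED_PROTOCOLS from obstore.fsspec), and is_supported is a hypothetical helper, not part of obstore.

```python
from urllib.parse import urlsplit

# Local copy of the values listed above, so this sketch runs standalone.
# In real code, import SUPPORTED_PROTOCOLS from obstore.fsspec instead.
SUPPORTED_PROTOCOLS = {
    "abfs", "abfss", "adl", "az", "azure", "file", "gcs",
    "gs", "http", "https", "memory", "s3", "s3a",
}


def is_supported(url: str) -> bool:
    """Hypothetical helper: True if the URL scheme is a supported protocol."""
    return urlsplit(url).scheme in SUPPORTED_PROTOCOLS


print(is_supported("s3://bucket/key.parquet"))
print(is_supported("ftp://example.com/file"))
```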

SUPPORTED_PROTOCOLS_T module-attribute

SUPPORTED_PROTOCOLS_T = Literal[
    "abfs",
    "abfss",
    "adl",
    "az",
    "azure",
    "file",
    "gcs",
    "gs",
    "http",
    "https",
    "memory",
    "s3",
    "s3a",
]

A type hint for all supported protocols.
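
A Literal type like this is typically used to annotate parameters so that a type checker rejects unknown protocol strings. The sketch below assumes nothing about obstore beyond the values listed above: ProtocolT is a local stand-in for SUPPORTED_PROTOCOLS_T, and describe and narrow are hypothetical helpers.

```python
from typing import Literal, get_args

# Local stand-in mirroring the Literal above; in real code, import
# SUPPORTED_PROTOCOLS_T from obstore.fsspec instead.
ProtocolT = Literal[
    "abfs", "abfss", "adl", "az", "azure", "file", "gcs",
    "gs", "http", "https", "memory", "s3", "s3a",
]


def describe(protocol: ProtocolT) -> str:
    """Hypothetical function whose argument a type checker restricts."""
    return f"{protocol}://"


def narrow(value: str) -> ProtocolT:
    """Hypothetical runtime check that turns a plain str into ProtocolT."""
    if value not in get_args(ProtocolT):
        raise ValueError(f"unsupported protocol: {value!r}")
    return value  # type: ignore[return-value]


print(describe(narrow("gs")))
```

The type hint only helps static analysis; a value that arrives at runtime (for example from a config file) still needs a check such as narrow.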

BufferedFile

Bases: AbstractBufferedFile

A buffered readable or writable file.

This is a wrapper around obstore.ReadableFile and obstore.WritableFile. If you don't have a need to use the fsspec integration, you may be better served by using open_reader or open_writer directly.

loc property writable

loc: int

Get current file location.

__init__

__init__(
    fs: FsspecStore,
    path: str,
    mode: Literal["rb"] = "rb",
    *,
    buffer_size: int = 1024 * 1024,
    **kwargs: Any,
) -> None
__init__(
    fs: FsspecStore,
    path: str,
    mode: Literal["wb"],
    *,
    buffer_size: int = 10 * 1024 * 1024,
    attributes: Attributes | None = None,
    tags: dict[str, str] | None = None,
    **kwargs: Any,
) -> None
__init__(
    fs: FsspecStore,
    path: str,
    mode: Literal["rb", "wb"] = "rb",
    *,
    buffer_size: int | None = None,
    attributes: Attributes | None = None,
    tags: dict[str, str] | None = None,
    **kwargs: Any,
) -> None

Create new buffered file.

Parameters:

  • fs (FsspecStore) –

    The underlying fsspec store to read from or write to.

  • path (str) –

    The path within the store to use.

  • mode (Literal['rb', 'wb'], default: 'rb' ) –

    "rb" for a readable binary file or "wb" for a writable binary file. Defaults to "rb".

Other Parameters:

  • attributes (Attributes | None) –

    Provide a set of Attributes. Only used when writing. Defaults to None.

  • buffer_size (int | None) –

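    Up to buffer_size bytes will be buffered in memory. When reading, this is the minimum number of bytes to read in a single request. When writing, once buffer_size is exceeded, data is uploaded as a multipart upload in chunks of buffer_size. Defaults to None, in which case the per-mode defaults shown in the overloads above apply: 1024 * 1024 bytes (1 MiB) when reading and 10 * 1024 * 1024 bytes (10 MiB) when writing.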

  • tags (dict[str, str] | None) –

    Provide tags for this object. Only used when writing. Defaults to None.

  • kwargs (Any) –

    Keyword arguments passed on to fsspec.spec.AbstractBufferedFile.
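
To get a feel for how buffer_size relates to the number of multipart chunks when writing, the arithmetic can be sketched as follows. This is a back-of-the-envelope model, not obstore's implementation: plan_parts is a hypothetical helper, and it assumes chunks are cut at exact buffer_size boundaries, which the underlying writer may not guarantee.

```python
def plan_parts(total_bytes: int, buffer_size: int = 10 * 1024 * 1024) -> list[int]:
    """Hypothetical helper: chunk sizes for a write of total_bytes."""
    if total_bytes <= buffer_size:
        # Fits in the buffer: no multipart upload is needed.
        return [total_bytes]
    full, rest = divmod(total_bytes, buffer_size)
    return [buffer_size] * full + ([rest] if rest else [])


MiB = 1024 * 1024
print(plan_parts(4 * MiB))
print([size // MiB for size in plan_parts(25 * MiB)])
```

Under these assumptions, a smaller buffer_size lowers peak memory use but increases the number of upload requests for the same payload.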

close

close() -> None

Close the file, ensuring the buffer is flushed first.

flush

flush(force: bool = False) -> None

Write buffered data to backend store.

Writes the current buffer, if it is larger than the block-size, or if the file is being closed.

Parameters:

  • force (bool, default: False ) –

    Unused.
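
The flush condition described above (upload once the buffer reaches the block size, or when the file is closed) can be modelled with a small toy class. ToyBufferedWriter is entirely hypothetical and is not the obstore or fsspec implementation; it treats "reached the block size" as greater than or equal, which is an assumption.

```python
class ToyBufferedWriter:
    """Hypothetical stand-in; not the obstore or fsspec implementation."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.buffer = bytearray()
        self.uploaded: list[bytes] = []  # stands in for the backend store
        self.closed = False

    def write(self, data: bytes) -> int:
        self.buffer += data
        self.flush()
        return len(data)

    def flush(self, closing: bool = False) -> None:
        # Upload only when the buffer has reached the block size, or on close.
        if self.buffer and (closing or len(self.buffer) >= self.block_size):
            self.uploaded.append(bytes(self.buffer))
            self.buffer.clear()

    def close(self) -> None:
        self.flush(closing=True)
        self.closed = True


w = ToyBufferedWriter(block_size=8)
w.write(b"abc")       # 3 bytes buffered: nothing uploaded yet
w.write(b"defghij")   # 10 bytes buffered: uploaded as one chunk
w.write(b"kl")        # 2 bytes buffered: nothing uploaded yet
w.close()             # closing flushes the remainder
print(w.uploaded)
```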

read

read(length: int = -1) -> bytes

Return bytes from the remote file.

Parameters:

  • length (int, default: -1 ) –

    If positive, return up to this many bytes; if negative, return all remaining bytes.

Returns:

  • bytes

    Data in bytes

readline

readline() -> bytes

Read until first occurrence of newline character.

readlines

readlines() -> list[bytes]

Return all data, split by the newline character.
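
The read, readline, and readlines methods follow the usual Python binary file conventions. The sketch below uses io.BytesIO from the standard library as a stand-in so that it runs without a remote store; it assumes BufferedFile behaves like a regular binary file object in these respects.

```python
import io

# io.BytesIO stands in for a BufferedFile opened in "rb" mode.
f = io.BytesIO(b"alpha\nbeta\ngamma\n")

first = f.read(3)       # positive length: up to 3 bytes
line = f.readline()     # rest of the current line, including the newline
rest = f.readlines()    # all remaining data, split into lines
leftover = f.read(-1)   # negative length: everything left (nothing here)

print(first, line, rest, leftover)
```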

seek

seek(loc: int, whence: int = 0) -> int

Set current file location.

Parameters:

  • loc (int) –

    Byte offset, interpreted relative to the position selected by whence.

  • whence (int, default: 0 ) –

    One of:

      • 0: from start of file
      • 1: current location
      • 2: end of file
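
The three whence values match the standard os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END constants. A minimal sketch, again using io.BytesIO as a stand-in for a readable BufferedFile (an assumption made so the example runs offline):

```python
import io
import os

# io.BytesIO stands in for a BufferedFile opened in "rb" mode.
f = io.BytesIO(b"0123456789")

from_start = f.seek(3, os.SEEK_SET)    # whence=0: absolute position 3
from_current = f.seek(2, os.SEEK_CUR)  # whence=1: 3 + 2 = 5
from_end = f.seek(-4, os.SEEK_END)     # whence=2: 10 - 4 = 6
tail = f.read()                        # reads from position 6 to the end

print(from_start, from_current, from_end, tail)
```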

tell

tell() -> int

Get current file location.

write

write(data: bytes) -> int

Write data to buffer.

Parameters:

  • data (bytes) –

    Set of bytes to be written.

FsspecStore

Bases: AsyncFileSystem

An fsspec implementation based on an obstore Store.

You should be able to pass an instance of this class into any API that expects an fsspec-style object.

__init__

__init__(
    protocol: Literal["s3", "s3a"],
    *args: Any,
    config: S3Config | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    **kwargs: Unpack[S3Config],
) -> None
__init__(
    protocol: Literal["gs"],
    *args: Any,
    config: GCSConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    **kwargs: Unpack[GCSConfig],
) -> None
__init__(
    protocol: Literal["az", "adl", "azure", "abfs", "abfss"],
    *args: Any,
    config: AzureConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    **kwargs: Unpack[AzureConfig],
) -> None
__init__(
    protocol: Literal["file"],
    *args: Any,
    config: None = None,
    client_options: None = None,
    retry_config: None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    automatic_cleanup: bool = False,
    mkdir: bool = False,
) -> None
__init__(
    protocol: SUPPORTED_PROTOCOLS_T | str | None = None,
    *args: Any,
    config: S3Config | GCSConfig | AzureConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    **kwargs: Any,
) -> None

Construct a new FsspecStore.

Parameters:

  • protocol (SUPPORTED_PROTOCOLS_T | str | None, default: None ) –

    The storage protocol to use, such as "s3", "gcs", or "abfs". If None, the default class-level protocol is used. Defaults to None.

  • config (S3Config | GCSConfig | AzureConfig | None, default: None ) –

    Configuration for the cloud storage provider, which can be one of S3Config, GCSConfig, AzureConfig, or AzureConfigInput. Any of these values will be applied after checking for environment variables. If None, no cloud storage configuration is applied beyond what is found in environment variables.

  • client_options (ClientConfig | None, default: None ) –

    Additional options for configuring the client.

  • retry_config (RetryConfig | None, default: None ) –

    Configuration for handling request errors.

  • args (Any, default: () ) –

    Positional arguments passed on to the fsspec.asyn.AsyncFileSystem constructor.

Other Parameters:

  • asynchronous (bool) –

    Set to True if this instance is meant to be called using the fsspec async API. This should only be set to True when running within a coroutine.

  • max_cache_size (int) –

    The maximum number of stores the cache should keep. A cached store is kept internally for each bucket name. Defaults to 10.

  • loop (Any) –

    Because both fsspec (Python) and tokio (Rust) may be running their own event loops, this should be left as None for now; it is not used.

  • batch_size (int | None) –

    Some operations on many files batch their requests. If you are seeing timeouts, you may want to set this number smaller than the defaults, which are determined in fsspec.asyn._get_batch_size.

  • kwargs (Any) –

    Per-store configuration passed down to store-specific builders.
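
To illustrate what max_cache_size controls, the sketch below caches one store object per bucket name with a bounded cache. It is a toy model only: ToyStore and get_store are hypothetical, and the least-recently-used eviction policy of functools.lru_cache is an assumption, since the text above only states that a bounded number of stores is kept.

```python
from functools import lru_cache


class ToyStore:
    """Hypothetical stand-in for a per-bucket store object."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket


@lru_cache(maxsize=2)  # plays the role of max_cache_size
def get_store(bucket: str) -> ToyStore:
    # Runs only on a cache miss; a hit returns the existing instance.
    return ToyStore(bucket)


a1 = get_store("bucket-a")
a2 = get_store("bucket-a")   # hit: same object as a1
get_store("bucket-b")
get_store("bucket-c")        # cache is full: bucket-a is evicted
a3 = get_store("bucket-a")   # miss: a new store is built

print(a1 is a2, a1 is a3)
```

If a workload touches more buckets than max_cache_size, stores would be rebuilt repeatedly under this model, so raising the limit may help in that case.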

Examples:

from obstore.fsspec import FsspecStore

store = FsspecStore("https")
resp = store.cat_file("https://raw.githubusercontent.com/developmentseed/obstore/refs/heads/main/README.md")
assert resp.startswith(b"# obstore")

register

register(
    protocol: SUPPORTED_PROTOCOLS_T
    | str
    | Iterable[SUPPORTED_PROTOCOLS_T]
    | Iterable[str]
    | None = None,
    *,
    asynchronous: bool = False,
) -> None

Dynamically register a subclass of FsspecStore for the given protocol(s).

This function creates a new subclass of FsspecStore with the specified protocol and registers it with fsspec. If multiple protocols are provided, the function registers each one individually.

Parameters:

  • protocol (SUPPORTED_PROTOCOLS_T | str | Iterable[SUPPORTED_PROTOCOLS_T] | Iterable[str] | None, default: None ) –

    A single protocol (e.g., "s3", "gcs", "abfs") or a list of protocols to register FsspecStore for.

    Defaults to None, which will register obstore as the provider for all supported protocols except for file:// and memory://. If you wish to use obstore via fsspec for file:// or memory:// URLs, list them explicitly.

  • asynchronous (bool, default: False ) –

    If True, the registered store will support asynchronous operations. Defaults to False.

Example:

from obstore.fsspec import register

# Register obstore as the default handler for all supported protocols except for
# `memory://` and `file://`
register()

# Registers a store for "s3" only
register("s3")

# Registers an async store for "s3"
register("s3", asynchronous=True)

# Registers both "gcs" and "abfs"
register(["gcs", "abfs"])

Notes
  • Each protocol gets a dynamically generated subclass named FsspecStore_<protocol>. This avoids modifying the original FsspecStore class.
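
The dynamic subclassing described in this note can be sketched with the built-in type() function. Everything below is a toy model under stated assumptions: ToyFsspecStore stands in for FsspecStore, REGISTRY stands in for fsspec's registry, and toy_register is a hypothetical function that only mirrors the documented behaviour (one subclass per protocol, with file and memory excluded from the default set).

```python
from typing import Dict, Iterable, Optional, Union


class ToyFsspecStore:
    """Hypothetical stand-in for FsspecStore."""

    protocol: Optional[str] = None
    asynchronous = False


# Default set: every supported protocol except "file" and "memory".
DEFAULT_PROTOCOLS = {
    "abfs", "abfss", "adl", "az", "azure", "gcs",
    "gs", "http", "https", "s3", "s3a",
}

REGISTRY: Dict[str, type] = {}  # stands in for fsspec's registry


def toy_register(
    protocol: Union[str, Iterable[str], None] = None,
    *,
    asynchronous: bool = False,
) -> None:
    if protocol is None:
        protocols = sorted(DEFAULT_PROTOCOLS)
    elif isinstance(protocol, str):
        protocols = [protocol]
    else:
        protocols = list(protocol)
    for name in protocols:
        # type() builds a fresh subclass, leaving the base class untouched.
        REGISTRY[name] = type(
            f"FsspecStore_{name}",
            (ToyFsspecStore,),
            {"protocol": name, "asynchronous": asynchronous},
        )


toy_register(["gcs", "abfs"])
print(REGISTRY["gcs"].__name__)
```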