obstore.fsspec ¶
Integration with the fsspec library.
The fsspec integration is best effort and may not provide the same performance as the rest of obstore. If you find any bugs with this integration, please file an issue.
The underlying object_store Rust crate cautions against relying too strongly on stateful filesystem representations of object stores:
The ObjectStore interface is designed to mirror the APIs of object stores and not filesystems, and thus has stateless APIs instead of cursor based interfaces such as Read or Seek available in filesystems.
This design provides the following advantages:
- All operations are atomic, and readers cannot observe partial and/or failed writes
- Methods map directly to object store APIs, providing both efficiency and predictability
- Abstracts away filesystem and operating system specific quirks, ensuring portability
- Allows for functionality not native to filesystems, such as operation preconditions and atomic multipart uploads
Where possible, implementations should use the underlying obstore APIs directly. Only where this is not possible should users fall back to this fsspec integration.
SUPPORTED_PROTOCOLS module-attribute ¶
SUPPORTED_PROTOCOLS: set[str] = {
    "abfs",
    "abfss",
    "adl",
    "az",
    "azure",
    "file",
    "gcs",
    "gs",
    "http",
    "https",
    "memory",
    "s3",
    "s3a",
}
All supported protocols.
SUPPORTED_PROTOCOLS_T module-attribute ¶
SUPPORTED_PROTOCOLS_T = Literal[
    "abfs",
    "abfss",
    "adl",
    "az",
    "azure",
    "file",
    "gcs",
    "gs",
    "http",
    "https",
    "memory",
    "s3",
    "s3a",
]
A type hint for all supported protocols.
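One way to use the two attributes above is to validate a URL scheme before handing the URL to the integration. The following is a minimal sketch under stated assumptions: it defines a local copy of the literal so that it runs without obstore installed, and `Protocol`, `PROTOCOLS`, and `protocol_of` are hypothetical names that are not part of obstore.

```python
from typing import Literal, get_args
from urllib.parse import urlparse

# Local copy of the literal shown above (an assumption for this sketch).
# In real code, import SUPPORTED_PROTOCOLS / SUPPORTED_PROTOCOLS_T
# from obstore.fsspec instead of redefining them.
Protocol = Literal[
    "abfs", "abfss", "adl", "az", "azure", "file", "gcs",
    "gs", "http", "https", "memory", "s3", "s3a",
]

# Derive the runtime set from the type hint so the two cannot drift apart.
PROTOCOLS: frozenset = frozenset(get_args(Protocol))


def protocol_of(url: str) -> str:
    """Return the URL's scheme if it is a supported protocol, else raise."""
    scheme = urlparse(url).scheme
    if scheme not in PROTOCOLS:
        raise ValueError(f"unsupported protocol: {scheme!r}")
    return scheme
```

Deriving the set from the `Literal` with `typing.get_args` keeps the static type hint and the runtime check in sync.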
BufferedFile ¶
Bases: AbstractBufferedFile
A buffered readable or writable file.
This is a wrapper around obstore.ReadableFile and obstore.WritableFile.
If you don't need the fsspec integration, you may be better served by using open_reader or open_writer directly.
__init__ ¶
__init__(
    fs: FsspecStore,
    path: str,
    mode: Literal["rb"] = "rb",
    *,
    buffer_size: int = 1024 * 1024,
    **kwargs: Any,
) -> None

__init__(
    fs: FsspecStore,
    path: str,
    mode: Literal["wb"],
    *,
    buffer_size: int = 10 * 1024 * 1024,
    attributes: Attributes | None = None,
    tags: dict[str, str] | None = None,
    **kwargs: Any,
) -> None

__init__(
    fs: FsspecStore,
    path: str,
    mode: Literal["rb", "wb"] = "rb",
    *,
    buffer_size: int | None = None,
    attributes: Attributes | None = None,
    tags: dict[str, str] | None = None,
    **kwargs: Any,
) -> None
Create new buffered file.
Parameters:
- fs (FsspecStore) – The underlying fsspec store to read from.
- path (str) – The path within the store to use.
- mode (Literal['rb', 'wb'], default: 'rb') – "rb" for a readable binary file or "wb" for a writable binary file. Defaults to "rb".
Other Parameters:
- attributes (Attributes | None) – Provide a set of Attributes. Only used when writing. Defaults to None.
- buffer_size (int | None) – Up to buffer_size bytes will be buffered in memory. When reading: the minimum number of bytes to read in a single request. When writing: if buffer_size is exceeded, data will be uploaded as a multipart upload in chunks of buffer_size. Defaults to None.
- tags (dict[str, str] | None) – Provide tags for this object. Only used when writing. Defaults to None.
- kwargs (Any) – Keyword arguments passed on to fsspec.spec.AbstractBufferedFile.
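To make the write-side meaning of buffer_size concrete, the arithmetic can be sketched as below. This is only an illustration of the description above ("uploaded as a multipart upload in chunks of buffer_size"), not obstore's actual upload logic. `chunk_sizes` is a hypothetical helper, and the 10 MiB default is taken from the "wb" overload of the signature.

```python
MIB = 1024 * 1024


def chunk_sizes(total_bytes: int, buffer_size: int = 10 * MIB) -> list:
    """Split a payload of total_bytes into consecutive chunks of buffer_size.

    Hypothetical helper: every chunk is buffer_size bytes except possibly
    the last one, which holds the remainder.
    """
    if total_bytes < 0 or buffer_size <= 0:
        raise ValueError("total_bytes must be >= 0 and buffer_size must be > 0")
    full, rest = divmod(total_bytes, buffer_size)
    return [buffer_size] * full + ([rest] if rest else [])


# A 25 MiB write with the 10 MiB buffer: two full chunks plus a remainder.
parts = chunk_sizes(25 * MIB)

# A 4 MiB write never exceeds the buffer, so it stays a single chunk.
small = chunk_sizes(4 * MIB)
```

A smaller buffer_size lowers peak memory use at the cost of more requests; a larger one does the opposite.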
flush ¶
flush(force: bool = False) -> None
Write buffered data to backend store.
Writes the current buffer, if it is larger than the block-size, or if the file is being closed.
Parameters:
- force (bool, default: False) – Unused.
read ¶
seek ¶
FsspecStore ¶
Bases: AsyncFileSystem
An fsspec implementation based on an obstore Store.
You should be able to pass an instance of this class into any API that expects an fsspec-style object.
__init__ ¶
__init__(
    protocol: Literal["s3", "s3a"],
    *args: Any,
    config: S3Config | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    **kwargs: Unpack[S3Config],
) -> None

__init__(
    protocol: Literal["gs"],
    *args: Any,
    config: GCSConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    **kwargs: Unpack[GCSConfig],
) -> None

__init__(
    protocol: Literal["az", "adl", "azure", "abfs", "abfss"],
    *args: Any,
    config: AzureConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    **kwargs: Unpack[AzureConfig],
) -> None

__init__(
    protocol: SUPPORTED_PROTOCOLS_T | str | None = None,
    *args: Any,
    config: S3Config | GCSConfig | AzureConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    asynchronous: bool = False,
    max_cache_size: int = 10,
    loop: Any = None,
    batch_size: int | None = None,
    **kwargs: Any,
) -> None
Construct a new FsspecStore.
Parameters:
- protocol (SUPPORTED_PROTOCOLS_T | str | None, default: None) – The storage protocol to use, such as "s3", "gcs", or "abfs". If None, the default class-level protocol is used. Defaults to None.
- config (S3Config | GCSConfig | AzureConfig | None, default: None) – Configuration for the cloud storage provider, which can be one of S3Config, GCSConfig, AzureConfig, or AzureConfigInput. Any of these values will be applied after checking for environment variables. If None, no cloud storage configuration is applied beyond what is found in environment variables.
- client_options (ClientConfig | None, default: None) – Additional options for configuring the client.
- retry_config (RetryConfig | None, default: None) – Configuration for handling request errors.
- args (Any, default: ()) – Positional arguments passed on to the fsspec.asyn.AsyncFileSystem constructor.
Other Parameters:
- asynchronous (bool) – Set to True if this instance is meant to be called using the fsspec async API. This should only be set to True when running within a coroutine.
- max_cache_size (int) – The maximum number of stores the cache should keep. A cached store is kept internally for each bucket name. Defaults to 10.
- loop (Any) – Since both fsspec (Python) and tokio (Rust) may be using event loops, this should be kept as None for now; it is not used.
- batch_size (int | None) – Some operations on many files will batch their requests; if you are seeing timeouts, you may want to set this number smaller than the defaults, which are determined in fsspec.asyn._get_batch_size.
- kwargs (Any) – Per-store configuration passed down to store-specific builders.
Examples:
from obstore.fsspec import FsspecStore
store = FsspecStore("https")
resp = store.cat_file("https://raw.githubusercontent.com/developmentseed/obstore/refs/heads/main/README.md")
assert resp.startswith(b"# obstore")
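The max_cache_size parameter describes a bounded cache with one store per bucket name. The sketch below shows that behaviour with the standard library's `functools.lru_cache`. It is an assumption-laden illustration, not obstore's implementation: `make_store_cache` and `store_for` are hypothetical names, and a plain dict stands in for a real store object.

```python
from functools import lru_cache


def make_store_cache(max_cache_size: int = 10):
    """Return a function that builds at most max_cache_size cached 'stores'."""

    @lru_cache(maxsize=max_cache_size)
    def store_for(bucket: str) -> dict:
        # Stand-in for constructing a real store for this bucket.
        return {"bucket": bucket}

    return store_for


store_for = make_store_cache(max_cache_size=2)

a1 = store_for("bucket-a")  # miss: the store is built
a2 = store_for("bucket-a")  # hit: the same object is reused
store_for("bucket-b")       # miss: the cache is now full
store_for("bucket-c")       # miss: evicts bucket-a, the least recently used
a3 = store_for("bucket-a")  # miss: bucket-a has to be rebuilt

info = store_for.cache_info()
```

If a workload touches more buckets than max_cache_size, stores are rebuilt repeatedly, so raising the limit may help in that case.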
register ¶
register(
    protocol: SUPPORTED_PROTOCOLS_T
    | str
    | Iterable[SUPPORTED_PROTOCOLS_T]
    | Iterable[str]
    | None = None,
    *,
    asynchronous: bool = False,
) -> None
Dynamically register a subclass of FsspecStore for the given protocol(s).
This function creates a new subclass of FsspecStore with the specified protocol and registers it with fsspec. If multiple protocols are provided, the function registers each one individually.
Parameters:
- protocol (SUPPORTED_PROTOCOLS_T | str | Iterable[SUPPORTED_PROTOCOLS_T] | Iterable[str] | None, default: None) – A single protocol (e.g., "s3", "gcs", "abfs") or a list of protocols to register FsspecStore for. Defaults to None, which will register obstore as the provider for all supported protocols except for file:// and memory://. If you wish to use obstore via fsspec for file:// or memory:// URLs, list them explicitly.
- asynchronous (bool, default: False) – If True, the registered store will support asynchronous operations. Defaults to False.
Example:
from obstore.fsspec import register
# Register obstore as the default handler for all supported protocols except for
# `memory://` and `file://`
register()
# Registers a store for "s3" only
register("s3")
# Registers an async store for "s3"
register("s3", asynchronous=True)
# Registers both "gcs" and "abfs"
register(["gcs", "abfs"])
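The protocol argument accepts None, a single string, or an iterable of strings. A minimal sketch of how such an argument could be normalized into a list is shown here. `normalize_protocols` and `SUPPORTED` are hypothetical names, the set is a local copy of SUPPORTED_PROTOCOLS, and the real function may handle its input differently.

```python
from collections.abc import Iterable
from typing import Optional, Union

# Local copy of SUPPORTED_PROTOCOLS (an assumption for this sketch).
SUPPORTED = {
    "abfs", "abfss", "adl", "az", "azure", "file", "gcs",
    "gs", "http", "https", "memory", "s3", "s3a",
}


def normalize_protocols(
    protocol: Optional[Union[str, Iterable]] = None,
) -> list:
    """Turn the documented input forms into a plain list of protocol names."""
    if protocol is None:
        # Documented default: everything except file:// and memory://.
        return sorted(SUPPORTED - {"file", "memory"})
    if isinstance(protocol, str):
        # A bare string is one protocol, not an iterable of characters.
        return [protocol]
    return list(protocol)


defaults = normalize_protocols()
single = normalize_protocols("s3")
several = normalize_protocols(["gcs", "abfs"])
```

Checking for `str` before treating the argument as an iterable matters, because iterating over "s3" would otherwise yield "s" and "3".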
Notes
- Each protocol gets a dynamically generated subclass named FsspecStore_<protocol>. This avoids modifying the original FsspecStore class.
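The naming scheme in the note above can be sketched with Python's built-in `type()`. This is a sketch of the general technique only: `BaseStore` is a hypothetical stand-in for FsspecStore, `make_subclass` is a hypothetical helper, and the real register function also performs the fsspec registration, which is omitted here.

```python
class BaseStore:
    """Hypothetical stand-in for FsspecStore."""

    protocol = None


def make_subclass(base: type, protocol: str) -> type:
    """Create a subclass named '<Base>_<protocol>' with its own protocol attribute."""
    return type(f"{base.__name__}_{protocol}", (base,), {"protocol": protocol})


S3Store = make_subclass(BaseStore, "s3")
GcsStore = make_subclass(BaseStore, "gcs")
```

Because each protocol gets its own subclass, setting `protocol` on one of them leaves the base class and the other subclasses untouched.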