File-like Object
Native support for reading from and writing to object stores through file-like objects.
Use obstore.open_reader or obstore.open_reader_async to open readable files. Use obstore.open_writer or obstore.open_writer_async to open writable files.
obstore.open_reader
open_reader(
store: ObjectStore, path: str, *, buffer_size: int = 1024 * 1024
) -> ReadableFile
Open a readable file object from the specified location.
Parameters:
- store (ObjectStore) – The ObjectStore instance to use.
- path (str) – The path within ObjectStore to retrieve.
Other Parameters:
- buffer_size (int) – The minimum number of bytes to read in a single request. Up to buffer_size bytes will be buffered in memory.
Returns:
- ReadableFile – ReadableFile
obstore.open_reader_async async
open_reader_async(
store: ObjectStore, path: str, *, buffer_size: int = 1024 * 1024
) -> AsyncReadableFile
Call open_reader asynchronously, returning a readable file object with asynchronous operations.
Refer to the documentation for open_reader.
obstore.open_writer
open_writer(
store: ObjectStore,
path: str,
*,
attributes: Attributes | None = None,
buffer_size: int = 10 * 1024 * 1024,
tags: dict[str, str] | None = None,
max_concurrency: int = 12,
) -> WritableFile
Open a writable file object at the specified location.
Parameters:
- store (ObjectStore) – The ObjectStore instance to use.
- path (str) – The path within ObjectStore to write to.
Other Parameters:
- attributes (Attributes | None) – Provide a set of Attributes. Defaults to None.
- buffer_size (int) – The underlying buffer size to use. Up to buffer_size bytes will be buffered in memory. If buffer_size is exceeded, data will be uploaded as a multipart upload in chunks of buffer_size.
- tags (dict[str, str] | None) – Provide tags for this object. Defaults to None.
- max_concurrency (int) – The maximum number of chunks to upload concurrently. Defaults to 12.
Returns:
- WritableFile – WritableFile
obstore.open_writer_async
open_writer_async(
store: ObjectStore,
path: str,
*,
attributes: Attributes | None = None,
buffer_size: int = 10 * 1024 * 1024,
tags: dict[str, str] | None = None,
max_concurrency: int = 12,
) -> AsyncWritableFile
Open an asynchronous writable file object at the specified location.
Refer to the documentation for open_writer.
obstore.ReadableFile
A synchronous buffered reader that implements an interface similar to Python's BufferedReader.
Internally this maintains a buffer of the requested size, and uses get_range to populate its internal buffer once depleted. This buffer is cleared on seek.
Whilst simple, this interface will typically be outperformed by the native obstore
methods that better map to the network APIs. This is because most object stores have
very high first-byte latencies, on the order of 100-200ms, and so avoiding
unnecessary round-trips is critical to throughput.
Systems looking to sequentially scan a file should instead consider using get, or get_range to read a particular range.
Systems looking to read multiple ranges of a file should instead consider using get_ranges, which will optimise the vectored IO.
read
Read up to size bytes from the object and return them.
As a convenience, if size is unspecified or None, all bytes until EOF are returned.
readlines
Read all remaining lines into a list of buffers.
seek
Change the stream position.
Change the stream position to the given byte offset, interpreted relative to the position indicated by whence, and return the new absolute position. Values for whence are:
- os.SEEK_SET or 0: start of the stream (the default); offset should be zero or positive
- os.SEEK_CUR or 1: current stream position; offset may be negative
- os.SEEK_END or 2: end of the stream; offset is usually negative
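The whence values follow the same convention as Python's standard io streams. The sketch below uses io.BytesIO purely as a stand-in to show the arithmetic; it is not an obstore object, and it assumes ReadableFile.seek follows the semantics listed above.

```python
import io
import os

# io.BytesIO stands in for a ReadableFile; the whence semantics are the same.
stream = io.BytesIO(b"0123456789")

pos_set = stream.seek(2, os.SEEK_SET)   # 2 bytes from the start
pos_cur = stream.seek(3, os.SEEK_CUR)   # 3 bytes forward from the current position
pos_end = stream.seek(-4, os.SEEK_END)  # 4 bytes before the end
tail = stream.read()                    # everything from the new position to EOF

print(pos_set, pos_cur, pos_end, tail)  # 2 5 6 b'6789'
```

Unlike an in-memory stream, a seek on ReadableFile discards the internal buffer, so the next read triggers a new range request.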
obstore.AsyncReadableFile
An async buffered reader that implements an interface similar to Python's BufferedReader.
Internally this maintains a buffer of the requested size, and uses get_range to populate its internal buffer once depleted. This buffer is cleared on seek.
Whilst simple, this interface will typically be outperformed by the native obstore
methods that better map to the network APIs. This is because most object stores have
very high first-byte latencies, on the order of 100-200ms, and so avoiding
unnecessary round-trips is critical to throughput.
Systems looking to sequentially scan a file should instead consider using get, or get_range to read a particular range.
Systems looking to read multiple ranges of a file should instead consider using get_ranges, which will optimise the vectored IO.
read async
Read up to size bytes from the object and return them.
As a convenience, if size is unspecified or None, all bytes until EOF are returned.
readline async
readline() -> Bytes
Read a single line of the file, up until the next newline character.
readlines async
Read all remaining lines into a list of buffers.
seek async
Change the stream position.
Change the stream position to the given byte offset, interpreted relative to the position indicated by whence, and return the new absolute position. Values for whence are:
- os.SEEK_SET or 0: start of the stream (the default); offset should be zero or positive
- os.SEEK_CUR or 1: current stream position; offset may be negative
- os.SEEK_END or 2: end of the stream; offset is usually negative
obstore.WritableFile
Bases: AbstractContextManager
A buffered writable file object with synchronous operations.
This implements an interface similar to Python's BufferedWriter.
closed
closed() -> bool
Check whether this file has been closed.
Note that this is a method, not an attribute.
flush
flush() -> None
Flushes this output stream, ensuring that all intermediately buffered contents reach their destination.
write
Write the bytes-like object, buffer, and return the number of bytes written.
obstore.AsyncWritableFile
Bases: AbstractAsyncContextManager
A buffered writable file object with asynchronous operations.
closed async
closed() -> bool
Check whether this file has been closed.
Note that this is an async method, not an attribute.
flush async
flush() -> None
Flushes this output stream, ensuring that all intermediately buffered contents reach their destination.
write async
Write the bytes-like object, buffer, and return the number of bytes written.