File-like Object
Native support for reading from and writing to object stores through file-like objects.
Use obstore.open_reader or obstore.open_reader_async to open readable files. Use obstore.open_writer or obstore.open_writer_async to open writable files.
obstore.open_reader
open_reader(
store: ObjectStore, path: str, *, buffer_size: int = 1024 * 1024
) -> ReadableFile
Open a readable file object from the specified location.
Parameters:
- store (ObjectStore) – The ObjectStore instance to use.
- path (str) – The path within the ObjectStore to retrieve.
Other Parameters:
- buffer_size (int) – The number of bytes to read in a single request. Up to buffer_size bytes will be buffered in memory. Defaults to 1 MiB (1024 * 1024 bytes).
Returns:
- ReadableFile – A readable file object.
obstore.open_reader_async (async)
open_reader_async(
store: ObjectStore, path: str, *, buffer_size: int = 1024 * 1024
) -> AsyncReadableFile
Call open_reader asynchronously, returning a readable file object with asynchronous operations.
Refer to the documentation for open_reader.
obstore.open_writer
open_writer(
store: ObjectStore,
path: str,
*,
attributes: Attributes | None = None,
buffer_size: int = 10 * 1024 * 1024,
tags: Dict[str, str] | None = None,
max_concurrency: int = 12,
) -> WritableFile
Open a writable file object at the specified location.
Parameters:
- store (ObjectStore) – The ObjectStore instance to use.
- path (str) – The path within the ObjectStore to write to.
Other Parameters:
- attributes (Attributes | None) – Provide a set of Attributes. Defaults to None.
- buffer_size (int) – The underlying buffer size to use. Up to buffer_size bytes will be buffered in memory. If buffer_size is exceeded, data will be uploaded as a multipart upload in chunks of buffer_size. Defaults to 10 MiB (10 * 1024 * 1024 bytes).
- tags (Dict[str, str] | None) – Provide tags for this object. Defaults to None.
- max_concurrency (int) – The maximum number of chunks to upload concurrently. Defaults to 12.
Returns:
- WritableFile – A writable file object.
obstore.open_writer_async
open_writer_async(
store: ObjectStore,
path: str,
*,
attributes: Attributes | None = None,
buffer_size: int = 10 * 1024 * 1024,
tags: Dict[str, str] | None = None,
max_concurrency: int = 12,
) -> AsyncWritableFile
Open an asynchronous writable file object at the specified location.
Refer to the documentation for open_writer.
obstore.ReadableFile
A synchronous buffered reader that implements an interface similar to Python's BufferedReader.
Internally, this maintains a buffer of the requested size and uses get_range to repopulate the buffer once it is depleted. The buffer is cleared on seek.
Whilst simple, this interface will typically be outperformed by the native obstore methods, which map more directly onto the network APIs. This is because most object stores have very high first-byte latencies, on the order of 100-200 ms, so avoiding unnecessary round-trips is critical to throughput.
Systems looking to scan a file sequentially should instead consider using get, or get_range to read a particular range.
Systems looking to read multiple ranges of a file should instead consider using get_ranges, which will optimise the vectored IO.
read
Read up to size bytes from the object and return them. As a convenience, if size is unspecified or None, all bytes until EOF are returned.
readall
readall() -> Bytes
Read and return all the bytes from the stream until EOF, using multiple calls to the stream if necessary.
readlines
Read all remaining lines into a list of buffers.
seek
Change the stream position to the given byte offset, interpreted relative to the position indicated by whence, and return the new absolute position. Values for whence are:
- os.SEEK_SET or 0: start of the stream (the default); offset should be zero or positive
- os.SEEK_CUR or 1: current stream position; offset may be negative
- os.SEEK_END or 2: end of the stream; offset is usually negative
obstore.AsyncReadableFile
An asynchronous buffered reader that implements an interface similar to Python's BufferedReader.
Internally, this maintains a buffer of the requested size and uses get_range to repopulate the buffer once it is depleted. The buffer is cleared on seek.
Whilst simple, this interface will typically be outperformed by the native obstore methods, which map more directly onto the network APIs. This is because most object stores have very high first-byte latencies, on the order of 100-200 ms, so avoiding unnecessary round-trips is critical to throughput.
Systems looking to scan a file sequentially should instead consider using get, or get_range to read a particular range.
Systems looking to read multiple ranges of a file should instead consider using get_ranges, which will optimise the vectored IO.
read (async)
Read up to size bytes from the object and return them. As a convenience, if size is unspecified or None, all bytes until EOF are returned.
readall (async)
readall() -> Bytes
Read and return all the bytes from the stream until EOF, using multiple calls to the stream if necessary.
readline (async)
readline() -> Bytes
Read a single line of the file, up until the next newline character.
readlines (async)
Read all remaining lines into a list of buffers.
seek (async)
Change the stream position to the given byte offset, interpreted relative to the position indicated by whence, and return the new absolute position. Values for whence are:
- os.SEEK_SET or 0: start of the stream (the default); offset should be zero or positive
- os.SEEK_CUR or 1: current stream position; offset may be negative
- os.SEEK_END or 2: end of the stream; offset is usually negative
obstore.WritableFile
Bases: AbstractContextManager
A buffered writable file object with synchronous operations.
This implements an interface similar to Python's BufferedWriter.
closed
closed() -> bool
Returns True if the current file has already been closed.
Note that this is a method, not an attribute.
flush
flush() -> None
Flushes this output stream, ensuring that all intermediately buffered contents reach their destination.
write
Write the bytes-like object, buffer, and return the number of bytes written.
obstore.AsyncWritableFile
Bases: AbstractAsyncContextManager
A buffered writable file object with asynchronous operations.
closed (async)
closed() -> bool
Returns True if the current file has already been closed.
Note that this is an async method, not an attribute.
flush (async)
flush() -> None
Flushes this output stream, ensuring that all intermediately buffered contents reach their destination.
write (async)
Write the bytes-like object, buffer, and return the number of bytes written.