
File-like Object

Native support for reading from and writing to object stores through file-like objects.

Use obstore.open_reader or obstore.open_reader_async to open readable files. Use obstore.open_writer or obstore.open_writer_async to open writable files.

obstore.open_reader

open_reader(
    store: ObjectStore, path: str, *, buffer_size: int = 1024 * 1024
) -> ReadableFile

Open a readable file object from the specified location.

Parameters:

  • store (ObjectStore) –

    The ObjectStore instance to use.

  • path (str) –

    The path within ObjectStore to retrieve.

Other Parameters:

  • buffer_size (int) –

    The number of bytes to read in a single request. Up to buffer_size bytes will be buffered in memory.

Returns:

  • ReadableFile

obstore.open_reader_async async

open_reader_async(
    store: ObjectStore, path: str, *, buffer_size: int = 1024 * 1024
) -> AsyncReadableFile

Call open_reader asynchronously, returning a readable file object with asynchronous operations.

Refer to the documentation for open_reader.

obstore.open_writer

open_writer(
    store: ObjectStore,
    path: str,
    *,
    attributes: Attributes | None = None,
    buffer_size: int = 10 * 1024 * 1024,
    tags: Dict[str, str] | None = None,
    max_concurrency: int = 12,
) -> WritableFile

Open a writable file object at the specified location.

Parameters:

  • store (ObjectStore) –

    The ObjectStore instance to use.

  • path (str) –

    The path within ObjectStore to write to.

Other Parameters:

  • attributes (Attributes | None) –

    Provide a set of Attributes. Defaults to None.

  • buffer_size (int) –

    The underlying buffer size to use. Up to buffer_size bytes will be buffered in memory. If buffer_size is exceeded, data will be uploaded as a multipart upload in chunks of buffer_size.

  • tags (Dict[str, str] | None) –

    Provide tags for this object. Defaults to None.

  • max_concurrency (int) –

    The maximum number of chunks to upload concurrently. Defaults to 12.

Returns:

  • WritableFile

obstore.open_writer_async

open_writer_async(
    store: ObjectStore,
    path: str,
    *,
    attributes: Attributes | None = None,
    buffer_size: int = 10 * 1024 * 1024,
    tags: Dict[str, str] | None = None,
    max_concurrency: int = 12,
) -> AsyncWritableFile

Open an asynchronous writable file object at the specified location.

Refer to the documentation for open_writer.
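To make the interaction between buffer_size and max_concurrency concrete, here is a rough back-of-envelope sketch. The helper estimate_upload_plan is hypothetical and not part of obstore. It assumes that an object no larger than buffer_size is written in a single request, that a larger one is split into parts of exactly buffer_size bytes (the last part possibly shorter), and that parts are uploaded in groups of at most max_concurrency. Real scheduling is unlikely to proceed in strict groups, so treat the result as an estimate only.

```python
MIB = 1024 * 1024


def estimate_upload_plan(
    total_bytes: int,
    buffer_size: int = 10 * MIB,
    max_concurrency: int = 12,
) -> dict:
    """Hypothetical helper (not an obstore API): estimate how a write is split."""
    if total_bytes <= buffer_size:
        # Everything fits in the buffer, so no multipart upload is assumed.
        return {"multipart": False, "parts": 1, "waves": 1}
    # Integer ceiling division: number of buffer_size-sized parts.
    parts = (total_bytes + buffer_size - 1) // buffer_size
    # Simplification: parts are uploaded in groups of max_concurrency.
    waves = (parts + max_concurrency - 1) // max_concurrency
    return {"multipart": True, "parts": parts, "waves": waves}


plan = estimate_upload_plan(250 * MIB)
small = estimate_upload_plan(5 * MIB)
print(plan)   # {'multipart': True, 'parts': 25, 'waves': 3}
print(small)  # {'multipart': False, 'parts': 1, 'waves': 1}
```

Under these assumptions, raising buffer_size reduces the number of parts at the cost of more memory per in-flight part.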

obstore.ReadableFile

A synchronous buffered reader that implements an interface similar to Python's BufferedReader.

Internally this maintains a buffer of the requested size, and uses get_range to populate its internal buffer once depleted. This buffer is cleared on seek.

Whilst simple, this interface will typically be outperformed by the native obstore methods that better map to the network APIs. This is because most object stores have very high first-byte latencies, on the order of 100-200ms, and so avoiding unnecessary round-trips is critical to throughput.

Systems looking to sequentially scan a file should instead consider using get, or get_range to read a particular range.

Systems looking to read multiple ranges of a file should instead consider using get_ranges, which will optimise the vectored IO.
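The latency argument above can be illustrated with simple arithmetic. The sketch below assumes one get_range request per buffer refill, no overlap between requests, and the 100-200 ms first-byte latency quoted above. Both helper functions are hypothetical and only model the cost; they do not call obstore.

```python
MIB = 1024 * 1024


def sequential_scan_requests(object_size: int, buffer_size: int = MIB) -> int:
    """Estimated number of buffer refills needed to scan the whole object."""
    return max(1, (object_size + buffer_size - 1) // buffer_size)


def latency_overhead_ms(requests: int, first_byte_latency_ms: int) -> int:
    """Total time spent waiting for first bytes if requests run one after another."""
    return requests * first_byte_latency_ms


size = 100 * MIB
buffered_requests = sequential_scan_requests(size)          # 1 MiB buffer
low_ms = latency_overhead_ms(buffered_requests, 100)
high_ms = latency_overhead_ms(buffered_requests, 200)
single_get_ms = latency_overhead_ms(1, 200)                 # one get for the whole object

print(buffered_requests)  # 100
print(low_ms, high_ms)    # 10000 20000
print(single_get_ms)      # 200
```

In this model, scanning a 100 MiB object through a 1 MiB buffer spends 10 to 20 seconds on first-byte latency alone, against a fraction of a second for a single get.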

meta property

meta: ObjectMeta

Access the metadata of the underlying file.

size property

size: int

The size in bytes of the object.

close

close() -> None

Close the current file.

This is currently a no-op.

read

read(size: int | None = None) -> Bytes

Read up to size bytes from the object and return them. As a convenience, if size is unspecified or None, all bytes until EOF are returned.
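A common pattern with read(size) is a chunked loop that stops at the first empty result. The sketch below is written against any object exposing read(size), and is exercised here with io.BytesIO so that it runs without network access. It assumes that ReadableFile.read returns an empty buffer at EOF, as BufferedReader does. The function demo_with_obstore is not executed here; it shows how the helper might be used with obstore, assuming the installed version provides obstore.put and obstore.store.MemoryStore.

```python
import io


def read_in_chunks(reader, chunk_size: int):
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    while True:
        chunk = reader.read(chunk_size)
        if len(chunk) == 0:  # assumption: an empty result signals EOF
            break
        # bytes() copies the buffer, so obstore's Bytes and plain bytes both work.
        yield bytes(chunk)


def demo_with_obstore():
    """Not run here. Assumes obstore.put and obstore.store.MemoryStore exist."""
    import obstore
    from obstore.store import MemoryStore

    store = MemoryStore()
    obstore.put(store, "demo.bin", b"x" * 10)
    reader = obstore.open_reader(store, "demo.bin", buffer_size=4)
    return list(read_in_chunks(reader, 3))


# Local stand-in for a ReadableFile.
chunks = list(read_in_chunks(io.BytesIO(b"abcdefgh"), 3))
print(chunks)  # [b'abc', b'def', b'gh']
```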

readall

readall() -> Bytes

Read and return all the bytes from the stream until EOF, using multiple calls to the stream if necessary.

readline

readline() -> Bytes

Read a single line of the file, up until the next newline character.

readlines

readlines(hint: int = -1) -> List[Bytes]

Read all remaining lines into a list of buffers.

seek

seek(offset: int, whence: int = SEEK_SET) -> int

Change the stream position to the given byte offset, interpreted relative to the position indicated by whence, and return the new absolute position. Values for whence are:

  • os.SEEK_SET or 0: start of the stream (the default); offset should be zero or positive
  • os.SEEK_CUR or 1: current stream position; offset may be negative
  • os.SEEK_END or 2: end of the stream; offset is usually negative
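The three whence values can be seen in a short sketch. It uses io.BytesIO from the standard library as a local stand-in, on the assumption that ReadableFile.seek follows the same semantics as described above. Remember that on a ReadableFile each seek also clears the internal buffer.

```python
import io
import os

stream = io.BytesIO(b"0123456789")

pos_set = stream.seek(3, os.SEEK_SET)   # absolute: 3 bytes from the start
pos_cur = stream.seek(2, os.SEEK_CUR)   # relative: 2 bytes past the current position
pos_end = stream.seek(-4, os.SEEK_END)  # relative to the end: 4 bytes before EOF
data = stream.read(2)
pos_after_read = stream.tell()

print(pos_set, pos_cur, pos_end)  # 3 5 6
print(data)                       # b'67'
print(pos_after_read)             # 8
```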

seekable

seekable() -> bool

Return True if the stream supports random access.

tell

tell() -> int

Return the current stream position.

obstore.AsyncReadableFile

An asynchronous buffered reader that implements an interface similar to Python's BufferedReader.

Internally this maintains a buffer of the requested size, and uses get_range to populate its internal buffer once depleted. This buffer is cleared on seek.

Whilst simple, this interface will typically be outperformed by the native obstore methods that better map to the network APIs. This is because most object stores have very high first-byte latencies, on the order of 100-200ms, and so avoiding unnecessary round-trips is critical to throughput.

Systems looking to sequentially scan a file should instead consider using get, or get_range to read a particular range.

Systems looking to read multiple ranges of a file should instead consider using get_ranges, which will optimise the vectored IO.

meta property

meta: ObjectMeta

Access the metadata of the underlying file.

size property

size: int

The size in bytes of the object.

close

close() -> None

Close the current file.

This is currently a no-op.

read async

read(size: int | None = None) -> Bytes

Read up to size bytes from the object and return them. As a convenience, if size is unspecified or None, all bytes until EOF are returned.

readall async

readall() -> Bytes

Read and return all the bytes from the stream until EOF, using multiple calls to the stream if necessary.

readline async

readline() -> Bytes

Read a single line of the file, up until the next newline character.

readlines async

readlines(hint: int = -1) -> List[Bytes]

Read all remaining lines into a list of buffers.

seek async

seek(offset: int, whence: int = SEEK_SET) -> int

Change the stream position to the given byte offset, interpreted relative to the position indicated by whence, and return the new absolute position. Values for whence are:

  • os.SEEK_SET or 0: start of the stream (the default); offset should be zero or positive
  • os.SEEK_CUR or 1: current stream position; offset may be negative
  • os.SEEK_END or 2: end of the stream; offset is usually negative

seekable

seekable() -> bool

Return True if the stream supports random access.

tell async

tell() -> int

Return the current stream position.
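Because read, seek, and tell are coroutines on AsyncReadableFile, each call must be awaited. The sketch below shows the calling pattern for reading the tail of an object. FakeAsyncReader is a hypothetical in-memory stand-in that mimics the awaitable methods listed above, so the example runs without an object store; an object returned by open_reader_async could be passed to read_tail instead.

```python
import asyncio
import io


class FakeAsyncReader:
    """Hypothetical stand-in exposing awaitable read/seek/tell over bytes."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    async def read(self, size=None):
        return self._buf.read(size)

    async def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._buf.seek(offset, whence)

    async def tell(self) -> int:
        return self._buf.tell()


async def read_tail(reader, n: int) -> bytes:
    """Return the last n bytes by seeking relative to the end of the stream."""
    await reader.seek(-n, io.SEEK_END)
    return bytes(await reader.read())


tail = asyncio.run(read_tail(FakeAsyncReader(b"header|payload|footer"), 6))
print(tail)  # b'footer'
```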

obstore.WritableFile

Bases: AbstractContextManager

A buffered writable file object with synchronous operations.

This implements an interface similar to Python's BufferedWriter.

close

close() -> None

Close the current file.

closed

closed() -> bool

Returns True if the current file has already been closed.

Note that this is a method, not an attribute.

flush

flush() -> None

Flushes this output stream, ensuring that all intermediately buffered contents reach their destination.

write

write(buffer: bytes | Buffer) -> int

Write the bytes-like object, buffer, and return the number of bytes written.
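Since write returns the number of bytes written, a caller can track the total size as it goes. The helper write_records below is hypothetical and works with any object exposing write(buffer). It is exercised with io.BytesIO as a local stand-in for a WritableFile.

```python
import io


def write_records(writer, records) -> int:
    """Write each record followed by a newline; return the total bytes written."""
    total = 0
    for record in records:
        total += writer.write(record + b"\n")
    return total


# io.BytesIO is used here in place of a WritableFile.
with io.BytesIO() as sink:
    written = write_records(sink, [b"alpha", b"beta"])
    contents = sink.getvalue()

print(written)   # 11
print(contents)  # b'alpha\nbeta\n'
```

Because WritableFile is a context manager, the same helper could be called inside a `with obstore.open_writer(store, path) as writer:` block.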

obstore.AsyncWritableFile

Bases: AbstractAsyncContextManager

A buffered writable file object with asynchronous operations.

close async

close() -> None

Close the current file.

closed async

closed() -> bool

Returns True if the current file has already been closed.

Note that this is an async method, not an attribute.

flush async

flush() -> None

Flushes this output stream, ensuring that all intermediately buffered contents reach their destination.

write async

write(buffer: bytes | Buffer) -> int

Write the bytes-like object, buffer, and return the number of bytes written.
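The asynchronous writer is used with `async with` and awaited method calls. The sketch below shows that calling pattern. FakeAsyncWriter is a hypothetical in-memory stand-in, not an obstore class, and it assumes that leaving the `async with` block closes the file. An object returned by open_writer_async could be passed to upload in its place.

```python
import asyncio


class FakeAsyncWriter:
    """Hypothetical stand-in that mimics the awaitable methods listed above."""

    def __init__(self) -> None:
        self.chunks = []
        self._is_closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        await self.close()  # assumption: exiting the context closes the file
        return False

    async def write(self, buffer) -> int:
        data = bytes(buffer)
        self.chunks.append(data)
        return len(data)

    async def flush(self) -> None:
        pass

    async def close(self) -> None:
        self._is_closed = True

    async def closed(self) -> bool:
        return self._is_closed


async def upload(writer, parts) -> int:
    """Write all parts, flush, and return the total number of bytes written."""
    total = 0
    async with writer as w:
        for part in parts:
            total += await w.write(part)
        await w.flush()
    return total


fake_writer = FakeAsyncWriter()
total_written = asyncio.run(upload(fake_writer, [b"ab", b"cde"]))
is_closed = asyncio.run(fake_writer.closed())
print(total_written)  # 5
print(is_closed)      # True
```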