Get

obstore.get

get(
    store: ObjectStore, path: str, *, options: GetOptions | None = None
) -> GetResult

Return the bytes that are stored at the specified location.

Parameters:

  • store (ObjectStore) –

    The ObjectStore instance to use.

  • path (str) –

    The path within ObjectStore to retrieve.

  • options (GetOptions | None, default: None ) –

    Options for the get request. Defaults to None.

Returns:

  • GetResult

    The result of the request. See obstore.GetResult.

obstore.get_async async

get_async(
    store: ObjectStore, path: str, *, options: GetOptions | None = None
) -> GetResult

Call get asynchronously.

Refer to the documentation for get.

obstore.get_range

get_range(store: ObjectStore, path: str, start: int, end: int) -> Bytes

Return the bytes that are stored at the specified location in the given byte range.

If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
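The rules above can be modelled in a few lines of plain Python. This is only an illustrative sketch of the documented behaviour, not obstore's implementation: resolve_range is a hypothetical helper, and it raises ValueError where obstore would raise its own error types.

```python
from __future__ import annotations


def resolve_range(start: int, end: int, size: int) -> tuple[int, int]:
    """Model the documented get_range rules for an object of `size` bytes.

    Hypothetical helper for illustration; not part of obstore.
    """
    if end <= start:
        raise ValueError("range is zero-length")
    if start >= size:
        raise ValueError("range starts after the end of the object")
    # A range that runs past the object is truncated to the remainder.
    return start, min(end, size)


data = b"0123456789"

# The requested end (100) lies past the object, so the remainder is returned.
start, end = resolve_range(4, 100, len(data))
remainder = data[start:end]

# A range fully inside the object is returned exactly as requested.
start, end = resolve_range(2, 5, len(data))
exact = data[start:end]
```

Here remainder holds the last six bytes of the object and exact holds bytes 2 through 4, since the end offset is exclusive.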

Parameters:

  • store (ObjectStore) –

    The ObjectStore instance to use.

  • path (str) –

    The path within ObjectStore to retrieve.

  • start (int) –

    The start of the byte range.

  • end (int) –

    The end of the byte range (exclusive).

Returns:

  • Bytes

    A Bytes object implementing the Python buffer protocol, allowing zero-copy access to the underlying memory provided by Rust.

obstore.get_range_async async

get_range_async(store: ObjectStore, path: str, start: int, end: int) -> Bytes

Call get_range asynchronously.

Refer to the documentation for get_range.

obstore.get_ranges

get_ranges(
    store: ObjectStore, path: str, starts: Sequence[int], ends: Sequence[int]
) -> List[Bytes]

Return the bytes that are stored at the specified location in the given byte ranges.

To improve performance this will:

  • Combine ranges less than 10MB apart into a single call to fetch
  • Make multiple fetch requests in parallel (up to a maximum of 10)
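The first optimization can be pictured with a simplified pure-Python model. This is a sketch under stated assumptions, not the code obstore runs: coalesce_ranges is a hypothetical helper, and the gap threshold is passed in explicitly rather than hard-coded.

```python
from typing import List, Sequence, Tuple


def coalesce_ranges(
    starts: Sequence[int], ends: Sequence[int], max_gap: int
) -> List[Tuple[int, int]]:
    """Merge (start, end) ranges whose gap is smaller than `max_gap` bytes.

    Hypothetical helper for illustration; not part of obstore.
    """
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    merged: List[List[int]] = []
    for start, end in sorted(zip(starts, ends)):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous fetch: extend it instead of
            # issuing a separate request.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


# Three requested ranges; the first two are only 90 bytes apart.
fetches = coalesce_ranges([0, 100, 5000], [10, 200, 6000], max_gap=1000)
```

With a 1000-byte threshold the three requested ranges collapse into two fetches, (0, 200) and (5000, 6000). Each requested range would then be sliced back out of the fetched span that contains it.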

Parameters:

  • store (ObjectStore) –

    The ObjectStore instance to use.

  • path (str) –

    The path within ObjectStore to retrieve.

  • starts (Sequence[int]) –

    A sequence of start offsets, one per range.

  • ends (Sequence[int]) –

    A sequence of end offsets (exclusive), one per range.

Returns:

  • List[Bytes]

    A sequence of Bytes, one for each range. Each Bytes object implements the Python buffer protocol, allowing zero-copy access to the underlying memory provided by Rust.

obstore.get_ranges_async async

get_ranges_async(
    store: ObjectStore, path: str, starts: Sequence[int], ends: Sequence[int]
) -> List[Bytes]

Call get_ranges asynchronously.

Refer to the documentation for get_ranges.

obstore.GetOptions

Bases: TypedDict

Options for a get request.

All options are optional.

head instance-attribute

head: bool

Request transfer of no content

datatracker.ietf.org/doc/html/rfc9110#name-head

if_match instance-attribute

if_match: str | None

Request will succeed if the ObjectMeta::e_tag matches, otherwise returning PreconditionError.

See datatracker.ietf.org/doc/html/rfc9110#name-if-match

Examples:

If-Match: "xyzzy"
If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-Match: *

if_modified_since instance-attribute

if_modified_since: datetime | None

Request will succeed if the object has been modified since the given time, otherwise returning NotModifiedError.

Some stores, such as S3, will only return NotModified for exact timestamp matches, instead of for any timestamp greater than or equal.

datatracker.ietf.org/doc/html/rfc9110#section-13.1.3

if_none_match instance-attribute

if_none_match: str | None

Request will succeed if the ObjectMeta::e_tag does not match, otherwise returning NotModifiedError.

See datatracker.ietf.org/doc/html/rfc9110#section-13.1.2

Examples:

If-None-Match: "xyzzy"
If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-None-Match: *

if_unmodified_since instance-attribute

if_unmodified_since: datetime | None

Request will succeed if the object has not been modified since the given time, otherwise returning PreconditionError.

datatracker.ietf.org/doc/html/rfc9110#section-13.1.4

range instance-attribute

Request transfer of only the specified range of bytes, otherwise returning NotModifiedError.

The accepted forms are:

  • (int, int): Request a specific range of bytes (start, end).

    If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.

    The end offset is exclusive.

  • {"offset": int}: Request all bytes starting from a given byte offset.

    This is equivalent to bytes={int}- as an HTTP header.

  • {"suffix": int}: Request the last int bytes. Note that here, int is the size of the request, not the byte offset.

    This is equivalent to bytes=-{int} as an HTTP header.

datatracker.ietf.org/doc/html/rfc9110#name-range
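The three forms above map onto HTTP Range headers as sketched below. to_http_range is a hypothetical helper written for this illustration, not an obstore function. The tuple case relies on the fact that HTTP byte ranges are inclusive, while the end offset here is exclusive.

```python
from typing import Dict, Tuple, Union

RangeSpec = Union[Tuple[int, int], Dict[str, int]]


def to_http_range(spec: RangeSpec) -> str:
    """Render a range option as the equivalent HTTP Range header value.

    Hypothetical helper for illustration; not part of obstore.
    """
    if isinstance(spec, tuple):
        start, end = spec
        # HTTP byte ranges are inclusive, so the exclusive end becomes end - 1.
        return f"bytes={start}-{end - 1}"
    if "offset" in spec:
        return f"bytes={spec['offset']}-"
    if "suffix" in spec:
        return f"bytes=-{spec['suffix']}"
    raise ValueError("expected (start, end), {'offset': n} or {'suffix': n}")


bounded = to_http_range((0, 100))        # first 100 bytes
from_offset = to_http_range({"offset": 100})
last_bytes = to_http_range({"suffix": 50})
```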

version instance-attribute

version: str | None

Request a particular object version
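Because GetOptions is a TypedDict, a plain dict with the documented keys can serve as the options argument. The sketch below only builds such dicts; the ETag value, the object path "data.bin" and the use of a timezone-aware datetime are illustrative assumptions, and the commented-out call presumes obstore is installed and a store already exists.

```python
from datetime import datetime, timezone

# Every key is optional; include only the ones you need.

# Only transfer the object if its ETag differs from a previously seen one.
revalidate = {"if_none_match": '"xyzzy"'}

# Bytes 0..1023 of the object; the end offset is exclusive.
first_kib = {"range": (0, 1024)}

# Only transfer the object if it changed after the start of 2024 (UTC).
since_2024 = {"if_modified_since": datetime(2024, 1, 1, tzinfo=timezone.utc)}

# With obstore installed and an existing `store`, a dict is passed like this:
#   import obstore as obs
#   result = obs.get(store, "data.bin", options=first_kib)
```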

obstore.GetResult

Result for a get request.

You can materialize the entire buffer by using either bytes or bytes_async, or you can stream the result using stream. __iter__ and __aiter__ are implemented as aliases to stream, so you can alternatively call iter() or aiter() on GetResult to start an iterator.

Using as an async iterator:

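import obstore as obs

# Run inside a coroutine; `store` and `path` are assumed to exist already.
resp = await obs.get_async(store, path)
# Chunks of at least 5MB in the stream
stream = resp.stream(min_chunk_size=5 * 1024 * 1024)
async for buf in stream:
    print(len(buf))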

Using as a sync iterator:

import obstore as obs

# `store` and `path` are assumed to exist already.
resp = obs.get(store, path)
# Chunks of at least 20MB in the stream
stream = resp.stream(min_chunk_size=20 * 1024 * 1024)
for buf in stream:
    print(len(buf))

Note that after calling bytes, bytes_async, or stream, you will no longer be able to call other methods on this object, such as the meta attribute.

attributes property

attributes: Attributes

Additional object attributes.

This must be accessed before calling stream, bytes, or bytes_async.

meta property

meta: ObjectMeta

The ObjectMeta for this object.

This must be accessed before calling stream, bytes, or bytes_async.

range property

range: Tuple[int, int]

The range of bytes returned by this request.

Note that this is (start, stop) not (start, length).

This must be accessed before calling stream, bytes, or bytes_async.

__aiter__

__aiter__() -> BytesStream

Return a chunked stream over the result's bytes with the default (10MB) chunk size.

__iter__

__iter__() -> BytesStream

Return a chunked stream over the result's bytes with the default (10MB) chunk size.

bytes

bytes() -> Bytes

Collects the data into a Bytes object, which implements the Python buffer protocol. You can copy the buffer to Python memory by passing to bytes.

bytes_async async

bytes_async() -> Bytes

Collects the data into a Bytes object, which implements the Python buffer protocol. You can copy the buffer to Python memory by passing to bytes.

stream

stream(min_chunk_size: int = 10 * 1024 * 1024) -> BytesStream

Return a chunked stream over the result's bytes.

Parameters:

  • min_chunk_size (int, default: 10 * 1024 * 1024 ) –

    The minimum size in bytes for each chunk in the returned BytesStream. All chunks except for the last chunk will be at least this size. Defaults to 10*1024*1024 (10MB).
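One way to picture the min_chunk_size guarantee is a generator that buffers incoming pieces until the threshold is reached. This is a pure-Python model of the documented contract, not obstore's implementation; rechunk is a hypothetical name.

```python
from typing import Iterable, Iterator


def rechunk(chunks: Iterable[bytes], min_chunk_size: int) -> Iterator[bytes]:
    """Yield chunks of at least `min_chunk_size` bytes, except possibly the last.

    Hypothetical helper for illustration; not part of obstore.
    """
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) >= min_chunk_size:
            yield bytes(buf)
            buf.clear()
    if buf:
        # Whatever is left over forms the final, possibly smaller, chunk.
        yield bytes(buf)


# Small network-sized pieces regrouped with a 3-byte minimum.
out = list(rechunk([b"ab", b"c", b"defg", b"h"], min_chunk_size=3))
```

Every element of out except the last is at least three bytes long, and concatenating the elements reproduces the original byte sequence.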

Returns:

  • BytesStream

    A chunked stream over the result's bytes.

obstore.BytesStream

A stream of bytes that supports both synchronous and asynchronous iteration.

Request timeouts

The underlying stream needs to stay alive until the last chunk is polled. If the file is large, it may exceed the default timeout of 30 seconds. In this case, you may see an error like:

GenericError: Generic {
    store: "HTTP",
    source: reqwest::Error {
        kind: Decode,
        source: reqwest::Error {
            kind: Body,
            source: TimedOut,
        },
    },
}

To fix this, set the timeout parameter in the client_options passed when creating the store.
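A minimal sketch of such a configuration follows. It assumes that client_options is a mapping with a "timeout" entry and that a duration string such as "300s" is accepted; the URL is a placeholder, and the commented-out constructor call presumes obstore is installed.

```python
# Assumption: durations may be given as strings such as "300s".
# Raise the timeout from the 30 second default to 5 minutes.
client_options = {"timeout": "300s"}

# With obstore installed, the mapping is supplied when the store is created:
#   from obstore.store import HTTPStore
#   store = HTTPStore.from_url(
#       "https://example.com/data", client_options=client_options
#   )
```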

__aiter__

__aiter__() -> BytesStream

Return Self as an async iterator.

__anext__ async

__anext__() -> bytes

Return the next chunk of bytes in the stream.

__iter__

__iter__() -> BytesStream

Return Self as an iterator.

__next__

__next__() -> bytes

Return the next chunk of bytes in the stream.

obstore.Bytes

Bases: Buffer

A buffer implementing the Python buffer protocol, allowing zero-copy access to underlying Rust memory.

You can pass this to memoryview for a zero-copy view into the underlying data or to bytes to copy the underlying data into a Python bytes.
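The difference between the two can be shown with any buffer-protocol object. The sketch below uses a mutable bytearray as a stand-in for an obstore Bytes object, purely so that the shared memory becomes observable; it does not import obstore.

```python
# Stand-in for an obstore Bytes object (assumption: any object exposing the
# buffer protocol behaves the same way with memoryview() and bytes()).
buf = bytearray(b"hello world")

view = memoryview(buf)  # zero-copy: shares memory with `buf`
copy = bytes(buf)       # copies the data into a new Python bytes object

# Change the first byte of the underlying buffer in place.
buf[0] = ord("H")

view_first = bytes(view[:1])  # reflects the change
copy_first = copy[:1]         # still the original byte

view.release()
```

After the write, view_first is b"H" while copy_first is still b"h", which is the practical meaning of zero-copy view versus copy.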

Many methods from the Python bytes class are implemented on this class:

isalnum

isalnum() -> bool

Return True if all bytes in the sequence are alphabetic ASCII characters or ASCII decimal digits and the sequence is not empty, False otherwise.

Alphabetic ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'. ASCII decimal digits are those byte values in the sequence b'0123456789'.

isalpha

isalpha() -> bool

Return True if all bytes in the sequence are alphabetic ASCII characters and the sequence is not empty, False otherwise.

Alphabetic ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.

isascii

isascii() -> bool

Return True if the sequence is empty or all bytes in the sequence are ASCII, False otherwise.

ASCII bytes are in the range 0-0x7F.

isdigit

isdigit() -> bool

Return True if all bytes in the sequence are ASCII decimal digits and the sequence is not empty, False otherwise.

ASCII decimal digits are those byte values in the sequence b'0123456789'.

islower

islower() -> bool

Return True if there is at least one lowercase ASCII character in the sequence and no uppercase ASCII characters, False otherwise.

isspace

isspace() -> bool

Return True if all bytes in the sequence are ASCII whitespace and the sequence is not empty, False otherwise.

ASCII whitespace characters are those byte values in the sequence b' \t\n\r\x0b\f' (space, tab, newline, carriage return, vertical tab, form feed).

isupper

isupper() -> bool

Return True if there is at least one uppercase alphabetic ASCII character in the sequence and no lowercase ASCII characters, False otherwise.

lower

lower() -> Bytes

Return a copy of the sequence with all the uppercase ASCII characters converted to their corresponding lowercase counterpart.

removeprefix

removeprefix(prefix: Buffer) -> Bytes

If the binary data starts with the prefix string, return bytes[len(prefix):]. Otherwise, return the original binary data.

removesuffix

removesuffix(suffix: Buffer) -> Bytes

If the binary data ends with the suffix string and that suffix is not empty, return bytes[:-len(suffix)]. Otherwise, return the original binary data.
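The "suffix is not empty" condition matters because a naive slice would discard everything when the suffix has length zero. The sketch below uses Python's built-in bytes (3.9 or later) as a stand-in, on the assumption that the Bytes methods mirror it as described above.

```python
data = b"report.csv"

stripped = data.removesuffix(b".csv")  # suffix present: it is removed
unchanged = data.removesuffix(b"")     # empty suffix: data is returned as-is

# Without the non-empty check, the slice form would drop all bytes,
# because -len(b"") is 0 and data[:0] is empty.
naive = data[: -len(b"")]

# removeprefix needs no such guard: data[0:] is the whole sequence.
no_prefix = data.removeprefix(b"")
```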

to_bytes

to_bytes() -> bytes

Copy this buffer's contents into a Python bytes object.

upper

upper() -> Bytes

Return a copy of the sequence with all the lowercase ASCII characters converted to their corresponding uppercase counterpart.

obstore.OffsetRange

Bases: TypedDict

Request all bytes starting from a given byte offset

offset instance-attribute

offset: int

The byte offset for the offset range request.

obstore.SuffixRange

Bases: TypedDict

Request up to the last n bytes

suffix instance-attribute

suffix: int

The number of bytes from the suffix to request.