Get¶
obstore.get ¶
get(
store: ObjectStore, path: str, *, options: GetOptions | None = None
) -> GetResult
Return the bytes that are stored at the specified location.
Parameters:
-
store
(ObjectStore
) –The ObjectStore instance to use.
-
path
(str
) –The path within ObjectStore to retrieve.
-
options
(GetOptions | None
, default:None
) –Options for accessing the file. Defaults to None.
Returns:
-
GetResult
–GetResult
obstore.get_async
async
¶
get_async(
store: ObjectStore, path: str, *, options: GetOptions | None = None
) -> GetResult
Call get
asynchronously.
Refer to the documentation for get.
obstore.get_range ¶
get_range(store: ObjectStore, path: str, start: int, end: int) -> Bytes
Return the bytes that are stored at the specified location in the given byte range.
If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
Parameters:
-
store
(ObjectStore
) –The ObjectStore instance to use.
-
path
(str
) –The path within ObjectStore to retrieve.
-
start
(int
) –The start of the byte range.
-
end
(int
) –The end of the byte range (exclusive).
Returns:
-
Bytes
–A
Bytes
object implementing the Python buffer protocol, allowing zero-copy access to the underlying memory provided by Rust.
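The range rules described above can be sketched in plain Python. This is only a model of the documented behavior, not obstore's implementation; `resolve_range` is a hypothetical helper, and a local `bytes` value stands in for the stored object.

```python
from typing import Tuple


def resolve_range(start: int, end: int, object_size: int) -> Tuple[int, int]:
    """Model the documented rules for a half-open byte range [start, end)."""
    if end <= start:
        raise ValueError("range is zero-length")
    if start >= object_size:
        raise ValueError("range starts after the end of the object")
    # A range that runs past the end is clamped to the remainder of the object.
    return start, min(end, object_size)


data = b"hello world"  # stand-in for the stored object

# The end offset is past the object's end, so the remainder is returned.
start, end = resolve_range(6, 100, len(data))
clamped = data[start:end]

# The range lies fully inside the object, so exactly that range is returned.
start, end = resolve_range(0, 5, len(data))
exact = data[start:end]
```

With a real store, the equivalent call would be `get_range(store, path, start, end)`, with `end` exclusive as documented above.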
obstore.get_range_async
async
¶
get_range_async(store: ObjectStore, path: str, start: int, end: int) -> Bytes
Call get_range
asynchronously.
Refer to the documentation for get_range.
obstore.get_ranges ¶
get_ranges(
store: ObjectStore, path: str, starts: Sequence[int], ends: Sequence[int]
) -> List[Bytes]
Return the bytes that are stored at the specified location in the given byte ranges.
To improve performance this will:
- Combine ranges less than 10MB apart into a single call to
fetch
- Make multiple
fetch
requests in parallel (up to maximum of 10)
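The first optimization, combining nearby ranges, can be illustrated with a small pure-Python sketch. `coalesce_ranges` is a hypothetical helper written for this example, not the library's code, and the gap threshold used here is deliberately tiny so the effect is easy to see.

```python
from typing import List, Sequence, Tuple


def coalesce_ranges(
    starts: Sequence[int], ends: Sequence[int], max_gap: int
) -> List[Tuple[int, int]]:
    """Merge half-open ranges that are less than max_gap bytes apart."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(zip(starts, ends)):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous range: extend it instead of
            # issuing a separate fetch.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Three requested ranges; the first two are only 50 bytes apart.
fetches = coalesce_ranges([0, 150, 10_000], [100, 200, 10_100], max_gap=1_000)
```

Here three requested ranges collapse into two fetches. The caller still receives one buffer per requested range; only the number of underlying requests changes.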
Parameters:
-
store
(ObjectStore
) –The ObjectStore instance to use.
-
path
(str
) –The path within ObjectStore to retrieve.
-
starts
(Sequence[int]
) –A sequence of
int
offsets at which each range starts.
-
ends
(Sequence[int]
) –A sequence of
int
offsets at which each range ends (exclusive).
Returns:
-
List[Bytes]
–A list of
Bytes
objects, one for each requested range.
obstore.get_ranges_async
async
¶
get_ranges_async(
store: ObjectStore, path: str, starts: Sequence[int], ends: Sequence[int]
) -> List[Bytes]
Call get_ranges
asynchronously.
Refer to the documentation for get_ranges.
obstore.GetOptions ¶
Bases: TypedDict
Options for a get request.
All options are optional.
if_match
instance-attribute
¶
if_match: str | None
Request will succeed if the ObjectMeta::e_tag
matches
otherwise returning PreconditionError
.
See datatracker.ietf.org/doc/html/rfc9110#name-if-match
Examples:
If-Match: "xyzzy"
If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-Match: *
if_modified_since
instance-attribute
¶
if_modified_since: datetime | None
Request will succeed if the object has been modified since the given timestamp,
otherwise returning NotModifiedError
.
Some stores, such as S3, will only return NotModified
for exact
timestamp matches, instead of for any timestamp greater than or equal.
if_none_match
instance-attribute
¶
if_none_match: str | None
Request will succeed if the ObjectMeta::e_tag
does not match
otherwise returning NotModifiedError
.
See datatracker.ietf.org/doc/html/rfc9110#section-13.1.2
Examples:
If-None-Match: "xyzzy"
If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-None-Match: *
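As a rough model of how the two ETag conditions behave, the sketch below evaluates `if_match` and `if_none_match` against an object's ETag. The helpers are hypothetical and simplified (plain comma splitting, no weak-validator handling); the error names are returned as strings purely for illustration.

```python
from typing import Optional


def etag_matches(header: str, etag: str) -> bool:
    """Return True if a header value such as '"a", "b"' or '*' matches etag."""
    if header.strip() == "*":
        return True  # "*" matches any existing object
    return etag in [candidate.strip() for candidate in header.split(",")]


def check_etag_conditions(
    etag: str,
    if_match: Optional[str] = None,
    if_none_match: Optional[str] = None,
) -> str:
    """Model the outcome of a get request for an object with the given ETag."""
    if if_match is not None and not etag_matches(if_match, etag):
        return "PreconditionError"
    if if_none_match is not None and etag_matches(if_none_match, etag):
        return "NotModifiedError"
    return "ok"


current = '"xyzzy"'
matched = check_etag_conditions(current, if_match='"xyzzy", "r2d2xxxx"')
stale = check_etag_conditions(current, if_match='"c3piozzzz"')
cached = check_etag_conditions(current, if_none_match='"xyzzy"')
```

A typical use of `if_none_match` is cache revalidation: pass the ETag of a cached copy and treat the not-modified outcome as "the cache is still current".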
if_unmodified_since
instance-attribute
¶
if_unmodified_since: datetime | None
Request will succeed if the object has not been modified since the given timestamp, otherwise returning PreconditionError.
range
instance-attribute
¶
range: Tuple[int, int] | List[int] | OffsetRange | SuffixRange
Request transfer of only the specified range of bytes
otherwise returning NotModifiedError
.
The semantics of this tuple are:
-
(int, int)
: Request a specific range of bytes (start, end).
If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
The end offset is exclusive.
-
{"offset": int}
: Request all bytes starting from a given byte offset.
This is equivalent to bytes={int}- as an HTTP header.
-
{"suffix": int}
: Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.
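To make the three forms concrete, the following sketch maps each one to the HTTP Range header it corresponds to. `to_http_range` is a hypothetical helper for illustration only. Note that HTTP byte ranges are inclusive at both ends, so an exclusive `end` becomes `end - 1` in the header.

```python
from typing import Dict, Sequence, Union

RangeSpec = Union[Sequence[int], Dict[str, int]]


def to_http_range(spec: RangeSpec) -> str:
    """Translate a range option into the equivalent HTTP Range header value."""
    if isinstance(spec, dict):
        if "offset" in spec:
            # Everything from the offset to the end of the object.
            return f"bytes={spec['offset']}-"
        if "suffix" in spec:
            # The last N bytes; N is a length, not an offset.
            return f"bytes=-{spec['suffix']}"
        raise ValueError("expected an 'offset' or 'suffix' key")
    start, end = spec
    # The tuple form uses an exclusive end; HTTP uses an inclusive one.
    return f"bytes={start}-{end - 1}"


first_ten = to_http_range((0, 10))
from_offset = to_http_range({"offset": 100})
last_fifty = to_http_range({"suffix": 50})
```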
obstore.GetResult ¶
Result for a get request.
You can materialize the entire buffer by using either bytes
or bytes_async
, or
you can stream the result using stream
. __iter__
and __aiter__
are implemented
as aliases to stream
, so you can alternatively call iter()
or aiter()
on
GetResult
to start an iterator.
Using as an async iterator:
import obstore as obs
# Run inside an async function; store and path are assumed to be defined.
resp = await obs.get_async(store, path)
# 5MB chunk size in stream
stream = resp.stream(min_chunk_size=5 * 1024 * 1024)
async for buf in stream:
    print(len(buf))
Using as a sync iterator:
import obstore as obs
# store and path are assumed to be defined.
resp = obs.get(store, path)
# 20MB chunk size in stream
stream = resp.stream(min_chunk_size=20 * 1024 * 1024)
for buf in stream:
    print(len(buf))
Note that after calling bytes
, bytes_async
, or stream
, you will no longer be
able to use other methods or properties of this object, such as the meta
property.
attributes
property
¶
attributes: Attributes
Additional object attributes.
This must be accessed before calling stream
, bytes
, or bytes_async
.
meta
property
¶
meta: ObjectMeta
The ObjectMeta for this object.
This must be accessed before calling stream
, bytes
, or bytes_async
.
range
property
¶
The range of bytes returned by this request.
Note that this is (start, stop)
not (start, length)
.
This must be accessed before calling stream
, bytes
, or bytes_async
.
__aiter__ ¶
__aiter__() -> BytesStream
Return a chunked stream over the result's bytes with the default (10MB) chunk size.
__iter__ ¶
__iter__() -> BytesStream
Return a chunked stream over the result's bytes with the default (10MB) chunk size.
bytes ¶
bytes() -> Bytes
Collects the data into a Bytes
object, which implements the Python buffer
protocol. You can copy the buffer to Python memory by passing it to bytes
.
bytes_async
async
¶
bytes_async() -> Bytes
Collects the data into a Bytes
object, which implements the Python buffer
protocol. You can copy the buffer to Python memory by passing it to bytes
.
stream ¶
stream(min_chunk_size: int = 10 * 1024 * 1024) -> BytesStream
Return a chunked stream over the result's bytes.
Parameters:
-
min_chunk_size
(int
, default:10 * 1024 * 1024
) –The minimum size in bytes for each chunk in the returned
BytesStream
. All chunks except for the last chunk will be at least this size. Defaults to 10*1024*1024 (10MB).
Returns:
-
BytesStream
–A chunked stream
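The effect of `min_chunk_size` can be pictured as buffering small network chunks until the threshold is reached. The generator below is a simplified pure-Python model of that guarantee, not the library's implementation; `rechunk` and the tiny sizes are made up for the example.

```python
from typing import Iterable, Iterator


def rechunk(chunks: Iterable[bytes], min_chunk_size: int) -> Iterator[bytes]:
    """Yield chunks of at least min_chunk_size bytes, except possibly the last."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) >= min_chunk_size:
            yield bytes(buffer)
            buffer.clear()
    if buffer:
        # Whatever is left over forms the final, possibly smaller, chunk.
        yield bytes(buffer)


# Pretend the network delivers pieces of 3, 3, 3, 3 and 2 bytes.
network_chunks = [b"a" * 3, b"b" * 3, b"c" * 3, b"d" * 3, b"e" * 2]
sizes = [len(chunk) for chunk in rechunk(network_chunks, min_chunk_size=5)]
```

Every chunk except the last reaches the minimum size, which matches the guarantee described for `stream`.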
obstore.BytesStream ¶
A stream of bytes that can be iterated synchronously or asynchronously.
Request timeouts
The underlying stream needs to stay alive until the last chunk is polled. If the file is large, it may exceed the default timeout of 30 seconds. In this case, you may see an error like:
GenericError: Generic {
store: "HTTP",
source: reqwest::Error {
kind: Decode,
source: reqwest::Error {
kind: Body,
source: TimedOut,
},
},
}
To fix this, set the timeout
parameter in the
client_options
passed when creating the store.
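A possible configuration is sketched below. The `"120s"` value is an arbitrary example, and the exact store constructor depends on which backend you use, so the constructor call is left as a comment and should be read as an assumption rather than a fixed recipe.

```python
# Raise the request timeout from the default 30 seconds to two minutes.
# The duration is written as a string; "120s" is just an example value.
client_options = {"timeout": "120s"}

# The dictionary is then passed when creating the store, for example
# (assumed usage, adjust to the store class you actually use):
#   store = SomeStore(..., client_options=client_options)
```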
obstore.Bytes ¶
Bases: Buffer
A buffer implementing the Python buffer protocol, allowing zero-copy access to underlying Rust memory.
You can pass this to memoryview
for a zero-copy view into the underlying
data or to bytes
to copy the underlying data into a Python bytes
.
Many methods from the Python bytes
class are implemented on this class.
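The difference between a zero-copy view and a copy can be seen with any object that implements the buffer protocol. In the sketch below a `bytearray` stands in for `Bytes` (an assumption made so the example runs without a store); it is mutated only to show which object shares memory with the source.

```python
# bytearray is a stand-in for obstore.Bytes; both expose the buffer protocol.
buf = bytearray(b"hello world")

view = memoryview(buf)  # zero-copy: shares memory with buf
copy = bytes(buf)       # copies the data into a new Python bytes object

# Change the first byte of the underlying buffer.
buf[0] = ord("H")

view_first = chr(view[0])  # the view sees the change
copy_first = chr(copy[0])  # the copy does not
```

For large objects, slicing a `memoryview` avoids duplicating the data, while `bytes(...)` gives an independent copy that outlives the original buffer.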
isalnum ¶
isalnum() -> bool
Return True
if all bytes in the sequence are alphabetic ASCII characters or
ASCII decimal digits and the sequence is not empty, False
otherwise.
Alphabetic ASCII characters are those byte values in the sequence
b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
. ASCII decimal digits
are those byte values in the sequence b'0123456789'
.
isalpha ¶
isalpha() -> bool
Return True
if all bytes in the sequence are alphabetic ASCII characters and
the sequence is not empty, False
otherwise.
Alphabetic ASCII characters are those byte values in the sequence
b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
.
isascii ¶
isascii() -> bool
Return True
if the sequence is empty or all bytes in the sequence are ASCII,
False
otherwise.
ASCII bytes are in the range 0-0x7F
.
isdigit ¶
isdigit() -> bool
Return True
if all bytes in the sequence are ASCII decimal digits and the
sequence is not empty, False
otherwise.
ASCII decimal digits are those byte values in the sequence b'0123456789'
.
islower ¶
islower() -> bool
Return True
if there is at least one lowercase ASCII character in the sequence
and no uppercase ASCII characters, False
otherwise.
isspace ¶
isspace() -> bool
Return True
if all bytes in the sequence are ASCII whitespace and the sequence is not empty, False
otherwise.
ASCII whitespace characters are those byte values in the sequence b' \t\n\r\x0b\f'
(space, tab, newline, carriage return, vertical tab, form feed).
isupper ¶
isupper() -> bool
Return True
if there is at least one uppercase alphabetic ASCII character in
the sequence and no lowercase ASCII characters, False
otherwise.
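Since these predicates are described as mirroring the Python `bytes` methods of the same name, their behavior can be previewed on plain `bytes` values. The sketch below uses built-in `bytes` only, on the assumption that `Bytes` gives the same answers.

```python
# Each entry pairs a predicate name with its result on a sample value.
checks = {
    "isalnum": b"abc123".isalnum(),    # letters and digits only
    "isalpha": b"abc123".isalpha(),    # digits are not alphabetic
    "isascii": b"".isascii(),          # the empty sequence counts as ASCII
    "isdigit": b"2024".isdigit(),
    "islower": b"abc 123".islower(),   # one lowercase letter, no uppercase
    "isspace": b" \t\n".isspace(),
    "isupper": b"ABC-1".isupper(),     # one uppercase letter, no lowercase
}
```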
lower ¶
lower() -> Bytes
Return a copy of the sequence with all the uppercase ASCII characters converted to their corresponding lowercase counterpart.
removeprefix ¶
removeprefix(prefix: Buffer) -> Bytes
If the binary data starts with the prefix string, return bytes[len(prefix):]
.
Otherwise, return the original binary data.
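As with the predicates, `lower` and `removeprefix` follow the built-in `bytes` methods, so a quick preview on plain `bytes` is possible (Python 3.9 or later for `removeprefix`). The sample value is invented for the example.

```python
name = b"Data/FILE.TXT"

lowered = name.lower()                       # ASCII uppercase becomes lowercase
stripped = lowered.removeprefix(b"data/")    # prefix present: it is removed
unchanged = lowered.removeprefix(b"logs/")   # prefix absent: data is returned as is
```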