List
obspec.List
Bases: Protocol
list
list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True],
) -> ListIterator[ArrowArrayExportable]
list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False] = False,
) -> ListIterator[list[ObjectMeta]]
list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False,
) -> ListIterator[ArrowArrayExportable] | ListIterator[list[ObjectMeta]]
List all the objects with the given prefix.
Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. List is recursive, i.e. foo/bar/more/x will be included.
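The segment-based matching described above can be sketched in plain Python. This is only an illustration of the rule, not the library's implementation; the helper name is_segment_prefix is hypothetical.

```python
def is_segment_prefix(prefix: str, path: str) -> bool:
    """Return True if `prefix` matches `path` on whole path segments."""
    # Split on "/" and drop empty segments so a trailing slash is harmless.
    prefix_parts = [p for p in prefix.split("/") if p]
    path_parts = [p for p in path.split("/") if p]
    # The prefix matches only if its segments equal the leading path segments.
    return path_parts[: len(prefix_parts)] == prefix_parts


print(is_segment_prefix("foo/bar/", "foo/bar/x"))       # True
print(is_segment_prefix("foo/bar/", "foo/bar_baz/x"))   # False
print(is_segment_prefix("foo/bar/", "foo/bar/more/x"))  # True (recursive)
```

Comparing whole segments rather than raw string prefixes is what keeps foo/bar_baz/x out of the results.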
Examples:
Synchronously iterate through list results:
import obstore as obs
from obstore.store import MemoryStore

store = MemoryStore()
for i in range(100):
    obs.put(store, f"file{i}.txt", b"foo")

stream = obs.list(store, chunk_size=10)
for list_result in stream:
    print(list_result[0])
    # {'path': 'file0.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 19, 28, 781723, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '0', 'version': None}
    break
Return list results as Arrow. This is most useful with large list operations, where you may also want to increase the chunk_size parameter.
stream = obs.list(store, chunk_size=1000, return_arrow=True)
# Stream is now an iterable/async iterable of `RecordBatch`es
for batch in stream:
    print(batch.num_rows)  # 100
    # If desired, convert to a pyarrow RecordBatch (zero-copy) with
    # `pyarrow.record_batch(batch)`
    break
Collect all list results into a single Arrow RecordBatch.
stream = obs.list(store, return_arrow=True)
batch = stream.collect()
Note
The order of returned ObjectMeta is not guaranteed.
Parameters:
- prefix (str | None, default: None) – The prefix within ObjectStore to use for listing. Defaults to None.
Other Parameters:
- offset (str | None) – If provided, list all the objects with the given prefix and a location greater than offset. Defaults to None.
- chunk_size (int) – The number of items to collect per chunk in the returned (async) iterator. All chunks except for the last one will have this many items. This is ignored in collect.
- return_arrow (bool) – If True, return each batch of list items as an Arrow RecordBatch, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.
Returns:
- ListIterator[ArrowArrayExportable] | ListIterator[list[ObjectMeta]] – A ListIterator, which you can iterate through to access list results.
obspec.ListAsync
Bases: Protocol
list_async
list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True],
) -> ListStream[ArrowArrayExportable]
list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False] = False,
) -> ListStream[list[ObjectMeta]]
list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False,
) -> ListStream[ArrowArrayExportable] | ListStream[list[ObjectMeta]]
List all the objects with the given prefix.
Note that this method itself is not async. It's a synchronous method but returns an async iterator.
Refer to obspec.List for more information about list semantics.
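The distinction between a synchronous method and the async iterator it returns can be sketched with the standard library alone. The function list_async_sketch is hypothetical and only mimics the calling pattern; asyncio.sleep(0) stands in for a network round trip.

```python
import asyncio
from collections.abc import AsyncIterator


def list_async_sketch(paths: list[str], chunk_size: int = 50) -> AsyncIterator[list[str]]:
    """A plain `def`: calling it returns an async iterator without awaiting."""

    async def _stream() -> AsyncIterator[list[str]]:
        for start in range(0, len(paths), chunk_size):
            await asyncio.sleep(0)  # placeholder for real I/O
            yield paths[start : start + chunk_size]

    return _stream()


async def main() -> list[int]:
    # No `await` on the call itself; only the iteration is asynchronous.
    stream = list_async_sketch([f"file{i}.txt" for i in range(25)], chunk_size=10)
    sizes = []
    async for chunk in stream:
        sizes.append(len(chunk))
    return sizes


chunk_sizes = asyncio.run(main())
print(chunk_sizes)  # [10, 10, 5]
```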
Examples:
Asynchronously iterate through list results. Just change for to async for:
stream = obs.list_async(store, chunk_size=10)
async for list_result in stream:
    print(list_result[2])
    # {'path': 'file10.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 21, 46, 224725, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '10', 'version': None}
    break
Note
The order of returned ObjectMeta is not guaranteed.
Parameters:
- prefix (str | None, default: None) – The prefix within ObjectStore to use for listing. Defaults to None.
Other Parameters:
- offset (str | None) – If provided, list all the objects with the given prefix and a location greater than offset. Defaults to None.
- chunk_size (int) – The number of items to collect per chunk in the returned (async) iterator. All chunks except for the last one will have this many items. This is ignored in collect_async.
- return_arrow (bool) – If True, return each batch of list items as an Arrow RecordBatch, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.
Returns:
- ListStream[ArrowArrayExportable] | ListStream[list[ObjectMeta]] – A ListStream, which you can iterate through to access list results.
obspec.ListWithDelimiter
Bases: Protocol
list_with_delimiter
list_with_delimiter(
    prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[ArrowStreamExportable]
list_with_delimiter(
    prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]
list_with_delimiter(
    prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]]
List objects with the given prefix and an implementation-specific delimiter.
Returns common prefixes (directories) in addition to object metadata.
Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. This list is not recursive, i.e. foo/bar/more/x will not be included.
Note
Any prefix supplied to this prefix parameter will not be stripped off the paths in the result.
Parameters:
- prefix (str | None, default: None) – The prefix within ObjectStore to use for listing. Defaults to None.
Other Parameters:
- return_arrow (bool) – If True, return list results as an Arrow Table, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.
Returns:
- ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]] – ListResult
obspec.ListWithDelimiterAsync
Bases: Protocol
list_with_delimiter_async async
list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[ArrowStreamExportable]
list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]
list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]]
Call list_with_delimiter asynchronously.
Refer to the documentation for ListWithDelimiter.
obspec.ListResult
Bases: TypedDict, Generic[ListChunkType_co]
Result of a list_with_delimiter call.
Includes objects, prefixes (directories) and a token for the next set of results. Individual result sets may be limited to 1,000 objects based on the underlying object storage's limitations.
common_prefixes instance-attribute
Prefixes that are common (like directories).
obspec.ListIterator
Bases: Protocol[ListChunkType_co]
A stream of ObjectMeta that can be polled synchronously.
collect
collect() -> ListChunkType_co
Collect all remaining ObjectMeta objects in the stream.
This ignores the chunk_size parameter from the list call and collects all remaining data into a single chunk.
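As a rough illustration of how collect relates to chunked iteration, the following sketch shows an iterator that hands out chunks and can drain whatever is left in one call. SketchListIterator is a hypothetical stand-in that holds all items in memory. It is not a class provided by obspec or obstore.

```python
class SketchListIterator:
    """In-memory stand-in for a chunked list iterator with `collect`."""

    def __init__(self, items, chunk_size: int = 50) -> None:
        self._items = list(items)
        self._chunk_size = chunk_size
        self._pos = 0  # index of the next item not yet handed out

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        chunk = self._items[self._pos : self._pos + self._chunk_size]
        self._pos += len(chunk)
        return chunk

    def collect(self):
        # Ignore chunk_size and return everything that remains as one chunk.
        remaining = self._items[self._pos :]
        self._pos = len(self._items)
        return remaining


stream = SketchListIterator(range(100), chunk_size=10)
first = next(stream)     # one chunk of 10 items
rest = stream.collect()  # the other 90 items in a single chunk
print(len(first), len(rest))  # 10 90
```

In this sketch, collect returns only what has not yet been consumed, so calling it after partial iteration yields the remainder rather than the full listing.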
obspec.ListStream
Bases: Protocol[ListChunkType_co]
A stream of ObjectMeta that can be polled asynchronously.
__anext__ async
__anext__() -> ListChunkType_co
Return the next chunk of ObjectMeta in the stream.
collect_async async
collect_async() -> ListChunkType_co
Collect all remaining ObjectMeta objects in the stream.
This ignores the chunk_size parameter from the list_async call and collects all remaining data into a single chunk.
obspec.ListChunkType_co module-attribute
ListChunkType_co = TypeVar(
    "ListChunkType_co",
    list[ObjectMeta],
    ArrowArrayExportable,
    ArrowStreamExportable,
    covariant=True,
)
The data structure used for holding list results.
By default, listing APIs return a list of ObjectMeta. However, for improved performance when listing large buckets, you can pass return_arrow=True. Then an Arrow RecordBatch will be returned instead.