List

obspec.List

Bases: Protocol

list

list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True],
) -> ListIterator[ArrowArrayExportable]
list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False] = False,
) -> ListIterator[list[ObjectMeta]]
list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False,
) -> ListIterator[ArrowArrayExportable] | ListIterator[list[ObjectMeta]]

List all the objects with the given prefix.

Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. List is recursive, i.e. foo/bar/more/x will be included.
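The path-segment rule can be illustrated with a small pure-Python model. This is only a sketch: `has_segment_prefix` is a hypothetical helper, not part of obspec or obstore, and it assumes `/` as the delimiter. Edge cases such as a path that equals the prefix are not modeled.

```python
def has_segment_prefix(path: str, prefix: str, delimiter: str = "/") -> bool:
    """Return True if `prefix` matches `path` on whole path segments."""
    prefix_parts = [p for p in prefix.split(delimiter) if p]
    path_parts = [p for p in path.split(delimiter) if p]
    # Compare segment by segment, so "bar" never matches "bar_baz".
    return path_parts[: len(prefix_parts)] == prefix_parts


paths = ["foo/bar/x", "foo/bar_baz/x", "foo/bar/more/x"]
matches = [p for p in paths if has_segment_prefix(p, "foo/bar/")]
print(matches)
```

A plain `str.startswith("foo/bar")` check would also accept `foo/bar_baz/x`, which is why the comparison is done on segments rather than characters.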

Examples:

Synchronously iterate through list results:

import obstore as obs
from obstore.store import MemoryStore

store = MemoryStore()
for i in range(100):
    obs.put(store, f"file{i}.txt", b"foo")

stream = obs.list(store, chunk_size=10)
for list_result in stream:
    print(list_result[0])
    # {'path': 'file0.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 19, 28, 781723, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '0', 'version': None}
    break

Return large list results as Arrow. This is most useful with large list operations. In this case you may want to increase the chunk_size parameter.

stream = obs.list(store, chunk_size=1000, return_arrow=True)
# Stream is now an iterable/async iterable of `RecordBatch`es
for batch in stream:
    print(batch.num_rows) # 100

    # If desired, convert to a pyarrow RecordBatch (zero-copy) with
    # `pyarrow.record_batch(batch)`
    break

Collect all list results into a single Arrow RecordBatch.

stream = obs.list(store, return_arrow=True)
batch = stream.collect()

Note

The order of returned ObjectMeta is not guaranteed.

Parameters:

  • prefix (str | None, default: None ) –

    The prefix within ObjectStore to use for listing. Defaults to None.

Other Parameters:

  • offset (str | None) –

    If provided, list all the objects with the given prefix and a location greater than offset. Defaults to None.

  • chunk_size (int) –

    The number of items to collect per chunk in the returned (async) iterator. All chunks except for the last one will have this many items. This is ignored in collect.

  • return_arrow (bool) –

    If True, return each batch of list items as an Arrow RecordBatch, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.

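Returns:

  • ListIterator[ArrowArrayExportable] | ListIterator[list[ObjectMeta]] –

    An iterator over chunks of list results: Arrow RecordBatches if return_arrow is True, otherwise lists of ObjectMeta.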

obspec.ListAsync

Bases: Protocol

list_async

list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True],
) -> ListStream[ArrowArrayExportable]
list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False] = False,
) -> ListStream[list[ObjectMeta]]
list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False,
) -> ListStream[ArrowArrayExportable] | ListStream[list[ObjectMeta]]

List all the objects with the given prefix.

Note that this method itself is not async. It's a synchronous method but returns an async iterator.
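The distinction between an async method and a synchronous method that returns an async iterator can be shown with a toy class. This is a minimal sketch: `ToyStore` is a hypothetical in-memory stand-in, not an obspec or obstore class, and `asyncio.sleep(0)` stands in for real I/O.

```python
import asyncio
from collections.abc import AsyncIterator


class ToyStore:
    """Hypothetical in-memory stand-in; not part of obspec or obstore."""

    def __init__(self, paths: list[str]) -> None:
        self._paths = paths

    def list_async(self, *, chunk_size: int = 50) -> AsyncIterator[list[str]]:
        # A plain `def`: calling it needs no `await` and does no work yet.
        return self._chunks(chunk_size)

    async def _chunks(self, chunk_size: int) -> AsyncIterator[list[str]]:
        for start in range(0, len(self._paths), chunk_size):
            await asyncio.sleep(0)  # stand-in for a network round trip
            yield self._paths[start : start + chunk_size]


async def main() -> list[int]:
    store = ToyStore([f"file{i}.txt" for i in range(25)])
    stream = store.list_async(chunk_size=10)  # no `await` on the call itself
    return [len(chunk) async for chunk in stream]


chunk_lengths = asyncio.run(main())
print(chunk_lengths)
```

Only the iteration is awaited; the call to `list_async` returns immediately with the stream object.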

Refer to obspec.List for more information about list semantics.

Examples:

Asynchronously iterate through list results. Just change for to async for:

stream = obs.list_async(store, chunk_size=10)
async for list_result in stream:
    print(list_result[2])
    # {'path': 'file10.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 21, 46, 224725, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '10', 'version': None}
    break

Note

The order of returned ObjectMeta is not guaranteed.

Parameters:

  • prefix (str | None, default: None ) –

    The prefix within ObjectStore to use for listing. Defaults to None.

Other Parameters:

  • offset (str | None) –

    If provided, list all the objects with the given prefix and a location greater than offset. Defaults to None.

  • chunk_size (int) –

    The number of items to collect per chunk in the returned (async) iterator. All chunks except for the last one will have this many items. This is ignored in collect_async.

  • return_arrow (bool) –

    If True, return each batch of list items as an Arrow RecordBatch, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.

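Returns:

  • ListStream[ArrowArrayExportable] | ListStream[list[ObjectMeta]] –

    An async iterator over chunks of list results: Arrow RecordBatches if return_arrow is True, otherwise lists of ObjectMeta.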

obspec.ListWithDelimiter

Bases: Protocol

list_with_delimiter

list_with_delimiter(
    prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[ArrowStreamExportable]
list_with_delimiter(
    prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]
list_with_delimiter(
    prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]]

List objects with the given prefix and an implementation-specific delimiter.

Returns common prefixes (directories) in addition to object metadata.

Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. This list is not recursive, i.e. foo/bar/more/x will not be included.

Note

A prefix supplied through the prefix parameter is not stripped from the paths in the result.

Parameters:

  • prefix (str | None, default: None ) –

    The prefix within ObjectStore to use for listing. Defaults to None.

Other Parameters:

  • return_arrow (bool) –

    If True, return list results as an Arrow Table, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.

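Returns:

  • ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]] –

    A ListResult holding the common prefixes and the object metadata for the listing.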

obspec.ListWithDelimiterAsync

Bases: Protocol

list_with_delimiter_async async

list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[ArrowStreamExportable]
list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]
list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]]

Call list_with_delimiter asynchronously.

Refer to the documentation for ListWithDelimiter.

obspec.ListResult

Bases: TypedDict, Generic[ListChunkType_co]

Result of a list_with_delimiter call.

Includes objects, prefixes (directories) and a token for the next set of results. Individual result sets may be limited to 1,000 objects based on the underlying object storage's limitations.

common_prefixes instance-attribute

common_prefixes: list[str]

Prefixes that are common (like directories)

objects instance-attribute

Object metadata for the listing

obspec.ListIterator

Bases: Protocol[ListChunkType_co]

A stream of ObjectMeta that can be polled synchronously.

__iter__

__iter__() -> Self

Return Self as an iterator.

__next__

__next__() -> ListChunkType_co

Return the next chunk of ObjectMeta in the stream.

collect

collect() -> ListChunkType_co

Collect all remaining ObjectMeta objects in the stream.

This ignores the chunk_size parameter from the list call and collects all remaining data into a single chunk.
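The relationship between `__next__` and `collect` can be sketched with a toy iterator over an in-memory list. `ToyListIterator` is a hypothetical class that only mirrors the method names of this protocol; it uses plain path strings in place of ObjectMeta.

```python
class ToyListIterator:
    """Hypothetical iterator with the same method names as ListIterator."""

    def __init__(self, items: list[str], chunk_size: int) -> None:
        self._items = items
        self._chunk_size = chunk_size
        self._pos = 0

    def __iter__(self) -> "ToyListIterator":
        return self

    def __next__(self) -> list[str]:
        if self._pos >= len(self._items):
            raise StopIteration
        chunk = self._items[self._pos : self._pos + self._chunk_size]
        self._pos += len(chunk)
        return chunk

    def collect(self) -> list[str]:
        # Ignore chunk_size: return everything not yet consumed as one chunk.
        remaining = self._items[self._pos :]
        self._pos = len(self._items)
        return remaining


stream = ToyListIterator([f"file{i}.txt" for i in range(25)], chunk_size=10)
first = next(stream)     # one chunk of chunk_size items
rest = stream.collect()  # all remaining items in a single chunk
print(len(first), len(rest))
```

In this model, calling `collect` after partially iterating returns only what is left, and the iterator is exhausted afterwards.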

obspec.ListStream

Bases: Protocol[ListChunkType_co]

A stream of ObjectMeta that can be polled asynchronously.

__aiter__

__aiter__() -> Self

Return Self as an async iterator.

__anext__ async

__anext__() -> ListChunkType_co

Return the next chunk of ObjectMeta in the stream.

collect_async async

collect_async() -> ListChunkType_co

Collect all remaining ObjectMeta objects in the stream.

This ignores the chunk_size parameter from the list_async call and collects all remaining data into a single chunk.

obspec.ListChunkType_co module-attribute

ListChunkType_co = TypeVar(
    "ListChunkType_co",
    list[ObjectMeta],
    ArrowArrayExportable,
    ArrowStreamExportable,
    covariant=True,
)

The data structure used for holding list results.

By default, listing APIs return a list of ObjectMeta. For improved performance when listing large buckets, you can pass return_arrow=True. An Arrow RecordBatch (or, for list_with_delimiter, an Arrow Table) is then returned instead.