List¶

obspec.List ¶

Bases: Protocol

list ¶

list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True],
) -> ListIterator[ArrowArrayExportable]

list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False] = False,
) -> ListIterator[list[ObjectMeta]]

list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False,
) -> ListIterator[ArrowArrayExportable] | ListIterator[list[ObjectMeta]]

List all the objects with the given prefix.

Prefixes are evaluated on a path-segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. Listing is recursive, i.e. foo/bar/more/x will be included.
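The segment-based matching rule described above can be sketched in plain Python. This is only an illustration of the rule: `is_segment_prefix` is a hypothetical helper, not part of obspec or obstore, and it assumes `/` is the path separator.

```python
def is_segment_prefix(prefix: str, path: str) -> bool:
    """Return True if `prefix` matches `path` on whole path segments."""
    # Split on "/" and drop empty segments so that "foo/bar/" and "foo/bar"
    # are treated the same way.
    prefix_parts = [part for part in prefix.split("/") if part]
    path_parts = [part for part in path.split("/") if part]
    # The prefix matches only if every one of its segments equals the
    # corresponding leading segment of the path.
    return path_parts[: len(prefix_parts)] == prefix_parts


print(is_segment_prefix("foo/bar/", "foo/bar/x"))       # True
print(is_segment_prefix("foo/bar/", "foo/bar_baz/x"))   # False
print(is_segment_prefix("foo/bar/", "foo/bar/more/x"))  # True (recursive)
```

A plain `str.startswith` check would not be sufficient here, because it would also accept partial segment matches such as `foo/bar` against `foo/bar_baz/x`.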

Examples:

Synchronously iterate through list results:

import obstore as obs
from obstore.store import MemoryStore

store = MemoryStore()
for i in range(100):
    obs.put(store, f"file{i}.txt", b"foo")

stream = obs.list(store, chunk_size=10)
for list_result in stream:
    print(list_result[0])
    # {'path': 'file0.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 19, 28, 781723, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '0', 'version': None}
    break

Return large list results as Arrow. This is most useful with large list operations. In this case you may want to increase the chunk_size parameter.

stream = obs.list(store, chunk_size=1000, return_arrow=True)
# Stream is now an iterable/async iterable of `RecordBatch`es
for batch in stream:
    print(batch.num_rows) # 100

    # If desired, convert to a pyarrow RecordBatch (zero-copy) with
    # `pyarrow.record_batch(batch)`
    break

Collect all list results into a single Arrow RecordBatch.

stream = obs.list(store, return_arrow=True)
batch = stream.collect()

Note

The order of the returned ObjectMeta is not guaranteed.

Parameters:

prefix (str | None, default: None ) –

The prefix within ObjectStore to use for listing. Defaults to None.

Other Parameters:

offset (str | None) –

If provided, list all the objects with the given prefix and a location greater than offset. Defaults to None.

chunk_size (int) –

The number of items to collect per chunk in the returned (async) iterator. All chunks except for the last one will have this many items. This is ignored in collect.

return_arrow (bool) –

If True, return each batch of list items as an Arrow RecordBatch, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.
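To make the `offset` and `chunk_size` semantics concrete, here is a minimal sketch over a plain list of paths. `toy_list` is a hypothetical stand-in, not the obspec or obstore implementation. It assumes that "greater than offset" means ordinary string comparison, and it sorts the paths only to keep the output deterministic (a real store does not guarantee order).

```python
from __future__ import annotations

from collections.abc import Iterator


def toy_list(
    paths: list[str],
    *,
    offset: str | None = None,
    chunk_size: int = 50,
) -> Iterator[list[str]]:
    """Yield `paths` in chunks, skipping everything at or before `offset`."""
    # Assumption: "a location greater than offset" is a plain string comparison.
    selected = sorted(p for p in paths if offset is None or p > offset)
    # Every chunk has `chunk_size` items, except possibly the last one.
    for start in range(0, len(selected), chunk_size):
        yield selected[start : start + chunk_size]


paths = [f"file{i:03d}.txt" for i in range(25)]

chunk_sizes = [len(chunk) for chunk in toy_list(paths, chunk_size=10)]
print(chunk_sizes)  # [10, 10, 5]

after_offset = next(toy_list(paths, offset="file021.txt", chunk_size=10))
print(after_offset)  # ['file022.txt', 'file023.txt', 'file024.txt']
```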

Returns:

ListIterator[ArrowArrayExportable] | ListIterator[list[ObjectMeta]] –

A ListIterator, which you can iterate through to access list results.

obspec.ListAsync ¶

Bases: Protocol

list_async ¶

list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True],
) -> ListStream[ArrowArrayExportable]

list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False] = False,
) -> ListStream[list[ObjectMeta]]

list_async(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False,
) -> ListStream[ArrowArrayExportable] | ListStream[list[ObjectMeta]]

List all the objects with the given prefix.

Note that this method itself is not async. It's a synchronous method but returns an async iterator.

Refer to obspec.List for more information about list semantics.
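The "synchronous method that returns an async iterator" pattern can be illustrated with the standard library alone. The sketch below uses hypothetical names (`toy_list_async`, `_chunks`) and an in-memory list instead of a real store. The point is that the call itself needs no `await`; only the iteration does.

```python
from __future__ import annotations

import asyncio
import inspect
from collections.abc import AsyncIterator


async def _chunks(items: list[str], chunk_size: int) -> AsyncIterator[list[str]]:
    # Async generator: calling it returns an async iterator immediately.
    for start in range(0, len(items), chunk_size):
        yield items[start : start + chunk_size]


def toy_list_async(items: list[str], *, chunk_size: int = 50) -> AsyncIterator[list[str]]:
    # A plain `def`, so callers do not await the call itself.
    return _chunks(items, chunk_size)


async def main() -> list[int]:
    # No `await` here: the call returns the async iterator directly.
    stream = toy_list_async([f"file{i}.txt" for i in range(25)], chunk_size=10)
    sizes = []
    async for chunk in stream:
        sizes.append(len(chunk))
    return sizes


print(inspect.iscoroutinefunction(toy_list_async))  # False
sizes = asyncio.run(main())
print(sizes)  # [10, 10, 5]
```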

Examples:

Asynchronously iterate through list results. Just change for to async for:

stream = obs.list_async(store, chunk_size=10)
async for list_result in stream:
    print(list_result[2])
    # {'path': 'file10.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 21, 46, 224725, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '10', 'version': None}
    break

Note

The order of the returned ObjectMeta is not guaranteed.

Parameters:

prefix (str | None, default: None ) –

The prefix within ObjectStore to use for listing. Defaults to None.

Other Parameters:

offset (str | None) –

If provided, list all the objects with the given prefix and a location greater than offset. Defaults to None.

chunk_size (int) –

The number of items to collect per chunk in the returned (async) iterator. All chunks except for the last one will have this many items. This is ignored in collect_async.

return_arrow (bool) –

If True, return each batch of list items as an Arrow RecordBatch, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.

Returns:

ListStream[ArrowArrayExportable] | ListStream[list[ObjectMeta]] –

A ListStream, which you can iterate through to access list results.

obspec.ListWithDelimiter ¶

Bases: Protocol

list_with_delimiter ¶

list_with_delimiter(
    prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[ArrowStreamExportable]

list_with_delimiter(
    prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]

list_with_delimiter(
    prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]]

List objects with the given prefix and an implementation specific delimiter.

Returns common prefixes (directories) in addition to object metadata.

Prefixes are evaluated on a path-segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. This listing is not recursive, i.e. foo/bar/more/x will not be included.

Note

Any prefix supplied to this prefix parameter will not be stripped off the paths in the result.
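The non-recursive behaviour can be sketched as follows. `toy_list_with_delimiter` is a hypothetical function over a list of path strings, assuming `/` as the delimiter. The exact formatting of common prefixes (for example, whether they carry a trailing slash) is an assumption of this sketch and may differ between implementations.

```python
from __future__ import annotations


def toy_list_with_delimiter(
    paths: list[str], prefix: str | None = None
) -> dict[str, list[str]]:
    """Return direct children of `prefix` plus one level of common prefixes."""
    prefix_parts = [part for part in (prefix or "").split("/") if part]
    depth = len(prefix_parts)
    objects: list[str] = []
    common_prefixes: set[str] = set()
    for path in paths:
        parts = path.split("/")
        # Skip paths that do not sit strictly below the prefix (segment-wise).
        if parts[:depth] != prefix_parts or len(parts) <= depth:
            continue
        if len(parts) == depth + 1:
            # Direct child: keep the full path, the prefix is not stripped.
            objects.append(path)
        else:
            # Deeper object: report only its first-level "directory".
            common_prefixes.add("/".join(parts[: depth + 1]))
    return {"common_prefixes": sorted(common_prefixes), "objects": sorted(objects)}


result = toy_list_with_delimiter(
    ["foo/bar/x", "foo/bar/more/x", "foo/bar_baz/x", "foo/top"],
    prefix="foo/bar/",
)
print(result)
# {'common_prefixes': ['foo/bar/more'], 'objects': ['foo/bar/x']}
```

Note that foo/bar/more/x does not appear under objects; it is represented only by the common prefix foo/bar/more.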

Parameters:

prefix (str | None, default: None ) –

The prefix within ObjectStore to use for listing. Defaults to None.

Other Parameters:

return_arrow (bool) –

If True, return list results as an Arrow Table, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.

Returns:

ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]] –

ListResult

obspec.ListWithDelimiterAsync ¶

Bases: Protocol

list_with_delimiter_async `async` ¶

list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[ArrowStreamExportable]

list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]

list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[ArrowStreamExportable] | ListResult[list[ObjectMeta]]

Call list_with_delimiter asynchronously.

Refer to the documentation for ListWithDelimiter.

obspec.ListResult ¶

Bases: TypedDict, Generic[ListChunkType_co]

Result of a list_with_delimiter call.

Includes objects, prefixes (directories) and a token for the next set of results. Individual result sets may be limited to 1,000 objects based on the underlying object storage's limitations.

common_prefixes `instance-attribute` ¶

common_prefixes: list[str]

Prefixes that are common (like directories)

objects `instance-attribute` ¶

objects: ListChunkType_co

Object metadata for the listing

obspec.ListIterator ¶

Bases: Protocol[ListChunkType_co]

A stream of ObjectMeta that can be polled synchronously.

__iter__ ¶

__iter__() -> Self

Return Self as an iterator.

__next__ ¶

__next__() -> ListChunkType_co

Return the next chunk of ObjectMeta in the stream.

collect ¶

collect() -> ListChunkType_co

Collect all remaining ObjectMeta objects in the stream.

This ignores the chunk_size parameter from the list call and collects all remaining data into a single chunk.
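As a rough illustration of how `__next__` and `collect` interact, here is a toy class with the same three methods as the protocol. `ToyListIterator` is hypothetical and holds plain strings in memory; it is a sketch of the described behaviour, not how any real store implements it.

```python
from __future__ import annotations


class ToyListIterator:
    """In-memory sketch with the same method names as ListIterator."""

    def __init__(self, items: list[str], chunk_size: int) -> None:
        self._items = items
        self._chunk_size = chunk_size
        self._pos = 0  # index of the first item not yet returned

    def __iter__(self) -> ToyListIterator:
        return self

    def __next__(self) -> list[str]:
        if self._pos >= len(self._items):
            raise StopIteration
        chunk = self._items[self._pos : self._pos + self._chunk_size]
        self._pos += len(chunk)
        return chunk

    def collect(self) -> list[str]:
        # Ignores chunk_size: everything still unread comes back as one chunk.
        remaining = self._items[self._pos :]
        self._pos = len(self._items)
        return remaining


stream = ToyListIterator([f"file{i}.txt" for i in range(25)], chunk_size=10)
first = next(stream)     # one chunk of 10 items
rest = stream.collect()  # the remaining 15 items in a single chunk
print(len(first), len(rest))  # 10 15
```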

obspec.ListStream ¶

Bases: Protocol[ListChunkType_co]

A stream of ObjectMeta that can be polled asynchronously.

__aiter__ ¶

__aiter__() -> Self

Return Self as an async iterator.

__anext__ `async` ¶

__anext__() -> ListChunkType_co

Return the next chunk of ObjectMeta in the stream.

collect_async `async` ¶

collect_async() -> ListChunkType_co

Collect all remaining ObjectMeta objects in the stream.

This ignores the chunk_size parameter from the list_async call and collects all remaining data into a single chunk.
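The asynchronous counterpart can be sketched the same way. `ToyListStream` is a hypothetical in-memory class that mirrors the method names above; it only illustrates that `__anext__` and `collect_async` are awaited, and that `collect_async` returns whatever has not been consumed yet.

```python
from __future__ import annotations

import asyncio


class ToyListStream:
    """In-memory sketch with the same method names as ListStream."""

    def __init__(self, items: list[str], chunk_size: int) -> None:
        self._items = items
        self._chunk_size = chunk_size
        self._pos = 0  # index of the first item not yet returned

    def __aiter__(self) -> ToyListStream:
        return self

    async def __anext__(self) -> list[str]:
        if self._pos >= len(self._items):
            raise StopAsyncIteration
        chunk = self._items[self._pos : self._pos + self._chunk_size]
        self._pos += len(chunk)
        return chunk

    async def collect_async(self) -> list[str]:
        # Ignores chunk_size: everything still unread comes back as one chunk.
        remaining = self._items[self._pos :]
        self._pos = len(self._items)
        return remaining


async def main() -> tuple[int, int]:
    stream = ToyListStream([f"file{i}.txt" for i in range(25)], chunk_size=10)
    first = await stream.__anext__()     # one chunk of 10 items
    rest = await stream.collect_async()  # the remaining 15 items
    return len(first), len(rest)


counts = asyncio.run(main())
print(counts)  # (10, 15)
```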

obspec.ListChunkType_co `module-attribute` ¶

ListChunkType_co = TypeVar(
    "ListChunkType_co",
    list[ObjectMeta],
    ArrowArrayExportable,
    ArrowStreamExportable,
    covariant=True,
)

The data structure used for holding list results.

By default, listing APIs return a list of ObjectMeta. However, for improved performance when listing large buckets, you can pass return_arrow=True, in which case an Arrow RecordBatch is returned instead.
