List¶

obstore.list ¶

list(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
) -> ListStream[Sequence[ObjectMeta]]

list(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True],
) -> ListStream[RecordBatch]

list(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False],
) -> ListStream[Sequence[ObjectMeta]]

list(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool,
) -> ListStream[RecordBatch] | ListStream[Sequence[ObjectMeta]]

list(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False,
) -> ListStream[RecordBatch] | ListStream[Sequence[ObjectMeta]]

List all the objects with the given prefix.

Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. List is recursive, i.e. foo/bar/more/x will be included.

Examples:

Synchronously iterate through list results:

import obstore as obs
from obstore.store import MemoryStore

store = MemoryStore()
for i in range(100):
    obs.put(store, f"file{i}.txt", b"foo")

stream = obs.list(store, chunk_size=10)
for list_result in stream:
    print(list_result[0])
    # {'path': 'file0.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 19, 28, 781723, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '0', 'version': None}
    break

Asynchronously iterate through list results. Just change for to async for:

stream = obs.list(store, chunk_size=10)
async for list_result in stream:
    print(list_result[2])
    # {'path': 'file10.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 21, 46, 224725, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '10', 'version': None}
    break

Return large list results as Arrow. This is only a performance optimization; it returns the same information in columnar table form. This is most useful with large list operations (you may also want to increase the chunk_size parameter to increase the number of items per emitted record batch).

stream = obs.list(store, chunk_size=1000, return_arrow=True)
# Stream is now an iterable/async iterable of `RecordBatch`es
for batch in stream:
    print(batch.num_rows) # 100

    # If desired, convert to a pyarrow RecordBatch (zero-copy) with
    # `pyarrow.record_batch(batch)`
    break

Collect all list results into a single Arrow RecordBatch.

stream = obs.list(store, return_arrow=True)
batch = stream.collect()

Note

The order of returned ObjectMeta is not guaranteed

Note

There is no async version of this method, because list is not async under the hood, rather it only instantiates a stream, which can be polled in synchronous or asynchronous fashion. See ListStream.

Parameters:

store (ObjectStore) –

The ObjectStore instance to use.
prefix (str | None, default: None ) –

The prefix within ObjectStore to use for listing. Defaults to None.

Other Parameters:

offset (str | None) –

If provided, list all the objects with the given prefix and a location greater than offset. Defaults to None.
chunk_size (int) –

The number of items to collect per chunk in the returned (async) iterator. All chunks except for the last one will have this many items. This is ignored in the collect and collect_async methods of ListStream.
return_arrow (bool) –

If True, return each batch of list items as an Apache Arrow RecordBatch instead of a list of Python dicts. This is a performance optimization. Arrow removes serialization overhead between Rust and Python and so setting return_arrow=True can significantly reduce Python interpreter overhead for large list operations. Defaults to False.

If this is True, the arro3-core Python package must be installed.

Returns:

ListStream[RecordBatch] | ListStream[Sequence[ObjectMeta]] –

A ListStream, which you can iterate through to access list results.

obstore.list_with_delimiter ¶

list_with_delimiter(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    return_arrow: Literal[True],
) -> ListResult[Table]

list_with_delimiter(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    return_arrow: Literal[False] = False,
) -> ListResult[Sequence[ObjectMeta]]

list_with_delimiter(
    store: ObjectStore, prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[Table] | ListResult[Sequence[ObjectMeta]]

List objects with the given prefix and an implementation specific delimiter.

Returns common prefixes (directories) in addition to object metadata.

Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. This list is not recursive, i.e. foo/bar/more/x will not be included.

Note

Any prefix supplied to this prefix parameter will not be stripped off the paths in the result.

Parameters:

store (ObjectStore) –

The ObjectStore instance to use.
prefix (str | None, default: None ) –

The prefix within ObjectStore to use for listing. Defaults to None.

Other Parameters:

return_arrow (bool) –

If True, return each batch of list items as an Apache Arrow RecordBatch instead of a list of Python dicts. This is a performance optimization. Arrow removes serialization overhead between Rust and Python and so setting return_arrow=True can significantly reduce Python interpreter overhead for large list operations. Defaults to False.

If this is True, the arro3-core Python package must be installed.

Returns:

ListResult[Table] | ListResult[Sequence[ObjectMeta]] –

ListResult

obstore.list_with_delimiter_async `async` ¶

list_with_delimiter_async(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    return_arrow: Literal[True],
) -> ListResult[Table]

list_with_delimiter_async(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    return_arrow: Literal[False] = False,
) -> ListResult[Sequence[ObjectMeta]]

list_with_delimiter_async(
    store: ObjectStore, prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[Table] | ListResult[Sequence[ObjectMeta]]

Call list_with_delimiter asynchronously.

Refer to the documentation for list_with_delimiter.

obstore.ObjectMeta ¶

Bases: TypedDict

The metadata that describes an object.

Not importable at runtime

To use this type hint in your code, import it within a TYPE_CHECKING block:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore import ObjectMeta

e_tag `instance-attribute` ¶

e_tag: str | None

The unique identifier for the object datatracker.ietf.org/doc/html/rfc9110#name-etag

last_modified `instance-attribute` ¶

last_modified: datetime

The last modified time

path `instance-attribute` ¶

path: str

The full path to the object

size `instance-attribute` ¶

size: int

The size in bytes of the object

version `instance-attribute` ¶

version: str | None

A version indicator for this object

obstore.ListResult ¶

Bases: TypedDict, Generic[ListChunkType]

Result of a list call.

Includes objects, prefixes (directories) and a token for the next set of results. Individual result sets may be limited to 1,000 objects based on the underlying object storage's limitations.

This implements obstore.ListResult.

Not importable at runtime

To use this type hint in your code, import it within a TYPE_CHECKING block:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore import ListResult

common_prefixes `instance-attribute` ¶

common_prefixes: Sequence[str]

Prefixes that are common (like directories)

objects `instance-attribute` ¶

objects: ListChunkType

Object metadata for the listing

obstore.ListStream ¶

Bases: Generic[ListChunkType]

A stream of ObjectMeta that can be polled in a sync or async fashion.

This implements obstore.ListStream.

Not importable at runtime

To use this type hint in your code, import it within a TYPE_CHECKING block:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore import ListStream

aiter ¶

__aiter__() -> Self

Return Self as an async iterator.

anext `async` ¶

__anext__() -> ListChunkType

Return the next chunk of ObjectMeta in the stream.

iter ¶

__iter__() -> Self

Return Self as an async iterator.

next ¶

__next__() -> ListChunkType

Return the next chunk of ObjectMeta in the stream.

collect ¶

collect() -> ListChunkType

Collect all remaining ObjectMeta objects in the stream.

This ignores the chunk_size parameter from the list call and collects all remaining data into a single chunk.

collect_async `async` ¶

collect_async() -> ListChunkType

Collect all remaining ObjectMeta objects in the stream.

This ignores the chunk_size parameter from the list call and collects all remaining data into a single chunk.

obstore.ListChunkType `module-attribute` ¶

ListChunkType = TypeVar(
    "ListChunkType", Sequence[ObjectMeta], RecordBatch, Table, covariant=True
)

The data structure used for holding list results.

By default, listing APIs return a list of ObjectMeta. However for improved performance when listing large buckets, you can pass return_arrow=True. Then an Arrow RecordBatch will be returned instead, with columns containing the same information as would be contained in the Python ObjectMeta.

Not importable at runtime

To use this type hint in your code, import it within a TYPE_CHECKING block:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore import ListChunkType

List¶

obstore.list ¶

obstore.list_with_delimiter ¶

obstore.list_with_delimiter_async async ¶

obstore.ObjectMeta ¶

e_tag instance-attribute ¶

last_modified instance-attribute ¶

path instance-attribute ¶

size instance-attribute ¶

version instance-attribute ¶

obstore.ListResult ¶

common_prefixes instance-attribute ¶

objects instance-attribute ¶

obstore.ListStream ¶

__aiter__ ¶

__anext__ async ¶

__iter__ ¶

__next__ ¶

collect ¶

collect_async async ¶

obstore.ListChunkType module-attribute ¶

obstore.list_with_delimiter_async `async` ¶

e_tag `instance-attribute` ¶

last_modified `instance-attribute` ¶

path `instance-attribute` ¶

size `instance-attribute` ¶

version `instance-attribute` ¶

common_prefixes `instance-attribute` ¶

objects `instance-attribute` ¶

aiter ¶

anext `async` ¶

iter ¶

next ¶

collect_async `async` ¶

obstore.ListChunkType `module-attribute` ¶