List
obstore.list
list(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True]
) -> ListStream[RecordBatch]
list(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False] = False
) -> ListStream[List[ObjectMeta]]
list(
    store: ObjectStore,
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False
) -> ListStream[RecordBatch] | ListStream[List[ObjectMeta]]
List all the objects with the given prefix.
Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. List is recursive, i.e. foo/bar/more/x will be included.
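The path segment rule can be modeled in a few lines of plain Python. This is only a sketch of the documented semantics, not obstore's implementation; the helper name is_segment_prefix is hypothetical, and treating "/" as the segment separator is an assumption.

```python
def is_segment_prefix(prefix: str, path: str) -> bool:
    """Return True if `prefix` matches `path` on whole path segments."""
    # Split on "/" and drop empty segments so "foo/bar/" and "foo/bar" behave alike.
    prefix_parts = [part for part in prefix.split("/") if part]
    path_parts = [part for part in path.split("/") if part]
    # Every prefix segment must equal the corresponding path segment exactly.
    return path_parts[: len(prefix_parts)] == prefix_parts


print(is_segment_prefix("foo/bar/", "foo/bar/x"))       # True
print(is_segment_prefix("foo/bar/", "foo/bar_baz/x"))   # False
print(is_segment_prefix("foo/bar/", "foo/bar/more/x"))  # True
```

The last call shows why a recursive listing includes foo/bar/more/x: the prefix only constrains the leading segments, not the depth of the path.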
Examples:
Synchronously iterate through list results:
import obstore as obs
from obstore.store import MemoryStore
store = MemoryStore()
for i in range(100):
    obs.put(store, f"file{i}.txt", b"foo")
stream = obs.list(store, chunk_size=10)
for list_result in stream:
    print(list_result[0])
    # {'path': 'file0.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 19, 28, 781723, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '0', 'version': None}
    break
Asynchronously iterate through list results. Just change for to async for:
import asyncio

async def main():
    stream = obs.list(store, chunk_size=10)
    async for list_result in stream:
        print(list_result[2])
        # {'path': 'file10.txt', 'last_modified': datetime.datetime(2024, 10, 23, 19, 21, 46, 224725, tzinfo=datetime.timezone.utc), 'size': 3, 'e_tag': '10', 'version': None}
        break

asyncio.run(main())
Return list results as Arrow. This is most useful for large list operations, where you may also want to increase the chunk_size parameter.
stream = obs.list(store, chunk_size=1000, return_arrow=True)
# Stream is now an iterable/async iterable of `RecordBatch`es
for batch in stream:
    print(batch.num_rows) # 100
    # If desired, convert to a pyarrow RecordBatch (zero-copy) with
    # `pyarrow.record_batch(batch)`
    break
Collect all list results into a single Arrow RecordBatch.
stream = obs.list(store, return_arrow=True)
batch = stream.collect()
Note
The order of returned ObjectMeta is not guaranteed.
Note
There is no async version of this method because list is not async under the hood; it only instantiates a stream, which can be polled in a synchronous or asynchronous fashion. See ListStream.
Parameters:
- store (ObjectStore) – The ObjectStore instance to use.
- prefix (str | None, default: None) – The prefix within ObjectStore to use for listing. Defaults to None.
Other Parameters:
- offset (str | None) – If provided, list all the objects with the given prefix and a location greater than offset. Defaults to None.
- chunk_size (int) – The number of items to collect per chunk in the returned (async) iterator. All chunks except for the last one will have this many items. This is ignored in the collect and collect_async methods of ListStream.
- return_arrow (bool) – If True, return each batch of list items as an Arrow RecordBatch, not as a list of Python dicts. Arrow removes serialization overhead between Rust and Python and so this can be significantly faster for large list operations. Defaults to False.
  If this is True, the arro3-core Python package must be installed.
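The effect of offset and chunk_size can be illustrated with a small pure-Python model. This is a sketch of the behavior described above, not the library's code: the function name model_list_chunks is hypothetical, the offset comparison is assumed to be a plain string comparison, and the input is sorted only to make the output reproducible (the real listing order is not guaranteed).

```python
from itertools import islice
from typing import Iterator, List, Optional


def model_list_chunks(
    paths: List[str], offset: Optional[str] = None, chunk_size: int = 50
) -> Iterator[List[str]]:
    """Yield chunks of paths, mimicking the documented offset and chunk_size behavior."""
    # Keep only locations strictly greater than the offset (assumed string comparison).
    selected = (p for p in sorted(paths) if offset is None or p > offset)
    while True:
        # Take up to chunk_size items; only the final chunk may be shorter.
        chunk = list(islice(selected, chunk_size))
        if not chunk:
            return
        yield chunk


paths = [f"file{i:02d}.txt" for i in range(25)]
sizes = [len(chunk) for chunk in model_list_chunks(paths, chunk_size=10)]
print(sizes)  # [10, 10, 5]
after = list(model_list_chunks(paths, offset="file19.txt", chunk_size=10))
print(after)  # [['file20.txt', 'file21.txt', 'file22.txt', 'file23.txt', 'file24.txt']]
```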
Returns:
- ListStream[RecordBatch] | ListStream[List[ObjectMeta]] – A ListStream, which you can iterate through to access list results.
obstore.list_with_delimiter
list_with_delimiter(
    store: ObjectStore, prefix: str | None = None
) -> ListResult
List objects with the given prefix and an implementation-specific delimiter. Returns common prefixes (directories) in addition to object metadata.
Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x. List is not recursive, i.e. foo/bar/more/x will not be included.
Parameters:
- store (ObjectStore) – The ObjectStore instance to use.
- prefix (str | None, default: None) – The prefix within ObjectStore to use for listing. Defaults to None.
Returns:
- ListResult – A ListResult containing the object metadata and common prefixes (directories) found under the prefix.
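To make the non-recursive behavior concrete, here is a pure-Python model of a delimited listing. It is a sketch under stated assumptions rather than obstore's implementation: the function name model_list_with_delimiter is hypothetical, the delimiter is assumed to be "/", the result keys common_prefixes and objects are assumed names, and objects are represented by their paths instead of full ObjectMeta entries.

```python
from typing import Dict, List


def model_list_with_delimiter(paths: List[str], prefix: str = "") -> Dict[str, List[str]]:
    """Split paths under `prefix` into direct objects and common prefixes."""
    prefix_parts = [part for part in prefix.split("/") if part]
    objects: List[str] = []
    common_prefixes = set()
    for path in paths:
        parts = [part for part in path.split("/") if part]
        # Skip paths that are not under the prefix on a path segment basis.
        if parts[: len(prefix_parts)] != prefix_parts:
            continue
        rest = parts[len(prefix_parts):]
        if len(rest) == 1:
            # Exactly one segment remains: an object directly under the prefix.
            objects.append(path)
        elif len(rest) > 1:
            # Deeper paths are collapsed into their first-level "directory".
            common_prefixes.add("/".join(prefix_parts + rest[:1]))
    return {"common_prefixes": sorted(common_prefixes), "objects": sorted(objects)}


result = model_list_with_delimiter(
    ["foo/bar/x", "foo/bar/more/x", "foo/bar_baz/x"], prefix="foo/bar/"
)
print(result)
# {'common_prefixes': ['foo/bar/more'], 'objects': ['foo/bar/x']}
```

In this model foo/bar/more/x is not returned as an object; it only contributes the common prefix foo/bar/more, which a caller could list in a follow-up call.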
obstore.list_with_delimiter_async async
list_with_delimiter_async(
    store: ObjectStore, prefix: str | None = None
) -> ListResult
Call list_with_delimiter asynchronously.
Refer to the documentation for list_with_delimiter.
obstore.ObjectMeta
obstore.ListResult
Bases: TypedDict
Result of a list call that includes objects, prefixes (directories) and a token for the next set of results. Individual result sets may be limited to 1,000 objects based on the underlying object storage's limitations.
obstore.ListStream
Bases: Generic[ChunkType]
A stream of ObjectMeta that can be polled in a sync or async fashion.
collect
collect() -> ChunkType
Collect all remaining ObjectMeta objects in the stream.
This ignores the chunk_size parameter from the list call and collects all remaining data into a single chunk.
collect_async async
collect_async() -> ChunkType
Collect all remaining ObjectMeta objects in the stream.
This ignores the chunk_size parameter from the list call and collects all remaining data into a single chunk.
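The idea of one stream object that supports both sync and async polling, plus a collect that drains whatever remains, can be sketched with the standard library alone. ToyListStream below is a hypothetical stand-in written for illustration; it is not obstore's ListStream and says nothing about how the real class is implemented.

```python
import asyncio
from itertools import islice
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class ToyListStream(Generic[T]):
    """A toy stream of chunks that can be consumed synchronously or asynchronously."""

    def __init__(self, items: List[T], chunk_size: int) -> None:
        self._items: Iterator[T] = iter(items)
        self._chunk_size = chunk_size

    def __iter__(self) -> "ToyListStream[T]":
        return self

    def __next__(self) -> List[T]:
        # Pull at most chunk_size items from the shared underlying iterator.
        chunk = list(islice(self._items, self._chunk_size))
        if not chunk:
            raise StopIteration
        return chunk

    def __aiter__(self) -> "ToyListStream[T]":
        return self

    async def __anext__(self) -> List[T]:
        try:
            return next(self)
        except StopIteration:
            raise StopAsyncIteration from None

    def collect(self) -> List[T]:
        # Ignore chunk_size and drain everything that has not been consumed yet.
        return list(self._items)

    async def collect_async(self) -> List[T]:
        return self.collect()


stream = ToyListStream(list(range(25)), chunk_size=10)
first = next(stream)     # one chunk of 10 items, pulled synchronously
rest = stream.collect()  # the remaining 15 items in a single chunk
print(len(first), len(rest))  # 10 15


async def main() -> List[int]:
    sizes = []
    async for chunk in ToyListStream(list(range(25)), chunk_size=10):
        sizes.append(len(chunk))
    return sizes


async_sizes = asyncio.run(main())
print(async_sizes)  # [10, 10, 5]
```

Because collect only returns what is still in the stream, calling it after partially iterating yields the remainder, which matches the "all remaining" wording above.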