Google Cloud Storage

obstore.store.GCSStore

Interface to Google Cloud Storage.

All constructors will check for environment variables. Refer to GCSConfig for valid environment variables.

If no credentials are explicitly provided, they will be sourced from the environment as documented here.

client_options property

client_options: ClientConfig | None

Get the store's client configuration.

config property

config: GCSConfig

Get the underlying GCS config parameters.

credential_provider property

credential_provider: GCSCredentialProvider | None

Get the store's credential provider.

prefix property

prefix: str | None

Get the prefix applied to all operations in this store, if any.

retry_config property

retry_config: RetryConfig | None

Get the store's retry configuration.

__init__

__init__(
    bucket: str | None = None,
    *,
    prefix: str | None = None,
    config: GCSConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    credential_provider: GCSCredentialProvider | None = None,
    **kwargs: Unpack[GCSConfig],
) -> None

Construct a new GCSStore.

Parameters:

  • bucket (str | None, default: None ) –

    The GCS bucket to use.

Other Parameters:

  • prefix (str | None) –

    A prefix within the bucket to use for all operations.

  • config (GCSConfig | None) –

    GCS Configuration. Values in this config will override values inferred from the environment. Defaults to None.

  • client_options (ClientConfig | None) –

    HTTP Client options. Defaults to None.

  • retry_config (RetryConfig | None) –

    Retry configuration. Defaults to None.

  • credential_provider (GCSCredentialProvider | None) –

    A callback to provide custom Google credentials.

  • kwargs (Unpack[GCSConfig]) –

    GCS configuration values. Supports the same values as config, but as named keyword args.

Returns:

  • None

    GCSStore

copy

copy(from_: str, to: str, *, overwrite: bool = True) -> None

Copy an object from one path to another in the same object store.

Refer to the documentation for copy.

copy_async async

copy_async(from_: str, to: str, *, overwrite: bool = True) -> None

Call copy asynchronously.

Refer to the documentation for copy.

delete

delete(paths: str | Sequence[str]) -> None

Delete the object at the specified location(s).

Refer to the documentation for delete.

delete_async async

delete_async(paths: str | Sequence[str]) -> None

Call delete asynchronously.

Refer to the documentation for delete.

from_url classmethod

from_url(
    url: str,
    *,
    prefix: str | None = None,
    config: GCSConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    credential_provider: GCSCredentialProvider | None = None,
    **kwargs: Unpack[GCSConfig],
) -> Self

Construct a new GCSStore with values populated from a well-known storage URL.

The supported URL schemes are:

  • gs://<bucket>/<path>

Parameters:

  • url (str) –

    A well-known storage URL.

Other Parameters:

  • prefix (str | None) –

    A prefix within the bucket to use for all operations.

  • config (GCSConfig | None) –

    GCS Configuration. Values in this config will override values inferred from the url. Defaults to None.

  • client_options (ClientConfig | None) –

    HTTP Client options. Defaults to None.

  • retry_config (RetryConfig | None) –

    Retry configuration. Defaults to None.

  • credential_provider (GCSCredentialProvider | None) –

    A callback to provide custom Google credentials.

  • kwargs (Unpack[GCSConfig]) –

    GCS configuration values. Supports the same values as config, but as named keyword args.

Returns:

  • Self

    GCSStore
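
To make the gs://<bucket>/<path> form concrete, the sketch below splits such a URL using only the standard library. It does not call obstore. The helper name split_gs_url is hypothetical, and treating the path component as the store prefix is an assumption about how from_url interprets the URL.

```python
from __future__ import annotations

from urllib.parse import urlparse


def split_gs_url(url: str) -> tuple[str, str | None]:
    """Split a gs:// URL into (bucket, path).

    Hypothetical helper for illustration; not part of obstore.
    """
    parsed = urlparse(url)
    if parsed.scheme != "gs":
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("missing bucket name")
    # Drop leading and trailing slashes; an empty path means "no prefix".
    path = parsed.path.strip("/")
    return parsed.netloc, path or None


bucket, path = split_gs_url("gs://my-bucket/data/2024")
print(bucket, path)
```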

get

get(path: str, *, options: GetOptions | None = None) -> GetResult

Return the bytes that are stored at the specified location.

Refer to the documentation for get.

get_async async

get_async(path: str, *, options: GetOptions | None = None) -> GetResult

Call get asynchronously.

Refer to the documentation for get.

get_range

get_range(
    path: str, *, start: int, end: int | None = None, length: int | None = None
) -> Bytes

Return the bytes stored at the specified location in the given byte range.

Refer to the documentation for get_range.
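
The start, end and length parameters describe one byte range in two alternative ways. The sketch below shows how they could relate, assuming end is exclusive and exactly one of end or length is supplied (both assumptions; check the get_range documentation). The helper resolve_range is hypothetical and operates on a local bytes object rather than a store.

```python
from __future__ import annotations


def resolve_range(
    start: int, end: int | None = None, length: int | None = None
) -> tuple[int, int]:
    """Normalize (start, end | length) to a half-open (start, end) pair.

    Hypothetical helper; assumes `end` is exclusive.
    """
    if (end is None) == (length is None):
        raise ValueError("provide exactly one of end or length")
    if end is None:
        end = start + length
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return start, end


data = bytes(range(10))
s, e = resolve_range(2, length=3)
# Slicing local bytes with the resolved range mirrors what a range read returns.
chunk = data[s:e]
print(list(chunk))
```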

get_range_async async

get_range_async(
    path: str, *, start: int, end: int | None = None, length: int | None = None
) -> Bytes

Call get_range asynchronously.

Refer to the documentation for get_range.

get_ranges

get_ranges(
    path: str,
    *,
    starts: Sequence[int],
    ends: Sequence[int] | None = None,
    lengths: Sequence[int] | None = None,
) -> list[Bytes]

Return the bytes stored at the specified location in the given byte ranges.

Refer to the documentation for get_ranges.

get_ranges_async async

get_ranges_async(
    path: str,
    *,
    starts: Sequence[int],
    ends: Sequence[int] | None = None,
    lengths: Sequence[int] | None = None,
) -> list[Bytes]

Call get_ranges asynchronously.

Refer to the documentation for get_ranges.

head

head(path: str) -> ObjectMeta

Return the metadata for the specified location.

Refer to the documentation for head.

head_async async

head_async(path: str) -> ObjectMeta

Call head asynchronously.

Refer to the documentation for head.

list

list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[True],
) -> ListStream[RecordBatch]
list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: Literal[False] = False,
) -> ListStream[list[ObjectMeta]]
list(
    prefix: str | None = None,
    *,
    offset: str | None = None,
    chunk_size: int = 50,
    return_arrow: bool = False,
) -> ListStream[RecordBatch] | ListStream[list[ObjectMeta]]

List all the objects with the given prefix.

Refer to the documentation for list.

list_with_delimiter

list_with_delimiter(
    prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[Table]
list_with_delimiter(
    prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]
list_with_delimiter(
    prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[Table] | ListResult[list[ObjectMeta]]

List objects with the given prefix and an implementation-specific delimiter.

Refer to the documentation for list_with_delimiter.

list_with_delimiter_async async

list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[Table]
list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]
list_with_delimiter_async(
    prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[Table] | ListResult[list[ObjectMeta]]

Call list_with_delimiter asynchronously.

Refer to the documentation for list_with_delimiter.

put

put(
    path: str,
    file: IO[bytes]
    | Path
    | bytes
    | Buffer
    | Iterator[Buffer]
    | Iterable[Buffer],
    *,
    attributes: Attributes | None = None,
    tags: dict[str, str] | None = None,
    mode: PutMode | None = None,
    use_multipart: bool | None = None,
    chunk_size: int = 5 * 1024 * 1024,
    max_concurrency: int = 12,
) -> PutResult

Save the provided bytes to the specified location.

Refer to the documentation for put.

put_async async

put_async(
    path: str,
    file: IO[bytes]
    | Path
    | bytes
    | Buffer
    | AsyncIterator[Buffer]
    | AsyncIterable[Buffer]
    | Iterator[Buffer]
    | Iterable[Buffer],
    *,
    attributes: Attributes | None = None,
    tags: dict[str, str] | None = None,
    mode: PutMode | None = None,
    use_multipart: bool | None = None,
    chunk_size: int = 5 * 1024 * 1024,
    max_concurrency: int = 12,
) -> PutResult

Call put asynchronously.

Refer to the documentation for put. In addition to what the synchronous put allows for the file parameter, this also supports an async iterator or iterable of objects implementing the Python buffer protocol.

This means, for example, you can pass the result of get_async directly to put_async, and the request will be streamed through Python during the put operation:

import obstore as obs

# This only constructs the stream, it doesn't materialize the data in memory
resp = await obs.get_async(store1, path1)
# A streaming upload is created to copy the file to path2
await obs.put_async(store2, path2, resp)
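
An async generator that yields bytes is one input shape matching the AsyncIterator[Buffer] option for the file parameter. The sketch below builds such a generator and drains it with a stand-in consumer, so it runs without obstore or network access. The names generate_chunks and count_bytes are hypothetical.

```python
import asyncio
from collections.abc import AsyncIterator


async def generate_chunks(n_chunks: int, chunk_size: int) -> AsyncIterator[bytes]:
    """Yield n_chunks buffers of chunk_size bytes each, one at a time."""
    for i in range(n_chunks):
        await asyncio.sleep(0)  # yield control, as a real I/O source would
        yield bytes([i % 256]) * chunk_size


async def count_bytes(stream: AsyncIterator[bytes]) -> int:
    """Stand-in consumer: drains the stream and returns the total size."""
    total = 0
    async for chunk in stream:
        total += len(chunk)
    return total


total = asyncio.run(count_bytes(generate_chunks(4, 1024)))
print(total)
```

With obstore installed, the object returned by generate_chunks(...) could be passed as the file argument of put_async in place of the stand-in consumer.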

rename

rename(from_: str, to: str, *, overwrite: bool = True) -> None

Move an object from one path to another in the same object store.

Refer to the documentation for rename.

rename_async async

rename_async(from_: str, to: str, *, overwrite: bool = True) -> None

Call rename asynchronously.

Refer to the documentation for rename.

obstore.store.GCSConfig

Bases: TypedDict

Configuration parameters for GCSStore.

Not importable at runtime

To use this type hint in your code, import it within a TYPE_CHECKING block:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore.store import GCSConfig

application_credentials instance-attribute

application_credentials: str

Application credentials path.

See cloud.google.com/docs/authentication/provide-credentials-adc.

Environment variable: GOOGLE_APPLICATION_CREDENTIALS.

bucket instance-attribute

bucket: str

Bucket name. (required)

Environment variables:

  • GOOGLE_BUCKET
  • GOOGLE_BUCKET_NAME

service_account instance-attribute

service_account: str

Path to the service account file.

This or service_account_key must be set.

Example value "/tmp/gcs.json". Example contents of gcs.json:

{
   "gcs_base_url": "https://localhost:4443",
   "disable_oauth": true,
   "client_email": "",
   "private_key": ""
}

Environment variables:

  • GOOGLE_SERVICE_ACCOUNT
  • GOOGLE_SERVICE_ACCOUNT_PATH

service_account_key instance-attribute

service_account_key: str

The serialized service account key.

The service account key must be in JSON format. This or service_account must be set.

Environment variable: GOOGLE_SERVICE_ACCOUNT_KEY.
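
Because GCSConfig is a TypedDict, a plain dict with matching keys satisfies the type hint at runtime. The sketch below builds one that carries a serialized service account key. The helper name config_from_key is hypothetical and the key values are placeholders, not real credentials.

```python
from __future__ import annotations

import json
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed by type checkers; not importable at runtime.
    from obstore.store import GCSConfig


def config_from_key(bucket: str, key: dict[str, str]) -> GCSConfig:
    """Build a config dict with the key serialized to a JSON string.

    Hypothetical helper for illustration.
    """
    return {"bucket": bucket, "service_account_key": json.dumps(key)}


# Placeholder values; a real key comes from the Google Cloud console.
config = config_from_key(
    "my-bucket",
    {"client_email": "svc@example.invalid", "private_key": "<redacted>"},
)
print(sorted(config))
```

The resulting dict could then be passed as the config argument of GCSStore.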

obstore.store.GCSCredential

Bases: TypedDict

A Google Cloud Storage Credential.

Not importable at runtime

To use this type hint in your code, import it within a TYPE_CHECKING block:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore.store import GCSCredential

expires_at instance-attribute

expires_at: datetime | None

Expiry datetime of credential. The datetime should have time zone set.

If None, the credential will never expire.

token instance-attribute

token: str

An HTTP bearer token.

obstore.store.GCSCredentialProvider

Bases: Protocol

A type hint for a synchronous or asynchronous callback to provide custom Google Cloud Storage credentials.

This should be passed into the credential_provider parameter of GCSStore.

Not importable at runtime

To use this type hint in your code, import it within a TYPE_CHECKING block:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore.store import GCSCredentialProvider

__call__ staticmethod

Return a GCSCredential.
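
As a minimal sketch of the synchronous form, the class below is a callable that returns a dict with the two GCSCredential fields. The token request is stubbed out with the hypothetical fetch_token function, and the class name StaticTokenProvider and the 30-minute lifetime are also assumptions for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def fetch_token() -> str:
    """Stand-in for a real token request; returns a placeholder string."""
    return "placeholder-token"


class StaticTokenProvider:
    """Callable that returns a dict shaped like GCSCredential."""

    def __init__(self, lifetime: timedelta = timedelta(minutes=30)) -> None:
        self.lifetime = lifetime

    def __call__(self) -> dict:
        return {
            "token": fetch_token(),
            # Time zone aware expiry, as the expires_at field requires.
            "expires_at": datetime.now(timezone.utc) + self.lifetime,
        }


provider = StaticTokenProvider()
credential = provider()
print(sorted(credential))
```

An instance such as provider could be passed as the credential_provider argument of GCSStore.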