Google Cloud Storage¶
obstore.store.GCSStore ¶
Interface to Google Cloud Storage.
All constructors will check for environment variables. Refer to GCSConfig for valid environment variables.
If no credentials are explicitly provided, they will be sourced from the environment.
credential_provider
property
¶
credential_provider: GCSCredentialProvider | None
Get the store's credential provider.
prefix
property
¶
prefix: str | None
Get the prefix applied to all operations in this store, if any.
__init__ ¶
__init__(
bucket: str | None = None,
*,
prefix: str | None = None,
config: GCSConfig | None = None,
client_options: ClientConfig | None = None,
retry_config: RetryConfig | None = None,
credential_provider: GCSCredentialProvider | None = None,
**kwargs: Unpack[GCSConfig],
) -> None
Construct a new GCSStore.
Parameters:
- bucket (str | None, default: None) – The GCS bucket to use.
Other Parameters:
- prefix (str | None) – A prefix within the bucket to use for all operations.
- config (GCSConfig | None) – GCS Configuration. Values in this config will override values inferred from the environment. Defaults to None.
- client_options (ClientConfig | None) – HTTP Client options. Defaults to None.
- retry_config (RetryConfig | None) – Retry configuration. Defaults to None.
- credential_provider (GCSCredentialProvider | None) – A callback to provide custom Google credentials.
- kwargs (Unpack[GCSConfig]) – GCS configuration values. Supports the same values as config, but as named keyword args.
Returns:
- None – GCSStore
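As a minimal sketch of how the constructor arguments above fit together, the snippet below collects them in a plain dictionary and only touches obstore when the store is actually opened. The helpers `store_arguments` and `open_store` are hypothetical names introduced for illustration, and the bucket, prefix, and file path in the usage note are placeholders.

```python
from typing import Any, Optional


def store_arguments(
    bucket: str,
    prefix: Optional[str] = None,
    service_account: Optional[str] = None,
) -> dict[str, Any]:
    """Collect constructor arguments for GCSStore (hypothetical helper)."""
    arguments: dict[str, Any] = {"bucket": bucket}
    if prefix is not None:
        arguments["prefix"] = prefix
    if service_account is not None:
        # Values passed through config override those inferred from the environment.
        arguments["config"] = {"service_account": service_account}
    return arguments


def open_store(arguments: dict[str, Any]):
    """Construct the store; obstore is imported lazily so the helper above runs without it."""
    from obstore.store import GCSStore

    return GCSStore(**arguments)
```

Calling `open_store(store_arguments("my-bucket", prefix="raw"))` would then build a store scoped to the `raw` prefix, assuming obstore is installed and credentials can be found.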
copy ¶
Copy an object from one path to another in the same object store.
Refer to the documentation for copy.
from_url
classmethod
¶
from_url(
url: str,
*,
prefix: str | None = None,
config: GCSConfig | None = None,
client_options: ClientConfig | None = None,
retry_config: RetryConfig | None = None,
credential_provider: GCSCredentialProvider | None = None,
**kwargs: Unpack[GCSConfig],
) -> Self
Construct a new GCSStore with values populated from a well-known storage URL.
The supported url schemes are:
gs://<bucket>/<path>
Parameters:
- url (str) – well-known storage URL.
Other Parameters:
- prefix (str | None) – A prefix within the bucket to use for all operations.
- config (GCSConfig | None) – GCS Configuration. Values in this config will override values inferred from the url. Defaults to None.
- client_options (ClientConfig | None) – HTTP Client options. Defaults to None.
- retry_config (RetryConfig | None) – Retry configuration. Defaults to None.
- credential_provider (GCSCredentialProvider | None) – A callback to provide custom Google credentials.
- kwargs (Unpack[GCSConfig]) – GCS configuration values. Supports the same values as config, but as named keyword args.
Returns:
- Self – GCSStore
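To make the gs://<bucket>/<path> shape concrete, the sketch below splits such a URL with the standard library. `split_gs_url` is a hypothetical helper and not obstore's own parser, which may treat edge cases differently. `open_from_url` shows the documented `from_url` call and needs obstore installed.

```python
from urllib.parse import urlparse


def split_gs_url(url: str) -> tuple[str, str]:
    """Split gs://<bucket>/<path> into (bucket, path). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "gs":
        raise ValueError(f"expected a gs:// URL, got {url!r}")
    # netloc holds the bucket; the path keeps a leading slash that we drop.
    return parsed.netloc, parsed.path.lstrip("/")


def open_from_url(url: str):
    """Build a store from the URL; obstore is imported lazily."""
    from obstore.store import GCSStore

    return GCSStore.from_url(url)
```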
get ¶
get(path: str, *, options: GetOptions | None = None) -> GetResult
Return the bytes that are stored at the specified location.
Refer to the documentation for get.
get_async
async
¶
get_async(path: str, *, options: GetOptions | None = None) -> GetResult
Call get asynchronously.
Refer to the documentation for get.
get_range ¶
Return the bytes stored at the specified location in the given byte range.
Refer to the documentation for get_range.
get_range_async
async
¶
get_range_async(
path: str, *, start: int, end: int | None = None, length: int | None = None
) -> Bytes
Call get_range asynchronously.
Refer to the documentation for get_range.
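The signature above accepts `start` together with either `end` or `length`. The sketch below shows one way to reason about that pair using an in-memory byte string. It assumes that `end` is exclusive and that exactly one of `end` or `length` is supplied. Both are assumptions to confirm against the get_range documentation. `resolve_range` is a hypothetical helper.

```python
from typing import Optional


def resolve_range(
    start: int, end: Optional[int] = None, length: Optional[int] = None
) -> tuple[int, int]:
    """Return (start, stop) with stop exclusive. Assumes exactly one of end/length."""
    if (end is None) == (length is None):
        raise ValueError("pass exactly one of end or length")
    stop = end if end is not None else start + length
    if stop < start:
        raise ValueError("range must not end before it starts")
    return start, stop


# Local stand-in for an object's contents, used only to illustrate the slicing.
data = b"0123456789"
start, stop = resolve_range(2, length=4)
chunk = data[start:stop]
```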
get_ranges ¶
get_ranges(
path: str,
*,
starts: Sequence[int],
ends: Sequence[int] | None = None,
lengths: Sequence[int] | None = None,
) -> list[Bytes]
Return the bytes stored at the specified location in the given byte ranges.
Refer to the documentation for get_ranges.
get_ranges_async
async
¶
get_ranges_async(
path: str,
*,
starts: Sequence[int],
ends: Sequence[int] | None = None,
lengths: Sequence[int] | None = None,
) -> list[Bytes]
Call get_ranges asynchronously.
Refer to the documentation for get_ranges.
head ¶
head(path: str) -> ObjectMeta
Return the metadata for the specified location.
Refer to the documentation for head.
head_async
async
¶
head_async(path: str) -> ObjectMeta
Call head asynchronously.
Refer to the documentation for head.
list ¶
list(
prefix: str | None = None,
*,
offset: str | None = None,
chunk_size: int = 50,
return_arrow: Literal[True],
) -> ListStream[RecordBatch]
list(
prefix: str | None = None,
*,
offset: str | None = None,
chunk_size: int = 50,
return_arrow: Literal[False] = False,
) -> ListStream[list[ObjectMeta]]
list(
prefix: str | None = None,
*,
offset: str | None = None,
chunk_size: int = 50,
return_arrow: bool = False,
) -> ListStream[RecordBatch] | ListStream[list[ObjectMeta]]
List all the objects with the given prefix.
Refer to the documentation for list.
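Because `list` returns a stream of chunks (lists of object metadata when `return_arrow` is False), callers often want to flatten it. The sketch below does that with a generator. `iter_objects` and `total_size` are hypothetical helpers, and the `"size"` key on each metadata entry is an assumption about ObjectMeta that this page does not spell out.

```python
from typing import Iterable, Iterator, Optional


def iter_objects(chunks: Iterable[list[dict]]) -> Iterator[dict]:
    """Flatten a stream of metadata chunks into individual entries."""
    for chunk in chunks:
        yield from chunk


def total_size(store, prefix: Optional[str] = None) -> int:
    """Sum object sizes under a prefix; assumes each entry has a "size" key."""
    return sum(meta["size"] for meta in iter_objects(store.list(prefix)))
```

Flattening lazily keeps memory bounded by one chunk at a time rather than materializing the whole listing.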
list_with_delimiter ¶
list_with_delimiter(
prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[Table]
list_with_delimiter(
prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]
list_with_delimiter(
prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[Table] | ListResult[list[ObjectMeta]]
List objects with the given prefix and an implementation specific delimiter.
Refer to the documentation for list_with_delimiter.
list_with_delimiter_async
async
¶
list_with_delimiter_async(
prefix: str | None = None, *, return_arrow: Literal[True]
) -> ListResult[Table]
list_with_delimiter_async(
prefix: str | None = None, *, return_arrow: Literal[False] = False
) -> ListResult[list[ObjectMeta]]
list_with_delimiter_async(
prefix: str | None = None, *, return_arrow: bool = False
) -> ListResult[Table] | ListResult[list[ObjectMeta]]
Call list_with_delimiter asynchronously.
Refer to the documentation for list_with_delimiter.
put ¶
put(
path: str,
file: IO[bytes]
| Path
| bytes
| Buffer
| Iterator[Buffer]
| Iterable[Buffer],
*,
attributes: Attributes | None = None,
tags: dict[str, str] | None = None,
mode: PutMode | None = None,
use_multipart: bool | None = None,
chunk_size: int = 5 * 1024 * 1024,
max_concurrency: int = 12,
) -> PutResult
Save the provided bytes to the specified location.
Refer to the documentation for put.
put_async
async
¶
put_async(
path: str,
file: IO[bytes]
| Path
| bytes
| Buffer
| AsyncIterator[Buffer]
| AsyncIterable[Buffer]
| Iterator[Buffer]
| Iterable[Buffer],
*,
attributes: Attributes | None = None,
tags: dict[str, str] | None = None,
mode: PutMode | None = None,
use_multipart: bool | None = None,
chunk_size: int = 5 * 1024 * 1024,
max_concurrency: int = 12,
) -> PutResult
Call put asynchronously.
Refer to the documentation for put. In addition to what the synchronous put allows for the file parameter, this also supports an async iterator or iterable of objects implementing the Python buffer protocol.
This means, for example, you can pass the result of get_async directly to put_async, and the request will be streamed through Python during the put operation:
import obstore as obs
# This only constructs the stream, it doesn't materialize the data in memory
resp = await obs.get_async(store1, path1)
# A streaming upload is created to copy the file to path2
await obs.put_async(store2, path2, resp)
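The async-iterator form of the file parameter can also be fed by your own generator. The following is a minimal sketch, assuming an async generator of `bytes` chunks is an acceptable input as described above. `chunked`, `collect`, and `upload_stream` are hypothetical helpers, and `upload_stream` needs obstore installed and a real store.

```python
import asyncio
from typing import AsyncIterator


async def chunked(data: bytes, chunk_size: int) -> AsyncIterator[bytes]:
    """Yield data in pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset : offset + chunk_size]
        # Give other tasks a chance to run between chunks.
        await asyncio.sleep(0)


async def collect(stream: AsyncIterator[bytes]) -> list[bytes]:
    """Drain an async stream into a list (handy for local checks)."""
    return [chunk async for chunk in stream]


async def upload_stream(store, path: str, data: bytes, chunk_size: int = 5 * 1024 * 1024):
    """Stream data to the store; obstore is imported lazily."""
    import obstore as obs

    return await obs.put_async(store, path, chunked(data, chunk_size))
```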
obstore.store.GCSConfig ¶
Bases: TypedDict
Configuration parameters for GCSStore.
Not importable at runtime
To use this type hint in your code, import it within a TYPE_CHECKING block:
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore.store import GCSConfig
application_credentials
instance-attribute
¶
application_credentials: str
Application credentials path.
See cloud.google.com/docs/authentication/provide-credentials-adc.
Environment variable: GOOGLE_APPLICATION_CREDENTIALS.
bucket
instance-attribute
¶
bucket: str
Bucket name. (required)
Environment variables:
GOOGLE_BUCKET
GOOGLE_BUCKET_NAME
service_account
instance-attribute
¶
service_account: str
Path to the service account file.
This or service_account_key must be set.
Example value "/tmp/gcs.json". Example contents of gcs.json:
{
    "gcs_base_url": "https://localhost:4443",
    "disable_oauth": true,
    "client_email": "",
    "private_key": ""
}
Environment variables:
GOOGLE_SERVICE_ACCOUNT
GOOGLE_SERVICE_ACCOUNT_PATH
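As an illustration of the service_account setting, the sketch below writes the example gcs.json contents shown above to a directory and builds a config dictionary pointing at it. `write_service_account_file` and `gcs_config` are hypothetical helpers. The file contents are the documentation's example values, not working credentials.

```python
import json
from pathlib import Path


def write_service_account_file(directory: str) -> str:
    """Write the example gcs.json from the documentation and return its path."""
    contents = {
        "gcs_base_url": "https://localhost:4443",
        "disable_oauth": True,
        "client_email": "",
        "private_key": "",
    }
    path = Path(directory) / "gcs.json"
    path.write_text(json.dumps(contents, indent=2))
    return str(path)


def gcs_config(directory: str) -> dict[str, str]:
    """Build a config dict whose service_account entry points at the written file."""
    return {"service_account": write_service_account_file(directory)}
```

The resulting dictionary could be passed as the `config` argument of GCSStore, or the path exported through GOOGLE_SERVICE_ACCOUNT instead.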
obstore.store.GCSCredential ¶
Bases: TypedDict
A Google Cloud Storage Credential.
Not importable at runtime
To use this type hint in your code, import it within a TYPE_CHECKING block:
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore.store import GCSCredential
obstore.store.GCSCredentialProvider ¶
Bases: Protocol
A type hint for a synchronous or asynchronous callback to provide custom Google Cloud Storage credentials.
This should be passed into the credential_provider parameter of GCSStore.
Not importable at runtime
To use this type hint in your code, import it within a TYPE_CHECKING block:
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from obstore.store import GCSCredentialProvider
__call__
staticmethod
¶
__call__() -> GCSCredential | Coroutine[Any, Any, GCSCredential]
Return a GCSCredential.
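A provider is any callable matching the protocol above. The sketch below is a hedged illustration. This page does not list the fields of GCSCredential, so the `token` and `expires_at` keys used here are assumptions to check against the GCSCredential reference. `StaticTokenProvider` and `open_store` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


class StaticTokenProvider:
    """Hypothetical provider that returns a fixed token with a rolling expiry."""

    def __init__(self, token: str, lifetime: timedelta = timedelta(minutes=30)) -> None:
        self._token = token
        self._lifetime = lifetime

    def __call__(self) -> dict:
        # Assumed credential shape: a bearer token plus an expiry timestamp.
        return {
            "token": self._token,
            "expires_at": datetime.now(timezone.utc) + self._lifetime,
        }


def open_store(bucket: str, provider: StaticTokenProvider):
    """Pass the provider to GCSStore; obstore is imported lazily."""
    from obstore.store import GCSStore

    return GCSStore(bucket, credential_provider=provider)
```

A real provider would fetch or refresh the token from an identity service instead of holding a fixed string.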