# Authentication
Authentication tends to be among the trickiest but most important elements of connecting to object storage. Obstore supports various native and custom authentication methods.
## Native Authentication

"Native" authentication refers to authentication methods that are natively supported by the underlying Rust object_store library.

Native authentication is most efficient, as obstore never needs to call into Python to update credentials.
### Order of application

Native authentication sources are checked in order:

- Environment variables
- Parameters passed in via `config` or as keyword arguments

Any parameters passed via the `config` parameter or as keyword arguments will therefore override values found in environment variables.
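This precedence can be sketched as a plain dict merge, where later sources override earlier ones. This is a conceptual illustration only; `resolve_config` is a hypothetical helper, not part of obstore:

```python
import os


def resolve_config(env_prefix: str, explicit: dict) -> dict:
    """Merge config sources so explicit parameters win over environment variables."""
    # Collect values from environment variables (lowest precedence).
    env_config = {
        key[len(env_prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(env_prefix)
    }
    # Values passed via `config` or keyword parameters override the environment.
    return {**env_config, **explicit}


os.environ["AWS_REGION"] = "us-east-1"
merged = resolve_config("AWS_", {"region": "us-west-2"})
# The explicit parameter wins over the environment variable.
```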
### Native Authentication Variants

Note that many authentication variants are already supported natively.
#### AWS

- Basic authentication, where an access key ID, secret access key, and optionally a token are passed in via environment variables or configuration parameters.
- WebIdentity. This requires the `AWS_WEB_IDENTITY_TOKEN_FILE` and `AWS_ROLE_ARN` environment variables to be set. Additionally, `AWS_ROLE_SESSION_NAME` can be set to specify a session name.
- Container credentials. Ensure you pass `container_credentials_relative_uri` to the `S3Store`.
- Instance credentials.

(A transcription of this underlying code.)
#### Google Cloud Storage
- Service account credentials
- Application default credentials
- Instance credentials
(A transcription of this underlying code).
#### Azure

- Fabric OAuth2, using `fabric_token_service_url`, `fabric_workload_host`, `fabric_session_token`, and `fabric_cluster_identifier` passed in by the user.
- Workload identity OAuth2, using a `client_id`, `tenant_id`, and `federated_token_file` passed in by the user.
- OAuth2, using a `client_id`, `client_secret`, and `tenant_id` passed in by the user.
- A SAS key passed in by the user.
- Azure CLI. (When constructing `AzureStore` you must set `use_azure_cli` to `True`, either by passing `use_azure_cli=True` or by setting the `AZURE_USE_AZURE_CLI` environment variable to `"TRUE"`.)
- IMDS Managed Identity Provider.

(A transcription of this underlying code.)
## Credential Providers
Credential providers are Python callbacks that allow for full control over credential generation. Passing in a credential provider will override any native credentials.
### "Official" SDK credential providers
#### `boto3`

You can use the `Boto3CredentialProvider` to use a `boto3.Session` to handle credentials.
```python
from boto3 import Session

from obstore.auth.boto3 import Boto3CredentialProvider
from obstore.store import S3Store

session = Session(...)
credential_provider = Boto3CredentialProvider(session)
store = S3Store("bucket_name", credential_provider=credential_provider)
```
Refer to `obstore.auth.boto3`.
#### `google.auth`

You can use the `GoogleCredentialProvider` to use `google.auth` to handle credentials.
```python
from obstore.auth.google import GoogleCredentialProvider
from obstore.store import GCSStore

credential_provider = GoogleCredentialProvider(credentials=...)
store = GCSStore("bucket_name", credential_provider=credential_provider)
```
Refer to `obstore.auth.google`.
#### `azure.identity`

You can use the `AzureCredentialProvider` to use `azure.identity` to handle credentials.
```python
import obstore as obs

from obstore.auth.azure import AzureCredentialProvider
from obstore.store import AzureStore

credential_provider = AzureCredentialProvider(credential=...)
store = AzureStore("container", credential_provider=credential_provider)
print(obs.list(store).collect())
```
Alternatively, you can use `AzureAsyncCredentialProvider` with the async API:
```python
import asyncio

import obstore as obs

from obstore.auth.azure import AzureAsyncCredentialProvider
from obstore.store import AzureStore

credential_provider = AzureAsyncCredentialProvider(credential=...)
store = AzureStore("container", credential_provider=credential_provider)


async def fetch_blobs():
    blobs = await obs.list(store).collect_async()
    print(blobs)


asyncio.run(fetch_blobs())
```
Refer to [`obstore.auth.azure`](api/auth/azure.md).
### Other credential providers
- `NasaEarthdataCredentialProvider`: A credential provider for accessing NASA Earthdata, to be used with `S3Store`.
- `PlanetaryComputerCredentialProvider`: A credential provider for accessing Planetary Computer data resources, to be used with `AzureStore`.
### Microsoft Planetary Computer

The Microsoft Planetary Computer hosts a multi-petabyte catalog of global environmental data.

The data is publicly accessible but requires the user to fetch short-lived access tokens, and fetching and refreshing these tokens every hour is tedious to manage by hand.

The `PlanetaryComputerCredentialProvider` handles all token access and refresh automatically.
As a quick example, we'll read data from the NAIP dataset:
```python
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
from obstore.store import AzureStore

url = "https://naipeuwest.blob.core.windows.net/naip/v002/mt/2023/mt_060cm_2023/"

# Construct an AzureStore with this credential provider.
#
# The account, container, and container prefix are passed down to AzureStore
# automatically.
store = AzureStore(credential_provider=PlanetaryComputerCredentialProvider(url))
```
Then, for example, list some items in the container (the prefix `v002/mt/2023/mt_060cm_2023` was automatically set as the prefix on the `AzureStore`):
```python
items = next(store.list())
print(items[:2])
```

```
[{'path': '44104/m_4410401_ne_13_060_20230811_20240103.200.jpg',
  'last_modified': datetime.datetime(2025, 1, 13, 18, 18, 1, tzinfo=datetime.timezone.utc),
  'size': 14459,
  'e_tag': '0x8DD33FE9DB7A24D',
  'version': None},
 {'path': '44104/m_4410401_ne_13_060_20230811_20240103.tif',
  'last_modified': datetime.datetime(2025, 1, 13, 16, 39, 6, tzinfo=datetime.timezone.utc),
  'size': 400422790,
  'e_tag': '0x8DD33F0CC1D1752',
  'version': None}]
```
And we can fetch an image thumbnail:
```python
path = "44106/m_4410602_nw_13_060_20230712_20240103.200.jpg"
image_content = store.get(path).bytes()

# Write out the image content to a file in the current directory
with open("thumbnail.jpg", "wb") as f:
    f.write(image_content)
```
And voilĂ :
## Custom Authentication
There's a long tail of possible authentication mechanisms. Obstore allows you to provide your own custom authentication callback.
You can provide either a synchronous or asynchronous callback for your custom authentication function.
- A custom AWS credential provider, passed in to `S3Store`, must return an `S3Credential`.
- A custom GCS credential provider, passed in to `GCSStore`, must return a `GCSCredential`.
- A custom Azure credential provider, passed in to `AzureStore`, must return an `AzureCredential`.
!!! warning
    Asynchronous credential providers can be more performant but are only supported when using obstore's asynchronous APIs. (In particular, there must be an event loop running.)
### Basic Example
The simplest custom credential provider can be just a synchronous or asynchronous function callback:
```python
from datetime import datetime, timedelta, UTC


def get_credentials() -> S3Credential:
    return {
        "access_key_id": "...",
        "secret_access_key": "...",
        # Not always required
        "token": "...",
        "expires_at": datetime.now(UTC) + timedelta(minutes=30),
    }
```
Then just pass that function into `credential_provider`:

```python
S3Store(..., credential_provider=get_credentials)
```
### Advanced Example

More advanced credential providers can be class based. A class can act as a callable, just like a function callback, as long as it implements a `__call__` method.
Below is an example custom credential provider for accessing NASA Earthdata.
NASA Earthdata supports public in-region direct S3 access. This credential provider automatically manages refreshing the S3 credentials before they expire.
Note that you must be in the same AWS region (`us-west-2`) to use this provider.
```python
from __future__ import annotations

from datetime import datetime
from typing import TYPE_CHECKING

import requests

if TYPE_CHECKING:
    from obstore.store import S3Credential

CREDENTIALS_API = "https://archive.podaac.earthdata.nasa.gov/s3credentials"


class NasaEarthdataCredentialProvider:
    """A credential provider for accessing [NASA Earthdata].

    NASA Earthdata supports public [in-region direct S3
    access](https://archive.podaac.earthdata.nasa.gov/s3credentialsREADME). This
    credential provider automatically manages the S3 credentials.

    !!! note
        Note that you must be in the same AWS region (`us-west-2`) to use the
        credentials returned from this provider.

    [NASA Earthdata]: https://www.earthdata.nasa.gov/
    """

    def __init__(
        self,
        username: str,
        password: str,
    ) -> None:
        """Create a new NasaEarthdataCredentialProvider.

        Args:
            username: Username to NASA Earthdata.
            password: Password to NASA Earthdata.
        """
        self.session = requests.Session()
        self.session.auth = (username, password)

    def __call__(self) -> S3Credential:
        """Request updated credentials."""
        resp = self.session.get(CREDENTIALS_API, allow_redirects=True, timeout=15)
        auth_resp = self.session.get(resp.url, allow_redirects=True, timeout=15)
        creds = auth_resp.json()
        return {
            "access_key_id": creds["accessKeyId"],
            "secret_access_key": creds["secretAccessKey"],
            "token": creds["sessionToken"],
            "expires_at": datetime.fromisoformat(creds["expiration"]),
        }
```
Or asynchronously:
```python
from __future__ import annotations

import json
from datetime import datetime
from typing import TYPE_CHECKING

from aiohttp import BasicAuth, ClientSession

if TYPE_CHECKING:
    from obstore.store import S3Credential

CREDENTIALS_API = "https://archive.podaac.earthdata.nasa.gov/s3credentials"


class NasaEarthdataAsyncCredentialProvider:
    """A credential provider for accessing [NASA Earthdata].

    NASA Earthdata supports public [in-region direct S3
    access](https://archive.podaac.earthdata.nasa.gov/s3credentialsREADME). This
    credential provider automatically manages the S3 credentials.

    !!! note
        Note that you must be in the same AWS region (`us-west-2`) to use the
        credentials returned from this provider.

    [NASA Earthdata]: https://www.earthdata.nasa.gov/
    """

    def __init__(
        self,
        username: str,
        password: str,
    ) -> None:
        """Create a new NasaEarthdataAsyncCredentialProvider.

        Args:
            username: Username to NASA Earthdata.
            password: Password to NASA Earthdata.
        """
        self.session = ClientSession(auth=BasicAuth(username, password))

    async def __call__(self) -> S3Credential:
        """Request updated credentials."""
        async with self.session.get(CREDENTIALS_API, allow_redirects=True) as resp:
            auth_url = resp.url
        async with self.session.get(auth_url, allow_redirects=True) as auth_resp:
            # Note: We parse the JSON manually instead of using `resp.json()` because
            # the response mimetype is incorrectly set to text/html.
            creds = json.loads(await auth_resp.text())

        return {
            "access_key_id": creds["accessKeyId"],
            "secret_access_key": creds["secretAccessKey"],
            "token": creds["sessionToken"],
            "expires_at": datetime.fromisoformat(creds["expiration"]),
        }

    async def close(self):
        """Close the underlying session.

        You should call this method after you've finished all obstore calls.
        """
        await self.session.close()
```
Then call it with:

```python
credential_provider = NasaEarthdataCredentialProvider(username="...", password="...")
store = S3Store("bucket_name", credential_provider=credential_provider)
```