HLS Example (us-west-2 only)¶
The Harmonized Landsat Sentinel 2 dataset contains grid-aligned cloud-optimized geotiffs that are great for deep time series analysis because it contains the combined archives from Landsat 8/9 and Sentinel 2A/B/C. NASA MAAP maintains a stac-geoparquet archive of all of the HLS STAC records. This archive solves a common bottleneck that users have encountered when trying to work with HLS STAC records at scale by eliminating the need to query the CMR STAC API!
This notebook demonstrates how to use lazycogs to interact with the HLS STAC Geoparquet Archive in the optimal way.
Note: This will only work in the AWS us-west-2 region! The tutorial relies on direct bucket access to the lp-prod-protected S3 bucket which is accessible if you have NASA Earthdata credentials in your environment, but you can only do direct S3 access if your code is running in us-west-2.
from urllib.parse import urlparse
import boto3
from async_geotiff import GeoTIFF
from obstore.auth.earthdata import NasaEarthdataCredentialProvider
from obstore.store import S3Store
from pyproj import Transformer
from rustac import DuckdbClient
import lazycogs
HLS_STAC_GEOPARQUET_HREF = "s3://nasa-maap-data-store/file-staging/nasa-map/hls-stac-geoparquet-archive/v2/{collection}/**/*.parquet"
Configure the DuckdbClient for rustac¶
Note: The HLS STAC Geoparquet Archive is publicly available but using the wildcard search across all of the hive-partitioned parquet files with duckdb requires ListBucket privileges that are not provided to the public. You can anonymously search a single stac-geoparquet file using this href format:
s3://nasa-maap-data-store/file-staging/nasa-map/hls-stac-geoparquet-archive/v2/{collection}/year={year}/month={month}/{collection}-{year}-{month}.parquet
where collection is either HLSS30_2.0 or HLSL30_2.0 and month is an integer 1-12.
You could also copy the parquet files to your own location to take advantage of the hive-partitioned query capabilities.
duckdb_client = DuckdbClient(use_hive_partitioning=True)
aws_session = boto3.Session()
creds = aws_session.get_credentials().get_frozen_credentials()
_ = duckdb_client.execute(
f"""
CREATE OR REPLACE SECRET secret (
TYPE S3,
REGION '{aws_session.region_name}',
KEY_ID '{creds.access_key}',
SECRET '{creds.secret_key}',
SESSION_TOKEN '{creds.token}'
);
""",
)
Define the array extent¶
The HLS granules are projected in UTM coordinate reference systems. To access the underlying data in an undistorted efficient manner the best way is to build an array for a single UTM coordinate system that is aligned with the grid of the COGs themselves.
We can define an AOI for an entire UTM zone like this:
epsg_code = "32615"
utm_zone = 15
dst_crs = f"epsg:{epsg_code}"
east_edge = (utm_zone * 6) - 180
west_edge = east_edge - 6
bbox_4326 = (west_edge, 0, east_edge, 84)
# transform to epsg:4326 for STAC search
transformer = Transformer.from_crs("epsg:4326", dst_crs, always_xy=True)
dst_bbox = transformer.transform_bounds(*bbox_4326)
print(bbox_4326)
print(dst_bbox)
(-96, 0, -90, 84) (166021.44308053772, 0.0, 833978.5569194595, 9329005.182447437)
Retrieve a single STAC item so we can snap our approximate bounding box to the HLS granule grid.
sample_item = duckdb_client.search(
HLS_STAC_GEOPARQUET_HREF.format(collection="HLSS30_2.0"),
bbox=bbox_4326,
filter={"op": "=", "args": [{"property": "proj:epsg"}, epsg_code]},
max_items=1,
)[0]
sample_item
{'type': 'Feature',
'stac_version': '1.1.0',
'id': 'HLS.S30.T15MXV.2015333T161622.v2.0',
'geometry': {'type': 'MultiPolygon',
'coordinates': [[[[-92.101362, 0.0],
[-91.114923, 0.0],
[-91.114642, -0.992853],
[-92.101228, -0.993271],
[-92.101362, 0.0]]]]},
'stac_extensions': ['https://stac-extensions.github.io/eo/v1.0.0/schema.json',
'https://stac-extensions.github.io/projection/v1.0.0/schema.json',
'https://stac-extensions.github.io/view/v1.0.0/schema.json',
'https://stac-extensions.github.io/scientific/v1.0.0/schema.json'],
'bbox': [-92.101362, -0.993271, -91.114642, 0.0],
'links': [{'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0_stac.json',
'rel': 'self',
'type': 'application/json'},
{'href': 'https://doi.org/10.5067/HLS/HLSS30.002', 'rel': 'cite-as'}],
'assets': {'B01': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B01.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B01',
'common_name': 'coastal',
'center_wavelength': 0.4439,
'full_width_half_max': 0.027}]},
'B02': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B02.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B02',
'common_name': 'blue',
'center_wavelength': 0.4966,
'full_width_half_max': 0.098}]},
'B03': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B03.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B03',
'common_name': 'green',
'center_wavelength': 0.56,
'full_width_half_max': 0.045}]},
'B04': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B04.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B04',
'common_name': 'red',
'center_wavelength': 0.6645,
'full_width_half_max': 0.038}]},
'B05': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B05.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B05',
'center_wavelength': 0.7039,
'full_width_half_max': 0.019}]},
'B06': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B06.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B06',
'center_wavelength': 0.7402,
'full_width_half_max': 0.018}]},
'B07': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B07.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B07',
'center_wavelength': 0.7825,
'full_width_half_max': 0.028}]},
'B08': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B08.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B08',
'common_name': 'nir',
'center_wavelength': 0.8351,
'full_width_half_max': 0.145}]},
'B8A': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B8A.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B8A',
'center_wavelength': 0.8648,
'full_width_half_max': 0.033}]},
'B09': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B09.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B09',
'center_wavelength': 0.945,
'full_width_half_max': 0.026}]},
'B10': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B10.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B10',
'common_name': 'cirrus',
'center_wavelength': 1.3735,
'full_width_half_max': 0.075}]},
'B11': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B11.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B11',
'common_name': 'swir16',
'center_wavelength': 1.6137,
'full_width_half_max': 0.143}]},
'B12': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B12.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'B12',
'common_name': 'swir22',
'center_wavelength': 2.22024,
'full_width_half_max': 0.242}]},
'Fmask': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.Fmask.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'Fmask'}]},
'SZA': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.SZA.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'SZA'}]},
'SAA': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.SAA.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'SAA'}]},
'VZA': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.VZA.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'VZA'}]},
'VAA': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.VAA.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'eo:bands': [{'name': 'VAA'}]},
'thumbnail': {'href': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.jpg',
'type': 'image/jpeg',
'roles': ['thumbnail']}},
'collection': 'HLSS30_2.0',
'properties': {'sci:doi': '10.5067/HLS/HLSS30.002',
'view:azimuth': 100.40816448,
'datetime': '2015-11-29T16:26:44.990+00:00',
'start_datetime': '2015-11-29T16:26:44.990+00:00',
'end_datetime': '2015-11-29T16:26:44.990+00:00',
'platform': 'sentinel-2a',
'instruments': ['msi'],
'eo:cloud_cover': 99.0,
'proj:transform': [30.0, 0.0, 600000.0, 0.0, -30.0, 0.0, 0.0, 0.0, 1.0],
'proj:shape': [3660, 3660],
'proj:epsg': 32615,
'view:sun_azimuth': 138.09596371,
'month': 11,
'year': 2015}}
Use the proj:transform property to align our rough bounding box with the HLS grid
dst_bbox_aligned = lazycogs.align_bbox(
sample_item["properties"]["proj:transform"],
dst_bbox,
)
print(f"aligned bbox: {dst_bbox_aligned}")
aligned bbox: (166020.0, 0.0, 834000.0, 9329010.0)
Configure direct access to the HLS COGs in S3¶
The STAC assets in the stac-geoparquet file contain https hrefs, but the HLS COGs are accessible directly from S3 if you are operating in the us-west-2 AWS region. lazycogs has a mechanism for modifying the asset hrefs with a callable function. We will provide a function that can be used to convert the https href in to an S3 key relative to the bucket root.
This will work for you if you have your NASA Earthdata credentials stored in a ~/.netrc file or via the EARTHDATA_USERNAME and EARTHDATA_PASSWORD environment variables.
cp = NasaEarthdataCredentialProvider(
credentials_url="https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials",
)
store = S3Store(
bucket="lp-prod-protected",
credential_provider=cp,
)
def strip_bucket(href: str) -> str:
# href: https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/path/to/file.tif
# store is rooted at the bucket, so the path is just path/to/file.tif
return urlparse(href).path.lstrip("/").removeprefix("lp-prod-protected/")
strip_bucket(sample_item["assets"]["B04"]["href"])
'HLSS30.020/HLS.S30.T15MXV.2015333T161622.v2.0/HLS.S30.T15MXV.2015333T161622.v2.0.B04.tif'
Inspect a sample COG¶
We can use async-geotiff to inspect a sample COG and its overviews:
geotiff = await GeoTIFF.open(
strip_bucket(sample_item["assets"]["B04"]["href"]),
store=store,
)
resolutions = [geotiff.res[0]]
print("full resolution:", geotiff.res)
for i, overview in enumerate(geotiff.overviews, start=1):
resolutions.append(overview.res[0])
print("overview", i, overview.res)
print("nodata value:", geotiff.nodata)
print("data type:", geotiff.dtype)
full resolution: (30.0, 30.0) overview 1 (60.0, 60.0) overview 2 (120.0, 120.0) overview 3 (239.73799126637556, 239.73799126637556) overview 4 (479.4759825327511, 479.4759825327511) nodata value: -9999.0 data type: int16
Open a massive array¶
Now open an array for the full resolution data for the year 2025. To load a lower-resolution array you can select a different value from the resolutions list and lazycogs will use the COGs' overviews instead of the full-resolution data. If you pick something in between one of the listed resolutions lazycogs will have to warp the arrays and do a nearest-neighbor interpolation.
stac_search_args = {
"href": HLS_STAC_GEOPARQUET_HREF.format(collection="HLSS30_2.0"),
"datetime": "2025-01-01T00:00:00Z/2025-12-31T23:59:59Z",
"filter": {"op": "=", "args": [{"property": "proj:epsg"}, epsg_code]},
"duckdb_client": duckdb_client,
}
da = lazycogs.open(
crs=dst_crs,
bbox=dst_bbox_aligned,
resolution=30,
time_period="P1D", # one time coordinate per day
bands=["B04", "B03", "B02"],
dtype=geotiff.dtype,
nodata=geotiff.nodata,
store=store,
path_from_href=strip_bucket,
**stac_search_args,
)
da
<xarray.DataArray (band: 3, time: 365, y: 310967, x: 22266)> Size: 15TB
[7581770388090 values with dtype=int16]
Coordinates:
* band (band) <U3 36B 'B04' 'B03' 'B02'
* time (time) datetime64[s] 3kB 2025-01-01 2025-01-02 ... 2025-12-31
* y (y) float64 2MB 15.0 45.0 75.0 ... 9.329e+06 9.329e+06 9.329e+06
* x (x) float64 178kB 1.66e+05 1.661e+05 ... 8.34e+05 8.34e+05
Attributes:
_stac_backend: MultiBandStacBackendArray(bands=['B04', 'B03', 'B02']...
_stac_time_coords: 2025-01-01 … 2025-12-31 (n=365)Extract all values for September 2025 for a single point.
%%time
pt = (
da.sel(
x=576465,
y=5196159,
method="nearest",
)
.sel(time=slice("2025-09-01", "2025-09-30"))
.load()
)
pt
CPU times: user 54.2 s, sys: 1.42 s, total: 55.7 s Wall time: 20.1 s
<xarray.DataArray (band: 3, time: 30)> Size: 180B
array([[-9999, 8349, -9999, -9999, 7260, -9999, 269, -9999, -9999,
361, -9999, -9999, -9999, 387, 619, -9999, 492, -9999,
-9999, -9999, -9999, 6863, -9999, -9999, 389, -9999, 408,
-9999, -9999, -9999],
[-9999, 8770, -9999, -9999, 7539, -9999, 353, -9999, -9999,
554, -9999, -9999, -9999, 557, 670, -9999, 506, -9999,
-9999, -9999, -9999, 6760, -9999, -9999, 507, -9999, 515,
-9999, -9999, -9999],
[-9999, 9235, -9999, -9999, 7686, -9999, 120, -9999, -9999,
240, -9999, -9999, -9999, 218, 281, -9999, 186, -9999,
-9999, -9999, -9999, 6760, -9999, -9999, 244, -9999, 224,
-9999, -9999, -9999]], dtype=int16)
Coordinates:
* band (band) <U3 36B 'B04' 'B03' 'B02'
* time (time) datetime64[s] 240B 2025-09-01 2025-09-02 ... 2025-09-30
y float64 8B 5.196e+06
x float64 8B 5.765e+05
Attributes:
_stac_backend: MultiBandStacBackendArray(bands=['B04', 'B03', 'B02']...
_stac_time_coords: 2025-01-01 … 2025-12-31 (n=365)Operate on a subset in space and time¶
Select a spatial subset for the days with non-nodata values in the point extraction.
subset = da.sel(
x=slice(550_000, 700_000),
y=slice(5_150_000, 5_250_000),
time=pt.time.values[pt.isel(band=0).values != geotiff.nodata],
)
subset
<xarray.DataArray (band: 3, time: 10, y: 3333, x: 5000)> Size: 1000MB
[499950000 values with dtype=int16]
Coordinates:
* band (band) <U3 36B 'B04' 'B03' 'B02'
* time (time) datetime64[s] 80B 2025-09-02 2025-09-05 ... 2025-09-27
* y (y) float64 27kB 5.15e+06 5.15e+06 5.15e+06 ... 5.25e+06 5.25e+06
* x (x) float64 40kB 5.5e+05 5.5e+05 5.501e+05 ... 6.999e+05 7e+05
Attributes:
_stac_backend: MultiBandStacBackendArray(bands=['B04', 'B03', 'B02']...
_stac_time_coords: 2025-01-01 … 2025-12-31 (n=365)%%time
_ = subset.load()
CPU times: user 1min 53s, sys: 15.8 s, total: 2min 8s Wall time: 35.1 s
(
subset.plot.imshow(
rgb="band",
col="time",
col_wrap=2,
vmin=0,
vmax=2000,
size=4,
aspect=subset.shape[3] / subset.shape[2],
)
)
<xarray.plot.facetgrid.FacetGrid at 0x7ff33c4ca2d0>