Inspecting read plans¶
Before computing an array, you can ask lazycogs exactly what DuckDB queries and COG reads would fire — without touching any pixel data. This is useful for:
- Diagnosing sparse time series (chunks with no matching items)
- Understanding scene overlap in your area of interest
- Choosing chunk sizes
- Verifying that the right overview level will be read for your resolution
The da.lazycogs.explain() method runs the same spatial DuckDB queries as .compute() but stops before any pixel I/O.
from pathlib import Path
import rustac
from pyproj import Transformer
import lazycogs
dst_crs = "EPSG:5070"
dst_bbox = (-400_000, 2_500_000, -200_000, 2_700_000)
transformer = Transformer.from_crs(dst_crs, "epsg:4326", always_xy=True)
bbox_4326 = transformer.transform_bounds(*dst_bbox)
PARQUET = "data/midwest_summer_2025.parquet"
if not Path(PARQUET).exists():
Path(PARQUET).parent.mkdir(exist_ok=True)
await rustac.search_to(
PARQUET,
href="https://earth-search.aws.element84.com/v1",
collections=["sentinel-2-c1-l2a"],
datetime="2025-06-01/2025-09-30",
bbox=bbox_4326,
limit=100,
)
store = lazycogs.store_for(PARQUET, skip_signature=True)
Basic usage¶
Open the DataArray with chunks={"time": 1} so the explain output reflects how dask will partition the work.
da = lazycogs.open(
PARQUET,
bbox=dst_bbox,
crs=dst_crs,
resolution=10.0,
store=store,
bands=["red", "green", "blue"],
chunks={"time": 1},
)
plan = da.lazycogs.explain()
print(plan.summary())
=== ExplainPlan === Parquet: data/midwest_summer_2025.parquet CRS: EPSG:5070 | Resolution: 10.0 units/px | Grid: 20000 x 20000 px Bands (3): red, green, blue Time steps: 55 (2025-06-01 - 2025-09-29) Chunks: 20000 x 20000 px -> 1x1 spatial tiles Total chunk reads: 165 (3 band(s) x 55 time step(s) x 1 spatial tile(s)) Total COG reads: 1932 Chunks with 0 COGs: 0 (0.0%) Chunks with 1 COG: 0 (0.0%) Chunks with 2+ COGs: 165 (100.0%) Max COGs per chunk: 23 (Pass fetch_headers=True to see overview levels and pixel windows.)
The summary shows:
- chunks: total number of dask chunks across all dimensions
- empty chunks: chunks that returned zero matching items from DuckDB — these are time steps with no data in your area
- items per chunk: distribution of how many STAC items (COGs) overlap each chunk
A high empty chunk fraction means your time series is sparse — consider using a longer time_period to aggregate into fewer, fuller steps. A high max_items value means some time steps have many overlapping scenes; the mosaic method and max_concurrent_reads are most relevant there.
Explaining a single time step¶
You can explain a subset of the array. This is useful for inspecting a specific time step interactively without running queries for the whole time series.
plan_one = da.isel(time=0).lazycogs.explain()
print(plan_one.summary())
=== ExplainPlan === Parquet: data/midwest_summer_2025.parquet CRS: EPSG:5070 | Resolution: 10.0 units/px | Grid: 20000 x 20000 px Bands (3): red, green, blue Time steps: 1 (2025-06-01) Chunks: 20000 x 20000 px -> 1x1 spatial tiles Total chunk reads: 3 (3 band(s) x 1 time step(s) x 1 spatial tile(s)) Total COG reads: 33 Chunks with 0 COGs: 0 (0.0%) Chunks with 1 COG: 0 (0.0%) Chunks with 2+ COGs: 3 (100.0%) Max COGs per chunk: 11 (Pass fetch_headers=True to see overview levels and pixel windows.)
Converting to a DataFrame¶
plan.to_dataframe() returns one row per (chunk, COG read). This is useful for programmatic analysis — for example, checking the distribution of items per band.
df = plan.to_dataframe()
df.head()
| band | time_index | date_filter | time_coord | chunk_row | chunk_col | chunk_width | chunk_height | n_cog_reads | item_id | asset_key | href | overview_level | overview_resolution | window_col_off | window_row_off | window_width | window_height | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | red | 0 | 2025-06-01 | 2025-06-01 | 0 | 0 | 20000 | 20000 | 11 | S2B_T13TGL_20250601T174840_L2A | red | https://e84-earth-search-sentinel-data.s3.us-w... | None | None | None | None | None | None |
| 1 | red | 0 | 2025-06-01 | 2025-06-01 | 0 | 0 | 20000 | 20000 | 11 | S2B_T14TLR_20250601T174840_L2A | red | https://e84-earth-search-sentinel-data.s3.us-w... | None | None | None | None | None | None |
| 2 | red | 0 | 2025-06-01 | 2025-06-01 | 0 | 0 | 20000 | 20000 | 11 | S2B_T14TMR_20250601T174840_L2A | red | https://e84-earth-search-sentinel-data.s3.us-w... | None | None | None | None | None | None |
| 3 | red | 0 | 2025-06-01 | 2025-06-01 | 0 | 0 | 20000 | 20000 | 11 | S2B_T13TGM_20250601T174840_L2A | red | https://e84-earth-search-sentinel-data.s3.us-w... | None | None | None | None | None | None |
| 4 | red | 0 | 2025-06-01 | 2025-06-01 | 0 | 0 | 20000 | 20000 | 11 | S2B_T14TLS_20250601T174840_L2A | red | https://e84-earth-search-sentinel-data.s3.us-w... | None | None | None | None | None | None |
df.groupby("band")["n_cog_reads"].describe()
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| band | ||||||||
| blue | 644.0 | 12.89441 | 4.43982 | 9.0 | 9.0 | 11.0 | 16.0 | 23.0 |
| green | 644.0 | 12.89441 | 4.43982 | 9.0 | 9.0 | 11.0 | 16.0 | 23.0 |
| red | 644.0 | 12.89441 | 4.43982 | 9.0 | 9.0 | 11.0 | 16.0 | 23.0 |
Fetching COG headers¶
Pass fetch_headers=True to also fetch the COG header for each matching item. This adds one HTTP request per unique COG file and returns additional information:
overview_level: which overview pyramid level would be read (0 = full resolution, higher = coarser)overview_resolution: the resolution of that overview levelwindow_width/window_height: the pixel window that would be read from each COG
This is the most accurate way to understand the actual I/O that .compute() would perform.
plan_headers = da.isel(time=0).lazycogs.explain(fetch_headers=True)
print(plan_headers.summary())
=== ExplainPlan === Parquet: data/midwest_summer_2025.parquet CRS: EPSG:5070 | Resolution: 10.0 units/px | Grid: 20000 x 20000 px Bands (3): red, green, blue Time steps: 1 (2025-06-01) Chunks: 20000 x 20000 px -> 1x1 spatial tiles Total chunk reads: 3 (3 band(s) x 1 time step(s) x 1 spatial tile(s)) Total COG reads: 33 Chunks with 0 COGs: 0 (0.0%) Chunks with 1 COG: 0 (0.0%) Chunks with 2+ COGs: 3 (100.0%) Max COGs per chunk: 11 Overview levels: full: 33 Avg read window: 6526 x 7697 px (Use .to_dataframe() for per-item overview and window details.)
df_headers = plan_headers.to_dataframe()
df_headers[
[
"item_id",
"band",
"overview_level",
"overview_resolution",
"window_width",
"window_height",
]
]
| item_id | band | overview_level | overview_resolution | window_width | window_height | |
|---|---|---|---|---|---|---|
| 0 | S2B_T13TGL_20250601T174840_L2A | red | None | 10.0 | 2595 | 6623 |
| 1 | S2B_T14TLR_20250601T174840_L2A | red | None | 10.0 | 8048 | 7142 |
| 2 | S2B_T14TMR_20250601T174840_L2A | red | None | 10.0 | 10980 | 7142 |
| 3 | S2B_T13TGM_20250601T174840_L2A | red | None | 10.0 | 2595 | 10980 |
| 4 | S2B_T14TLS_20250601T174840_L2A | red | None | 10.0 | 8048 | 10980 |
| 5 | S2B_T14TMS_20250601T174840_L2A | red | None | 10.0 | 10980 | 10980 |
| 6 | S2B_T13TGN_20250601T174840_L2A | red | None | 10.0 | 2595 | 6433 |
| 7 | S2B_T14TLT_20250601T174840_L2A | red | None | 10.0 | 8048 | 4468 |
| 8 | S2B_T14TNS_20250601T174840_L2A | red | None | 10.0 | 3459 | 10980 |
| 9 | S2B_T14TMT_20250601T174840_L2A | red | None | 10.0 | 10980 | 4468 |
| 10 | S2B_T14TNT_20250601T174840_L2A | red | None | 10.0 | 3459 | 4468 |
| 11 | S2B_T13TGL_20250601T174840_L2A | green | None | 10.0 | 2595 | 6623 |
| 12 | S2B_T14TLR_20250601T174840_L2A | green | None | 10.0 | 8048 | 7142 |
| 13 | S2B_T14TMR_20250601T174840_L2A | green | None | 10.0 | 10980 | 7142 |
| 14 | S2B_T13TGM_20250601T174840_L2A | green | None | 10.0 | 2595 | 10980 |
| 15 | S2B_T14TLS_20250601T174840_L2A | green | None | 10.0 | 8048 | 10980 |
| 16 | S2B_T14TMS_20250601T174840_L2A | green | None | 10.0 | 10980 | 10980 |
| 17 | S2B_T13TGN_20250601T174840_L2A | green | None | 10.0 | 2595 | 6433 |
| 18 | S2B_T14TLT_20250601T174840_L2A | green | None | 10.0 | 8048 | 4468 |
| 19 | S2B_T14TNS_20250601T174840_L2A | green | None | 10.0 | 3459 | 10980 |
| 20 | S2B_T14TMT_20250601T174840_L2A | green | None | 10.0 | 10980 | 4468 |
| 21 | S2B_T14TNT_20250601T174840_L2A | green | None | 10.0 | 3459 | 4468 |
| 22 | S2B_T13TGL_20250601T174840_L2A | blue | None | 10.0 | 2595 | 6623 |
| 23 | S2B_T14TLR_20250601T174840_L2A | blue | None | 10.0 | 8048 | 7142 |
| 24 | S2B_T14TMR_20250601T174840_L2A | blue | None | 10.0 | 10980 | 7142 |
| 25 | S2B_T13TGM_20250601T174840_L2A | blue | None | 10.0 | 2595 | 10980 |
| 26 | S2B_T14TLS_20250601T174840_L2A | blue | None | 10.0 | 8048 | 10980 |
| 27 | S2B_T14TMS_20250601T174840_L2A | blue | None | 10.0 | 10980 | 10980 |
| 28 | S2B_T13TGN_20250601T174840_L2A | blue | None | 10.0 | 2595 | 6433 |
| 29 | S2B_T14TLT_20250601T174840_L2A | blue | None | 10.0 | 8048 | 4468 |
| 30 | S2B_T14TNS_20250601T174840_L2A | blue | None | 10.0 | 3459 | 10980 |
| 31 | S2B_T14TMT_20250601T174840_L2A | blue | None | 10.0 | 10980 | 4468 |
| 32 | S2B_T14TNT_20250601T174840_L2A | blue | None | 10.0 | 3459 | 4468 |
Common diagnosis patterns¶
High empty chunk fraction — the time series is sparse. Consider increasing time_period (e.g. "P1W" instead of "P1D") to group more items into each step.
High max_items in the summary — some time steps have many overlapping COGs. This is common at high latitudes where satellite orbits overlap. FirstMethod (the default) will short-circuit once all pixels are filled, so the actual I/O may be much less than the item count suggests.
overview_level > 0 — lazycogs is reading a coarser overview pyramid level rather than full-resolution pixels. This is expected and efficient behavior — it means you're not reading more data than your output resolution requires.