Inspecting read plans¶

Before computing an array, you can ask lazycogs exactly what DuckDB queries and COG reads would fire — without touching any pixel data. This is useful for:

Diagnosing sparse time series (chunks with no matching items)
Understanding scene overlap in your area of interest
Choosing chunk sizes
Verifying that the right overview level will be read for your resolution

The da.lazycogs.explain() method runs the same spatial DuckDB queries as .compute() but stops before any pixel I/O.

In [1]:

Copied!

from pathlib import Path

import rustac
from pyproj import Transformer

import lazycogs
from pathlib import Path

import rustac
from pyproj import Transformer

import lazycogs

In [2]:

Copied!





dst_crs = "EPSG:5070"
dst_bbox = (-400_000, 2_500_000, -200_000, 2_700_000)

transformer = Transformer.from_crs(dst_crs, "epsg:4326", always_xy=True)
bbox_4326 = transformer.transform_bounds(*dst_bbox)

PARQUET = "data/midwest_summer_2025.parquet"

if not Path(PARQUET).exists():
    Path(PARQUET).parent.mkdir(exist_ok=True)
    await rustac.search_to(
        PARQUET,
        href="https://earth-search.aws.element84.com/v1",
        collections=["sentinel-2-c1-l2a"],
        datetime="2025-06-01/2025-09-30",
        bbox=bbox_4326,
        limit=100,
    )

store = lazycogs.store_for(PARQUET, skip_signature=True)
dst_crs = "EPSG:5070"
dst_bbox = (-400_000, 2_500_000, -200_000, 2_700_000)

transformer = Transformer.from_crs(dst_crs, "epsg:4326", always_xy=True)
bbox_4326 = transformer.transform_bounds(*dst_bbox)

PARQUET = "data/midwest_summer_2025.parquet"

if not Path(PARQUET).exists():
    Path(PARQUET).parent.mkdir(exist_ok=True)
    await rustac.search_to(
        PARQUET,
        href="https://earth-search.aws.element84.com/v1",
        collections=["sentinel-2-c1-l2a"],
        datetime="2025-06-01/2025-09-30",
        bbox=bbox_4326,
        limit=100,
    )

store = lazycogs.store_for(PARQUET, skip_signature=True)

Basic usage¶

Open the DataArray with chunks={"time": 1} so the explain output reflects how dask will partition the work.

In [3]:

Copied!





da = lazycogs.open(
    PARQUET,
    bbox=dst_bbox,
    crs=dst_crs,
    resolution=10.0,
    store=store,
    bands=["red", "green", "blue"],
    chunks={"time": 1},
)

plan = da.lazycogs.explain()
print(plan.summary())
da = lazycogs.open(
    PARQUET,
    bbox=dst_bbox,
    crs=dst_crs,
    resolution=10.0,
    store=store,
    bands=["red", "green", "blue"],
    chunks={"time": 1},
)

plan = da.lazycogs.explain()
print(plan.summary())

=== ExplainPlan ===
Parquet:    data/midwest_summer_2025.parquet
CRS:        EPSG:5070  |  Resolution: 10.0 units/px  |  Grid: 20000 x 20000 px
Bands (3):  red, green, blue
Time steps: 55 (2025-06-01 - 2025-09-29)
Chunks:     20000 x 20000 px -> 1x1 spatial tiles

Total chunk reads:     165 (3 band(s) x 55 time step(s) x 1 spatial tile(s))
Total COG reads:       1932
Chunks with 0 COGs:       0 (0.0%)
Chunks with 1 COG:        0 (0.0%)
Chunks with 2+ COGs:    165 (100.0%)
Max COGs per chunk:    23
(Pass fetch_headers=True to see overview levels and pixel windows.)

The summary shows:

chunks: total number of dask chunks across all dimensions
empty chunks: chunks that returned zero matching items from DuckDB — these are time steps with no data in your area
items per chunk: distribution of how many STAC items (COGs) overlap each chunk

A high empty chunk fraction means your time series is sparse — consider using a longer time_period to aggregate into fewer, fuller steps. A high max_items value means some time steps have many overlapping scenes; the mosaic method and max_concurrent_reads are most relevant there.

Explaining a single time step¶

You can explain a subset of the array. This is useful for inspecting a specific time step interactively without running queries for the whole time series.

In [4]:

Copied!

plan_one = da.isel(time=0).lazycogs.explain()
print(plan_one.summary())
plan_one = da.isel(time=0).lazycogs.explain()
print(plan_one.summary())

=== ExplainPlan ===
Parquet:    data/midwest_summer_2025.parquet
CRS:        EPSG:5070  |  Resolution: 10.0 units/px  |  Grid: 20000 x 20000 px
Bands (3):  red, green, blue
Time steps: 1 (2025-06-01)
Chunks:     20000 x 20000 px -> 1x1 spatial tiles

Total chunk reads:     3 (3 band(s) x 1 time step(s) x 1 spatial tile(s))
Total COG reads:       33
Chunks with 0 COGs:       0 (0.0%)
Chunks with 1 COG:        0 (0.0%)
Chunks with 2+ COGs:      3 (100.0%)
Max COGs per chunk:    11
(Pass fetch_headers=True to see overview levels and pixel windows.)

Converting to a DataFrame¶

plan.to_dataframe() returns one row per (chunk, COG read). This is useful for programmatic analysis — for example, checking the distribution of items per band.

In [5]:

Copied!

df = plan.to_dataframe()
df.head()
df = plan.to_dataframe()
df.head()

Out[5]:

	band	date_filter	time_coord	chunk_width	chunk_height	n_cog_reads	item_id	asset_key	href	overview_level	overview_resolution	window_col_off	window_row_off	window_width	window_height
0	red	2025-06-01	2025-06-01	20000	20000	11	S2B_T13TGL_20250601T174840_L2A	red	https://e84-earth-search-sentinel-data.s3.us-w...	None	None	None	None	None	None
1	red	2025-06-01	2025-06-01	20000	20000	11	S2B_T14TLR_20250601T174840_L2A	red	https://e84-earth-search-sentinel-data.s3.us-w...	None	None	None	None	None	None
2	red	2025-06-01	2025-06-01	20000	20000	11	S2B_T14TMR_20250601T174840_L2A	red	https://e84-earth-search-sentinel-data.s3.us-w...	None	None	None	None	None	None
3	red	2025-06-01	2025-06-01	20000	20000	11	S2B_T13TGM_20250601T174840_L2A	red	https://e84-earth-search-sentinel-data.s3.us-w...	None	None	None	None	None	None
4	red	2025-06-01	2025-06-01	20000	20000	11	S2B_T14TLS_20250601T174840_L2A	red	https://e84-earth-search-sentinel-data.s3.us-w...	None	None	None	None	None	None

In [6]:

Copied!

df.groupby("band")["n_cog_reads"].describe()
df.groupby("band")["n_cog_reads"].describe()

Out[6]:

	count	mean	std	min	25%	50%	75%	max
band
blue	644.0	12.89441	4.43982	9.0	9.0	11.0	16.0	23.0
green	644.0	12.89441	4.43982	9.0	9.0	11.0	16.0	23.0
red	644.0	12.89441	4.43982	9.0	9.0	11.0	16.0	23.0

Fetching COG headers¶

Pass fetch_headers=True to also fetch the COG header for each matching item. This adds one HTTP request per unique COG file and returns additional information:

overview_level: which overview pyramid level would be read (0 = full resolution, higher = coarser)
overview_resolution: the resolution of that overview level
window_width / window_height: the pixel window that would be read from each COG

This is the most accurate way to understand the actual I/O that .compute() would perform.

In [7]:

Copied!

plan_headers = da.isel(time=0).lazycogs.explain(fetch_headers=True)
print(plan_headers.summary())
plan_headers = da.isel(time=0).lazycogs.explain(fetch_headers=True)
print(plan_headers.summary())

=== ExplainPlan ===
Parquet:    data/midwest_summer_2025.parquet
CRS:        EPSG:5070  |  Resolution: 10.0 units/px  |  Grid: 20000 x 20000 px
Bands (3):  red, green, blue
Time steps: 1 (2025-06-01)
Chunks:     20000 x 20000 px -> 1x1 spatial tiles

Total chunk reads:     3 (3 band(s) x 1 time step(s) x 1 spatial tile(s))
Total COG reads:       33
Chunks with 0 COGs:       0 (0.0%)
Chunks with 1 COG:        0 (0.0%)
Chunks with 2+ COGs:      3 (100.0%)
Max COGs per chunk:    11
Overview levels:       full: 33
Avg read window:       6526 x 7697 px
(Use .to_dataframe() for per-item overview and window details.)

In [8]:

Copied!





df_headers = plan_headers.to_dataframe()
df_headers[
    [
        "item_id",
        "band",
        "overview_level",
        "overview_resolution",
        "window_width",
        "window_height",
    ]
]
df_headers = plan_headers.to_dataframe()
df_headers[
    [
        "item_id",
        "band",
        "overview_level",
        "overview_resolution",
        "window_width",
        "window_height",
    ]
]

Out[8]:

	item_id	band	overview_level	overview_resolution	window_width	window_height
0	S2B_T13TGL_20250601T174840_L2A	red	None	10.0	2595	6623
1	S2B_T14TLR_20250601T174840_L2A	red	None	10.0	8048	7142
2	S2B_T14TMR_20250601T174840_L2A	red	None	10.0	10980	7142
3	S2B_T13TGM_20250601T174840_L2A	red	None	10.0	2595	10980
4	S2B_T14TLS_20250601T174840_L2A	red	None	10.0	8048	10980
5	S2B_T14TMS_20250601T174840_L2A	red	None	10.0	10980	10980
6	S2B_T13TGN_20250601T174840_L2A	red	None	10.0	2595	6433
7	S2B_T14TLT_20250601T174840_L2A	red	None	10.0	8048	4468
8	S2B_T14TNS_20250601T174840_L2A	red	None	10.0	3459	10980
9	S2B_T14TMT_20250601T174840_L2A	red	None	10.0	10980	4468
10	S2B_T14TNT_20250601T174840_L2A	red	None	10.0	3459	4468
11	S2B_T13TGL_20250601T174840_L2A	green	None	10.0	2595	6623
12	S2B_T14TLR_20250601T174840_L2A	green	None	10.0	8048	7142
13	S2B_T14TMR_20250601T174840_L2A	green	None	10.0	10980	7142
14	S2B_T13TGM_20250601T174840_L2A	green	None	10.0	2595	10980
15	S2B_T14TLS_20250601T174840_L2A	green	None	10.0	8048	10980
16	S2B_T14TMS_20250601T174840_L2A	green	None	10.0	10980	10980
17	S2B_T13TGN_20250601T174840_L2A	green	None	10.0	2595	6433
18	S2B_T14TLT_20250601T174840_L2A	green	None	10.0	8048	4468
19	S2B_T14TNS_20250601T174840_L2A	green	None	10.0	3459	10980
20	S2B_T14TMT_20250601T174840_L2A	green	None	10.0	10980	4468
21	S2B_T14TNT_20250601T174840_L2A	green	None	10.0	3459	4468
22	S2B_T13TGL_20250601T174840_L2A	blue	None	10.0	2595	6623
23	S2B_T14TLR_20250601T174840_L2A	blue	None	10.0	8048	7142
24	S2B_T14TMR_20250601T174840_L2A	blue	None	10.0	10980	7142
25	S2B_T13TGM_20250601T174840_L2A	blue	None	10.0	2595	10980
26	S2B_T14TLS_20250601T174840_L2A	blue	None	10.0	8048	10980
27	S2B_T14TMS_20250601T174840_L2A	blue	None	10.0	10980	10980
28	S2B_T13TGN_20250601T174840_L2A	blue	None	10.0	2595	6433
29	S2B_T14TLT_20250601T174840_L2A	blue	None	10.0	8048	4468
30	S2B_T14TNS_20250601T174840_L2A	blue	None	10.0	3459	10980
31	S2B_T14TMT_20250601T174840_L2A	blue	None	10.0	10980	4468
32	S2B_T14TNT_20250601T174840_L2A	blue	None	10.0	3459	4468

Common diagnosis patterns¶

High empty chunk fraction — the time series is sparse. Consider increasing time_period (e.g. "P1W" instead of "P1D") to group more items into each step.

High max_items in the summary — some time steps have many overlapping COGs. This is common at high latitudes where satellite orbits overlap. FirstMethod (the default) will short-circuit once all pixels are filled, so the actual I/O may be much less than the item count suggests.

overview_level > 0 — lazycogs is reading a coarser overview pyramid level rather than full-resolution pixels. This is expected and efficient behavior — it means you're not reading more data than your output resolution requires.

In [ ]: