Skip to content

API Reference

Download

pixelverse.download.sentinel1

Retrieve and process Sentinel-1 Ground Range Detected (GRD) data.

get_s1_monthly_time_series

get_s1_monthly_time_series(
    bbox: tuple[
        int | float, int | float, int | float, int | float
    ],
    year: int,
    stac_host: str = "https://planetarycomputer.microsoft.com/api/stac/v1",
) -> xarray.Dataset

Fetch Sentinel-1 imagery for a bounding box and return average monthly values.

Parameters:

Name Type Description Default
bbox tuple[float]

Bounding box coordinates (min_lon, min_lat, max_lon, max_lat).

required
year int

Year for which to fetch the imagery.

required
stac_host str

STAC host URL. Defaults to Microsoft Planetary Computer.

'https://planetarycomputer.microsoft.com/api/stac/v1'

Returns:

Type Description
xarray.Dataset

An xarray Dataset containing a monthly time series of Sentinel-1 VV and VH backscatter, averaged per calendar month.

Source code in src/pixelverse/download/sentinel1.py
def get_s1_monthly_time_series(
    bbox: tuple[int | float, int | float, int | float, int | float],
    year: int,
    stac_host: str = "https://planetarycomputer.microsoft.com/api/stac/v1",
) -> xr.Dataset:
    """
    Fetch Sentinel-1 GRD imagery for a bounding box and return average monthly values.

    Parameters
    ----------
    bbox : tuple[float]
        Bounding box coordinates (min_lon, min_lat, max_lon, max_lat).
    year : int
        Year for which to fetch the imagery.
    stac_host : str, optional
        STAC host URL. Defaults to Microsoft Planetary Computer.

    Returns
    -------
    xr.Dataset
        An xarray Dataset with one time step per month that had data,
        containing the monthly mean VV and VH backscatter (linear scale).
        Each time step is labelled with the first day of its month.
    """
    client = Client.open(stac_host)

    search = client.search(
        collections=["sentinel-1-grd"],
        bbox=bbox,
        datetime=f"{year}-01-01T00:00:00Z/{year}-12-31T23:59:59Z",
    )

    # Load the selected items into an xarray dataset, combining overlapping tiles
    dset = stac_load(
        search.items(),
        bbox=bbox,
        chunks={"time": 1, "x": 2048, "y": 2048},
        bands=["vv", "vh"],
        groupby="solar_day",
        resampling="bilinear",
    )

    # Average all acquisitions within each calendar month.
    dset_monthly = dset.groupby("time.month").mean()

    # Relabel each month with the 1st of that month. Build the labels from the
    # months actually present rather than a fixed 12-entry date_range, which
    # raises a size-mismatch error whenever any month had no imagery.
    dset_monthly["month"] = pd.to_datetime(
        [f"{year}-{int(m):02d}-01" for m in dset_monthly.month.values]
    )
    dset_monthly = dset_monthly.rename({"month": "time"})

    return dset_monthly

linear_to_decibel

linear_to_decibel(
    dataarray: xarray.DataArray,
) -> xarray.DataArray

Transform VV or VH values from linear to decibel scale.

Parameters:

Name Type Description Default
dataarray xarray.DataArray

Input DataArray with VV or VH values in linear scale.

required

Returns:

Type Description
xarray.DataArray

DataArray with values converted to decibel scale using 10 * log_10(x).

Source code in src/pixelverse/download/sentinel1.py
def linear_to_decibel(dataarray: xr.DataArray) -> xr.DataArray:
    """
    Transform VV or VH values from linear to decibel scale.

    Parameters
    ----------
    dataarray : xr.DataArray
        Input DataArray with VV or VH values in linear scale.

    Returns
    -------
    xr.DataArray
        DataArray with values converted to decibel scale using 10 * log_10(x).
        Cells that were 0 in the input become NaN.
    """
    # Local import: ``xr.ufuncs`` was removed from xarray (>= v2022.06);
    # NumPy ufuncs dispatch on DataArrays directly.
    import numpy as np

    # Mask out areas with 0 so that log10 is not undefined (masked cells -> NaN)
    da_linear = dataarray.where(cond=dataarray != 0)
    da_decibel = 10 * np.log10(da_linear)
    return da_decibel

process_s1_dataset

process_s1_dataset(dset: xarray.Dataset) -> xarray.Dataset

Process the Sentinel-1 xarray.Dataset by converting VV and VH bands from linear to decibel scale.

Parameters:

Name Type Description Default
dset xarray.Dataset

Input xarray Dataset containing 'vv' and 'vh' bands.

required

Returns:

Type Description
xarray.Dataset

Processed xarray Dataset with 'vv_processed' and 'vh_processed' bands.

Source code in src/pixelverse/download/sentinel1.py
def process_s1_dataset(dset: xr.Dataset) -> xr.Dataset:
    """
    Convert the 'vv' and 'vh' bands of a Sentinel-1 Dataset from linear to
    decibel scale, storing the results as new variables.

    Parameters
    ----------
    dset : xr.Dataset
        Input xarray Dataset containing 'vv' and 'vh' bands. Modified in place.

    Returns
    -------
    xr.Dataset
        The same Dataset with 'vv_processed' and 'vh_processed' bands added.
    """
    conversions = {"vv": "vv_processed", "vh": "vh_processed"}
    for source_band, target_band in conversions.items():
        dset[target_band] = linear_to_decibel(dset[source_band])
    return dset

pixelverse.download.sentinel2

Retrieve and process Sentinel-2 Collection 1 Level 2A multispectral data.

fill_missing_months_and_format

fill_missing_months_and_format(
    dset: xarray.Dataset,
) -> xarray.Dataset

Fill missing months in the time series by forward filling previous data.

Parameters:

Name Type Description Default
dset xarray.Dataset

Input xarray Dataset with a time dimension representing months. Expected to be the output of get_s2_time_series.

required

Returns:

Type Description
xarray.Dataset

An xarray Dataset with a day-of-year data variable added and missing months filled.

Source code in src/pixelverse/download/sentinel2.py
def fill_missing_months_and_format(dset: xr.Dataset) -> xr.Dataset:
    """
    Fill missing months in the time series by forward filling previous data.

    Parameters
    ----------
    dset : xr.Dataset
        Input xarray Dataset with a time dimension representing months.
        Expected to be the output of `get_s2_time_series`.

    Returns
    -------
    xr.Dataset
        An xarray Dataset with a day-of-year data variable added and any
        missing months filled from the preceding month.
    """
    # Day-of-year variable is added to format the data for model inference.
    dset["doy"] = dset.time.dt.dayofyear

    observed = pd.DatetimeIndex(dset.time.values)
    year = pd.Timestamp(dset.time.values[0]).year

    # Calendar months (1-12) with no observation get a mid-month placeholder date.
    absent = sorted(set(range(1, 13)).difference(observed.month))
    placeholders = pd.DatetimeIndex(
        [pd.Timestamp(year=year, month=month, day=15) for month in absent]
    )

    full_index = observed.append(placeholders).sort_values()

    # Forward-fill copies the most recent earlier month into each placeholder.
    return dset.reindex(time=full_index, method="ffill")

get_s2_time_series

get_s2_time_series(
    bbox: tuple[
        int | float, int | float, int | float, int | float
    ],
    year: int,
    stac_host: str = "https://earth-search.aws.element84.com/v1",
    cloudcover_max: int = 20,
) -> xarray.Dataset

Fetch Sentinel-2 imagery for a bounding box for each month of a specified year.

Parameters:

Name Type Description Default
bbox tuple[float]

Bounding box coordinates (min_lon, min_lat, max_lon, max_lat).

required
year int

Year for which to fetch the imagery.

required
stac_host str

STAC host URL. Defaults to Earth Search AWS.

'https://earth-search.aws.element84.com/v1'
cloudcover_max int

Maximum cloud cover percentage for filtering images. Defaults to 20.

20

Returns:

Type Description
xarray.Dataset

An xarray Dataset containing a time series of Sentinel-2, with the lowest cloud cover image per month selected.

Source code in src/pixelverse/download/sentinel2.py
def get_s2_time_series(
    bbox: tuple[int | float, int | float, int | float, int | float],
    year: int,
    stac_host: str = "https://earth-search.aws.element84.com/v1",
    cloudcover_max: int = 20,
) -> xr.Dataset:
    """
    Fetch Sentinel-2 imagery for a bounding box for each month of a specified year.

    Parameters
    ----------
    bbox : tuple[float]
        Bounding box coordinates (min_lon, min_lat, max_lon, max_lat).
    year : int
        Year for which to fetch the imagery.
    stac_host : str, optional
        STAC host URL. Defaults to Earth Search AWS.
    cloudcover_max : int
        Maximum cloud cover percentage for filtering images. Defaults to 20.

    Returns
    -------
    xr.Dataset
        An xarray Dataset containing a time series of Sentinel-2, with the
        lowest cloud cover image per MGRS tile per month selected. Months with
        no suitable imagery are absent from the result.

    Raises
    ------
    ValueError
        If no images match the query for any month of the year.
    """
    client = Client.open(stac_host)

    selected_items = []

    # Query each month separately so the clearest scene per month can be picked
    for month in range(1, 13):
        # Month window: [first day of month, one second before next month)
        start_date = pd.Timestamp(year=year, month=month, day=1)
        next_month_start = (
            pd.Timestamp(year=year + 1, month=1, day=1)
            if month == 12
            else pd.Timestamp(year=year, month=month + 1, day=1)
        )
        end_date = next_month_start - pd.Timedelta(seconds=1)

        search = client.search(
            collections=["sentinel-2-l2a"],
            bbox=bbox,
            datetime=f"{start_date.isoformat()}Z/{end_date.isoformat()}Z",
            query={"eo:cloud_cover": {"lt": cloudcover_max}},
            sortby=["+properties.eo:cloud_cover"],
        )

        items = list(search.items())
        if not items:
            print(
                f"No images found for {year}-{month} with cloud cover < {cloudcover_max}% "
                "using previous months' data"
            )
            continue

        # Group items by MGRS tile and keep the lowest-cloud-cover item per tile.
        # `tiles` is scoped to this iteration: previously it leaked across
        # months, so an empty month re-added the prior month's items (and an
        # empty first month raised NameError).
        tiles = defaultdict(list)
        for item in items:
            mgrs_tile = item.id.split("_")[1]
            tiles[mgrs_tile].append(item)

        selected_items.extend(
            min(tile_items, key=lambda x: x.properties.get("eo:cloud_cover", 100))
            for tile_items in tiles.values()
        )

    if not selected_items:
        raise ValueError(f"No Sentinel-2 images found for bbox {bbox} in year {year}")

    # Load the selected items into an xarray dataset
    dset = stac_load(
        selected_items,
        bbox=bbox,
        chunks={"time": 1, "x": 2048, "y": 2048},
        bands=[
            "blue",
            "green",
            "red",
            "rededge1",
            "rededge2",
            "rededge3",
            "nir",
            "nir08",
            "swir16",
            "swir22",
        ],
        resolution=10,  # 10m resolution
        dtype="uint16",
        nodata=0,
    )

    # Collapse to one time step per calendar month, labelled with the first
    # acquisition date that appeared in that month.
    dset_monthly = dset.groupby("time.month").mean(dtype="uint16")
    dset_monthly["month"] = dset.time.groupby("time.month").min().values
    dset_monthly = dset_monthly.rename({"month": "time"})

    return dset_monthly

Embedding creation

pixelverse.generate_embeddings

Embeddings generation and quantization functions.

generate_embeddings

generate_embeddings(
    s2_dset: xarray.Dataset,
    model_name: str = "tessera_s2_encoder",
) -> xarray.Dataset

Generate embeddings for a given Sentinel-2 dataset using the specified model.

Note: MPV function designed to work with small areas.

Parameters:

Name Type Description Default
s2_dset xarray.Dataset

Sentinel-2 dataset containing spectral bands and temporal information.

required
model_name str

Name of the model to use for generating embeddings. Default is "tessera_s2_encoder".

'tessera_s2_encoder'

Returns:

Type Description
xarray.Dataset

Dataset containing generated embeddings with spatial coordinates and CRS information.

Source code in src/pixelverse/generate_embeddings.py
def generate_embeddings(s2_dset: xr.Dataset, model_name: str = "tessera_s2_encoder") -> xr.Dataset:
    """
    Generate embeddings for a given Sentinel-2 dataset using the specified model.

    Note: MPV function designed to work with small areas.

    Parameters
    ----------
    s2_dset : xr.Dataset
        Sentinel-2 dataset containing spectral bands and temporal information.
    model_name : str, optional
        Name of the model to use for generating embeddings. Default is "tessera_s2_encoder".

    Returns
    -------
    xr.Dataset
        Dataset containing generated embeddings with spatial coordinates and CRS information.

    """
    model, transforms = create_model(model_name, pretrained=True)
    model.eval()

    # Derive day-of-year from the time coordinate if the caller did not supply it
    if "doy" not in s2_dset:
        s2_dset["doy"] = s2_dset.time.dt.dayofyear

    # Collapse y/x into a single pixel dimension for per-pixel inference
    stacked = s2_dset[S2_BANDS].to_array(dim="band").stack(pixel=["y", "x"])

    # Reorder (band, time, pixel) -> (pixel, time, band) for the encoder
    pixel_tensor = torch.from_numpy(stacked.values.transpose(2, 1, 0)).float()
    n_pixels = pixel_tensor.shape[0]

    # Broadcast the per-timestep DOY values across every pixel -> (pixel, time)
    doy_tensor = torch.from_numpy(s2_dset.doy.values).unsqueeze(0).expand(n_pixels, -1).float()

    # Append DOY as an extra channel -> (pixel, time, 11)
    features = torch.cat([pixel_tensor, doy_tensor.unsqueeze(-1)], dim=-1)

    # Inference only; gradients are never needed here
    with torch.no_grad():
        embeddings = model(transforms(features))

    # Restore the spatial (y, x) structure from the stacked pixel index
    dset_embeddings = (
        xr.DataArray(
            embeddings.numpy(),
            dims=["pixel", "feature"],
            coords={"pixel": stacked.pixel},
        )
        .unstack("pixel")
        .to_dataset(name="embedding")
    )

    # Carry the CRS over from the input dataset
    dset_embeddings.rio.write_crs(s2_dset.rio.crs, inplace=True)

    return dset_embeddings

quantize_embeddings

quantize_embeddings(
    embeddings: xarray.DataArray,
) -> xarray.Dataset

Quantize embeddings from float32 to uint8 to save space (4x compression).

Normalizes each feature independently to the uint8 range [0, 255]. The quantized embeddings can be used directly for similarity comparisons and other operations without dequantization.

Parameters:

Name Type Description Default
embeddings xarray.DataArray

Float32 embeddings with shape (feature, y, x)

required

Returns:

Type Description
xarray.Dataset

Dataset containing quantized uint8 embeddings with 'embedding' variable and spatial coordinates preserved

Source code in src/pixelverse/generate_embeddings.py
def quantize_embeddings(embeddings: xr.DataArray) -> xr.Dataset:
    """
    Quantize embeddings from float32 to uint8 to save space (4x compression).

    Normalizes each feature independently to the uint8 range [0, 255].
    The quantized embeddings can be used directly for similarity comparisons
    and other operations without dequantization.

    NOTE(review): the per-feature min/max and scale are not stored on the
    output, so the original float values cannot be recovered from the result —
    confirm this is acceptable for downstream consumers.

    Parameters
    ----------
    embeddings : xr.DataArray
        Float32 embeddings with shape (feature, y, x)

    Returns
    -------
    xr.Dataset
        Dataset containing quantized uint8 embeddings with 'embedding' variable
        and spatial coordinates preserved
    """

    # Per-feature extrema over the spatial dimensions; shape (feature,).
    min_vals = embeddings.min(dim=["y", "x"]).values
    max_vals = embeddings.max(dim=["y", "x"]).values

    # Map each feature's [min, max] onto [0, 255]. A constant feature would
    # give scale 0, so substitute 1 to avoid division by zero.
    scale = (max_vals - min_vals) / 255.0
    scale = np.where(scale == 0, 1, scale)

    # Broadcast (feature,) stats over (feature, y, x) and round to the
    # nearest quantization level.
    quantized_values = np.round(
        (embeddings.values - min_vals[:, None, None]) / scale[:, None, None]
    )
    quantized_values = np.clip(quantized_values, 0, 255).astype(np.uint8)

    # Create Dataset directly - most efficient
    return xr.Dataset(
        data_vars={"embedding": (embeddings.dims, quantized_values)},
        coords=embeddings.coords,
        attrs={"quantized": True},
    )