COG Talk — Part 2 Mosaics

Blog

Estimated
13 min read

This blog is the second in a series called COG Talk, which looks at ways to use Cloud Optimized GeoTIFF, and why we use them.

The first post is a refresh on the COG format and announces the release of version 1.0.0 of rio-tiler and rio-cogeo. Here, we’ll see how we can use them to build mosaics for web maps.

Multiple high resolution Cloud Optimized GeoTIFF hosted on OpenAerialMap ([link](https://map.openaerialmap.org/#/-84.45327758789061,40.55763465737646,9/user/5ca3a1d0e5635f000690af69?_k=71blxk)) Multiple high resolution Cloud Optimized GeoTIFF hosted on OpenAerialMap (link)

COG vs Map Tiles

Cloud Optimized GeoTIFF files, as the name implies, are specifically designed for easily accessing remote raster data. Because of the internal tiling and internal overviews, people often ask: can COGs replace map tiles? The usual response is: yes, but

Cloud Optimized GeoTIFF can replace .mbtiles or statically generated map tiles by using a proxy to render tiles dynamically (e.g lambda tiler ). But when it comes to large datasets, the GeoTIFF files become too big and lose the advantage of fast remote reads.

There is no size limit for GeoTIFF and when working with a country or worldwide dataset, COGs can get quite large. Even using compression to create a reasonably sized file, it’s likely that the resulting GeoTIFF header— used to look up internal data tile locations — will be very large and will slow down the dynamic tiling processes.

If we instead decide to read a large collection of files and create a mosaic, we have a new set of issues: how can we decide which pixel to display given overlapping tiles? How can we update this pixel choice decision if we have daily updated data?

To help solve those problems, we are releasing rio-tiler-mosaic , a rio-tiler plugin allowing Mercator tile creation from multiple observations, and the associated mosaicJSON specification.

rio-tiler-mosaic

bbbf474e66df  03eD O2fwLVz7kV8x

Creating a mosaic, in its simplest form, involves choosing pixels from multiple images to create a single image. To provide a dynamic map tile endpoint, we’ll need to repeat this process for each individual Web-Mercator tile. rio-tiler-mosaic provides two important methods to make this simple: pixel selection for merging the tile arrays together and smart multi-threading for quickly managing a large number of images.

Pixel selection

Creating map tiles using COGs means we are dynamically generating the image array in response to a tile request. When working with mosaics, it also means we need to choose which pixel we display when we have an overlapping dataset. For a given tile, we iterate over all intersecting input images to decide which pixel to choose from. By default, rio-tiler-mosaic provides four different pixel selections rules:

  • First: take the pixel in the first matching image and return when the tile is full
  • Brightest: loop through all images and return the highest pixel value
  • Darkest: loop through all images and return the lowest pixel value
  • Last: take the pixel in the last matching image and return when the tile is full

Pixel selection methods applied on Landsat-8 NDVI values for all 2018 observations over Montreal area. Pixel selection methods applied on Landsat-8 NDVI values for all 2018 observations over Montreal area.

Smart multi-threading

For each pixel selection method, we need to either process the whole stack of images or just the first few until the array is full. We are using multi-threading to fetch and read data in parallel to speed up the process. Because the First and Last pixel-selection methods should return as soon as the tile is totally filled, we implemented a partial multi-threading approach by processing chunks of assets in parallel instead of the full list (see code). This is particularly handy when the list of assets is long.

Usage

The mosaic tile handler is designed to roughly match rio-tiler’s tile handlers and returns tile and mask arrays. Here is an example of generating a mosaic tile.

from rio_tiler.main import tile as cog_tiler
from rio_tiler_mosaic import mosaic

assets = ["myfile1.tif", "myfile2.tif"]

tile, mask = mosaic.mosaic_tiler(
  assets,  # List of image assets
  x,  # Mercator tile X index
  y,  # Mercator tile Y index
  z,  # Mercator tile Z index
  cog_tiler,  # rio-tiler's tiler
  pixel_selection="first". # Pixel selection method
)

We have published a first beta version of rio-tiler-mosaic on Pypi and the source code is available on Github.

mosaicJSON

In addition to rio-tiler-mosaic, today we are releasing a specification for representing Web Mercator mosaics constructed from multiple observations. The mosaicJSON specification can be seen as the GDAL VirtualRaster ( VRT), but for indexing files by Web Mercator quadkeys. Quadkeys (a contraction of quadtree and key) are “one-dimensional strings” representing unique Z-X-Y (level-row-column) Web Mercator map tiles.

COG footprint and Quadkey index COG footprint and Quadkey index

The goal of the metadata specification is to provide a simple spatial index linking the COGs to the XYZ Web Mercator map tile to render. Note that in the rio-tiler-mosaic examples above, we assume that the list of input image assets for a given tile is already “known”: the mosaicJSON file can provide that input.

The most important requirement is that each quadkey has a zoom level equal to the mosaicJSON minzoom. We use this to calculate the parent tile for a given input tile between the mosaic’s minzoom and maxzoom.

While this is still a Work in Progress the main features are:

  • quadkey based file index
  • simple JSON format (enabling high ratio compression)

Specification

{
    // REQUIRED. A semver.org style version number. Describes the version of
    // the MosaicJSON spec that is implemented by this JSON object.
    "mosaicjson": "0.0.1",

    // OPTIONAL. Default: null. A name describing the tileset. The name can
    // contain any legal character. Implementations SHOULD NOT interpret the
    // name as HTML.
    "name": "compositing",

    // OPTIONAL. Default: null. A text description of the tileset. The
    // description can contain any legal character. Implementations SHOULD NOT
    // interpret the description as HTML.
    "description": "A simple, light grey world.",

    // OPTIONAL. Default: "1.0.0". A semver.org style version number. When
    // changes across files are introduced, the minor version MUST change.
    // This may lead to cut off labels. Therefore, implementors can decide to
    // clean their cache when the minor version changes. Changes to the patch
    // level MUST only have changes to tiles that are contained within one tile.
    // When tiles change significantly, the major version MUST be increased.
    // Implementations MUST NOT use tiles with different major versions.
    "version": "1.0.0",

    // OPTIONAL. Default: null. Contains an attribution to be displayed
    // when the map is shown to a user. Implementations MAY decide to treat this
    // as HTML or literal text. For security reasons, make absolutely sure that
    // this field can't be abused as a vector for XSS or beacon tracking.
    "attribution": "<a href='http://openstreetmap.org'>OSM contributors</a>",

    // REQUIRED.
    // An integer specifying the minimum zoom level.
    "minzoom": 0,

    // REQUIRED.
    // An integer specifying the maximum zoom level. MUST be >= minzoom.
    "maxzoom": 11,

    // OPTIONAL. Default: [-180, -90, 180, 90].
    // The maximum extent of available map tiles. Bounds MUST define an area
    // covered by all zoom levels. The bounds are represented in WGS:84
    // latitude and longitude values, in the order left, bottom, right, top.
    // Values may be integers or floating point numbers.
    "bounds": [ -180, -85.05112877980659, 180, 85.0511287798066 ],

    // OPTIONAL. Default: null.
    // The first value is the longitude, the second is latitude (both in
    // WGS:84 values), the third value is the zoom level as an integer.
    // Longitude and latitude MUST be within the specified bounds.
    // The zoom level MUST be between minzoom and maxzoom.
    // Implementations can use this value to set the default location. If the
    // value is null, implementations may use their own algorithm for
    // determining a default location.
    "center": [ -76.275329586789, 39.153492567373, 8 ]

    // REQUIRED. A dictionary of per quadkey dataset in form of {quadkeys: [datasets]} pairs.
    // Keys MUST be valid quadkeys index with zoom level equal to mosaic `minzoom`.
    // Values MUST be arrays of strings (url or sceneid) pointing to a
    // Cloud Optimized dataset with bounds intersecting with the quadkey bounds.
    "tiles": {
        "030130": [
            "s3://my-bucket/dir/file1.tif",
            "s3://my-bucket/dir/file2.tif",
        ]
    }
}

A complete example of a mosaic definition based on mosaicJSON specification can be found here .

Example of implementation

Here is a simple implementation of mosaic tiling using the specification. On each tiler’s call, our handler ( my_mosaic_handler) function fetches the mosaic file and looks for the assets indexed by the Web Mercator parent tile (at minzoom level) for the input XYZ tile. We then use the rio-tiler-mosaic to combine the different image assets into one tile.

"""mosaic handler: Fetch mosaicJSON definition and get asset list."""

import mercantile
from rio_tiler.main import tile as cogeo_tiler
from rio_tiler_mosaic.mosaic import mosaic_tiler

def my_mosaic_handler(z, x, y, mosaic_url):
    """Get assets and pass them to the mosaic_tiler."""
    mercator_tile = mercantile.Tile(x=x, y=y, z=z)
    mosaic_def = fetch_mosaic_definition(mosaic_url)
    quadkey_zoom = mosaic_def["minzoom"]

    # Get the parent mercator tile for the given zxy tile
    # Simple mosaic definition where tile zoom level is >= to the quadkey zoom level
    if mercator_tile.z > quadkey_zoom:
        depth = mercator_tile.z - quadkey_zoom
        for i in range(depth):
            mercator_tile = mercantile.parent(mercator_tile)

    # Convert XYZ tile to quadkey
    quadkey = mercantile.quadkey(*mercator_tile)

    assets = mosaic_def["tiles"].get(quadkey)

    # Pass assets to the mosaic tiler
    tile, mask = mosaic_tiler(assets, x, y, z, cogeo_tiler)

Creating a mosaicJSON definition

Now that we know how to use it, let’s create a new mosaicJSON definition. Let’s say we have 28 files (e.g. Facebook population density) covering almost the whole continent of Africa.

High-resolution population density maps COGs from Facebook AI ([link](https://ai.facebook.com/blog/mapping-the-world-to-help-aid-workers-with-weakly-semi-supervised-learning)) High-resolution population density maps COGs from Facebook AI (link)

Quadkey base zoom (or mosaic min-zoom):

To construct the spatial index we need to define a minimal set of quadkeys. By definition, all COGs intersect with the tile 0-0-0 (zoom 0) but we can do a little math to minimize the set further. Inspecting the input TIFs, we see that the native resolution is around 0.00027 degrees with 8 levels of overviews. The native resolution corresponds to Web Mercator zoom level 12 so we can guess that we should start tiling at minzoom = 4 (12 - 8 = 4). We’ll use this as the base zoom for quadkey keys.

`# Create Footprint`

$ parallel -j4 rio bounds ::: $(aws s3 ls opendata.remotepixel.ca/facebook/ --recursive | grep ".tif" | awk '{print "s3://opendata.remotepixel.ca/"$NF}') | jq -c '.features[0]' | fio collect > facebook.geojson

# Find ourZoom 4 quadkeys(there are 13) 

$ cat facebook.geojson | supermercado burn 4 | mercantile quadkey | paste -s -d"," -

0330,0331,1220,1221,0333,1222,1223,3000,3001,3010,3002,3003,3012

Zoom 4 Mercator tiles intersecting with the facebook population dataset. Zoom 4 Mercator tiles intersecting with the facebook population dataset.

The mosaic definition is then built by finding the COGs intersecting with each of the 13 zoom 4 tiles:

"""Create mosaicJSON definition file."""
from concurrent import futures

import mercantile
from shapely.geometry import shape, box
from shapely.ops import cascaded_union
from supermercado import burntiles, uniontiles

import rasterio
from rasterio.warp import transform_bounds

max_zoom = 12
min_zoom = 4
dataset_list = ["my-file1.tif", ...]  # Facebook dataset

def _get_bounds(asset):
    """Use Rasterio to get the footprint of the dataset."""
    with rasterio.open(asset) as src_dst:
        bounds = transform_bounds(*[src_dst.crs, "epsg:4326"] + list(src_dst.bounds), densify_pts=21)
        return {
            "geometry": {"type": "Polygon", "coordinates": [[[bounds[0], bounds[3]], [bounds[0], bounds[1]], [bounds[2], bounds[1]], [bounds[2], bounds[3]], [bounds[0], bounds[3]]]]},
            "properties": {"path": src_path, "bounds": bounds,},
            "type": "Feature",
        }

# Get Bounds geometry for all files
with futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(_get_bounds, dataset_list))

dataset = [{"path": f["properties"]["path"], "geometry": shape(f["geometry"])} for f in filter(None, results)]

# find mercator tiles intersecting with all the footprint at min_zoom (quadkey tiles)
tiles = burntiles.burn(dataset, min_zoom)
tiles = ["{2}-{0}-{1}".format(*tile.tolist()) for tile in tiles]

# Merge the tile geometries to get the bounds and center
collection = cascaded_union([shape(f["geometry"]) for f in uniontiles.union(tiles, True)])

mosaic_definition = dict(
    minzoom=min_zoom,
    maxzoom=max_zoom,
    bounds=list(collection.bounds),
    center=[collection.centroid.x, collection.centroid.y],
    tiles={},
)

for parent in tiles:
    z, x, y = list(map(int, parent.split("-")))
    parent = mercantile.Tile(x=x, y=y, z=z)
    quadkey = mercantile.quadkey(*parent)
    tile_geometry = box(*mercantile.bounds(parent))

    # Find interesecting COG with parent tile
    fdataset = list(filter(lambda x: tile_geometry.intersects(x["geometry"]), dataset))
    if len(fdataset):
        mosaic_definition["tiles"][quad] = [f["path"] for f in fdataset]

cogeo-mosaic: a CLI and a Serverless stack to create and use mosaicJSON

[https://github.com/developmentseed/cogeo-mosaic](https://github.com/developmentseed/cogeo-mosaic) https://github.com/developmentseed/cogeo-mosaic

Wrapping up this new specification and the new rio-tiler-mosaic plugin we are also releasing cogeo-mosaic , a Serverless stack (based on AWS Lambda) to create and use the mosaicJSON specification. You can also install cogeo-mosaic locally and use the built-in CLI to create mosaicJSON from a set of files.

`$ pip install https://github.com/developmentseed/cogeo-mosaic`

$ cogeo-mosaic create my_list_of_files.txt -o mosaic.json

Is mosaicJSON appropriate for all mosaic map tile problems?

Obviously not, but we have been testing this solution on different projects and we find it simple and fast in most cases. That said, here are the pro/cons:

Pro

  • Less filesto manage: In the case of the Facebook dataset, we ended up having 29 files stored in the cloud instead of ~600 000 if we created all the Mercator tiles image files.
`# Get number of tiles from zoom 4 to zoom 12 using [https://github.com/mapbox/supermercado/pull/26`](https://github.com/mapbox/supermercado/pull/26)

$ cat facebook.geojson | supermercado burn 4..12 | sort | uniq | wc -l

  613 456
  • Flexibility: When we create the tile, we have the ability to decide whichpixel should have priority and be rendered on top.
  • mosaicJSON files can be relatively small (especiallyif using compression) and can be cached to increase performance.

Cons

  • Because we need a dynamic tiler, creating a tile on the fly will always be slightly slower than having the tile already ready to be served to the client.
  • Tile creation can take several seconds when using darkest/brightest methods (becauseit has to read all the COGs).
  • Only supports theWeb Mercator projection (as for rio-tiler).
  • Generating mosaicJSON files can be slowfor large areas with many images.

Community

Both the mosaicJSON specification and rio-tiler-mosaic are published on Github and we welcome any feedback and/or contributors. Please feel free to ping me on Twitter @VincentS if you have questions or want to hear more about the work we are doing to make open data more open and easier to use.

Demo

We built a simple demo page where you can explore some mosaicJSON examples and even share your own: https://bl.ocks.org/vincentsarago/raw/815884188c243b636ab8d927d8942a4d/

Hundreds of mercator tiles created dynamically based on mosaicJSON definition.Hundreds of mercator tiles created dynamically based on mosaicJSON definition.

What we're doing.

Latest