This blog is the second in a series called COG Talk, which looks at ways to use Cloud Optimized GeoTIFF, and why we use them.
The first post is a refresh on the COG format and announces the release of version 1.0.0 of rio-tiler and rio-cogeo. Here, we’ll see how we can use them to build mosaics for web maps.
Multiple high resolution Cloud Optimized GeoTIFF hosted on OpenAerialMap (link)
COG vs Map Tiles
Cloud Optimized GeoTIFF files, as the name implies, are specifically designed for easily accessing remote raster data. Because of the internal tiling and internal overviews, people often ask: can COGs replace map tiles? The usual response is: yes, but…
Cloud Optimized GeoTIFF can replace .mbtiles
or statically generated map tiles by using a proxy to render tiles dynamically (e.g lambda tiler ). But when it comes to large datasets, the GeoTIFF files become too big and lose the advantage of fast remote reads.
There is no size limit for GeoTIFF and when working with a country or worldwide dataset, COGs can get quite large. Even using compression to create a reasonably sized file, it’s likely that the resulting GeoTIFF header— used to look up internal data tile locations — will be very large and will slow down the dynamic tiling processes.
If we instead decide to read a large collection of files and create a mosaic, we have a new set of issues: how can we decide which pixel to display given overlapping tiles? How can we update this pixel choice decision if we have daily updated data?
To help solve those problems, we are releasing rio-tiler-mosaic , a rio-tiler plugin allowing Mercator tile creation from multiple observations, and the associated mosaicJSON specification.
rio-tiler-mosaic
Creating a mosaic, in its simplest form, involves choosing pixels from multiple images to create a single image. To provide a dynamic map tile endpoint, we’ll need to repeat this process for each individual Web-Mercator tile. rio-tiler-mosaic
provides two important methods to make this simple: pixel selection for merging the tile arrays together and smart multi-threading for quickly managing a large number of images.
Pixel selection
Creating map tiles using COGs means we are dynamically generating the image array in response to a tile request. When working with mosaics, it also means we need to choose which pixel we display when we have an overlapping dataset. For a given tile, we iterate over all intersecting input images to decide which pixel to choose from. By default, rio-tiler-mosaic
provides four different pixel selections rules:
-
First: take the pixel in the first matching image and return when the tile is full
-
Brightest: loop through all images and return the highest pixel value
-
Darkest: loop through all images and return the lowest pixel value
-
Last: take the pixel in the last matching image and return when the tile is full
Pixel selection methods applied on Landsat-8 NDVI values for all 2018 observations over Montreal area.
Smart multi-threading
For each pixel selection method, we need to either process the whole stack of images or just the first few until the array is full. We are using multi-threading to fetch and read data in parallel to speed up the process. Because the First
and Last
pixel-selection methods should return as soon as the tile is totally filled, we implemented a partial multi-threading approach by processing chunks
of assets in parallel instead of the full list (see code). This is particularly handy when the list of assets is long.
Usage
The mosaic tile handler is designed to roughly match rio-tiler’s tile handlers and returns tile and mask arrays. Here is an example of generating a mosaic tile.
from rio_tiler.main import tile as cog_tiler
from rio_tiler_mosaic import mosaic
assets = ["myfile1.tif", "myfile2.tif"]
tile, mask = mosaic.mosaic_tiler(
assets, # List of image assets
x, # Mercator tile X index
y, # Mercator tile Y index
z, # Mercator tile Z index
cog_tiler, # rio-tiler's tiler
pixel_selection="first". # Pixel selection method
)
We have published a first beta version of rio-tiler-mosaic on Pypi and the source code is available on Github.
mosaicJSON
In addition to rio-tiler-mosaic
, today we are releasing a specification for representing Web Mercator mosaics constructed from multiple observations. The mosaicJSON specification can be seen as the GDAL VirtualRaster ( VRT), but for indexing files by Web Mercator quadkeys. Quadkeys (a contraction of quadtree and key) are "one-dimensional strings" representing unique Z-X-Y (level-row-column) Web Mercator map tiles.
COG footprint and Quadkey index
The goal of the metadata specification is to provide a simple spatial index linking the COGs to the XYZ Web Mercator map tile to render. Note that in the rio-tiler-mosaic
examples above, we assume that the list of input image assets for a given tile is already "known": the mosaicJSON file can provide that input.
The most important requirement is that each quadkey has a zoom level equal to the mosaicJSON minzoom. We use this to calculate the parent tile for a given input tile between the mosaic’s minzoom and maxzoom.
While this is still a Work in Progress the main features are:
-
quadkey based file index
-
simple JSON format (enabling high ratio compression)
Specification
{
// REQUIRED. A semver.org style version number. Describes the version of
// the MosaicJSON spec that is implemented by this JSON object.
"mosaicjson": "0.0.1",
// OPTIONAL. Default: null. A name describing the tileset. The name can
// contain any legal character. Implementations SHOULD NOT interpret the
// name as HTML.
"name": "compositing",
// OPTIONAL. Default: null. A text description of the tileset. The
// description can contain any legal character. Implementations SHOULD NOT
// interpret the description as HTML.
"description": "A simple, light grey world.",
// OPTIONAL. Default: "1.0.0". A semver.org style version number. When
// changes across files are introduced, the minor version MUST change.
// This may lead to cut off labels. Therefore, implementors can decide to
// clean their cache when the minor version changes. Changes to the patch
// level MUST only have changes to tiles that are contained within one tile.
// When tiles change significantly, the major version MUST be increased.
// Implementations MUST NOT use tiles with different major versions.
"version": "1.0.0",
// OPTIONAL. Default: null. Contains an attribution to be displayed
// when the map is shown to a user. Implementations MAY decide to treat this
// as HTML or literal text. For security reasons, make absolutely sure that
// this field can't be abused as a vector for XSS or beacon tracking.
"attribution": "<a href='http://openstreetmap.org'>OSM contributors</a>",
// REQUIRED.
// An integer specifying the minimum zoom level.
"minzoom": 0,
// REQUIRED.
// An integer specifying the maximum zoom level. MUST be >= minzoom.
"maxzoom": 11,
// OPTIONAL. Default: [-180, -90, 180, 90].
// The maximum extent of available map tiles. Bounds MUST define an area
// covered by all zoom levels. The bounds are represented in WGS:84
// latitude and longitude values, in the order left, bottom, right, top.
// Values may be integers or floating point numbers.
"bounds": [ -180, -85.05112877980659, 180, 85.0511287798066 ],
// OPTIONAL. Default: null.
// The first value is the longitude, the second is latitude (both in
// WGS:84 values), the third value is the zoom level as an integer.
// Longitude and latitude MUST be within the specified bounds.
// The zoom level MUST be between minzoom and maxzoom.
// Implementations can use this value to set the default location. If the
// value is null, implementations may use their own algorithm for
// determining a default location.
"center": [ -76.275329586789, 39.153492567373, 8 ]
// REQUIRED. A dictionary of per quadkey dataset in form of {quadkeys: [datasets]} pairs.
// Keys MUST be valid quadkeys index with zoom level equal to mosaic `minzoom`.
// Values MUST be arrays of strings (url or sceneid) pointing to a
// Cloud Optimized dataset with bounds intersecting with the quadkey bounds.
"tiles": {
"030130": [
"s3://my-bucket/dir/file1.tif",
"s3://my-bucket/dir/file2.tif",
]
}
}
A complete example of a mosaic definition based on mosaicJSON specification can be found here .
Example of implementation
Here is a simple implementation of mosaic tiling using the specification. On each tiler
's call, our handler ( my_mosaic_handler
) function fetches the mosaic file and looks for the assets indexed by the Web Mercator parent tile (at minzoom level) for the input XYZ tile. We then use the rio-tiler-mosaic
to combine the different image assets into one tile.
"""mosaic handler: Fetch mosaicJSON definition and get asset list."""
import mercantile
from rio_tiler.main import tile as cogeo_tiler
from rio_tiler_mosaic.mosaic import mosaic_tiler
def my_mosaic_handler(z, x, y, mosaic_url):
"""Get assets and pass them to the mosaic_tiler."""
mercator_tile = mercantile.Tile(x=x, y=y, z=z)
mosaic_def = fetch_mosaic_definition(mosaic_url)
quadkey_zoom = mosaic_def["minzoom"]
# Get the parent mercator tile for the given zxy tile
# Simple mosaic definition where tile zoom level is >= to the quadkey zoom level
if mercator_tile.z > quadkey_zoom:
depth = mercator_tile.z - quadkey_zoom
for i in range(depth):
mercator_tile = mercantile.parent(mercator_tile)
# Convert XYZ tile to quadkey
quadkey = mercantile.quadkey(*mercator_tile)
assets = mosaic_def["tiles"].get(quadkey)
# Pass assets to the mosaic tiler
tile, mask = mosaic_tiler(assets, x, y, z, cogeo_tiler)
Creating a mosaicJSON definition
Now that we know how to use it, let’s create a new mosaicJSON definition. Let’s say we have 28 files (e.g. Facebook population density) covering almost the whole continent of Africa.
High-resolution population density maps COGs from Facebook AI (link)
Quadkey base zoom (or mosaic min-zoom):
To construct the spatial index we need to define a minimal set of quadkeys. By definition, all COGs intersect with the tile 0-0-0
(zoom 0) but we can do a little math to minimize the set further. Inspecting the input TIFs, we see that the native resolution is around 0.00027 degrees with 8 levels of overviews. The native resolution corresponds to Web Mercator zoom level 12 so we can guess that we should start tiling at minzoom = 4
(12 - 8 = 4). We'll use this as the base zoom for quadkey keys.
`# Create Footprint`
$ parallel -j4 rio bounds ::: $(aws s3 ls opendata.remotepixel.ca/facebook/ --recursive | grep ".tif" | awk '{print "s3://opendata.remotepixel.ca/"$NF}') | jq -c '.features[0]' | fio collect > facebook.geojson
# Find ourZoom 4 quadkeys(there are 13)
$ cat facebook.geojson | supermercado burn 4 | mercantile quadkey | paste -s -d"," -
0330,0331,1220,1221,0333,1222,1223,3000,3001,3010,3002,3003,3012
Zoom 4 Mercator tiles intersecting with the facebook population dataset.
The mosaic definition is then built by finding the COGs intersecting with each of the 13 zoom 4 tiles:
"""Create mosaicJSON definition file."""
from concurrent import futures
import mercantile
from shapely.geometry import shape, box
from shapely.ops import cascaded_union
from supermercado import burntiles, uniontiles
import rasterio
from rasterio.warp import transform_bounds
max_zoom = 12
min_zoom = 4
dataset_list = ["my-file1.tif", ...] # Facebook dataset
def _get_bounds(asset):
"""Use Rasterio to get the footprint of the dataset."""
with rasterio.open(asset) as src_dst:
bounds = transform_bounds(*[src_dst.crs, "epsg:4326"] + list(src_dst.bounds), densify_pts=21)
return {
"geometry": {"type": "Polygon", "coordinates": [[[bounds[0], bounds[3]], [bounds[0], bounds[1]], [bounds[2], bounds[1]], [bounds[2], bounds[3]], [bounds[0], bounds[3]]]]},
"properties": {"path": src_path, "bounds": bounds,},
"type": "Feature",
}
# Get Bounds geometry for all files
with futures.ThreadPoolExecutor() as executor:
results = list(executor.map(_get_bounds, dataset_list))
dataset = [{"path": f["properties"]["path"], "geometry": shape(f["geometry"])} for f in filter(None, results)]
# find mercator tiles intersecting with all the footprint at min_zoom (quadkey tiles)
tiles = burntiles.burn(dataset, min_zoom)
tiles = ["{2}-{0}-{1}".format(*tile.tolist()) for tile in tiles]
# Merge the tile geometries to get the bounds and center
collection = cascaded_union([shape(f["geometry"]) for f in uniontiles.union(tiles, True)])
mosaic_definition = dict(
minzoom=min_zoom,
maxzoom=max_zoom,
bounds=list(collection.bounds),
center=[collection.centroid.x, collection.centroid.y],
tiles={},
)
for parent in tiles:
z, x, y = list(map(int, parent.split("-")))
parent = mercantile.Tile(x=x, y=y, z=z)
quadkey = mercantile.quadkey(*parent)
tile_geometry = box(*mercantile.bounds(parent))
# Find interesecting COG with parent tile
fdataset = list(filter(lambda x: tile_geometry.intersects(x["geometry"]), dataset))
if len(fdataset):
mosaic_definition["tiles"][quad] = [f["path"] for f in fdataset]
cogeo-mosaic: a CLI and a Serverless stack to create and use mosaicJSON
https://github.com/developmentseed/cogeo-mosaic
Wrapping up this new specification and the new rio-tiler-mosaic plugin we are also releasing cogeo-mosaic , a Serverless stack (based on AWS Lambda) to create and use the mosaicJSON specification. You can also install cogeo-mosaic locally and use the built-in CLI to create mosaicJSON from a set of files.
`$ pip install https://github.com/developmentseed/cogeo-mosaic`
$ cogeo-mosaic create my_list_of_files.txt -o mosaic.json
Is mosaicJSON appropriate for all mosaic map tile problems?
Obviously not, but we have been testing this solution on different projects and we find it simple and fast in most cases. That said, here are the pro/cons:
Pro
- Less filesto manage: In the case of the Facebook dataset, we ended up having 29 files stored in the cloud instead of ~600 000 if we created all the Mercator tiles image files.
`# Get number of tiles from zoom 4 to zoom 12 using [https://github.com/mapbox/supermercado/pull/26`](https://github.com/mapbox/supermercado/pull/26)
$ cat facebook.geojson | supermercado burn 4..12 | sort | uniq | wc -l
613 456
-
Flexibility: When we create the tile, we have the ability to decide whichpixel should have priority and be rendered on top.
-
mosaicJSON files can be relatively small (especiallyif using compression) and can be cached to increase performance.
Cons
-
Because we need a dynamic tiler, creating a tile on the fly will always be slightly slower than having the tile already ready to be served to the client.
-
Tile creation can take several seconds when using darkest/brightest methods (becauseit has to read all the COGs).
-
Only supports theWeb Mercator projection (as for rio-tiler).
-
Generating mosaicJSON files can be slowfor large areas with many images.
Community
Both the mosaicJSON specification and rio-tiler-mosaic are published on Github and we welcome any feedback and/or contributors. Please feel free to ping me on Twitter @VincentS if you have questions or want to hear more about the work we are doing to make open data more open and easier to use.
Demo
We built a simple demo page where you can explore some mosaicJSON examples and even share your own: https://bl.ocks.org/vincentsarago/raw/815884188c243b636ab8d927d8942a4d/
Hundreds of mercator tiles created dynamically based on mosaicJSON definition.
What we're doing.
Latest