Tile Generation Benchmarks for COGs with GDAL Environment Variables

Explanation

This notebook demonstrates how much GDAL environment variables matter when rasterio reads data from Cloud-Optimized GeoTIFFs (COGs). titiler-pgstac creates image tiles using rio-tiler, which in turn uses rasterio.

The GDAL variables that optimize titiler performance are described in the titiler documentation. For ease of reference, that documentation is copied into cog_tile_test.py, where the GDAL variables are set and unset before tests.
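The set-and-unset step can be pictured as a small context manager that applies a group of environment variables and restores the previous state afterwards. The following is a minimal sketch using only the standard library, not the code in cog_tile_test.py. The helper name `temporary_env` is hypothetical, and the two GDAL options shown are illustrative picks rather than the full recommended set.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides):
    """Set environment variables for the duration of a block, then restore them."""
    # Remember the previous value (None means the variable was not set).
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old_value in previous.items():
            if old_value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old_value


# Illustrative values only (assumption); consult the titiler documentation
# for the variables and values it actually recommends.
gdal_vars = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    "GDAL_HTTP_MERGE_CONSECUTIVE_RANGES": "YES",
}

with temporary_env(**gdal_vars):
    inside = os.environ["GDAL_DISABLE_READDIR_ON_OPEN"]

print(inside)
```

Restoring the earlier values in a `finally` clause keeps one test configuration from leaking into the next, which matters when the "set" and "unset" runs share a process.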

We run multiple iterations of tile generation at several zoom levels, once with the GDAL environment variables set and once with them unset, and display the results.
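The shape of such a benchmark can be sketched with the standard library alone. This is a hedged illustration, not the implementation of `CogTileTest.run_batch`: `time_batch` and `fake_tile` are hypothetical names, and `fake_tile` is a placeholder workload standing in for a real tile request.

```python
import time
from statistics import median


def time_batch(func, zooms, iterations):
    """Call func(zoom) `iterations` times per zoom; return elapsed ms per call."""
    timings = {}
    for zoom in zooms:
        samples = []
        for _ in range(iterations):
            start = time.perf_counter()
            func(zoom)
            samples.append((time.perf_counter() - start) * 1000.0)
        timings[zoom] = samples
    return timings


def fake_tile(zoom):
    # Placeholder workload (assumption); a real test would request a tile here.
    return sum(range(1000 * (zoom + 1)))


timings = time_batch(fake_tile, zooms=range(3), iterations=5)
medians = {zoom: median(samples) for zoom, samples in timings.items()}
```

Keeping every sample, rather than only an average, is what makes a box plot per zoom level possible later on.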

Setup

# External modules
import hvplot.pandas
import holoviews as hv
import json
import pandas as pd
pd.options.plotting.backend = 'holoviews'
import warnings
warnings.filterwarnings('ignore')

# Local modules
import sys; sys.path.append('..')
from cog_tile_test import CogTileTest
import helpers.dataframe as dataframe_helpers
import helpers.eodc_hub_role as eodc_hub_role
credentials = eodc_hub_role.fetch_and_set_credentials()
# Run 5 iterations of each setting
iterations = 5
zooms = range(6)
# Load the dataset definitions and use the first entry
with open('../01-generate-datasets/cmip6-pgstac/cog-datasets.json') as f:
    datasets = json.load(f)
dataset_id, dataset = next(iter(datasets.items()))
dataset
{'example_query': {'collections': ['CMIP6_daily_GISS-E2-1-G_tas'],
  'filter': {'op': 't_intersects',
   'args': [{'property': 'datetime'}, {'interval': ['1950-04-01T00:00:00Z']}]},
  'filter-lang': 'cql2-json'}}

Run tests

shared_args = {
    'dataset_id': dataset_id,
    'lat_extent': [-59, 89],
    'lon_extent': [-179, 179],
    'extra_args': {
        'query': dataset['example_query'],
        'credentials': credentials
    }
}
# Create a test with gdal vars unset
shared_args['extra_args']['set_gdal_vars'] = False
cog_tile_test_unset = CogTileTest(**shared_args)
Caught exception: An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 35.93.112.139/32, TCP, from port: 5432, to port: 5432, ALLOW" already exists
Connected to database
for zoom in zooms:
    cog_tile_test_unset.run_batch({'zoom': zoom}, batch_size=iterations)
unset_results = cog_tile_test_unset.store_results(credentials)
Wrote instance data to s3://nasa-eodc-data-store/test-results/20230907005511_CogTileTest_CMIP6_daily_GISS-E2-1-G_tas.json
# Create a test with gdal vars SET
shared_args['extra_args']['set_gdal_vars'] = True
cog_tile_test_set = CogTileTest(**shared_args)
Caught exception: An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 35.93.112.139/32, TCP, from port: 5432, to port: 5432, ALLOW" already exists
Connected to database
for zoom in zooms:
    cog_tile_test_set.run_batch({'zoom': zoom}, batch_size=iterations)

set_results = cog_tile_test_set.store_results(credentials)
Wrote instance data to s3://nasa-eodc-data-store/test-results/20230907005649_CogTileTest_CMIP6_daily_GISS-E2-1-G_tas.json

Read + Plot results

results_urls = [unset_results, set_results]
results_df = dataframe_helpers.load_all_into_dataframe(credentials, results_urls)
expanded_df = dataframe_helpers.expand_timings(results_df)
expanded_df['set_gdal_vars'] = expanded_df['set_gdal_vars'].astype(str)
cmap = ["#E1BE6A", "#40B0A6"]
plt_opts = {"width": 300, "height": 250}

plts = []

for zoom_level in zooms:
    # Plot only the rows for this zoom level
    df_level = expanded_df[expanded_df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="time",
            by=["set_gdal_vars"],
            c="set_gdal_vars",
            cmap=cmap,
            ylabel="Time to render (ms)",
            xlabel="GDAL Environment Variables Set/Unset",
            legend=False,
        ).opts(**plt_opts)
    )
hv.Layout(plts).cols(2)
expanded_df.to_csv('results-csvs/01-cog-gdal-results.csv')
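
A numeric summary can complement the box plots. The sketch below pivots median times by zoom level and by setting. It assumes the same column names used above (`zoom`, `set_gdal_vars`, `time`) and runs on made-up placeholder rows, not on the benchmark results; `demo_df` and `summary` are hypothetical names.

```python
import pandas as pd

# Made-up placeholder rows for illustration only; these are not real results.
demo_df = pd.DataFrame(
    {
        "zoom": [0, 0, 0, 0, 1, 1, 1, 1],
        "set_gdal_vars": ["False", "False", "True", "True"] * 2,
        "time": [10.0, 14.0, 4.0, 6.0, 20.0, 24.0, 8.0, 10.0],
    }
)

# One row per zoom level, one column per setting, median time in each cell.
summary = demo_df.pivot_table(
    index="zoom", columns="set_gdal_vars", values="time", aggfunc="median"
)
print(summary)
```

Applying the same `pivot_table` call to `expanded_df` should give one median per zoom level and setting, provided the column names match.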