from plotting import (
    plot_duration_by_weboptimization,
    plot_memory,
    plot_memory_by_weboptimization,
    plot_time,
    plot_time_by_format,
)

Summary
Implications for future development
- Virtualizing archival file formats greatly improves performance relative to archival file readers such as h5netcdf and motivates the generation of virtual references whenever possible.
- The Web-Optimized Zarr example shows the potential for Zarr overviews to enable highly performant visualization and motivates the development of the GeoZarr and multi-scales Zarr specifications.
- Pyinstrument showed that a significant fraction of the total time spent resampling Web-Optimized Zarr with rioxarray went to Xarray importing pandas and guessing the chunk manager. Both of these steps could be sped up or eliminated through future development.
- The dramatic difference between using xESMF with and without pre-generated weights raises the question of whether similar relative performance improvements could be gained by pre-generating weights for reprojection with GDAL. However, given that pyinstrument shows only ~1/4 of the time is spent on the actual resampling operation when using COGs, building specifications for web-optimizing Zarr (i.e., GeoZarr and multi-scales), virtualizing existing datasets, and reducing import times would likely be much simpler and more fruitful activities.
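
The weight-reuse idea in the last point can be illustrated with a minimal, library-free sketch. It is not how xESMF or GDAL implement regridding: `build_weights` and `apply_weights` are hypothetical helpers for 1-D linear interpolation, and the coordinates and values are made-up toy numbers. The only point is that the index and weight lookup depends on the source and target grids alone, so it can be computed once and reused for every time slice that shares those grids.

```python
from bisect import bisect_right


def build_weights(src_coords, dst_coords):
    """Precompute (left_index, right_index, right_weight) for each target point.

    Assumes src_coords is sorted ascending with at least two points and that
    every target coordinate lies within [src_coords[0], src_coords[-1]].
    """
    weights = []
    for x in dst_coords:
        # Index of the source interval containing x, clamped to a valid interval.
        i = min(max(bisect_right(src_coords, x) - 1, 0), len(src_coords) - 2)
        x0, x1 = src_coords[i], src_coords[i + 1]
        weights.append((i, i + 1, (x - x0) / (x1 - x0)))
    return weights


def apply_weights(weights, values):
    """Resample one array of source values using the precomputed weights."""
    return [(1 - w) * values[i] + w * values[j] for i, j, w in weights]


# Toy grids (hypothetical): four source points, three target points.
src = [0.0, 1.0, 2.0, 3.0]
dst = [0.5, 1.5, 2.5]

# Grid-dependent step, done once.
weights = build_weights(src, dst)

# Data-dependent step, repeated for every time slice on the same grids.
time_slices = [[0.0, 10.0, 20.0, 30.0], [1.0, 3.0, 5.0, 7.0]]
resampled = [apply_weights(weights, values) for values in time_slices]
print(resampled)
```

In this sketch the per-slice cost is a single pass over the precomputed triples, while the search for neighbouring source points happens only in `build_weights`. Whether a comparable split would pay off for GDAL reprojection is the open question raised above.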
Summary figures
Summary figures for comparing resampling methods
# Plot time required for resampling GPM IMERG
gpm_imerg_local = plot_time("gpm_imerg", local=True, format="netcdf")
gpm_imerg_remote = plot_time("gpm_imerg", local=False, format="netcdf")
(gpm_imerg_local + gpm_imerg_remote).cols(1)

# Plot time required for resampling MUR SST
mur_sst_local = plot_time("mursst", local=True, format="netcdf")
mur_sst_remote = plot_time("mursst", local=False, format="netcdf")
(mur_sst_local + mur_sst_remote).cols(1)

# Plot memory required for resampling GPM IMERG
gpm_imerg_local = plot_memory("gpm_imerg", local=True, format="netcdf")
gpm_imerg_remote = plot_memory("gpm_imerg", local=False, format="netcdf")
(gpm_imerg_local + gpm_imerg_remote).cols(1)

# Plot memory required for resampling MUR SST
mur_sst_local = plot_memory("mursst", local=True, format="netcdf")
mur_sst_remote = plot_memory("mursst", local=False, format="netcdf")
(mur_sst_local + mur_sst_remote).cols(1)

Summary figures for comparing storage formats and I/O libraries
plot_time_by_format("mursst")

plot_time_by_format("gpm_imerg")

Summary figures for exploring web-optimization
plot_duration_by_weboptimization()

plot_memory_by_weboptimization()