from plotting import (
    plot_duration_by_weboptimization,
    plot_memory,
    plot_memory_by_weboptimization,
    plot_time,
    plot_time_by_format,
)
Summary
Implications for future development
- Virtualizing archival file formats greatly improves performance relative to archival file readers such as h5netcdf and motivates the generation of virtual references whenever possible.
- The Web-Optimized Zarr example shows the potential for Zarr overviews to enable highly performant visualization and motivates the development of the GeoZarr and multi-scales Zarr specifications.
- Pyinstrument showed that, when resampling Web-Optimized Zarr using rioxarray, a significant fraction of the total time went towards Xarray importing Pandas and guessing the chunk manager. Both of these costs could be reduced or removed through future development.
- The dramatic difference between using XESMF with and without pre-generated weights raises the question of whether similar relative performance improvements could be gained by pre-generating weights for reprojection with GDAL. However, pyinstrument shows that only ~1/4 of the time is spent on the actual resampling operation when using COGs, so building specifications for web-optimizing Zarr (i.e., GeoZarr and multi-scales), virtualizing existing datasets, and reducing import times would likely be much simpler and more fruitful activities.
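The benefit of pre-generated weights noted above can be illustrated with a minimal, library-free sketch. It is not how XESMF or GDAL implement regridding; it assumes a 1D grid and simple linear interpolation, and the names `build_linear_weights` and `apply_weights` are hypothetical. The point is only that the grid-dependent work (locating neighbours and computing weights) is done once and then reused for every time step.

```python
from bisect import bisect_right


def build_linear_weights(src_coords, dst_coords):
    """Precompute (left index, right index, right weight) per destination point.

    Assumes src_coords is sorted ascending with at least two entries.
    Destination points outside the source range are clamped to the nearest edge.
    """
    weights = []
    last = len(src_coords) - 1
    for x in dst_coords:
        # Index of the source interval containing x, clamped to a valid interval.
        i = min(max(bisect_right(src_coords, x) - 1, 0), last - 1)
        x0, x1 = src_coords[i], src_coords[i + 1]
        w = min(max((x - x0) / (x1 - x0), 0.0), 1.0)
        weights.append((i, i + 1, w))
    return weights


def apply_weights(weights, values):
    """Apply precomputed weights to one array of source values (cheap step)."""
    return [(1.0 - w) * values[i] + w * values[j] for i, j, w in weights]


# Illustrative grids and values only, not data from the benchmarks.
src = [0.0, 1.0, 2.0, 3.0]
dst = [0.5, 1.5, 2.5]

# Expensive, grid-dependent step: done once.
weights = build_linear_weights(src, dst)

# Cheap step: repeated for each time step on the same pair of grids.
timestep_a = apply_weights(weights, [0.0, 10.0, 20.0, 30.0])
timestep_b = apply_weights(weights, [4.0, 4.0, 8.0, 8.0])
```

The same split between building and applying weights is what makes the pre-generated case fast: the weights depend only on the source and destination grids, so they can be stored and reused as long as neither grid changes.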
Summary figures
Summary figures for comparing resampling methods
# Plot time required for resampling GPM IMERG
gpm_imerg_local = plot_time("gpm_imerg", local=True, format="netcdf")
gpm_imerg_remote = plot_time("gpm_imerg", local=False, format="netcdf")
(gpm_imerg_local + gpm_imerg_remote).cols(1)
# Plot time required for resampling MUR SST
mur_sst_local = plot_time("mursst", local=True, format="netcdf")
mur_sst_remote = plot_time("mursst", local=False, format="netcdf")
(mur_sst_local + mur_sst_remote).cols(1)
# Plot memory required for resampling GPM IMERG
gpm_imerg_local = plot_memory("gpm_imerg", local=True, format="netcdf")
gpm_imerg_remote = plot_memory("gpm_imerg", local=False, format="netcdf")
(gpm_imerg_local + gpm_imerg_remote).cols(1)
# Plot memory required for resampling MUR SST
mur_sst_local = plot_memory("mursst", local=True, format="netcdf")
mur_sst_remote = plot_memory("mursst", local=False, format="netcdf")
(mur_sst_local + mur_sst_remote).cols(1)
Summary figures for comparing storage formats and I/O libraries
plot_time_by_format("mursst")
plot_time_by_format("gpm_imerg")
Summary figures for exploring web-optimization
plot_duration_by_weboptimization()
plot_memory_by_weboptimization()