Geospatial reprojection in Python

Work-in-progress guidebook and profiling results (Sept. 2024).

Authors + Credits: Max Jones, Optimized Data Delivery team (especially Aimee), Pangeo Community (especially Justus and Michael)

A bit of background about me and this work

Caveats

  • Work in progress!
  • Recording will quickly become out-of-date
  • Verify/fix code before use

Definitions

Reprojection - changing the projection of a dataset from one coordinate reference system (CRS) to another
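
As a minimal sketch of what reprojection does to a single coordinate, the snippet below converts longitude/latitude in degrees (EPSG:4326) to spherical Web Mercator metres (EPSG:3857) with the closed-form formula. The function name `lonlat_to_web_mercator` is made up for illustration; for arbitrary CRS pairs a library such as pyproj would normally do this work.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Reproject one lon/lat point (degrees) to Web Mercator x/y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

x, y = lonlat_to_web_mercator(-122.0, 45.0)
print(f"x={x:.1f} m, y={y:.1f} m")
```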

Resampling/regridding - changing the grid structure (often resolution)
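
One simple case of resampling, sketched here under the assumption of a regular grid stored as a list of lists, is coarsening: each output cell is the mean of a non-overlapping block of input cells. The helper `coarsen_mean` is a hypothetical name used only for this illustration.

```python
def coarsen_mean(grid, factor):
    """Downsample a 2D list by averaging non-overlapping factor x factor blocks."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid shape must be divisible by factor")
    out = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            # Collect every value in the block whose upper-left corner is (r, c).
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = coarsen_mean(fine, 2)
print(coarse)  # [[3.5, 5.5], [11.5, 13.5]]
```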

Warp resampling - changing the resolution and projection of a dataset
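
The two steps can be combined in a small sketch. It assumes a source raster on a regular Web Mercator grid and a target grid in longitude/latitude, and it loops over target cells, reprojects each one into the source CRS, and takes the nearest source cell. The names `to_web_mercator` and `warp_nearest` and all grid values are made up for illustration; this is not how any particular library implements warping.

```python
import math

R = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)

def to_web_mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees to Web Mercator x/y in metres."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def warp_nearest(src, x0, y0, res, dst_lons, dst_lats):
    """Fill a lon/lat target grid from a Web Mercator source grid.

    src is a row-major 2D list; (x0, y0) is the centre of its upper-left cell
    and res is the cell size in metres. Cells with no source data become None.
    """
    out = []
    for lat in dst_lats:
        row_out = []
        for lon in dst_lons:
            x, y = to_web_mercator(lon, lat)   # reprojection step
            col = round((x - x0) / res)        # resampling step: nearest source cell
            row = round((y0 - y) / res)
            inside = 0 <= row < len(src) and 0 <= col < len(src[0])
            row_out.append(src[row][col] if inside else None)
        out.append(row_out)
    return out

src = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
warped = warp_nearest(src, -1.0e6, 1.0e6, 1.0e6,
                      dst_lons=[-9.0, 0.0, 9.0, 30.0],
                      dst_lats=[9.0, 0.0, -9.0])
print(warped)
```

Iterating over the target grid rather than the source grid means every output cell gets exactly one value, and target cells that fall outside the source extent are easy to flag.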

Grid structures

  • Rectilinear - described by one-dimensional latitude and longitude coordinates
    • Regular - a rectilinear grid with constant spacing, described by a single origin (x, y) coordinate and the resolution
  • Curvilinear - described by two-dimensional latitude and longitude coordinates
  • Unstructured - described by an explicit list of nodes rather than by coordinate axes
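
The grid structures above differ mainly in how much coordinate information has to be stored. A rough sketch using plain Python lists follows; all values, the helper `regular_axis`, and the triangle connectivity are invented for illustration.

```python
def regular_axis(origin, resolution, size):
    """Coordinates of a regular axis, fully determined by origin and resolution."""
    return [origin + i * resolution for i in range(size)]

# Regular: one origin coordinate plus a resolution per axis.
lon = regular_axis(-180.0, 90.0, 4)   # [-180.0, -90.0, 0.0, 90.0]
lat = regular_axis(90.0, -60.0, 3)    # [90.0, 30.0, -30.0]

# Rectilinear: still one 1D coordinate per axis, but the spacing may vary.
rect_lat = [-60.0, -20.0, 0.0, 20.0, 60.0]

# Curvilinear: 2D coordinates, one (lon, lat) pair per cell.
# Each row is shifted east by 5 degrees, an arbitrary made-up distortion.
lon2d = [[x + 5.0 * j for x in lon] for j in range(len(lat))]
lat2d = [[y for _ in lon] for y in lat]

# Unstructured: an explicit list of nodes, here with triangles indexing into it.
nodes = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
triangles = [(0, 1, 2), (0, 2, 3)]

print(len(lon), len(lat), len(lon2d), len(lon2d[0]), len(nodes))
```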

Resampling algorithms

  • Nearest neighbor
  • Bilinear
  • Cubic
  • Spline
  • Inverse distance
  • Bucket / binning (average, min, max, mode, median, quartile, sum, rms)
  • Spectral
  • Triangulation
  • Conservative
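
To make the difference between the first two algorithms concrete, here is a minimal sketch that samples a small grid at a fractional (row, col) position. It assumes the position lies within the grid and that the grid is at least 2 x 2; the function names are hypothetical.

```python
import math

def sample_nearest(grid, row, col):
    """Value of the cell whose index is closest to the fractional (row, col)."""
    r = min(max(int(math.floor(row + 0.5)), 0), len(grid) - 1)
    c = min(max(int(math.floor(col + 0.5)), 0), len(grid[0]) - 1)
    return grid[r][c]

def sample_bilinear(grid, row, col):
    """Distance-weighted blend of the four cells surrounding (row, col)."""
    r0 = min(max(int(math.floor(row)), 0), len(grid) - 2)
    c0 = min(max(int(math.floor(col)), 0), len(grid[0]) - 2)
    fr, fc = row - r0, col - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c0 + 1] * fc
    bottom = grid[r0 + 1][c0] * (1 - fc) + grid[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bottom * fr

grid = [
    [0.0, 10.0],
    [20.0, 30.0],
]
print(sample_nearest(grid, 0.25, 0.75))   # 10.0
print(sample_bilinear(grid, 0.25, 0.75))  # 12.5
```

Nearest neighbor never invents new values, which suits categorical data such as land cover classes, while bilinear produces smoother output for continuous fields.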

Some of the many reasons to warp resample

Co-registering datasets

  • Mosaicing
  • Statistical analyses
  • Machine learning

Visualization

  • Rendering (minimize distortion)
  • Building overviews

Observations and opinions

  • Lots of kernels were killed in the making of this presentation
    • we need a demo using a bounded-memory approach (Cubed!)
  • There are some awesome data cube libraries in Python
    • let’s work with the developers to make them even better…and not build another one
  • Xarray’s data model is intuitive for a lot of people
    • use accessors to extend its functionality rather than creating a new data class

What’s next for the guide

  • Try caching weights
  • Small tile from a large dataset
  • Add information about grid structures supported
  • Add information about resampling methods supported
  • Test with virtualized data
  • Test with cloud optimized data
  • Test with other resampling algorithms
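
For the first item, the idea behind caching weights can be sketched in one dimension: the interpolation weights depend only on the source and target coordinates, so they can be computed once and reused for every field on the same pair of grids. The functions below are hypothetical and assume strictly increasing source coordinates with every target point inside their range.

```python
import bisect

def build_linear_weights(src_x, dst_x):
    """Precompute (left index, right index, right weight) for each target point."""
    weights = []
    for x in dst_x:
        hi = bisect.bisect_right(src_x, x)
        hi = min(max(hi, 1), len(src_x) - 1)  # keep both neighbours in range
        lo = hi - 1
        w = (x - src_x[lo]) / (src_x[hi] - src_x[lo])
        weights.append((lo, hi, w))
    return weights

def apply_weights(weights, values):
    """Regrid one field using weights that were computed earlier."""
    return [values[lo] * (1 - w) + values[hi] * w for lo, hi, w in weights]

src_x = [0.0, 1.0, 2.0, 3.0]
dst_x = [0.5, 1.5, 2.5]
weights = build_linear_weights(src_x, dst_x)  # the expensive step, done once

fields = ([0.0, 10.0, 20.0, 30.0], [5.0, 5.0, 7.0, 9.0])
results = [apply_weights(weights, f) for f in fields]  # cheap, repeated per field
print(results)  # [[5.0, 15.0, 25.0], [5.0, 6.0, 8.0]]
```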

Thanks

  • Development Seed
  • Pangeo Community
    • special thanks to Justus, Michael, and Deepak
  • NASA IMPACT

What’s next for resampling in Python

Let’s discuss!