Two weeks ago we launched a new version of Landsat-util, v0.5, our utility for searching and processing Landsat satellite imagery. This version is now faster than it was before. To do this we rewrote most of the internals to use flexible python frameworks.

Below is a deep dive into how we’ve improved Landsat-util to make it faster and easier to use.

Dependency Hell

Landsat-util downloads Landsat files, pulls out the individual bands representing wavelengths of light, corrects contrast, warps them and combines them to make a colored image you can add on a web map.

Initially, landsat-util was written as a command line wrapper to existing pipelines:

  1. Scale bands with gdal-translate
  2. Warp with gdal-warp to the correct projection
  3. Combine bands with ImageMagick
  4. Contrast correct with OpenCV
  5. Pansharpen with orfeoToolbox
  6. Gamma correct with ImageMagick
  7. Add geographic information back with gdal_edit

We needed a lot of dependencies to process an image, and they’re not particularly optimized for scripting. ImageMagick, GDAL, orfeoToolbox and openCV are monolithic frameworks that don’t allow for cherry-picking functions. Installing all these frameworks can be a painful experience.

This toolchain combination is also tedious because it creates inherent bottlenecks. To communicate between all tools, we need to write temporary files to disk and read them back in at every step.

Enter rasterio

Rasterio is a great python library written by Sean Gillies at Mapbox to work with raster data. It wraps around gdal and abstracts the band data as Numpy arrays.

By standardizing the input, rasterio allows us to minimize our dependencies and use fast, in-memory, scientific libraries like scikit-image.

Here’s the guts of our new process:

Read in bands with rasterio

with rasterio.drivers():
    with rasterio.open('LC82040522013123LGN01_B4.TIF') as band4:
        with rasterio.open('LC82040522013123LGN01_B3.TIF') as band3:
            with rasterio.open('LC82040522013123LGN01_B2.TIF') as band2:
                with rasterio.open('LC82040522013123LGN01_B8.TIF') as band8:
                    band4_s = band4.read_band(1)
                    band3_s = band3.read_band(1)
                    band2_s = band2.read_band(1)
                    band8_s = band8.read_band(1)

Scale bands with scikit

from skimage import transform as sktransform

band4_s = sktransform.rescale(band4_s, 2)
band3_s = sktransform.rescale(band3_s, 2)
band2_s = sktransform.rescale(band2_s, 2)

Warp with rasterio

for color, band in zip([r,g,b,b8], [band4_s, band3_s, band2_s, band8_s]):
    reproject(band, color,
        src_transform = src.transform,
        src_crs = src.crs,
        dst_transform = dst_transform,
        dst_crs = dst_crs,
        resampling = RESAMPLING.nearest)

Pansharpen using numpy operations

m = r + b + g
pan = 1/m * b8
r = r * pan
b = b * pan
g = g * pan

Contrast-correct and gamma correct using scikit-image

# using CLAHE
from skimage.exposure import equalize_adapthist
for band in [r,g,b]:
    band = equalize_adapthist(band, clip_limit=0.02)

Write to disk using rasterio

with rasterio.open(
    tiffname,'w', driver='GTiff',
    width=dst_shape[1],height=dst_shape[0],
    count=3,dtype=numpy.uint8,
    nodata=0,
    transform=dst_transform,
    photometric='RGB',
    crs=dst_crs) as dst:
    for k, arr in [(1, r), (2, g), (3, b)]:
        dst.write_band(k, arr)

The toolchain consists of only python libraries, and no other dependencies. Rasterio inherently supports GeoTiff so we don’t lose geo-information along the way.

Landsat-util is open source, and we encourage developers to improve on our process. Fork our repo!

How fast?

By using rasterio, numpy and scikit, we eliminate the disk bottleneck, and we regain transparent control over the pipeline.

We ran tests on a couple of AWS machines. Each test was conducted 5 times and the resulting times were averaged.

  • R3.large: 2-core 2.5 GHz Intel Xeon (Ivy Bridge) with 15GB RAM and 32GB SSD.
  • C4.2xlarge: 8-core 2.9GHz Intel Xeon (Haswell) with 15GB RAM and 32GB SSD.

Results

instance type process type old-landsat-util new-landsat-util speedup
R3.large non-pansharpened 252.7s 122.05s 2x
R3.large pansharpened 846.9s 349.17s 2.4x
C4.2xlarge non-pansharpened 216.6s 106.67s 2x
C4.2xlarge pansharpened 438s 290s 1.5x

On all but the largest machine, the new landsat-util is at least twice as fast as previously.

And it gets better: a significant amount of time (about a minute on average) is used up to decompress the NASA bundle after downloading it. Landsat-util v0.5 takes advantage of AWS’s new landsat archive of unzipped bands, saving even more time.

We hope you enjoy the new landsat-util! Fork it, modify it, break it or just use it and tell us about all the ways you’re incorporating satellite data in your apps.

We're a team of engineers and designers working on big projects. We believe in open source and in building for lasting impact.

Learn more