Two weeks ago we launched a new version of Landsat-util, v0.5, our utility for searching and processing Landsat satellite imagery. This version is now faster than it was before. To do this we rewrote most of the internals to use flexible python frameworks.
Below is a deep dive into how we’ve improved Landsat-util to make it faster and easier to use.
Landsat-util downloads Landsat files, pulls out the individual bands representing wavelengths of light, corrects contrast, warps them and combines them to make a colored image you can add on a web map.
Initially, landsat-util was written as a command line wrapper to existing pipelines:
- Scale bands with
- Warp with
gdal-warpto the correct projection
- Combine bands with
- Contrast correct with
- Pansharpen with
- Gamma correct with
- Add geographic information back with
We needed a lot of dependencies to process an image, and they’re not particularly optimized for scripting. ImageMagick, GDAL, orfeoToolbox and openCV are monolithic frameworks that don’t allow for cherry-picking functions. Installing all these frameworks can be a painful experience.
This toolchain combination is also tedious because it creates inherent bottlenecks. To communicate between all tools, we need to write temporary files to disk and read them back in at every step.
By standardizing the input, rasterio allows us to minimize our dependencies and use fast, in-memory, scientific libraries like scikit-image.
Here’s the guts of our new process:
Read in bands with rasterio
with rasterio.drivers(): with rasterio.open('LC82040522013123LGN01_B4.TIF') as band4: with rasterio.open('LC82040522013123LGN01_B3.TIF') as band3: with rasterio.open('LC82040522013123LGN01_B2.TIF') as band2: with rasterio.open('LC82040522013123LGN01_B8.TIF') as band8: band4_s = band4.read_band(1) band3_s = band3.read_band(1) band2_s = band2.read_band(1) band8_s = band8.read_band(1)
Scale bands with
from skimage import transform as sktransform band4_s = sktransform.rescale(band4_s, 2) band3_s = sktransform.rescale(band3_s, 2) band2_s = sktransform.rescale(band2_s, 2)
for color, band in zip([r,g,b,b8], [band4_s, band3_s, band2_s, band8_s]): reproject(band, color, src_transform = src.transform, src_crs = src.crs, dst_transform = dst_transform, dst_crs = dst_crs, resampling = RESAMPLING.nearest)
m = r + b + g pan = 1/m * b8 r = r * pan b = b * pan g = g * pan
Contrast-correct and gamma correct using
# using CLAHE from skimage.exposure import equalize_adapthist for band in [r,g,b]: band = equalize_adapthist(band, clip_limit=0.02)
Write to disk using
with rasterio.open( tiffname,'w', driver='GTiff', width=dst_shape,height=dst_shape, count=3,dtype=numpy.uint8, nodata=0, transform=dst_transform, photometric='RGB', crs=dst_crs) as dst: for k, arr in [(1, r), (2, g), (3, b)]: dst.write_band(k, arr)
The toolchain consists of only python libraries, and no other dependencies. Rasterio inherently supports GeoTiff so we don’t lose geo-information along the way.
Landsat-util is open source, and we encourage developers to improve on our process. Fork our repo!
scikit, we eliminate the disk bottleneck, and we regain transparent control over the pipeline.
We ran tests on a couple of AWS machines. Each test was conducted 5 times and the resulting times were averaged.
- R3.large: 2-core 2.5 GHz Intel Xeon (Ivy Bridge) with 15GB RAM and 32GB SSD.
- C4.2xlarge: 8-core 2.9GHz Intel Xeon (Haswell) with 15GB RAM and 32GB SSD.
|instance type||process type||old-landsat-util||new-landsat-util||speedup|
On all but the largest machine, the new landsat-util is at least twice as fast as previously.
And it gets better: a significant amount of time (about a minute on average) is used up to decompress the NASA bundle after downloading it. Landsat-util v0.5 takes advantage of AWS’s new landsat archive of unzipped bands, saving even more time.
We hope you enjoy the new landsat-util! Fork it, modify it, break it or just use it and tell us about all the ways you’re incorporating satellite data in your apps.