Label Maker Documentation¶
Label Maker generates training data for ML algorithms focused on overhead imagery (e.g., from satellites or drones). It downloads OpenStreetMap QA Tile information and overhead imagery tiles and saves them as a NumPy .npz file for easy use in ML pipelines. For more details, see the inaugural blog post.
version: 0.9.0
Requirements¶
Standard pip install¶
pip install label-maker
Note
Label Maker requires tippecanoe to be available from your command line. Confirm this before proceeding.
Configuration¶
Before you can use Label Maker, you must specify inputs to the data-creation process in a config.json file. Below is a simple example. For the complete list of parameters and options for imagery access, see the parameters page.
{
  "country": "togo",
  "bounding_box": [1.09725, 6.05520, 1.34582, 6.30915],
  "zoom": 12,
  "classes": [
    { "name": "Roads", "filter": ["has", "highway"] },
    { "name": "Buildings", "filter": ["has", "building"] }
  ],
  "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=ACCESS_TOKEN",
  "background_ratio": 1,
  "ml_type": "classification"
}
Before using this configuration, make sure to replace ACCESS_TOKEN with your Mapbox Access Token.
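A quick sanity check before running any commands can catch a malformed file early. The sketch below is not part of Label Maker: check_config and load_and_check are hypothetical helpers, the key list is taken only from the example above, and the [west, south, east, north] ordering of bounding_box is an assumption based on the example values.

```python
import json

# Keys used in the example config above; the parameters page lists the full set.
EXAMPLE_KEYS = ["country", "bounding_box", "zoom", "classes", "imagery",
                "background_ratio", "ml_type"]


def check_config(config):
    """Return a list of problems found in a config dict (hypothetical helper)."""
    problems = [f"missing key: {key}" for key in EXAMPLE_KEYS if key not in config]

    bbox = config.get("bounding_box", [])
    if len(bbox) != 4:
        problems.append("bounding_box should have four numbers")
    else:
        west, south, east, north = bbox  # assumed ordering
        if not (west < east and south < north):
            problems.append("bounding_box should be [west, south, east, north]")

    # The example URL ships with a literal placeholder that must be replaced.
    if "ACCESS_TOKEN" in config.get("imagery", ""):
        problems.append("replace ACCESS_TOKEN in the imagery URL")

    return problems


def load_and_check(path="config.json"):
    """Read a JSON config file and return (config, problems)."""
    with open(path) as f:
        config = json.load(f)
    return config, check_config(config)
```

An empty list from check_config means none of these basic checks failed; it does not guarantee that Label Maker will accept the file.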
Command line interface (CLI)¶
Label Maker is most easily used as a command line tool. The five commands are documented below. Run them in order, because each operation builds on the previous one. All commands accept two flags:
- -d or --dest: string. Directory for storing output files. Defaults to './data'
- -c or --config: string. Location of the config.json file. Defaults to './config.json'
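Because the steps must run in order with the same two flags, it can be convenient to drive them from a script. The following is a minimal sketch, not an official wrapper: build_commands and run_all are hypothetical names, and it assumes the shared flags are placed after the subcommand (as in `label-maker preview -n 10`).

```python
import subprocess

# The five subcommands named in this document, in the order they should run.
STEPS = ["download", "labels", "preview", "images", "package"]


def build_commands(dest="./data", config="./config.json", skip_preview=False):
    """Build one argument list per step, passing the two shared flags."""
    steps = [s for s in STEPS if not (skip_preview and s == "preview")]
    return [["label-maker", step, "--dest", dest, "--config", config]
            for step in steps]


def run_all(commands):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for cmd in build_commands():
        print(" ".join(cmd))
```

Stopping at the first failure matters here, since each later step depends on files written by an earlier one.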
CLI Step 1: download¶
Download and unzip OSM QA tiles containing feature information.
$ label-maker download
Saving QA tiles to data/ghana.mbtiles
100% 18.6 MiB 1.8 MiB/s 0:00:00 ETA
CLI Step 2: labels¶
Retiles the OSM data to the desired zoom level, creates label data (labels.npz), calculates class statistics, and creates visual label files (either GeoJSON or PNG files, depending upon ml_type). Requires the mbtiles file from the label-maker download step.
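To get a feel for how many tiles a bounding box produces at a given zoom, the standard Web Mercator (XYZ) tile formula can be applied to its corners. This is a sketch of that general formula, not Label Maker's own retiling code; the function names are hypothetical and the [west, south, east, north] ordering is assumed from the example config.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Standard Web Mercator (XYZ) tile index containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the outer edges of the map stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(bbox, zoom):
    """List (x, y, z) tiles covering bbox = [west, south, east, north]."""
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left tile
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [(x, y, zoom)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


if __name__ == "__main__":
    tiles = tiles_for_bbox([1.09725, 6.05520, 1.34582, 6.30915], 12)
    print(len(tiles), "tiles at zoom 12")
```

Each extra zoom level roughly quadruples the tile count for the same bounding box, which is worth keeping in mind when choosing zoom.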
Accepts one additional flag:
- -s or --sparse: boolean. Specifies if features in the class of interest are sparse. If True, only save labels for up to n background tiles, where n is equal to background_ratio times the number of tiles with a class label. Defaults to False.
$ label-maker labels
Determining labels for each tile
---
Residential: 638 tiles
Total tiles: 1189
Write out labels to data/labels.npz
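The effect of the sparse option can be illustrated with a little arithmetic. The sketch below is an assumption-laden illustration rather than Label Maker's implementation: select_sparse_tiles is a hypothetical helper, and truncating n to an integer, capping it at the number of available background tiles, and sampling at random are all assumptions.

```python
import random


def select_sparse_tiles(class_tiles, background_tiles, background_ratio, seed=None):
    """Keep all class tiles plus at most n randomly chosen background tiles."""
    # n = background_ratio * number of tiles with a class label (truncated to int)
    n = int(background_ratio * len(class_tiles))
    n = min(n, len(background_tiles))  # cannot keep more tiles than exist
    rng = random.Random(seed)
    kept_background = rng.sample(list(background_tiles), n)
    return list(class_tiles) + kept_background
```

With the sample output above, 638 of the 1189 tiles carry a class label, leaving 551 background tiles. A background_ratio of 1 gives n = 638, so all 551 would be kept; a background_ratio of 0.5 gives n = 319.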
CLI Step 3: preview (optional)¶
Downloads example overhead images for each class. Requires the labels.npz file from the label-maker labels step.
Accepts one additional flag:
- -n or --number: int. Specifies the number of example images to create per class. Defaults to 5.
$ label-maker preview -n 10
Writing example images to data/examples
Downloading 10 tiles for class Residential
CLI Step 4: images¶
Downloads all imagery tiles needed to create the training data. Requires the labels.npz file from the label-maker labels step.
The number of background tiles added depends on the background_ratio parameter specified in the config.json file.
A background_ratio of 0 will return no background tiles.
$ label-maker images
Downloading 1189 tiles to data/tiles
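The imagery value in config.json is a URL template with {z}, {x}, and {y} placeholders, one URL per tile. The sketch below shows how such a template can be filled in; tile_url is a hypothetical helper, the tile indices and token are stand-in values, and this is not necessarily how Label Maker builds its requests internally.

```python
def tile_url(template, x, y, z, access_token=None):
    """Fill an imagery URL template for one tile (hypothetical helper)."""
    if access_token is not None:
        # Swap the literal placeholder used in the example config.
        template = template.replace("ACCESS_TOKEN", access_token)
    return template.format(x=x, y=y, z=z)


if __name__ == "__main__":
    template = ("http://a.tiles.mapbox.com/v4/mapbox.satellite/"
                "{z}/{x}/{y}.jpg?access_token=ACCESS_TOKEN")
    # "my-token" is a stand-in, not a real token.
    print(tile_url(template, x=2060, y=1976, z=12, access_token="my-token"))
```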
CLI Step 5: package¶
Bundles the images and OSM labels to create a final data.npz file. Requires the labels.npz file from the label-maker labels step and the downloaded image tiles from the label-maker images step.
$ label-maker package
Saving packaged file to data/data.npz
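Before training on the packaged file, it can help to confirm that it contains the arrays you expect and to look at their shapes. This is a minimal sketch using plain NumPy; describe_npz is a hypothetical helper, and the expected array names are the ones used in the loading example in this document.

```python
import numpy as np

# Array names used in this document's loading example.
EXPECTED = ["x_train", "y_train", "x_test", "y_test"]


def describe_npz(path, expected=EXPECTED):
    """Return {array name: shape} and raise if an expected array is missing."""
    with np.load(path) as npz:
        missing = [name for name in expected if name not in npz.files]
        if missing:
            raise KeyError(f"missing arrays: {missing}")
        return {name: npz[name].shape for name in npz.files}
```

For example, `describe_npz('data/data.npz')` would report the shape of each array, which makes mismatched image and label counts easy to spot.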
Using the packaged data¶
Once you have created a data.npz file using the above commands, you can use numpy.load to load it. For example, you can supply the created data to a Keras Model as follows:
import numpy as np
from keras.models import Sequential

# Load the data, shuffled and split between train and test sets
# (adjust the path if the file lives elsewhere, e.g. data/data.npz)
npz = np.load('data.npz')
x_train = npz['x_train']
y_train = npz['y_train']
x_test = npz['x_test']
y_test = npz['y_test']

# Define your model here, example usage in Keras
model = Sequential()
# ...
model.compile(...)

# Train on the training split, then evaluate on the held-out test split
model.fit(x_train, y_train, batch_size=16, epochs=50)
model.evaluate(x_test, y_test, batch_size=16)
For more detailed walkthroughs, see the examples page.
Acknowledgements¶
This library builds on the concepts of skynet-data. It wouldn’t be possible without the excellent data from OpenStreetMap and Mapbox under the following licenses:
- OSM QA tile data copyright OpenStreetMap contributors and licensed under ODbL.
- Mapbox Satellite data can be traced for noncommercial purposes.
It also relies on Marc Farra’s tilepie to asynchronously process vector tiles.