Using infrastructure insights from OpenStreetMap for validating ML



Our Data Team regularly maps infrastructure at a country-wide scale, as in previous projects to build a comprehensive map of schools in Colombia or identify high-voltage towers across Nigeria, Pakistan, and Zambia. We’re advocates of an AI-assisted mapping approach that combines machine learning (ML) processes with our team of human mappers to quickly produce highly accurate data. However, working at the country level (with tens of millions of satellite images) means our ML models are bound to make some mistakes. To deal with these errors, we built osm-coverage-tiles, which derives insights about infrastructure from OpenStreetMap so that we don’t have to manually verify every ML prediction before it’s mapped. osm-coverage-tiles gathers aggregate information about infrastructure (specifically, buildings and roads) from OpenStreetMap and helps prioritize how and where to validate ML predictions. It prepares a tileset that can be visualized in a review application (like JOSM) or paired with a utility that filters out predicted tiles before manual review.

Many points of interest (POIs) are located in populated areas, which implies that there should be a road network and other infrastructure nearby. A school, for example, is likely to be accessible by at least one road and near a group of buildings. Therefore, we can assume that a school detection that is inaccessible by road and far away from other settlements is likely a false positive. In our case, each ML prediction indicated whether a satellite image contained some feature of interest, like a school. We built osm-coverage-tiles to add a rule-based contextual layer on top of those predictions. This layer acts as a filter that removes predicted POIs that are far from both roads and settlements. Using this tool, it’s possible to greatly reduce false positive predictions and save time during manual validation. This is crucial when working at country-wide scales, where checking ML predictions might require days or weeks of effort.
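To make the filtering rule concrete, here is a minimal sketch in Python. It is not the osm-coverage-tiles implementation. It assumes the coverage layer has already been loaded into a set of (x, y, zoom) tiles that contain roads or buildings, and that each prediction is a dictionary with hypothetical "lon" and "lat" keys. The function names are ours, chosen for illustration; the tile math is the standard Web Mercator (slippy map) formula.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y, zoom) slippy-map tile that contains a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return (x, y, zoom)


def filter_predictions(predictions, coverage_tiles, zoom=16):
    """Split predictions into (kept, dropped) using a coverage tile set.

    predictions: iterable of dicts with "lon" and "lat" keys (assumed layout).
    coverage_tiles: set of (x, y, zoom) tiles that contain roads or buildings.
    """
    kept, dropped = [], []
    for pred in predictions:
        tile = lonlat_to_tile(pred["lon"], pred["lat"], zoom)
        if tile in coverage_tiles:
            kept.append(pred)
        else:
            # No mapped roads or buildings in this tile: likely false positive.
            dropped.append(pred)
    return kept, dropped
```

Dropped predictions do not have to be discarded outright. They could instead be sent to the back of the review queue, which fits the goal of prioritizing where human validation happens first.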

Tiles containing roads and buildings in OpenStreetMap data (at zoom 16) in Colombia.

osm-coverage-tiles can generate a coverage layer at different zoom levels. This is an important feature because it allows a user to tune the filter to their needs. At lower zoom levels, each tile covers more ground (lower spatial resolution), so the filter is relatively loose: a road or building anywhere in a large tile is enough to keep a prediction. At higher zoom levels (higher spatial resolution), roads and buildings must be relatively close to a proposed POI. We found it useful to test several zooms, such as 14, 15, and 16, and quantitatively assess which level best filters out incorrect ML predictions.
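One way to run that comparison is sketched below. It rests on several assumptions that are ours rather than the tool's: a sample of predictions has already been labeled by human reviewers, both the predictions and the coverage layer are expressed as (x, y, zoom) tiles at one base zoom, and coarser coverage is derived by taking parent tiles instead of regenerating the layer at each zoom. The function names are hypothetical.

```python
def parent_tile(x, y, z, target_z):
    """Return the ancestor of tile (x, y, z) at the coarser zoom target_z."""
    if target_z > z:
        raise ValueError("target_z must not exceed the tile's own zoom")
    shift = z - target_z
    # Each step up the pyramid halves the tile coordinates.
    return (x >> shift, y >> shift, target_z)


def evaluate_zooms(labeled_tiles, coverage_tiles, zooms):
    """Count what the coverage filter would remove at each zoom level.

    labeled_tiles: list of ((x, y, z), is_correct) pairs at the base zoom,
        where is_correct is the human reviewer's verdict.
    coverage_tiles: set of (x, y, z) tiles with roads or buildings,
        at the same base zoom.
    zooms: zoom levels to test, none higher than the base zoom.
    """
    results = {}
    for target_z in zooms:
        # A coarse tile counts as covered if any of its children is covered.
        coverage = {parent_tile(*tile, target_z) for tile in coverage_tiles}
        fp_removed = 0
        tp_removed = 0
        for tile, is_correct in labeled_tiles:
            if parent_tile(*tile, target_z) not in coverage:
                if is_correct:
                    tp_removed += 1
                else:
                    fp_removed += 1
        results[target_z] = {
            "false_positives_removed": fp_removed,
            "true_positives_removed": tp_removed,
        }
    return results
```

A good zoom level removes many false positives while removing few or no true positives. How to weigh the two counts depends on how costly a missed POI is compared with extra review time.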

The tool is open source and you can find a use case here. We want to hear from you if this is useful. Connect with me on Twitter or GitHub! We hope to publish coverage tiles for different areas of the world regularly in future blog posts.
