OSM for Machine Learning, Research and Teaching


Client
With DevSeed Labs

Deploy OpenStreetMap tools in the cloud


Image by Development Seed

Overview

OSM Seed is the fastest way to deploy OpenStreetMap tools for your project using Docker and Kubernetes, locally or in any cloud environment. OSM Seed is great for teaching mapping, building machine learning training datasets, and analyzing geospatial data.

Challenge

While OSM tools are versatile and tested for mapping at scale, setting them up is hard because each tool is configured separately and packaged differently.

Outcome

OSM Seed makes this easier by packaging the key tools in the OSM ecosystem and allowing you to deploy an OSM system with a few Docker commands.

OpenStreetMap (OSM), the open map repository of the world, runs on a versatile ecosystem of software that empowers anyone to create geospatial data. That includes our Data Team, who use OSM software whether they are creating data destined for OSM, like mapping a new village, or creating training data for a machine learning task.

The OSM ecosystem has incredible tools and a robust workflow to create and consume geospatial data. Tools like iD, JOSM, Overpass, and Tasking Manager are critical for modern mapping workflows. This is compelling for many uses beyond creating data for OSM: teaching new communities to map and the basics of geospatial data, preparing GeoAI machine learning datasets, validating model predictions, and running large-scale research projects focused on map data.

OSM Seed is a way to deploy the entire OSM tools ecosystem from scratch to support any of the above uses. At its core, OSM Seed is a simple collection of Dockerfiles along with orchestration code for deploying the stack on Kubernetes. Kubernetes makes it easy to disable services that a given user does not need. To see the list of currently available OSM tools and get started, visit osm-seed on GitHub.
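On Kubernetes, turning services on or off is typically done through a Helm values file. The snippet below is a minimal sketch that builds such an override in Python. The component names in ALL_COMPONENTS and the "<name>.enabled" key layout are hypothetical placeholders, not the chart's confirmed schema, so check values.yaml in the osm-seed repository before relying on them.

```python
import json

# Hypothetical component names for illustration only; the real keys live in
# the chart's values.yaml in the osm-seed repository.
ALL_COMPONENTS = ("web", "db", "idEditor", "tmApi", "overpassApi", "taginfo", "tilerServer")


def build_values(enabled: set[str]) -> dict:
    """Return a values override that switches each component on or off."""
    unknown = enabled - set(ALL_COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    # Every known component gets an explicit enabled flag.
    return {name: {"enabled": name in enabled} for name in ALL_COMPONENTS}


if __name__ == "__main__":
    # A minimal editing stack: API, database and iD only.
    values = build_values({"web", "db", "idEditor"})
    print(json.dumps(values, indent=2))
```

Because JSON is valid YAML, the printed output could be saved to a file and passed to helm install with -f.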

OSM Seed for Mapping

You may have a mapping project where using the OSM ecosystem would be optimal, but the data you are creating, such as temporal datasets, doesn't necessarily belong in OpenStreetMap. For tasks that involve time-specific or private datasets, OSM Seed is a great approach.

Example: Tracking the Evolution of Special Economic Zones

We used OSM Seed in partnership with the World Bank Group to monitor the growth and evolution of Special Economic Zones.

Our Data Team started with a new OSM Seed instance deployed to a private server on AWS. Working with a blank slate allowed the team to focus on a single zone at a time. Using historic high-resolution satellite imagery, the team was able to accurately track the growth and urbanization of special economic zones, culminating in a report published in conjunction with the World Bank.


The Data Team mapped the Athi River export-processing zone using osm-seed and historical imagery.

OSM Seed for Research

Example: OpenHistoricalMap

OpenHistoricalMap, a freely editable map of the world across time, uses osm-seed to manage its infrastructure. With some customization to handle start and end dates, we can repurpose the entire suite of battle-tested OSM tools to build a historical map: from using Tasking Manager to coordinate mapathons to using Overpass and Nominatim to query data. One could install each of these tools individually, but we've found osm-seed to be a convenient wrapper for managing these setups as a single unit, in a consistent way. The code to deploy OpenHistoricalMap is on GitHub, and it also serves as a real-world example of using osm-seed while customizing some of the applications.
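OpenHistoricalMap records a feature's lifespan in start_date and end_date tags. Below is a minimal sketch of the kind of date filter this enables. It assumes dates written as YYYY, YYYY-MM or YYYY-MM-DD, pads partial dates to the first day, and treats end_date as exclusive. These rules are illustrative assumptions. OpenHistoricalMap's own tooling handles more formats (such as BCE years) and may apply different rules.

```python
from datetime import date


def parse_partial(value: str) -> date:
    """Parse YYYY, YYYY-MM or YYYY-MM-DD, padding missing parts with 1.

    Assumption: only positive (CE) years are handled here.
    """
    parts = [int(p) for p in value.split("-")]
    while len(parts) < 3:
        parts.append(1)
    return date(*parts)


def active_on(tags: dict[str, str], when: date) -> bool:
    """True if a feature with these tags existed on the given date."""
    start = tags.get("start_date")
    end = tags.get("end_date")
    if start is not None and parse_partial(start) > when:
        return False  # not built yet
    if end is not None and parse_partial(end) <= when:
        return False  # already gone (end_date treated as exclusive)
    return True


if __name__ == "__main__":
    features = [
        {"name": "Old station", "start_date": "1892", "end_date": "1948"},
        {"name": "New terminal", "start_date": "1965-03"},
    ]
    snapshot = [f["name"] for f in features if active_on(f, date(1930, 1, 1))]
    print(snapshot)  # ['Old station']
```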

Example: Digitizing Historic Maps

Palestine Open Maps uses osm-seed to digitize Palestine's map archive and build an immersive storytelling experience. The platform aims to bring together maps that the Survey of Palestine institution has been producing since 1945.

Osm-seed allows the Palestine Open Maps team to use georeferenced historic maps as a layer in the iD editor. With this useful context, mappers can identify features and create map data.
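iD can display a custom background from a tile URL template, which is one way to add a georeferenced historic map as a layer. The sketch below shows how such a template is filled in for one tile. It assumes the {zoom}, {z}, {x}, {y} and {-y} tokens, where {-y} is the flipped, TMS-style row. The tiles.example.org URL is hypothetical, and iD's own implementation supports more tokens than this.

```python
def expand_template(template: str, z: int, x: int, y: int) -> str:
    """Fill a z/x/y tile URL template for a single tile."""
    n = 2 ** z
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("tile coordinates out of range for this zoom")
    flipped_y = n - 1 - y  # TMS counts rows from the bottom instead of the top
    return (
        template.replace("{zoom}", str(z))
        .replace("{z}", str(z))
        .replace("{x}", str(x))
        .replace("{-y}", str(flipped_y))
        .replace("{y}", str(y))
    )


if __name__ == "__main__":
    # Hypothetical tile service hosting a georeferenced historic map.
    TEMPLATE = "https://tiles.example.org/historic/{zoom}/{x}/{-y}.png"
    print(expand_template(TEMPLATE, 2, 1, 0))  # https://tiles.example.org/historic/2/1/3.png
```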

While this historical data does not belong in OSM, which maps the world as it is today, it is immensely useful to researchers and archivists, and Palestine Open Maps includes it in a searchable database.


Palestine Open Maps

Serving your own OSM map tiles

Osm-seed is more than a blank slate. You can run a version pre-populated with any amount of OpenStreetMap-flavored geospatial data, whether that's a city-level or planet-level extract.

The project ships with osmium, a command-line tool for working with OSM data files, which you can use to import data published on planet.openstreetmap.org or Geofabrik.
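For example, a city-sized extract could be cut from a Geofabrik country file before importing. This sketch only assembles an osmium extract command with a bounding box. The file names and coordinates are illustrative assumptions, and osm-seed's own import scripts may work differently.

```python
import shlex


def osmium_bbox_extract(src: str, dst: str, bbox: tuple[float, float, float, float]) -> list[str]:
    """Build an `osmium extract` command that keeps only data inside bbox.

    bbox is (left, bottom, right, top) in WGS84 degrees.
    """
    left, bottom, right, top = bbox
    if not (-180 <= left < right <= 180 and -90 <= bottom < top <= 90):
        raise ValueError("bbox must be (left, bottom, right, top) in degrees")
    return [
        "osmium", "extract",
        "--bbox", f"{left},{bottom},{right},{top}",
        "-o", dst,
        "--overwrite",  # replace dst if it already exists
        src,
    ]


if __name__ == "__main__":
    # Illustrative file names and a rough box around Lima, Peru.
    cmd = osmium_bbox_extract("peru-latest.osm.pbf", "lima.osm.pbf", (-77.2, -12.3, -76.8, -11.8))
    print(shlex.join(cmd))
    # To actually run it (requires osmium-tool on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```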

Once you've got your data, osm-seed is ready by default to serve vector tiles. Osm-seed uses Tegola, a vector tile server that generates Mapbox Vector Tiles from PostGIS. Tegola reads directly from a replica of the database inside osm-seed, can be updated every minute, and supports custom styles.
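Clients request vector tiles by zoom, column and row. This sketch converts a longitude/latitude to standard Web Mercator tile indices and builds a tile URL. It assumes Tegola's /maps/{map_name}/{z}/{x}/{y} endpoint with a .pbf extension, a local server on port 8080, and a map named osm. The actual host, port and map name depend on your osm-seed configuration.

```python
import math

# Assumed values; adjust to match your Tegola configuration.
BASE_URL = "http://localhost:8080"
MAP_NAME = "osm"


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Standard Web Mercator ("slippy map") tile indices for a point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the grid edges (e.g. lon = 180) stay inside it.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tegola_tile_url(base: str, map_name: str, lon: float, lat: float, zoom: int) -> str:
    """URL of the vector tile containing (lon, lat) at the given zoom."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return f"{base}/maps/{map_name}/{zoom}/{x}/{y}.pbf"


if __name__ == "__main__":
    # The tile covering central Lima, Peru, at zoom 10.
    print(tegola_tile_url(BASE_URL, MAP_NAME, -77.04, -12.05, 10))
```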


Extract of Peru served using Tegola.

OSM Seed for Teaching or Training

Osm-seed is a great way to teach OpenStreetMap or to train a new team of mappers. OpenStreetMap software is built to be intuitive; that said, the tools have a learning curve. To map complex features, it is important to have a good understanding of advanced tools like JOSM. Osm-seed is the quickest way to deploy a sandbox, so workshop attendees can train and learn advanced mapping skills without worrying about making mistakes in the core OSM dataset. After setting a few environment configurations, docker-compose build && docker-compose up is all you need to run an ecosystem of OpenStreetMap software.
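A quick pre-flight check can catch missing settings before the containers start, which saves time in a workshop. The sketch below parses simple KEY=VALUE env files and reports empty or absent keys. The variable names in REQUIRED are hypothetical placeholders. Substitute the ones from the example env files in the osm-seed repository.

```python
# Hypothetical variable names for illustration only.
REQUIRED = ("POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD", "SERVER_URL")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict[str, str], required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Keys that are absent or set to an empty value."""
    return [k for k in required if not env.get(k)]


if __name__ == "__main__":
    sample = """
# database settings
POSTGRES_DB=openstreetmap
POSTGRES_USER=postgres
POSTGRES_PASSWORD=
"""
    print(missing_keys(parse_env(sample)))  # ['POSTGRES_PASSWORD', 'SERVER_URL']
```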

Architecture

At the core, osm-seed is a collection of Dockerfiles, each containerizing a part of the OSM stack. Dockerizing the infrastructure means that we can deploy the stack at scale on any cloud provider that supports Kubernetes. We provide a Helm chart to easily deploy osm-seed onto your Kubernetes cluster, with configuration options to enable only the containers you need. This architecture not only allows you to modify the stack according to your use case but also lets you stand up an empty stack or import data from a planet extract. As of today, osm-seed provides:

  • OpenStreetMap Rails Port - the core of OpenStreetMap.
  • A PostgreSQL database backing the API.
  • iD editor - when you spin up osm-seed, it already sets up iD pointed to the API.
  • Import planet extract - import the whole planet (takes time) or an extract as a starting point.
  • Replication - publish minutely, hourly, or daily replication files.
  • Vector Tile server - a Tegola-based vector tile server that updates based on replication.
  • Backup - automated, regular backup of the database.
  • Tasking Manager - a tool for coordinating volunteers and organizations on mapping projects.
  • Overpass - a read-only API that serves custom-selected parts of the osm-seed map data.
  • TagInfo - a tool that collects statistics about the key=value tags in the data, based on planet replication.
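Replication feeds on planet.openstreetmap.org store each diff under a path derived from its sequence number: the number is zero-padded to nine digits and split into three directory levels, with a matching .state.txt file alongside it. Below is a minimal sketch of that mapping. It assumes osm-seed's replication output follows the same layout.

```python
def replication_path(sequence: int, base: str = "minute") -> tuple[str, str]:
    """Map a replication sequence number to its diff and state file paths.

    base is the feed directory, e.g. "minute", "hour" or "day".
    """
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence must fit in nine digits")
    digits = f"{sequence:09d}"
    stem = f"{base}/{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"
    return f"{stem}.osc.gz", f"{stem}.state.txt"


if __name__ == "__main__":
    diff, state = replication_path(4230996)
    print(diff)   # minute/004/230/996.osc.gz
    print(state)  # minute/004/230/996.state.txt
```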
