When you’re keeping tabs on an event like the upcoming election in Afghanistan, a basic street map that plots news stories is quite useful. But what could you do with a map that plots those news stories over voting regions that are shaded by poverty rate, literacy rate, or another human development indicator? The effectiveness of a map increases drastically when you add specialized data to the base layer. In this case, not only would you see the hot spots of activity, you could identify possible explanations for the activity.

The maps we’re familiar with are powered by tile sets – collections containing hundreds of thousands of individually rendered images that stitch together to form a larger map view. Tile sets are useful because they allow users to pan and zoom around a map with a web browser, but creating and maintaining a tile set is challenging. Tile generation demands a considerable amount of computing power and can take days depending on the size of the region being rendered. Finished tile sets occupy many gigabytes of disk space, making storage and distribution difficult.
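
To get a feel for how quickly tile counts grow, here is a rough Python sketch. It assumes the common web-map pyramid in which zoom level 0 is a single tile covering the whole world and each further level splits every tile into four. That scheme, and the function names, are assumptions for illustration and not a description of our actual tile layout, which usually covers a single region and so needs far fewer tiles.

```python
def tiles_at_zoom(zoom: int) -> int:
    """Tiles needed to cover the whole world at one zoom level.

    Assumes a 2^zoom by 2^zoom grid, so each level has 4x the tiles
    of the level above it.
    """
    return 4 ** zoom


def tiles_through_zoom(max_zoom: int) -> int:
    """Total tiles for every zoom level from 0 up to and including max_zoom."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


if __name__ == "__main__":
    for z in (0, 5, 9, 12):
        print(z, tiles_at_zoom(z), tiles_through_zoom(z))
```

Under that assumption, a full-world tile set already passes the hundreds-of-thousands mark by zoom level 9, and every extra zoom level roughly quadruples the rendering and storage cost.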

With the help of Amazon Web Services, we’re building an infrastructure capable of generating beautiful interactive maps quickly. We’re using four Amazon services in this workflow: SQS (job queuing), EC2 (tile generation), S3 (storage), and CloudFront (distribution). The figure below illustrates the design.

The steps in the workflow include:

  1. GIS data is collected and managed in a PostgreSQL database.
  2. Designers use Quantum GIS to create beautiful maps with the data.
  3. Quantum GIS generates a configuration file for the map, which is submitted to the SQS job queue.
  4. Render nodes running on EC2 periodically check the queue for unclaimed jobs. When a job is found, they claim it and get to work rendering tiles. More EC2 instances can be created on the fly during times of high volume or to increase speed.
  5. Finished tiles are saved to S3, and when all tiles are complete the render node marks the job completed and becomes ready for another job.
  6. After the tiles are approved, the tile set is bundled up and pushed to Amazon CloudFront for Internet delivery.
  7. An OpenLayers client embedded in any webpage can access the finished product.
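
Steps 3 through 5 can be sketched in a few lines of Python. This is only an illustration of the claim, render, save, and complete cycle, not our actual code: the `InMemoryQueue` class stands in for SQS, a plain dict stands in for S3, `render_tile` is a placeholder for the real renderer, and every name here (including the `example-map.xml` configuration) is hypothetical. A real render node would also keep polling periodically instead of exiting when the queue is empty.

```python
from dataclasses import dataclass


@dataclass
class Job:
    map_config: str          # hypothetical: name of the map configuration file
    tiles: list              # (zoom, x, y) tuples to render
    status: str = "queued"   # queued -> claimed -> completed


class InMemoryQueue:
    """Stand-in for the SQS job queue so the sketch runs without AWS."""

    def __init__(self):
        self._jobs = []

    def submit(self, job):
        self._jobs.append(job)

    def claim(self):
        """Return the first unclaimed job and mark it claimed, or None."""
        for job in self._jobs:
            if job.status == "queued":
                job.status = "claimed"
                return job
        return None


def render_tile(map_config, zoom, x, y):
    """Placeholder for the real renderer; returns fake image bytes."""
    return f"{map_config}:{zoom}/{x}/{y}".encode()


def run_render_node(queue, storage):
    """Claim jobs until none are left; save every tile, then mark the job done."""
    completed = 0
    while (job := queue.claim()) is not None:
        for zoom, x, y in job.tiles:
            storage[f"{zoom}/{x}/{y}.png"] = render_tile(job.map_config, zoom, x, y)
        job.status = "completed"   # only after all tiles are saved
        completed += 1
    return completed


# Usage: submit one job, then let a single render node work through the queue.
queue = InMemoryQueue()
job = Job("example-map.xml", [(0, 0, 0), (1, 0, 0), (1, 1, 0)])
queue.submit(job)
storage = {}                       # stand-in for the S3 bucket
jobs_done = run_render_node(queue, storage)
```

Because a job is marked claimed as soon as a node picks it up, several nodes can share one queue without rendering the same map twice, which is what makes it easy to add EC2 instances during busy periods.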

We’re working on firm benchmark numbers for tile generation time and size. An important part of this is establishing a standard base layer, because these figures depend entirely on the complexity and area of the map being rendered. We expect to have more figures to share in mid-July. We’re also planning to share the AMI for our service once it is stable. It will include PostgreSQL, Mapnik, TileCache, and all of the custom glue necessary to allow anyone to start rendering tiles with custom data quickly.

For more technical information about creating queued services with AWS, refer to this excellent guide by Mitch Garnaat.