Mapnik Caching Using Squid

Rendering custom maps on our mapping servers (powered by “Mapnik”:http://mapnik.org/) from large shapefiles is processor-intensive, and with thousands of requests per day that load becomes a serious concern. The clear solution is caching. We’ve now replaced all our in-house caching solutions with “Squid”:http://www.squid-cache.org/.

Squid is, technically, a caching proxy. It’s an in-between layer that can accelerate, limit, and sometimes even modify content like web pages and images. We use Squid as a caching reverse proxy: a layer between the user and the actual map server. The setup is very simple. Squid listens on the standard port 80, and Apache 2 serves requests on port 8080 instead. When users visit the server, their web browser requests a page from port 80 and reaches Squid rather than Apache. Squid then checks whether it has a cached version of the requested page, and if not, it reaches back to port 8080 and requests that page from Apache. In our setup the maps take a very long time to expire, so after a single request a map will stay in Squid’s cache for the foreseeable future.
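The request flow above can be sketched in a few lines of Python. This is only a toy model of the hit-or-miss decision, not how Squid is implemented: the class name @ReverseProxyCache@, the @fetch_from_origin@ callback (standing in for Apache on port 8080), and the @ttl_seconds@ parameter are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReverseProxyCache:
    """Toy model of a caching reverse proxy: serve from cache, else ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float):
        # fetch_from_origin is a hypothetical stand-in for the backend (Apache on 8080).
        self.fetch_from_origin = fetch_from_origin
        self.ttl_seconds = ttl_seconds
        # Maps a URL to (expiry timestamp, response body).
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str, now: Optional[float] = None) -> Tuple[bytes, str]:
        """Return (body, "HIT" or "MISS") for the requested URL."""
        now = time.time() if now is None else now
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            # A fresh copy is cached, so the origin is never contacted.
            return entry[1], "HIT"
        # Nothing cached (or the copy expired): fetch from the origin and store it.
        body = self.fetch_from_origin(url)
        self._store[url] = (now + self.ttl_seconds, body)
        return body, "MISS"
```

With a long @ttl_seconds@, only the first request for a given map URL reaches the backend, which is the behavior we rely on for slow-to-render maps.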

For Squid and Mapnik’s WMS server to cooperate, Mapnik must send proper HTTP headers. Ideally Squid does not hold the logic for how long to cache a given piece of content. Instead, the application states that lifetime in its response headers, where both the proxy cache and the end user’s browser can use it. These changes are now in Mapnik’s core and included in recent releases. (As a sidenote, Four Kitchens’ “Pressflow”:http://fourkitchens.com/pressflow-makes-drupal-scale is a Drupal branch that’s compatible with Squid and Varnish.)
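As a rough illustration of what "proper HTTP headers" can look like, the sketch below builds the two standard freshness headers, @Cache-Control@ and @Expires@, for a response. It is not Mapnik's actual code: the function name @cache_headers@ and the one-day lifetime in the usage note are assumptions made for the example.

```python
from email.utils import formatdate
from typing import Dict


def cache_headers(max_age_seconds: int, now: float) -> Dict[str, str]:
    """Build response headers telling caches how long a response stays fresh.

    `now` is a Unix timestamp; passing it in keeps the function deterministic.
    """
    return {
        # Shared caches (such as a reverse proxy) and browsers may both store it.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Older-style absolute expiry time, formatted as an HTTP date in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

For instance, @cache_headers(86400, now=time.time())@ would mark a rendered map as cacheable for one day. The application picks the lifetime, and any cache in front of it simply honors the headers.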

!http://farm4.static.flickr.com/3653/3463861424_50062b5c55.jpg?v=0!

There is a big advantage in using Squid and other reverse proxies over hacked-in caching. Squid has been developed and debugged for years and counts thousands of users, so there is a large community (and a few commercial services like “The Measurement Factory”:http://www.measurement-factory.com/ and “Henrik Nordstrom Consulting”:http://www.henriknordstrom.net/) to fall back on. Possibly an even greater benefit is that by removing caching from the actual application server — whether it’s Mapnik, Drupal or Django — you can scale that cache more cleanly. For example, we could deploy four Squid servers collaboratively handling the cache of a single map server, or swap out Squid for Varnish, or even replace it with a basic CDN, without significantly changing the application logic.
