Over the weekend, more than a million people “came together around the world to Stand Up and Take Action against poverty and in support of the Millennium Development Goals”:http://www.developmentseed.org/blog/2008/oct/22/united-nations-uses-drupal-huge-anti-poverty-event. The organizers used the “Stand Against Poverty”:http://standagainstpoverty.org/ Drupal website to post information about their events, report the number of participants, and share photos. As you can imagine, a site like this needs to be prepared to weather a storm of traffic: millions of hits on the web servers and on the mapping servers that generate the beautiful pin-pointed maps of the events. In this case, time restrictions meant we could not significantly change the infrastructure, so we had to find a way to keep the website up and running on what was already there. We achieved this with the “Boost module”:http://drupal.org/project/boost and some fine tuning.
So what does the infrastructure look like?
- Four load balanced Apache web servers
- A single database server, well equipped
With four Apache web servers, a single database server, Drupal, and millions of hits, we obviously wanted to minimize the stress on the database as well as conserve memory on the Apache heads. There are several steps needed to do this, but there is also one enormously helpful Drupal module, called Boost, which essentially takes PHP, MySQL, and Drupal out of the picture for all anonymous visitor traffic. I’ll get into more about Boost, but first it’s important to understand the two basic kinds of “users” on a Drupal site.
For “StandAgainstPoverty.org”:http://standagainstpoverty.org/, thousands of event organizers went to the site to publish events, report event attendee numbers, and share photos. A lot of traffic was therefore created by “authenticated users,” people who have created an account and signed in to the website. Everyone else who visited the site and didn’t log in is an “anonymous user.” There’s a big difference in how these users are handled in Drupal. Page requests from authenticated users cannot be cached; only things like CSS and images can be. This means that every page load for an authenticated user is a full load of the Drupal stack, including parsing PHP and running lots of queries on the database, which takes a big toll on the web servers. For anonymous users, a lot more can be cached, and “StandAgainstPoverty.org”:http://standagainstpoverty.org/ is a site that gets a lot of both kinds of traffic.
Back to the Boost module. Boost makes a cached HTML file of nearly every page on a Drupal site, saves these files to the web directory, and serves them to anonymous users. A set of Apache .htaccess rules means that PHP, MySQL, and Drupal are not invoked at all for nearly all anonymous user hits on the website. A typical Apache memory footprint attributed to a Drupal bootstrap is around 30 to 40 MB, while a regular Apache request for a static file uses just a tiny fraction of that. Once a page has been cached by the Boost module, no database queries are made, no PHP is parsed, and Drupal is bypassed completely. Here’s a look at how this works:
Everyone who came to the site likely hit the front page as an anonymous user. Since Boost is installed, all of these requests are served by Apache alone and do not load Drupal at all, so the stress on the database server is immediately reduced. For anyone who then browses the site as an anonymous user, each hit is just a request for a static HTML file created by Boost and, once the page has been cached, served solely by Apache. Boost even works with a site running i18n, the module that allows for multilingual support in Drupal 5, but you should note that there is an “important patch”:http://drupal.org/node/196266 that must be applied to Boost to make it work with a multilingual site. Once this is done, everything works well. The “StandAgainstPoverty.org”:http://standagainstpoverty.org/ site is available in five languages, and with Boost it was able to handle the traffic and perform well.
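The routing decision described above can be sketched in a few lines of Python. This is only a simplified model of the idea, not Boost’s actual .htaccess rules: the cookie name, the cache directory layout, and the function names are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical cookie name marking a signed-in visitor (an assumption,
# not necessarily the name Boost or Drupal uses).
AUTH_COOKIE = "LOGGED_IN"


def cache_path(cache_root: Path, host: str, url_path: str) -> Path:
    """Map a request to where its cached HTML file would live.

    The layout (cache_root/host/path.html) is assumed for this sketch.
    """
    name = url_path.strip("/") or "index"
    return cache_root / host / (name + ".html")


def route(cache_root: Path, host: str, method: str,
          url_path: str, cookies: dict) -> str:
    """Return 'static' if the web server can answer alone, else 'drupal'."""
    if method != "GET":
        # Form posts and other writes always need the full stack.
        return "drupal"
    if AUTH_COOKIE in cookies:
        # Authenticated users get freshly generated pages.
        return "drupal"
    if cache_path(cache_root, host, url_path).is_file():
        # Anonymous GET with a cached copy: no PHP, no MySQL.
        return "static"
    # First anonymous hit on a page: Drupal builds it (and can cache it).
    return "drupal"
```

In this model, only the first anonymous request for a page reaches Drupal; every later anonymous request for the same page is answered from the static file.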
Boost is not the only step we took to improve the scalability and performance of the website, but it did a significant amount of the work for us. Combined with regular slow query optimization, tuning of the database and Apache servers, and appropriate hardware from the get-go, Boost is a great companion to help weather high traffic. “StandAgainstPoverty.org”:http://standagainstpoverty.org/ put it to the test with huge amounts of traffic, all concentrated in just three days. I’m happy to report that the website performed well and never went down.
Arto Bendiken, the original author of Boost, and Justin Miller both have great write-ups about Boost that go into more technical detail “here”:http://bendiken.net/2006/05/28/static-page-caching-for-drupal and “here”:http://codesorcery.net/2007/07/23/boost-your-drupal-site.