Development Seed

Improving Drupal's Performance with the Boost Module for the UN's Millennium Campaign

How the Boost Module Helped a Very High Traffic Drupal Site Stay Online

Over the weekend, more than a million people around the world came together to Stand Up and Take Action against poverty and in support of the Millennium Development Goals. The organizers of these events used the Stand Against Poverty Drupal website to post information about their events, report the number of participants, and share photos. As you can imagine, a site like this needs to be prepared to weather a storm of traffic: millions of hits on the web servers and on the mapping servers (which generate the beautiful pin-pointed maps of the events) that power the site. In this case, time restrictions meant the infrastructure could not be significantly changed, so we had to find a way to keep the website up and running on what was already in place. We achieved this with the Boost module and some fine tuning.

So what does the infrastructure look like?  

  • Four load-balanced Apache web servers
  • A single database server, well equipped

With four Apache web servers, a single database server, Drupal, and millions of hits, we obviously wanted to minimize the stress on the database as well as conserve memory on the Apache heads. There are several steps needed to do this, but there is also one enormously helpful Drupal module, called Boost, which essentially takes PHP, MySQL, and Drupal out of the picture for all anonymous visitor traffic. I’ll get into more about Boost, but first it’s important to understand the two basic kinds of “users” on a Drupal site.

For StandAgainstPoverty.org, thousands of event organizers went to the site to publish events, report event attendee numbers, and share photos. A lot of traffic was therefore created by “authenticated users”: people who have created an account and signed in to the website. Everyone else who visited the site and didn’t log in is an “anonymous user.” There’s a big difference in how Drupal handles these two kinds of users. Pages requested by authenticated users cannot be cached; only static assets such as CSS and images can be. This means that every page load for an authenticated user is a full load of the Drupal stack, including parsing PHP and running lots of queries on the database, which takes a big toll on the web servers. For anonymous users, a lot more can be cached. StandAgainstPoverty.org is a site that gets a lot of both kinds of traffic.

Back to the Boost module. Boost makes a cached HTML file of nearly every page on a Drupal site, saves these files to the web directory, and serves them to anonymous users. A set of Apache .htaccess rules means that PHP, MySQL, and Drupal are not invoked at all for nearly all anonymous user hits on the website. Consider that a typical Apache memory footprint attributed to a Drupal bootstrap is around 30 to 40 MB, while a regular Apache request for a static file uses just a tiny fraction of that. Once a page has been cached by the Boost module, no database queries are made, no PHP is parsed, and Drupal is bypassed completely. Here’s a look at how this works:

Everyone who came to the site likely hit the front page as an anonymous user. With Boost installed, all of these requests are served by Apache alone and do not load Drupal at all, so the stress on the database server is immediately reduced. As visitors continue to browse the site anonymously, their hits are just requests for static HTML files created by Boost and, once a page has been cached, served solely by Apache. Boost even works with a site running i18n, the module that allows for multilingual support in Drupal 5, but note that an important patch must be applied to Boost to make it work with a multilingual site. Once this is done, everything works well. StandAgainstPoverty.org is available in five languages, and with Boost it was able to handle the site traffic and perform well.
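
To make the request flow concrete, here is a minimal Python sketch of the decision that the rewrite rules make for each request. It is an illustration only, not Boost's code: Boost does this with Apache .htaccess rules, and the cache layout, the `.html` suffix, the use of a session cookie as the sign of a logged-in user, and the function names are all assumptions.

```python
from pathlib import Path


def cached_file_for(cache_root: Path, host: str, url_path: str) -> Path:
    # Hypothetical layout: <cache_root>/<host>/<path>.html,
    # with the front page "/" stored as index.html.
    name = url_path.strip("/") or "index"
    return cache_root / host / (name + ".html")


def choose_handler(cache_root: Path, host: str, method: str,
                   url_path: str, has_session_cookie: bool) -> str:
    # Only anonymous GET requests are candidates for the static cache.
    # Assumption: a session cookie marks an authenticated user.
    if method != "GET" or has_session_cookie:
        return "drupal"
    # Refuse paths that try to climb out of the cache directory.
    if ".." in url_path.split("/"):
        return "drupal"
    # A cached copy exists: the web server can answer without PHP or MySQL.
    if cached_file_for(cache_root, host, url_path).is_file():
        return "static"
    # No cached copy yet: Drupal builds the page (and can then cache it).
    return "drupal"
```

The point of the sketch is that the check is just a file lookup: for an anonymous visitor and an already cached page, nothing beyond the web server is involved.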

Boost is not the only step we took to improve the scalability and performance of the website, but it did a significant amount of the work for us. Combined with regular slow query optimization, tuning of the database and Apache servers, and appropriate hardware from the get-go, Boost is a great companion to help weather high traffic. StandAgainstPoverty.org put it to the test with huge amounts of traffic all focused on just three days. I’m happy to report that the website performed well and never went down.

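Arto Bendiken, the original author of Boost, and Justin Miller both have great write-ups about Boost that go into more technical detail, at http://bendiken.net/2006/05/28/static-page-caching-for-drupal and http://codesorcery.net/2007/07/23/boost-your-drupal-site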

motogp.com also uses an HTML cache

We took a similar approach for motogp.com. We don’t use Boost, but a module written specifically to generate the HTML files. We also implemented an easy way to do master-slave for MySQL, sending SELECTs to the slaves and INSERT/UPDATE statements to the master. That lets us run 3 slaves and gives us an easy way to scale the service just by adding more slaves.
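
The read/write split described in this comment can be sketched as follows. This is a hypothetical Python illustration, not the motogp.com implementation, which the comment does not show: the `QueryRouter` class, the round-robin choice of slave, and the server names are assumptions.

```python
import itertools


class QueryRouter:
    """Pick a database server for a SQL statement (illustrative only)."""

    def __init__(self, master, slaves):
        self.master = master
        # Assumption: reads are spread over the slaves in round-robin order.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        # Reads go to a slave; everything else (INSERT, UPDATE, ...)
        # goes to the master so that the slaves only ever replicate.
        if first_word == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = QueryRouter("master", ["slave1", "slave2", "slave3"])
print(router.route("SELECT title FROM node"))        # one of the slaves
print(router.route("UPDATE node SET title = 'x'"))   # the master
```

Adding capacity for reads then means adding another entry to the list of slaves. A real setup would also have to account for replication lag, which this sketch ignores.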

Great Writeup.

Thanks for sharing this highly valuable information. I see that the D6 version of Boost is finally ready (should be listed in the next few hours), which is great news. I handle a site that averages 400,000 page impressions a day. For that we are using cache router with memcache. Even at a peak time a couple of weeks ago, when we were getting more than 60,000 page impressions per hour, our servers sat almost idle (one quad core running Apache, one quad core running MySQL). We also average about 150 logged in users on the site at any given time and the highest our CPU has ever gone on Apache is 10%.

I do have one question. How did you deal with stale data issues? In other words, if someone posts something on a Drupal server, how did you transport the new static page over to your static page server?

Hi Jamie, great to hear about your experience w/ cacherouter; it sounds like it really helped. Regarding your question about the stale data issue, I'll try to explain in brief. Boost lets you define the minimum cache lifetime for the pages it caches. Then, when the Drupal cron runs, stale pages are deleted. On the next request for a cached page that has been deleted, Boost re-caches the page. The cached pages are just stored in a directory in the web root or somewhere below it. As for how this directory is kept up to date across the 4 web servers: in the case of StandAgainstPoverty.org, the directory is mounted on all 4 web servers. You can read more technical details at http://bendiken.net/2006/05/28/static-page-caching-for-drupal and http://codesorcery.net/2007/07/23/boost-your-drupal-site
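
As a rough illustration of the expiry step described above, here is a minimal Python sketch of a cron-style cleanup. It is not Boost's code: the function name, the `.html` suffix, and the use of file modification time as the age of a cached page are assumptions.

```python
import time
from pathlib import Path
from typing import List, Optional


def expire_stale_pages(cache_root: Path, min_lifetime_seconds: float,
                       now: Optional[float] = None) -> List[Path]:
    """Delete cached pages older than the minimum cache lifetime."""
    now = time.time() if now is None else now
    removed = []
    # Walk every cached HTML file under the cache directory.
    for page in cache_root.rglob("*.html"):
        age = now - page.stat().st_mtime
        if page.is_file() and age > min_lifetime_seconds:
            # The next request for this page falls through to Drupal,
            # which regenerates it and writes a fresh cached copy.
            page.unlink()
            removed.append(page)
    return removed
```

Because the cleanup only deletes files, a cache directory that is mounted on several web servers needs the job to run in just one place.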

Nice post.

I would be interested in seeing the performance gain from using the Boost module together with a Squid proxy.

Hi, thanks for the comment. A conversation about using Squid in front of Apache and a "boosted" Drupal site was started over at http://drupal.org/node/214726, and it sounds like someone has it working fine, except that some work was needed on the default .htaccess file provided with the Boost module. That thread looks worth checking out, along with a related thread it links to: http://drupal.org/node/185075

Cheers,
Ian

Thanks for this and the previous post about how you developed Stand Against Poverty. It is a great looking site and, of course, a great cause.

Boost is a fantastic module. It takes the pain out of one of Drupal's few drawbacks: performance. I have used it on a few projects with great success. It is also very helpful for smaller, low-profile websites. Such websites often have limited budgets for server setups and often run on shared hosting. Boost can turn a sluggish Drupal site on shared hosting into a fast site. It is a life saver!

Hi Blair, thanks for the comments about the site, and about using Boost on a shared host. The viability of Boost on a limited-resource, shared host is something I left out and should have mentioned. You're definitely right, that Boost can help a great deal even on shared hosts, and that it's not just for sites backed by an elaborate or clustered infrastructure. Good point.

Happy Boosting.
Ian