Pressflow 7: Continuing to Push Performance and Scalability in Drupal
Interview with David Strauss about Pressflow 7
Pressflow - a Drupal distribution that provides improved performance and scalability and is particularly useful for high-traffic sites - continues to develop, and the guys behind it over at Four Kitchens have some exciting plans for Pressflow 7 and beyond. After Jeff's post outlining how important speed is for the data-heavy sites we build and why we use Pressflow and Varnish to make them faster, I wanted to dig in and find out what's next for Pressflow. I talked with David Strauss, the creator of Pressflow, to get the lowdown on their plans for the project in the New Year.
Q: I thought all the changes from Pressflow were ported into Drupal 7?
A: Pressflow 7 and later will continue to provide significant improvements over Drupal's performance and scalability. Even maintaining existing Pressflow 6 features will keep Pressflow ahead of Drupal 7 in performance and scalability. But yes, most of the main Pressflow 6 features are in Drupal 7 or have equivalents. We're glad that's the case; it opens the door for new development on Pressflow while maintaining good compatibility with Drupal. Drupal 6 included changes that were in Pressflow 5.
Right now there is work underway to get as many Pressflow 6 changes into Drupal 7 as possible. We've been having discussions with Angie, Dries, and a bunch of core developers over what's viable to merge.
Q: What exactly will Pressflow 7 give me that Drupal 7 will not?
A: Pressflow supports multi-tier proxy layers, which are in use by several major sites. No version of Drupal (including the one in development) can properly handle this architecture. But Dries wants this support in Drupal, so I wouldn't be surprised if it makes the final Drupal 7 release.
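To make the multi-tier proxy problem concrete: when a request passes through several proxy layers, each layer typically appends the address it saw to the X-Forwarded-For header, and the application has to walk that chain from the right, skipping only the proxies it trusts. The following is a minimal Python sketch of that idea under stated assumptions, not Pressflow's actual code; the function name `client_ip` and the trusted-proxy list are hypothetical.

```python
import ipaddress


def client_ip(remote_addr, forwarded_for, trusted_proxies):
    """Return the right-most address in the chain that is not a trusted proxy.

    remote_addr      -- address of the direct peer (the last proxy, or the client)
    forwarded_for    -- raw X-Forwarded-For header value ("" if absent)
    trusted_proxies  -- hypothetical list of proxy addresses or CIDR ranges
    """
    trusted = [ipaddress.ip_network(p) for p in trusted_proxies]

    def is_trusted(addr):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            # Malformed entries are never trusted, so the walk stops on them.
            return False
        return any(ip in net for net in trusted)

    # Oldest hop first: header entries, then the peer that connected to us.
    chain = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    chain.append(remote_addr)

    # Walk from the nearest hop outward; anything left of the first
    # untrusted address could have been forged by the client.
    for addr in reversed(chain):
        if not is_trusted(addr):
            return addr
    return chain[0]  # every hop was a trusted proxy
```

With `["10.0.0.0/8"]` as the trusted range, a header of `"6.6.6.6, 203.0.113.7, 10.0.0.1"` arriving from `10.0.0.2` resolves to `203.0.113.7`: the two internal tiers are skipped, and the left-most entry is ignored because the client could have supplied it.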
Drupal 7 also lacks solutions for common, slow queries that are optimized in Pressflow in a MySQL-specific way. Last-minute Drupal 7 work is underway there too, but that work has continued for a long time (generally since Drupal 6 development). So, it's not clear if those optimizations will make it into the upcoming Drupal release.
Porting existing Pressflow 6 features to Pressflow 7 is far from the final word. Like Pressflow 6, Pressflow 7 will integrate valuable improvements from Drupal's ongoing development as well as original work.
There is a difference in the missions behind Drupal and Pressflow. Drupal provides broad support on even shared web hosts. Pressflow captures the leading performance and scalability edge by using the latest infrastructure advances, and we're willing to break support for older technologies to do that. That means we can move forward faster, but also that Pressflow will never be a system for basic sites.
Q: For people new to Pressflow, when should they use it as a substitute for Drupal core?
A: Pressflow can speed up any Drupal site to some degree, but the impressive changes generally require root access (or equivalent) to install supporting services. Pressflow really shines when it's integrated with APC, memcached, Varnish, MySQL replication, and other extended family members of LAMP that aren't on regular shared hosts.
For module authors, Pressflow can help the transition to the next version of Drupal. For example, Pressflow 6, like Drupal 7, supports database replication and features a smart session and page caching architecture. Modules that run properly in Pressflow 6 are less likely to have issues porting to Drupal 7 and are better poised to take advantage of Drupal 7's new features. Most of Pressflow's extended APIs are either pulled from the next Drupal release or implemented in a compatible way.
Q: What is the latest Pressflow work?
A: Ongoing Pressflow development focuses on two classes of problems:
(1) Benchmarks show Drupal 7 being considerably slower than Drupal 6. This is a much larger hit than the one we had from Drupal 5 to 6. The Drupal core team (including us) has placed a high priority on remedial work here, but it's unlikely Drupal 7 will close the gap. Because the overhead is largely on the PHP side, Pressflow is exploring ways to accelerate common functions by offloading select parts of core to Java (which blows away PHP + APC on a modern Java VM) and performing expensive page assembly and caching operations with systems like Varnish's ESI and nginx's SSI.
(2) The sites running Drupal and Pressflow are bigger than ever. Certain components that ran well on a four-server cluster with heavy traffic cannot survive on a 30-server cluster with massive traffic. We're solving these problems with decentralization. For example, the menu system replacement that is landing in Pressflow 6 generates, caches, and uses menu data locally on each web server. That takes menu operations from one of the biggest cluster bottlenecks (though it is a bit better in Drupal 7) to a component that can scale almost perfectly horizontally. We're also working on distributed, multi-tier caching strategies that already show a 3-7x increase in cache read performance versus using memcached on localhost. The improvement is even greater versus non-loopback access to other memcached instances.
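The multi-tier caching idea can be illustrated with a short Python sketch: a small per-process cache sits in front of the shared cache, so repeated reads never leave the web server. This is an illustration under stated assumptions, not Pressflow's implementation; `DictBackend` is a hypothetical in-memory stand-in for a shared store such as memcached, and `TwoTierCache` and `local_ttl` are names invented here.

```python
import time


class DictBackend:
    """Hypothetical stand-in for a shared cache such as memcached."""

    def __init__(self):
        self.data = {}
        self.reads = 0  # counts round trips to the shared tier

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class TwoTierCache:
    """Short-lived local cache in front of a slower shared cache."""

    def __init__(self, shared, local_ttl=5.0, clock=time.monotonic):
        self.shared = shared
        self.local_ttl = local_ttl
        self.clock = clock
        self.local = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # served locally, no network hop
        value = self.shared.get(key)
        if value is not None:
            self.local[key] = (self.clock() + self.local_ttl, value)
        else:
            self.local.pop(key, None)  # drop any expired local copy
        return value

    def set(self, key, value):
        # Write through to the shared tier, then refresh the local copy.
        self.shared.set(key, value)
        self.local[key] = (self.clock() + self.local_ttl, value)
```

The trade-off in this sketch is staleness: a server may keep serving its local copy for up to `local_ttl` seconds after another server changes the shared value, so the local tier suits data that changes rarely.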
As with our existing development, improvements in Pressflow will be candidates for future Drupal releases. And, conversely, future improvements for Drupal will be candidates for back-porting to Pressflow.
We also have clients sponsoring work on high-availability measures, including built-in database connection monitoring and failover. While Drupal 7 gains native database replication support, Drupal will continue to require expensive and complex approaches to achieve the same failover capability that Pressflow will have built in.
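As a rough picture of what connection monitoring and failover involve, the Python sketch below tries database servers in priority order and temporarily skips any server that recently refused a connection. It is only a sketch under stated assumptions: the interview does not describe Pressflow's design, and `FailoverPool`, `retry_after`, and the connector callables are hypothetical names.

```python
import time


class FailoverPool:
    """Try database servers in priority order, skipping recently failed ones."""

    def __init__(self, connectors, retry_after=30.0, clock=time.monotonic):
        # connectors: list of (name, connect_fn); connect_fn returns a
        # connection object or raises ConnectionError.
        self.connectors = list(connectors)
        self.retry_after = retry_after
        self.clock = clock
        self.down_until = {}  # name -> time before which we skip the server

    def connect(self):
        now = self.clock()
        errors = []
        for name, connect_fn in self.connectors:
            if self.down_until.get(name, 0.0) > now:
                continue  # still in its cool-down window
            try:
                return name, connect_fn()
            except ConnectionError as exc:
                # Mark the server as down so later requests skip it quickly.
                self.down_until[name] = now + self.retry_after
                errors.append(f"{name}: {exc}")
        raise ConnectionError("no database server available: " + "; ".join(errors))
```

A caller would pass its real connection functions, primary first. The cool-down keeps every page request from paying a connection timeout against a server already known to be down.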
Q: How soon after Drupal 7's official release will Pressflow 7 be coming?
A: It shouldn't be more than a few weeks; porting the missing Pressflow 6 features should not be difficult. We are not beginning Pressflow 7 work until Drupal 7 goes gold.
Q: Where is the best place to download Pressflow?
A: Visit Pressflow.org - there will always be direct-download links from that page. Because large, complex projects use Pressflow, branching using Bazaar (a version-control tool) is a popular way to maintain local changes and apply updates. Project Mercury from Chapter Three integrates a self-updating and configured copy of Pressflow into an Amazon EC2 image (AMI). We're also working with some higher education institutions to provide their students, staff, and faculty with managed, one-click Pressflow installations, but those won't launch for a few months.
Awesome!
I've been loving messing with the Pantheon Mercury AMI & Pressflow. It's been an invaluable learning tool and a lot of fun; there's nothing like kicking around a fully working example, albeit an all-in-one server implementation (for now).
Thanks for the excellent writeup! Cheers!
Porting bits of core away from PHP
I’m most interested in your comments that you might be porting bits of core away from PHP and into Java. I would love to hear more about this. I’ve been thinking of ways to speed up Drupal, and was musing along similar lines. Specifically, I was thinking about porting bits to C and then creating a PHP extension around it (drupal.so?).
Using a C extension has a few
Using a C extension has a few serious tradeoffs versus Java:
(1) Integrating with the Zend engine in C is awkward, at best. It doesn’t even look like C. Bridging PHP’s loose types to C’s strict types is also frustrating. (Java has strict types, too, but conversion is more straightforward.)
(2) Java has a superior array of storage, caching, and cluster libraries, especially for web systems.
(3) Anything coded as a C extension for PHP binds itself closely with PHP. PHP/Java integration strategies are generally more abstract, providing reusable functionality that’s easy to interface with other Java systems and scripting languages. C has nothing like JSR-223.
(4) Quercus.
Quercus?
Are you planning to use Quercus? As far as I understand it, only the licensed version supports compiled PHP.
It could still be a big performance win, of course, depending on how much and which parts of the code you are planning to port to Java, but I don’t like the idea of tying myself to a certain vendor in order to get the most performance out of my Drupal installation.
All versions of Quercus, free
All versions of Quercus, free or not, are “licensed”; don’t confuse licensing with cost or proprietary-ness. Drupal and Pressflow, for example, are licensed under the GPL.
The advantage Quercus has, even with the free/open source software version, is running PHP entirely within the JVM. This removes the cost of PHP calling Java or Java calling PHP using slow, interprocess mechanisms.
Drupal 7 also lacks solutions
This isn’t accurate, and hasn’t been for several months. Core has had support for running LIKE on MySQL and ILIKE on PostgreSQL since shortly after the new database layer went in.
The recent activity has been around two issues:
- properly escaping wildcard characters in regular strings when using LIKE as a replacement for LOWER. This fix is in D7 now, but not in Pressflow 6, as far as I can see.
- also the pattern isn’t fully applied to every core query running LOWER, but that’s just cleanup, and http://drupal.org/node/279851 has an up-to-date patch needing review.
Re: Escaping in Pressflow
Your criticism is ill-founded. Pressflow 6 has only dropped use of LOWER, not changed anything to use LIKE that did not before. In case-insensitive collations for MySQL, the regular equality operator (“=”) is also case-insensitive, so there’s no need to increase use of LIKE. That means Pressflow 6 has the same LIKE escaping issues as Drupal 6.
Drupal 7 has more issues with escaping “%” for uses of LIKE because the chosen solution for PostgreSQL is to use ILIKE (case-insensitive LIKE) as a case-insensitive version of “=”. Don’t falsely associate this new problem with Drupal or Pressflow 6.
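The escaping problem under discussion is easy to reproduce. In a LIKE pattern, “%” and “_” are wildcards, so a literal value used with LIKE as a case-insensitive “=” has to have those characters escaped first. The Python sketch below uses SQLite only because it ships with Python and its LIKE is case-insensitive for ASCII by default; MySQL and PostgreSQL differ in details, and `escape_like`, `find_user`, and the `users` table are hypothetical names, not Drupal or Pressflow APIs.

```python
import sqlite3


def escape_like(value, escape_char="\\"):
    """Escape LIKE wildcards so the pattern matches only the literal value."""
    # Escape the escape character first so it is not doubled again later.
    for ch in (escape_char, "%", "_"):
        value = value.replace(ch, escape_char + ch)
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("Joe_Smith",), ("JoeXSmith",), ("joe_smith",)],
)


def find_user(name):
    """Case-insensitive exact match implemented with an escaped LIKE."""
    rows = conn.execute(
        "SELECT name FROM users WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (escape_like(name),),
    )
    return [row[0] for row in rows]
```

Here `find_user("joe_smith")` returns only the two underscore names, whereas passing the unescaped string to LIKE also matches `JoeXSmith`, because the bare “_” matches any single character.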
Ahh, this is true. I thought
Ahh, this is true. I thought we still had some LIKE LOWER queries knocking around in D6, but looks like they’ve been cleared up by now, and even so, would have affected both core and D6. Apologies for the confusion.
Re: Lacking solutions
When I said “lacks solutions,” I didn’t mean that APIs aren’t available. I simply meant that the queries aren’t optimized (whether through those APIs or not). There’s a big difference between solving a problem and just having the APIs available to solve the problem. Drupal core, at this moment, falls into the latter category. The issue you cite (#279851) has been around for a year and a half, and most of the work (including 150 comments) has involved debates over the right way to abstract the solution for all supported databases. As of today, it’s still not clear that consensus is at the point where the patch is going in, even if I gave it my own RTBC right now.
Well, lacking solutions
Well, lacking solutions usually means lacking either an API, or a pattern which can be applied, or a patch, none of which applied to that issue when this interview was published. Either way, that patch just got committed (http://drupal.org/node/279851). Bye bye, LOWER!