Why the heck a new aggregator for Drupal 7?



In this year’s Google Summer of Code season I’ve got the distinct pleasure of mentoring Aron Novak’s work on a new aggregator for Drupal 7. Aron is well into his task and has just rolled a patch for core and an alpha 2 version, so it’s time to share why I think this patch is important and why you should have a look at it. If you’re into aggregation and Drupal, that is.

Drupal’s original aggregator module was designed first and foremost for pulling news feeds into your site and displaying them in a straightforward fashion. That means no workflow, very basic permissioning, no API for interacting with feed items, no event aggregation and no custom parsers, to name a few limitations.

Soon contrib modules mushroomed that addressed one shortcoming or another of the core aggregator. A list of them would start with the aggregator 2 module, which was published in the fall of 2005, and would include Leech (I don’t regret its demise), Aggregation (the first use of SimpleXML for parsing in Drupal), SimpleFeed (the first extensible architecture) and FeedAPI (Aron’s and my first attempt at an aggregation API, together with Ken Rickard).

Nearly all of them added node aggregation and nearly all of them added a set of cool features that the core aggregator didn’t support. All of them wound up duplicating functionality: the core of aggregation, creating a representation of a data feed out there and pulling it down to your site.

So why would we care about that? Isn’t wild growth on the Meadows of Contrib a wonderful thing? Don’t its thousand blossoming flowers make it more likely that everybody finds their daisy? Up to a point, I’d say. I feel we’ve come past this point for three reasons: 1) the features in contrib land are incompatible, 2) we lose time on common tasks and 3) we haven’t got enough eyes on our code.

1) Incompatible features

While the many modules at hand address use cases as wide as data migration, organic groups integration, event aggregation, workflow support etc., they’re not compatible! So in many cases I can have one feature or the other, but I can’t have a particular subset or all of them.

2) Lost time on common tasks

We get bogged down in reinventing the same ole functionality and fixing the same ole bugs, and so we’re left without time for implementing higher level features like:

- monitoring of feed download performance
- a standard interface for integration with alternative parsers
- full blown event aggregation
- lazy node instantiation from feed items
- full blown feed element mapping
- (fill in your aggregation desires here)

3) Not enough eyes on our code

We’re missing out on the great opportunity of open source, which is to evolve solid code by sharing it with as many people as possible. I personally have seen many problems with feed parsing, duplicates or download performance repeated, solved and forgotten in Aggregator, SimpleFeed, Leech and FeedAPI. These are not the most error-prone modules, just the ones I worked with.

The suggested solution is nothing new: an architecture that handles the most common tasks, defines a normalized data structure for feeds, and offers an API for parsing and processing data feeds. For convenience, it shouldn’t come as an empty shell but with a sensible default configuration that works out of the box, the way we’re used to from the current aggregator module.
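To make that shape a bit more tangible, here is a rough, language-neutral sketch written in Python rather than Drupal’s PHP. It is not Aron’s patch and not a Drupal API: the names `Feed`, `FeedItem`, `Aggregator`, `register_parser` and `add_processor` are all made up for illustration. It only shows the three ingredients named above: a normalized feed structure, pluggable parsers and processors, and a default configuration that works without any setup.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeedItem:
    """One normalized feed item, independent of the source format."""
    title: str
    link: str
    description: str = ""


@dataclass
class Feed:
    """Normalized representation of a feed (hypothetical structure)."""
    title: str
    items: List[FeedItem] = field(default_factory=list)


# A parser turns raw feed text into the normalized Feed structure.
Parser = Callable[[str], Feed]
# A processor consumes a normalized Feed (store items, create nodes, ...).
Processor = Callable[[Feed], None]


def parse_rss(raw: str) -> Feed:
    """Default parser: reads only the basic RSS 2.0 channel/item elements."""
    channel = ET.fromstring(raw).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    items = [
        FeedItem(
            title=item.findtext("title", default=""),
            link=item.findtext("link", default=""),
            description=item.findtext("description", default=""),
        )
        for item in channel.findall("item")
    ]
    return Feed(title=channel.findtext("title", default=""), items=items)


class Aggregator:
    """Handles the common task (parse, then hand off) and leaves the rest
    to plugins. Ships with an RSS parser so it is usable out of the box."""

    def __init__(self) -> None:
        self.parsers: Dict[str, Parser] = {"rss": parse_rss}
        self.processors: List[Processor] = []

    def register_parser(self, name: str, parser: Parser) -> None:
        self.parsers[name] = parser

    def add_processor(self, processor: Processor) -> None:
        self.processors.append(processor)

    def refresh(self, raw: str, parser: str = "rss") -> Feed:
        feed = self.parsers[parser](raw)
        for process in self.processors:
            process(feed)
        return feed


# Usage: the default configuration plus one trivial processor.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><link>http://example.com/1</link></item>
</channel></rss>"""

stored: List[FeedItem] = []
aggregator = Aggregator()
aggregator.add_processor(lambda feed: stored.extend(feed.items))
feed = aggregator.refresh(SAMPLE_RSS)
```

The point of the split is that an alternative parser or a node-creating processor only has to agree on the normalized structure, not on each other’s internals.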

The aggregator patch that Aron is cooking up follows these principles. I’ll be blogging about the implementation details of it soon. In the meantime, check out the project outline on Drupal Groups and the patch on drupal.org.
