Development Seed Blog

Why the heck a new aggregator for Drupal 7?

Or, Check Out the Patch

This year’s Google Summer of Code season I have the distinct pleasure of mentoring Aron Novak’s work on a new aggregator for Drupal 7. Aron is well into his task and has just rolled a patch for core and an alpha 2 version, so it is time to share why I think this patch is important and why you should have a look at it. If you’re into aggregation and Drupal, that is.

Drupal’s original aggregator module was designed primarily for pulling news feeds into your site and displaying them in a straightforward fashion. It has no workflow, very basic permissioning, no API for interacting with feed items, no event aggregation and no custom parsers, to name a few limitations.

Soon contrib modules mushroomed, each addressing one shortcoming or another of the core aggregator. A list of them would start with the aggregator 2 module, which was published in the fall of 2005, and would include Leech (I don’t regret its demise), Aggregation (the first use of SimpleXML for parsing in Drupal), SimpleFeed (the first extensible architecture) and FeedAPI (Aron’s and my first attempt at an aggregation API, together with Ken Rickard).

Nearly all of them added node aggregation and nearly all of them added a set of cool features that the core aggregator didn’t support. All of them wound up duplicating functionality: the core of aggregation, creating a representation of a data feed out there and pulling it down to your site.

So why would we care about that? Isn’t wild growth on the Meadows of Contrib a wonderful thing? Don’t its thousand blossoming flowers make it more likely that everybody finds their daisy? Up to a point, I’d say. I feel we’re past that point for three reasons: 1) the features in contrib land are incompatible, 2) we lose time on common tasks and 3) we haven’t got enough eyes on our code.

1) Incompatible features

While the many modules at hand address use cases as wide as data migration, organic groups integration, event aggregation, workflow support etc., they’re not compatible with each other! So in many cases I can have one feature or another, but I can’t have a particular subset or all of them.

2) Lost time on common tasks

We get bogged down in reinventing the same ole functionality and fixing the same ole bugs, which leaves us no time for implementing higher level features like:

- monitoring of feed download performance

- a standard interface for integration with alternative parsers

- full blown event aggregation

- lazy node instantiation from feed items

- full blown feed element mapping

- (fill in your aggregation desires here)

3) Not enough eyes on our code

We’re missing out on the great opportunity of open source, which is to evolve solid code by sharing it with as many people as possible. I personally have seen many problems with feed parsing, duplicates or download performance repeated, solved and forgotten in Aggregator, SimpleFeed, Leech and FeedAPI. These are not the most error-prone modules, just the ones I worked with.

The suggested solution is nothing new: an architecture that handles the most common tasks, defines a normalized data structure for feeds, and offers an API for parsing and processing data feeds. For convenience, it shouldn’t come as an empty shell but in a sensible default configuration that works out of the box, the way we’re used to from the current aggregator module.
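To make the shape of such an architecture a bit more tangible, here is a minimal sketch in Python. It is not the patch and not Drupal code: every name in it (Feed, FeedItem, parse_rss, Aggregator, refresh) is made up for illustration. It only shows the three ingredients named above: a normalized feed structure, a pluggable parser, and pluggable processors, with a default configuration that works without any setup.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional
import xml.etree.ElementTree as ET


@dataclass
class FeedItem:
    """One normalized feed item (hypothetical structure)."""
    title: str
    link: str
    description: str = ""


@dataclass
class Feed:
    """Normalized representation of a feed, independent of its source format."""
    title: str
    link: str
    items: List[FeedItem] = field(default_factory=list)


# A parser turns a raw document into a Feed; a processor does something with it.
Parser = Callable[[str], Feed]
Processor = Callable[[Feed], None]


def parse_rss(raw: str) -> Feed:
    """Default parser: reads a plain RSS 2.0 document into the normalized structure."""
    channel = ET.fromstring(raw).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    items = [
        FeedItem(
            title=item.findtext("title", default=""),
            link=item.findtext("link", default=""),
            description=item.findtext("description", default=""),
        )
        for item in channel.findall("item")
    ]
    return Feed(
        title=channel.findtext("title", default=""),
        link=channel.findtext("link", default=""),
        items=items,
    )


class Aggregator:
    """Handles the common task (parse, then hand over to processors)."""

    def __init__(self, parser: Parser = parse_rss,
                 processors: Optional[List[Processor]] = None):
        self.parser = parser
        self.processors = list(processors) if processors is not None else []

    def refresh(self, raw: str) -> Feed:
        feed = self.parser(raw)
        for process in self.processors:
            process(feed)
        return feed


# Usage: the default parser plus one processor that collects item titles.
SAMPLE = (
    '<rss version="2.0"><channel>'
    "<title>Example</title><link>http://example.com/</link>"
    "<item><title>Hello</title><link>http://example.com/hello</link></item>"
    "</channel></rss>"
)
seen_titles: List[str] = []
aggregator = Aggregator(
    processors=[lambda feed: seen_titles.extend(i.title for i in feed.items)]
)
feed = aggregator.refresh(SAMPLE)
```

An alternative parser or an extra processor (say, one that creates nodes from items) would plug in through the constructor without touching the download and bookkeeping code, which is the whole point of sharing that part.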

The aggregator patch that Aron is cooking up follows these principles. I’ll be blogging about the implementation details of it soon. In the meantime, check out the project outline on Drupal Groups and the patch on drupal.org.

Comments
reasons

these aren't really valid arguments why the new code (FeedAPI) should be in core, you can still resolve these problems in contrib..
the main reason for this patch should be: the core aggregator module sucks, and badly needs a successor..

you can still resolve these

"you can still resolve these problems in contrib.."

We agree that we need a new core aggregator, but let me be a little nitty gritty here. It took me so long to figure it out myself: we can *not* solve these problems in contrib.

A contrib solution will always lack this last bit of attention that is so important to get to a common approach and it will always have at least one twin: the core aggregator.

(BTW: the patch in the queue != FeedAPI)

I very much like FeedAPI. It basically kicks the snot out of aggregator module.

Glad you like FeedAPI :)

Probably you'll be glad to hear that the basic architecture of the patch in the works is similar to FeedAPI.

Fantastic News

Really, this is great news. On a number of occasions we've done work with leech and feedapi and it would be really nice to have all those features (and much much more) in core to begin with. So I, for one, applaud your efforts and now... I'll sit around and wish we were using 7.x at large already. :-D
