Development Seed


Good bye FeedAPI, hello Feeds!

The time has come for a new aggregator that doesn’t just aggregate

With the launch of Managing News we have released Feeds, the intended successor of FeedAPI. Feeds is a next generation import and aggregation API that applies lessons learned from three years of intensive work with aggregation in Drupal. This is one of the outcomes of our work on Managing News that we are most excited about. We'd like to again thank the Knight Foundation for their vision to support the improvement of fundamental aggregation tools for Drupal, which helped create Feeds. In this post I'd like to explain the reasons for building a new aggregation and import API, the design goals for it, and what this is going to mean for FeedAPI.

It's been over two years since a conversation at OSCON led Ken Rickard to post the Aggregator API proposal. That resulted in the successful Google Summer of Code project, FeedAPI, which has been developed and maintained largely by Aron Novak. Since then, we have used FeedAPI on many (perhaps most) of our projects and we have dedicated significant resources toward improving and extending it.

Over time, the flexibility of FeedAPI's architecture started to pay off. For instance, we used it to aggregate RSS and Atom in Managing News, import compressed CSV crime feeds for Stumble Safely, and create events from iCal feeds for Open Atrium. Feed Element Mapper gave us granular control over mapping feeds to Drupal content. A series of contrib modules mushroomed to plug into FeedAPI to offer additional functionality.

However, as we addressed these and many other use cases with FeedAPI, its limitations became clear.

Limitations of FeedAPI

  • FeedAPI assumes that feeds are loaded via HTTP; file support has been added, but still feels alien.
  • Parsers are responsible for fetching feeds; fetching mechanisms (HTTP or file) are hard or impossible to reuse.
  • Exotic feed formats forced many FeedAPI developers to fork parsers.
  • Mapping is done by a separate module which leads to complex code.
  • FeedAPI assumes an aggregation use case. Feeds as nodes and a baked-in scheduler make sense in this context, but there is no reason for it to be the only use. Aggregation is essentially just periodic import.
  • FeedAPI is a child of PHP4. As such, object-oriented techniques that would allow for a more efficient API aren't used.
  • FeedAPI configurations are not exportable to code which hampers development workflow and makes FeedAPI awkward to use in install profiles.

Because many of these issues slowed down our day-to-day development, addressing this design debt became critical.

The goals for the rewrite

We took the launch of Managing News as an excuse to push hard on a rewrite of FeedAPI with these goals in mind:

  • Exportability
  • Better and cleaner extensibility
  • Suitable for aggregation and data import jobs
  • Comparable or better performance
  • Suitable for pulling from files, pulling from HTTP or pushing through HTTP

What's in the box

On October 20th, we released the first version of Feeds. If you download Feeds today you'll get:

  • Fetching data from file or HTTP sources
  • Parsing RSS, Atom, CSV or OPML
  • Producing either simple database records (with Data module), nodes, users or terms
  • Using an importer configuration either by creating a node or on a standalone form
  • Presets for the most common aggregation and import tasks
  • Integrated mapper for mapping feeds on a field level to CCK nodes, users, terms or SQL tables
  • A views-style OOP plugin API that makes it easy to tweak existing plugins or create your own (big KUDOS to merlinofchaos for CTools' plugin API)
  • A views-style export-to-code functionality for configuration (again, hats off, merlinofchaos: CTools export API makes it easy to add exportables to a module)
  • Support for concurrent feed aggregation with Drupal Queue - a backport of the great queue in Drupal 7 contributed by chx et al.
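The split between fetching, parsing and producing records in the list above can be illustrated with a small sketch. It is written in Python purely for illustration (Feeds itself is a Drupal module written in PHP), and every name in it (`fetch_from_string`, `parse_csv`, `process_to_records`, `run_import`) is hypothetical and not part of the Feeds API. The point it tries to show is only that each stage can be swapped without touching the others.

```python
import csv
import io
from typing import Callable, Dict, List

Item = Dict[str, str]


def fetch_from_string(source: str) -> str:
    """Stand-in fetcher (hypothetical): returns the raw data unchanged.

    A real fetcher would read a file or make an HTTP request instead.
    """
    return source


def parse_csv(raw: str) -> List[Item]:
    """Parser: turn raw CSV text into items keyed by the header names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(raw))]


def process_to_records(items: List[Item], mapping: Dict[str, str]) -> List[Item]:
    """Processor: apply a source-field -> target-field mapping to each item."""
    return [
        {target: item[source] for source, target in mapping.items()}
        for item in items
    ]


def run_import(
    source: str,
    fetcher: Callable[[str], str],
    parser: Callable[[str], List[Item]],
    processor: Callable[[List[Item], Dict[str, str]], List[Item]],
    mapping: Dict[str, str],
) -> List[Item]:
    """Chain the three stages; none of them knows about the others."""
    return processor(parser(fetcher(source)), mapping)


# Made-up sample data for the sketch.
RAW = "title,when\nBurglary,2009-10-20\nTheft,2009-10-21\n"
records = run_import(
    RAW,
    fetch_from_string,
    parse_csv,
    process_to_records,
    {"title": "name", "when": "date"},
)
```

Replacing `fetch_from_string` with a function that reads a file, or `parse_csv` with an RSS parser, would leave the remaining stages unchanged, which is the kind of reuse the FeedAPI limitations listed earlier made difficult.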

A glance at Feeds' admin page shows the default configurations Feeds ships with. See more screenshots.

Beyond what's already available, there are a couple of interesting potential additions and improvements to Feeds that are planned or that we are very interested in getting in:

What does this mean for FeedAPI?

Aron Novak and I will start to phase out FeedAPI maintenance effective with this blog post. The goal is to find a new lead maintainer by the end of the year and to keep additional features in FeedAPI to a minimum. The same applies to dependent modules like Feed Element Mapper or CSV Parser that we are maintaining.

As far as I can see there is no reason to pull FeedAPI into Drupal 7. It can be completely replaced by Feeds for D7 (of course, this is open source code and anybody is free to step up and upgrade FeedAPI). Feeds supports the #D7CX movement, and there will be a full D7-compatible version of Feeds the day Drupal 7 is released.

A FeedAPI upgrade script to Feeds (#596584) is planned, but we will need serious help with testing and maintaining it. At Development Seed, we do not have many legacy FeedAPI systems that need to be upgraded, so we will have a shortage of "guinea pig" sites for testing corner cases.

Get involved

You will notice that Feeds still carries the 'alpha' label. The main reason for this is that we would like to keep the flexibility for adjusting Feeds to new use cases as early adopters work with it. The overall architecture is solid at this point, and we are using it in a series of production sites including Managing News. It is currently at a level where feature compatibility between minor versions can be guaranteed, but cautious changes to the API may still occur.

I would like to encourage everybody who has a stake in Drupal aggregation or content importing to take a close look at Feeds. This is a module that could make your life much easier. :) Up until the first beta release, API adjustments to accommodate your use cases are still possible, so your contributions and feedback are always welcome. I'll look forward to interacting with everyone more in the queues.

Hook for adding Importer

Hi Alex – great module!

I’ve used it to parse Picasa feeds and present the images, complete with thumbnails, to views. (I tried Brilliant Gallery but the performance wasn’t great and the presentation options were limited.) The results are here if you’re interested.

Anyway, there was one thing I couldn’t figure out. How do I get my module to add an Importer? I tried following the pattern in Feeds Defaults, but none of the hooks I added seemed to get called.

Following

Following feeds_defaults.module’s example is a good idea.

Besides implementing hook_feeds_importer_default(), don’t forget to implement hook_ctools_plugin_api() to let CTools/export know that you provide default hooks.

These hooks should be called when you visit admin/build/feeds

Guinea Pig

I have a site that might make a suitable guinea pig if you want to check it out.

http://www.whopub.com/

Feeds and feedapi together for now?

I’m loving Feeds!!! The wave of the future. Unfortunately, I can’t seem to write the mappers I need and just don’t have enough module development experience to jump in and help :( Wish I did. So I have to wait for some of the stuff to get ported, which I know will take time. I just don’t have the time to wait at the moment ;( I was just about to start using FeedAPI, Mapper, Keyword Filter, and Scraper, and then along came Feeds!! But I really need to get something started in the interim, so…

My question is this: for now, till we get some of the basic mappers like taxonomy mapping, emfield, etc… working, can we run feeds and feedapi together? I tried tonight and it seems they all go into the feed area together and it seems dangerous to run them both at the same time.

I’d still like to play with feeds, but perhaps on a dev site only for now??

You should be able to run

You should be able to run FeedAPI and Feeds side-by-side. The obvious downsides are code bloat and two potentially cron-heavy modules on one site.

Of course you can’t use FeedAPI Mapper or any of its mapping implementations on Feeds.

Mapping issue

I feel a bit stupid because I don’t know how to get the proper source fields when mapping custom RSS. I have a feed that contains 30 custom fields, and when I try to push those to my CCK fields I don’t know how to find the sources. Sorry if this is a stupid question, and if you didn’t guess it yet: yes, I am a newbie Drupalist ;)

Other than that silly issue, I have found Feeds to be awesome, fresh and hip.

- The availability of mapping

- The availability of mapping sources is a matter of the parser you are using. If the fields are custom, they won’t show up. The solution is to write an extension to an existing parser.

- The availability of mapping targets depends on the mappers available to FeedsNodeProcessor. At the moment, there aren’t many, as we’re in the process of porting them from FeedAPI mapper: http://drupal.org/project/issues/feeds – contributions are warmly received : )

Now focusing to write an extension

Thank you for the reply. I guess I have to take the next step in my Drupal experience and start writing an extension. If I can get something proper working I would of course share my results, but I am a bit too new to Drupal, so I guess the only achievement would be a bowl of spaghetti code.

A better generic import tool

There is now a patch in the queue that will make Feeds an even better generic import tool – an interface for different transporters.

Patch: http://drupal.org/node/626352

By default the “direct transporter” is used, which transports parsed items directly to the processor in the same request. But there is also a “batch transporter”, which uses Drupal’s built-in Batch API to split up the import process over several requests.

Soon it will also be possible to implement a “queue transporter” with the help of the awesome backported D6 version of D7’s Queue API – http://drupal.org/project/drupal_queue

I’d love to have some eyes on this patch, because I think this will make Feeds rock the world even more! :)

/Dick Olsson NodeOne – http://nodeone.se

Thank you and a question

This sounds great. I just spent the last month setting up FeedAPI, FeedAPI Node, FeedAPI Inherit, FeedAPI Mapper, FeedAPI Image Grabber, and probably other modules that I can’t remember at the moment, to get everything I wanted and needed to import and display RSS data. The only thing left is to set up DataSync and DataSync FeedAPI. (The creator of that module just told me he doesn’t think my shared hosting account will support DataSync. Crap.)

So, I have a question:

I plan to launch my new site early next year. Do you anticipate by that time that Feeds will have all the functionality of all those modules combined?

Also, I currently have about 30 RSS feeds and I constantly get cron errors. Do you plan on supporting an alternative like DataSync to fetch imports?

Thank you very much for this new module and all the work you have done to make Drupal great!

Scott

Hi Scott, I don’t know your

Hi Scott,

I don’t know your specific requirements. I recommend you check out and test Feeds for your specific use case.

Thank you,

Alex

Thank you so much for this

Thank you so much for this project. The separation of fetcher, parser, and processor has let me stand up a custom XML parser and prototype it for different file locations quickly and easily, and it saved me lots of time by not having to implement my own node creation.

Sounds like Feeds is becoming

Sounds like Feeds is becoming more of a generic import tool, which makes me think of Table Wizard and Migrate. There’s got to be a fair amount of overlap where there might be potential for collaboration.

It's a matter of writing a Feeds processor

Indeed.

It’s a matter of writing a Feeds processor to make Feeds an importer for Table Wizard.

iCal

The iCal parser module already supports Feeds in CVS HEAD, by the way (because it’s so easy; thanks for that, it is really nice).

I’ve seen commits to add field mapping to feeds? So if anyone wants to play with that for the date field before I do :)

w00t!

w00t!

Taxonomy-Mapping-Support

For anybody looking for Feeds support for mapping the taxonomies of feeds to specific taxonomies of the created nodes (tags, categories, ...), I have written a patch.

It’s not really a patch, more of a new additional mapper. Hopefully nobody did that in the meantime; double effort is always a pain :)

The module has a really good API, while the UI still lacks quite a few things.

- You can’t remove mappings from a configuration; for now you have to delete it and start from scratch.

- The approach is flipped: you have configurations in which you set the content type that should be used to create new feeds using this configuration. Compared to FeedAPI, which had other downsides in this area, you don’t have any feed field information (live fetched) while you write your configuration, and you can’t add different mappings to different feeds (or override them) once you have finally created a concrete feed. You have to create a configuration again and again, for every feed, which makes the ability to create different feeds from a specific configuration nearly useless, as you can’t adjust it in any way, not even the mapping.

Thank you, Alex, for the great module. I hope that FeedAPI developers and users are moving to this project without being scared too much.

Differences to Feed Element Mapper

Great to see you chiming in :)

“You can’t remove mappings of a configuration, you yet have to delete it and start from new”

- I cannot confirm this on my installation of the latest alpha. Please file a detailed bug report – thanks.

“you don’t have any Feed-Field-Informations (live fetched) while you write your configuration”

Yes. The big difference between Feed Element Mapper and Feeds is that Feeds does not try to inspect an existing feed and generate mapping sources from that inspection. Feeds asks parsers to declare all possible mapping sources ahead of time, or to leave it up to the user altogether to type in the mapping sources as a simple string (e.g. with the CSV parser you would type in the header fields of your CSV file).

This makes for much simpler code and I’d argue, covers most use cases.

However, there may be situations where parsing the actual feed is indispensable. In these cases it should be possible to ask the user for a feed URL (or feed file) on the settings pages and fetch and parse the feed on the mappings page. I haven’t tried that yet, though.

Regarding per-node (per-import) overrides of mappings: I observed that we almost always used these overrides in a context of very low numbers of feeds. This use case is covered by the ease of creating new configurations in Feeds. However, I may be wrong here, and we may find out that per-node mapping overrides are a must-have. That is a discussion we should have in the issue queue or on IRC.
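The two options described in this reply (a parser declares its mapping sources up front, or leaves them to the user as free-form strings) can be sketched roughly as follows. This is a conceptual Python illustration only, not the Feeds PHP API; the class names, the `mapping_sources` method, the `unknown_sources` helper and the example field names are all assumptions made up for the sketch.

```python
from typing import Dict, List, Optional


class SketchParser:
    """Hypothetical parser base class for this sketch."""

    def mapping_sources(self) -> Optional[List[str]]:
        """Return the declared source names, or None for free-form sources."""
        raise NotImplementedError


class SketchSyndicationParser(SketchParser):
    """Declares every source it can provide ahead of time."""

    def mapping_sources(self) -> Optional[List[str]]:
        return ["title", "description", "url", "timestamp"]


class SketchCSVParser(SketchParser):
    """Declares nothing: the user types in the CSV header names."""

    def mapping_sources(self) -> Optional[List[str]]:
        return None


def unknown_sources(parser: SketchParser, mapping: Dict[str, str]) -> List[str]:
    """List mapping sources the parser did not declare.

    Free-form parsers accept any source name, so nothing is reported for them.
    No feed is fetched or inspected at any point.
    """
    declared = parser.mapping_sources()
    if declared is None:
        return []
    return [source for source in mapping if source not in declared]


# Hypothetical mappings from source names to target field names.
rss_problems = unknown_sources(
    SketchSyndicationParser(),
    {"title": "node_title", "author": "field_author"},
)
csv_problems = unknown_sources(SketchCSVParser(), {"Crime type": "field_type"})
```

Because the configuration never has to fetch a live feed to build the list of sources, the mapping form stays simple; the trade-off, as noted above, is that custom fields only appear if a parser declares them or accepts free-form names.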

non-latin rtl languages?

Alex:

Thank you for this very promising module. What’s the future of the aggregator module that’s in core?

Also, does the Feeds module solve the problem of non-Latin RTL languages in the URL?

Look at http://api.drupal.org/api/function/valid_url/6#comment-200

Feeds does use valid_url() at

Feeds does use valid_url() at some points. So it won’t fix issues that are related to this function.

For version 7, Drupal core

For version 7, Drupal core will continue to ship with the Aggregator module. After that, we will have to see. From a functional perspective, Feeds is definitely in a position to completely replace core Aggregator. I would not favor moving Feeds to core, though. There is no good reason for doing that.

Wow, an excellent sounding

Wow, an excellent sounding progression to Drupal aggregation. We really need to migrate our sites to Feeds to take advantage of the future.

Ken Rickard’s name was

Ken Rickard’s name was misspelled. I just fixed that. Sorry, agent!

nice gift

thanks to dev seed and its clients for sharing your consulting work with the world. these are such great gifts. i’ll take this for a spin. i’m curious how this matches up with migrate module. maybe we can join forces there.

Forking your own module?

Thank you for so much great work.

What I do not understand is your idea of what an API is.

You clutter up the CVS of drupal.org (sorry, that’s the way I see it).

Why do you not offer a smooth path for change: start one or two new branches (hey, you use git), the first one with an interface relying on functions and offering the same functionality as methods of objects with an OO interface.

The second branch is OO only, and over time you let the older branches die in a planned way. All maintainers of integrating modules will be informed about your committed roadmap, so these maintainers can stay up to date with your module and everyone will find a smooth transition.

For questions, give me a ping on IRC; I’m in #drupal.de most of the time (German spoken).

Best Thomas Zahreddin

Drastic changes in paradigms required new project

Hallo Thomas,

Precisely the departure from some of the fundamental architecture decisions in FeedAPI (Feeds are always nodes, baked-in aggregation, items are passed one by one into the processing stage etc.) led me to create a new project instead of creating a FeedAPI 2.

I wanted to avoid any expectations of a feature-equal second version of FeedAPI in favor of a clean slate. Paying down the refactoring debt outlined in the post above would be much more expensive if we took an “anything that FeedAPI does, only better, only more” approach.

In the very sense of keeping an API consistent, I think it was a better decision to mark the change by creating a new project.

Alex

Great Work

I just so happened to be playing with Feeds this morning, before you even posted this, and I’m really impressed. I remember creating feed item nodes and mapping elements being not-so-trivial with FeedAPI + Feed Element Mapper, and I can already see that this is a huge improvement. According to the project page, “it aims for easier usage and better extensibility,” and you definitely nailed it. Nice work.