The latest version of Feeds has realtime web feed subscriptions based on PubSubHubbub (PSHB). Here I explain when this could benefit you and how you can set up your site to aggregate news efficiently in realtime.
The polling problem
If you have ever worked with an RSS or Atom aggregation engine like Drupal core’s Aggregator, SimpleFeed, FeedAPI, or Feeds, you will be very familiar with their common shortcoming: they use the “are we there yet?” approach. They poll a feed every 30 or 60 minutes and see whether there is something new to be imported.
This approach is a waste of resources. Take for instance this blog, which publishes around three posts a week. If you poll it every 30 minutes, you ask it “are we there yet?” 336 times a week in order to get just three positive responses. In other words, you would download around 11MB a week from developmentseed.org to get the 9K of blog posts you care about.
If you aggregate not one blog but 100, you’ll download over a gigabyte of data a week just in polling. What’s worse, once you start adding many feeds to a single system, you run into another common problem with this approach: feed polling can back up. When this happens, your system no longer visits a feed every 30 minutes but every 60 minutes or even less often, so articles sometimes come in hours after they have been posted.
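The arithmetic behind these figures can be sketched in a few lines of Python. The feed size of roughly 33 KB per download is my assumption, derived by dividing the 11MB a week mentioned above by 336 polls; the function names are made up for illustration.

```python
# Back-of-the-envelope polling cost, using the figures from the text.
MINUTES_PER_WEEK = 7 * 24 * 60

# Assumption: about 33 KB per feed download (roughly 11 MB / 336 polls).
ASSUMED_FEED_SIZE_KB = 33


def polls_per_week(interval_minutes: int) -> int:
    """Number of polls in one week at a fixed polling interval."""
    return MINUTES_PER_WEEK // interval_minutes


def weekly_download_mb(feed_size_kb: float, interval_minutes: int, feeds: int = 1) -> float:
    """Data downloaded per week (in MB) just by polling."""
    return polls_per_week(interval_minutes) * feed_size_kb * feeds / 1024


if __name__ == "__main__":
    print(polls_per_week(30))                                     # polls per feed per week
    print(weekly_download_mb(ASSUMED_FEED_SIZE_KB, 30))           # MB per week, one feed
    print(weekly_download_mb(ASSUMED_FEED_SIZE_KB, 30, 100) / 1024)  # GB per week, 100 feeds
```

With a 30 minute interval this gives 336 polls per feed per week, and 100 feeds of that assumed size land just over a gigabyte.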
PubSubHubbub addresses the polling problem. It notifies a feed subscriber when new content is available, thus rendering the “are we there yet?” question virtually obsolete.
Here is how it works. Feeds that support PubSubHubbub include a link tag that specifies a hub that distributes update notifications. See for example the hub link in the Haiti Innovation News feed:
<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" />
Upon pulling this feed for the first time, a subscription request can be sent to the hub specified in this link (“http://pubsubhubbub.appspot.com/”). When new content is added to the feed or when existing content is changed, the hub will send a notification containing the new and changed content to a notification URL specified upon subscription. This is the part of the magic that matters for the subscriber. If you are interested in the details of the publisher and hub side, check out PubSubHubbub’s website.
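To make the subscriber side a little more concrete, here is a minimal Python sketch of the two steps just described: finding the hub link in a feed and building the body of a subscription request. This is not how Feeds implements it. The parameter names (`hub.mode`, `hub.topic`, `hub.callback`, `hub.verify`) follow the PubSubHubbub 0.3 draft as I understand it, and the topic and callback URLs are hypothetical. The sketch only builds the request body; it does not send anything.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

ATOM_NS = "http://www.w3.org/2005/Atom"


def find_hub(feed_xml: str) -> Optional[str]:
    """Return the href of the first Atom link with rel="hub", or None."""
    root = ET.fromstring(feed_xml)
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "hub":
            return link.get("href")
    return None


def subscription_body(topic_url: str, callback_url: str) -> str:
    """Form-encoded body a subscriber would POST to the hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed we want updates for
        "hub.callback": callback_url,  # where the hub should notify us
        "hub.verify": "async",         # verification mode (0.3 draft)
    })


# A tiny stand-in feed containing the hub link shown above.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example</title>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" />
</channel></rss>"""

if __name__ == "__main__":
    print(find_hub(SAMPLE_FEED))
    # Hypothetical topic and callback URLs, for illustration only.
    print(subscription_body("http://example.com/feed", "http://example.com/pshb-callback"))
```

A feed without a hub link makes `find_hub` return `None`, in which case a subscriber would simply keep polling.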
Right now blogs like ReadWriteWeb and Gawker support PubSubHubbub. Google (no surprise, as the PSHB inventors are Google employees) threw its weight behind the protocol by supporting PSHB for all FeedBurner feeds and in Google Buzz. The Bay Area startup Superfeedr offers a blanket subscription service that even enables PSHB notifications for feeds that don’t support PSHB natively.
What about RSSCloud?
PubSubHubbub’s appearance on the scene has brought new life to an older solution for the same problem: RSSCloud. WordPress supports RSSCloud, and there is a heated debate on the virtues of both contenders. The most important difference is that PubSubHubbub delivers the new and changed content itself (called fat pings), whereas RSSCloud sends only a light notification and expects the subscriber to fetch the feed directly from the publisher.
The RSSCloud versus PubSubHubbub debate certainly warrants its own blog post. However, I can’t walk past this without posting the top two items from my wishlist. 1) I would love to see a light ping option in PubSubHubbub for dealing with protected feeds and non-syndication content, and 2) both protocols should be defined, as far as possible, on an HTTP-only level, again to better separate notifications from content.
Adding seamless support for RSSCloud to Feeds at a later point would be straightforward. For now Feeds supports PubSubHubbub as it has the most detailed specification available and a very active community.
Now how do I set up PubSubHubbub with Feeds?
- Get Feeds 1.0 alpha 12 from Drupal.org.
- Go to admin/build/feeds and edit (override) the importer configuration you would like to use for PubSubHubbub.
- Choose the HTTP Fetcher if it is not already selected.
- On the HTTP Fetcher, click on ‘settings’ and check “Use PubSubHubbub”.
Now every time you add a new feed or one of your existing feeds is updated on cron, a subscription will be issued to the hub that is specified in the feed. When there is new or changed content on the feed, the hub will notify Drupal and deliver the new or changed content.
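For readers curious what happens on the receiving end, the following Python sketch shows the two kinds of requests a subscriber endpoint has to answer under the PubSubHubbub spec as I understand it: the hub's verification request (echo back `hub.challenge` to confirm, answer 404 to reject) and the notification itself. It is an illustration under those assumptions, not the code Feeds runs in Drupal. The function names and the sample payload are hypothetical, and the notification parser assumes an Atom payload.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"


def handle_verification(query: Dict[str, str], expected_topic: str) -> Tuple[int, str]:
    """Answer the hub's GET request that confirms a subscription.

    Returns (HTTP status, response body).
    """
    if query.get("hub.mode") == "subscribe" and query.get("hub.topic") == expected_topic:
        # Confirm by echoing the challenge string back to the hub.
        return 200, query.get("hub.challenge", "")
    # We never asked for this subscription: reject it.
    return 404, ""


def handle_notification(body: str) -> List[str]:
    """Extract entry titles from an Atom payload POSTed by the hub."""
    root = ET.fromstring(body)
    return [entry.findtext(f"{ATOM}title", default="") for entry in root.iter(f"{ATOM}entry")]


# Hypothetical notification payload, for illustration only.
SAMPLE_NOTIFICATION = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>'
    "<entry><title>New post</title><id>urn:example:1</id></entry></feed>"
)

if __name__ == "__main__":
    query = {
        "hub.mode": "subscribe",
        "hub.topic": "http://example.com/feed",
        "hub.challenge": "abc123",
    }
    print(handle_verification(query, "http://example.com/feed"))
    print(handle_notification(SAMPLE_NOTIFICATION))
```

Because the notification already carries the new entries, the subscriber can import them without going back to the publisher.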
What’s not yet implemented is an automatic adjustment of the refresh frequency. Say you configure a Feeds importer to update feeds every 30 minutes. If a particular feed is successfully subscribed to a hub, it will continue to download every 30 minutes. This is an issue that will be addressed in a future release of Feeds.
As long as refresh frequency is not automatically adjusted, Feeds’ PubSubHubbub support makes most sense when all feeds that need to be aggregated support notifications through a hub. If this is not the case, a hub like the above-mentioned Superfeedr that offers notifications for any feed is a good alternative. You can use Superfeedr as a designated hub with Feeds by adding the hub’s URL together with the access credentials from your Superfeedr account.
With a designated hub, all subscriptions will be issued to that hub, even if the feed specifies a different hub URL. In these cases you can raise the minimum refresh period of an importer to 4 weeks, so that feeds are polled far less often.
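The precedence rule described here is simple enough to state as code. This is a sketch of the behavior as described in this post, not the Feeds source; `choose_hub` and the example hub URL are hypothetical.

```python
from typing import Optional


def choose_hub(feed_hub: Optional[str], designated_hub: Optional[str]) -> Optional[str]:
    """Pick the hub to subscribe to.

    A designated hub, if configured, always wins over the hub the feed
    advertises. With neither, there is nothing to subscribe to.
    """
    return designated_hub or feed_hub


if __name__ == "__main__":
    # Hypothetical designated hub URL, for illustration only.
    print(choose_hub("http://pubsubhubbub.appspot.com/", "https://hub.example.com/"))
    print(choose_hub("http://pubsubhubbub.appspot.com/", None))
```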
I look forward to your questions and comments. For specific reports on Feeds’ PubSubHubbub integration, I encourage you to post directly on the Feeds issue queue.