Aggregating Things in Drupal 7
Adjusting Goals and Developing a Successor for FeedAPI for Drupal 7
In the last couple of months we’ve been driving efforts to bring Drupal’s core aggregator up to speed so that we could use it to replace FeedAPI in Drupal 7. After trying to push through some of the more extensive changes, I have to recognize that in this case the advantages of working in Drupal core do not outweigh the disadvantages.
A year ago, when we decided to help improve Drupal’s core aggregator to the point where it could replace FeedAPI, my assumption was that the additional effort of working in core would be more than offset by the valuable reviews and input we would get from the Drupal community.
Indeed, we’ve received some great help and I’d like to thank everybody involved. But it became very obvious that we could not gain the momentum I was hoping for. Patches like #303930: pluggable architecture and #293318: feeds as nodes, which I consider central to the effort, were reviewed in depth by only a few people, and overall the patches moved slowly.
I don’t think this lack of support can be blamed on anybody in particular, but I do think it shows that there is some discrepancy between what we are trying to do with aggregation in Drupal – namely building a flexible, extensible aggregation API – and what is reasonably possible within Drupal core.
At the same time as we started investing in the core aggregator, our aggregation requirements escalated. With every new client project we undertook, we contributed critical changes to FeedAPI and launched many new plugins for it. These contributions made FeedAPI’s existing deficiencies clearer and widened the gap that its successor, Drupal 7’s aggregator, would need to close. We are now at a point where we can no longer be confident that the slow-moving and distant prospect of a “Great New Aggregator for Drupal 7” will address our needs. Piling on extensive changes without simultaneous real-world use or proper review is not the recipe for a solid module.
Therefore, we need to adjust our goals. Instead of replacing FeedAPI with Aggregator in Drupal 7, we will aim to develop a successor for FeedAPI in contrib. This approach will give us a chance to let the base of the FeedAPI module mature and give us more leeway for new features and adjustments.
And there is much to do. In the last year, FeedAPI has proven its potential as a universal importing platform, and I think this aspect of it will only become more important. We will need a cleaner and more powerful API, fast and flexible storage for simple feed items, better support for large feeds, and better performance monitoring.
I will share more concrete ideas and ask for your feedback as they take shape. What is certain is that there will be an upgrade path for FeedAPI into Drupal 7 and – of course – an upgrade path for Aggregator into Drupal 7.
This decision was not easy, but I believe that aggregation in Drupal needs – at least – another round in contrib, and I’m excited about the opportunities lying ahead.
+1 for small core
and hoping to see some coordinated product strategy around install profiles.
You should take a look at the
You should take a look at the Views 3 roadmap: it has a pluggable query backend, which will allow Views to work with external RDF data and any other type of data that gets plugged in.
Potentially this could cover any type of XML, and the fields in Views would expose the XML data structure. This could then be “cached” in nodes, and we’d potentially have the same functionality as FeedAPI but with support for any type of XML.
- Sean Bannister
FeedAPI <> Views 3
Sean –
As you may know, there is a big difference between FeedAPI and a pluggable query backend for Views: aggregation.
A query backend that looks into external data sources uses a more complex query interface (i.e. the web API) and leaves storage up to the data source; the only local copy you would probably keep in such a scenario is a relatively “dumb” cache.
With aggregation (FeedAPI), you pull data down to your website and keep a well-structured copy of it (e.g. nodes). You then query this copy for display, and sometimes you even manipulate it (e.g. users tag news items they find important).
In fact, FeedAPI already lets you aggregate not just any kind of XML but any kind of data source on the web, provided you write a parser plugin for it. There are parsers for iCal, CSV, KML etc. – we’ve even built one for email.
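To make the parser-plugin idea concrete, here is a minimal sketch in Python. FeedAPI itself is PHP, so this is illustrative only. The class `CsvParser`, the `MAPPING` table and the `aggregate()` helper are hypothetical names, not FeedAPI's API. A parser turns raw fetched text into normalized items, and a separate mapping step copies parsed elements onto node-style fields.

```python
import csv
import io
from typing import Dict, List

class CsvParser:
    """Hypothetical parser plugin: raw CSV text in, normalized items out."""

    def parse(self, raw: str) -> List[Dict[str, str]]:
        # DictReader uses the first row as the element names.
        return list(csv.DictReader(io.StringIO(raw)))

# Hypothetical mapping of feed elements (keys) to node fields (values).
MAPPING: Dict[str, str] = {"title": "title", "summary": "body", "date": "field_date"}

def map_item(item: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Copy only the mapped elements that are present in the item."""
    return {target: item[source] for source, target in mapping.items() if source in item}

def aggregate(raw: str, parser: CsvParser, mapping: Dict[str, str]) -> List[Dict[str, str]]:
    """Parse a raw document and map every resulting item onto fields."""
    return [map_item(item, mapping) for item in parser.parse(raw)]
```

For example, aggregating the text `"title,summary,date\nHello,First post,2009-06-01\n"` yields one item with `title`, `body` and `field_date` keys. Swapping in another parser class (iCal, KML) would leave the mapping step untouched, which is the point of keeping the two separate.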
That said, the route from source to manipulation through Views could be more direct. You could hide, to some extent, the ‘smart cache’ FeedAPI uses from the user and focus on ‘define source’ -> ‘manipulate data’, and you could support more complex queries for defining what you would like to aggregate. Today users mostly go through ‘build content type for aggregated items’ -> ‘configure FeedAPI’ -> ‘map feed elements to node fields’ -> ‘build feed and aggregate’ -> ‘build display in views’.
This could become a much simpler and more direct workflow in the future.
Flatstore is one element that will bring us closer to this ideal by giving us flexible storage and Views integration on the fly. I’m going to talk about it at DrupalCon Paris, if I get the votes, that is :) I would love to get yours!
- Alex
Awesome, I totally forgot
Awesome, I totally forgot about FeedAPI’s XML support. It’s been about a year since I used it. Guess I got a bit carried away with Views’ new features.
- Sean Bannister
Must say I agree
FeedAPI is (potentially) useful for so much more than integrating RSS (and Atom?) feeds. There is so much out there in non-RSS formats that a generic import tool is needed.
But I also think a generic import (and export) tool belongs in core rather than in contrib. My reasoning is that Drupal admins need a robust, reliable way to export (and import) their content for many reasons, if only so that our content is not trapped within Drupal. As it stands, a Drupal user has no convenient way to work with their data beyond what Drupal allows, and there is a wide range of data migration tasks that are hard to do without a core import/export feature.
Drupal development
As I see it, core should provide a minimalistic solution for the most common problems. I am almost sure that I can point out missing functionality in every core module, like forum, book, dblog, upload, cache, search and even block, but I do not want to add that functionality to core. Every core feature needs to be easily extensible or completely swappable. I think D7 is a big leap in the right direction.
Drupal core and Drupal contrib development are really different, and every Drupal developer should try both to understand where the line between them is, why they differ, and what has to be done in core and in contrib to create a good solution for a feature.
RDF futurism
I am a big fan of FeedAPI.
The integration of some RDF features in core makes me think that FeedAPI’s features are even more crucial. At the same time, RDF forces me to think about Drupal sites as silos and feeds as bridges between them… not the most elegant vision. It’s a little Rube Goldberg-y. I think some fresh ideas will come up in the next few months about what aggregation means for RDF-strong Drupal.
In the meantime, I keep depending on FeedAPI. Thanks for all of your work.
I've long been a fan of smallcore
And a fan of ripping modules out of core. However Dries has repeatedly expressed his desire to ship a Drupal that does most of what people want, out-of-the-box. So we’re not going to get anywhere arguing for ripping aggregator out. Instead, effort should be focused on finding (or creating) common ground between FeedAPI and aggregator. How are feeds parsed? Can FeedAPI’s RSS/RDF/Atom parsers replace the parser in core? Would having this piece in core make maintaining FeedAPI easier down the road?
In the end the world won’t collapse if status quo is maintained. But I am disappointed by the way this has turned out. Thanks for the effort, though. I know what you were facing.
You're kidding, right?
“However Dries has repeatedly expressed his desire to ship a Drupal that does most of what people want, out-of-the-box. “
OK, now you’re kidding me, right? No one – and especially not Dries – has given an ounce of support for my push to try and make Drupal do SOMETHING out of the box. It doesn’t do anything! Aggregator isn’t turned on out of the box! What is the use case for aggregator-available-out-of-the-box?!
See http://groups.drupal.org/node/21013 for my suggestion on what it COULD do out of the box.
Not kidding
Dries and others have indicated that core should have a sensible choice for all the tools that people use to build sites. OK, maybe this is different from “doing something”, but it’s also different from saying “aggregator doesn’t belong in core, let contrib solve that problem”, or “Core should be the lightest framework available with no extras that could be solved in contrib.” The latter is what I would push for: a Linux-kernel type of core, and maybe a maintained CMS distro that resembles today’s Drupal built on top of that kernel.
But back to FeedAPI: what is the API part of it? If there really is an API, it should be possible to express it as OOP interfaces. If that is possible, we could ship core with those interfaces and use a factory to instantiate what is now the aggregator module.
If there is no such API in FeedAPI, maybe that’s the first line of work to be done.
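One possible shape for the “interfaces in core, factory builds the aggregator” idea, sketched in Python purely for illustration. The split into `Fetcher`, `Parser` and `Processor`, the `register` decorator, `create_aggregator()` and the in-memory default classes are all hypothetical; none of them are FeedAPI's or Drupal's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

# Hypothetical component interfaces: the part a core release could ship.
class Fetcher(ABC):
    @abstractmethod
    def fetch(self, url: str) -> str:
        """Return the raw document found at url."""

class Parser(ABC):
    @abstractmethod
    def parse(self, raw: str) -> List[dict]:
        """Turn a raw document into a list of normalized items."""

class Processor(ABC):
    @abstractmethod
    def process(self, items: List[dict]) -> int:
        """Store or act on items; return how many were handled."""

# Available implementations, keyed by component kind and then by name.
REGISTRY: Dict[str, Dict[str, type]] = {"fetcher": {}, "parser": {}, "processor": {}}

def register(kind: str, name: str):
    """Class decorator that records an implementation in REGISTRY."""
    def decorator(cls: type) -> type:
        REGISTRY[kind][name] = cls
        return cls
    return decorator

class Aggregator:
    def __init__(self, fetcher: Fetcher, parser: Parser, processor: Processor):
        self.fetcher = fetcher
        self.parser = parser
        self.processor = processor

    def refresh(self, url: str) -> int:
        # Fetch, parse and process one source in sequence.
        return self.processor.process(self.parser.parse(self.fetcher.fetch(url)))

def create_aggregator(names: Dict[str, str],
                      options: Optional[Dict[str, dict]] = None) -> Aggregator:
    """Factory: instantiate the named implementation of each component."""
    options = options or {}
    parts = {kind: REGISTRY[kind][names[kind]](**options.get(kind, {}))
             for kind in ("fetcher", "parser", "processor")}
    return Aggregator(**parts)

# Simple defaults standing in for "what is now the aggregator module".
@register("fetcher", "memory")
class MemoryFetcher(Fetcher):
    """Stand-in for an HTTP fetcher: serves documents from a dict, no network."""
    def __init__(self, documents: Optional[Dict[str, str]] = None):
        self.documents = documents or {}

    def fetch(self, url: str) -> str:
        return self.documents.get(url, "")

@register("parser", "lines")
class LineParser(Parser):
    """Treat every non-blank line as one item with a title."""
    def parse(self, raw: str) -> List[dict]:
        return [{"title": line.strip()} for line in raw.splitlines() if line.strip()]

@register("processor", "memory")
class MemoryProcessor(Processor):
    """Keep items in a list instead of saving them as nodes."""
    def __init__(self):
        self.items: List[dict] = []

    def process(self, items: List[dict]) -> int:
        self.items.extend(items)
        return len(items)
```

In this design, contrib modules would only add classes under new registry names. For example, a real HTTP fetcher or a node-saving processor could be registered, and a site would choose them by name without core knowing about them in advance.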
Very little can be done with core out of the box
“Dries and others have indicated that core should have a sensible choice for all the tools that people use to build sites.”
So, currently, core is not that thing. Step one, install a dozen contrib modules. Or, rather, step one is get frustrated since you can’t quite build anything with just core. THIS is what I’m ranting against — you can’t have it both ways. Decide on what it does out of the box and MAKE IT DO THAT, as well as giving tools to expand it, or strip the whole thing down.
My challenge to you: what should Drupal do out of the box? What do we support? “Tool that people use to build sites” — what kind of sites? Blogs? Communities? Corporate / more static sites? WHAT IS CORE GOOD AT?
What’s the use case for aggregator? It’s certainly not core functionality that people reach for first, especially at the “low end”. And your comments are completely invalid for large sites, which will do custom builds and look at all of contrib when deciding on a build. So you have no answer for why aggregator should be in core other than that it has always been there.
Preacher. Choir.
I even argue that taxonomy.module doesn’t belong in core.
My vision, which is definitely not about to happen, calls for a stripped-down core that focuses on APIs and tools. Then have package management systems, tools like Contexts, Features, etc., and even installation profiles, and let the Drupal CMS have the same relation to “core” that Ubuntu has to the Linux kernel.
I like the distributions that DevelopmentSeed and Phase2 are making. I like Pressflow. I like Acquia Drupal. All of them would benefit from a core that didn’t have aggregator, because people only use aggregator long enough to realize that it isn’t truly useful, and then they go get FeedAPI. My patch for core for the aggregator would just be to remove it. I’d put it in contrib and see if anybody wants it badly enough to maintain it.
Have it do something
Actually, Acquia Drupal doesn’t really do anything out of the box. It’s a better Lego box, but it doesn’t aim in any one direction.
I would like to see core Drupal do something, to speak to some community of users out of the box, rather than to the “learn Drupal and then you can build anything” community.
With such a plan or set of use cases, we can rationally approach what should and should not be in a core distribution.
Build the bike store!
Interesting thread here. Good hearing multiple viewpoints. The way I see Drupal is like the following example.
You go into a bike shop and you’ve got all kinds of options. You can buy a bike that came from a manufacturer and it’s all put together for you. You look around the shop and on the wall are these cool looking frames. It’s just the frames you see, but you know something’s up.
You think to yourself, “I want one of those bad ass frames so I could build out my bike – then I’d be rockin’”.
Well, we all know what the outcome is. Unless you’re already a good rider, the bad ass bike won’t do much more for you until you can capitalize on it. The key is, you still desire it over whatever your budget dictates.
I like Drupal as the “bad ass frame”. What’s needed is the bike store. A place where the bike has been (or can be) put together for you.
Envision this: distro.drupal.org (geeky), distributions.drupal.org (a bit better) or store.drupal.org (general-public friendly), where you can choose either a) a prebuilt package based on commonly desired features or b) a “custom build” that simply packages your selection up for you.
I wish I could build it now. The mechanics are simple: the selected modules are pulled from their latest stable releases in CVS, combined with core, stripped of their CVS folders and then tarred or zipped.
If people supported the store, it would grow (like Costco, a US bulk store) and benefit the market.
But hey, I’m a dreamer!
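The packaging step in this comment (combine core with the chosen modules, strip the CVS folders, then tar or zip the result) can be sketched with Python's standard `tarfile` module. This sketch assumes core and each module have already been checked out into local directories, because the CVS checkout itself is not shown. It produces only the .tar.gz variant. `build_package` is a hypothetical helper, and placing modules under `sites/all/modules/` follows the usual location for contributed modules.

```python
import tarfile
from pathlib import Path
from typing import Iterable, Optional

def _skip_cvs(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    """tarfile filter: drop any CVS bookkeeping directory and its contents."""
    if "CVS" in Path(info.name).parts:
        return None  # returning None excludes the entry (and, for a directory, its subtree)
    return info

def build_package(core_dir: Path, module_dirs: Iterable[Path], out_file: Path) -> None:
    """Write core plus the given module directories into one .tar.gz archive."""
    with tarfile.open(out_file, "w:gz") as tar:
        tar.add(core_dir, arcname="drupal", filter=_skip_cvs)
        for module in module_dirs:
            tar.add(module, arcname=f"drupal/sites/all/modules/{module.name}",
                    filter=_skip_cvs)
```

A store front end would then only need to turn the user's selection (or a prebuilt package definition) into the list of module directories passed to this function.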
Good move
As an avid user of FeedAPI, I think this is the right move. Aggregation needs will always be too fluid and complicated to hope for a powerful enough aggregator in core.
I agree with Boris, also. Aggregator should be orphaned.
Go smallcore
Onwards to smallcore, I guess. Time to ruthlessly rip out core pieces? What do you think, should we petition to remove aggregator? What about profile? BlogAPI? When is the time to start focusing on the best parts and removing the rest?
I’m almost ready to stick a fork in Drupal OOB and go full on for smallcore. Then we will be forced to have more innovation in install profiles.
Small core + strong install profiles
I’m all for a smaller core.
Long release cycles are stifling modules like forum, poll and aggregator.
At the same time, a sensible “full Drupal” package that addresses use cases like “blog” or “community site” is missing. I agree with Boris that Drupal does not do much out of the box. For example, modules like Views are so essential to building anything that they should actually be packaged with such a “full Drupal”.
So I see a need in both directions: a need for a smaller core, so that different parts of today’s core can move at different speeds, and a need for a “full Drupal” that is more focused on delivering on use cases (whatever we want them to be specifically).
The packaging infrastructure that Eaton brought up would allow us to do this.
On top of enabling us to better compartmentalize core, such packaging infrastructure would make it much easier to share any install profile on d.o., and we would start to really leverage the community’s site building skills. An official “full Drupal” install profile would profit indirectly from such a new playing field of ideas.
I realize that such an approach would mitigate some of the problems I have described in my post above, but it would not solve them instantly. It would not generate a community around Aggregator instantly but it would create the conditions to make it possible.
So what I’d like to ask is:
- Alex
In the budget.
I actually had those items added to the 2009 Budget – see the Drupal.org Development Donations section in the accompanying PDF. These are matching funds, so we need to do fund raising and/or in-kind donations of development work.
My argument, by the way, is not about how MUCH Drupal does out of the box, but rather that there should be some purpose / plan to WHAT it does. Can anyone easily explain the use case of core Drupal? Core Drupal is ______ enabling ______. Not really – it’s a half-assed selection of random default modules with a couple of content types – it’s not a blog, it’s not a static site, etc.
Trade-offs
Unfortunately, I think smallcore will always be a non-starter until the d.o. infrastructure is capable of offering a pre-bundled download for each install profile that’s maintained on the site. As much as I want smallercore, relative newcomers to Drupal are the least likely to walk through the steps to assemble the pieces a profile needs.
What steps are necessary for the packaging system to do that? Are there changes to profiles themselves that would make it possible? (Storing module dependencies in .info files rather than PHP code, for example.) We’ve talked about it for a long time, and argued for the removal of modules for a long time. Perhaps we’ll have to address it from the other direction.
I already identified some of the issues with install profiles
You can find them here: http://drupal.org/node/509404
The very first step is introducing .info files into profiles, which drastically reduces their complexity (you literally remove 3 hooks and replace them with a small text file): http://drupal.org/node/509404
I’m committed to pushing this through for D7, and then we can look at the packaging scripts.
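Once profiles declare their modules in .info files, a packaging script can read the module list without executing any PHP. The sketch below assumes profile .info files use the same `key = value` and `dependencies[] = module` lines that module .info files already use. It handles only that simple subset: nested keys are skipped, and lines starting with `;` are treated as comments. `parse_info` and `contrib_dependencies` are hypothetical names, not Drupal's own parser.

```python
import re
from typing import Dict, List, Set, Union

# Matches "key = value" and "key[] = value"; nested keys are not supported.
_LINE = re.compile(r"^([^=\s\[]+)(\[\])?\s*=\s*(.*?)\s*$")

InfoValue = Union[str, List[str]]

def parse_info(text: str) -> Dict[str, InfoValue]:
    """Parse a simplified .info file into a dict; 'key[]' entries become lists."""
    data: Dict[str, InfoValue] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ';' comments
        match = _LINE.match(line)
        if match is None:
            continue  # ignore syntax this sketch does not handle
        key, is_list, value = match.groups()
        value = value.strip('"')  # allow optionally quoted values
        if is_list:
            data.setdefault(key, []).append(value)
        else:
            data[key] = value
    return data

def contrib_dependencies(info: Dict[str, InfoValue], core_modules: Set[str]) -> List[str]:
    """Return the declared dependencies that do not ship with core."""
    deps = info.get("dependencies", [])
    if isinstance(deps, str):
        deps = [deps]  # tolerate a single non-list entry
    return [d for d in deps if d not in core_modules]
```

For a profile whose .info file lists `dependencies[] = block`, `dependencies[] = views` and `dependencies[] = ctools`, passing a core set containing `block` returns `["views", "ctools"]`. That is the list a packaging script would fetch and bundle alongside core.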
Oh, awesome. I was hunting
Oh, awesome. I was hunting for exactly that issue last night and didn’t find it (probably just late night fogginess). I’d like to jump on board and help with this set of issues; I think it’s critical for Drupal’s long term evolution. See you in the queue!