Importing and Aggregating Stuff with Feeds
Three examples of importing and aggregation in Drupal
Nearly three months ago we released the Feeds module. When we introduced it, I explained that Feeds has a fundamental dual nature as both an import framework and an aggregator. When you think about it, aggregation is nothing other than a scheduled import, so why would you build two different infrastructures for virtually the same functionality?
I'd like to illustrate this point with three examples that show how Feeds can be used to aggregate RSS and import users or nodes from CSV. If you would like to follow these examples, all you need to do is install the latest version of Drupal 6 and the latest version of Feeds 1.0.
1. Aggregate RSS or Atom
This video shows how an RSS feed like the one on our blog can be aggregated using Feeds. It further explains how to adjust the default settings to capture categories on the feed as taxonomy.
2. Import Nodes
This video shows how nodes can be imported into Drupal and how to adjust the default settings to populate CCK fields.
3. Import Users
This video shows how users can be imported into Drupal. It takes a look at how default settings need to be adjusted for different CSV files.
Looking around
If you are looking for import solutions in a migration context, there are alternatives to Feeds such as Migrate, Table wizard with CSV import support, User import, and Node Import.
Migrate and Table wizard enjoy very active development and are used for complex migration tasks. You want to use them if your data needs cleaning and reorganizing before an actual import. I could see Feeds helping these modules by providing a pluggable import functionality (which would be as easy as writing a Feeds processor that populates database tables). To what extent this makes sense is a discussion that I'd be happy to have with the maintainers.
User import is a great module if you are looking for more advanced functionality around importing users (for instance notifying the imported users by email).
Of all the modules that I've mentioned, I see the most overlap with Node import. Both Feeds and Node import aim to offer full-blown import functionality for nodes.
In comparison to the modules mentioned here, Feeds' philosophy is to allow recurring action and pluggability. Feeds does not make any assumptions about whether an import is being used at build time or throughout the lifetime of a site. It provides a common architecture on which the three stages of importing stuff (fetching, parsing, processing) rest, and it allows the site builder to configure these three stages to fit the use case at hand.
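To make the three stages concrete, here is a minimal sketch in Python rather than Drupal's PHP. It is not Feeds' actual API: the names run_import, fetch_inline, parse_csv and process_users are hypothetical and only illustrate how a fetcher, a parser and a processor can be swapped independently.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

# Hypothetical stage signatures; these are NOT Feeds' actual PHP classes.
Fetcher = Callable[[], str]                             # returns the raw document
Parser = Callable[[str], Iterable[Dict[str, str]]]      # raw document -> items
Processor = Callable[[Iterable[Dict[str, str]]], int]   # items -> number stored


def run_import(fetch: Fetcher, parse: Parser, process: Processor) -> int:
    """Run one import: fetch the raw data, parse it into items, process them."""
    return process(parse(fetch()))


def fetch_inline() -> str:
    # Stand-in for a file upload or HTTP fetcher (sample data is made up).
    return "name,mail\nalice,alice@example.com\nbob,bob@example.com\n"


def parse_csv(raw: str) -> List[Dict[str, str]]:
    # Each CSV row becomes one item keyed by the header row.
    return list(csv.DictReader(io.StringIO(raw)))


# In-memory stand-in for the site's user table.
users: Dict[str, Dict[str, str]] = {}


def process_users(items: Iterable[Dict[str, str]]) -> int:
    # Stand-in for a processor that creates or updates users by name.
    count = 0
    for item in items:
        users[item["name"]] = item
        count += 1
    return count


imported = run_import(fetch_inline, parse_csv, process_users)
print(imported, sorted(users))
```

Running the same run_import call on a schedule with an HTTP fetcher and an RSS parser would turn this import into an aggregator, which is the dual nature described at the top of this post.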
Looking forward
In the last two months several developers have gotten onto the Feeds train and have contributed valuable additions and fixes. While we use Feeds in production, we consider it a work in progress, and I encourage interested developers to join in. Here are some of the issues that are high on the priority list:
- Additional mappers like support for Filefield and Imagefield, Date, Emfield, etc.
- Full enclosure support
- Use Batch API
- FeedAPI upgrade script
- Pubsubhubbub and RSSCloud support
- Mapping on import
Is it possible to have a periodic import with a csv file?
Yes, it is.
embedded media field doesn't work
As the title says. I've been reading the feedsapi documentation and watching video tutorials. It seems it is impossible to map an RSS feed field (e.g. a YouTube URL) to an Embedded Media Field (custom CCK field). It is possible to map to other custom fields like “text”.
There is a patch in the queue for this functionality: http://drupal.org/node/623432
about your feeds modules
I was watching a screencast by Sean Effel from Drupaltherapy about FeedAPI and Emfield (http://drupaltherapy.com/screencasts). As I am new to Feeds and hadn't worked with FeedAPI, would it still work with Feeds, and if so, could you do a screencast for Feeds and Emfield? I think your module Feeds is great, by the way. I feel like a bit of an idiot, as I have spent hours trying to work out why the feeds were not posting on the home page. It wasn't until I checked the logs that I realised I had forgotten to take off the Mollom protection for the feed items.
Defining Sources
Are you able to add on to the Sources under Node Mappings so you can parse an XML file that has CCK fields in it and target their CCK counterparts on import?
Have a look at these issues on d.o.:
http://drupal.org/node/662504
http://drupal.org/node/631104
fetching HTML
Alex, I’ve been trying to get Feeds to pull some HTML from a traffic report site down to a node. (http://www.dot.ca.gov/hq/roadinfo/sr17)
Am I correct in thinking that since the data returned is not in XML format, Feeds won’t work for this purpose? I’m evaluating Managing News for use with our transportation folks and looking for a method to create a feed of local highway reports.
thanks,
Peter
Peter: this is the issue that you are looking for: #631104 Extensible XML parser (mapping more sources)
Further, it is really easy to write your own parser for a “custom” format, see the documentation here: http://drupal.org/node/622700
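As a rough idea of what a parser for a "custom" format does, here is a sketch in Python rather than PHP. It is not the Feeds parser API documented at the link above: the class ParagraphParser, the function parse_html and the sample markup are all hypothetical, and the real page will need its own extraction rules. The point is only that a parser turns a raw document into a list of items that a processor can map to node fields.

```python
from html.parser import HTMLParser
from typing import Dict, List


class ParagraphParser(HTMLParser):
    """Collect the text of every <p> element as one item (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[Dict[str, str]] = []
        self._buffer: List[str] = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse whitespace and skip empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append({"description": text})
            self._in_p = False


def parse_html(raw: str) -> List[Dict[str, str]]:
    parser = ParagraphParser()
    parser.feed(raw)
    parser.close()
    return parser.items


# Made-up markup; it does not reproduce the actual traffic report page.
sample = (
    "<html><body><h1>SR 17</h1>"
    "<p>No traffic\n restrictions.</p><p> </p>"
    "<p>Chains <b>required</b>.</p></body></html>"
)
items = parse_html(sample)
print(items)
```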
Great Presentation - But need views integration
Alex,
This was a great presentation.
My needs are slightly different. I want to leave the data on the remote server and map my view to a remote data source and have it shine through whenever the block related to that view is displayed.
Do you have any knowledge of any module work being done in that direction?
We would also like to be able to map CCK to remote data sources so when we create data for that content type, instead of storing it in the Drupal database, it will pass the import to the remote data source. Any ideas for that? Basically offer CRUD functions to remote data sources and have virtual nodes in Drupal.
Thanks for any suggestions.
Views 3 or Feeds in Sync mode
Views 3 will support external data sources (non-RDBMS data sources). Take a look at Jeff’s post from earlier this year where he explains how he used a patched Views 2 to query Flickr directly with Drupal.
Depending on your task, you could also consider running a Feeds importer in a “sync mode” that would only import currently available data from a remote data source and prune all data that is not available anymore. I’ve been thinking about such a sync mode for a while now, and your comment prompted me to actually open a feature request: #661314 “Sync” or “cache” mode.
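To sketch what I mean by "sync", here is the idea in a few lines of Python. This is not Feeds code and the feature does not exist yet: the function sync and the dictionaries keyed by a unique item id are assumptions made purely for illustration.

```python
from typing import Dict, Set, Tuple


def sync(local: Dict[str, dict], remote: Dict[str, dict]) -> Tuple[Set[str], Set[str]]:
    """Make `local` mirror `remote`; return (imported_ids, pruned_ids)."""
    imported = set(remote)
    # Anything stored locally that the remote source no longer offers is pruned.
    pruned = set(local) - imported
    for item_id in pruned:
        del local[item_id]
    # Everything currently available remotely is created or updated.
    local.update(remote)
    return imported, pruned


local_items = {"a": {"title": "Old"}, "b": {"title": "Stale title"}}
remote_items = {"b": {"title": "Fresh title"}, "c": {"title": "New"}}
imported_ids, pruned_ids = sync(local_items, remote_items)
print(sorted(local_items), sorted(pruned_ids))
```

A regular import would only do the update step. The pruning step is what would make the local data behave like a cache of the remote source.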
This is good to know about. I don’t have any immediate uses for the scenarios you explained here, but its versatility in mapping any CCK field opens up a lot of doors. Good presentation, and very easy to follow. Thanks Alex.
Import from XML files
Can we make Feeds grab data from a static file list, instead of an XML feed?
What would I do to set that up?
It's in the works
You’ll have seen in screencasts 2 and 3 how you can upload a CSV file from your local storage to the site and have Feeds import from that file.
But I think what you are asking – correct me if I’m wrong – is if Feeds supports pulling a list of files sitting in a directory. You would point Feeds to a directory and then it would subsequently open each file and import it.
While Feeds does not support this out of the box (yet), it is very close. Defining a source by directory would be as trivial as writing a patch for the existing file fetcher that makes it scan directories for files. If the found files can be concatenated you’re pretty much done. Most likely though, these files will be too different or too numerous to be concatenated.
At this point, you’ll want to batch them.
To do this right, the issue to solve is #600584 Use Batch API. Once Batch API support is in for processing, it can be extended to also support batching on the preceding import stages – fetching and parsing.
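The directory scan plus batching could look roughly like the following Python sketch. It is not the existing file fetcher and not Drupal's Batch API: scan_directory, batches, the "*.csv" pattern and the batch size are hypothetical choices made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List


def scan_directory(directory: str, pattern: str = "*.csv") -> List[Path]:
    """Return the matching files in `directory`, in a stable order."""
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


def batches(files: List[Path], size: int) -> Iterator[List[Path]]:
    """Yield the files in chunks so each request handles a bounded amount of work."""
    for start in range(0, len(files), size):
        yield files[start:start + size]


# Demo with throwaway files; in practice the directory would be configured per source.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        (Path(tmp) / f"part{i}.csv").write_text(f"title\nitem {i}\n")
    (Path(tmp) / "notes.txt").write_text("ignored")
    found = scan_directory(tmp)
    names = [p.name for p in found]
    batch_sizes = [len(batch) for batch in batches(found, size=2)]

print(names, batch_sizes)
```

Each batch would then go through parsing and processing on its own, so neither the number nor the size of the files has to fit into a single page request.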
User Import
Alex, thanks for the kind words about User Import module! My road map (once it’s ported to Drupal 7) is to break it down into more re-usable components, and I’m looking forward to seeing what I can use from Feeds and Migrate.
Points of integration
Hi Robert –
There may indeed be points of integration. Specifically I’m thinking of the fact that Feeds can be used as an API. It may be beneficial for User import to rely on it. This question definitely requires more research. If you have any specific questions around Feeds, hunt me down on IRC.
Alex