We’re currently working on a project to deliver weather information to remote areas that don’t have access to high-speed internet. We plan to do this by exposing weather information as RSS feeds on a central server and then pulling these feeds in over a low-bandwidth GSM connection. GPRS is the service that makes this possible, and it has one major drawback: it only allows data rates between 56 and 114 kbps. In case you’ve forgotten how little that is, the page you’re reading now would take 10 seconds to load at 114 kbps.
At these data rates, you’re counting every character you pull from the remote server. So what are the options to reduce traffic between the server and clients pulling in weather data? I’ve identified four of them:
1) Use Last-Modified Headers
There’s a simple way to avoid downloading the entire feed when it has no new items: check the Last-Modified date in the feed’s HTTP response headers.
This is how the Last-Modified header looks on Development Seed’s blog feed:
# curl -I http://www.developmentseed.org/blog/feed
HTTP/1.1 200 OK
Date: Fri, 09 Jan 2009 17:28:51 GMT
Server: Apache/2.2.9 (Unix) PHP/5.2.6
X-Powered-By: PHP/5.2.6
Set-Cookie: SESScc80b31090254b70d0fcb410a144c9f4=gj6r38c070o84h44att0qsfa86; expires=Sun, 01 Feb 2009 21:02:11 GMT; path=/; domain=.developmentseed.org
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Last-Modified: Fri, 09 Jan 2009 17:28:51 GMT
Cache-Control: store, no-cache, must-revalidate
Cache-Control: post-check=0, pre-check=0
Content-Type: application/rss+xml; charset=utf-8
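You can either download the feed conditionally by sending an If-Modified-Since header (if nothing has changed, the server answers 304 Not Modified without a body), or fetch only the headers with a HEAD request, as curl -I does above, and compare the modified date yourself.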
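A minimal Python sketch of the conditional download, using only the standard library. The function name fetch_if_modified, the 60-second timeout and the injectable opener parameter (there so the function can be exercised without a network) are illustrative assumptions, not part of any existing client. The caller keeps the Last-Modified value from the previous response and sends it back unchanged as If-Modified-Since; a 304 reply means only headers crossed the wire.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple


def fetch_if_modified(
    url: str,
    last_modified: Optional[str] = None,
    opener: Callable = urllib.request.urlopen,
) -> Tuple[Optional[bytes], Optional[str]]:
    """Conditional GET. Returns (body, last_modified); body is None on 304."""
    request = urllib.request.Request(url)
    if last_modified:
        # Send back the server's own Last-Modified value verbatim.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request, timeout=60) as response:
            body = response.read()
            return body, response.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: the feed is unchanged and no body was sent.
            return None, last_modified
        raise
```

Store the returned last_modified value between runs and pass it on the next poll. From the command line, curl’s -z (--time-cond) option sends the same header, for example curl -z "Fri, 09 Jan 2009 17:28:51 GMT" http://www.developmentseed.org/blog/feed.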
2) Only Download What’s New
This is a simple trick you can apply if you control the feed you’re polling (in our case, we do). Pass the feed URL a timestamp of the last time you checked it, and have the application serve only the items added since then. For example, http://www.example.org/feed?last-checked=2009-01-01 will only show items added since the beginning of 2009.
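A sketch of both halves of this trick in Python, assuming the server keeps feed items as dicts with a datetime under a hypothetical "published" key; the last-checked parameter name comes from the example URL above. A date-only timestamp is coarse: items posted on the day of the last check are served again, so a feed might prefer to accept a full timestamp.

```python
from datetime import date, datetime
from urllib.parse import urlencode


def feed_url(base: str, last_checked: date) -> str:
    # Client side: append the date of the previous successful check.
    return base + "?" + urlencode({"last-checked": last_checked.isoformat()})


def items_since(items, last_checked: date):
    # Server side: keep only items published on or after midnight of that date.
    cutoff = datetime.combine(last_checked, datetime.min.time())
    return [item for item in items if item["published"] >= cutoff]
```

With these helpers, feed_url("http://www.example.org/feed", date(2009, 1, 1)) reproduces the example URL from the text.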
3) Compress the Feed
Once the last modified date tells you there’s new data in the feed, don’t just download it: download it compressed. Compression is something both the server and the client need to support. For Apache, that’s mod_deflate, which ships with Apache 2.0, or mod_gzip, which compresses slightly better. Our client for this example is curl, which requests and decompresses a compressed response when called with the --compressed flag.
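On the client side, supporting compression means advertising it with an Accept-Encoding request header and decompressing the response when the server answers with Content-Encoding: gzip, which is what curl’s --compressed flag does for you. Python’s urllib does not decode compressed responses on its own, so a minimal sketch has to do it by hand (the function name and the injectable opener are illustrative assumptions):

```python
import gzip
import urllib.request
from typing import Callable


def fetch_compressed(url: str, opener: Callable = urllib.request.urlopen) -> bytes:
    # Offer gzip only, so the server replies either gzip-encoded or uncompressed.
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with opener(request, timeout=60) as response:
        payload = response.read()
        if response.headers.get("Content-Encoding", "").lower() == "gzip":
            # The server compressed the body (e.g. via mod_deflate); undo it here.
            payload = gzip.decompress(payload)
    return payload
```

The same request can also carry the If-Modified-Since header from the first technique, so an unchanged feed costs only headers and a changed one arrives compressed.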
For the Development Seed blog feed, the uncompressed transfer is 30,734 bytes and the compressed transfer is only 8,517 bytes, roughly 3.6 times smaller:
# curl http://www.developmentseed.org/blog/feed > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 30734 0 30734 0 0 75535 0 --:--:-- --:--:-- --:--:-- 115k
# curl --compressed http://www.developmentseed.org/blog/feed > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 8517 0 8517 0 0 30937 0 --:--:-- --:--:-- --:--:-- 65603
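To put those byte counts in GPRS terms, here is a quick back-of-the-envelope calculation, assuming 1 kbps means 1,000 bits per second and ignoring latency and protocol overhead (so real transfers will be somewhat slower):

```python
# Transfer sizes from the curl runs above, in bytes.
UNCOMPRESSED_BYTES = 30_734
COMPRESSED_BYTES = 8_517


def transfer_seconds(size_bytes: int, rate_kbps: float) -> float:
    # Assumes 1 kbps = 1,000 bits/s; ignores latency and TCP/IP overhead.
    return size_bytes * 8 / (rate_kbps * 1_000)


for rate in (56, 114):
    print(f"{rate} kbps: "
          f"{transfer_seconds(UNCOMPRESSED_BYTES, rate):.1f} s uncompressed, "
          f"{transfer_seconds(COMPRESSED_BYTES, rate):.1f} s compressed")
print(f"ratio: {UNCOMPRESSED_BYTES / COMPRESSED_BYTES:.1f}x")
```

It prints 4.4 s versus 1.2 s at 56 kbps and 2.2 s versus 0.6 s at 114 kbps, a ratio of 3.6x, for every poll that finds new data.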
4) Don’t Poll, Ping
Lastly, don’t poll for new data; instead, have the server ping you when there’s news to fetch. This approach is called publish/subscribe (pub/sub), and while it has never been widely used in news syndication, it’s the perfect solution in low-bandwidth scenarios. Usually the feed provider pings your application with an HTTP GET request that names the feed that has been updated, and your application then goes out and harvests it.
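A minimal sketch of the subscriber side of such a ping, using Python’s built-in http.server. The /ping path, the feed query parameter and the in-memory HARVEST_QUEUE are hypothetical choices for illustration; a real subscriber would then fetch each queued feed, ideally using the Last-Modified and compression techniques above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlsplit

HARVEST_QUEUE = []  # hypothetical: feed URLs waiting to be fetched


def feed_from_ping(path: str) -> Optional[str]:
    # Hypothetical ping format: GET /ping?feed=<URL-encoded feed URL>
    values = parse_qs(urlsplit(path).query).get("feed")
    return values[0] if values else None


class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        feed = feed_from_ping(self.path)
        if feed is None:
            self.send_error(400, "missing feed parameter")
            return
        HARVEST_QUEUE.append(feed)  # harvest later instead of polling
        self.send_response(204)     # acknowledge with an empty response
        self.end_headers()
```

To run it, start the server with HTTPServer(("", 8000), PingHandler).serve_forever(); the provider would then ping a URL such as /ping?feed=http%3A%2F%2Fwww.example.org%2Ffeed on port 8000.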
In our case, the client application is usually offline, but on the upside, it has a GSM modem that the server can reach. So the feed provider sends an SMS message with an update code to the client, and the client dials in and checks the feed that the update code specifies.
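The SMS itself can stay tiny if the server and client share a table of short update codes. The message format ("UPDATE <code>") and the codes and feed URLs below are hypothetical; this is only a sketch of how the client might map an incoming message to the feed it should check after dialing in.

```python
import re
from typing import Optional

# Hypothetical table shared by server and client; short codes keep each
# SMS far below the 160-character limit of a single message.
FEEDS_BY_CODE = {
    "F1": "http://www.example.org/feed/forecast",
    "W1": "http://www.example.org/feed/warnings",
}

# Hypothetical message format: "UPDATE <code>", e.g. "UPDATE W1".
UPDATE_PATTERN = re.compile(r"^UPDATE\s+([A-Z0-9]+)\s*$")


def feed_for_sms(message: str) -> Optional[str]:
    match = UPDATE_PATTERN.match(message.strip().upper())
    if match is None:
        return None  # not an update notification; ignore it
    return FEEDS_BY_CODE.get(match.group(1))  # None for unknown codes
```

Keeping the code table on both ends means the SMS never has to carry a full URL.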
These four methods help us keep the bandwidth we use for feed aggregation to a minimum. If you can think of another way to squeeze a few more bytes out of the process, post a comment; I’d be happy to learn about it.