Development Seed

We Will Geocode Anything

Putting News Stories on the Map

We’ve long wanted our team aggregator and media analyzer Managing News to automatically geotag the news that it tracks. But getting this to happen presented some interesting questions and challenges. What does it mean to put a news story on a map? Should it show where the news is coming from or what part of the world is being talked about?

We decided that in the case of Managing News and the people using it to monitor the news, it’s more important to map what the news is about. We want to be able to show a map that people can look at and immediately know what is being talked about, like in this map showing news tracked about the activities of several key financial institutions.

But this raised more questions. How can you geocode the content of a news item, and could we do this meaningfully? And most importantly, how do we identify locations in a stream of text?

Turns out it can be done, and doing so is simpler than we had anticipated. In my last blog post, I talked about the third party tagging services we’ve been testing and evaluating. tagthe.net is one we’ve been looking into, and it provides a similar service to Yahoo!'s term extraction. The main difference between the two is that tagthe.net tends to return shorter tag names, and it groups these into tags, locations, people, and languages. This classification of terms solved our problem of how to extract locations from text: we simply let tagthe.net do it for us.
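To make the grouping concrete, here is a minimal sketch of pulling only the locations out of a tagthe.net-style response. The sample payload and its field names ("memes", "dimensions", "location") are assumptions modeled on the service's JSON view, not a guaranteed format, and `extract_locations` is a hypothetical helper rather than code from our daemon.

```python
import json

# Hypothetical sample modeled on tagthe.net's JSON view. The field names
# ("memes", "dimensions", "location", ...) are an assumption, not a spec.
SAMPLE_RESPONSE = """
{"memes": [{"dimensions": {
    "topic": ["bank", "bailout", "credit"],
    "location": ["Washington", "London", "Washington"],
    "person": ["Jane Doe"],
    "language": ["english"]
}}]}
"""


def extract_locations(raw_json):
    """Return location terms in first-seen order, without duplicates."""
    data = json.loads(raw_json)
    seen = []
    for meme in data.get("memes", []):
        # Only the "location" group matters here; topics, people and
        # languages are ignored.
        for name in meme.get("dimensions", {}).get("location", []):
            if name not in seen:
                seen.append(name)
    return seen


print(extract_locations(SAMPLE_RESPONSE))
```

The point is only that the classification does the hard part: once terms arrive already sorted into groups, extracting places is a dictionary lookup.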

In Managing News we set this up so that once tagthe.net identifies a location, that information is passed to the GeoNames.org geocoding service, which is a geographical database of information like latitude and longitude coordinates. In Drupal terms, we're geocoding taxonomy terms in this process, which works well with how we want locations to behave in Managing News. Fundamentally, locations in Managing News are a concept with an additional attribute - a spot on the earth somewhere - so taxonomy is well suited to manage them. 
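As a rough illustration of the geocoding step, the sketch below builds a request for the GeoNames JSON search service and reads the coordinates from a reply. It assumes the public `searchJSON` endpoint with its `q`, `maxRows` and `username` parameters; the function names and the trimmed sample reply are ours for illustration, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the public GeoNames JSON search service.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_search_url(place, username):
    """Build a search URL asking GeoNames for the single best match."""
    params = {"q": place, "maxRows": 1, "username": username}
    return GEONAMES_SEARCH + "?" + urlencode(params)


def parse_first_match(raw_json):
    """Return (lat, lon) as floats for the first result, or None."""
    results = json.loads(raw_json).get("geonames", [])
    if not results:
        return None
    # GeoNames returns coordinates as strings, so convert them.
    return float(results[0]["lat"]), float(results[0]["lng"])


# Trimmed, illustrative reply; real replies carry many more fields.
SAMPLE_REPLY = (
    '{"totalResultsCount": 1, "geonames": '
    '[{"name": "Washington", "lat": "38.89511", "lng": "-77.03637"}]}'
)

print(build_search_url("Washington DC", "your_username"))
print(parse_first_match(SAMPLE_REPLY))
```

In Drupal terms, the pair returned by `parse_first_match` is the extra attribute that would be attached to the taxonomy term.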

As Managing News or any website geotags its content, it builds up a vocabulary of places. This vocabulary then becomes a reference for content in the site so we don’t have to use the geocoder as often. For example, we only have to geocode "Washington DC" once. Also, if we don't like how a term has been geocoded ("Georgia", for example, is more likely to be geocoded as the country than as the U.S. state), we can customize that term's location for a particular instance of Managing News.
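The idea of a vocabulary that doubles as a lookup cache, with room for manual corrections, can be sketched in a few lines. `PlaceVocabulary` and the stand-in geocoder below are hypothetical names used for illustration, and the coordinates are rough values; in Managing News this role is played by Drupal taxonomy terms, not a Python class.

```python
class PlaceVocabulary:
    """Remembers geocoded terms so each place name is looked up only once."""

    def __init__(self, geocode):
        self._geocode = geocode  # callable: name -> (lat, lon) or None
        self._terms = {}         # name -> (lat, lon) or None

    def locate(self, name):
        """Return the stored location, calling the geocoder on first use."""
        if name not in self._terms:
            self._terms[name] = self._geocode(name)
        return self._terms[name]

    def override(self, name, lat, lon):
        """Pin a term to a hand-picked location for this site."""
        self._terms[name] = (lat, lon)


# Stand-in geocoder that records how often it is called.
calls = []


def fake_geocode(name):
    calls.append(name)
    rough = {"Washington DC": (38.9, -77.0), "Georgia": (42.0, 43.5)}
    return rough.get(name)


vocab = PlaceVocabulary(fake_geocode)
vocab.locate("Washington DC")
vocab.locate("Washington DC")            # answered from the vocabulary
vocab.override("Georgia", 32.75, -83.5)  # we mean the U.S. state
print(len(calls), vocab.locate("Georgia"))
```

Because the override is stored in the same place as geocoded results, a corrected term never goes back to the geocoder.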

This term extraction and geocoding is all happening outside of Drupal. We're using the Python daemon I described in my last post to contact tagthe.net and GeoNames and link them to Drupal. Each of the third party services we're using has a plugin for our Python program. Once we’ve enabled the tagthe.net and GeoNames plugins, new nodes are first sent to tagthe.net for semantic analysis and then, if locations are returned that haven't already been geocoded, they are sent out to GeoNames for processing.
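The order of operations for a single node looks roughly like this. It is a simplified sketch, not the daemon's actual code: `process_node`, the plugin callables and the stub data are all hypothetical stand-ins, and the real plugins talk to tagthe.net, GeoNames and Drupal rather than to in-memory dictionaries.

```python
def process_node(text, tagger, geocoder, vocabulary):
    """Tag one node's text, then geocode only locations not seen before.

    tagger:     callable, text -> list of location names (tagging plugin)
    geocoder:   callable, name -> (lat, lon) or None (geocoding plugin)
    vocabulary: dict of already-geocoded names, updated in place
    """
    located = {}
    for name in tagger(text):
        if name not in vocabulary:
            # Only new place names are sent to the geocoding plugin.
            vocabulary[name] = geocoder(name)
        if vocabulary[name] is not None:
            located[name] = vocabulary[name]
    return located


# Stub plugins standing in for the real services.
def stub_tagger(text):
    known = ["London", "Washington"]
    return [place for place in known if place in text]


lookups = []


def stub_geocoder(name):
    lookups.append(name)
    return {"London": (51.5, -0.13), "Washington": (38.9, -77.0)}.get(name)


vocabulary = {"London": (51.5, -0.13)}  # geocoded for an earlier node
result = process_node(
    "Banks in London and Washington react",
    stub_tagger, stub_geocoder, vocabulary,
)
print(result)
print(lookups)
```

Passing the plugins in as plain callables keeps each service swappable, which matches how we enable or disable plugins in the daemon.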

The combination of these services has made it possible to display data in some very interesting ways. We’re still looking into how to best take advantage of the data and display it visually, but immediately we were able to generate some nice maps like the one shown above. It will be fun to figure out what else we can do.

Also, I wanted to say that we’re not the only ones using Python with Drupal to get information like this. Boris Mann posted last month about a project that’s looking to re-implement Drupal in Python. Is anybody else out there working with Drupal and Python, or any other languages (perhaps Java)?

Aggregation

Hi -- just to say that this is an interesting area.

For an interesting use of geotagging/mapping, look at www.misdaadkaart.nl -- an interactive map of all crime in the Netherlands, built by scanning police press releases.

--------
For aggregation, I generally find Google Reader to be progressing very quickly.

--- Suggested features:
1) Integration with the JRC's Media Monitor

Syntax:
http://press.jrc.it/rss?type=search&language=all&mode=quick&all=drogensucht

Instructions

--> change the string subsequent to 'all=' to add a search string

--> to filter news in a specific language, change the language parameter (=all) to language (=en, =pl, etc.). For example: http://press.jrc.it/rss?type=search&language=fr&mode=quick&all=darfur
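A small sketch of the URL pattern described above, for anyone who wants to generate these feeds from code. The function name is made up for illustration, and only the four parameters shown in the example URLs are assumed to exist.

```python
from urllib.parse import urlencode

BASE = "http://press.jrc.it/rss"


def media_monitor_feed(query, language="all"):
    """Build a search feed URL; language is "all" or a code like "en"."""
    params = {
        "type": "search",
        "language": language,
        "mode": "quick",
        "all": query,  # the search string goes after 'all='
    }
    return BASE + "?" + urlencode(params)


print(media_monitor_feed("darfur", language="fr"))
```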

Geotag.. geocoding..

Am a lil confused with these terms.. I think I've to come back again in search of these terms..! (* am not updating myself recently!)

GeoParsing

There are several other tools available, and as you point out, it's not that difficult to build a geoparser.

MetaCarta is a company that specializes in this, and they have a beta RSS-to-GeoRSS service that adds tags for all locations in a post, not just a single one, though there is no demarcation between the aspects of the article (byline location, about location, etc.):
http://labs.metacarta.com/rss-geotagger/

It does get difficult if there are several locations in a news story. And how are the different locations related to one another?

At Mapufacture (http://mapufacture.com) we're building a geospatial aggregator for news, user-generated content, and other geospatial data, and making it available to users through their browser, mobile phone, GPS, and even paper.

Thanks for the links!

I had not seen MetaCarta's services before, and what you are doing at mapufacture is also very interesting. I'll need to take a good look at both.

-jeff

Yummy!

How do I get my hands on these goodies?

Hi Robert,

Development on these tools is currently pretty rapid and a lot of improvement is happening. We've got a plan for how we'd like to get this stuff into the larger community, and we'll let you know when that starts happening.

-jeff

I'll be keeping an eye out also.

I'll be keeping an eye out also. I have two projects in mind that this would be perfect for, especially with the mapping & newsletter options.

One is long term and open-ended, but the other's prime marketability has a limited time frame. I had hoped to start the second at the beginning of the year with BuzzMonitor, but this would be even better.

The mapping tool had a Drupal module

Oops, I forgot to mention that the DIY mapping tool I mentioned has a Drupal module that may help in its use. It is kind of new though, and I myself haven't had a chance to use it.

http://drupal.org/project/diymap

A mapping tool that might be of interest.

Love where Managing News is going (BuzzMonitor is pretty cool too).

Just wanted to pass along a mapping tool that I thought you might find interesting:

http://backspace.com/mapapp/

A clickable, zooming map written in Flash and colored by data from an external text file.

The external data file makes it easy to customize and update state colors, add points, and use the same Flash file many times in the same Web page with different data sets.

ARG - I hit the info on formatting options link below and lost my original msg

in above text

Agaric Design apologizes for the technical difficulties, let's try those middle paragraphs without an unclosed link tag:

As for Drupal and other languages: SocialWay, a site for sharing stuff (and saving the environment and building community at the same time), has just been open-sourced and turned over to non-profit management by the Center for Information Awareness, which runs COA News.

SocialWay is written in Java and enhancing it with location-specific (independent) news would be awesome.

If Development Seed or other folks are interested, do contact Agaric Design or Steve Anderson.

Geocoding taxonomy

How do you store the geocoding of taxonomy terms?

For WSF2008.net (the world social forum this year is local calls to action all over the place) Agaric Design added a dead-simple addition of latitude and longitude (and a field intended for radius that devolved into a Google Maps zoom). It's called the Taxonomy Location module and it's in the new, in-development Place module, which is doing some other taxonomy and place stuff.

As for Drupal and other languages: SocialWay, a site for sharing stuff (and saving the environment and building community at the same time), has just been open-sourced and turned over to non-profit management by the Center for Information Awareness, which runs COA News.

It's written in Java and enhancing it with location-specific (independent) news would be awesome.

If Development Seed or other folks are interested, do contact Agaric Design or Steve Anderson.

For myself I am very interested in everything Development Seed is doing with Managing News. Keep up the updates!