Adding Custom Geo Tagging to Managing News

Making it easy to import a custom geotaxonomy

Managing News can geotag items using its default database of 790 locations. But it’s also highly extensible, allowing users to import any custom geotaxonomy to tag any location in the world. In my screencast on customizing the maps and geotagging behavior of Managing News, I imported a CSV file of placenames and coordinates from around Washington, DC. Here I want to show how I prepared that file using Quantum GIS, OpenOffice, and data from the DC government.

The geotagger in Managing News works by performing a simple search through the text of each story it aggregates. It checks each word against the placenames in the database of geographic locations. If it finds a match, it tags the story with that placename. Each placename has a set of coordinates, which are used to plot the story on the map. My task was to replace the default geographic data that ships with Managing News with a custom data set tailored to Washington, DC.
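To make the matching idea concrete, here is a minimal Python sketch of this kind of lookup. It is not the Managing News implementation: the `PLACES` table, its approximate coordinates, and the `geotag` function are all hypothetical, and the sketch assumes that a multi-word placename should match as a whole phrase.

```python
import re

# Hypothetical gazetteer: placename -> (lon, lat).
# The coordinates are approximate, illustrative values only.
PLACES = {
    "dupont circle": (-77.0434, 38.9097),
    "foggy bottom": (-77.0502, 38.9009),
}

def geotag(text, places):
    """Return (name, lon, lat) for every placename found in the text."""
    # Lowercase the text and reduce it to words separated by single spaces,
    # so punctuation and capitalization do not block a match.
    words = re.findall(r"[a-z0-9']+", text.lower())
    padded = " " + " ".join(words) + " "
    tags = []
    for name, (lon, lat) in places.items():
        # Padding with spaces makes this a whole-word (whole-phrase) match.
        if f" {name} " in padded:
            tags.append((name, lon, lat))
    return tags
```

Calling `geotag("Water main break near Dupont Circle.", PLACES)` would tag the story with the Dupont Circle entry and its coordinates, which is the pair a map would need to plot it.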

The DC government has a vast catalog of open data and much of it is geographic. I found a shapefile called Metro Stations - DC Only, which contains a named point for each metro station in DC. Since many stories refer to locations by the nearest metro station, this data is perfect for use with Managing News.

Managing News requires geographic latitude and longitude points in decimal degrees. Many shapefiles already store their coordinates this way (using WGS 84), but the shapefile from DCGIS does not. We need to reproject the data, and here is one way to do it:

  1. Open the shapefile in Quantum GIS.
  2. Right click on the layer and select Save as shapefile.
  3. Create a new directory for the files, provide a name, and click Save.
  4. Find and select the WGS 84 (EPSG 4326) projection.
  5. Click OK.

Now, we’ll open the new shapefile and copy the data over to OpenOffice.

  1. Start a New Project in Quantum GIS.
  2. Open the new shapefile you created during the last step.
  3. Go to View > Select Features to make the select tool active.
  4. Drag to select all of the points and go to Edit > Copy Features.
  5. Open a new spreadsheet in OpenOffice.
  6. Paste the data into the spreadsheet.
  7. Save the spreadsheet as a CSV file.

The CSV contains several columns, but only two are important for Managing News. The first is wkt_geom, which has the lat/lon coordinates for the point. The second is the name column. Managing News will simply ignore the other columns, but I removed them to clean up the file a little.

We now have a CSV file, but there’s a problem. The coordinates are currently in WKT format, but Managing News requires that we split the latitude and longitude into separate columns. For small datasets this can be done by hand, but that’s not feasible for larger amounts of data.

We can automate this process using regular expressions, but it can get tricky. I used the Find & Replace feature of TextMate for Mac OS X, which supports regular expressions. There are similar tools for other platforms (sed, vim) that can do this as well. Here are the steps I used:

  1. Open the CSV file in TextMate.
  2. Replace the first line of the file with the following text:
    "LON","LAT","NAME"
  3. Open Find.
  4. In the find field enter this regular expression for TextMate or this one for sed and vim.
  5. In the replace field enter the following text:
    "$1","$2"
  6. Check the Regular Expression box.
  7. Click Replace All and save the file.
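For readers without a regex-capable editor, the same split could be sketched in Python. This is an alternative to the TextMate steps, not the expression I used there. It assumes the WKT values look like `POINT(-77.0434 38.9097)` with longitude first, and that the name column's header is `NAME`; both are assumptions, so adjust them to match your file. The function name `split_wkt` is made up for this example.

```python
import csv
import re

# Assumed WKT shape: POINT(<lon> <lat>), with optional spaces and minus signs.
POINT_RE = re.compile(
    r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
)

def split_wkt(src, dst, name_column="NAME"):
    """Read rows with a wkt_geom column and write LON, LAT, NAME rows."""
    reader = csv.DictReader(src)
    # Quote every field to match the "LON","LAT","NAME" header style.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["LON", "LAT", "NAME"])
    for row in reader:
        match = POINT_RE.search(row["wkt_geom"])
        if match is None:
            continue  # skip rows whose geometry is not a simple point
        lon, lat = match.groups()
        writer.writerow([lon, lat, row[name_column]])
```

To use it on files, open the input and output with `newline=""` and pass the two file objects, for example `split_wkt(open("stations.csv", newline=""), open("stations_latlon.csv", "w", newline=""))`, where both file names are placeholders.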

The file is now ready to be imported into Managing News as shown in the screencast.

May 12 2010
Posted in MapBox.