Parsing and Storing Data from SMS Messages

SMS is an excellent way to collect data from remote areas where internet connectivity is sparse. Drupal’s SMS Framework makes it possible to create an intelligent message parser, so the incoming data can be processed and stored in a useful manner.

The Scenario

A farmer in Tanzania is part of a data collection project for an NGO. She reports the prices she gets for her chickens and goats over SMS each time she goes to market. The NGO needs to extract the location and the prices from the text message and store this data so it can be analyzed, displayed, and exported as needed.

Photo by Erwin Bolwidt (El Rabbit)

It’s important that the parsing process be forgiving. Common typos or use of shorthand should not cause the parse to fail. Even though they vary slightly, each of the example messages below will trigger the same behavior: a node will be created and tagged with the ‘NW Dar es Salaam’ taxonomy term. The goat field will be populated with 45.09 and the chicken field will be populated with 3.32.

Example #1: “NW Dar es Salaam” Goats: $45.09 Chickens: $3.32
Example #2: “NW Dar es Salaam” g:45.09 c: 3.32
Example #3: “NW Dar es Salaam” chicken:3.32 goat:45.09

Using Drupal, the SMS Framework, and some knowledge of regular expressions, any organization can start collecting and compiling bits of data like this from remote locations.

The Code Walk Through

Our code will implement a Drupal hook called hook_sms_incoming(). The SMS Framework will call this function when a new message arrives.

We start by checking the value of $op to make sure that it is ‘process’. Other possible values include ‘pre process’ and ‘post process’, which are used to execute actions before or after processing occurs.

We want to clean up the message by forcing each character to lowercase and by trimming the whitespace off the beginning and the end of the string.

$message = strtolower($message);
$message = trim($message);
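The example messages in this post are typeset with curly quotation marks, and some phones may send them as well. The location pattern used in the next step only recognizes straight quotes, so one option is to fold typographic quotes into ASCII during cleanup. This is a minimal sketch that assumes the message arrives as UTF-8; mymodule_normalize_message() is a hypothetical helper, not part of the SMS Framework.

```php
<?php
// Hypothetical helper for illustration; not part of the SMS Framework.
function mymodule_normalize_message($message) {
  // Map typographic quotes (given as UTF-8 byte sequences) to ASCII quotes.
  $quotes = array(
    "\xE2\x80\x9C" => '"', // left double quotation mark
    "\xE2\x80\x9D" => '"', // right double quotation mark
    "\xE2\x80\x98" => "'", // left single quotation mark
    "\xE2\x80\x99" => "'", // right single quotation mark
  );
  $message = strtr($message, $quotes);

  // Same cleanup as above: lowercase, then trim surrounding whitespace.
  return trim(strtolower($message));
}

$clean = mymodule_normalize_message("  \xE2\x80\x9CNW Dar es Salaam\xE2\x80\x9D g:45.09 c: 3.32 ");
```

With this in place, the rest of the parser only has to deal with straight quotes and lowercase text.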

We then do our first regular expression match, which gives us the location that the report relates to. The value of $matches[1] is our location. The location is then removed from $message so we can continue processing without it getting in the way. We reset $matches so we can use it again.

if (preg_match('/["\'](.+)["\']/', $message, $matches)) {
  $location = $matches[1];
  $prices = trim(str_replace($matches[0], '', $message));
  $matches = array();
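To see what this step produces on its own, the same logic can be wrapped in a small function and run outside Drupal. This is only a sketch: mymodule_extract_location() is a hypothetical name, and the input is assumed to be already lowercased and trimmed.

```php
<?php
// Hypothetical helper for illustration. Returns array($location, $rest),
// or NULL when the message contains no quoted location.
function mymodule_extract_location($message) {
  $matches = array();
  if (preg_match('/["\'](.+)["\']/', $message, $matches)) {
    // $matches[0] is the quoted text including the quotes;
    // $matches[1] is the text between the quotes.
    $rest = trim(str_replace($matches[0], '', $message));
    return array($matches[1], $rest);
  }
  return NULL;
}

list($location, $prices) = mymodule_extract_location('"nw dar es salaam" g:45.09 c: 3.32');
// $location is 'nw dar es salaam' and $prices is 'g:45.09 c: 3.32'.
```

Note that (.+) is greedy, so a stray quote or apostrophe later in the message would stretch the match up to that last quote character.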

Our next regular expression finds any text that looks like “key:value”. We then loop through the results and add each value to a new node object.

if (preg_match_all('/(\w+):\W*([\d.]+)/', $prices, $matches, PREG_SET_ORDER)) {
  $node = new StdClass();
  foreach ($matches as $match) {
    foreach ($match as $key => $value) {
      if ($key == 1) {
        // First capture group: the key, e.g. "goat".
        $field = 'field_' . $value;
      }
      elseif ($key == 2) {
        // Second capture group: the price. CCK keeps field data under 'value'.
        $node->{$field}[0]['value'] = $value;
      }
    }
  }
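As written, the loop builds the field name directly from the key in the message, so “goats”, “g”, and “goat” would end up in three different fields. One way to make the three example messages land in the same fields is to pass each key through an alias map first. The sketch below is an assumption about how that could look, not part of the original module: mymodule_parse_prices() and the alias list are hypothetical, and the function returns a plain array so it can be run without Drupal.

```php
<?php
// Hypothetical helper for illustration. Maps shorthand and plural keys to a
// canonical field name and returns array('field_goat' => '45.09', ...).
function mymodule_parse_prices($prices) {
  // Assumed alias list; extend it with whatever shorthand reporters use.
  $aliases = array(
    'g' => 'goat', 'goat' => 'goat', 'goats' => 'goat',
    'c' => 'chicken', 'chicken' => 'chicken', 'chickens' => 'chicken',
  );

  $fields = array();
  $matches = array();
  if (preg_match_all('/(\w+):\W*([\d.]+)/', $prices, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
      // $match[1] is the key and $match[2] is the price, both as strings.
      $key = $match[1];
      if (isset($aliases[$key])) {
        $fields['field_' . $aliases[$key]] = $match[2];
      }
    }
  }
  return $fields;
}

$a = mymodule_parse_prices('goats: $45.09 chickens: $3.32');
$b = mymodule_parse_prices('g:45.09 c: 3.32');
$c = mymodule_parse_prices('chicken:3.32 goat:45.09');
// All three arrays hold the same field/value pairs, in message order.
```

Keys that are not in the map are skipped here; logging them instead would make it easier to spot new shorthand that reporters start using.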

Finally, we look up the taxonomy term for the location that we extracted at the beginning of the process. We define the type and title for the new node and save it.

if ($term = taxonomy_get_term_by_name($location)) {
  $term = $term[0];
  $node->type = 'report';
  $node->title = 'Report from ' . $term->name;
  $node->taxonomy[$term->tid] = $term;
  node_save($node);
}

Here is the complete example code:

function mymodule_sms_incoming($op, $number, $message) {
  if ($op == 'process') {
    $message = strtolower($message);
    $message = trim($message);

    $matches = array();
    
    if (preg_match('/["\'](.+)["\']/', $message, $matches)) {
      $location = $matches[1];
      $prices = trim(str_replace($matches[0], '', $message));
      $matches = array();
      
      if (preg_match_all('/(\w+):\W*([\d.]+)/', $prices, $matches, PREG_SET_ORDER)) {
        $node = new StdClass();
        foreach ($matches as $match) {
          foreach ($match as $key => $value) {
            if ($key == 1) {
              $field = 'field_' . $value;
            }
            elseif ($key == 2) {
              $node->{$field}[0]['value'] = $value;
            }
          }
        }
        
        if ($term = taxonomy_get_term_by_name($location)) {
          $term = $term[0];
          $node->type = 'report';
          $node->title = 'Report from ' . $term->name;
          $node->taxonomy[$term->tid] = $term;
          node_save($node);
        }
      }
    }
  }
}
