Over the last few weeks, we worked closely with the Humanitarian OpenStreetMap Team and Azavea to come up with a map completeness estimation model for building data in OpenStreetMap. This is directly used in the pilot of HOT Analytics for Health.
One of the most important analytics questions we’ve come across in OpenStreetMap is how complete the map is in a given area. Completeness can be defined in several ways: it depends on the context, the attributes, and above all on how someone plans to use the data.
HOT has been working with partners like the Clinton Health Access Initiative (CHAI) and the Botswana Ministry of Health and Wellness on malaria elimination. The experts use OpenStreetMap data to plan spraying campaigns, distribute prevention supplies, and prioritize areas for the mission. These campaigns are only effective if they can identify where living structures such as buildings are located; even a small gap could undermine the effort to tackle the growth and spread of mosquitoes. This is why we built HOT Analytics for Health.
HOT Analytics for Health uses machine learning to build a model that relates WorldPop population density in a given geography to the distribution of built-up area mapped in OpenStreetMap. The model takes the population density distribution per 100 m × 100 m cell, rasterizes the OpenStreetMap building data, and compares the two to estimate building completeness in OSM.
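To make the comparison concrete, here is a minimal sketch of the idea, not the model used in HOT Analytics for Health. It assumes the population raster and the rasterized building footprints are already aligned on the same grid, and it swaps the machine learning model for a simple least-squares fit on cells that are assumed to be fully mapped. The function names, the `reference_mask` input, and the sample numbers are all hypothetical.

```python
import numpy as np


def fit_area_per_person(population, built_area, reference_mask):
    """Least-squares slope (through the origin) of built-up area against
    population, using only cells assumed to be fully mapped (hypothetical)."""
    pop = population[reference_mask].astype(float)
    area = built_area[reference_mask].astype(float)
    denom = np.sum(pop * pop)
    if denom == 0:
        raise ValueError("reference cells contain no population")
    return float(np.sum(pop * area) / denom)


def completeness(population, built_area, area_per_person):
    """Mapped built-up area divided by expected built-up area per cell,
    clipped to [0, 1]. Cells where no buildings are expected stay NaN."""
    expected = population.astype(float) * area_per_person
    ratio = np.full(expected.shape, np.nan)
    np.divide(built_area, expected, out=ratio, where=expected > 0)
    return np.clip(ratio, 0.0, 1.0)


# Illustrative 2x2 grid: people per cell and mapped building area (m^2) per cell.
population = np.array([[10, 20], [0, 40]])
built_area = np.array([[500, 1000], [0, 500]])
# Assumption: the top row is known to be well mapped.
reference_mask = np.array([[True, True], [False, False]])

area_per_person = fit_area_per_person(population, built_area, reference_mask)
estimate = completeness(population, built_area, area_per_person)
print(area_per_person)
print(estimate)
```

In this toy grid the bottom-right cell holds 40 people but only a quarter of the building area the fit would expect, so it is flagged as likely under-mapped. The unpopulated cell is left as NaN rather than scored.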
Together with attribute and temporal accuracy analysis, the results are promising. Here, for example, is Gaborone, the capital city of Botswana. You can see how recent mapping campaigns have increased building data coverage in the city centre, and how coverage falls off as you move out to the suburbs.
Currently we use OSM QA Tile country extracts, and the data around tile boundaries can lead to under- or overestimation. WorldPop has several gaps, and comparing it directly to buildings could result in sparse estimates in some parts. The tool alone is not sufficient for accurate measurements, but the analysis presents indicative results and a good direction for analytics tools.
This is a work in progress, and we are still experimenting with different methods of understanding completeness. Take a look at the code repositories, and let us know what you think!