Opening Data on 7,500 Higher Education Institutions and Almost 14,000 U.S. School Districts

The Federal Education Budget Project at the New America Foundation just relaunched its open data site EdBudgetProject.org, adding new mapping and comparison tools for more than 65 indicators about the 14,000 public school districts in the United States and an entirely new section monitoring 7,500 higher education institutions.

The site features fast auto-complete search that lets you quickly find any school district or higher education institution right from the homepage. Each of the two datasets includes state rankings, showing at a glance how states compare. The maps on the site have also been enhanced with choropleth displays, improved interactions, and more appealing base maps.

The new higher education dataset includes information about federal grants and loans (such as Pell Grants and Work-Study), graduation rates, and tuition and fees. This information is available both for individual institutions and in aggregate at the state level.

The new iteration of the site also features updated data browsing and mapping tools. The site is intended to allow natural exploration of the data. As you view information about a school or state, nearly any indicator can be used as the basis of a comparison that visualizes how that school or state compares against similar ones. You can also dig down into fairly complex comparisons using the site’s graphing tools.

We also completely overhauled the administrative backend of the site. All of the data in the site is managed externally, with the bulk of it imported in batches. One issue with the previous implementation was that the site’s database could (though it was not likely) end up with incompletely imported data if a page was requested while data was being added or if the import failed. Additionally, it wasn’t possible to review the imported data before it became available to the general public, so any formatting or precision issues were immediately visible and required a re-import or database rollback to fix.

With the new version of the site, we implemented a versioned dataset manager. Each of the four datasets in the site (public schools state level, public school districts, higher education state, higher education institutions) can have any number of versions of data in the site at a time. Each dataset always has one version set to active, and it’s possible to set another to ‘preview’ so that it’s visible to administrators only.
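
To make the active/preview idea concrete, here is a minimal in-memory sketch in TypeScript. It is not the site's actual code: the `VersionRegistry` class, its method names, and the dataset identifiers are all hypothetical, and a real implementation would presumably persist this state in the database rather than in memory.

```typescript
// Hypothetical sketch of a versioned dataset registry. All names here are
// illustrative assumptions, not taken from the site's code.
type DatasetName =
  | "k12-state"
  | "k12-district"
  | "highered-state"
  | "highered-institution";

interface VersionSlots {
  versions: Set<string>; // every fully imported version id
  active?: string;       // the version served to the public
  preview?: string;      // the version served to administrators only
}

class VersionRegistry {
  private slots = new Map<DatasetName, VersionSlots>();

  private get(name: DatasetName): VersionSlots {
    let s = this.slots.get(name);
    if (!s) {
      s = { versions: new Set<string>() };
      this.slots.set(name, s);
    }
    return s;
  }

  private requireVersion(name: DatasetName, id: string): VersionSlots {
    const s = this.get(name);
    if (!s.versions.has(id)) {
      throw new Error(`Unknown version "${id}" for dataset "${name}"`);
    }
    return s;
  }

  // Register a version once its import has finished. It stays invisible
  // until it is explicitly previewed or activated.
  addVersion(name: DatasetName, id: string): void {
    this.get(name).versions.add(id);
  }

  // Make a version visible to administrators only.
  setPreview(name: DatasetName, id: string): void {
    this.requireVersion(name, id).preview = id;
  }

  // Promote a version to the public; clear the preview slot if it was the
  // version being promoted.
  activate(name: DatasetName, id: string): void {
    const s = this.requireVersion(name, id);
    s.active = id;
    if (s.preview === id) s.preview = undefined;
  }

  // Pick the version a request should read from.
  resolve(name: DatasetName, isAdmin: boolean): string | undefined {
    const s = this.get(name);
    if (isAdmin && s.preview !== undefined) return s.preview;
    return s.active;
  }
}
```

Because a version only becomes visible through `setPreview` or `activate`, a failed or half-finished import never reaches public pages in this sketch, which is the problem the previous implementation had.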

Metadata about each dataset is managed in a similar way. For every indicator in a dataset there is a set of details that the site needs to know, from things like a title and description to the required formatting. This information is captured in a schema document. In exactly the same manner as the data itself, these schema documents can have any number of versions available in the site, and a particular one is set as active for each of the four datasets. Previewing also works for schemas, allowing administrators to verify that metadata and formatting changes are correct before a schema goes live. This flexibility in managing and previewing datasets and their related schema information makes administering the site a more reliable experience.
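
As an illustration of what a schema document might hold, the TypeScript sketch below describes each indicator's title, description, and formatting, and uses that description to render raw values. The field names, the three format kinds, and the two sample indicators are assumptions made for this example, not the site's real schema.

```typescript
// Hypothetical schema shapes. Field names are assumptions for illustration.
type Format =
  | { kind: "number"; decimals: number }
  | { kind: "percent"; decimals: number }   // value stored as a fraction, e.g. 0.25
  | { kind: "currency"; decimals: number }; // US dollars

interface IndicatorSchema {
  key: string;
  title: string;
  description: string;
  format: Format;
}

interface SchemaDocument {
  version: string;
  indicators: IndicatorSchema[];
}

// Insert thousands separators into a string produced by Number#toFixed.
function groupThousands(fixed: string): string {
  const parts = fixed.split(".");
  const whole = (parts[0] ?? "").replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return parts.length > 1 ? `${whole}.${parts[1]}` : whole;
}

// Render a raw value according to the formatting the schema prescribes.
function formatValue(doc: SchemaDocument, key: string, value: number): string {
  const indicator = doc.indicators.find((i) => i.key === key);
  if (!indicator) throw new Error(`No schema entry for indicator "${key}"`);

  const { kind, decimals } = indicator.format;
  const sign = value < 0 ? "-" : "";
  const abs = Math.abs(value);

  if (kind === "percent") return `${sign}${(abs * 100).toFixed(decimals)}%`;
  if (kind === "currency") return `${sign}$${groupThousands(abs.toFixed(decimals))}`;
  return `${sign}${groupThousands(abs.toFixed(decimals))}`;
}

// Made-up example document; the indicator keys and descriptions are invented.
const exampleSchema: SchemaDocument = {
  version: "example-1",
  indicators: [
    {
      key: "graduation_rate",
      title: "Graduation rate",
      description: "Share of students who graduate.",
      format: { kind: "percent", decimals: 0 },
    },
    {
      key: "tuition",
      title: "Tuition and fees",
      description: "Published tuition and fees.",
      format: { kind: "currency", decimals: 2 },
    },
  ],
};
```

With this document, `formatValue(exampleSchema, "tuition", 1234.5)` yields `"$1,234.50"`. Keeping formatting in the schema rather than in the data means a precision or display change only requires a new schema version, not a re-import.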

The site itself was built using Express and Node.js. The administrative backend uses Socket.IO to provide real-time updates about data imports and validation. The data is stored in CouchDB, and the site uses Elasticsearch to index all the data.
