Today the National Library of Medicine launched Pillbox for Developers. Developers now have access to powerful tools for processing and publishing drug label data. Ultimately this will lead to more accurate and accessible drug data for pharmacists, health professionals, advocates, and patients.
Digesting Complex Drug Data
Drug label data is an incredibly complex and critically important set of data. In the United States, the Food and Drug Administration (FDA) maintains a massive dataset of all prescription and over-the-counter medicines. Over 16GB of XML data is made available on a daily basis. Processing this data is technically difficult and time-consuming.
Pillbox is an initiative of the National Library of Medicine to produce an easy-to-use, “pill-focused” view of this data. Pillbox is one of the largest freely available drug label and image datasets in the US. The Pillbox website and API are a critical resource for pharmacists, health professionals, journalists, and citizens to access drug information. Over 30,000 products can be searched by shape, size, and color.
Acetaminophen has recently been found to pose significant risks to liver health. Pillbox lets users search dosage amounts and images of pills that contain acetaminophen, like Tylenol above.
Vastly Improved Processing
We worked with Pillbox to move from a manual process that took nearly two weeks to one that now takes 45 minutes. We built a new process using Python that downloads 16GB of XML files, consumes and parses the data, checks for errors and individual products, and produces a new drug label dataset based on individual pills. The improved process is faster, and it produces more accurate and usable data.
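To give a feel for the parse, check, and flatten steps, here is a minimal sketch. It is not the Pillbox code itself. The XML layout, the attribute names, and the `parse_labels` function are all assumptions made up for illustration, and the download step is left out so the example runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified label XML. The real FDA files use a much
# richer schema that is not reproduced here.
SAMPLE_LABELS = [
    """<label id="A1">
         <product name="Example Pain Reliever" shape="OVAL" color="WHITE" size_mm="17"/>
         <product name="Example Pain Reliever PM" shape="ROUND" color="BLUE" size_mm="10"/>
       </label>""",
    """<label id="B2">
         <product name="Example Allergy Tablet" shape="ROUND" color="PINK"/>
       </label>""",
    "<label id='C3'><product name='Broken'",  # malformed on purpose
]

# Attributes every pill record must carry (an assumption for this sketch).
REQUIRED = ("name", "shape", "color", "size_mm")


def parse_labels(documents):
    """Return (rows, errors): one row per pill, plus a list of problems found."""
    rows, errors = [], []
    for index, text in enumerate(documents):
        try:
            root = ET.fromstring(text)
        except ET.ParseError as exc:
            # Record the failure and keep going instead of aborting the run.
            errors.append(f"document {index}: malformed XML ({exc})")
            continue
        label_id = root.get("id")
        # One label can describe several products; emit one row for each.
        for product in root.iter("product"):
            missing = [key for key in REQUIRED if product.get(key) is None]
            if missing:
                errors.append(f"label {label_id}: missing {', '.join(missing)}")
                continue
            rows.append({
                "label_id": label_id,
                "name": product.get("name"),
                "shape": product.get("shape"),
                "color": product.get("color"),
                "size_mm": int(product.get("size_mm")),
            })
    return rows, errors


rows, errors = parse_labels(SAMPLE_LABELS)
```

With the sample input, `rows` holds two pill records and `errors` holds two entries, one for the product missing a size and one for the malformed document. Collecting errors rather than stopping at the first one is a design choice that suits a large batch job, where a single bad file should not block the rest.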
Making Pillbox More Open
We are also helping Pillbox to be more open. The data processing code is open source and is available today on GitHub. We’ve worked with Pillbox to structure and comment the code so that it is more accessible to open source developers. In addition to the existing API, Pillbox will make raw datasets available for bulk download.