Natural Language Processing Workbench Web Services
Overview of Project Activities
The NLP Workbench project will include five steps: environmental scan; stakeholder engagement, requirements gathering, and technical design; prototype development; pilot testing; and release.
In the United States, central cancer registries collect cancer data from sources such as hospitals, laboratories, physicians’ offices, and independent diagnostic and treatment centers. While Meaningful Use and other activities have increased the use of standardized electronic health record (EHR) systems, some parts of medical records, laboratory reports, and other clinical reports are still free-form, unstructured (narrative) text.
Computers cannot process narrative text automatically; human intervention is required to extract the critical pieces of information needed to complete a cancer case report. Similarly, trained abstractors must retrieve and code the appropriate elements from the clinical narrative text submitted to the U.S. Food and Drug Administration’s (FDA’s) spontaneous reporting systems for drugs, vaccines, and blood products.
Abstracting these data manually is labor-intensive and expensive. In addition, a diminishing workforce and an increased demand for timely and accurate data that are stored in narrative text create challenges. The unstructured narrative text in pathology, post-market, biomarker, and EHR reports contains data that researchers need to study overall population health and quality of patient care.
The use of natural language processing (NLP) will increase the completeness, timeliness, and accuracy of data while reducing the level of human intervention needed to identify critical data in narrative text.
The Assistant Secretary for Planning and Evaluation’s Patient-Centered Outcomes Research (PCOR) Trust Fund funded FDA and CDC for two years to develop an NLP Workbench on a shared Web service platform for PCOR researchers, as well as public health agencies at all levels. The NLP Workbench will provide free access to NLP and machine learning tools to develop and share language models and other algorithms that convert unstructured clinical text to coded data. The NLP Workbench will consist of open-source architectures and tools that any public health agency can use to develop NLP services, and will be hosted initially on CDC’s Innovation Research and Development lab.
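To make "converting unstructured clinical text to coded data" concrete, here is a minimal sketch in Python. It is not the NLP Workbench's method or API. The function name `extract_coded_fields`, the keyword table `SITE_CODES`, and the sample sentence are all hypothetical, and simple keyword matching stands in for the language models and machine learning tools the project describes.

```python
import re

# Hypothetical toy lookup table (an assumption for illustration only):
# anatomical site keyword -> ICD-O-3 topography code for that site, NOS.
SITE_CODES = {
    "breast": "C50.9",
    "lung": "C34.9",
    "prostate": "C61.9",
}

# Whole-word, case-insensitive patterns for the two fields we extract.
LATERALITY = re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE)
SITE = re.compile(r"\b(" + "|".join(SITE_CODES) + r")\b", re.IGNORECASE)


def extract_coded_fields(narrative: str) -> dict:
    """Return a small coded record built from free-text narrative.

    Fields that cannot be found are set to None so that a human
    abstractor can review them.
    """
    site_match = SITE.search(narrative)
    lat_match = LATERALITY.search(narrative)
    site = site_match.group(1).lower() if site_match else None
    return {
        "site": site,
        "site_code": SITE_CODES.get(site),
        "laterality": lat_match.group(1).lower() if lat_match else None,
    }


# Invented sample sentence, not taken from any real report.
report = "Invasive ductal carcinoma identified in the left breast, upper outer quadrant."
print(extract_coded_fields(report))
# prints {'site': 'breast', 'site_code': 'C50.9', 'laterality': 'left'}
```

A keyword approach like this misses negation, misspellings, and context (for example, "no evidence of lung involvement"), which is the kind of limitation that motivates shared, trained language models rather than hand-written rules.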
CDC and FDA hosted the first quarterly web call on April 26, 2017, to share progress and gather input from interested stakeholders. Please see the list of questions and answers that were discussed during the call.
Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, Forshee R, Walderhaug M, Botsis T. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. Journal of Biomedical Informatics 2017.
For more information about this project or to join the NLP stakeholder meetings, please send an e-mail to NLPWorkbench@cdc.gov.