Synthetic Patient Dataset for the Determination of Ventilator-Associated Events in Electronic Health Record Systems

Project One-pager: To view the project methodology, results and opportunities for reuse in public health, click here [PDF].

Project Results: To view the PowerPoint presentation from CHIIC’s February 2015 meeting, click here [PPT].

Project Status: Completed

Point of Contact: Barry Rhodes, PhD, Development Team Leader, Surveillance Branch, Division of Healthcare Quality Promotion, National Center for Emerging and Zoonotic Infectious Diseases

Center: NCEZID

Keywords: Ventilator-Associated Events, Healthcare-Associated Infections, Electronic Health Record systems

Project Description: Each year in the United States, hundreds of thousands of patients are admitted to intensive care units (ICU) with critical illness; many of these patients require life-saving mechanical ventilation for respiratory failure. There are many complications that can occur in patients on mechanical ventilation, and ICU mortality rates in these patients are high—as high as 27% in one study. Healthcare-associated infections are among the complications that can happen in mechanically-ventilated patients, and of these, ventilator-associated pneumonia (VAP) is one of the most common. The Centers for Disease Control and Prevention (CDC) conducts national surveillance on VAP and other healthcare-associated infections (HAI) through the web-based National Healthcare Safety Network (NHSN).

For many years, surveillance for VAP and evaluation of VAP prevention measures have been hampered by complex, subjective VAP definitions that are difficult for users to learn and apply. Over the past year, the CDC has collaborated with a working group of critical care, respiratory care, and healthcare epidemiology experts to change the approach to surveillance in mechanically-ventilated patients. The working group developed a new national surveillance definition algorithm for “Ventilator-Associated Events” (VAE). This new approach is based on objective clinical data that are available for all mechanically-ventilated patients and that often exist as structured data in Electronic Health Record (EHR) systems.

An EHR vendor seeking to automatically detect and report an HAI from its data must first read and interpret the definitions from a written description. The interpretation is then handed off to a programmer, who implements the definition in computer code. Currently, the only means of testing the accuracy of the coding is manual inspection by the vendors and/or system users. There is no systematic and thorough way to test whether the coded definitions within an EHR system correctly use patient data to detect and report an event. What is needed is a synthetic dataset seeded with fictitious patient records, a known subset of which meet the NHSN VAE definitions. By crafting the dataset carefully and obtaining the assistance of subject matter experts, most of the likely variations in the data can be represented in the dataset. Because the data are fictitious, the dataset can be made freely available on the NHSN website for anyone to download. This dataset would be imported into a test EHR system and the coded HAI definitions run against it.

The output would then be compared to the expected results (also available on the NHSN website). A one-to-one match between the output of the EHR system and the set of positive patient records would indicate that the definitions were accurately interpreted and coded. Any discrepancies would be investigated by the vendor and corrected as appropriate. While initially created to assist vendors through self-evaluation, a gold standard dataset such as this could become a part of a more detailed certification process for vendor systems in the future.
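The comparison step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual tooling: the function name compare_to_expected, the patient identifiers, and the idea that both the EHR output and the expected results are available as plain lists of patient IDs are all hypothetical.

```python
def compare_to_expected(detected_ids, expected_ids):
    """Compare patient IDs flagged by an EHR system with the expected positives.

    Both arguments are iterables of patient identifiers. The format is an
    assumption made for this sketch; the real dataset layout may differ.
    """
    detected = set(detected_ids)
    expected = set(expected_ids)
    return {
        # Records the system flagged that are also in the expected results
        "matched": sorted(detected & expected),
        # Expected positives the system failed to flag
        "missed": sorted(expected - detected),
        # Records the system flagged that are not expected positives
        "unexpected": sorted(detected - expected),
        # True only when the two sets agree one-to-one
        "exact_match": detected == expected,
    }


# Hypothetical identifiers, for illustration only
report = compare_to_expected(
    detected_ids=["P001", "P004", "P007"],
    expected_ids=["P001", "P004", "P009"],
)
```

In this made-up run the sets differ, so exact_match is False; the "missed" and "unexpected" lists are the discrepancies a vendor would investigate.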

Impact or potential impact of project if successful:

  • Easier vendor implementation of HAI detection and reporting
  • Improved data quality of VAEs reported electronically to NHSN
  • Greater integration of public health with EHR vendors


  • Because the dataset is fictitious and freely available to anyone, any EHR vendor or hospital IT staff can test their coding against the dataset.


  • Create the synthetic dataset with the help of SMEs in DHQP
  • Validate the dataset against the existing VAE web service and publish the dataset and the results to the NHSN website
  • Announce the availability of the dataset on the monthly NHSN vendor calls
  • Work with volunteer vendors to fine-tune the dataset

Measure of Success:

  • Adoption by EHR vendors who code the VAE definitions
  • Feedback from EHR vendors as to usefulness

For more information about this project, please contact the CHIIC or Brian Lee.

Page last reviewed: February 15, 2019