Federated EHR Data for Public Health Surveillance, Including Cloud and Open APIs
Dan Gottlieb and Ken Mandl (Boston Children’s Hospital & Harvard Medical School); Josh Mandel (Microsoft Healthcare, Boston Children’s Hospital & Harvard Medical School)
Modern health data standards and technologies are evolving rapidly and are designed to promote data liquidity. If architected appropriately, new approaches to interoperability built on these standards can help address longstanding public health data challenges, yielding well-processed data that are useful and actionable. These approaches can also help preserve patient privacy and address multi-layered data access needs without requiring public health to shoulder the full burden of operating and maintaining a separate infrastructure.
Federated Models for Accessing Data, Supported by Cloud-Based Infrastructure, Open New Opportunities for Public Health Data Modernization
Data must be processed to be useful. The data pipeline starts with extracting the data from the source system and applying standard transformations: mapping to standard formats, indexing the data, applying terminologies or other coding systems, and filtering out sensitive information such as identifiers.
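To make this concrete, the sketch below shows one such transformation step in Python: mapping a locally coded lab result to a standard terminology and dropping direct identifiers. The local-to-LOINC code map, the record shape, and the list of identifying fields are illustrative assumptions, not part of any particular source system.

```python
# Minimal sketch of a single pipeline step: map a locally coded lab
# result to a standard terminology (LOINC) and strip direct identifiers.
# LOCAL_TO_LOINC, IDENTIFYING_FIELDS, and the record shape are
# illustrative assumptions for this example.

LOCAL_TO_LOINC = {
    "LAB-GLU-01": ("2345-7", "Glucose [Mass/volume] in Serum or Plasma"),
}

IDENTIFYING_FIELDS = {"name", "address", "phone", "mrn"}

def transform(observation: dict) -> dict:
    """Map the local code to LOINC and drop identifying fields."""
    code, display = LOCAL_TO_LOINC[observation["local_code"]]
    handled = IDENTIFYING_FIELDS | {"local_code", "value", "unit"}
    return {
        "code_system": "http://loinc.org",
        "code": code,
        "display": display,
        "value": observation["value"],
        "unit": observation["unit"],
        # Keep remaining non-identifying metadata (e.g., collection date).
        **{k: v for k, v in observation.items() if k not in handled},
    }

record = {"local_code": "LAB-GLU-01", "value": 95, "unit": "mg/dL",
          "mrn": "12345", "collected": "2023-04-01"}
print(transform(record))
```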
The simplest way to process data and run analytics is often a centralized model. Users querying the data have a straightforward path because their applications only need to connect to one place: the centralized datastore. Even so, centralized models are not often used because of the disadvantages and risks they carry. A central datastore concentrates privacy and security risk; a single breach can expose the entire dataset. It also concentrates reliability and uptime risk; if the datastore goes down, all users lose access. Finally, a centralized model can lead to duplication of data and increased costs, especially when a data user wants to run a different type of analytics but has no opportunity to build their own infrastructure on top of the centrally stored data.
In contrast, under federated data sharing systems, data are stored locally at multiple nodes across a network. Each node runs software to map data to a common model, and nodes across the network are queried for intelligence. One advantage is that data control is local, and a data breach or system failure on a single node need not impact the entire platform. The local node can also be designed to support local analyses and processes. However, federated data models currently in production, such as those funded by the National Institutes of Health or the Patient-Centered Outcomes Research Institute, require tremendous local expertise to get the data out of EHRs, tend to be driven by principal investigators rather than institutions, and are not supported by regulations. Furthermore, maintaining and updating node-specific infrastructure can be costly and complex.
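A minimal sketch of how a query might fan out across such a network appears below, assuming each node exposes a hypothetical /count endpoint over its locally mapped data and returns only an aggregate. The node URLs and the query shape are invented for the example.

```python
# Sketch of a federated fan-out query. Each node answers over its own
# locally mapped data and returns only an aggregate count; the endpoint,
# node URLs, and query format are illustrative assumptions.
import requests

NODES = [
    "https://node-a.example.org",
    "https://node-b.example.org",
]

def federated_count(query: dict) -> int:
    """Send the same query to every node and sum the aggregate results."""
    total = 0
    for base_url in NODES:
        try:
            resp = requests.post(f"{base_url}/count", json=query, timeout=30)
            resp.raise_for_status()
            total += resp.json()["count"]
        except requests.RequestException:
            # An unreachable node reduces coverage but, unlike a central
            # datastore outage, does not take down the whole network.
            continue
    return total

print(federated_count({"condition": "http://snomed.info/sct|840539006"}))
```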
Combining a federated model with cloud-based capabilities provides the advantages of both centralized and federated approaches. Organizations can control their own data while allowing others to use the data for multiple purposes. Users gain the ability to access data at multiple sites through a secure firewall without having to customize their requests for every site. This multi-scale approach helps address the sometimes divergent needs of federal, state, and local data users, while generating well-processed data that are useful and actionable across the public health ecosystem.
Organizations at each node of the network maintain control of their data and can use it for local objectives as well. Technology vendors may provide services to help run the infrastructure at scale and support more advanced analyses of the data, but each site of care retains ownership and control, which is a critical aspect of the design. Building on top of the cloud can help standardize the infrastructure used at each node, resulting in better maintainability of the network.
There are Significant Advantages to Adopting Standards that are Regulated and Well-Supported by Multisector, Multistakeholder Alliances
In practice, every installation of every brand of EHR generally stores data in a unique, proprietary format, and those data need to be extracted from the EHR for meaningful analysis in another software system. The Fast Healthcare Interoperability Resources (FHIR) standards required by federal regulators will dictate what data elements are captured and made available, in a standard format, nationwide. Because these same FHIR standards will be used to support payment, the data flowing through the regulated APIs are more likely to be high-quality and available in near real time.
As the standards become more broadly adopted, public health can leverage FHIR APIs to harvest United States Core Data for Interoperability (USCDI) data elements, including clinical notes, from EHRs across the country without having to operate and maintain siloed reporting systems or one-off protocols for requesting data. This can help drive down costs and make it easier to build out a common set of data tools that scale and can be used at multiple levels of the public health ecosystem.
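For population-level access, the regulated APIs include the HL7 Bulk Data Access ("Bulk FHIR") specification. The sketch below follows that specification's asynchronous kick-off, poll, and download pattern; the server URL is hypothetical, and a production client would first obtain an access token through SMART Backend Services authorization.

```python
# Sketch of a Bulk FHIR export per the HL7 Bulk Data Access spec:
# kick off an asynchronous export, poll the status endpoint, then list
# the NDJSON files named in the completion manifest. BASE_URL is a
# hypothetical server; authentication is omitted for brevity.
import time
import requests

BASE_URL = "https://ehr.example.org/fhir"
HEADERS = {"Accept": "application/fhir+json", "Prefer": "respond-async"}

# 1. Kick-off request, limited to a few USCDI-relevant resource types.
kickoff = requests.get(
    f"{BASE_URL}/Patient/$export",
    params={"_type": "Patient,Condition,Observation"},
    headers=HEADERS,
)
status_url = kickoff.headers["Content-Location"]

# 2. Poll until the server has prepared the files (HTTP 200).
while True:
    status = requests.get(status_url, headers={"Accept": "application/json"})
    if status.status_code == 200:
        break
    time.sleep(int(status.headers.get("Retry-After", "10")))

# 3. The manifest lists one NDJSON file URL per resource type.
for entry in status.json()["output"]:
    print(entry["type"], entry["url"])
```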
For example, the standard APIs will make it easier for provider institutions to use data from their EHRs to measure quality, track outcomes, conduct research, make resource allocation decisions, and offer clinical decision support. Through the same APIs, a state or local public health department could access up-to-date data and roll it up to the level needed to fulfill its mission. Multiple users could explore the data through dashboards, dynamic visualizations, and more advanced analytics. Similarly, federal data users could use the same APIs to access the de-identified data needed to drive policymaking.
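As an illustration of that roll-up, the sketch below aggregates the same hypothetical line-level extract at the county and state levels; the columns and values are invented for the example.

```python
# Sketch of "rolling up" one extract to different jurisdictional levels
# with pandas; the data here are illustrative, not real counts.
import pandas as pd

cases = pd.DataFrame({
    "county": ["Suffolk", "Suffolk", "Middlesex", "Worcester"],
    "state":  ["MA", "MA", "MA", "MA"],
    "week":   ["2023-W14", "2023-W15", "2023-W14", "2023-W14"],
    "count":  [12, 9, 7, 4],
})

# A local health department views county-level counts...
print(cases.groupby(["county", "week"])["count"].sum())

# ...while a state or federal user rolls the same data up a level.
print(cases.groupby(["state", "week"])["count"].sum())
```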
The distributed model, in which each institution’s data are maintained separately, not only affords local control over data and participation in studies but also enables member institutions to develop important local applications that use their own data plus network-derived intelligence. Subject to regulatory requirements, data holders at each level of the ecosystem could explicitly decide which queries can be run and, after the queries are run, where and how the results are allowed to bubble back up. This preserves local control while giving each local institution or public health department the ability to ask the same kinds of questions users at the national level might ask.
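One way a node might implement that kind of local control is an allow list of approved query types combined with small-cell suppression before results are returned; the query names and threshold below are illustrative assumptions, not a prescribed policy.

```python
# Sketch of node-level query governance: a data holder answers only
# allow-listed query types and masks small cells before results
# "bubble up". ALLOWED_QUERIES and MIN_CELL_SIZE are illustrative.

ALLOWED_QUERIES = {"case_count_by_week", "vaccination_rate_by_county"}
MIN_CELL_SIZE = 11  # an example small-cell suppression threshold

def answer(query_name: str, counts: dict) -> dict:
    """Return aggregates for approved queries, suppressing small cells."""
    if query_name not in ALLOWED_QUERIES:
        raise PermissionError(f"Query '{query_name}' not approved by this node")
    return {k: (v if v >= MIN_CELL_SIZE else None) for k, v in counts.items()}

print(answer("case_count_by_week", {"2023-W14": 42, "2023-W15": 6}))
# {'2023-W14': 42, '2023-W15': None}
```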
In short, the standard APIs required by federal regulators, coupled with cloud storage and elastic computing capabilities, offer opportunities for co-development of public health data science at scale, at multiple sites across the country.
Standards and Technology Products Emerge and Evolve to Meet Real-World Needs of the Communities Who Engage in Their Development, Testing, and Adoption
FHIR focuses on addressing real-world challenges. For example, SMART on FHIR is a protocol that lets individuals access their own data and lets clinicians plug apps into their EHRs. Bulk FHIR offers access to population-level data inside EHRs, and CDS Hooks lets clinical decision support services integrate into EHRs in standard and scalable ways.
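As a concrete example of the last of these, a CDS Hooks service receives a hook invocation from the EHR and returns "cards" that the EHR renders in the clinician's workflow. The sketch below builds a minimal card for a hypothetical public health reporting prompt; the card content is invented, and the full request and response shapes are defined by the CDS Hooks specification.

```python
# Sketch of a minimal CDS Hooks response: given a "patient-view" hook
# invocation, return a card the EHR can render inline. The alert text
# and service purpose are hypothetical.

def patient_view_service(request: dict) -> dict:
    """Build a CDS Hooks card for an illustrative reportable-condition alert."""
    patient_id = request["context"]["patientId"]
    return {
        "cards": [{
            "summary": "Condition may be reportable to public health",
            "indicator": "info",
            "source": {"label": "Public Health Reporting Service"},
            "detail": f"Review reporting requirements for patient {patient_id}.",
        }]
    }

print(patient_view_service({"hook": "patient-view",
                            "context": {"patientId": "pat-123"}}))
```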
Community interest in building and maintaining interoperability standards needs to be driven from the top down, with policies and priorities, but also from the bottom up, with grassroots implementations. The standards development process brings a broad set of viewpoints that help stakeholders go from a set of needs, to a set of detailed prototypes, and ultimately to a set of standards that feed into regulations. Public health can build on this community process to extend and accelerate the adoption of standards others have committed to implement instead of building a one-off, public-health-specific approach.
The technology giants and the EHR vendors have an important role in the development of such a system. However, the regulated APIs dramatically reduce dependence on EHR vendors for building a system. A standards-based, interoperable surveillance backbone can be built and managed as a multi-stakeholder public utility, with the opportunity for innovation and contribution by government, academia, and business.