Analytical Challenges for Emerging Public Health Surveillance
Corresponding author: Henry Rolka, Office of the Director, Public Health Surveillance and Informatics Program Office (proposed), CDC, 1600 Clifton Road MS E97, Atlanta, GA 30333; telephone: 404-498-6626; Fax 404-498-0585; E-mail: firstname.lastname@example.org.
The root of effective disease control and prevention is an informed understanding of the epidemiology of a particular disease based on sound scientific interpretation of evidence. Such evidence must frequently be transformed from raw data into consumable information before it can be used for making decisions, determining policy, and conducting programs. However, the work of building such evidence in public health practice — doing the right thing at the right time — is essentially hidden from view. Surveillance involves acquiring, analyzing, and interpreting data and information from several sources across various systems. Achieving the goals and objectives of surveillance investments requires attention to analytic requirements of such systems. The process requires computer programming, statistical reasoning, subject matter expertise, often modeling, and effective communication skills.
Public health surveillance relies heavily on data collected by two different approaches that require different types of analyses and interpretations of what the data represent. First are surveys that are designed to be representative of the population from which they are sampled. These data are analyzed using validated statistical methodologies that directly correspond to the survey design for exploration and inference in public health. In contrast, data used for public health surveillance that are not collected using probability sampling (e.g., case reports, automated electronic health record data, or syndromic surveillance) require a different approach for analysis and interpretation. To avoid bias, maintain objectivity, cross-validate findings, and ensure data quality, analysts must work with the data empirically on a regular basis (i.e., every day) to have a thorough understanding of the data-generating environment, detailed particulars of the specific data source, and the purpose of the surveillance system.
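The difference between the two approaches can be illustrated with a small sketch. It assumes a hypothetical probability survey in which each respondent carries a sampling weight (the number of people in the population that the respondent represents); the respondent values and weights below are invented for illustration only. Real survey analysis would also account for strata, clusters, and variance estimation, which this sketch omits.

```python
def weighted_prevalence(responses):
    """Estimate prevalence from (has_condition, sampling_weight) pairs.

    Each respondent counts in proportion to the sampling weight, so the
    estimate reflects the population rather than the sample composition.
    """
    total_weight = sum(weight for _, weight in responses)
    case_weight = sum(weight for has_condition, weight in responses if has_condition)
    return case_weight / total_weight


def unweighted_prevalence(responses):
    """Naive proportion that treats every respondent as equally representative."""
    cases = sum(1 for has_condition, _ in responses if has_condition)
    return cases / len(responses)


# Hypothetical respondents: an oversampled group (weight 50) and an
# undersampled group (weight 200). Values are illustrative assumptions.
SURVEY = [
    (True, 50.0), (True, 50.0), (False, 50.0),
    (False, 200.0), (False, 200.0), (True, 200.0),
]

NAIVE = unweighted_prevalence(SURVEY)
WEIGHTED = weighted_prevalence(SURVEY)
```

For data that are not collected through probability sampling, no such weights exist, which is why the interpretation must rest on empirical familiarity with the data source.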
Integrating and analyzing data from new and multiple sources poses new challenges. A major reason is that time and experience are fundamental to learning about the data and the system, how to prepare the data for analysis, and how to analyze the data and create reports, often on a rapid cyclic schedule. In certain instances, the required work has never been done before. A contemporary example is BioSense (1), which brought together data from numerous disparate systems, relying on expert analytic data managers to quickly assess new data-source content, guide systems developers in incorporating new data into analytic data warehouses and data visualization applications, and provide data content details to statisticians and epidemiologists preparing analytic algorithms.
One critical requirement for successful public health surveillance is the ability to analyze and present data so that they are understandable to leaders and the public. This ability can be viewed as the cross-cutting operational work space between data availability in database architectures and useful information derived from data provided or generated for surveillance purposes.
This report proposes a vision for the analytic challenges for emerging public health surveillance, identifies challenges and opportunities, and suggests approaches to attain the vision. This topic was identified by CDC leadership as one of six major concerns that must be addressed by the public health community to advance public health surveillance in the 21st century. The six topics were discussed by CDC workgroups that were convened as part of the 2009 Surveillance Consultation to advance public health surveillance to meet continuing and new challenges (2). This report is based on workgroup discussions and is intended to continue the conversations with the public health community for a shared vision for public health surveillance in the 21st century.
A strong data analytic foundation is implemented widely and guides public health surveillance.
Several ongoing and new analytic challenges for public health surveillance are apparent. Continuing challenges include managing data originating from disparate sources, protecting confidentiality, and attracting and retaining staff with appropriate skills. New challenges include demands for early detection of disease and visualization.
Effective data management is critical to the public health surveillance mission; however, appreciation of the quality of data needed for appropriate inferences and interpretation is often lacking. Data management is the development, implementation, and maintenance of plans, policies, and programs that control, protect, and enhance the value of data. Cleaning and manipulation are not intended to alter data to reach a desired conclusion, but to ensure that data accurately reflect the true nature of what has been measured. Preparing high-quality data for public health analysis requires transforming data from the data collection system into different formats, conducting quality checks, and preparing the analytic "flat" file that analysts need (Figure 1). The analytic data management work function serves as a crosswalk across domains (Figure 2).
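The transformation described above can be sketched in a few lines. This is a minimal illustration, not a description of any specific CDC system: the nested record layout, the field names (`case_id`, `patient`, `visit`), and the plausibility limits are all hypothetical assumptions. Consistent with the principle that cleaning should not alter data to reach a conclusion, the sketch flags questionable values rather than changing them.

```python
from datetime import date

# Hypothetical nested records, as a data collection system might export them.
RAW_RECORDS = [
    {"case_id": "A1", "patient": {"age": 34, "sex": "F"},
     "visit": {"date": "2009-10-02", "facility": "H01"}},
    {"case_id": "A2", "patient": {"age": 212, "sex": "M"},
     "visit": {"date": "2009-10-03", "facility": "H02"}},
    {"case_id": "A1", "patient": {"age": 34, "sex": "F"},
     "visit": {"date": "2009-10-02", "facility": "H01"}},
    {"case_id": "A3", "patient": {"age": 58, "sex": None},
     "visit": {"date": "2009-10-05", "facility": "H01"}},
]


def flatten(record):
    """Turn one nested record into a single flat row with no nesting."""
    return {
        "case_id": record["case_id"],
        "age": record["patient"]["age"],
        "sex": record["patient"]["sex"],
        "visit_date": date.fromisoformat(record["visit"]["date"]),
        "facility": record["visit"]["facility"],
    }


def quality_flags(row):
    """Return a list of quality problems; values are flagged, never altered."""
    flags = []
    if row["age"] is None or not 0 <= row["age"] <= 120:  # assumed plausible range
        flags.append("age_out_of_range")
    if row["sex"] not in ("F", "M"):  # assumed coding scheme
        flags.append("sex_missing_or_invalid")
    return flags


def build_flat_file(records):
    """Flatten records, skip repeated case_ids, and attach quality flags."""
    rows, seen = [], set()
    for record in records:
        row = flatten(record)
        if row["case_id"] in seen:
            continue  # a record with this case_id was already kept
        seen.add(row["case_id"])
        row["flags"] = quality_flags(row)
        rows.append(row)
    return rows


FLAT_ROWS = build_flat_file(RAW_RECORDS)
```

Each row in `FLAT_ROWS` is one case with one value per column, which is the shape an analyst would typically load into statistical software; the `flags` column lets the analyst decide how to handle questionable values.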
Early Detection of Emerging Diseases
The need to detect emerging diseases faster and to enhance public health emergency response and recovery capabilities introduces new analytic challenges. Signal (or aberration) detection algorithms, applied to real-time processing of electronic medical records data, generate syndromic surveillance capacity to monitor for disease outbreaks and to support situation awareness and recovery monitoring. These new methodologies, developed during the smallpox vaccination activities and the anthrax attack of the early 2000s (1,3), also are useful for detecting emerging infectious diseases (e.g., severe acute respiratory syndrome), extending analytic capabilities for chronic diseases, and developing approaches to support health-care reform. Prudent application of new analytic surveillance methods and interpretation of results from novel data sources used for public health (e.g., patient health encounter records) might require interdisciplinary collaboration across public health and health-care domains, epidemiologic and statistical science domains, and public health jurisdictions.
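As an illustration of what a signal (or aberration) detection algorithm does, the following sketch applies a simple moving-baseline rule to daily visit counts. It is not the algorithm used by BioSense or any other named system; the 7-day baseline, the threshold of 3 standard deviations, the floor on the standard deviation, and the visit counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_aberrations(counts, baseline_days=7, threshold=3.0, min_sd=0.5):
    """Return indices of days whose count is unusually high.

    A day is flagged when its count exceeds the mean of the preceding
    `baseline_days` days by more than `threshold` standard deviations.
    `min_sd` (an assumed floor) avoids division by zero on a flat baseline.
    """
    flagged = []
    for day in range(baseline_days, len(counts)):
        baseline = counts[day - baseline_days:day]
        baseline_mean = mean(baseline)
        baseline_sd = max(stdev(baseline), min_sd)
        if (counts[day] - baseline_mean) / baseline_sd > threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily counts of visits for one syndrome at one facility.
DAILY_VISITS = [12, 15, 11, 14, 13, 12, 16, 14, 13, 31, 15]

FLAGGED_DAYS = flag_aberrations(DAILY_VISITS)
```

With these counts, only the day with 31 visits (index 9) is flagged. A flagged day is a prompt for investigation rather than evidence of an outbreak, and a rule this simple has known weaknesses: for example, an unusually high day inflates the baseline for the days that follow it.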
Inadequate Computing Resources
With the increase in the number of sources and volume of data available for analysis, insufficient resources in the computing environment might limit timely processing of data and communication of results. This is particularly true for observational data such as those collected from longitudinal studies or various surveys or surveillance systems. When an emphasis is placed on real-time analysis and dissemination of the processed results from data, visual displays of data might be important. For example, displaying trends or clusters might provide information about potential bioterrorist activity, and maps of disease incidence/mortality might help target epidemiologic investigations (4). Because responding to an acute event requires rapid access to simple, understandable displays of complex data, a proactive approach would be to anticipate the need for sophisticated graphics display technology and to plan for study of the cognitive aspects of such technology and how it will be used. Improving graphic displays of data is an area that requires further study.
Shortage of Skilled Staff
Human resources are needed in public health surveillance to accomplish analytic data management, statistical analysis, geographic and other information displays, data visualization, and effective communication of uncertainty in health-data evidence. However, persons and teams with the required skills and experience are in short supply. Furthermore, although core competencies have been developed for some public health professions (e.g., epidemiologists) to ensure that staff have the skills needed to successfully perform this work (5), none have been developed for public health data managers and analysts. Leaders, managers, and decision-makers who allocate staffing resources but have not worked directly in analytic data management must trust subordinates to accurately characterize resource requirements that might, on the surface, appear inflated. The challenge of recruiting and retaining analytic staff is amplified in public health surveillance because of low pay grades compared with other industries. Within operational programs, analytical knowledge, procedures, and operations are frequently the most complex and detailed areas. Public health curricula have not been able to keep pace with new data management and analytic requirements. Courses that relate specifically to public health analytic data management with administrative data, a large part of where public health surveillance now resides, are few.
The beginning of the 21st century marked an era of rapid growth and change in information resources that can be useful for public health surveillance. Many new data sources with huge amounts of data can be expected from initiatives such as electronic health records and other data not developed specifically for public health surveillance purposes. Increased capacity of information technology to perform analytic processing and increased availability of health-care–related data create opportunities to develop new surveillance and analytic methods. Data-mining tools allow analysis of data on several different types of events collected at once to determine relationships. Such tools offer potential and represent a blending of statistical methodologies with computing resources; their best application should include appropriate statistical interpretations of findings.
Emerging Data Useful for Surveillance
The wide-scale implementation of electronic information systems has resulted in increased generation and availability of data. Available sources include data systems for collecting sentinel disease reports and spontaneous adverse event reports related to drugs, vaccines, and other medical products. In addition, information systems designed for various other purposes (e.g., prescription pharmaceuticals, medical encounters, inventory and marketing, over-the-counter pharmaceutical sales, and emergency service dispatches) produce data that potentially can augment evidence for monitoring and assessing the health of populations. However, the data generated from such systems do not readily lend themselves to well-defined sample/population relations. No clear sampling design is available to determine how well the data represent a target population. Applications for analyzing data in such settings are empirical in nature; interpretation of these data and subsequent implications are not well informed by theory and must be acquired through experience by engaged and invested public health professionals.
Substantial real-time public health information is available that offers potential surveillance value in the form of unstructured or text data. Data or information in such form includes news or intelligence-like reports that originate from the news media and systems like EpiX, ProMed, HealthMap, and Argus (6). Analyzing and understanding structured data requires different skills than those required for analyzing and understanding unstructured information. In addition, because unstructured data tend to be anecdotal in nature and are delivered more quickly than traditionally sourced surveillance data, the need to combine or fuse data and information of different types adds complexity to the analytics.
Electronic Health Records
The Centers for Medicare and Medicaid Services (CMS) is initiating a financial incentives program to help eligible providers and eligible hospitals adopt and make meaningful use of electronic health record (EHR) technology so they can provide better health care to their patients (7). Over time, the EHR incentive program under Medicare and Medicaid will accelerate and facilitate health-information technology adoption by more providers and hospitals throughout the health-care system. State health departments will need to be ready to manage and analyze the expected increase of population health data as sources of electronic medical records become available in volume, in complex data-messaging structures, and in real time as patient visits occur. In addition, the clinical care measures collected by CMS will be of interest to state and local health departments that want to monitor preventive care.
Health Information Exchanges
The State Health Information Exchange Cooperative Agreement Program (8) funds states' efforts to build capacity for exchanging health information across the health-care system both in and across states. Awardees are responsible for increasing connectivity and enabling patient-centric information flow to improve the quality and efficiency of care. This system has been built to facilitate the exchange of patient-level data among providers. It offers the potential for public health use to monitor quality of care and health status and outcomes.
The future of public health surveillance will depend on developing new analytical approaches to adapt to changing health data sources, increased information technology capacity, and increased concerns about the sensitivity of patient data revealed in unintentional data release. Although information technology specialists and public health programmatic or scientific staff might be comfortable within their respective domains of expertise, the new challenges will require increased attention to the analytic data management gap that exists between these two domains. To adapt the traditional public health functions of notifiable disease reporting, outbreak detection, emergency response, and program evaluation, public health departments will need to update existing approaches to data collection and management and develop new analytical techniques to take advantage of evolving public health data sources while protecting patient confidentiality.
- Bradley CA, Rolka H, Walker D, Loonsk J. BioSense: implementation of a national early event detection and situational awareness system. MMWR 2005;54(Suppl):11–19.
- CDC. Introduction. In: Challenges and opportunities in public health surveillance: a CDC perspective. MMWR 2012;61(Suppl; July 27, 2012):1-2.
- Ozonoff A, Forsberg L, Bonetti M, Pagano M. Research methods for bivariate spatio-temporal syndromic surveillance. MMWR 2004;53(Suppl):59-66.
- Wallenstein S, Naus J. Scan statistics for temporal surveillance for biologic terrorism. MMWR 2004;53(Suppl):74–8.
- CDC and Council of State and Territorial Epidemiologists. CDC/CSTE Applied Epidemiology Competencies Toolkit. Available at http://www.cste.org/dnn/Home/CSTEFeatures/Competiencies/tabid/174/Default.aspx and http://www.cdc.gov/od/owcd/cdd/aec.
- Khan AS, Fleischauer AF, Casani J, Groseclose SL. The next public health revolution: public health information fusion and social networks. Am J Public Health 2010;100:1237–42.
- Centers for Medicare & Medicaid Services. 42 CFR Parts 412, 413, 422. Medicare and Medicaid Programs; Electronic health record incentive program; final rule. Federal Register 2010;75:July 28, 2010.
- American Recovery and Reinvestment Act of 2009, Title XIII - Health Information Technology, Subtitle B—Incentives for the Use of Health Information Technology, Section 3013, State Grants to Promote Health Information Technology. State Health Information Exchange Cooperative Agreement Program. Funding Opportunity Announcement. Available at http://www.grants.gov/search/search.do?oppId=58990&mode=VIEW.
FIGURE 1. Analytic data management functions in public health
Alternate Text: The figure is a diagram that displays how data is transformed from a data collection system for use in different formats so it can be used by an analyst who needs the data.
Use of trade names and commercial sources is for identification only and does not imply endorsement by the U.S. Department of Health and Human Services.