Centers for Disease Control and Prevention
MMWR

Performance-Critical Anomaly Detection --- United States, December 2002--March 2004

Colin R. Goodall,1 A. Lent,1 S. Halasz,1 E. Koski,2 D. Agarwal,1 S. Tse,1 G. Jacobson1
1AT&T Labs (Research), Middletown, New Jersey; 2Quest Diagnostics Incorporated, Teterboro, New Jersey

Corresponding author: Colin R. Goodall, AT&T Labs, 200 S. Laurel Ave. D4 3D28, Middletown, NJ 07760. Telephone: 732-420-5816; Fax: 732-368-7201; E-mail: cgoodall@att.com.

Disclosure of relationship: The contributors of this report have disclosed that they are employees of AT&T Labs or Quest Diagnostics, Inc., and that their employment compensation may include ownership of company stock. This report does not contain any discussion of unlabeled use of commercial products or products for investigational use.

Abstract

Introduction: Performance-critical anomaly detection for biomedical surveillance requires 1) reliable data that are both geotemporally and demographically representative; 2) efficient, real-time, large-scale information-processing capabilities; 3) comprehensive, tunable anomaly-detection algorithms; 4) a flexible platform for investigation and management of anomalies; and 5) alert distribution and management.

Objectives: This study analyzed a reliable, high-performance, end-to-end, modular process for early event detection that included data loading and transformation, statistical anomaly detection, and tools for user interaction.

Methods: The process architecture and implementation included three components: 1) a data layer, including modules for data loading, cleaning, normalization, coding, and aggregation; 2) an anomaly-detection layer, including multiple methods for statistical anomaly detection and an anomaly case manager; and 3) a presentation layer, including dynamic visualization of data (geographically, temporally, and logically) used in case investigation, publication, and process monitoring. The statistical anomaly-detection methods included process-control techniques; SaTScan™ (a free software program used to calculate spatial, temporal, and space-time scan statistics); a square-root technique; and a new adaptation of Bayesian shrinkage estimation (Kalman Filter Gamma Poisson Shrinker [KF GPS]) used to monitor a stream of events organized into a periodic (daily) array of cross-classified counts with geographic and medical dimensions. KF GPS produced shrinkage estimates of the ratios of observed counts to proportionally fitted expected counts; these estimates update smoothly over time after allowing for changes in marginal totals. KF GPS was also used to model spatial associations and dependencies among the medical measurements. The case manager was used to organize groups of related anomalies into cases and to support collaboration by providing a set of functions and software linkages with which persons with subject-matter, statistical, and analytic expertise could investigate and manage anomalies. Each case could be resolved as an alert, deferred, or dismissed. The case manager included a logic-rich engine and two feature-rich, configurable tools for case organization and dynamic data visualization. Similar technology, used by AT&T for telecommunications monitoring and case management in an environment in which >300 million calls are received daily, was adapted to health-care data, including laboratory test and emergency room data, with comparable performance.
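
The ratio of observed to proportionally fitted expected counts described above can be illustrated with a minimal sketch. It is not the authors' KF GPS implementation: it assumes that expected counts come from the row and column totals of a single day's geographic-by-medical table, that shrinkage uses a single gamma prior (with hypothetical parameters prior_shape and prior_rate), and it omits the Kalman filter time update entirely. The function names are hypothetical.

```python
from typing import List


def expected_counts(observed: List[List[float]]) -> List[List[float]]:
    """Fit expected counts proportionally from the marginal totals.

    Assumption: E[i][j] = row_total[i] * col_total[j] / grand_total,
    i.e. the fit under independence of the two dimensions.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def shrunk_ratio(obs: float, exp: float,
                 prior_shape: float = 2.0, prior_rate: float = 2.0) -> float:
    """Posterior mean of the observed/expected ratio.

    Assumption: obs ~ Poisson(ratio * exp) with ratio ~ Gamma(prior_shape,
    prior_rate), so the posterior mean is (obs + shape) / (exp + rate).
    Small cells are pulled toward the prior mean shape / rate.
    """
    return (obs + prior_shape) / (exp + prior_rate)


# Hypothetical 2 x 2 table: rows are regions, columns are test groupings.
observed = [[10.0, 20.0], [30.0, 40.0]]
expected = expected_counts(observed)
ratios = [[shrunk_ratio(o, e) for o, e in zip(obs_row, exp_row)]
          for obs_row, exp_row in zip(observed, expected)]
```

With the default prior, a cell with 3 observed and 1 expected event has a raw ratio of 3 but a shrunk ratio of 5/3, which shows how sparse cells are damped before scoring.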

Results: In collaboration with Quest Diagnostics, Inc. (QDI), AT&T used a subset of QDI's nationwide testing data for December 2002--March 2004 for three syndromic groupings (respiratory, gastrointestinal, and heavy metals [lead]) in the New York City (NYC) metropolitan area and nationwide (lead only). The system computed approximately 600,000 scores, resulting in approximately 400 anomalies and their cases. Certain anomalies included a spike in overall respiratory test requisitions in the area of Bensonhurst, Queens, NYC; a spike in mycobacteria requisitions in Orange County, New York; and a change in data coding affecting viral tests in Bergen County, New Jersey.

Conclusion: This analysis demonstrated 1) the importance of end-to-end process architecture; 2) the utility of multiple algorithms, especially KF GPS, for anomaly detection; and 3) the effectiveness of using a case manager to investigate anomalies and reduce the burden of false positives. The system can handle massive data streams and allows rapid anomaly detection through use of a suite of analytic, data management, and visualization tools.

Use of trade names and commercial sources is for identification only and does not imply endorsement by the U.S. Department of Health and Human Services.



Date last reviewed: 8/5/2005


Morbidity and Mortality Weekly Report