Centers for Disease Control and Prevention

MMWR


Fast Grid-Based Scan Statistic for Detection of Significant Spatial Disease Clusters

Daniel B. Neill, A. Moore
Carnegie Mellon University, Pittsburgh, Pennsylvania

Corresponding author: Daniel B. Neill, Department of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213. Telephone: 412-621-2650; E-mail: neill@cs.cmu.edu.

Abstract

Introduction: The spatial scan statistic is a commonly used statistical test for detecting significant disease clusters. However, the time needed to compute the scan statistic increases as the square of the number of data points M, making the test computationally infeasible for large data sets (M > 100,000). One solution is to aggregate data points to a uniform grid; when the grid is dense, the scan statistic can be computed substantially faster, with complexity O([equation not reproduced]) instead of O(M^2). However, even this approach can require multiple days to compute when M is large. Because disease clusters must be found in minutes rather than days for real-time detection, a more efficient algorithm is needed.
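The aggregation step described above can be illustrated with a minimal sketch. The record layout (x, y, cases, population), the bounding-box parameters, and the function name aggregate_to_grid are assumptions made for illustration; the abstract does not specify how the data were binned.

```python
def aggregate_to_grid(points, n, x_min, x_max, y_min, y_max):
    """Bin (x, y, cases, population) records into an n x n uniform grid.

    Assumes every point lies inside the bounding box; points exactly on
    the upper edges are placed in the last row or column.
    """
    counts = [[0] * n for _ in range(n)]
    pops = [[0] * n for _ in range(n)]
    x_width = (x_max - x_min) / n
    y_width = (y_max - y_min) / n
    for x, y, cases, population in points:
        col = min(int((x - x_min) / x_width), n - 1)
        row = min(int((y - y_min) / y_width), n - 1)
        counts[row][col] += cases
        pops[row][col] += population
    return counts, pops


# Hypothetical records: (x, y, cases, population)
example_points = [
    (0.1, 0.1, 1, 10),
    (0.9, 0.9, 2, 20),
    (1.0, 1.0, 1, 5),
    (0.1, 0.2, 3, 7),
]
example_counts, example_pops = aggregate_to_grid(example_points, 2, 0.0, 1.0, 0.0, 1.0)
```

After this step, the cost of the search depends on the number of grid squares rather than on the number of individual records.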

Objectives: Given a grid of squares, where each square has an associated count (number of disease cases) and underlying population, the goal is to quickly find the region with the maximum value of the scan statistic (the most significant disease cluster).
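To make the objective concrete, the following sketch performs the exhaustive baseline search on a small grid. It rests on two assumptions not stated in the abstract: the score is Kulldorff's log-likelihood ratio statistic for an elevated rate inside the region, and candidate regions are axis-aligned squares. The function names are illustrative, and this is the slow baseline, not the proposed multiresolution algorithm.

```python
import math


def kulldorff_score(c, p, c_tot, p_tot):
    """Log-likelihood ratio for a higher rate inside the region; 0 otherwise."""
    if c <= 0 or p <= 0 or p >= p_tot:
        return 0.0
    c_out, p_out = c_tot - c, p_tot - p
    if c / p <= c_out / p_out:
        return 0.0
    score = c * math.log(c / p) - c_tot * math.log(c_tot / p_tot)
    if c_out > 0:
        score += c_out * math.log(c_out / p_out)
    return score


def prefix_sums(grid):
    """Summed-area table so any square's total is found in constant time."""
    n = len(grid)
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            s[i + 1][j + 1] = grid[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s


def region_sum(s, row, col, k):
    """Total over the k x k square whose top-left cell is (row, col)."""
    return s[row + k][col + k] - s[row][col + k] - s[row + k][col] + s[row][col]


def best_square_region(counts, pops):
    """Score every square region and return (best score, (row, col, size))."""
    n = len(counts)
    cs, ps = prefix_sums(counts), prefix_sums(pops)
    c_tot, p_tot = cs[n][n], ps[n][n]
    best = (0.0, None)
    for k in range(1, n + 1):
        for row in range(n - k + 1):
            for col in range(n - k + 1):
                c = region_sum(cs, row, col, k)
                p = region_sum(ps, row, col, k)
                score = kulldorff_score(c, p, c_tot, p_tot)
                if score > best[0]:
                    best = (score, (row, col, k))
    return best


# Hypothetical 4 x 4 grid with an elevated 2 x 2 block in the middle.
demo_counts = [[1, 1, 1, 1], [1, 10, 10, 1], [1, 10, 10, 1], [1, 1, 1, 1]]
demo_pops = [[100] * 4 for _ in range(4)]
demo_score, demo_region = best_square_region(demo_counts, demo_pops)
```

On the demo grid the search returns the central 2 x 2 block, stored in demo_region as (1, 1, 2). Every candidate square is scored here, which is the cost that pruning aims to avoid.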

Methods: A multiresolution algorithm is proposed that partitions the grid into overlapping regions, bounds the maximum score of each region, and prunes regions that cannot contain the most significant cluster. This method enables users to search across all possible regions while examining only a fraction of them, which reduces complexity to O(M) for dense test regions. As in the original scan statistic, randomization testing is used to calculate the statistical significance (p-value) of the detected cluster. (For additional details, see the full paper at http://www.cs.cmu.edu/~neill/papers/sss-techreport.pdf.)
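The randomization test mentioned above can be sketched as follows. The null model used here (the observed total number of cases is redistributed across grid squares with probability proportional to population) is a common choice and an assumption, not a detail given in the abstract. The names randomization_p_value and max_cell_rate are hypothetical; max_cell_rate is a toy stand-in for whatever routine returns the maximum scan statistic over all regions.

```python
import random


def randomization_p_value(counts, pops, max_score_fn, replicas=999, seed=0):
    """Monte Carlo p-value for the maximum score found on the observed grid.

    Each replica redistributes the observed total cases over the cells with
    probability proportional to population (assumed null model), then
    recomputes the maximum score with the same search routine.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(counts), len(counts[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    weights = [pops[r][c] for r, c in cells]
    total_cases = sum(sum(row) for row in counts)
    observed = max_score_fn(counts, pops)
    at_least_as_extreme = 0
    for _ in range(replicas):
        replica = [[0] * n_cols for _ in range(n_rows)]
        for r, c in rng.choices(cells, weights=weights, k=total_cases):
            replica[r][c] += 1
        if max_score_fn(replica, pops) >= observed:
            at_least_as_extreme += 1
    # Count the observed grid itself so the p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (replicas + 1)


def max_cell_rate(counts, pops):
    """Toy stand-in for a region search: the highest single-cell case rate."""
    return max(
        counts[r][c] / pops[r][c]
        for r in range(len(counts))
        for c in range(len(counts[0]))
    )


# Hypothetical 2 x 2 grid with one strongly elevated cell.
p_clustered = randomization_p_value(
    [[50, 1], [1, 1]], [[100, 100], [100, 100]], max_cell_rate, replicas=99
)
```

Because the search routine is rerun on every replica, the speed of that routine dominates the total running time of the significance test.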

Results: The algorithm was tested on seven data sets (M ≈ 200,000), including western Pennsylvania emergency department data. The algorithm identified the most significant disease clusters in 20 to 130 minutes, 20 to 150 times faster than exhaustive search (Table).

Conclusions: The algorithm results in substantial speedups as compared with exhaustive search, making real-time detection of disease clusters computationally feasible. This algorithm is being applied toward automatic real-time detection of outbreaks.

Table



Page converted: 9/14/2004
