Syndrome and Outbreak Detection Using Chief-Complaint Data --- Experience of the Real-Time Outbreak and Disease Surveillance Project
Michael M. Wagner,1 J. Espino,1 F-C. Tsui,1 P. Gesteland,2,3 W. Chapman,1 O. Ivanov,1 A. Moore,1,4 W. Wong,4 J. Dowling,1 J. Hutman1
1University of Pittsburgh, Pittsburgh, Pennsylvania;
2Intermountain Health Care Institute for Health Care Delivery Research, Salt Lake City, Utah; 3University of Utah, Salt Lake City, Utah;
4Carnegie Mellon University, Pittsburgh, Pennsylvania
Corresponding author: Michael M. Wagner, Real-Time Outbreak and Disease Laboratory, University of Pittsburgh, Suite 500, Cellomics Building, 500 Technology Drive, Pittsburgh, PA 15219. Telephone: 412-383-8137; Fax: 412-383-8135; E-mail: email@example.com.
This paper summarizes the experience of the Real-Time Outbreak and Disease Surveillance (RODS) project in
collecting and analyzing free-text emergency department (ED) chief complaints. The technical approach involves real-time transmission of chief-complaint data as Health Level 7 messages from hospitals to a regional data center, where a Bayesian text classifier assigns each chief complaint to one of eight syndrome categories. Time-series algorithms analyze the syndrome data and generate alerts. Authorized public health users review the syndrome data by using Internet interfaces with timelines and maps. Deployments in Pennsylvania, Utah, Atlantic City, and Ohio have demonstrated feasibility of real-time collection of
chief complaints. Retrospective experiments that measured case-classification accuracy demonstrated that the Bayesian classifier
can discriminate between different syndrome presentations. Retrospective experiments that measured outbreak-detection accuracy determined that the classifier's performance was adequate to support accurate and timely detection of seasonal disease outbreaks. Prospective evaluation revealed that a cluster of carbon monoxide exposures was detected by RODS within 4 hours of the presentation of the first case to an emergency department.
In 1999, the Real-Time Outbreak and Disease Surveillance (RODS) project created a regional test bed in a large metropolitan area (population: 2.3 million persons) with high sampling density (i.e., monitoring of >50% of the population for at least one type of data). The project then used this test bed to study the detectability of outbreaks, especially cohort exposures (e.g., a citywide aerosolized Bacillus anthracis release) that have a narrow window of opportunity for mitigation and thus present a substantial surveillance challenge (1). After early studies of laboratory data (2) and International Classification of Diseases, Ninth Revision (ICD-9) coded chief complaints (3,4), later research focused on analysis of free-text chief complaints. This paper describes the experience of the RODS project in collecting and analyzing patient chief complaints.
The technical approach to Health Level 7 (HL7)--based data collection and chief-complaint processing has been described previously (5--9). Briefly, when a patient registers for care at an ED, a triage nurse or registration clerk enters the patient's reason for visit (known as a chief complaint) into a registration system. This step is part of the normal workflow in multiple U.S. hospitals (10). The registration system transmits chief-complaint data in the form of HL7 messages (5) to an HL7 message router in the hospital, which can de-identify these messages and transmit them via the Internet to a health department in real time.
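The de-identification step described above can be illustrated with a minimal Python sketch. It is not the RODS router. The sample message is invented, and the field positions are assumptions, because the text does not say which HL7 fields carry the data: the chief complaint is read from PV2-3 (admit reason), the zip code from the fifth component of PID-11 (patient address), and the message time from MSH-7. The function name `deidentify` is hypothetical.

```python
# Hypothetical HL7 v2 registration message (segments separated by carriage returns).
SAMPLE = "\r".join([
    "MSH|^~\\&|REG|HOSP_A|RODS|HEALTH_DEPT|200307181430||ADT^A04|MSG0001|P|2.3",
    "PID|1||123456||DOE^JANE||19700101|F|||12 MAIN ST^^PITTSBURGH^PA^15219",
    "PV1|1|E",
    "PV2|||^SHORTNESS OF BREATH",
])


def deidentify(message: str) -> dict:
    """Keep only the fields needed for surveillance; drop name, ID, and street address."""
    # Index each segment by its three-letter name (assumes one segment of each type).
    segments = {}
    for line in message.split("\r"):
        fields = line.split("|")
        segments[fields[0]] = fields

    def component(segment: str, field: int, part: int) -> str:
        """Return one '^'-delimited component of a field, or '' if it is absent."""
        fields = segments.get(segment, [])
        if field >= len(fields):
            return ""
        parts = fields[field].split("^")
        return parts[part] if part < len(parts) else ""

    return {
        # For MSH the field separator itself counts as MSH-1, so MSH-7 is index 6.
        "sent": component("MSH", 6, 0),
        "zip": component("PID", 11, 4),
        "chief_complaint": component("PV2", 3, 1).strip().lower(),
    }


record = deidentify(SAMPLE)
print(record)
```

Only the message time, zip code, and free-text complaint survive in `record`; the patient name and identifier never leave the function.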
At the health department, a naïve Bayesian classifier (9) encodes each chief complaint into one of eight mutually exclusive and exhaustive syndromic categories (respiratory, gastrointestinal, botulinic, constitutional, neurologic, rash, hemorrhagic, and none of the above). RODS software then aggregates the data into daily counts by syndrome and residential zip code for analysis by time-series algorithms and display on interfaces using timelines and maps.
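The classification and aggregation steps can be sketched as follows. This is a toy naïve Bayes text classifier written for illustration, not the RODS classifier described in reference 9. The training examples, the visit records, the class name `NaiveBayesClassifier`, and the reduced set of four categories are all assumptions, and the training set is far too small to be realistic.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled chief complaints (illustration only).
TRAINING = [
    ("cough and fever", "respiratory"),
    ("shortness of breath", "respiratory"),
    ("cough congestion", "respiratory"),
    ("vomiting and diarrhea", "gastrointestinal"),
    ("abdominal pain vomiting", "gastrointestinal"),
    ("rash on arms", "rash"),
    ("ankle injury", "none"),
    ("laceration to hand", "none"),
]


class NaiveBayesClassifier:
    """Multinomial naive Bayes over whitespace-separated words, with add-one smoothing."""

    def __init__(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(examples)

    def log_score(self, text, label):
        # log P(label) + sum of log P(word | label); words never seen in training are skipped.
        score = math.log(self.class_counts[label] / self.total)
        denominator = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for word in text.lower().split():
            if word in self.vocabulary:
                score += math.log((self.word_counts[label][word] + 1) / denominator)
        return score

    def classify(self, text):
        # Each complaint receives exactly one category: the highest-scoring one.
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical de-identified visits: (visit date, residential zip code, chief complaint).
VISITS = [
    ("2003-07-18", "15219", "cough"),
    ("2003-07-18", "15219", "shortness of breath"),
    ("2003-07-18", "15213", "vomiting"),
    ("2003-07-19", "15219", "twisted ankle"),
]

model = NaiveBayesClassifier(TRAINING)
# Aggregate into daily counts keyed by (date, syndrome, zip code).
daily_counts = Counter(
    (day, model.classify(complaint), zip_code) for day, zip_code, complaint in VISITS
)
for key, count in sorted(daily_counts.items()):
    print(key, count)
```

The resulting `daily_counts` table is the kind of input a time-series detection algorithm would consume.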
A goal of the project has been to test whether early detection of outbreaks can be achieved through statistical analysis of chief-complaint data (or other routinely collected data). Although chief complaints are insufficient for accurate diagnosis of an individual patient, the hypothesis is that they contain enough information that, when they are aggregated into daily population counts and analyzed by using spatio-temporal algorithms, an abnormally high number of persons who have contracted a respiratory or other illness can be detected early.
The research team conducted numerous experiments to test this hypothesis. The first type of experiment measured the information content of chief complaints for syndrome categorization by measuring the sensitivity and specificity with which patients with different syndromes can be detected from their chief complaints alone (Table). Each experiment tested the ability of a classifier program to accurately assign a syndrome to a patient on the basis of the chief complaint alone (in certain experiments, the patient data were ICD-9-coded ED diagnoses). For example, one experiment measured the accuracy of the Bayesian text classifier for respiratory syndrome in comparison with a manual determination made by the Utah Department of Health during the 2002 Winter Olympic Games. In that experiment, the Bayesian respiratory classifier detected 52% of affected patients (sensitivity), with a specificity of 89%.
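For readers less familiar with these measures, the following Python sketch shows how sensitivity and specificity are computed when classifier output is compared with a manual reference determination. The patient counts are hypothetical and were chosen only so that the result reproduces the 52% and 89% figures quoted above; they are not the study data.

```python
def sensitivity_specificity(predicted, actual):
    """Compare classifier output with a reference standard (both are lists of booleans)."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    true_neg = sum(1 for p, a in pairs if not p and not a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    sensitivity = true_pos / (true_pos + false_neg)  # share of true cases detected
    specificity = true_neg / (true_neg + false_pos)  # share of non-cases correctly excluded
    return sensitivity, specificity


# Hypothetical sample: 100 patients with respiratory syndrome, 900 without.
actual = [True] * 100 + [False] * 900
predicted = (
    [True] * 52 + [False] * 48      # 52 of 100 true cases flagged
    + [False] * 801 + [True] * 99   # 801 of 900 non-cases correctly left unflagged
)

sens, spec = sensitivity_specificity(predicted, actual)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this hypothetical sample, the 99 false positives outnumber the 52 true positives, which illustrates why specificity below 100% adds noise to daily syndrome counts.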
The experiments demonstrated that chief-complaint data contain information about the syndromic presentation and that a naïve Bayesian classifier can extract that information. For certain syndromes of interest to terrorism preparedness, the sensitivity of classification is approximately 50% (i.e., in the event of an outbreak causing respiratory complaints, approximately half of affected patients examined at a monitored facility would be detected).
As expected, the case-detection experiments demonstrate that the specificity of case classification from chief complaints is <100%, meaning that daily counts of patients with respiratory syndrome would contain noise attributable to falsely classified nonrespiratory patients. Therefore, a second type of experiment was needed to determine whether outbreaks would produce a sufficiently large spike to stand out from the background noise in the daily syndrome counts (and to determine how early any spikes would occur). In these outbreak-detection experiments, a time-series detection algorithm was run on 3 years of daily syndrome counts from metropolitan areas that had experienced annual winter outbreaks. The time of detection from daily syndrome counts was defined as the date the algorithm first signaled during the beginning of the seasonal outbreak and was compared with the time of detection from ICD-9-coded hospital diagnoses (14). For three pediatric gastrointestinal outbreaks, detection from chief complaints occurred 29 days earlier (95% confidence interval [CI] = 4 to 53) with no false alarms. For pediatric and adult respiratory outbreaks, detection occurred 10 days earlier (95% CI = -15 to 35) and 11 days earlier (95% CI = -10 to 33), respectively, also with no false alarms.
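The timeliness comparison can be sketched in Python as follows. The text does not name the detection algorithm, so this sketch substitutes a simple stand-in: an alarm is raised on the first day whose count exceeds the mean of the preceding week by more than three standard deviations. The function `first_alarm`, its parameters, and both count series are hypothetical.

```python
from statistics import mean, stdev


def first_alarm(counts, window=7, threshold=3.0):
    """Return the index of the first day whose count exceeds the trailing-window
    mean by more than `threshold` standard deviations, or None if no day does."""
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]
        spread = stdev(baseline) or 1.0  # guard against a perfectly flat baseline
        if counts[day] > mean(baseline) + threshold * spread:
            return day
    return None


# Hypothetical daily counts: the chief-complaint series rises three days
# before the reference series built from coded diagnoses.
chief_complaint_counts = [10, 12, 9, 11, 10, 12, 11, 10, 11, 25, 30, 35]
diagnosis_counts = [10, 12, 9, 11, 10, 12, 11, 10, 11, 10, 12, 11, 26, 31]

early = first_alarm(chief_complaint_counts)
late = first_alarm(diagnosis_counts)
lead_days = late - early
print(f"chief complaints: day {early}; diagnoses: day {late}; lead: {lead_days} days")
```

Timeliness is the difference between the two alarm dates. A false alarm would be any signal raised outside an outbreak period, which this toy data set does not contain.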
Early Experience with Prospective Evaluation
Retrospective studies cannot prove that, in field use, this type of system will lead to earlier detection than existing methods. For this reason, the project initiated a prospective evaluation.
The RODS test bed enables public health officials to examine timelines and maps whenever an outbreak occurs or whenever they receive alerts of anomalous syndrome activity. On Friday, July 18, 2003, an on-call epidemiologist received an alert regarding a spike in respiratory cases in a single county outside Pittsburgh (Figure). Normally, daily counts of respiratory cases numbered 10, but on that day they numbered 60. The epidemiologist logged onto the RODS interface, reviewed the verbatim chief complaints of affected patients, and discovered that all were related to carbon monoxide exposure from a faulty furnace. (Authorized public health users can access case studies of this and other outbreaks through the RODS interface by sending e-mail to firstname.lastname@example.org.)
After rapid (6-week) deployment in February 2002 during the Winter Olympics, RODS had a proven model for building permanent, real-time, HL7-based data feeds of chief complaints from hospitals to public health agencies. Such feeds would have immediate surveillance use and could later be expanded to include transmission of data about microbiology results. However, because adoption of the RODS approach has been slower than expected, the project began to systematically identify and address barriers to dissemination. One barrier was the perception that such approaches are still unproven and would absorb public health resources through technology costs and false alarms (15,16). A second barrier was limited availability of software and lack of technical expertise. Accordingly, the University of Pittsburgh agreed to distribute the RODS system free of charge in 2002. Although this action resulted in hundreds of downloads of both the RODS system and the Bayesian parser, certain health departments lack expertise in database administration, network administration, geographic information systems, HL7, and systems management. The RODS laboratory helped Utah and Pennsylvania avoid this barrier by hosting their surveillance operations. A cost model for this service was then developed, and the service was offered to other states, which led to implementation in Ohio and New Jersey. In addition, the RODS Open Source Project (http://openrods.sourceforge.net) was created in 2003 to catalyze the growth of a community of consultants to help health departments install and operate surveillance systems (17). In 2003, the University of Pittsburgh released the RODS source code under the GNU General Public License (18). Open-sourcing a project can facilitate technology dissemination because it directly addresses information technology managers' concerns about access to source code, code sustainability, customizability, and availability of expertise.
Status of RODS
RODS has operated continuously since 1999, connecting with 51 hospitals in Pennsylvania, 10 hospitals and 17 urgent
care facilities in Utah, 12 hospitals in Ohio, and four hospitals in New Jersey. The system is also installed in Taiwan and Michigan.
Free-text chief-complaint data are useful in public health surveillance because they are widely available and can be obtained in real time for modest cost. Moreover, the HL7 technical infrastructure thus created can later be expanded to transmit other types of data. The technical expertise and cost required to create and operate a real-time facility are substantial; therefore, sharing costs by using application service providers leads to cheaper and faster deployment.
This work was supported by Pennsylvania Department of Health Grant ME-107, the Defense Advanced Research Projects Agency,
Agency for Healthcare Research and Quality, and the National Library of Medicine. The authors also thank Jagan Dara, Feng
Dong, William Hogan, Robert Olszewski, Hoah Su, and Virginia Dato.
1. Kaufmann A, Meltzer M, Schmid G. The economic impact of a bioterrorist attack: are prevention and postattack intervention programs justifiable? Emerg Infect Dis 1997;3:83--94.
2. Panackal AA, M'ikanatha NM, Tsui F-C, et al. Automatic electronic laboratory-based reporting of notifiable infectious diseases. Emerg Infect Dis 2001;8:685--91.
3. Tsui F-C, Wagner MM, Dato V, Chang CC. Value of ICD-9 coded chief complaints for detection of epidemics. Proc AMIA Symp 2001:711--5.
4. Espino JU, Wagner MM. The accuracy of ICD-9 coded chief complaints for detection of acute respiratory illness. Proc AMIA Symp 2001:164--8.
5. Tsui F-C, Espino JU, Dato VM, Gesteland PH, Hutman J, Wagner MM. Technical description of RODS: a real-time public health surveillance system. J Am Med Inform Assoc 2003;10:399--408.
6. Tsui F-C, Espino JU, Wagner MM, et al. Data, network, and application: technical description of the Utah RODS Winter Olympic biosurveillance system. Proc AMIA Symp 2002:815--9.
7. Gesteland PH, Gardner RM, Tsui F-C, et al. Automated syndromic surveillance for the 2002 Winter Olympics. J Am Med Inform
8. Gesteland PH, Wagner MM, Chapman WW, et al. Rapid deployment of an electronic disease surveillance system in the state of Utah for the 2002 Olympic Winter Games. Proc AMIA Symp 2002:285--9.
9. Olszewski R. Bayesian classification of triage diagnoses for the early detection of epidemics. In: Russell I, Haller S., eds. Recent advances in artificial intelligence: proceedings of Sixteenth International Florida Artificial Intelligence Research Society Conference. Menlo Park, CA: AAAI
10. Travers D, Waller A, Haas S, Lober W, Beard C. Emergency department data for bioterrorism surveillance. Proc AMIA Symp 2003:664--8.
11. Chapman WW, Espino JU, Dowling JN, Wagner MM. Detection of multiple symptoms from chief complaints. Technical Report, CBMI Report Series, 2003.
12. Ivanov O, Wagner MM, Chapman WW, Olszewski RT. Accuracy of three classifiers of acute gastrointestinal syndrome for syndromic surveillance. Proc AMIA Symp 2002:345--9.
13. Chapman WW, Dowling JN, Wagner MM. Fever detection from free-text clinical records for biosurveillance. Technical Report, CBMI Report Series, 2003.
14. Ivanov O, Gesteland PH, Hogan W, Tsui F-C, Wagner MM. Detection of pediatric respiratory and gastrointestinal outbreaks from free-text chief complaints 2003. Proc AMIA Symp 2003:318--22.
15. Reingold A. If syndromic surveillance is the answer, what is the question? Biosecur Bioterr 2003;1:1--5.
16. Broome C, Pinner R, Sosin D, Treadwell T. On the threshold. Am J Prev Med 2002;23:229.
17. Espino JU, Wagner MM, Szczepaniak M, et al. Removing a barrier to computer-based outbreak and disease surveillance---The RODS Open Source Project. MMWR 2004;53(Suppl):32--9.
Use of trade names and commercial sources is for identification only and does not imply endorsement by the U.S. Department of
Health and Human Services. References to non-CDC sites on the Internet are
provided as a service to MMWR readers and do not constitute or imply
endorsement of these organizations or their programs by CDC or the U.S.
Department of Health and Human Services. CDC is not responsible for the content
of pages found at these sites. URL addresses listed in MMWR were current as of
the date of publication.