Evaluation of an Electronic General-Practitioner-Based Syndromic Surveillance System: Auckland, New Zealand, 2000-2001
Nicholas F. Jones,1 R.
1Auckland Regional Public Health Service, Auckland, New Zealand;
2University of Auckland, Auckland, New Zealand
Corresponding author: Nicholas F. Jones, Auckland Regional Public Health Service, Private Bag 92 605 Symonds Street, Auckland, New Zealand. Telephone: 64 9 262-1855; Fax: 64 9 623-4633; E-mail:
Introduction: During 2000 and 2001, Auckland Regional Public Health Service piloted a
general-practitioner-based syndromic surveillance system (GPSURV).
Objectives: The pilot evaluated data capture, the method used to distinguish initial from follow-up visits, the definition
of denominators, and the external validity of measured influenza-like illness trends.
Methods: GPSURV monitored three acute infectious-disease syndromes: gastroenteritis, influenza-like illness, and
skin and subcutaneous tissue infection. Standardized terms were used to describe the syndromes. Data were uploaded daily from clinics and transferred to a database via a secure network after one-way encryption of patient identifiers. Records were matched to allow the distinction of follow-ups from first visits, based on between-visit intervals of
<8 weeks. Denominator populations were based on counts of unique patients treated at participating clinics during the previous 2 years. Record completion was examined by using before-and-after surveys of self-assessed standardized-term recording.
Between-visit intervals were measured for matched records, and alternative denominators were calculated on the basis of different observation periods. Weekly influenza-like illness rates were compared with rates generated by an alternative system.
Results: Physicians' self-reported recording compliance was highest for skin and subcutaneous tissue infection (71%) and lowest for influenza-like illness (48%). Initial visits had 18%-19% greater compliance than follow-up visits. The number of physicians reporting increasing compliance during the pilot was greater than the number reporting decreases for all conditions.
Comparison of data with an independent influenza-like illness surveillance system indicated a close agreement between the two data series.
Conclusions: These results indicate that incidence of acute syndromes can be monitored, at least as successfully as
a manual system, by using standardized clinical-term data from selected general-practice clinics. The provision of feedback reports appears to have a limited but positive effect on data quality.
The potential to enhance public health surveillance by
using general-practice data has been discussed by public
health practitioners (1,2). Computerization of general practice records and increased emphasis on population health within primary care (3) have brought this potential closer to realization.
In New Zealand (NZ), electronic systems for physician
reimbursement have contributed to widespread adoption
of computerized family practice information systems. In 1995, an estimated 84% of NZ family physicians or
general practitioners (GPs) used a computer for, at minimum, office management
(4). A recent survey determined that 57% of
NZ GPs use an electronic system to record and store clinical data; this figure was predicted to reach 89% by early 2004 (5). The potential for GP-based sentinel surveillance in NZ is also
enhanced by virtually every GP clinic having, at minimum,
dial-up connectivity to a secure wide area network.
These trends of increased information-system use among GPs created an opportunity for Auckland Regional Public Health Service (ARPHS) to develop a general-practitioner-based sentinel surveillance system (GPSURV). ARPHS provides
public health surveillance for NZ's greater-Auckland region, which consists of seven districts or cities with a combined population of 1.29 million persons (6). GPSURV was designed to monitor community incidence of specified acute syndromes and rates
of physician visits for common chronic conditions.
During 2000 and 2001, to test the feasibility of GPSURV, ARPHS undertook a pilot study with 27 volunteer GPs
from nine clinics. After 3 months of system implementation, ARPHS evaluated the data collected to assess different aspects of internal validity, including data quality. External validity, or the degree to which observed trends were likely to represent communitywide trends, was examined after 12 months of data had been collected.
This paper summarizes the evaluation of the GPSURV
pilot with respect to acute syndrome surveillance. The
evaluation assessed data capture, the validity of methods used to define illness episodes and denominator populations, the
effect of physician participation on self-reported data-quality assessments, and the external validity of influenza-like illness reporting.
CDC (7) and the World Health Organization (WHO)
(8) have produced frameworks for evaluating established surveillance systems. The WHO protocol focuses on reviews of paper-based systems and therefore was not applicable to this study.
The CDC framework accounts for the interchange of electronic data but is not intended to guide pilot studies, nor does it focus
on the outbreak-detection capability of real-time surveillance systems. However, a recently published framework for evaluating syndromic surveillance systems (9) explicitly addresses the outbreak-detection function of syndromic surveillance and guided the writing of this paper.
GPs were recruited from nine clinics whose physicians routinely used standardized terms to record patient
assessments. Clinics were distributed across four cities, but locations were not random, and only one clinic was located in central Auckland. The combined population represented by the
recruited clinics was 52,960 persons, or approximately 4.1% of the
Auckland region's population.
GPSURV was designed to use standardized terms rather than free-text searches to identify patients with target conditions for three reasons. First, clinics were using different information systems, thereby necessitating use of a standard data-extract specification. Second, the project aimed to collect minimal data from clinic information systems with
minimal disruption. Third, using standardized terms would likely enhance specificity and simplify analyses.
The standardized terminology used by participating physicians was the Read Codes, Version 2
(10). This terminology was widespread in NZ at the time of the pilot because the NZ Ministry of Health had promoted it as the national standard for electronic primary care records. The Read terminology incorporates a conceptual hierarchy within its coding system
(11). Codes are used as shorthand for clinical terms, and variations of general terms use codes that incorporate the parent term
code (e.g., the code for viral
gastroenteritis, A07y0.00, includes the first two characters of the code for the parent term
intestinal infectious diseases, A0.00).
Although not ideal for epidemiologic purposes, the Read hierarchy can be used to specify syndromes for surveillance.
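Because codes for more specific terms begin with the characters of their parent code, membership in a surveillance syndrome can be tested by prefix matching on the hierarchical part of the code. The Python sketch below illustrates this under stated assumptions: codes are written as in the example above (a code stem, optional padding dots, and a two-character term suffix after the final dot), and the `SYNDROME_PARENTS` mapping is hypothetical, containing only the parent code A0.00 cited in the text rather than the full Table 1 code list.

```python
from __future__ import annotations

# Hypothetical syndrome-to-parent-code mapping; only A0.00 (intestinal
# infectious diseases) is taken from the text. It is not the Table 1 list.
SYNDROME_PARENTS: dict[str, list[str]] = {
    "gastroenteritis": ["A0.00"],
}


def code_stem(read_code: str) -> str:
    """Return the hierarchical part of a Read code.

    Assumes '<stem>.<term suffix>' notation, e.g. 'A07y0.00' -> 'A07y0' and
    'A0.00' -> 'A0'; padding dots before the suffix are also removed.
    """
    stem = read_code.rsplit(".", 1)[0]
    return stem.rstrip(".")


def is_descendant(code: str, parent: str) -> bool:
    """True if `code` is `parent` itself or sits below it in the hierarchy."""
    return code_stem(code).startswith(code_stem(parent))


def syndromes_for(code: str) -> list[str]:
    """List the surveillance syndromes whose parent codes cover `code`."""
    return [
        syndrome
        for syndrome, parents in SYNDROME_PARENTS.items()
        if any(is_descendant(code, p) for p in parents)
    ]


print(is_descendant("A07y0.00", "A0.00"))  # prints True
print(syndromes_for("A07y0.00"))  # prints ['gastroenteritis']
```

Prefix matching mirrors the recording advice given to physicians: a visit coded with a more specific child term is still counted under the parent syndrome.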
Three acute infectious clinical syndromes were chosen for the pilot: gastroenteritis, influenza-like illness, and skin infection. Physicians were provided case definitions and corresponding codes (Table 1).
Physicians were advised to record either the specified parent code or a more specific instance of the parent term
or corresponding code, as clinically indicated. Data were uploaded daily from clinics via a secure network (Figure 1). A utility within each system enabled the physician or researchers to specify search terms or codes, thus ensuring the system had the flexibility to change conditions under surveillance. A unique patient identifier, the New Zealand National Health Index (NHI), was encrypted by an independent third party before data were transferred to the GPSURV database.
Encryption enabled data for matching patients to be linked while maintaining patient privacy.
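The text does not name the one-way encryption method. One construction with the properties described, stable tokens that allow records to be linked but cannot practically be reversed without a secret, is a keyed hash such as HMAC-SHA-256. The sketch below assumes that construction; the key, the normalization step, and the sample identifier are hypothetical. A keyed hash is preferable to a plain hash in this setting because NHI numbers follow a short, fixed format, so an unkeyed hash of every possible value could be precomputed.

```python
import hashlib
import hmac


def pseudonymize_nhi(nhi: str, secret_key: bytes) -> str:
    """Return a stable, one-way token for a patient identifier.

    Illustrative only: HMAC-SHA-256 is an assumed choice, not the method
    used in the pilot. Only the party holding `secret_key` can reproduce
    the mapping from identifier to token.
    """
    normalized = nhi.strip().upper()  # treat 'abc1234 ' and 'ABC1234' as one patient
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and identifier, for illustration.
key = b"key-held-by-independent-third-party"
first_visit = pseudonymize_nhi("ABC1234", key)
second_visit = pseudonymize_nhi("abc1234 ", key)
print(first_visit == second_visit)  # prints True: both visits link to one token
```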
The electronic record system used by a majority of physicians did not allow physicians to distinguish an initial visit
from follow-up visits for the same illness episode. Record linkage for this pilot allowed this distinction to be made by using an algorithm based on between-visit intervals. Visit records for the same patient and syndrome were categorized as
follow-ups if the visit occurred within 8 weeks of a previous visit.
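The interval rule can be sketched as follows. The sketch assumes that each visit is compared with the most recent earlier visit for the same patient and syndrome, so that a chain of closely spaced visits forms one episode, and it reads "within 8 weeks" as an interval shorter than 56 days, matching the <8-week criterion given in the abstract. The `Visit` record and its field names are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import NamedTuple

FOLLOW_UP_WINDOW = timedelta(weeks=8)  # 56 days


class Visit(NamedTuple):
    patient: str  # pseudonymized patient token
    syndrome: str  # e.g. "gastroenteritis"
    visit_date: date


def classify_visits(visits: list[Visit]) -> list[tuple[Visit, str]]:
    """Label each visit 'initial' or 'follow-up' by its interval to the prior visit."""
    last_seen: dict[tuple[str, str], date] = {}
    labelled: list[tuple[Visit, str]] = []
    for visit in sorted(visits, key=lambda v: v.visit_date):
        key = (visit.patient, visit.syndrome)
        previous = last_seen.get(key)
        if previous is not None and visit.visit_date - previous < FOLLOW_UP_WINDOW:
            label = "follow-up"
        else:
            label = "initial"  # first recorded visit, or a gap of 8 weeks or more
        labelled.append((visit, label))
        last_seen[key] = visit.visit_date  # later visits are compared with this one
    return labelled


visits = [
    Visit("p1", "skin infection", date(2001, 1, 1)),
    Visit("p1", "skin infection", date(2001, 1, 10)),
    Visit("p1", "skin infection", date(2001, 6, 1)),
]
for visit, label in classify_visits(visits):
    print(visit.visit_date, label)
# 2001-01-01 initial / 2001-01-10 follow-up / 2001-06-01 initial
```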
Because GPSURV aimed to compare disease occurrence among clinics and districts, denominators were required to
calculate incidence rates. Unlike in the United Kingdom or the Netherlands, where patients register with only one physician or clinic, NZ patients can visit as many GP clinics as they wish. This factor increased the difficulty of defining a
denominator population. Alternative denominators have been recommended for countries in this situation
(12). GPSURV defined denominators as active patients
(13) and used counts of unique patients treated once or more by a participating
physician during the previous 2 years. These counts were
performed automatically by the clinic information system.
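A sketch of the active-patient denominator and the resulting rate follows. It assumes a visit log of (patient token, visit date) pairs, approximates a month as 365.25/12 days, and expresses rates per 1,000 active patients; these choices and the function names are illustrative, not taken from the clinic information systems, which performed the counts automatically. Passing `months=6`, `12`, or `18` gives the alternative denominators examined in the evaluation.

```python
from __future__ import annotations

from datetime import date, timedelta


def active_patients(visit_log: list[tuple[str, date]], as_of: date, months: int = 24) -> int:
    """Count unique patients seen at least once in the `months` before `as_of`."""
    start = as_of - timedelta(days=round(months * 365.25 / 12))
    return len({patient for patient, seen in visit_log if start < seen <= as_of})


def rate_per_1000(new_episodes: int, denominator: int) -> float:
    """Incidence of new episodes per 1,000 active patients."""
    return 1000 * new_episodes / denominator


log = [
    ("a", date(2000, 3, 1)),
    ("a", date(2001, 6, 1)),  # repeat visits count the patient once
    ("b", date(1998, 1, 1)),  # outside the 24-month window
    ("c", date(2001, 12, 1)),
]
n = active_patients(log, as_of=date(2001, 12, 31))
print(n, rate_per_1000(5, 2000))  # prints 2 2.5
```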
Physician-specific reports providing feedback on recorded illnesses and comparisons with regionwide trends were produced on a weekly and quarterly basis. Reports aggregating data to district and region levels were produced at the same time intervals. No statistical aberration-detection methods were used during the pilot because the focus was on assessing feasibility, data quality, and internal validity.
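The aggregation behind the district and regional reports can be sketched as a count of new episodes by calendar week. The use of ISO weeks, the district names, and the record layout below are assumptions for illustration; the text does not specify them.

```python
from __future__ import annotations

from collections import Counter
from datetime import date


def weekly_counts(episodes: list[tuple[str, str, date]]) -> tuple[Counter, Counter]:
    """Aggregate new episodes to district level and to region level by ISO week.

    Each episode is (district, syndrome, date of initial visit).
    """
    by_district: Counter = Counter()
    by_region: Counter = Counter()
    for district, syndrome, visit_date in episodes:
        iso_year, iso_week, _ = visit_date.isocalendar()
        by_district[(district, syndrome, iso_year, iso_week)] += 1
        by_region[(syndrome, iso_year, iso_week)] += 1
    return by_district, by_region


episodes = [
    ("District A", "influenza-like illness", date(2001, 7, 2)),
    ("District A", "influenza-like illness", date(2001, 7, 4)),
    ("District B", "influenza-like illness", date(2001, 7, 3)),
]
districts, region = weekly_counts(episodes)
print(districts)
print(region)
```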
The sensitivity of GP-based surveillance systems depends on diagnostic reliability and on record completion, or data capture. Because GP-based syndromic surveillance (e.g., GPSURV) defines the events under surveillance as conditions or problems identified by participating physicians, it is less concerned with diagnostic reliability than with record completion and data capture. Because the primary function of such a system is outbreak detection, that is, the detection of aberrations in time-series data, incomplete data capture does not necessarily prevent the system from fulfilling this function, provided that completion does not fluctuate over time. Nevertheless, incomplete recording and event data collection do reduce system sensitivity.
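The condition that completion must not fluctuate can be stated more precisely under a simplifying assumption: the recorded count in week t is a fraction c_t of the true number of new episodes N_t seen by participating physicians. The ratio of consecutive weekly counts then separates into a completion term and an incidence term.

```latex
% Simplifying assumption: R_t = c_t N_t, where N_t is the true number of new
% episodes in week t and 0 < c_t \le 1 is the recording completion in week t.
\frac{R_t}{R_{t-1}} = \frac{c_t N_t}{c_{t-1} N_{t-1}}
                    = \frac{c_t}{c_{t-1}} \cdot \frac{N_t}{N_{t-1}},
\qquad
c_t = c_{t-1} = c \;\Longrightarrow\; \frac{R_t}{R_{t-1}} = \frac{N_t}{N_{t-1}} .
```

With constant completion, relative changes in recorded counts equal relative changes in true incidence, although a low c means fewer events are observed and sensitivity falls. When completion changes between weeks, the factor c_t/c_{t-1} alters recorded counts even if true incidence is unchanged.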
Multiple approaches have been taken to assess the completion of term or code recording within electronic GP records. In
the UK, where GPs have been required to retain both
paper and electronic records, studies have measured completion
by comparing those records (14--16). When clinics do not retain paper records, this approach is not possible. Direct inspection
of electronic records would be possible but expensive and disruptive. Other approaches have included classifying physicians
into adequate or inadequate recorders by comparing their incidence and prevalence rates with average values
(17,18) and using other data (e.g., diagnoses mentioned in hospital letters) as a proxy for prevalence (18). The most commonly used proxy has been data on prescribed medicines, obtained either directly from the clinic (19) or from centralized data collections (20,21). This method is useful only when medicines are prescribed exclusively for specified conditions.
Survey studies have demonstrated that GPs reliably self-report certain activities (e.g., asking patients about tobacco use), and one study used a survey to examine electronic record-keeping within GP clinics in a UK network (23). No published studies are known to have examined the effect of individualized feedback on data quality in GP-based surveillance systems, although certain authors have reported that feedback is likely to have a positive effect.
For this evaluation, a survey method was used to measure the completeness of data recording for acute syndromes. Surveys of participating physicians were conducted
before and after the first 3-month period of the pilot. For
each surveillance condition and consultation type (i.e., initial and follow-up), respondents were asked to estimate the percentage of patient visits for which they recorded a standardized term or code (as opposed to free text).
To assess the effect on the denominator of changing the observation period, counts of active patients seen within previous
6-, 12-, and 18-month periods were compared with the denominator obtained by counting the number of patients
attending during the previous 24 months. To evaluate the appropriateness of using an 8-week interval between
consecutive visits to identify new illness episodes for the same health problem, distributions of between-visit intervals for
matching patient records were examined.
Generalizability of measured trends to the region's population depends on the geographic distribution
of conditions under surveillance and the representativeness of disease events detected at the sentinel sites. A full evaluation of these concerns was beyond the scope of the pilot study. However, an attempt was made to examine external validity of
observed trends by comparing data for one syndrome with data from an independent source. The age-sex structure of the study population was also compared with that of the region.
Self-Reported Term-Recording Compliance
A total of 21 physicians completed a baseline survey, and 22 of 27 participating physicians completed a follow-up
survey administered 3 months after the pilot began. Not all 22
of those completing the follow-up survey answered each
question; nonrespondents for particular questions were
removed from analysis. Compliance was defined as recording standardized
terms for >90% of patient visits.
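Applying this definition to the survey responses amounts to the calculation below: for each condition and visit type, physicians who did not answer are dropped, and the remainder are classed as compliant if their self-reported percentage exceeds 90. The function name and the sample responses are hypothetical.

```python
from __future__ import annotations

from typing import Optional

COMPLIANCE_THRESHOLD = 90.0  # percent of visits recorded with a standardized term


def percent_compliant(responses: list[Optional[float]]) -> float:
    """Percentage of respondents reporting standardized terms for >90% of visits.

    `None` marks a physician who skipped the question; such nonrespondents
    are removed before the percentage is calculated.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no physician answered this question")
    compliant = sum(1 for r in answered if r > COMPLIANCE_THRESHOLD)
    return 100 * compliant / len(answered)


# Hypothetical self-reported percentages for one condition and visit type.
print(percent_compliant([95, 80, None, 100, 90]))  # prints 50.0
```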
Of the acute syndromes studied, recording for skin and subcutaneous tissue infection had the greatest compliance (71% of physicians), and influenza-like illness had the least (48% of physicians) (Table 2). For all conditions, physicians reported recording standardized terms for follow-up visits less frequently than for first visits.
Of the 21 physicians who had previously returned a baseline survey, 17 completed the follow-up survey.
Before-and-after responses from these physicians were compared (Table 3). The number of physicians reporting a between-survey change for
each diagnosis and visit type, based on a change of
>10% in the self-reported percentage of visits with a recorded term, was determined. Although the number of participants was too small for statistical testing of trends, more increases than decreases were reported for all acute syndromes.
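The tally of between-survey changes can be sketched as below. The ">10%" criterion is read here as a difference of more than 10 percentage points between the baseline and follow-up self-reported percentages; that reading, the function names, and the sample pairs are assumptions.

```python
from __future__ import annotations

from collections import Counter


def classify_change(before: float, after: float, threshold: float = 10.0) -> str:
    """Classify one physician's change in self-reported recording percentage."""
    delta = after - before
    if delta > threshold:
        return "increase"
    if delta < -threshold:
        return "decrease"
    return "no change"


def tally_changes(pairs: list[tuple[float, float]]) -> Counter:
    """Count increases, decreases, and no-change responses for one condition."""
    return Counter(classify_change(before, after) for before, after in pairs)


# Hypothetical (baseline, follow-up) percentages for one condition and visit type.
print(tally_changes([(50, 70), (80, 60), (60, 65), (40, 50)]))
```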
Categorization of Follow-Up Visits
The percentages of visits for acute syndromes that were categorized as follow-ups (i.e., by using the 8-week
between-visit interval) were as follows: 5% for influenza-like illness, 9% for gastroenteritis, and 25% for skin infections. Analysis of pairs of consecutive encounters for skin infections determined that 82% of follow-up visits occurred within 14 days of the
previous matching encounter. Only three matching visits for any acute condition were recorded >8 weeks after the previous encounter; however, only 3 months of data were analyzed for matching pairs.
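The interval analysis can be outlined as follows: collect the visit dates for each patient and syndrome, take the gaps between consecutive visits, and summarize them. The sketch below reports the share of follow-up intervals (under 56 days) that were 14 days or shorter and the number of intervals of 8 weeks or more; the record layout, the function names, and the sample data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def consecutive_intervals(visits: list[tuple[str, str, date]]) -> list[int]:
    """Days between consecutive visits for the same (patient, syndrome) pair."""
    dates_by_key: dict[tuple[str, str], list[date]] = defaultdict(list)
    for patient, syndrome, visit_date in visits:
        dates_by_key[(patient, syndrome)].append(visit_date)
    intervals: list[int] = []
    for dates in dates_by_key.values():
        dates.sort()
        intervals.extend((later - earlier).days for earlier, later in zip(dates, dates[1:]))
    return intervals


def summarize(intervals: list[int], window_days: int = 56, early_days: int = 14) -> tuple[float, int]:
    """Share of follow-up intervals <= early_days, and count of intervals >= window_days."""
    follow_ups = [i for i in intervals if i < window_days]
    if not follow_ups:
        raise ValueError("no follow-up intervals")
    share_early = sum(1 for i in follow_ups if i <= early_days) / len(follow_ups)
    n_beyond = sum(1 for i in intervals if i >= window_days)
    return share_early, n_beyond


visits = [
    ("p1", "skin infection", date(2001, 1, 1)),
    ("p1", "skin infection", date(2001, 1, 8)),
    ("p1", "skin infection", date(2001, 1, 30)),
    ("p1", "skin infection", date(2001, 6, 1)),
    ("p2", "skin infection", date(2001, 2, 1)),
    ("p2", "skin infection", date(2001, 2, 3)),
]
intervals = consecutive_intervals(visits)
print(intervals, summarize(intervals))
```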
The size of the active-patient population increased with the period of observation, as would be expected. The number
of active patients counted during a 6-month period was 60% of the 24-month count and 78% and 92% of the 24-month
count for 12- and 18-month periods, respectively.
Weekly influenza-like illness (ILI) rates were compared with ILI rates measured by a separate surveillance system. FLUSURV, a
surveillance system for influenza and ILI, collects manually recorded data from approximately 40 volunteer GPs in the Auckland region. Participating FLUSURV clinics keep a written tally of patients meeting the WHO case criteria for ILI. Each week, a public health clerical staff member calls clinics to obtain the number of new cases. Denominator data for participating physicians are based on physician estimates of total patient population numbers. Only one clinic participated in both GPSURV and FLUSURV. Figure 2 illustrates the result of this comparison. Although the data are collected from different networks of clinics, the incidence trends were similar, and no statistically significant difference between the two series was detected (t = 1.81; 20 degrees of freedom; p = 0.085). The first peak of the season appears higher in the FLUSURV data; however, GPSURV incidence rates were age-standardized, which is likely to have reduced the measured rates slightly. GPSURV appears to have detected the second substantial peak of the season earlier than FLUSURV.
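The two computations mentioned in this comparison can be sketched as follows. The text does not state which t-test was used; a paired test on weekly rates is assumed here (21 paired weeks would give the 20 degrees of freedom reported). Direct age standardization is likewise an assumption, because the standard population and method are not described. The function names and the example values are hypothetical.

```python
from __future__ import annotations

import math
from statistics import mean, stdev


def paired_t(series_a: list[float], series_b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for two aligned weekly series."""
    if len(series_a) != len(series_b) or len(series_a) < 2:
        raise ValueError("need two equal-length series with at least two weeks")
    diffs = [a - b for a, b in zip(series_a, series_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def direct_standardized_rate(
    cases: dict[str, int],
    population: dict[str, int],
    standard: dict[str, int],
    per: int = 100_000,
) -> float:
    """Weight each age band's crude rate by that band's share of a standard population."""
    total = sum(standard.values())
    return per * sum(
        cases[band] / population[band] * standard[band] / total for band in standard
    )


t, df = paired_t([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
print(round(t, 3), df)  # prints 5.196 2
print(direct_standardized_rate({"0-9": 10, "10+": 20}, {"0-9": 1000, "10+": 4000}, {"0-9": 1, "10+": 1}))
```

A two-sided p-value can then be obtained from the t distribution with the returned degrees of freedom, for example as twice `scipy.stats.t.sf(abs(t), df)`.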
Given the low self-reported compliance for recording influenza-like illness, this close agreement appears surprising. One possible explanation is that the initial 3-month pilot period fell in late spring and early summer, when ILI incidence was likely to be minimal; during that season, GPs might have been more likely to use alternative terms (e.g., hay fever) to record syndromes with upper respiratory symptoms.
Comparison of age-sex structures demonstrated that nearly all age-sex bands of the study population were within 2% of the corresponding percentages for the regional population. An exception was the <10 years age group, which, compared with the regional age-sex distribution, made up 6% more of the study population for males and 5% more for females.
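The comparison of age-sex structures reduces to a percentage-point difference for each band. The sketch below flags bands whose share of the study population differs from the regional share by more than 2 points; reading "within 2%" as percentage points, the band labels, and the counts are assumptions.

```python
from __future__ import annotations


def band_differences(
    study: dict[tuple[str, str], int],
    region: dict[tuple[str, str], int],
    tolerance_points: float = 2.0,
) -> dict[tuple[str, str], tuple[float, bool]]:
    """Study-minus-region share (percentage points) per age-sex band, with an outlier flag."""
    study_total = sum(study.values())
    region_total = sum(region.values())
    result = {}
    for band, regional_count in region.items():
        diff = 100 * study.get(band, 0) / study_total - 100 * regional_count / region_total
        result[band] = (round(diff, 2), abs(diff) > tolerance_points)
    return result


# Hypothetical counts for two bands.
study = {("male", "<10"): 30, ("male", "10+"): 70}
region = {("male", "<10"): 25, ("male", "10+"): 75}
print(band_differences(study, region))
```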
This study examined the validity of disease-incidence measures based on the collection and analysis of clinical data
routinely recorded by a network of volunteer family physicians. The study's findings indicate that, despite participant variability in data recording and problems with defining denominator populations, the incidence of common acute syndromes can be
monitored at least as successfully by using standardized clinical-term data from selected GP clinics as by using manual methods. However, the sensitivity of this method will depend on the frequency of the syndrome under surveillance. For less common conditions,
a larger sample of GPs would be required. Similarly, geographic variations in disease incidence probably would not be
detected without increasing the geographic spread of participating clinics.
The study's findings indicate that the algorithm used to classify follow-up visits probably works effectively. For influenza-like illness, only 5% of visits were classified as follow-ups; thus, even misclassifying all of these as first visits would have had minimal impact on measured rates. The evaluation indicated that approximately 80% of patients treated over a 2-year period would be counted over 12 months. The effect of changing the observation period used to define the denominator might be more complex, given possible differences in the age structure of patients counted over different periods. Nevertheless, if such a denominator were used for further surveillance, a 12-month observation period would probably suffice.
Clinic participation in the pilot appeared to have a limited but positive impact on data quality. This might have resulted from the regular feedback provided to physicians in weekly and quarterly reports. Other aspects of participating in the project might also have contributed to improvements in data quality; for example, physicians might have gained an increased awareness of the public health benefits of providing valid data. However, the observed fluctuations in the recording of standardized terms raise the possibility that this approach might be prone to artifactual aberrations in time-series data, and participating GPs would need to maintain consistent recording behavior for ongoing surveillance.
Adelstein AM. Policies of the Office of Population Censuses and Surveys: philosophy and constraints. Br J Prev Social Med 1976;30:1-10.
Pinsent RJ. The primary observer. Ecol Dis 1982;1:275-9.
Rigby M, Roberts R, Williams J, et al. Integrated record keeping as an essential aspect of a primary care led health service. BMJ 1998;317:579-82.
Thakurdas P, Coster G, Gurr E, Arrol B. New Zealand general practice computerisation; attitudes and reported behaviour. N Z Med
National Health Service Information Authority. The clinical terms version 3 (the Read Codes): incorporation of earlier versions of the Read Codes (the Superset). Birmingham, England: National Health Service Information Authority, 2000.
Britt H, Miller G. A critical review of the final report of the general practice coding jury. Sydney, Australia: Family Medicine Research Centre, University of Sydney, 2000. Available at
Schlaud M, Brenner MH, Hoopman M, Schwartz FW. Approaches to the denominator in practice-based epidemiology: a critical overview. J Epidemiol Community Health 1998;52:13S-19S.
De Loof J. Practice size: a fraction of the yearly attending group as practice size indicator. Gen Pract Int 1983;12:127-8.
Boydell L, Grandidier H, Rafferty C, McAteer C, Reilly P. General practice data retrieval: the Northern Ireland project. J Epidemiol Community Health 1995;49:22-5.
Neal RD, Heywood PL, Morley S. Real world data-retrieval and validation of consultation data from four general practices. Fam Pract
Pringle M, Ward P, Chilvers C. Assessment of the completeness and accuracy of computer medical records in four practices committed to recording data on computer. Br J Gen Pract 1995;45:537-41.
Bartelds AI. Validation of sentinel data. Gesundheitswesen 1993;55:3-7.
Jick H, Jick S, Derby L. Validation of information recorded on general practitioner based computerised data resources in the United Kingdom.
Vernon JG. Ensuring the accuracy and completeness of clinical data on general practice systems. J Inform Prim Care 1998;Nov:18-9.
Bruno G, Bargero G, Vuolo A, Pisu E, Pagano G. A population-based prevalence survey of known diabetes mellitus in northern Italy based upon multiple independent sources of ascertainment. Diabetologia 1992;35:851-6.
Gribben B, Coster G, Pringle M, Simon J. Non-invasive methods for measuring data quality in general practice. N Z Med J 2001;114:30-2.
Eccles M, Ford GA, Duggan S, Steen N. Are postal questionnaire surveys of reported activity valid? An exploration using general practitioner management of hypertension in older people. Br J Gen Pract 1999;49:35-8.
Lawrenson RA, Coles G, Walton K, Farmer RDT. Characteristics of practices contributing to the MediPlus database and the implications for its use in epidemiological research. J Inform Prim Care 1998;Nov:14-18.
Chauvin P, Valleron AJ. Participation of French general practitioners in public health surveillance: a multidisciplinary approach. J Epidemiol Community Health 1998;52:2S-8S.
Use of trade names and commercial sources is for identification only and does not imply endorsement by the U.S. Department of
Health and Human Services. References to non-CDC sites on the Internet are
provided as a service to MMWR readers and do not constitute or imply
endorsement of these organizations or their programs by CDC or the U.S.
Department of Health and Human Services. CDC is not responsible for the content
of pages found at these sites. URL addresses listed in MMWR were current as of
the date of publication.