Collecting Data

Katrina Hedberg and Julie Maher


Epidemiologic data are paramount to targeting and implementing evidence-based control measures to protect the public’s health and safety. Nowhere are data more important than during a field epidemiologic investigation to identify the cause of an urgent public health problem that requires immediate intervention. Many of the steps to conducting a field investigation rely on identifying relevant existing data or collecting new data that address the key investigation objectives.

In today’s information age, the challenge is not the lack of data but rather how to identify the most relevant data for meaningful results and how to combine data from various sources that might not be standardized or interoperable to enable analysis. Epidemiologists need to determine quickly whether existing data can be analyzed to inform the investigation or whether additional data need to be collected and how to do so most efficiently and expeditiously.

Epidemiologists working in applied public health have myriad potential data sources available to them. Multiple factors must be considered when identifying relevant data sources for conducting a field investigation. These include investigation objectives and scope, whether requisite data exist and can be accessed, to what extent data from different sources can be practically combined, methods for and feasibility of primary data collection, and resources (e.g., staff, funding) available.

Sources of data and approaches to data collection vary by topic. Although public health departments have access to notifiable disease case data (primarily for communicable diseases) through mandatory reporting by providers and laboratories, data on chronic diseases and injuries might be available only through secondary sources, such as hospital discharge summaries. Existing data on health risk behaviors might be available from population-based surveys, but these surveys generally are conducted only among a small proportion of the total population and are de-identified. Although some existing data sources (e.g., death certificates) cover many disease outcomes, others are more specific (e.g., reportable disease registries).

Accessing or collecting clean, valid, reliable, and timely data challenges most field epidemiologic investigations. New data collected in the context of field investigations should be evaluated for attributes similar to those for surveillance data, such as quality, definitions, timeliness, completeness, simplicity, generalizability, validity, and reliability (1). Epidemiologists would do well to remember GIGO (garbage in, garbage out) when delineating their data collection plans.

Data Collection Activities

Collecting data during a field investigation requires the epidemiologist to conduct several activities. Although it is logical to believe that a field investigation of an urgent public health problem should roll out sequentially—first identification of study objectives, followed by questionnaire development; data collection, analysis, and interpretation; and implementation of control measures—in reality many of these activities must be conducted in parallel, with information gathered from one part of the investigation informing the approach to another part. Moreover, most, if not all, field investigations will be done by a larger team. The importance of developing a protocol, identifying roles and responsibilities of team members, and documenting all activities and processes should not be underestimated.

Determine Decisions Regarding Control Measure Implementation

The epidemiologist must keep in mind that the primary purpose of a field investigation into an urgent public health problem is to control the problem and prevent further illness. The range of public health control measures is broad (see Chapter 11). Many of these control measures, such as recalling contaminated food products, closing business establishments, recommending antibiotic prophylaxis or vaccination, and requiring isolation of an infectious person, considerably burden individuals, businesses, or the community. Therefore, it is incumbent on the epidemiologist to determine up front which decisions need to be made and what information is needed to support these decisions.

Define the Investigation’s Objectives and Determine Data Needed

Determining whether an urgent public health problem exists (i.e., an excess of observed cases of illness above what is expected) depends on knowing the expected background rate of endemic disease. The background rate generally is determined by accessing existing data sources, such as reportable disease registries or vital statistics. For foodborne outbreaks, most states and local jurisdictions publish data at least annually; however, for chronic diseases (e.g., cancer) or birth outcomes (e.g., microcephaly), expected baseline rates might have to be extrapolated by applying previously published rates to the population of concern. Although not specific, data from syndromic surveillance systems (e.g., from emergency departments) can be useful in determining background rates of prediagnostic signs or symptoms, such as fever, respiratory illness, or diarrhea.
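
The comparison of observed with expected cases can be sketched in a few lines of Python. This is only an illustration: the background rate, population, and observed count are hypothetical, the function names are invented for this example, and the Poisson model is an assumption that suits rare, independent events rather than a method prescribed by this chapter.

```python
import math


def expected_cases(background_rate_per_100k: float, population: int, years: float = 1.0) -> float:
    """Expected case count from a background rate per 100,000 person-years."""
    return background_rate_per_100k / 100_000 * population * years


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    # Sum P(X = k) for every k below the observed count, then take the complement.
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


# Hypothetical values: 12 cases per 100,000 person-years, a population of
# 250,000, a 1-month window, and 9 cases observed in that window.
expected = expected_cases(background_rate_per_100k=12.0, population=250_000, years=1 / 12)
observed = 9
p_value = poisson_upper_tail(observed, expected)
print(f"expected={expected:.2f}, observed={observed}, P(X >= observed)={p_value:.4f}")
```

A small tail probability suggests the observed count is unlikely under the assumed background rate, but it does not replace checking for changes in reporting, case definitions, or laboratory practice.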

After the epidemiologist has confirmed the existence of an urgent public health problem, the next important task in a field investigation is to define the specific objectives and determine what data are necessary and sufficient to justify the control measures. Is the objective to identify a point source (e.g., a contaminated food item) of an outbreak to recall the product? Is the objective to identify specific behaviors that put people at increased risk (e.g., cross-contamination during food handling)? Is the objective to identify factors in the environment that might be causing disease (e.g., elevated lead levels in drinking water)?

Although engaging stakeholders, such as other public health agencies, community partners, industry leaders, affected businesses, healthcare practitioners, customers, and regulatory agencies, early in an investigation is time-consuming, including them is essential. Discussing up front the purpose of the investigation and the data collection processes will prove invaluable in the long run when collaborators are needed during case finding, data collection, implementation of control measures, and communication with affected populations and the public.

Develop a Study Protocol

The ability to conduct an epidemiologic field investigation efficiently and effectively depends on understanding the interconnectedness of its parts. Many investigation activities must be conducted in parallel and are interdependent and iterative, with results informing edits or amendments. For example, available resources will influence how complex data collection efforts can be; the timeline for an investigation of an infectious disease outbreak needing urgent control measures might require a quick-and-dirty data collection process, whereas an investigation of a cancer cluster that has unfolded over several years may permit more in-depth data collection and analysis. Therefore, writing a protocol before embarking on any data collection is paramount.

The urgency of most field investigations requires that the epidemiologist act quickly but thoughtfully. An important and potentially time-saving step is to review prior epidemiologic investigations of similar illnesses and, whenever possible, use or adapt existing protocols, including standard data collection approaches and case definitions. Doing so facilitates data exchange with other systems if the outbreak extends to other jurisdictions.

A field investigation protocol does not have to be long, but it must include the following:

  • Investigation objectives.
  • Study design (e.g., cohort study, case–control study).
  • Study population, case definition, sample size, and selection.
  • Data collection procedures, variables to be collected, and procedures to safeguard participants.
  • Data security, privacy, confidentiality, information technology controls.
  • Analysis plan.
  • Logistics, including budget, personnel, and timeline.
  • Legal considerations, including statutes, rules, and regulations.

Identifying up front which software package(s) will be used for questionnaire development, data collection, data entry, and analysis also is useful. One such tool, Epi Info, was developed by the Centers for Disease Control and Prevention (CDC) and is a public domain suite of interoperable software tools designed for public health practitioners (see Chapter 5).

Considering all the different elements of an investigation from the beginning will minimize error that potentially can lead to inconclusive results. Major sources of error that need to be considered during data collection include the following:

  • Lack of generalizability because of selection bias or variable participation rates.
  • Information bias, such as measurement error, self-report bias, and interviewer bias.
  • Uncontrolled confounding or bias introduced in the association between exposure and outcome because of a third variable.
  • Small sample size, resulting in inadequate power to detect differences between groups.

Identify Possible Data Sources

Keeping in mind the investigation objectives, the epidemiologist should evaluate whether existing data sources (e.g., vital statistics, notifiable disease registries, population surveys, healthcare records, environmental data) are useful for addressing the investigation objectives, whether these data are accurate and readily accessible for analysis, whether existing data systems are interoperable, and what additional data, if any, need to be collected de novo.

Mortality Statistics

Collecting mortality statistics and classifying the causes of death dates to the 1500s in London, when the Bill of Mortality was periodically published (2). During the 1800s, Dr. William Farr developed a disease classification system that ushered in the era of modern vital statistics (3). During the same period, Dr. John Snow, known as the father of modern epidemiology, mapped deaths from cholera in London and identified the Broad Street pump as the source of contaminated water (4). The story of removing the pump handle is the quintessential public health intervention based on scientific data. Vital statistics remain an important source of data for understanding leading and unusual causes of death (e.g., childhood influenza-associated, viral hemorrhagic fever, variant Creutzfeldt-Jakob disease), and their timeliness is improving thanks to the electronic death reporting system, which many states have implemented (5).

Notifiable Diseases Reporting

In the United States, the legal framework for reporting infectious diseases to public health authorities for investigation and control dates to 1878, when Congress authorized the Public Health Service to collect reports of cholera, smallpox, plague, and yellow fever from consuls overseas to implement quarantine measures to prevent introduction into the United States (6).

In 1951, the first conference of state epidemiologists determined which diseases should be nationally notifiable to the Public Health Service and later to CDC. This process continues today; the Council of State and Territorial Epidemiologists determines which diseases and conditions are designated as nationally notifiable to CDC, but each state and territory legally mandates reporting in its jurisdiction. Although the list comprises primarily infectious diseases, in 1995, the first noninfectious condition—elevated blood lead levels—was added (7).

Laboratory Data

Data from laboratories are critical for investigating infectious disease outbreaks. By law, most states require laboratories that identify causative agents of notifiable diseases to send case information electronically to state public health agencies. In addition, most states require laboratories to send cultures to the public health laboratory in their jurisdiction for confirmation, subtyping, and cataloging results in state and national databases. These data are invaluable for determining whether an apparent cluster of cases might be linked and require further investigation or might be caused by a random clustering of events. Genotyping data on specific infectious agents (e.g., Salmonella strains) produced by state public health laboratories are uploaded to CDC’s PulseNet database to enable identification of cases across jurisdictions that might have a common source (Box 4.1) (9).

Ongoing Population Surveys

Ongoing population surveys are important for understanding the prevalence of health risk behaviors in the general population. The predominant survey conducted in all states is the Behavioral Risk Factor Surveillance System, a random-digit–dialed household survey of noninstitutionalized US adults. Other ongoing surveys include the Youth Risk Behavior Survey, Pregnancy Risk Assessment Monitoring System, and National Health and Nutrition Examination Survey. Several states conduct population-based food preference surveys; such surveys are valuable in assessing the background rate of consumption of various food items and can help the field epidemiologist determine whether a foodborne outbreak in which many case-patients report eating a particular food item needs to be investigated further.
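
One way to use a background consumption rate is a binomial tail probability, sketched below in Python. The 20% background rate and the counts of case-patients are hypothetical, the function name is invented for this example, and the binomial model assumes case-patients would otherwise eat the item independently at the survey rate.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical values: a food preference survey suggests 20% of the population
# eats the item in a given week, and 8 of 10 interviewed case-patients report
# eating it.
background_rate = 0.20
cases_interviewed = 10
cases_exposed = 8

prob = binomial_upper_tail(cases_exposed, cases_interviewed, background_rate)
print(f"P(at least {cases_exposed} of {cases_interviewed} exposed by chance) = {prob:.6f}")
```

A very small probability flags the food item for closer study; it is a screening aid for hypothesis generation, not a substitute for an analytic study with a comparison group.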

Environmental Exposure Data

Distribution of Vectors

Many emerging infectious diseases are zoonotic in origin, so related data are needed. For example, understanding the distribution of vectors for each infection and patterns of the diseases in animals is paramount. During the 2016 epidemic of Zika virus infection, understanding the ecologic niche for the Aedes mosquito vector was important when investigating an increase in febrile rash illnesses (Box 4.2) (12).

Environmental Contaminants

Illness resulting from exposure to environmental contaminants is another area of public health importance requiring surveillance. For example, elevated childhood blood lead levels are a reportable condition, prompting investigation into possible environmental sources of lead. During 2014–2015, a sharp increase in the percentage of children with elevated blood lead levels in Flint, Michigan, resulted from exposure to drinking water after the city introduced a more corrosive water source containing higher levels of lead (Box 4.3) (14).

Additional Existing Sources of Data

Additional existing data sources can help identify cases, determine background rates of human illness, or assess exposures to disease-causing agents (e.g., pathogenic bacteria, vectors, environmental toxins) in a field investigation. Examples of clinical data sources include medical record abstraction, hospital discharge data (e.g., for cases of hemolytic uremic syndrome) (15), syndromic surveillance systems (16) (e.g., for bloody diarrhea during a Shiga toxin–producing Escherichia coli outbreak) (17), poison control center calls (e.g., exposure to white powder during anthrax-related events) (18), and school and work absenteeism records (e.g., New York City school absenteeism in students traveling to Mexico at the beginning of the influenza A[H1N1] pandemic) (19). Examples of data sources for assessing possible exposures include sales receipts (e.g., meals ordered online or food items purchased from a particular store) (20) and law enforcement data (e.g., drug seizures involving illicit fentanyl in conjunction with opioid overdose deaths due to fentanyl) (21).

Newer Sources of Data

Electronic health records (EHRs) appear to be a promising newer source of data for public health surveillance and for assessing the prevalence of disease or behavioral risk factors in the population seeking healthcare (22). Furthermore, EHRs contain potentially useful data on healthcare use, treatment, and outcomes of a disease—elements not typically assessed by more traditional public health data sources.

With the advent of personal computers in most households and smartphones in many pockets (23), epidemiologists are evaluating the utility of the Internet and social media as data sources for detecting outbreaks and finding cases during outbreak investigations. Many of these data sources are promising in theory, but their practical value is still being assessed. Examples include Google hits for antidiarrheal or antipyretic medications to detect outbreaks of gastrointestinal illness or influenza (24) and social media (e.g., Facebook, Twitter, blogs) to identify contacts of patients with sexually transmitted infections, restaurants where case-patients ate or products they ate before becoming sick, or levels of disease activity during influenza season (25). Online order forms or electronic grocery receipts may be useful in identifying names of customers to contact to determine illness status.

Box 4.1
Multistate Outbreak Of Salmonella Typhimurium Infections Associated With Peanut Butter–Containing Products, 2008–2009

Public Health Problem: In November 2008, CDC’s PulseNet staff noted a multistate cluster of Salmonella enterica serotype Typhimurium isolates with an unusual DNA fingerprint (pulsed-field gel electrophoresis [PFGE] pattern). The outbreak grew to involve 714 case-patients in 46 states; 166 (23%) were hospitalized and 9 (1%) died.

Public Health Response: The broad scope of the outbreak and severity of illness required coordination of data collection across jurisdictions and use of multiple data sources to identify a common source.

  • Case ascertainment: Salmonella is a reportable infection in all 50 states; laboratory subtyping of isolates (i.e., PFGE) identified outbreak-associated cases across multiple jurisdictions.

  • Data collection: The initial investigation included detailed, open-ended questions to generate hypotheses; case–control studies used common questionnaires of 300 possible food items; studies identified peanut butter products as common exposure.

  • Product tracing: Environmental testing of unopened packages and Food and Drug Administration product trace-back identified a single brand of peanut butter products.

Take-Home Point: This outbreak involved many jurisdictions and evolved over several months. Coordination of epidemiologic studies (e.g., common methods, questionnaires), having a national database of PFGE patterns to identify outbreak-associated isolates, and an FDA product trace-back were key to identifying the cause, which resulted in a widespread product recall (and eventual criminal liability of the peanut butter producer).

Source: Reference 8.

Box 4.2
Zika Virus Infection: An Emerging Vectorborne Disease

Public Health Problem: In early 2015, an outbreak of Zika virus, transmitted by Aedes spp. mosquitoes, was identified in northeastern Brazil. This area also had been affected by an outbreak of dengue fever. By September, an increased number of infants with microcephaly was reported from Zika virus–affected areas.

Public Health Response:

  • Laboratory testing: To identify the association between Zika virus and illness, samples of amniotic fluid from two women whose fetuses had microcephaly and of several body tissues from an infant with microcephaly who died were tested. Samples tested positive for Zika virus RNA.
  • Case investigation: Brazil’s Ministry of Health developed a protocol for investigating infants with microcephaly and pregnant women infected with Zika virus. Data included pregnancy history (exposure, symptom, laboratory test) and physical examination. A standard case definition for microcephaly was developed.
  • Distribution of the mosquito vector throughout the Americas led to recognition of the potential further spread of the virus.

Take-Home Point: Increase in an unusual syndrome (microcephaly) prompted government health agencies to coordinate efforts to collect systematic case data, develop a standard case definition to use across jurisdictions, and conduct uniform laboratory testing for possible etiologic agents. Since this outbreak was recognized, the epidemic has spread through the mosquito vector as well as through sexual and perinatal transmission to multiple countries and continents around the world.

Source: References 10, 11.

Box 4.3
Environmental: Childhood Lead Poisoning And Drinking Water

Public Health Problem: During April 2014–October 2015, residents of Flint, Michigan, were exposed to elevated lead levels in drinking water after the water source was switched from the Detroit Water Authority, which drew from Lake Huron, to the Flint Water System (FWS), which drew from the Flint River. Because corrosion control was not used at the FWS water treatment plant, the levels of lead in Flint tap water increased over time. Exposure to lead has significant adverse health effects (e.g., developmental delays), particularly for young children with developing brains.

Public Health Response:

  • To assess the impact of drinking contaminated water on blood lead levels (BLLs), the distribution of BLLs 5 μg/dL or higher among children less than 6 years of age before, during, and after the switch in water source was assessed.
  • Among 9,422 blood lead tests conducted during April 2013–March 2016, 284 (3.0%) BLLs were 5 μg/dL or higher; the probability of having BLLs of 5 μg/dL or higher was 46% higher during the period after the switch from Detroit Water Authority to FWS than before the switch to FWS. The increased probability of having an elevated BLL when the FWS was the source of water persisted after controlling for covariates (e.g., age, race, season).

Take-Home Point: Collecting data over time and understanding changes in environmental exposures (e.g., various drinking water sources) was key to identifying a source of communitywide elevated BLL in children and supporting recommended control measures (e.g., filters on tap water).

Source: Reference 13.

Determine Data Collection Method

Box 4.4
Comparison Of Survey Methods In Norovirus Outbreak Investigation

Public Health Problem: To support a rapid response, field epidemiologists need to determine the most efficient, timely, and cost-effective method for data collection during an outbreak. In September 2009, the Oregon Public Health Division investigated an outbreak of gastroenteritis that occurred among more than 2,000 participants of a week-long, 475-mile bicycle ride. Participants came from throughout Oregon and other states, and were of higher socioeconomic status and technology-savvy. Norovirus (GII) infection was confirmed as the causative agent.

Public Health Response:

  • To determine the most efficient means of collecting data, epidemiologists administered a questionnaire using Internet- and telephone-based interview methods to directly compare data regarding response rates, attack rates, and risk factors for illness.
  • Survey initiation, timeliness of response, and attack rates were comparable.
  • Participants were less likely to complete the Internet surveys.
  • The Internet survey took more up-front time and resources to prepare but less staff time for data collection and data entry.

Take-Home Points: Internet-based surveys permit efficient data collection but should be designed to maximize complete responses. The field epidemiologist must understand the characteristics of the study population and their ability and willingness to respond to various survey methods (e.g., access to computers and Internet-based surveys).

Source: Reference 26.

After evaluating whether existing data can address the study objectives, the field epidemiologist must determine whether additional data need to be collected and, if so, what and how (Box 4.4). This chapter focuses on the collection of quantitative data (see Chapter 10 for qualitative data collection). Information was drawn in part from the “Surveys and Sampling” chapter in the earlier edition of this book (27) and from Designing Clinical Research (28).

An important initial step in collecting data as part of a field investigation is determining the mode of data collection (e.g., self-administered, mailed, phone or in-person interview, online survey) (29). The mode in part dictates the format, length, and style of the survey or questionnaire.

Factors to consider when deciding on data collection methods include the following:

  • The feasibility of reaching participants through different modes. What type of contact information is available? Do participants have access to phones, mailing addresses, or computers?
  • Response rate. Mailed and Internet surveys traditionally yield lower response rates than phone surveys; however, response rate for phone surveys also has declined during the past decade (30).
  • Sensitivity of questions. Certain sensitive topics (e.g., sexual behaviors) might be better for a self-administered survey than a phone survey.
  • Length and complexity of the survey. For example, for a long survey or one with complex skip patterns, an interviewer-administered survey might be better than a self-administered one.
  • Control over completeness and order of questions. Interviewer-administered surveys provide more control by the interviewer than self-administered ones.
  • Cost (e.g., interviewer time). A mixed mode of survey administration (e.g., mailed survey with phone follow-up) might be less expensive to conduct than a phone-only survey, but it also increases study complexity.

Develop the Questionnaire or Survey Instrument

Before developing a survey instrument, review the investigation objectives (i.e., study questions) to identify the specific variables that need to be collected to answer the questions. Similar to developing a protocol, the most efficient and effective means for developing a survey instrument might be to identify an existing survey questionnaire or template that can be adapted for current use. Pay special attention to ensuring that survey instruments can be used across multiple sites in the event that the outbreak involves multiple jurisdictions.

Information and variables to include in a survey instrument are

  • Unique identifier for each record.
  • Date questionnaire is completed.
  • A description of the purpose of the investigation for participants.
  • Participant demographics.
  • Outcome measures.
  • Measures of exposure.
  • Possible confounders and effect modifiers.
  • Information about who participants should contact with questions.

If the survey is interviewer-administered, it should include fields for interviewer name and interview date. A cover sheet with attempts to contact, code status of interview (e.g., completed), and notes can be helpful.

In writing survey questions, borrow from other instruments that have worked well (e.g., that are demonstrated to be reliable and accurate) whenever possible. Write questions that are clear and use vocabulary understandable to the study population and that contain only one concept.

Three basic types of questions are

  • Close-ended questions. These questions ask participants to choose from predetermined response categories. An “other (specify) ____” field can capture any other responses. Close-ended questions are quick for participants to answer and easy to analyze.
  • Open-ended questions. These questions enable participants to answer in their own words and can provide rich information about new topics or context to close-ended questions; however, responses to these questions can be time-consuming to code and analyze.
  • Precoded, open-ended questions. These questions can be used on interviewer-administered surveys. They enable participants to answer unprompted, but the interviewer selects from precoded response categories.

Close-ended questions usually are used for outbreak investigations. They can have various response categories (e.g., nominal, numeric, Likert scales). Consider including “don’t know” and “refused” response categories. Ideally, code response categories in advance and on the instrument to facilitate data entry and analysis (e.g., yes = 1, no = 0). Close-ended questions could include cascading questions, which can be an efficient way to get more detailed information as one filters down through a hierarchy of questions (e.g., first you ask the participant’s state of residence, then a menu of that state’s counties drops down).
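
Precoded response categories and a cascading question can be sketched in Python as follows. The codebook is hypothetical: yes = 1 and no = 0 follow the example above, whereas the codes 7 and 9 for “don’t know” and “refused” are an assumed convention, and the state and county menu is a small illustrative subset.

```python
# Hypothetical codebook for a yes/no question; the numeric codes are an
# illustrative convention, not a standard set by this chapter.
YES_NO_CODES = {"yes": 1, "no": 0, "don't know": 7, "refused": 9}

# Hypothetical cascading menu: the county choices depend on the state chosen.
COUNTIES_BY_STATE = {
    "Oregon": ["Benton", "Lane", "Multnomah"],
    "Washington": ["Clark", "King"],
}


def code_response(raw: str, codebook: dict[str, int]) -> int:
    """Translate a recorded answer into its numeric code, rejecting unknown answers."""
    key = raw.strip().lower()
    if key not in codebook:
        raise ValueError(f"response {raw!r} is not in the codebook")
    return codebook[key]


def county_choices(state: str) -> list[str]:
    """Return the county menu to show after the participant selects a state."""
    return COUNTIES_BY_STATE.get(state, [])


print(code_response(" Yes ", YES_NO_CODES))
print(county_choices("Oregon"))
```

Rejecting answers that are not in the codebook, rather than silently storing them, surfaces a missing response category during pilot testing instead of during analysis.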

In compiling questions, consider the flow, needed skip patterns, and order (e.g., placing more sensitive questions toward the end). For self-administered surveys, the format needs to be friendly, well-spaced, and easy to follow, with clear instructions and definitions.

Content experts should review the draft questionnaire. The epidemiologist should pilot the questionnaire with a few colleagues and members of the study population and edit as necessary. This will save time in the long run; many epidemiologists have learned the hard way that a survey question was not clear or was asking about more than one concept, or that the menu of answers was missing a key response category.

Calculate the Sample Size and Select the Sample

Good sample selection can help improve generalizability of results and ensure sufficient numbers of study participants. Information about determining whom to select is covered in study design discussions in Chapter 7, but sample size is worth briefly mentioning here. If the study comprises the entire study population, it is a census; a subset of the study population is a sample. A sample can be selected through probability sampling or nonprobability sampling (e.g., purposive sampling or a convenience sample). Probability sampling is a better choice for statistical tests and statistical inferences. For probability sampling procedures other than a simple random sample (e.g., stratified or cluster sampling), consult with a survey sampling expert.

How large a sample to select depends on resources, study timeline (generally the larger the sample, the more expensive and time-consuming), the analyses to be conducted, and the effect size you want to detect. For example, to detect a difference in proportions between two groups using a chi-square test, consider how much of a difference needs to be detected to be meaningful.
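
The dependence of sample size on effect size can be illustrated with a common normal-approximation formula for comparing two proportions. This sketch assumes two equal-sized groups, a two-sided test, and no continuity correction; the proportions, alpha, and power are hypothetical inputs, and the function name is invented for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of participants needed in EACH group to detect p1 versus p2."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical scenario: 50% of ill and 25% of well participants ate the item.
print(sample_size_two_proportions(0.50, 0.25))
# A smaller difference between groups requires a larger sample.
print(sample_size_two_proportions(0.50, 0.40))
```

Software such as Epi Info or a statistician’s calculation may apply corrections that give somewhat larger numbers, so treat this as a planning estimate only.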

Review Legal Authority, Rules, and Policies Governing Data Collection

Generally, government public health agencies have the authority to access healthcare system data (with justification). The Health Insurance Portability and Accountability Act (HIPAA) of 1996 (31) has specific language allowing for the use of personal health information by government agencies to perform public health activities.

Nonetheless, accessing data sources that are not specifically collected and maintained by public health authorities can be challenging. Many outside parties are not familiar with the legal authority that public health agencies have to investigate and control diseases and exposures that affect the public’s health and safety. The field epidemiologist may find it useful to consult his or her agency’s attorney for legal counsel regarding data collection during a specific public health event.

Other scenarios that challenge epidemiologists trying to access external data include concern by healthcare systems that requests for data on hospitalizations, clinic visits, or emergency department visits breach privacy of protected health information; concern by school officials that access to information about children during an outbreak associated with a school activity violates provisions of the Family Educational Rights and Privacy Act (32); and concerns by businesses that case-patients in an outbreak associated with a particular food item or establishment might pursue legal action or lawsuits. Legal counsel can help address these concerns.

Collect the Data

Having a written data collection section as part of the overall study protocol is essential. As with survey development, borrowing from previous data collection protocols can be helpful. This protocol can include the following:

  • Introductory letter to participants.
  • Introductory script for interviewers.
  • Instructions for recruiting and enrolling participants in the survey, including obtaining consent for participation. Although field epidemiologic investigations of an urgent public health problem are legally considered to be public health practice and not research (33), including elements of informed consent might be useful to ensure that participants are aware of their rights, participation is voluntary, and the confidentiality of their health information will be protected (see the US Department of Health and Human Services informed consent checklist [34]).
  • Instructions on conducting the interviews, especially if there are multiple interviewers: Include the importance of reading the questions verbatim, term definitions, the pace of the interview, answers to frequently asked questions, and ways to handle urgent situations.
  • Instructions related to protection of participants (e.g., maintaining confidentiality, data security).

Train staff collecting data on the protocol, reviewing instructions carefully and modifying them as needed. Involve interviewers in pilot testing the survey instrument and ask them to provide feedback. Have a plan for quality checks during questionnaire administration (if the survey is not computer-based). Review the first several completed surveys to check completeness of fields, inconsistencies in responses, and how well skip patterns work. In addition, debrief interviewers about issues they might have encountered (e.g., if participants cannot understand certain questions, those questions might need rewording).
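Once the first completed surveys have been keyed, checks for completeness and skip patterns can be partly automated. The following is a minimal sketch only: the field names (`participant_id`, `interview_date`, `ill`, `onset_date`) and the single skip rule are hypothetical and would need to be replaced with those of the actual questionnaire.

```python
# Hypothetical required fields for every completed survey.
REQUIRED_FIELDS = ["participant_id", "interview_date", "ill"]

# Hypothetical skip pattern: onset_date is asked only when ill == "yes".
# Each rule is (trigger field, triggering answer, dependent field).
SKIP_RULES = [("ill", "yes", "onset_date")]


def check_record(record):
    """Return a list of human-readable problems found in one completed survey."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")
    for trigger, value, dependent in SKIP_RULES:
        answered = record.get(dependent) not in (None, "")
        if record.get(trigger) == value and not answered:
            problems.append(f"{dependent} blank although {trigger} == {value!r}")
        elif record.get(trigger) != value and answered:
            problems.append(f"{dependent} answered although {trigger} != {value!r}")
    return problems


if __name__ == "__main__":
    # Made-up records for illustration only.
    surveys = [
        {"participant_id": "001", "interview_date": "2018-03-02", "ill": "yes", "onset_date": "2018-02-27"},
        {"participant_id": "002", "interview_date": "2018-03-02", "ill": "no", "onset_date": "2018-02-28"},
        {"participant_id": "003", "interview_date": "", "ill": "yes", "onset_date": ""},
    ]
    for survey in surveys:
        print(survey["participant_id"], check_record(survey))
```

A report like this does not replace reading the completed forms or debriefing interviewers, but it can point reviewers to the records and questions most likely to need attention.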

Similarly, data entry must have quality checks. When starting data entry, check several records against the completed survey instrument for accuracy and consider double data entry of a sample of surveys to check for errors.
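One way to check a double-entered sample is to key the same surveys twice and list every field on which the two passes disagree, then resolve each disagreement against the paper form. The sketch below assumes each pass is held as a dictionary mapping a record ID to its field values; that structure, the function name, and the example values are illustrative assumptions.

```python
def compare_entries(first_pass, second_pass):
    """Compare two keyings of the same surveys.

    Both arguments map record ID -> {field name: entered value}.
    Returns a list of (record ID, field, first value, second value) tuples.
    """
    discrepancies = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id)
        b = second_pass.get(record_id)
        if a is None or b is None:
            # The record was keyed in only one of the two passes.
            discrepancies.append((record_id, "<record>", a, b))
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, a.get(field), b.get(field)))
    return discrepancies


if __name__ == "__main__":
    # Made-up entries for illustration only.
    first = {"001": {"age": "34", "ill": "yes"}, "002": {"age": "51", "ill": "no"}}
    second = {"001": {"age": "43", "ill": "yes"}, "002": {"age": "51", "ill": "no"}}
    for item in compare_entries(first, second):
        print(item)
```

If the sample shows many discrepancies, it might be worth extending double entry to more surveys or revisiting the data entry instructions.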

Subsequent chapters discuss the details of data analysis. However, it is important to consider conducting some preliminary data analysis even before data collection is complete. Understanding how participants are interpreting and answering questions can enable corrections to the wording before it is too late. Many an epidemiologist has bemoaned a misinterpreted question, confusing survey formatting, or a missing confounding variable resulting in study questions without meaningful results.

Issues and Challenges with Data Collection

The important attributes of a public health surveillance system can and should be applied to data collected in response to an urgent event (see Introduction). In field investigations, tradeoffs exist between these attributes; for example, a more timely collection of data might lead to lower quality data, fewer resources might mean less complete data, and retrospective analysis of preexisting data might be more cost-effective, although prospective data collection from case-patients might enable more targeted questions about specific exposures.

The media can play important and sometimes conflicting roles during an outbreak. The media can be useful in alerting the public to an outbreak and assisting with additional case finding. In contrast, once a press release has been issued, public belief that an outbreak resulted from eating a specific food item or eating at a specific restaurant can introduce self-report bias among study participants and limit the field epidemiologist’s ability to obtain accurate data. In addition, with the current calls for government transparency and accountability, field epidemiologists might be reluctant to release information too early, thereby risking additional exposures to the suspected source.

Changes in technology also challenge data collection. Such changes range from laboratories moving to nonculture diagnostic methods for detecting infectious pathogens, which decreases the epidemiologist’s ability to link cases spread out in space and time, to increasing use of social media to communicate, which limits response rates from time-honored methods of data collection, such as landline telephones. Conversely, many new sources of data are opportunities made possible by the expanded use of computer technology by individuals, businesses, and health systems. It is incumbent upon field epidemiologists to adapt to these changes to be able to investigate and control urgent public health threats.


Responding to urgent public health issues expeditiously requires balancing the speed of response with the need for accurate data and information to support the implementation of control measures. Adapting preexisting protocols and questionnaires will facilitate a timely response and consistency across jurisdictions. In most epidemiologic studies, the activities are not done linearly and sequentially; rather, the steps frequently are conducted in parallel and are iterative, with results informing edits or amendments. The analyses and results are only as good as the quality of the data collected (remember GIGO: garbage in, garbage out).

  1. CDC. Updated guidelines for evaluating public health surveillance systems. MMWR. 2001;50 (RR13):1–35.
  2. Declich S, Carter AO. Public health surveillance: historical origins, methods, and evaluation. Bull World Health Organ. 1994;72:285–304.
  3. Lilienfeld DE. Celebration: William Farr (1807–1883)—an appreciation on the 200th anniversary of his birth. Int J Epidemiol. 2007;36:985–7.
  4. Cameron D, Jones IG. John Snow, the Broad Street pump and modern epidemiology. Int J Epidemiol. 1983;12:393–6.
  5. Westat. Electronic Death Reporting System online reference manual. 2016.
  6. CDC. National Notifiable Diseases Surveillance System (NNDSS). History.
  7. CDC. National Notifiable Diseases Surveillance System (NNDSS). 2017 National notifiable conditions (historical).
  8. Cavallaro E, Date K, Medus C, et al. Salmonella Typhimurium infections associated with peanut products. N Engl J Med. 2011;365:601–10.
  9. Swaminathan B, Barrett TJ, Hunter SB, et al. PulseNet: the molecular subtyping network for foodborne bacterial disease surveillance, United States. Emerg Infect Dis. 2001;7:382–9.
  10. Schuler-Faccini L, Ribeiro EM, Feitosa IML, et al. Possible association between Zika virus infection and microcephaly—Brazil, 2015. MMWR. 2016;65:59–62.
  11. França GVA, Schuler-Faccini L, Oliveira WK, et al. Congenital Zika virus syndrome in Brazil: a case series of the first 1501 livebirths with complete investigation. Lancet. 2016;388:891–7.
  12. Gardner L, Chen N, Sarkar S. Vector status of Aedes species determines geographical risk of autochthonous Zika virus establishment. PLoS Negl Trop Dis. 2017;11:e0005487.
  13. Kennedy C, Yard E, Dignam T, et al. Blood lead levels among children aged <6 years—Flint, Michigan, 2013–2016. MMWR. 2016;65:650–4.
  14. Hanna-Attisha M, LaChance J, Sadler RC, Schnepp AC. Elevated blood lead levels in children associated with the Flint drinking water crisis: a spatial analysis of risk and public health response. Am J Public Health. 2016;106:283–90.
  15. Chang H-G, Tserenpuntsag B, Kacica M, Smith PF, Morse DL. Hemolytic uremic syndrome incidence in New York. Emerg Infect Dis. 2004;10:928–31.
  16. Yoon PW, Ising AI, Gunn JE. Using syndromic surveillance for all-hazards public health surveillance: successes, challenges, and the future. Public Health Rep. 2017;132 (suppl I):3S–6S.
  17. Hines JA, Bancroft J, Powell M, Hedberg K. Case finding using syndromic surveillance data during an outbreak of Shiga toxin–producing Escherichia coli O26 infections, Oregon, 2015. Public Health Rep. 2017;132:448–50.
  18. Watson WA, Litovitz T, Rubin C, et al. Toxic exposure surveillance system. MMWR. 2004;53 (Suppl):262.
  19. CDC. Swine-origin influenza A (H1N1) virus infections in a school—New York City, April 2009. MMWR. 2009;58:1–3.
  20. Wagner MM, Robinson JM, Tsui F-C, Espino JU, Hogan WR. Design of a national retail data monitor for public health surveillance. J Am Med Inform Assoc. 2003;10:409–18.
  21. Rudd RA, Aleshire N, Zibbell JE, Gladden RM. Increases in drug and opioid overdose deaths—United States, 2000–2014. MMWR. 2016;64:1378–82.
  22. Birkhead GS, Klompas M, Shah NR. Uses of electronic health records for public health surveillance to advance public health. Ann Rev Public Health. 2015;36:345–9.
  23. File T, Ryan C. Computer and internet use in the United States: 2013. American Community Survey Reports.
  24. Thompson LH, Malik MT, Gumel A, Strome T, Mahmud SM. Emergency department and “Google flu trends” data as syndromic surveillance indicators for seasonal influenza. Epidemiol Infect. 2014;142:2397–405.
  25. Signorini A, Segre AM, Polgreen PM. The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLoS ONE. 2011;6:e19467.
  26. Oh JY, Bancroft JE, Cunningham MC, et al. Comparison of survey methods in norovirus outbreak investigation, Oregon, USA, 2009. Emerg Infect Dis. 2010;16:1773–6.
  27. Herold JM. Surveys and sampling. In: Gregg MB, ed. Field epidemiology. 3rd ed. New York: Oxford University Press; 2008:97–117.
  28. Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB. Designing clinical research. 4th ed. Philadelphia: Lippincott Williams & Wilkins; 2013:277–91.
  29. Dillman DA, Smyth JD, Christian LM. Internet, phone, mail and mixed-mode surveys: the tailored design method. 4th ed. Hoboken, NJ: John Wiley; 2014.
  30. Czajka JL, Beyler A. Declining response rates in federal surveys: trends and implications. Mathematica Policy Research Report. 2016.
  31. Health Insurance Portability and Accountability Act of 1996. Pub. L. 104–191, 110 Stat. 1936 (August 21, 1996).
  32. Family Educational Rights and Privacy Act, 20 USC § 1232g; 34 CFR Part 99 (1974).
  33. CDC. HIPAA privacy rule and public health: guidance from CDC and the US Department of Health and Human Services. MMWR. 2003;52:1–12.
  34. US Department of Health and Human Services, Office for Human Research Protections. Informed consent checklist (1998).