Designing and Conducting Analytic Studies in the Field

Brendan R. Jackson And Patricia M. Griffin


Analytic studies can be a key component of field investigations, but beware of an impulse to begin one too quickly. Studies can be time- and resource-intensive, and a hastily constructed study might not answer the correct questions. For example, in a foodborne disease outbreak investigation, if the culprit food is not on your study’s questionnaire, you probably will not be able to implicate it. Analytic studies typically should be used to test hypotheses, not generate them. However, in certain situations, collecting data quickly about patients and a comparison group can be a way to explore multiple hypotheses. In almost all situations, generating hypotheses before designing a study will help you clarify your study objectives and ask better questions.

Generating Hypotheses

The initial steps of an investigation, described in previous chapters, are some of your best sources of hypotheses. Key activities include the following:

  • Describe time, place, and person. Descriptive epidemiology (see Chapter 6) can help develop hypotheses. For example,
    • By examining the sex distribution among persons in outbreaks, US enteric disease investigators have learned to suspect a vegetable as the source when most patients are women. (Of course, generalizations do not always hold true!)
    • In an outbreak of bloodstream infections caused by Serratia marcescens among patients receiving parenteral nutrition (food administered through an intravenous catheter), investigators had a difficult time finding the source until they noted that none of the 19 cases were among children. Further investigation of the parenteral nutrition administered to adults but not children in that hospital identified contaminated amino acid solution as the source (1).
  • Focus on outliers. Give extra attention to the earliest and latest cases on an epidemic curve and to persons who recently visited the neighborhood where the outbreak is occurring. Interviews with these patients can yield important clues (e.g., by identifying the index case, secondary case, or a narrowed list of common exposures).
  • Determine sources of similar outbreaks. Consult health department records, review the literature, and consult experts to learn about previous sources. Be mindful that new sources frequently occur, given ever-changing social, behavioral, and commercial trends.
  • Conduct a small number of in-depth, open-ended interviews. When a likely source is not quickly evident, conducting in-depth (often >1 hour), open-ended interviews with a subset of patients (usually 5 to 10) or their caregivers can be the best way to identify possible sources. It helps to begin with a semistructured list of questions designed to help the patient recall the events and exposures of every day during the incubation period. The interview can end with a “shotgun” questionnaire (see activity 6) (Box 7.1). A key component of this technique is that one investigator ideally conducts, or at least participates in, as many interviews as possible (five or more) because reading notes from others’ interviews is no substitute for soliciting and hearing the information first-hand. For example, in a 2009 Escherichia coli O157 outbreak, investigators were initially unable to find the source through general and targeted questionnaires. During open-ended interviews with five patients, the interviewer noted that most reported having eaten strawberries, a particular type of candy, and uncooked prepackaged cookie dough. An analytic study was then conducted that included questions about these exposures; it confirmed cookie dough as the source (3).
  • Ask patients what they think. Patients can have helpful thoughts about the source of their illness. However, be aware that patients often associate their most recent food exposure (e.g., a meal) with illness, whereas the inciting exposure might have been long before.
  • Consider administering a shotgun questionnaire. Such questionnaires, which typically ask about hundreds of possible exposures, are best used on a limited number of patients as part of hypothesis-generating interviews. After generating hypotheses, investigators can create a questionnaire targeted to that investigation. Although not an ideal method, shotgun questionnaires can be used by multiple interviewers to obtain data about large numbers of patients (Box 7.1).
Box 7.1
Generating a Hypothesis in a Listeriosis Outbreak Linked to Caramel Apples

In November 2014, a US surveillance system for foodborne diseases (PulseNet) detected a cluster (i.e., a possible outbreak) of listeriosis cases based on similar-appearing Listeria monocytogenes isolates by pulsed-field gel electrophoresis of the isolates. No suspected foods were identified through routine patient interviews by using a Listeria-specific questionnaire with approximately 40 common food sources of listeriosis (e.g., soft cheese and deli meat). The outbreak’s descriptive epidemiology offered no clear leads: the sex distribution was nearly even, the age spectrum was wide, and the case-fatality rate of approximately 20% was typical. Notably, however, 3 of the 35 cases occurred among previously healthy school-aged children, which is highly unusual for listeriosis. Most cases occurred during late October and early November.

Investigators began reinterviewing patients by using a hypothesis-generating shotgun questionnaire with more than 500 foods, but it did not include caramel apples. By comparing the first nine patient responses with data from a published survey of food consumption, strawberries and ice cream emerged as hypotheses. However, several interviewed patients denied having eaten these foods during the month before illness. An investigator then conducted lengthy, open-ended interviews with patients and their family members. During one interview, he asked about special foods eaten during recent holidays, and the patient’s wife replied that her husband had eaten prepackaged caramel apples around Halloween. Although produce items had been implicated in past listeriosis outbreaks, caramel apples seemed an unlikely source. However, the interviewer took note of this connection because he had previously interviewed another patient who reported having eaten caramel apples. This event underscores the importance of one person conducting multiple interviews because that person might make subtle mental connections that may be missed when reviewing other interviewers’ notes. In fact, several other investigators listening to the interview noted this exposure—among hundreds of others—but thought little of it.

In this investigation, the finding of high strawberry and ice cream consumption among patients, coupled with the timing of the outbreak during a holiday period, helped make a sweet food (i.e., caramel apples) seem more plausible as the possible source.

To explore the caramel apple hypothesis, investigators asked five other patients about this exposure, and four reported having eaten them. On the basis of these initial results, investigators designed and administered a targeted questionnaire to patients involved in the outbreak, as well as to patients infected with unrelated strains of L. monocytogenes (i.e., a case–case study). This study, combined with testing of apples and the apple packing facility, confirmed that caramel apples were the source (2). Had a single interviewer performed multiple open-ended interviews to generate hypotheses before the shotgun questionnaire, the outbreak might have been solved sooner.

Study Designs for Testing Hypotheses

As evident in public health and clinical guidelines, randomized controlled trials (e.g., trials of drugs, vaccines, and community-level interventions) are the reference standard for epidemiology, providing the highest level of evidence. However, such studies are not possible in certain situations, including outbreak investigations. Instead, investigators must rely on observational studies, which can provide sufficient evidence for public health action. In observational studies, the epidemiologist documents rather than determines the exposures, quantifying the statistical association between exposure and disease. Here again, the key when designing such studies is to obtain a relevant comparison group for the patients (Box 7.2).

Box 7.2
Defining Exposure and Disease

Because field analytic studies are used to quantify the association between exposure and disease, defining what is meant by exposure and disease is essential. Exposure is used broadly, meaning demographic characteristics, genetic or immunologic makeup, behaviors, environmental exposures, and other factors that might influence a person’s risk for disease. Because precise information can help accurately estimate an exposure’s effect on disease, exposure measures should be as objective and standard as possible. Developing a measure of exposure can be conceptually straightforward for an exposure that is a relatively discrete event or characteristic—for example, whether a person received a spinal injection with steroid medication compounded at a specific pharmacy or whether a person received a typhoid vaccination during the year before international travel. Although these exposures might be straightforward in theory, they can be subject to interpretation in practice. Should a patient injected with a medication from an unknown pharmacy be considered exposed? Whatever decision is made should be documented and applied consistently.

Additionally, exposures often are subject to the whims of memory. Memory aids (e.g., restaurant menus, vaccination cards, credit card receipts, and shopper cards) can be helpful. More than just a binary yes or no, the dose of an exposure can also be enlightening. For example, in an outbreak of fungal bloodstream infections linked to contaminated intravenous saline flushes administered at an oncology clinic, affected patients had received a greater number of flushes than unaffected patients (4). Similarly, in an outbreak of Listeria monocytogenes infections, the association with deli meat became apparent only when the exposure evaluated was consumption of deli meat more than twice a week (5).

Defining disease (e.g., does a person have botulism?) might sound simple, but often it is not; read more about making and applying disease case definitions in Chapter 3.

Types of Observational Studies for Testing Hypotheses

Three types of observational studies are commonly used in the field. All are best performed by using a standard questionnaire specific for that investigation, developed on the basis of hypothesis-generating interviews.

Observational Study Type 1: Cohort

In concept, a cohort study, like an experimental study, begins with a group of persons without the disease under study, but with different exposure experiences, and follows them over time to find out whether they experience the disease or health condition of interest. However, in a cohort study, each person’s exposure is merely recorded rather than assigned randomly by the investigator. Then the occurrence of disease among persons with different exposures is compared to assess whether the exposures are associated with increased risk for disease. Cohort studies can be prospective or retrospective.

Prospective Cohort Studies

A prospective cohort study enrolls participants before they experience the disease or condition of interest. The enrollees are then followed over time for occurrence of the disease or condition. The unexposed or lowest exposure group serves as the comparison group, providing an estimate of the baseline or expected amount of disease. An example of a prospective cohort study is the Framingham Heart Study. By assessing the exposures of an original cohort of more than 5,000 adults without cardiovascular disease (CVD), beginning in 1948 and following them over time, the study was the first to identify common CVD risk factors (6). Each case of CVD identified after enrollment was counted as an incident case. Incidence was then quantified as the number of cases divided by the sum of time that each person was followed (incidence rate) or as the number of cases divided by the number of participants being followed (attack rate or risk or incidence proportion). In field epidemiology, prospective cohort studies also often involve a group of persons who have had a known exposure (e.g., survived the World Trade Center attack on September 11, 2001 [7]) and who are then followed to examine the risk for subsequent illnesses with long incubation or latency periods.

Retrospective Cohort Studies

A retrospective cohort study enrolls a defined participant group after the disease or condition of interest has occurred. In field epidemiology, these studies are more common than prospective studies. The population affected is often well-defined (e.g., banquet attendees, a particular school’s students, or workers in a certain industry). Investigators elicit exposure histories and compare disease incidence among persons with different exposures or exposure levels.

Observational Study Type 2: Case–Control

In a case–control study, the investigator must identify a comparison group of control persons who have had similar opportunities for exposure as the case-patients. Case–control studies are commonly performed in field epidemiology when a cohort study is impractical (e.g., no defined cohort or too many non-ill persons in the group to interview). Whereas a cohort study proceeds conceptually from exposure to disease or condition, a case–control study begins conceptually with the disease or condition and looks backward at exposures. Excluding controls by symptoms alone might not guarantee that they do not have mild cases of the illness under investigation. Table 7.1 presents selected key differences between a case–control and retrospective cohort study.

Table 7.1
Benefits and drawbacks of three observational study types commonly used in field investigations
Feature Retrospective cohort study Case–control study Case–case study
Sample size Larger Smaller Smaller
Costs More (because of size) Less Less
Study time Short Short Short
If disease is rare Inefficient Efficient Efficient (if comparison cases already identified)
If exposure is rare Efficient Inefficient Inefficient
If multiple exposures are relevant Often can examine Can examine Can examine
If patients have multiple outcomes Can examine Cannot examine Cannot examine
Natural history Can ascertain Cannot ascertain Cannot ascertain
Disease risk Can measure Cannot measure Cannot measure
Recall bias Potential problem Potential problem Generally less of a problem
Selection bias Potential problem Potential problem Potential problem
If population is not well-defined Difficult Advantageous Advantageous

Observational Study Type 3: Case–Case

In case–case studies, a group of patients with the same or similar disease serve as a comparison group (8). This method might require molecular subtyping of the suspected pathogen to distinguish outbreak-associated cases from other cases and is especially useful when relevant controls are difficult to identify. For example, controls for an investigation of Listeria illnesses typically are patients with immunocompromising conditions (e.g., cancer or corticosteroid use) who might be difficult to identify among the general population. Patients with Listeria isolates of a different subtype than the outbreak strain can serve as comparisons to help reduce bias when comparing food exposures. However, patients with similar illnesses can have similar exposures, which can introduce a bias, making identifying the source more difficult. Moreover, other considerations should influence the choice of a comparison group. If most outbreak-associated case-patients are from a single neighborhood or are of a certain race/ethnicity, other patients with listeriosis from across the country will serve as an inadequate comparison group.

Selection of Controls in Case–Control Studies

Considerations for Selecting Controls

Selecting relevant controls is one of the most important considerations when designing a case–control study. Several key considerations are presented here; consult other resources for in-depth discussion (9,10). Ideally, controls should

  • Thoroughly reflect the source population from which case-patients arose, and
  • Provide a good estimate of the level of exposure one would expect from that population. Sometimes the source population is not so obvious, and a case–control study using controls from the general population might be needed to implicate a general exposure (e.g., visiting a specific clinic, restaurant, or fair). The investigation can then focus on specific exposures among persons with the general exposure (see also next section).

Controls should be chosen independently of any specific exposure under evaluation. If you select controls on the basis of lack of exposure, you are likely to find an association between illness and that exposure regardless of whether one exists. Also important is selecting controls from a source population in a way that minimizes confounding (see Chapter 8), which is the existence of a factor (e.g., annual income) that, by being associated with both exposure and disease, can affect the associations you are trying to examine.

When trying to enroll controls who reflect the source population, try to avoid overmatching (i.e., enrolling controls who are too similar to case-patients, resulting in fewer differences among case-patients and controls than ought to exist and decreased ability to identify exposure–disease associations). When conducting case–control studies in hospitals and other healthcare settings, ensure that controls do not have other diseases linked to the exposure under study.

Commonly Used Control Selection Methods

When an outbreak does not affect a defined population (e.g., potluck dinner attendees) but rather the community at large, a range of options can be used to determine how to select controls from a large group of persons.

  • Random-digit dialing. This method, which involves selecting controls by using a system that randomly selects telephone numbers from a directory, has been a staple of US outbreak investigations. In recent years, however, declining response rates because of increasing use of caller identification and cellular phones and lack of readily available directory listings of cellular phone numbers by geographic area have made this method increasingly difficult. Even when this method was most useful, often 50 or more numbers needed to be dialed to reach one household or person who both answered and provided a usable match for the case-patient. Commercial databases that include cellular phone numbers have been used successfully to partially address this problem, but the method remains time-consuming (11).
  • Random or systematic sampling from a list. For investigations in settings where a roster is available (e.g., attendees at a resort on certain dates), controls can be selected by either random or systematic sampling. Government records (e.g., motor vehicle, voter, or tax records) can provide lists of possible controls, but they might not be representative of the population being studied (11). For random sampling, a table or computer-generated list of random numbers can be used to select every nth persons to contact (e.g., every 12th or 13th).
  • Neighborhood. Recruiting controls from the same neighborhood as case-patients (i.e., neighborhood matching) has commonly been used during case–control studies, particularly in low-and middle-income countries. For example, during an outbreak of typhoid fever in Tajikistan (12), investigators recruited controls by going door-to-door down a street, starting at a case-patient’s house; a study of cholera in Haiti used a similar method (13). Typically, the immediately neighboring households are skipped to prevent overmatching.
  • Patients’ friends or relatives. Using friends and relatives as controls can be an effective technique when the characteristics of case-patients (e.g., very young children) make finding controls by a random method difficult. Typically, the investigator interviews a patient or his or her parent, then asks for the names and contact information for more friends or relatives who are needed as controls. One advantage is that the friends of an ill person are usually willing to participate, knowing their cooperation can help solve the puzzle. However, because they can have similar personal habits and preferences as patients, their exposures might be similar. Such overmatching can decrease the likelihood of finding the source of the illness or condition.
  • Databases of persons with exposure information. Sources of data on persons with exposure information include survey data (e.g., FoodNet Population Survey [14]), public health databases of patients with other illnesses or a different subtype of the same illness, and previous studies. (Chapter 4 describes additional sources.)

When considering outside data sources, investigators must determine whether those data provide an appropriate comparison group. For example, persons in surveys might differ from case-patients in ways that are impossible to determine. Other patients might be so similar to case-patients that risky exposures are unidentifiable, or they might be so different that exposures identified as risks are not true risks.

Matching in Case–Control Studies

To help control for confounding, controls can be matched to case-patients on characteristics specified by investigators, including age group, sex, race/ethnicity, and neighborhood. Such matching does not itself reduce confounding, but it enables greater efficiency when matched analyses are performed that do (15). When deciding to match, however, be judicious. Matching on too many characteristics can make controls difficult to find (making a tough process even harder). Imagine calling hundreds of random telephone numbers trying to find a man of a particular ethnicity aged 50–54 years who is then willing to answer your questions. Also, remember not to match on the exposure of interest or on any other characteristic you wish to examine. Matched case–control study data typically necessitate a matched analysis (e.g., conditional logistic regression) (15).

Matching Types

The two main types of matching are pair matching and frequency matching.

Pair Matching

In pair matching, each control is matched to a specific case-patient. This method can be helpful logistically because it allows matching by friends or relatives, neighborhood, or telephone exchange, but finding controls who meet specific criteria can be burdensome.

Frequency Matching

In frequency matching, also called category matching, controls are matched to case-patients in proportion to the distribution of a characteristic among case-patients. For example, if 20% of case-patients are children aged 5–18 years, 50% are adults aged 19–49 years, and 30% are adults 50 years or older, controls should be enrolled in similar proportions. This method works best when most case-patients have been identified before control selection begins. It is more efficient than pair matching because a person identified as a possible control who might not meet the criteria for matching a particular case-patient might meet criteria for one of the case-patient groups.

Number of Controls

Most field case–control studies use control-to-case-patient ratios of 1:1, 2:1, or 3:1. Enrolling more than one control per case-patient can increase study power, which might be needed to detect a statistically significant difference in exposure between case-patients and controls, particularly when an outbreak involves a limited number of cases. The incremental gain of adding more controls beyond three or four is small because study power begins to plateau. Note that not all case-patients need to have the same number of controls. Sample size calculations can help in estimating a target number of controls to enroll, although sample sizes in certain field investigations are limited more by time and resource constraints. Still, estimating study power under a range of scenarios is wise because an analytic study might not be worth doing if you have little chance of detecting a statistically significant association. Sample size calculators for unmatched case–control studies are available at http://www.openepi.comexternal icon and in the StatCalc function of Epi Info (

More than One Control Group

Sometimes the choice of a control group is so vexing that investigators decide to use more than one type of control group (e.g., a hospital-based group and a community group). If the two control groups provide similar results and conclusions about risk factors for disease, the credibility of the findings is increased. In contrast, if the two control groups yield conflicting results, interpretation becomes more difficult.

Example: Using an Analytic Study to Solve an Outbreak at a Church Potluck Dinner (But Not That Church Potluck)

Since the 1940s, field epidemiology students have studied a classic outbreak of gastrointestinal illness at a church potluck dinner in Oswego, New York (16). However, the case study presented here, used to illustrate study designs, is a different potluck dinner.

In April 2015, an astute neurologist in Lancaster, Ohio, contacted the local health department about a patient in the emergency department with a suspected case of botulism. Within 2 hours, four more patients arrived with similar symptoms, including blurred vision and shortness of breath. Health officials immediately recognized this as a botulism outbreak.

  • Question: What are some of the possible source populations (i.e., persons at risk) of this outbreak, given that botulinum toxin is usually spread through food but can be a bioterrorism agent? Think about how each of these possible scenarios for the toxin source might influence the population to study.
    • If the source is a widely distributed commercial product, then the population to study is persons across the United States and possibly abroad.
    • If the source is airborne, then the population to study is residents of a single city or area.
    • If the source is food from a restaurant, then the population to study is predominantly local residents and some travelers.
    • If the source is a meal at a workplace or social setting, then the population to study is meal attendees.
    • If the source is a meal at home, then the population to study is household members and any guests.

Descriptive epidemiology and questioning of the case-patients revealed that all had eaten at the same church potluck dinner and had no other common exposures, making the potluck the likely exposure site and attendees the likely source population. Thus, an analytic study would be targeted at potluck attendees, although investigators must remain alert to case-patients among nonattendees. As initial interviews were conducted, more cases of botulism were being diagnosed, quickly increasing to more than 25. The source of the outbreak needed to be identified rapidly to halt further exposure and illness.

  • Question: What information would you need to design a study?
    • List of foods served at the potluck.
    • Approximate number of attendees.
    • A case definition.
    • Information from 5–10 hypothesis-generating interviews with a few case-patients or their family members.
  • Question: What type of study might you conduct if an estimated 75 persons attended the potluck and nearly all guests were identifiable?
    • A cohort study would be a reasonable option because a defined group exists (i.e., a cohort) of exposed persons who could be interviewed in a reasonable amount of time. The study would be retrospective because the outcome (i.e., botulism) has already occurred, and investigators could assess exposures retrospectively (i.e., foods eaten at the potluck) by interviewing attendees.
    • In a cohort study, investigators can calculate the attack rate for botulism among potluck attendees who reported having eaten each food and for those who had not. For example, if 20 of the 30 attendees who had eaten a particular food (e.g., potato salad) had botulism, you would calculate the attack rate by dividing 20 (corresponding to cell a in Handout 7.1) by 30 (total exposed, or a + b), yielding approximately 67%. If 5 of the 45 attendees who had not eaten potato salad had botulism, the attack rate among the unexposed—5 / 45, corresponding to c/ (c + d)—would be approximately 11%. The risk ratio would be 6, which is calculated by dividing the attack rate among the exposed (67%) by the attack rate among the unexposed (11%).
  • Question: What type of study might you perform if more than 200 persons attended the potluck and investigators did not have a comprehensive attendee list?
    • A case–control study would be the most feasible option because the entire cohort could not be identified and because the large number of attendees could make interviewing them all difficult. Rather than interview all non-ill persons, a subset could be interviewed as control subjects.
    • The method of control subject selection should be considered carefully. If all attendees are not interviewed, determining the risk for botulism among the exposed and unexposed is impossible because investigators would not know the exposures for all non-ill attendees. Instead of risk, investigators calculate the odds of exposure, which can approximate risk. For example, if 20 (80%) of 25 case-patients had eaten potato salad, the odds of potato salad exposure among case-patients would be 20/ 5 = 4 (exposed/ unexposed, or a/ c in Handout 7.2). If 10 (20%) of 50 selected controls had eaten potato salad, the odds of exposure among control subjects would be 10/ 40 = 0.25 (or b/ d in Handout 7.2). Dividing the odds of exposure among the case-patients (a/ c) by the odds of exposure among control subjects (b / d) yields an odds ratio of 16 (4/ 0.25). The odds ratio is not a true measure of risk, but it can be used to implicate a food. An odds ratio can approximate a risk ratio when the outcome or disease is rare (e.g., roughly <5% of a population). In such cases, a/ b is similar to a/ (a + b). The odds ratio is typically higher than the risk ratio when >5% of exposed persons in the analysis have the illness.

In the actual outbreak, 29 (38%) of 77 potluck attendees had botulism. The investigators performed a cohort study, interviewing 75 of the 77 attendees about 52 foods served (17). The attack rate among persons who had eaten potato salad was significantly and substantially higher than the attack rate among those who had not, with a risk ratio of 14 (95% confidence interval 5–42). One of the potato salads served was made with incorrectly home-canned potatoes (a known source of botulinum toxin), and samples of discarded potato salad tested positive for botulinum toxin, supporting the findings of the analytic study. (Of note, persons often blame potato salad for causing illness when, in fact, it rarely is a source. This outbreak was a notable exception.)

In field epidemiology, the link between exposure and illness is often so strong that it is evident despite such inherent study limitations as small sample size and exposure misclassification. In this outbreak, a few of the patients with botulism reported not having eaten potato salad, and some of the attendees without botulism reported having eaten it. In epidemiologic studies, you rarely find 100% concordance between exposure and outcome for various reasons, including incomplete or erroneous recall because remembering everything eaten is difficult. Here, cross-contamination of potato salad with other foods might have helped explain cases among patients who had not eaten potato salad because only a small amount of botulinum toxin is needed to produce illness.

Handout 7.1

Two-by-Two Table to Calculate the Relative Risk, or Risk Ratio, in Cohort Studies

Two- by- two tables are covered in more detail in Chapter 8.

Cohort Study Approach
Ill Not Ill
Exposed a b
Unexposed c d
Risk Ratio = Incidence in exposed over Incidence in unexposed = a over a+b over c over c+d

Handout 7.2

Two-by-Two Table to Calculate the Odds Ratio in Case–Control Studies

A risk ratio cannot be calculated from a case–control study because true attack rates cannot be calculated.

Case-Control Study Approach
III (Cases) Not III (Controls)
Exposed a b
Unexposed c d
Odds ratio = Odds of exposure in cases over Odds of exposure in controls = a/c over b/d = ad over bc


Outbreaks with Universal Exposure

What kind of study would you design if your hypothesis-generating interviews lead you to believe that everyone, or nearly everyone, was exposed to the same suspected infection source? How would you test hypotheses if all barbecue attendees, ill and non-ill, had eaten the chicken or if all town residents had drunk municipal tap water, and no unexposed group exists for comparison? A few factors that might be of help are the exposure timing (e.g., a particularly undercooked batch of barbeque), the exposure place (e.g., a section of the water system more contaminated than others), and the exposure dose (e.g., number of chicken pieces eaten or glasses of water drunk). Including questions about the time, place, and frequency of highly suspected exposures in a questionnaire can improve the chances of detecting a difference (18).


Cohort, case–control, and case–case studies are the types of analytic studies that field epidemiologists use most often. They are best used as mechanisms for evaluating—quantifying and testing—hypotheses identified in earlier phases of the investigation. Cohort studies, which are oriented conceptually from exposure to disease, are appropriate in settings in which an entire population is well-defined and available for enrollment (e.g., guests at a wedding reception). Cohort studies are also appropriate when well-defined groups can be enrolled by exposure status (e.g., employees working in different parts of a manufacturing plant). Case–control studies, in contrast, are useful when the population is less clearly defined. Case–control studies, oriented from disease to exposure, identify persons with disease and a comparable group of persons without disease (controls). Then the exposure experiences of the two groups are compared. Case–case studies are similar to case–control studies, except that controls have an illness not linked to the outbreak. Case–control studies are probably the type most often appropriate for field investigations. Although conceptually straightforward, the design of an effective epidemiologic study requires many careful decisions. Taking the time needed to develop good hypotheses can result in a questionnaire that is useful for identifying risk factors. The choice of an appropriate comparison group, how many controls per case-patient to enroll, whether to match, and how best to avoid potential biases are all crucial decisions for a successful study.


This chapter relies heavily on the work of Richard C. Dicker, who authored this chapter in the previous edition.

  1. Gupta N, Hocevar SN, Moulton-Meissner HA, et al. Outbreak of Serratia marcescens bloodstream infections in patients receiving parenteral nutrition prepared by a compounding pharmacy. Clin Infect Dis. 2014;59:1–8.
  2. Angelo K, Conrad A, Saupe A, et al. Multistate outbreak of Listeria monocytogenes infections linked to whole apples used in commercially produced, prepackaged caramel apples: United States, 2014–2015. Epidemiol Infect. 2017;145:848–56.
  3. Neil KP, Biggerstaff G, MacDonald JK, et al. A novel vehicle for transmission of Escherichia coli O157: H7 to humans: multistate outbreak of E. coli O157: H7 infections associated with consumption of ready-to-bake commercial prepackaged cookie dough—United States, 2009. Clin Infect Dis. 2012;54:511–8.
  4. Vasquez AM, Lake J, Ngai S, et al. Notes from the field: fungal bloodstream infections associated with a compounded intravenous medication at an outpatient oncology clinic—New York City, 2016. MMWR. 2016;65:1274–5.
  5. Gottlieb SL, Newbern EC, Griffin PM, et al. Multistate outbreak of listeriosis linked to turkey deli meat and subsequent changes in US regulatory policy. Clin Infect Dis. 2006;42:29–36.
  6. Framingham Heart Study: A Project of the National Heart, Lung, and Blood Institute and Boston University. Framingham, MA: Framingham Heart Study; 2017. icon
  7. Jordan HT, Brackbill RM, Cone JE, et al. Mortality among survivors of the Sept 11, 2001, World Trade Center disaster: results from the World Trade Center Health Registry cohort. Lancet. 2011;378:879–87.
  8. McCarthy N, Giesecke J. Case– case comparisons to study causation of common infectious diseases. Int J Epidemiol. 1999;28:764–8.
  9. Rothman KJ, Greenland S. Modern epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
  10. Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls in case–control studies. I. Principles. Am J Epidemiol. 1992;135:1019–28.
  11. Chintapalli S, Goodman M, Allen M, et al. Assessment of a commercial searchable population directory as a means of selecting controls for case–control studies. Public Health Rep. 2009;124:378–83.
  12. Centers for Disease Control and Prevention. Epidemiologic case studies: typhoid in Tajikistan.
  13. Dunkle SE, Mba-Jonas A, Loharikar A, Fouche B, Peck M, Ayers T. Epidemic cholera in a crowded urban environment, Port-au-Prince, Haiti. Emerg Infect Dis. 2011;17:2143–6.
  14. Centers for Disease Control and Prevention. Foodborne Diseases Active Surveillance Network (FoodNet): population survey.
  15. Pearce N. Analysis of matched case–control studies. BMJ. 2016;352:1969.
  16. Centers for Disease Control and Prevention. Case studies in applied epidemiology: Oswego: an outbreak of gastrointestinal illness following a church supper.
  17. McCarty CL, Angelo K, Beer KD, et al. Notes from the field.: large outbreak of botulism associated with a church potluck meal—Ohio, 2015. MMWR. 2015;64:802–3.
  18. Tostmann A, Bousema JT, Oliver I. Investigation of outbreaks complicated by universal exposure. Emerg Infect Dis. 2012;18:1717–22.