Methods and Data Sources


In 2020, Collier et al. estimated the overall burden of waterborne disease caused by 17 pathogens. The analysis also estimated the number of emergency department (ED) visits, hospitalizations, and deaths, and the resulting direct cost to the U.S. healthcare system.

In this analysis, CDC authors defined a waterborne disease as one caused by a pathogen (bacterium, virus, or parasite) spread through water. Using surveillance data, administrative data (for example, hospital records), or literature reports, the researchers chose diseases that met the following criteria:

  • Likely to cause substantial illness or death
  • Caused by waterborne pathogens in the United States
  • Had available data to accurately determine related health outcomes

These estimates did not include exposures from toxins (such as harmful algal blooms) and chemicals (such as lead).

Estimating U.S. Illnesses for 17 Waterborne Pathogens

Data used in this study were from 2000 through 2015. Calculations were based on a U.S. population of 319 million in 2014, the most recent year for which data were available for all surveillance sources. CDC estimated the burden of disease using models that combined data from various sources and quantified the uncertainty associated with each estimate.

The data used were divided into three main categories:

  • Active surveillance data: Public health officials gather data from state and local health departments, laboratories, hospitals, and other settings.
  • Passive surveillance data: Public health officials rely on state and local health departments, laboratories, hospitals, and other healthcare sources to report data to surveillance systems.
  • Hospital administrative data: Data sources include the Healthcare Cost and Utilization Project’s National Inpatient Sample (HCUP NIS) hospitalization database, the Healthcare Cost and Utilization Project’s National Emergency Department Sample (HCUP NEDS) emergency department visit database, and, in the case of otitis externa, the National Ambulatory Medical Care Survey (NAMCS), which surveys visits to physicians’ offices. All of these administrative data sources use complex sample survey weighting methods and are considered nationally representative. These data were used for pathogens that are known to be waterborne and for which there is no national surveillance system.

Diseases

Diseases included in this analysis were

  • campylobacteriosis
  • cryptosporidiosis
  • giardiasis
  • Legionnaires’ disease
  • nontuberculous mycobacterial (NTM) infection
  • norovirus infection
  • acute otitis externa (“swimmer’s ear”)
  • Pseudomonas pneumonia and septicemia
  • Shiga toxin-producing Escherichia coli (STEC) O157 infection
  • non-O157 STEC infection
  • salmonellosis
  • shigellosis
  • vibriosis (including infections caused by Vibrio alginolyticus, V. parahaemolyticus, V. vulnificus, and other Vibrio species)

Researchers looked at the impact of waterborne disease in respiratory diseases and in enteric (intestinal) diseases. Respiratory diseases were Legionnaires’ disease, NTM infection, and Pseudomonas pneumonia. Diseases with primarily enteric effects, such as diarrhea and vomiting, were campylobacteriosis, cryptosporidiosis, giardiasis, norovirus infection, salmonellosis, and shigellosis.

Methods

Illnesses

For each pathogen, CDC gathered data from surveillance systems or administrative data and corrected for underreporting and underdiagnosis. CDC multiplied the adjusted number by the proportion of illnesses acquired in the United States (that is, not during international travel) and the proportion transmitted by water to yield an estimated number of illnesses that are domestically acquired and waterborne. CDC then added the estimates for each of the pathogens to arrive at a total, and used an uncertainty model to generate a point estimate and 95% credible interval (upper and lower limits) (Figure 1).
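The multiplication described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the model used in the analysis: the function name, parameter names, and all input values are hypothetical, and the sketch ignores uncertainty entirely.

```python
def domestic_waterborne_illnesses(
    reported_cases: float,
    underreporting_multiplier: float,
    underdiagnosis_multiplier: float,
    proportion_domestic: float,
    proportion_waterborne: float,
) -> float:
    """Scale reported cases up for missed illnesses, then down to the
    share that is both domestically acquired and transmitted by water."""
    for name, value in (
        ("proportion_domestic", proportion_domestic),
        ("proportion_waterborne", proportion_waterborne),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return (
        reported_cases
        * underreporting_multiplier
        * underdiagnosis_multiplier
        * proportion_domestic
        * proportion_waterborne
    )


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
estimate = domestic_waterborne_illnesses(
    reported_cases=1_000,
    underreporting_multiplier=1.0,
    underdiagnosis_multiplier=30.0,
    proportion_domestic=0.8,
    proportion_waterborne=0.1,
)
print(f"Estimated domestic waterborne illnesses: {estimate:,.0f}")
```

With these made-up inputs, 1,000 reported cases scale up to 30,000 total illnesses, of which 80% are domestic and 10% of those are waterborne.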

Figure 1: Illness burden estimate

Illnesses in surveillance system/administrative database times multiplier to correct for underreporting times disease-specific multiplier to correct for underdiagnosis times estimated proportion of domestically acquired illness times estimated proportion transmitted through water equals domestic waterborne illness estimate.

*Probability distributions were used to model uncertainty in each of the data inputs. Point estimates were bounded by a 95% credible interval.

**To calculate the total estimated number of domestically acquired waterborne illnesses for the selected diseases and incorporate the uncertainty for all the individual estimates, a similar modeling process was used. Estimates for individual diseases are not additive because of the modeling process.
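The two notes above can be illustrated with a small Monte Carlo sketch. Everything here is an assumption made for illustration: the source does not say which probability distributions were used or how the point estimate was defined, so the sketch uses triangular distributions, takes the median as the point estimate, and takes the 2.5th and 97.5th percentiles as the interval bounds. The pathogen names and all numbers are hypothetical.

```python
import random
import statistics


def draw_illnesses(rng, reported_cases, underreporting, underdiagnosis,
                   domestic, waterborne):
    """One simulated illness count. Each uncertain input is a
    (low, mode, high) triple for an assumed triangular distribution."""
    def tri(spec):
        low, mode, high = spec
        return rng.triangular(low, high, mode)  # stdlib order: low, high, mode

    return (reported_cases * tri(underreporting) * tri(underdiagnosis)
            * tri(domestic) * tri(waterborne))


def summarize(draws):
    """Median plus the 2.5th and 97.5th percentiles of the draws."""
    cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5%
    return statistics.median(draws), cuts[0], cuts[-1]


# Hypothetical inputs for two made-up pathogens.
HYPOTHETICAL_INPUTS = {
    "pathogen_a": dict(reported_cases=1_000, underreporting=(1.0, 1.1, 1.5),
                       underdiagnosis=(10, 30, 80), domestic=(0.7, 0.8, 0.9),
                       waterborne=(0.05, 0.10, 0.30)),
    "pathogen_b": dict(reported_cases=5_000, underreporting=(1.0, 1.2, 2.0),
                       underdiagnosis=(2, 5, 20), domestic=(0.9, 0.95, 1.0),
                       waterborne=(0.2, 0.4, 0.8)),
}

N_DRAWS = 10_000
rng = random.Random(2014)  # fixed seed so the sketch is reproducible

per_pathogen = {
    name: [draw_illnesses(rng, **inputs) for _ in range(N_DRAWS)]
    for name, inputs in HYPOTHETICAL_INPUTS.items()
}

# Total burden: add the pathogens within each draw, then summarize the totals.
total_draws = [sum(values) for values in zip(*per_pathogen.values())]
total_point, total_low, total_high = summarize(total_draws)

# Adding the individual point estimates is a different calculation.
sum_of_points = sum(summarize(draws)[0] for draws in per_pathogen.values())

print(f"Total: {total_point:,.0f} ({total_low:,.0f} to {total_high:,.0f})")
print(f"Sum of individual point estimates: {sum_of_points:,.0f}")
```

Because the total is summarized from the combined draws, it generally differs from the sum of the per-pathogen point estimates, which is one way to see why individual estimates are not additive.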

Emergency Department Visits, Hospitalizations, and Deaths

For each pathogen with surveillance data available, CDC multiplied the estimated number of reported illnesses (after correcting for underreporting) by the pathogen-specific hospitalization and death rates from surveillance or administrative data to estimate hospitalizations and deaths.

Most surveillance systems do not tally emergency department (ED) visits if the patient was not admitted to the hospital, so CDC estimated the number of ED visits using the ratio of ED visits to hospitalizations in HCUP data and the pathogen-specific hospitalization rate. Because some people with illnesses that were not laboratory-confirmed would also have had an ED visit, been hospitalized, or died, CDC doubled the estimates to correct for under-diagnosis.

CDC multiplied the adjusted ED visit, hospitalization and death estimates by the proportion of illnesses that were acquired within the United States (vs. international travel-related) and the proportion transmitted by water.

Finally, CDC used an uncertainty model to generate point estimates and 95% credible intervals for ED visits, hospitalizations, and deaths. (See Figures 2, 3, and 4.)

Figure 2: Emergency department (ED) visit burden estimate

Illnesses in surveillance system/administrative database times multiplier to correct for underreporting times hospitalization rate in surveillance/literature data times ratio of ED visits to hospitalizations in administrative data times underdiagnosis multiplier for ED visits times estimated proportion domestically acquired times estimated proportion transmitted through water equals domestic waterborne ED visit estimate.

*Probability distributions were used to model uncertainty in each of the data inputs. Point estimates were bounded by a 95% credible interval.

**To calculate the total estimated number of domestically acquired waterborne ED visits for the selected diseases and incorporate the uncertainty for all the individual estimates, a similar modeling process was used. Estimates for individual diseases are not additive because of the modeling process.
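The ED visit chain in Figure 2 can be written out as a point calculation. This is a minimal sketch that leaves out the uncertainty modeling; the function and parameter names are hypothetical, the inputs are invented, and the default underdiagnosis multiplier of 2 reflects the doubling described in the text.

```python
def domestic_waterborne_ed_visits(
    reported_cases: float,
    underreporting_multiplier: float,
    hospitalization_rate: float,
    ed_visits_per_hospitalization: float,
    proportion_domestic: float,
    proportion_waterborne: float,
    underdiagnosis_multiplier: float = 2.0,  # the text says estimates were doubled
) -> float:
    """Derive ED visits from hospitalizations, then restrict to
    domestically acquired, waterborne illness."""
    # Hospitalizations among reported (underreporting-corrected) illnesses.
    hospitalizations = (
        reported_cases * underreporting_multiplier * hospitalization_rate
    )
    # Surveillance rarely counts ED-only visits, so scale hospitalizations
    # by the ED-visit-to-hospitalization ratio.
    ed_visits = hospitalizations * ed_visits_per_hospitalization
    return (
        ed_visits
        * underdiagnosis_multiplier
        * proportion_domestic
        * proportion_waterborne
    )


# Hypothetical inputs for illustration only.
ed_estimate = domestic_waterborne_ed_visits(
    reported_cases=5_000,
    underreporting_multiplier=1.0,
    hospitalization_rate=0.2,
    ed_visits_per_hospitalization=1.5,
    proportion_domestic=0.9,
    proportion_waterborne=0.5,
)
print(f"Estimated domestic waterborne ED visits: {ed_estimate:,.0f}")
```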

Figure 3: Hospitalization burden estimate

Illnesses in surveillance system/administrative database times multiplier to correct for underreporting times hospitalization rate in surveillance/literature data times underdiagnosis for hospitalization times estimated proportion domestically acquired times estimated proportion transmitted through water equals domestic waterborne hospitalization estimate.

*Probability distributions were used to model uncertainty in each of the data inputs. Point estimates were bounded by a 95% credible interval.

**To calculate the total estimated number of hospitalizations for the selected diseases and incorporate the uncertainty for all the individual estimates, a similar modeling process was used. Estimates for individual diseases are not additive because of the modeling process.

Figure 4: Death burden estimate

Illnesses in surveillance system/administrative database times multiplier to correct for underreporting times death rate in surveillance/administrative data times underdiagnosis for deaths times estimated proportion domestically acquired times estimated proportion transmitted through water equals domestic waterborne deaths estimate.

*Probability distributions were used to model uncertainty in each of the data inputs. Point estimates were bounded by a 95% credible interval.

**To calculate the total estimated number of deaths for the selected diseases and incorporate the uncertainty for all the individual estimates, a similar modeling process was used. Estimates for individual diseases are not additive because of the modeling process.
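Figures 3 and 4 share one structure and differ only in the outcome rate applied. The sketch below reuses a single function for both, again as a point calculation without the uncertainty model. The function name and all input values are hypothetical; the default multiplier of 2 follows the doubling described in the text.

```python
def domestic_waterborne_outcome(
    reported_cases: float,
    underreporting_multiplier: float,
    outcome_rate: float,
    proportion_domestic: float,
    proportion_waterborne: float,
    underdiagnosis_multiplier: float = 2.0,  # the text says estimates were doubled
) -> float:
    """Estimate a severe outcome (hospitalization or death) attributable to
    domestically acquired, waterborne illness."""
    if not 0.0 <= outcome_rate <= 1.0:
        raise ValueError(f"outcome_rate must be between 0 and 1, got {outcome_rate}")
    return (
        reported_cases
        * underreporting_multiplier
        * outcome_rate
        * underdiagnosis_multiplier
        * proportion_domestic
        * proportion_waterborne
    )


# Hypothetical inputs shared by both outcomes.
shared = dict(
    reported_cases=5_000,
    underreporting_multiplier=1.0,
    proportion_domestic=0.9,
    proportion_waterborne=0.5,
)

hospitalizations = domestic_waterborne_outcome(outcome_rate=0.2, **shared)
deaths = domestic_waterborne_outcome(outcome_rate=0.01, **shared)

print(f"Estimated hospitalizations: {hospitalizations:,.0f}")
print(f"Estimated deaths: {deaths:,.0f}")
```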

Healthcare Cost

CDC estimated the total direct healthcare cost using the sum of insurer and out-of-pocket payments for ED visits and hospitalizations attributed to waterborne disease transmission in the United States. CDC calculated the cost per ED visit or hospitalization for different insurance sources (commercial insurance, Medicare and Medicaid) using the IBM® MarketScan® Research Databases. CDC created a weighted average cost by multiplying the cost per insurance source by the proportion of ED visits or hospitalizations with the corresponding insurance source in HCUP NEDS or NIS. To obtain the total direct healthcare cost, CDC multiplied the total number of ED visits and hospitalizations attributed to U.S. waterborne disease (calculated using surveillance and administrative data as described above) by the weighted average cost per ED visit or hospitalization (calculated using insurance data).

Figure 5: Total direct healthcare costs of ED visits and hospitalizations

ED visits and hospitalizations for each disease times proportion of people with each type of insurance times cost per ED visit or hospitalization by type of insurance equals total cost for each disease.

*The Healthcare Cost and Utilization Project’s National Inpatient Sample (HCUP NIS) and the Healthcare Cost and Utilization Project’s National Emergency Department Sample (HCUP NEDS)
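The cost calculation described in this section amounts to a payer-weighted average cost multiplied by an event count. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and every cost, payer share, and event count is invented for illustration rather than taken from MarketScan or HCUP.

```python
def weighted_average_cost(cost_by_payer: dict, share_by_payer: dict) -> float:
    """Average cost per event, weighting each payer's cost by the share of
    events covered by that payer."""
    if set(cost_by_payer) != set(share_by_payer):
        raise ValueError("cost and share tables must list the same payers")
    if abs(sum(share_by_payer.values()) - 1.0) > 1e-9:
        raise ValueError("payer shares must sum to 1")
    return sum(cost_by_payer[p] * share_by_payer[p] for p in cost_by_payer)


def total_direct_cost(n_events: float, cost_by_payer: dict,
                      share_by_payer: dict) -> float:
    """Number of events times the payer-weighted average cost per event."""
    return n_events * weighted_average_cost(cost_by_payer, share_by_payer)


# Hypothetical per-event costs (insurer plus out-of-pocket payments), in dollars.
ed_cost = {"commercial": 1_000.0, "medicare": 800.0, "medicaid": 600.0}
hosp_cost = {"commercial": 20_000.0, "medicare": 15_000.0, "medicaid": 10_000.0}

# Hypothetical shares of events by insurance source.
ed_share = {"commercial": 0.5, "medicare": 0.3, "medicaid": 0.2}
hosp_share = {"commercial": 0.4, "medicare": 0.4, "medicaid": 0.2}

# Hypothetical counts of waterborne ED visits and hospitalizations.
ed_total = total_direct_cost(1_350, ed_cost, ed_share)
hosp_total = total_direct_cost(900, hosp_cost, hosp_share)
overall_cost = ed_total + hosp_total

print(f"ED visit cost: ${ed_total:,.0f}")
print(f"Hospitalization cost: ${hosp_total:,.0f}")
print(f"Total direct healthcare cost: ${overall_cost:,.0f}")
```

Keeping ED visits and hospitalizations separate until the final sum mirrors the text, which weights each event type by its own insurance mix.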