
APPENDIX A: Data and Other Resources

Since the 1990 guidelines for investigating clusters of health events (1) were published, a substantial increase has occurred in the number of sources of available data that can help public health agencies respond to cancer cluster inquiries and conduct cancer cluster investigations. These sources include data on cancer diagnoses, demographics, and environmental quality.

Cancer Registries

The state cancer registry is a vital data source for suspected cancer cluster investigations. The state central cancer registry, which receives reports of all new cancer cases from clinical facilities in the state, will have numerator data (i.e., the number of new cancer cases) for calculating the standardized incidence ratio (SIR) as well as data for the appropriate comparison measures for reference populations. In 1990, many states did not have a cancer registry, and the majority of states with registries lacked resources to gather complete data. Today, every state has a statewide central cancer registry for collecting, managing, and analyzing high-quality data on incident (i.e., newly diagnosed) cases of cancer and cancer mortality among residents.

Two federal programs support central cancer registries that compile data on cancer incidence. CDC's National Program of Cancer Registries (NPCR) supports central registries covering 96% of the U.S. population, with registries in 45 states, the District of Columbia, Puerto Rico, and the Pacific Islands (2). The National Cancer Institute's (NCI's) Surveillance, Epidemiology and End Results (SEER) Program includes five state registries and a number of regional and special population registries (3). Together, these programs collect data for the entire U.S. population (3). Uniform national data standards for all registries are developed and promoted by the North American Association of Central Cancer Registries (NAACCR) (4).

The state and national registries have data on cancer type (e.g., organ site, histology, and many other fields) as well as detailed demographic information on the individuals with cancer. Although state registries most often group cancer statistics by county, many registries are also able to characterize data on individual cases by geographic location (geocoding). Geocoded information on age, sex, and race/ethnicity permits researchers to calculate the SIR at various geographic levels. The majority of states have internet sites for cancer statistics; SEER and NPCR also present cancer statistics online.
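The SIR compares the number of cases observed in the study area with the number expected if the area had experienced the reference population's stratum-specific rates. The following is a minimal sketch in Python, assuming indirect standardization by age group and Byar's approximation for the confidence limits. This appendix does not prescribe either method, and the function names, strata, rates, and counts are hypothetical illustrations rather than registry data.

```python
from math import sqrt


def expected_cases(reference_rates, person_years):
    """Expected cases: sum over strata of reference rate x local person-years."""
    return sum(reference_rates[s] * person_years[s] for s in person_years)


def sir(observed, expected):
    """Standardized incidence ratio: observed cases divided by expected cases."""
    if expected <= 0:
        raise ValueError("expected cases must be positive")
    return observed / expected


def byar_interval(observed, expected, z=1.96):
    """Approximate confidence limits for the SIR (Byar's approximation)."""
    if observed > 0:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * sqrt(observed))) ** 3
    else:
        lower = 0.0
    upper_count = observed + 1
    upper = upper_count * (
        1 - 1 / (9 * upper_count) + z / (3 * sqrt(upper_count))
    ) ** 3
    return lower / expected, upper / expected


# Hypothetical example data (assumed for illustration only).
reference_rates = {"0-39": 0.0002, "40-64": 0.002, "65+": 0.01}  # cases per person-year
person_years = {"0-39": 10_000, "40-64": 5_000, "65+": 2_000}
observed = 40

expected = expected_cases(reference_rates, person_years)  # about 32 expected cases
ratio = sir(observed, expected)                           # about 1.25
low, high = byar_interval(observed, expected)
print(f"SIR = {ratio:.2f} (approximate 95% CI {low:.2f} to {high:.2f})")
```

In this made-up example the interval includes 1, so the observed count would not be statistically distinguishable from the expected count at the 95% level.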

Completeness of the NPCR and SEER registries varies by state, although in general they have a high level of completeness and accuracy. NAACCR certifies registries annually based on completeness overall (95% and 90% for Gold and Silver, respectively) and for specific data items such as race, age, and gender (5). There might be a delay (≥1 year) between cancer diagnosis and the availability of complete data in the cancer registry. Preliminary data might be available for more recent years; however, these data might not contain all cancer cases from these years. The state registry will have information on which years have complete information.

Limitations and cautions to the use and interpretation of data from cancer registries include the following:

  • Registry information generally contains patient address at date of diagnosis only.
  • The majority of registries do not collect information on possible risk factors (e.g., smoking history). Cancer registries do have fields for usual occupation and industry, but the data are often incomplete.
  • The cancer cases most likely to be underreported are late-stage cancers treated with palliative care (e.g., in persons who might not be hospitalized for surgery or treatment) and cancers diagnosed in a physician's office without hospitalization (e.g., early-stage melanoma). Many hospitals routinely collect cancer data for their own purposes, and for most hospitals, reporting to central registries is routine. However, reporting from nonhospital facilities is less reliable. Consequently, data for cancer patients who are never hospitalized for diagnosis and treatment tend to be less complete and might be reported later than other cases (6).
  • Codes and rules for counting cancer cases do change. Some histology classifications change from benign to malignant and vice versa, depending on the coding edition. Ovarian cancers and hematopoietic cancers are prominent examples. These are for the most part exceptions, and they will be known by the cancer registry personnel.
  • Occasionally, changes in diagnostic criteria might change how a cancer is diagnosed, possibly changing the frequency with which the cancer is detected and reported. These types of changes are adopted at different rates by physicians and hence appear at different rates in reports to the registries.
  • Data on race and ethnicity are captured in registry data; however, these data are collected inconsistently, with some providers relying on a patient's self-report and others assessing race based on observation.
  • Many registries are aware of "quirks" or "anomalies" in their regions that can cause numerator and denominator data to be mismatched, such as rapidly growing or shrinking areas or large population centers that straddle county or other borders.

These limitations notwithstanding, the existence of population-based cancer registries has greatly reduced the resources needed to determine how many cancers, and of what type, have occurred in a given area of a state. These registries thus present efficient opportunities for answering questions that the public has about cancer concerns, including suspected cancer clusters.

State Cancer Profiles

Cancer incidence and mortality data, compiled largely from registry data, are also available on State Cancer Profiles, a collaborative effort between CDC and NCI (7). The data on this site include state- and county-level cancer incidence and death rates. Statistical assessments are provided for upward and downward trends in rates by county and comparisons to state rates. A mapping capability also is provided; however, the maps do not reflect statistical differences in cancer incidence or death rates. Although the target audiences for this information are health planners, policy makers, and cancer information providers engaged in cancer control planning, the media as well as members of the public also use this site. In addition to data on cancer incidence and mortality, the site provides risk behavior data based on CDC's Behavioral Risk Factor Surveillance System (8).

Data on Deaths

In addition to incidence data from cancer registries, data on deaths compiled by state vital records offices might be a useful supplement in identifying cancer cases. Death records are most useful for cancers with high mortality and a short survival period, such as pancreatic, liver, lung, and some types of brain cancer. However, death records are not very useful for cancers with lower mortality, such as breast, thyroid, prostate, or colon cancers, which patients are likely to survive. Death records increasingly are submitted to state health agencies online, and they are often available within weeks or even days after death. When survival is likely to be short (within 2 years), death records can help to fill in gaps in the cancer registry case count, because registries might have a 1–2 year lag in ascertaining complete records.

Limitations and cautions in the use of death records in cancer cluster investigations include the following:

  • Death records might be limited by the requirement that the residence of the deceased is recorded as the address at the time of death; this address might or might not be the place where the individual resided at the time of the cancer diagnosis.
  • Death records are not necessarily completed by the physician who best knew the patient's medical history, meaning that the given cause(s) of death might not always be accurate.

U.S. Census Bureau

The U.S. Census Bureau's American FactFinder can provide valuable data for use in determining the denominator for incidence calculations (9). State, county, census tract, and census block level data are available. Census data include total population figures, along with socioeconomic status, race/ethnicity, age, sex, and many other useful characteristics of a population.
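As an illustration of how census counts can serve as a denominator, the sketch below estimates person-years for a multiyear study period and computes a crude incidence rate per 100,000. It assumes that population changes linearly between two census counts, which is a simplification that might be inaccurate where growth is rapid. The function names and all numbers are hypothetical.

```python
def interpolate_population(pop_start, pop_end, year_start, year_end, year):
    """Linearly interpolate a population count between two census years."""
    if not year_start <= year <= year_end:
        raise ValueError("year must lie between the two census years")
    fraction = (year - year_start) / (year_end - year_start)
    return pop_start + fraction * (pop_end - pop_start)


def person_years(pop_start, pop_end, year_start, year_end, first_year, last_year):
    """Sum the interpolated annual populations over the study period."""
    return sum(
        interpolate_population(pop_start, pop_end, year_start, year_end, year)
        for year in range(first_year, last_year + 1)
    )


def crude_rate_per_100k(cases, denominator):
    """Crude incidence rate per 100,000 person-years."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return cases / denominator * 100_000


# Hypothetical tract: 3,600 residents at the 2000 census, 4,400 at the 2010 census.
denominator = person_years(3_600, 4_400, 2000, 2010, 2003, 2007)
rate = crude_rate_per_100k(12, denominator)  # 12 hypothetical cases in 2003-2007
print(f"{denominator:.0f} person-years, crude rate {rate:.1f} per 100,000")
```

A crude rate of this kind ignores the age structure of the population, so it is a starting point for a denominator check rather than a substitute for a standardized measure.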

Limitations and cautions about the use of census data include the following:

  • Census numbers might be inaccurate for intercensal years when substantial population changes (rapid growth, shrinkage, or aging changes) occur.
  • Census boundaries occasionally change, most often in rapidly growing areas that are often subdivided, making comparison between years or combining data from different years difficult. American FactFinder allows a user to see the changes between census years (e.g., between 2000 and 2010).
  • The census tract is defined by the U.S. Census Bureau, and it is a relatively homogeneous unit with respect to population characteristics. A census tract generally contains between 1,000 and 8,000 persons, with an optimum size of 4,000 persons (10). Cancer clusters of concern frequently are confined to areas smaller than a census tract. Because census tracts are subdivided into census blocks and block groups, blocks and block groups might be combined if a census tract does not give the needed geographic boundaries. The number of cases occurring within a block or a block group might be far too small to allow reporting of cancer cases without privacy concerns or creating statistically unstable rates. Registries often will not release data at the block group level or even the census tract level because of privacy concerns.
  • Census units might not be similar to contamination boundaries.
  • The state demographer is the best resource for information regarding changes in population size.
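The points above about combining block groups and about small counts can be sketched as follows. This is an illustrative outline only: the suppression threshold `MIN_CASES` is a hypothetical value (each registry sets its own release rules), and the block group identifiers and counts are invented.

```python
def combine_units(units, unit_ids):
    """Sum cases and population over the selected block groups."""
    cases = sum(units[u]["cases"] for u in unit_ids)
    population = sum(units[u]["population"] for u in unit_ids)
    return {"cases": cases, "population": population}


def releasable_cases(cases, min_cases):
    """Return the count if it meets the threshold, otherwise None (suppressed)."""
    return cases if cases >= min_cases else None


MIN_CASES = 6  # hypothetical threshold, not a registry rule

# Hypothetical block groups inside one census tract.
block_groups = {
    "BG-1": {"cases": 2, "population": 900},
    "BG-2": {"cases": 3, "population": 1_200},
    "BG-3": {"cases": 4, "population": 1_100},
}

study_area = combine_units(block_groups, ["BG-1", "BG-2", "BG-3"])
single = releasable_cases(block_groups["BG-1"]["cases"], MIN_CASES)
combined = releasable_cases(study_area["cases"], MIN_CASES)
print(single, combined, study_area["population"])
```

In this example the single block group is suppressed while the combined area is not, which mirrors the trade-off between geographic precision and privacy or statistical stability.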

Zip codes can be and often are used as geographic areas for cluster investigations, especially if they are a better fit for communities at issue. There are two major limitations to using zip codes for cancer cluster investigations: 1) zip code boundaries might change more often than census boundaries, and 2) zip codes cross county and census boundaries. Moreover, a person might have a post office box or a rural route address that is in a different zip code than the actual residence. Real estate websites often can be useful for researching population changes and demographic information.

National Environmental Public Health Tracking Network

One resource that was not available during the development of the 1990 Guidelines is CDC's National Environmental Public Health Tracking Network (Tracking Network), a nationwide surveillance network that provides health, environmental hazard, and exposure data and information to better inform and protect communities (11). The Tracking Network is a web-based system of integrated data and information derived from a variety of sources, including federal, state, and local agencies and registries.

Along with other selected health outcomes, the Tracking Network offers data and health messaging on several categories of cancers, including leukemia (by subtype), pediatric cancers, brain cancer, and other cancer types. The website will include additional types of cancers in the future. The cancer data are derived from a compilation of registry data, including data from the NPCR and SEER programs. Cancer health outcomes data available for many states can be viewed in map, table, or graph format. Annual age-adjusted rates and annual numbers of cases are available for each selected cancer category for each state, and 5-year average annual rates are available by county. Other information, including demographic and socioeconomic characteristics, health behaviors, and biomonitoring data, is also available. Because of low case counts and data confidentiality and human protection laws, health data are protected from being viewed on the Tracking Network at a higher geographic resolution, such as by census tract. In some cases, a request for individual or identifiable data might be granted by state cancer registries directly.
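To make the term "age-adjusted rate" concrete, the sketch below uses direct standardization, one common approach in which stratum-specific rates are weighted by a standard population. The text does not state which method the Tracking Network uses, so this is an assumption. The standard weights, age groups, and counts are all hypothetical.

```python
def age_adjusted_rate_per_100k(cases, person_years, standard_weights):
    """Directly standardized rate: weighted sum of stratum-specific rates."""
    total_weight = sum(standard_weights.values())
    weighted_rate = 0.0
    for group, weight in standard_weights.items():
        stratum_rate = cases[group] / person_years[group]
        weighted_rate += (weight / total_weight) * stratum_rate
    return weighted_rate * 100_000


# Hypothetical data; for a multiyear average annual rate, cases and
# person-years would each be summed over the years before this step.
cases = {"0-39": 2, "40-64": 10, "65+": 20}
person_years = {"0-39": 10_000, "40-64": 5_000, "65+": 2_000}
standard_weights = {"0-39": 0.5, "40-64": 0.3, "65+": 0.2}  # assumed standard population shares

adjusted = age_adjusted_rate_per_100k(cases, person_years, standard_weights)
crude = sum(cases.values()) / sum(person_years.values()) * 100_000
print(f"crude {crude:.1f}, age-adjusted {adjusted:.1f} per 100,000")
```

The adjusted and crude rates differ here because the hypothetical standard population gives more weight to the oldest age group than the local population does.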

Environmental data primarily derived from federal, state, and local regulatory environmental protection departments (or agencies) are available on the Tracking Network. However, state and local jurisdictions might provide more detailed environmental data, along with staff members who are knowledgeable about issues surrounding a particular situation.

Data from State and Territorial Environmental Agencies

State and local environmental protection agencies routinely collect environmental data. Because these data are collected in places and at times according to regulatory purposes, they might be useful in identifying environmental hazards in cancer cluster investigations, or they might only approximate the environmental conditions at the site of the potential cancer cluster. Environmental agencies regularly collect data on water quality and air quality for compliance with air and water quality standards. These agencies also often permit and regulate industrial or other facilities that generate, transport, or store hazardous waste or other chemicals. The agencies will therefore have records of compliance and noncompliance that might indicate emissions into the environment. The state agencies are also involved, along with the Environmental Protection Agency (EPA), in monitoring pollution and in the oversight of the cleanup of contaminated sites. Although some states conduct surveillance on pesticide-related illness and injury, not all states regularly collect and maintain data on pesticide use or exposure; if collected, the data are usually kept at the state department of agriculture and sometimes by the state environmental protection agencies.

EPA collects environmental data for regulatory purposes, and the agency publishes the data on its website. A viewer can use tools on the EPA website to view information on air quality or water quality or to see if there are local Superfund sites, brownfields (12), or releases from manufacturing facilities (13). The information is available at the zip code level and can be displayed on a map.

The staff located within state or local environmental protection departments can be a helpful resource for providing information about local environmental conditions that might lead to exposure to contamination. The staff's assistance should be engaged in evaluating available environmental data for relevance to a cancer cluster inquiry or investigation because the data collection areas are determined by regulatory requirements and might not provide information specific to a particular site of public health interest. EPA's list of State and Territorial Environmental Agencies is available online.

Sources of information on the association between specific environmental contaminants and cancer are available. Weight-of-evidence evaluations of carcinogens are published by the International Agency for Research on Cancer (IARC), in its cancer classifications, and by the National Toxicology Program (NTP), in its Report on Carcinogens. These evaluations tend to focus on exposures that have been of concern for some time and on which there are therefore substantial data. Not all potential carcinogens have been evaluated by these organizations. Other sources of information include PubMed, the ATSDR Toxic Substance Portal, and the ATSDR series of Toxicological Profiles on various chemicals.

By using the community members' local knowledge about the hazards and risk factors in their community as well as data from environmental and other databases, the investigator can make more informed decisions during the investigation process. For example, information provided by the concerned community members and by available databases can be useful in defining the geographic area and time period for the population at risk, increasing the accuracy and precision of the population definition. Readily available information on environmental hazards in the area of interest can be reviewed to determine if any of the hazards have a space and/or time pattern that can be related to the suspected cancer cluster. A thorough evaluation of environmental hazards with input from the community is appropriate because it might suggest some relevant public health interventions that turn out to be valuable, independent of any suspected cancer cluster. For example, in a community concerned about contaminants in private well systems, proper maintenance of private well systems might be an appropriate public health education program, regardless of whether contaminants are found, particularly if residents express confusion over how to maintain these wells.

Biomonitoring

Biomonitoring is the measurement, usually in blood or urine, of chemical compounds, elements, or their metabolites in the body. Although biomonitoring indicates exposure to a substance at some level, it might not indicate when the exposure occurred or what effects the exposure might have on health in the future. Because of the long latency period associated with the development of cancer, the limitations of current environmental data also apply to using or collecting current biomonitoring data. The relevant exposure might have occurred years before and might not be detectable at the time that samples for biomonitoring are collected. Even when a substance is detected in the body, it might not be a carcinogen, or it might not be present at levels known to cause disease. For the U.S., CDC's National Health and Nutrition Examination Survey (NHANES) provides reference data for over 200 chemicals in the blood and urine for a selection of the survey's participants (14). Biomonitoring is a relatively new field, and more research is needed to understand which substances, at what concentrations in the body, contribute to cancer.

References

  1. CDC. Guidelines for investigating clusters of health events. MMWR 1990;39(No. RR-11).
  2. CDC. National Program of Cancer Registries.
  3. National Cancer Institute. Surveillance, Epidemiology and End Results (SEER) Program.
  4. North American Association of Central Cancer Registries. NAACCR Standards for Cancer Registries.
  5. North American Association of Central Cancer Registries. Certification levels.
  6. Penberthy L, McClish D, Peace S, Gray L, Martin J, Overton S, Radhakrishnan S, Gillam C, Ginder G. Hematologic malignancies: an opportunity to fill a gap in cancer surveillance. Cancer Causes Control 2012;23:1253–64.
  7. National Cancer Institute. State cancer profiles, 2011.
  8. CDC. Behavioral Risk Factor Surveillance System survey data. Atlanta, GA: US Department of Health and Human Services, CDC; 2013.
  9. US Census Bureau. American fact finder.
  10. US Census Bureau. Census bureau glossary.
  11. Tango T. Statistical methods for disease clustering. New York, NY: Springer; 2010.
  12. Environmental Protection Agency. My environment query.
  13. Environmental Protection Agency. Toxic release inventory.
  14. CDC. Fourth national report on human exposure to environmental chemicals, updated tables, 2012.

Use of trade names and commercial sources is for identification only and does not imply endorsement by the U.S. Department of Health and Human Services.

References to non-CDC sites on the Internet are provided as a service to MMWR readers and do not constitute or imply endorsement of these organizations or their programs by CDC or the U.S. Department of Health and Human Services. CDC is not responsible for the content of pages found at these sites. URL addresses listed in MMWR were current as of the date of publication.
