Cautionary Notes for U.S. and Puerto Rico Data

U.S. Cancer Statistics Public Use Database

Note

Before using the U.S. and Puerto Rico (2005–2017) data, analysts should read and understand the following information. If you have questions, please contact CDC at uscsdata@cdc.gov.

Case Inclusions and Exclusions

NPCR- and SEER-supported cancer registries report all incident cases coded as in situ (nonmalignant) or invasive (malignant; primary site only), as well as nonmalignant (including borderline and benign) central nervous system tumors, according to the International Classification of Diseases for Oncology, Third Edition (ICD-O-3), with the following exceptions:

  • In situ cancers of the cervix are not reported.
  • Basal and squamous cell carcinomas of the skin are not reported, except when these occur on the skin of the genital organs.
  • In situ cancers of the urinary bladder are recoded to invasive behavior because the information needed to distinguish between in situ and invasive bladder cancers is not always available or reliable. Stage for these cases remains coded as in situ [1].
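The exceptions above can be sketched as a single per-case filter. This is an illustrative sketch only, not CDC or SEER code: the `Case` record, the function name `apply_reporting_rules`, the list of CNS topography prefixes, and the histology range treated as basal/squamous cell skin carcinoma are all hypothetical assumptions that should be checked against the registry reporting manuals before use. It also assumes, as in ICD-O-3 topography, that genital skin is coded to the genital organ site (for example, vulva or penis) rather than to skin (C44).

```python
from dataclasses import dataclass, replace
from typing import Optional

# ICD-O-3 behavior codes.
BENIGN, BORDERLINE, IN_SITU, MALIGNANT = 0, 1, 2, 3

# ASSUMPTION: CNS topography prefixes eligible for benign/borderline reporting.
CNS_PREFIXES = ("C70", "C71", "C72", "C75.1", "C75.2", "C75.3")

# ASSUMPTION: histology codes treated as basal/squamous cell skin carcinoma.
KERATINOCYTE_SKIN_HISTOLOGY = range(8000, 8111)


@dataclass(frozen=True)
class Case:
    site: str       # ICD-O-3 topography with a dot, e.g. "C67.9"
    histology: int  # ICD-O-3 histology, e.g. 8120
    behavior: int   # ICD-O-3 behavior code, 0-3


def apply_reporting_rules(case: Case) -> Optional[Case]:
    """Return the case as it would be reported (possibly recoded), or None if not reported."""
    is_cns = case.site.startswith(CNS_PREFIXES)
    # Benign and borderline tumors are reportable only for CNS sites.
    if case.behavior in (BENIGN, BORDERLINE) and not is_cns:
        return None
    # Exception: in situ cancers of the cervix (C53) are not reported.
    if case.behavior == IN_SITU and case.site.startswith("C53"):
        return None
    # Exception: basal/squamous cell carcinomas of the skin (C44) are not reported.
    # ASSUMPTION: genital skin is coded to genital sites (e.g. C51 vulva, C60 penis),
    # not C44, so those cases fall through and are kept.
    if case.site.startswith("C44") and case.histology in KERATINOCYTE_SKIN_HISTOLOGY:
        return None
    # Exception: in situ bladder cancers (C67) are recoded to invasive behavior.
    # Stage is not modeled here; per the text it stays coded as in situ.
    if case.behavior == IN_SITU and case.site.startswith("C67"):
        return replace(case, behavior=MALIGNANT)
    return case
```

For example, `apply_reporting_rules(Case("C67.9", 8120, IN_SITU))` returns a copy of the case with `behavior=3`, while an in situ cervical case returns `None`.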

Additionally, in these public use databases:

  • Cancer cases that were identified only through death certificate or autopsy reports have been excluded.
  • Cases with an unknown age or with a sex other than male or female have been excluded from the database. Because these cases are already excluded, frequency counts are the same whether or not Known Age and Male or Female Sex are checked on the SEER*Stat Selection tab.
  • Malignant Behavior is the default selection for this database because CDC’s NPCR and NCI’s SEER Program use this restriction to generate most official cancer statistics. Malignant behavior is defined by the variable Behavior Code ICD-O-3. The database also includes in situ and nonmalignant central nervous system (CNS) cases; to analyze them, clear the Malignant Behavior check box on the SEER*Stat Selection tab.
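To see why the Known Age and Male or Female Sex selections have no effect, and what the default Malignant Behavior selection does, the steps can be mimicked on a list of records. This is a hypothetical sketch: the `Record` fields and the function names are illustrative assumptions, not SEER*Stat variables or APIs, and sex values other than "Male" and "Female" are assumed to be stored as plain strings.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MALIGNANT = 3  # ICD-O-3 behavior code for malignant


@dataclass(frozen=True)
class Record:
    age: Optional[int]          # None means unknown age
    sex: str                    # e.g. "Male", "Female", "Unknown"
    dco_or_autopsy_only: bool   # identified only through death certificate or autopsy
    behavior: int               # ICD-O-3 behavior code


def public_use_subset(records: Iterable[Record]) -> List[Record]:
    """Exclusions applied when the public use database is built."""
    return [
        r for r in records
        if not r.dco_or_autopsy_only and r.age is not None and r.sex in ("Male", "Female")
    ]


def known_age_and_sex(records: Iterable[Record]) -> List[Record]:
    """Analogue of checking Known Age and Male or Female Sex in SEER*Stat."""
    return [r for r in records if r.age is not None and r.sex in ("Male", "Female")]


def malignant_only(records: Iterable[Record]) -> List[Record]:
    """Analogue of the default Malignant Behavior selection."""
    return [r for r in records if r.behavior == MALIGNANT]
```

Applying `known_age_and_sex` to the output of `public_use_subset` returns the same records, which is why those check boxes do not change the counts; `malignant_only` drops the in situ and nonmalignant cases.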

Suppression Rules [2,3]

Suppressing Fewer Than 16 Cases

Counts, rates, and trends based on fewer than 16 cases for the time period are suppressed; the threshold is based on rate stability. This suppression rule is enforced automatically in these databases.

When the number of cases used to compute an incidence rate is small, the rate tends to have poor reliability. Therefore, to discourage misinterpretation and misuse of counts, rates, and trends that are unstable because of small numbers, these statistics are not shown in tables and figures if the count is fewer than 16 for the time period. A numerator count of fewer than about 16 yields a standard error of the rate that is about 25% or more of the rate itself. Equivalently, a count of fewer than about 16 yields a 95% confidence interval around the rate whose width is at least as large as the rate itself. These relationships were derived assuming a Poisson process and a standard population age distribution close to the observed population age distribution.
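The arithmetic behind the threshold can be sketched as follows. This worked version assumes a crude rate R = n/P for a case count n and population P (the text notes the relationships hold approximately for age-adjusted rates when the standard and observed age distributions are close), a Poisson variance for n, and a normal-approximation 95% confidence interval R ± 1.96 SE(R):

```latex
R = \frac{n}{P}, \qquad \operatorname{SE}(R) = \frac{\sqrt{n}}{P}
\quad\Longrightarrow\quad
\frac{\operatorname{SE}(R)}{R} = \frac{1}{\sqrt{n}}, \qquad
n = 16:\ \frac{1}{\sqrt{16}} = 0.25

\frac{\text{CI width}}{R} = \frac{2 \times 1.96\,\operatorname{SE}(R)}{R} = \frac{3.92}{\sqrt{n}},
\qquad \frac{3.92}{\sqrt{n}} \ge 1 \iff n \le 3.92^{2} \approx 15.4
```

So a count of 15 or fewer gives a relative standard error above 25% and a confidence interval at least as wide as the rate, which matches the cutoff of fewer than 16.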

Another important reason for employing a cell suppression threshold value is to protect the confidentiality of patients whose data are included in a report by reducing or eliminating the risk of identity disclosure. The cell suppression threshold value of 16 is recommended to protect patient confidentiality given the low level of geographic and clinical detail provided.

Complementary Cell Suppression

Complementary cell suppression is necessary to prevent users from recovering suppressed counts by subtraction. This practice should be employed whenever any suppression occurs in the data presentation. In addition, when information from other cells, tables, or figures can be used to determine a suppressed cell, at least one other cell must also be suppressed. When data are analyzed at the state or regional level, national and regional counts must be suppressed if even a single state in a region or division is suppressed. Rates, confidence intervals (CIs), and populations can still be shown at the national and regional levels. This suppression applies whether a single year or multiple years of data are presented.
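A simplified sketch of the state, regional, and national count rule follows. It assumes a hypothetical input shape (region mapped to state mapped to case count) and treats divisions the same way as regions; it covers only the count hierarchy described here, not general complementary suppression across arbitrary tables or figures. Rates, CIs, and populations are not modeled because the text allows them to be shown.

```python
from typing import Dict, Optional, Tuple

THRESHOLD = 16  # counts below this are suppressed (primary suppression)


def suppress_counts(
    counts: Dict[str, Dict[str, int]],
) -> Tuple[Dict[str, Optional[int]], Dict[str, Optional[int]], Optional[int]]:
    """Suppress state, regional, and national counts; None marks a suppressed cell.

    counts maps region -> state -> case count (hypothetical input shape).
    """
    state_out: Dict[str, Optional[int]] = {}
    region_out: Dict[str, Optional[int]] = {}
    any_suppressed = False
    national_total = 0
    for region, by_state in counts.items():
        region_suppressed = False
        for state, n in by_state.items():
            if n < THRESHOLD:
                state_out[state] = None  # primary suppression
                region_suppressed = True
            else:
                state_out[state] = n
        region_total = sum(by_state.values())
        national_total += region_total
        # Complementary suppression: a visible regional total would let a reader
        # recover the suppressed state count by subtracting the other states.
        region_out[region] = None if region_suppressed else region_total
        any_suppressed = any_suppressed or region_suppressed
    # Same reasoning one level up: hide the national count too.
    national_out = None if any_suppressed else national_total
    return state_out, region_out, national_out
```

With one state below 16 in a region, that state's count, its regional count, and the national count all come back as `None`, while regions with no suppressed states keep their totals.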

Case-Level Data

As a further mechanism to protect data confidentiality and due to data sharing agreements with some states, the case listing function in SEER*Stat has been disabled for these databases.

Benign Central Nervous System (CNS) Tumors

Cancer registries began collecting information on nonmalignant brain and other central nervous system tumors starting with cases diagnosed in 2004. Collection of these tumors is in accordance with Public Law 107-260, the Benign Brain Tumor Cancer Registries Amendment Act, which mandates that NPCR registries collect data on all brain and other central nervous system tumors with a behavior code of 0 (benign) or 1 (borderline), in addition to in situ and malignant tumors. Data for nonmalignant brain and other central nervous system tumors were available from all registries contributing to these data.

Primary Site Variables [4,8]

Beginning with diagnosis year 2010, some ICD-O-3 codes for lymphoma and leukemia were updated to reflect changes from the World Health Organization (WHO). The site recode variables that incorporate these updates are Site recode ICD-O-3/WHO 2008 for all ages and, for childhood cancers, International Classification of Childhood Cancer (ICCC) site recode ICD-O-3/WHO 2008 and ICCC site rec extended ICD-O-3/WHO 2008.

Consider reviewing the variable Site recode ICD-O-3/WHO 2008 before using the directly coded primary site. For more information, see the SEER primary site recodes.

Stage

A merged variable, Merged Summary Stage, has been created to span two time periods in which different staging variables were used. The coding logic for this merged variable is:

  • For NPCR registries:
    • If a case was diagnosed between 2005 and 2015, stage at diagnosis is recorded using the Derived SEER Summary Stage 2000 variable value.
    • If a case was diagnosed in 2016 or 2017, stage at diagnosis is recorded using the SEER Summary Stage 2000 variable value.
  • For SEER-only registries (Connecticut, Hawaii, Iowa, and New Mexico):
    • If a case was diagnosed between 2005 and 2015, stage at diagnosis is recorded using the Derived SEER Summary Stage 2000 variable value.
    • If a case was diagnosed in 2016 or 2017, the best available data from either Derived SEER Summary Stage 2000 or SEER Summary Stage 2000 are used.
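The coding logic above can be expressed as one function. This is a hedged sketch, not the actual derivation code: the function name, the registry labels "NPCR" and "SEER-only", and the representation of an unknown stage are hypothetical. The text does not define "best available data" for SEER-only registries in 2016 and 2017, so the sketch assumes the derived value is preferred when it is known and the directly coded value is used otherwise.

```python
from typing import Optional

UNKNOWN_VALUES = {None, "Unknown"}  # ASSUMPTION: how a missing/unknown stage is represented


def merged_summary_stage(year: int, registry: str,
                         derived_ss2000: Optional[str],
                         ss2000: Optional[str]) -> Optional[str]:
    """Sketch of the Merged Summary Stage logic. registry is "NPCR" or "SEER-only"."""
    if not 2005 <= year <= 2017:
        raise ValueError(f"diagnosis year {year} is outside 2005-2017")
    if registry not in ("NPCR", "SEER-only"):
        raise ValueError(f"unknown registry type {registry!r}")
    # 2005-2015: both registry types use Derived SEER Summary Stage 2000.
    if year <= 2015:
        return derived_ss2000
    # 2016-2017, NPCR registries: SEER Summary Stage 2000.
    if registry == "NPCR":
        return ss2000
    # 2016-2017, SEER-only registries: "best available" of the two variables.
    # ASSUMPTION: prefer the derived value when it is known, else fall back.
    return derived_ss2000 if derived_ss2000 not in UNKNOWN_VALUES else ss2000
```

Note that the 2005 to 2015 branch is shared because the text gives the same rule for both registry types in those years.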

Reporting Delay [9]

Each year, NPCR and SEER registries submit all eligible years of data to CDC and NCI, respectively. As a result, cases submitted in previous years may be deleted, and newly identified cases diagnosed in previous years may be added. This late addition of cases is called reporting delay. Because the most recent diagnosis years are the least complete, reporting delay can make incidence trends appear to decrease. For example, reporting of melanoma cases diagnosed in outpatient facilities may be delayed, so the trend in incident melanoma cases might superficially appear to drop in the most recent year.
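One simple way to look for reporting delay, assuming an analyst has case counts by diagnosis year from two successive data releases, is to compare the counts for the diagnosis years the releases share. The function name and input shape below are hypothetical, and this is only a descriptive check, not the statistical delay-adjustment model described by Clegg et al. [9].

```python
from typing import Dict


def pct_change_by_year(earlier: Dict[int, int], later: Dict[int, int]) -> Dict[int, float]:
    """Percent change in case counts per diagnosis year between two submissions.

    Positive values mean cases were added (net of deletions) after the earlier
    submission; large values for the latest diagnosis years suggest reporting delay.
    """
    common_years = sorted(set(earlier) & set(later))
    return {
        year: 100.0 * (later[year] - earlier[year]) / earlier[year]
        for year in common_years
        if earlier[year] > 0
    }
```

With hypothetical counts, `pct_change_by_year({2015: 1000, 2016: 900}, {2015: 1005, 2016: 960, 2017: 910})` returns 0.5 for 2015 and about 6.7 for 2016: the most recent year in the earlier release grew the most once later reports arrived.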

References

[1] Young JL Jr, Roffers SD, Ries LAG, Fritz AG, Hurlbut AA (eds). SEER Summary Staging Manual – 2000: Codes and Coding Instructions. National Cancer Institute, NIH Pub. No. 01-4969, Bethesda, MD, 2001.

[2] Federal Committee on Statistical Methodology. Report on Statistical Disclosure Limitations Methodology (Statistical Working Paper 22). Washington, DC: Office of Management and Budget; 2005.

[3] Doyle P, Lane JI, Theeuwes JM, Zayatz LM. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Amsterdam: Elsevier Science; 2001.

[4] Fritz A, Percy C, Jack A, Shanmugaratnam K, Sobin L, Parkin D, et al., editors. International Classification of Diseases for Oncology, Third Edition. Geneva: World Health Organization; 2000.

[5] International Classification of Diseases for Oncology, Third Edition, First Revision. Geneva: World Health Organization; 2013.

[6] Ruhl J, Adamo M, Dickie L. Hematopoietic and Lymphoid Neoplasm Coding Manual. Bethesda, MD: National Cancer Institute; January 2015.

[7] Surveillance, Epidemiology, and End Results Program. 2007 Multiple Primary and Histology Coding Rules. Bethesda, MD: US Department of Health and Human Services, National Cancer Institute; Revised August 24, 2012; Accessed January 25, 2017.

[8] Surveillance, Epidemiology, and End Results Program. Hematopoietic and Lymphoid Neoplasm Database. Bethesda, MD: US Department of Health and Human Services, National Cancer Institute; 2016.

[9] Clegg LX, Feuer EJ, Midthune DN, Fay MP, Hankey BF. Impact of reporting delay and reporting error on cancer incidence rates and trends. Journal of the National Cancer Institute. 2002;94(20):1537–1545.