Definition and Calculation of Cancer Prevalence
Prevalence is the number of people with a specific disease or condition in a given population at a specific time. This measure includes both newly diagnosed and pre-existing cases of the disease. It differs from incidence, which counts only newly diagnosed cases in a given population during a specified period.
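Here is a minimal sketch of the distinction. It assumes a hypothetical `Case` record that holds only a diagnosis date and an optional death date, and it treats a person as having cancer from diagnosis until death. Prevalence on a given day counts everyone diagnosed on or before that day who is still alive. Incidence counts only the diagnoses made during the period.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Case:
    """Hypothetical minimal record: one person with one cancer diagnosis."""
    diagnosis_date: date
    death_date: Optional[date] = None  # None: not known to have died


def prevalence_on(cases: List[Case], day: date) -> int:
    """New and pre-existing cases: diagnosed on or before `day` and alive on `day`."""
    return sum(
        1
        for c in cases
        if c.diagnosis_date <= day and (c.death_date is None or c.death_date > day)
    )


def incidence_between(cases: List[Case], start: date, end: date) -> int:
    """Only new cases: diagnosed from `start` through `end`, inclusive."""
    return sum(1 for c in cases if start <= c.diagnosis_date <= end)


cases = [
    Case(date(2010, 3, 1)),                     # diagnosed years earlier, still alive
    Case(date(2017, 6, 15)),                    # newly diagnosed in 2017, still alive
    Case(date(2016, 2, 1), date(2017, 9, 30)),  # died before January 1, 2018
]
print(prevalence_on(cases, date(2018, 1, 1)))                          # 2
print(incidence_between(cases, date(2017, 1, 1), date(2017, 12, 31)))  # 1
```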
There are different types of prevalence. For example—
- Annual prevalence is the number of people with the disease at any time during a year.
- Period prevalence is the number of people with the disease at any time during a specified number of years, such as the last 10 years.
- Limited-duration prevalence is the number of people alive on a certain day who were diagnosed with the disease during a specified number of years (such as the last 5 or 17 years).
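The three definitions above can be sketched in Python under simplifying assumptions. The `Case` record is hypothetical, and a person is assumed to have the disease from diagnosis until death. The limited-duration function further assumes that the prevalence date is January 1 of `prevalence_year` and that "the last N years" means the N full calendar years before it. Under that reading, a 17-year window ending on January 1, 2018, covers diagnosis years 2001 through 2017.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Case:
    """Hypothetical record; the person is assumed to have the disease from diagnosis until death."""
    diagnosis_date: date
    death_date: Optional[date] = None  # None: not known to have died


def had_disease_during(c: Case, start: date, end: date) -> bool:
    """True if the person had the disease on at least one day from `start` through `end`."""
    diagnosed_by_end = c.diagnosis_date <= end
    alive_at_start = c.death_date is None or c.death_date >= start
    return diagnosed_by_end and alive_at_start


def period_prevalence(cases: List[Case], start: date, end: date) -> int:
    return sum(had_disease_during(c, start, end) for c in cases)


def annual_prevalence(cases: List[Case], year: int) -> int:
    # Annual prevalence is period prevalence over a single calendar year.
    return period_prevalence(cases, date(year, 1, 1), date(year, 12, 31))


def limited_duration_prevalence(cases: List[Case], prevalence_year: int, years: int) -> int:
    """People alive on January 1 of `prevalence_year` who were diagnosed in the
    `years` full calendar years before it (for 2018 and 17 years: 2001-2017)."""
    day = date(prevalence_year, 1, 1)
    first_year = prevalence_year - years
    return sum(
        1
        for c in cases
        if first_year <= c.diagnosis_date.year < prevalence_year
        and (c.death_date is None or c.death_date > day)
    )


cases = [
    Case(date(2004, 5, 1)),                    # alive, diagnosed 14 years before 2018
    Case(date(2015, 8, 20)),                   # alive, diagnosed 3 years before 2018
    Case(date(2012, 3, 3), date(2016, 4, 4)),  # died in 2016
]
print(annual_prevalence(cases, 2016))                                  # 3
print(annual_prevalence(cases, 2017))                                  # 2
print(period_prevalence(cases, date(2008, 1, 1), date(2017, 12, 31)))  # 3
print(limited_duration_prevalence(cases, 2018, 5))                     # 1
print(limited_duration_prevalence(cases, 2018, 17))                    # 2
```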
Cancer incidence data submitted to CDC’s National Program of Cancer Registries (NPCR) in the 2020 data submission period were used to create a SEER*Stat data set for this analysis.[2] The data set included data from 42 NPCR central cancer registries that met two conditions for every year from 2001 through 2017. Each registry met the United States Cancer Statistics (USCS) publication criteria, and each conducted linkage with the National Death Index and/or active patient follow-up. These registries are Alabama, Alaska, Arizona, Arkansas, California, Colorado, Delaware, Florida, Georgia, Idaho, Illinois, Kansas, Kentucky, Louisiana, Maine, Maryland, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Washington, West Virginia, Wisconsin, and Wyoming. Together they cover 86% of the U.S. population.
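The registry-level selection can be sketched as a filter that requires both conditions in every diagnosis year from 2001 through 2017. The `Registry` class, its per-year flag dictionaries, and the registry names are hypothetical. This sketch also reduces each criterion to one yes/no flag per year, which is a simplification of how the criteria are actually assessed.

```python
from dataclasses import dataclass, field
from typing import Dict

STUDY_YEARS = range(2001, 2018)  # diagnosis years 2001 through 2017


@dataclass
class Registry:
    """Hypothetical per-year flags for one central cancer registry."""
    name: str
    meets_uscs_criteria: Dict[int, bool] = field(default_factory=dict)
    # True if NDI linkage and/or active patient follow-up was done for that year
    has_vital_status_follow_up: Dict[int, bool] = field(default_factory=dict)


def include_registry(r: Registry) -> bool:
    """Keep a registry only if both conditions hold for every study year."""
    return all(
        r.meets_uscs_criteria.get(y, False) and r.has_vital_status_follow_up.get(y, False)
        for y in STUDY_YEARS
    )


every_year = {y: True for y in STUDY_YEARS}
missing_2005 = {**every_year, 2005: False}
registries = [
    Registry("Registry A", every_year, every_year),
    Registry("Registry B", missing_2005, every_year),  # fails USCS criteria in one year
    Registry("Registry C", every_year, {}),            # no follow-up information
]
print([r.name for r in registries if include_registry(r)])  # ['Registry A']
```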
Cases from these registries were included in the analysis if—
- The case was an invasive cancer diagnosed from 2001 through 2017.
- The patient's age was known and was 0 through 99 years.
- The patient's sex was known.
- The case was not identified solely on the basis of a death certificate or autopsy.
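Here is a minimal sketch of the case-level inclusion criteria above, written as a single predicate. The `TumorRecord` fields are hypothetical stand-ins. Real registry files use their own coded variables for behavior (invasive or in situ), age, sex, and reporting source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TumorRecord:
    """Hypothetical field names; real NPCR/SEER*Stat variables are coded differently."""
    diagnosis_year: int
    invasive: bool
    age: Optional[int]  # None when unknown
    sex: Optional[str]  # None when unknown
    death_certificate_or_autopsy_only: bool


def is_eligible(r: TumorRecord) -> bool:
    """Apply the four inclusion criteria: invasive 2001-2017, known age 0-99, known sex, not DCO/autopsy only."""
    return (
        r.invasive
        and 2001 <= r.diagnosis_year <= 2017
        and r.age is not None
        and 0 <= r.age <= 99
        and r.sex is not None
        and not r.death_certificate_or_autopsy_only
    )


records = [
    TumorRecord(2010, True, 64, "F", False),    # meets every criterion
    TumorRecord(2000, True, 50, "M", False),    # diagnosed before 2001
    TumorRecord(2015, False, 40, "F", False),   # not invasive
    TumorRecord(2012, True, None, "M", False),  # age unknown
    TumorRecord(2014, True, 70, None, False),   # sex unknown
    TumorRecord(2013, True, 80, "M", True),     # death certificate or autopsy only
]
print(sum(is_eligible(r) for r in records))  # 1
```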
Because NPCR data are available starting with diagnosis year 2001, 17-year limited-duration prevalence estimates (diagnoses from 2001 through 2017) are included in addition to 5-year estimates.
Calculation of Limited-Duration Prevalence
Limited-duration prevalence is the number of people alive on a certain day who were diagnosed with the disease during a specified number of years (such as the last 5 or 17 years).
In this report, limited-duration prevalence was calculated using SEER*Stat software. It estimates the number of people diagnosed with cancer in the last 5 or 17 years who were still alive on January 1, 2018.[1,2] The follow-up start date (month, day, and year) was set to the date of diagnosis. The last follow-up date (month, day, and year) was set to one of two dates. For a case that was actively followed, it was the date of last contact. For a case that was matched to the state death files or to the National Death Index, it was the date of death. Cases that did not link to the state death files or to the National Death Index were presumed to be alive on the prevalence date.
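The follow-up rules above can be sketched as a simple head count. Follow-up starts at diagnosis and ends at a matched death date or at the last contact date. A case with no matched death is treated as alive on January 1, 2018. This is an illustration under assumptions, not SEER*Stat's own computation, which can also adjust for cases lost to follow-up. The `FollowUp` class is hypothetical, and so is the choice to let a death date take precedence over a last-contact date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

PREVALENCE_DATE = date(2018, 1, 1)


@dataclass
class FollowUp:
    """Hypothetical per-case follow-up fields."""
    diagnosis_date: date                      # start of follow-up
    death_date: Optional[date] = None         # set when matched to state death files or the NDI
    last_contact_date: Optional[date] = None  # set when the case was actively followed


def last_follow_up(f: FollowUp) -> Optional[date]:
    # Assumption: a matched death date takes precedence over the last contact date.
    return f.death_date if f.death_date is not None else f.last_contact_date


def alive_on(f: FollowUp, day: date) -> bool:
    # Cases with no matched death are presumed alive on the prevalence date.
    return f.death_date is None or f.death_date > day


def limited_duration_count(cases: List[FollowUp], years: int,
                           day: date = PREVALENCE_DATE) -> int:
    """People diagnosed in the `years` calendar years before `day` and alive on `day`."""
    first_year = day.year - years
    return sum(
        1
        for f in cases
        if first_year <= f.diagnosis_date.year < day.year and alive_on(f, day)
    )


cases = [
    FollowUp(date(2014, 3, 10), last_contact_date=date(2017, 11, 2)),  # actively followed
    FollowUp(date(2003, 7, 1)),                                         # no death match
    FollowUp(date(2015, 1, 20), death_date=date(2017, 5, 5)),           # matched death
]
print(limited_duration_count(cases, 5))   # 1
print(limited_duration_count(cases, 17))  # 2
```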
For patients diagnosed with multiple tumors, prevalence calculations include only the first tumor of each cancer type in the previous x years (where x = 5 or 17 in this report). For example, assume a woman was diagnosed first with thyroid cancer 9 years ago and then with breast cancer 3 years ago. The thyroid cancer would contribute to the 17-year limited-duration prevalence estimates for all cancer sites and for thyroid cancer. The breast cancer would contribute to the 5-year limited-duration prevalence estimate for all cancer sites and to both the 5-year and 17-year estimates for breast cancer. It would not contribute to the 17-year estimate for all cancer sites, because it was not her first tumor in the previous 17 years; she is already counted in that estimate through her thyroid cancer.
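The first-tumor rule can be sketched as a function that returns the single tumor, if any, through which a person is counted for a given site and window. Treating "all cancer sites" as "the earliest tumor of any type in the window" is an interpretation of the rule. The `Tumor` class and the site labels are hypothetical. Consider the woman from the example, with thyroid cancer in 2009, breast cancer in 2015, and a prevalence year of 2018. Applied to her case, the rule counts her once toward the 17-year all-sites estimate through the thyroid cancer and once toward the 5-year all-sites estimate through the breast cancer. As her first breast tumor, the breast cancer also counts toward both breast cancer estimates.

```python
from dataclasses import dataclass
from typing import List, Optional

ALL_SITES = "all"


@dataclass
class Tumor:
    site: str            # hypothetical site labels such as "breast"
    diagnosis_year: int


def contributing_tumor(tumors: List[Tumor], site: str, prevalence_year: int,
                       years: int) -> Optional[Tumor]:
    """Return the one tumor through which a person is counted for `site`, or None.

    Only tumors diagnosed in the `years` calendar years before `prevalence_year`
    are considered, and only the earliest of them counts. For ALL_SITES the
    earliest tumor of any type counts, so a person is counted at most once.
    """
    first_year = prevalence_year - years
    candidates = [
        t for t in tumors
        if first_year <= t.diagnosis_year < prevalence_year
        and (site == ALL_SITES or t.site == site)
    ]
    return min(candidates, key=lambda t: t.diagnosis_year, default=None)


# The woman from the example: thyroid cancer 9 years and breast cancer 3 years before 2018.
history = [Tumor("thyroid", 2009), Tumor("breast", 2015)]
for years in (5, 17):
    for site in (ALL_SITES, "thyroid", "breast"):
        t = contributing_tumor(history, site, 2018, years)
        print(years, site, t.site if t else None)
```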
NPCR prevalence proportions were calculated for each combination of age, sex, and race group. For this section of the report, race was categorized as White, Black, and all other races. The all other races group includes Indian Health Service-linked American Indian and Alaska Native cases and Asian/Pacific Islander cases. Cases with unknown race were included in the White group.

U.S. cancer prevalence counts on January 1, 2018, were then estimated as follows. Each age-, sex-, and race-specific NPCR prevalence proportion was multiplied by the corresponding U.S. population estimate, which was taken as the average of the U.S. Census Bureau's 2017 and 2018 population estimates.[3] The counts for the three race groups were summed to estimate U.S. cancer prevalence for all races combined. Prevalence estimates are not presented for the Hispanic population because of concerns about the completeness and quality of vital status information for Hispanic cases in the cancer registry database.
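Here is a minimal sketch of the race grouping and the projection to U.S. counts. It assumes that the NPCR prevalence proportion for a stratum is the prevalent count divided by the registry population in that stratum; the text does not state the formula. The stratum labels and race codes are hypothetical, and all numbers in the example are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

Stratum = Tuple[str, str, str]  # (age group, sex, race group); hypothetical labels


def race_group(race: str) -> str:
    """Collapse detailed race codes into the report's three groups (hypothetical codes)."""
    if race in ("White", "Unknown"):  # unknown race is combined with White
        return "White"
    if race == "Black":
        return "Black"
    return "All other races"  # e.g. IHS-linked AIAN, Asian/Pacific Islander


def project_us_counts(
    npcr_counts: Dict[Stratum, float],
    npcr_population: Dict[Stratum, float],
    us_pop_2017: Dict[Stratum, float],
    us_pop_2018: Dict[Stratum, float],
) -> Tuple[Dict[str, float], float]:
    """Scale stratum-specific NPCR prevalence proportions to the U.S. population,
    then sum by race group and overall."""
    by_race: Dict[str, float] = defaultdict(float)
    for stratum, count in npcr_counts.items():
        us_population = (us_pop_2017[stratum] + us_pop_2018[stratum]) / 2
        # proportion * U.S. population, written as count * U.S. pop / registry pop
        by_race[stratum[2]] += count * us_population / npcr_population[stratum]
    return dict(by_race), sum(by_race.values())


# Made-up numbers for one age/sex combination, for illustration only.
s_w = ("50-54", "F", "White")
s_b = ("50-54", "F", "Black")
s_o = ("50-54", "F", "All other races")
counts = {s_w: 300, s_b: 50, s_o: 10}
npcr_pop = {s_w: 100_000, s_b: 20_000, s_o: 8_000}
us_2017 = {s_w: 120_000, s_b: 24_000, s_o: 9_000}
us_2018 = {s_w: 130_000, s_b: 26_000, s_o: 11_000}
by_race, total = project_us_counts(counts, npcr_pop, us_2017, us_2018)
print(by_race)  # {'White': 375.0, 'Black': 62.5, 'All other races': 12.5}
print(total)    # 450.0
```

Summing the race-specific estimates, rather than projecting a single all-races proportion, keeps differences in the age, sex, and race mix between the registry areas and the nation from biasing the total.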
References
- Surveillance Research Program, National Cancer Institute. SEER*Stat software, version 8.3.9.
- Gail MH, Kessler L, Midthune D, Scoppa S. Two approaches for estimating disease prevalence from population-based registries of incidence and total mortality. Biometrics 1999;55(4):1137–1144.
- National Program of Cancer Registries SEER*Stat Database: NPCR Prevalence Analytic File 2001–2017 (42 NPCR central cancer registries). U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. Released June 2021, based on the 2020 submission.