Definition and Calculation of Cancer Prevalence
Prevalence is the number of people with a specific disease or condition in a given population at a specific time. This measure includes both newly diagnosed and pre-existing cases of the disease. Incidence, by contrast, counts only the cases newly diagnosed in a given population during a specified period.
There are different types of prevalence. For example—
- Annual prevalence is the number of people with the disease at any time during a year.
- Period prevalence is the number of people with the disease at any time during a specified number of years, such as the last 10 years.
- Limited-duration prevalence is the number of people alive on a certain day who were diagnosed with the disease during a specified number of years (such as the last 5 or 19 years).
Cancer incidence data submitted to CDC’s National Program of Cancer Registries (NPCR) in the 2022 data submission period were used to create a data set in SEER*Stat for this analysis [2]. The data set included data from 39 NPCR central cancer registries that met the United States Cancer Statistics (USCS) publication criteria for all years 2001 through 2019 and that conducted linkage with the National Death Index, active patient follow-up, or both for all years 2001 through 2019. These registries are Alabama, Alaska, Arizona, Arkansas, California, Colorado, Delaware, Florida, Georgia, Idaho, Illinois, Kansas, Kentucky, Louisiana, Maine, Maryland, Minnesota, Mississippi, Missouri, Montana, Nebraska, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Utah, Vermont, West Virginia, Wisconsin, and Wyoming. Together, these data cover 83% of the U.S. population.
Cases from these registries were included in the analysis if—
- The case was an invasive cancer diagnosed from 2001 through 2019.
- The age of the case was known and was 0 through 99 years.
- The sex of the case was known.
- The case was not identified solely on the basis of a death certificate or autopsy.
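The four inclusion criteria above can be read as a single filter applied to each case record. The following is a minimal sketch, not the SEER*Stat selection logic itself: the `Case` record, its field names, and the string codes for behavior and reporting source are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    # All field names and codes below are hypothetical, for illustration only.
    behavior: str             # e.g. "invasive" or "in situ"
    diagnosis_year: int
    age: Optional[int]        # None when age is unknown
    sex: Optional[str]        # None when sex is unknown
    reporting_source: str     # e.g. "hospital", "death certificate only"


# Reporting sources that exclude a case from the analysis.
EXCLUDED_SOURCES = {"death certificate only", "autopsy only"}


def is_eligible(case: Case) -> bool:
    """Return True only if the case meets all four inclusion criteria."""
    return (
        case.behavior == "invasive"
        and 2001 <= case.diagnosis_year <= 2019
        and case.age is not None
        and 0 <= case.age <= 99
        and case.sex is not None
        and case.reporting_source not in EXCLUDED_SOURCES
    )
```

A case that fails any one criterion is dropped; the order of the checks does not affect the result.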
Because NPCR data are available from 2001 onward, the 19 diagnosis years from 2001 through 2019 support 19-year limited-duration prevalence estimates in addition to 5-year estimates.
Calculation of Limited-Duration Prevalence
Limited-duration prevalence is the number of people alive on a certain day who were diagnosed with the disease during a specified number of years (such as the last 5 or 19 years).
In this report, limited-duration prevalence was calculated using SEER*Stat software. The calculation estimates, among the people diagnosed with cancer in the last 5 or 19 years, the proportion who were still alive on January 1, 2020 [1, 2]. The start of follow-up (month, day, and year) was set to the date of diagnosis. The date of last follow-up (month, day, and year) was set to the date of last contact if the case was actively followed, or to the date of death if the case was matched to the state death files or to the National Death Index. Cases that did not link to the state death files or to the National Death Index were presumed to be alive on the prevalence date.
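The core test described above (diagnosed within the window and still alive on the prevalence date) can be sketched as follows. This is a simplified illustration, not the SEER*Stat procedure: it omits any adjustment for actively followed cases whose last contact falls before the prevalence date, and the `FollowUp` record, the exact window boundaries, and the treatment of a death on the prevalence date itself are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

PREVALENCE_DATE = date(2020, 1, 1)


@dataclass
class FollowUp:
    # Hypothetical record layout, for illustration only.
    diagnosis_date: date
    death_date: Optional[date]  # None when no death record was linked


def is_prevalent(rec: FollowUp, duration_years: int,
                 prevalence_date: date = PREVALENCE_DATE) -> bool:
    """True if diagnosed within the window and alive on the prevalence date."""
    # Assumed window: [prevalence date minus N years, prevalence date).
    window_start = prevalence_date.replace(
        year=prevalence_date.year - duration_years)
    in_window = window_start <= rec.diagnosis_date < prevalence_date
    # A case with no linked death record is presumed alive.
    # Assumption: a death on the prevalence date still counts as alive that day.
    alive = rec.death_date is None or rec.death_date >= prevalence_date
    return in_window and alive


def count_prevalent(records, duration_years: int) -> int:
    """Count the records that contribute to N-year limited-duration prevalence."""
    return sum(is_prevalent(r, duration_years) for r in records)
```

With a 19-year window and a prevalence date of January 1, 2020, the assumed window opens on January 1, 2001, which matches the first diagnosis year in the data set.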
For patients diagnosed with multiple tumors, prevalence calculations include the first tumor of each cancer type in the previous x years (where x = 5 or 19 in this report). For example, assume a woman was diagnosed first with thyroid cancer 9 years ago and then with breast cancer 3 years ago. The thyroid cancer would contribute to the 19-year limited-duration prevalence estimates for all cancer sites and for thyroid cancer. The breast cancer would contribute to the 5-year estimate for all cancer sites and to both the 5-year and 19-year estimates for breast cancer. It would not contribute to the 19-year estimate for all cancer sites, because it was not her first tumor in the previous 19 years; the woman is already counted in that estimate through her thyroid cancer.
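The first-tumor rule in the example above can be traced with a short sketch. It works at year granularity and ignores vital status, both simplifications. The tuple layout `(patient_id, site, diagnosis_year)` and the function name are hypothetical, and the diagnosis years 2011 and 2017 stand in for "9 years ago" and "3 years ago" relative to a 2020 prevalence date.

```python
def first_tumors_in_window(tumors, duration_years, prevalence_year=2020):
    """Pick the tumors that contribute to N-year limited-duration prevalence.

    tumors: iterable of (patient_id, site, diagnosis_year) tuples.
    Returns two dicts:
      all_sites: patient_id -> (site, year) of the first tumor of any type
      by_site:   (patient_id, site) -> year of the first tumor of that type
    """
    window_start = prevalence_year - duration_years
    # Keep tumors diagnosed inside the window, earliest first.
    eligible = sorted(
        (t for t in tumors if window_start <= t[2] < prevalence_year),
        key=lambda t: t[2],
    )
    all_sites = {}
    by_site = {}
    for patient_id, site, year in eligible:
        # setdefault keeps the earliest entry and ignores later tumors.
        all_sites.setdefault(patient_id, (site, year))
        by_site.setdefault((patient_id, site), year)
    return all_sites, by_site


# The woman from the example: thyroid cancer first, then breast cancer.
tumors = [("A", "thyroid", 2011), ("A", "breast", 2017)]
all_5, site_5 = first_tumors_in_window(tumors, 5)
all_19, site_19 = first_tumors_in_window(tumors, 19)
```

In the 5-year window only the breast cancer is eligible, so it represents the woman in both the all-sites and breast estimates. In the 19-year window the thyroid cancer represents her in the all-sites estimate, while each tumor still counts toward its own site-specific estimate.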
NPCR prevalence proportions were calculated for each combination of age, sex, and race and ethnicity group. For this section of the report, race and ethnicity were categorized as non-Hispanic White, non-Hispanic Black, non-Hispanic Indian Health Service-linked American Indian and Alaska Native, non-Hispanic Asian and Pacific Islander, and Hispanic. Cases with unknown race were combined with White race. Cancer prevalence counts at January 1, 2020, for the U.S. population were then estimated by multiplying each age-, sex-, and race and ethnicity-specific NPCR prevalence proportion by the corresponding U.S. population estimate, taken as the average of the 2019 and 2020 population estimates from the U.S. Census Bureau [3]. The counts by race and ethnicity were summed to estimate the U.S. cancer prevalence counts for all races combined. Cancer prevalence counts and percentages for each of the 39 states, by sex and by race and ethnicity, were estimated directly in SEER*Stat.
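The projection step described above is a stratum-by-stratum multiplication followed by a sum. A minimal sketch is shown below. The function name, the stratum key layout, and every number in the example are made up for illustration and are not taken from NPCR or Census Bureau data.

```python
from typing import Dict, Tuple

# A stratum is (age group, sex, race and ethnicity group); layout is hypothetical.
Stratum = Tuple[str, str, str]


def project_us_counts(
    proportions: Dict[Stratum, float],
    pop_2019: Dict[Stratum, float],
    pop_2020: Dict[Stratum, float],
) -> Dict[Stratum, float]:
    """Multiply each stratum's prevalence proportion by its average population."""
    counts = {}
    for stratum, proportion in proportions.items():
        # Average of the 2019 and 2020 estimates approximates January 1, 2020.
        average_population = (pop_2019[stratum] + pop_2020[stratum]) / 2
        counts[stratum] = proportion * average_population
    return counts


# Illustrative, made-up inputs for two strata.
hispanic = ("60-64", "Female", "Hispanic")
nh_white = ("60-64", "Female", "Non-Hispanic White")
proportions = {hispanic: 0.010, nh_white: 0.020}
pop_2019 = {hispanic: 500_000, nh_white: 1_000_000}
pop_2020 = {hispanic: 520_000, nh_white: 1_100_000}

us_counts = project_us_counts(proportions, pop_2019, pop_2020)
# Summing over race and ethnicity groups gives the all-races estimate.
all_races_total = sum(us_counts.values())
```

Keeping the strata separate until the final sum means that differences in age, sex, and race and ethnicity structure between the covered registries and the whole U.S. population are reflected in the projected counts.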
Prevalence percentage is the percentage of the population alive with cancer. The U.S. prevalence percentage estimates are based on the states included in the analysis.
References
1. Surveillance Research Program, National Cancer Institute. SEER*Stat software, version 8.4.
2. Gail MH, Kessler L, Midthune D, Scoppa S. Two approaches for estimating disease prevalence from population-based registries of incidence and total mortality. Biometrics 1999;55(4):1137–1144.
3. National Program of Cancer Registries SEER*Stat Database: NPCR Prevalence Analytic file 2001–2019 (39 NPCR central cancer registries). United States Department of Health and Human Services, Centers for Disease Control and Prevention. Released June 2023, based on the 2022 submission.