Definition and Calculation of Cancer Prevalence
Prevalence is the number of people with a specific disease or condition in a given population at a specific time. This measure includes both newly diagnosed and pre-existing cases of the disease. It differs from incidence, which counts only the newly diagnosed cases in a given population during a specific time period.
There are different types of prevalence. For example—
- Annual prevalence is the number of people with the disease at any time during a year.
- Period prevalence is the number of people with the disease at any time during a specified number of years, such as the last 10 years.
- Limited-duration prevalence is the number of people alive on a certain day who were diagnosed with the disease during a specified number of years (such as the last 5 or 15 years).
Cancer incidence data submitted to the National Program of Cancer Registries (NPCR) in the 2019 data submission period were used to create a data set in SEER*Stat for this analysis [2]. The data set included data from 45 NPCR central cancer registries that met the United States Cancer Statistics (USCS) publication criteria for all years 2001 through 2016 and that conducted linkage with the National Death Index, active patient follow-up, or both for all years 2001 through 2016. These registries are Alabama, Alaska, Arizona, Arkansas, California, Colorado, Delaware, District of Columbia, Florida, Georgia, Idaho, Illinois, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming. These data cover 94% of the U.S. population.
Cases from these registries were included in the analysis if—
- The case was an invasive cancer diagnosed from 2001 through 2016. Cases diagnosed in 2017 do not have adequate follow-up time to be included in the analysis.
- The age of the case was known and was 0 through 99 years.
- The sex of the case was known.
- The case was not identified solely on the basis of a death certificate or autopsy.
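The inclusion criteria above can be read as a simple record filter. The following is a minimal sketch only: the `Case` record layout, its field names, and the category strings are hypothetical and are not taken from the NPCR data dictionary or from SEER*Stat.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """Hypothetical case record; field names are illustrative assumptions."""
    diagnosis_year: int
    behavior: str             # e.g. "invasive" or "in situ"
    age: Optional[int]        # None when age is unknown
    sex: Optional[str]        # None when sex is unknown
    reporting_source: str     # e.g. "hospital", "death certificate only"


# Reporting sources that exclude a case from the analysis (assumed labels).
EXCLUDED_SOURCES = {"death certificate only", "autopsy only"}


def is_eligible(case: Case) -> bool:
    """Apply the four inclusion criteria listed in the text."""
    return (
        case.behavior == "invasive"
        and 2001 <= case.diagnosis_year <= 2016
        and case.age is not None
        and 0 <= case.age <= 99
        and case.sex is not None
        and case.reporting_source not in EXCLUDED_SOURCES
    )
```

A case diagnosed in 2017, or one with unknown age or sex, would be dropped by this filter before any prevalence calculation.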
Because NPCR data are available from 2001, 16-year limited-duration prevalence estimates are included in addition to 5-year estimates.
Calculation of Limited-Duration Prevalence
Limited-duration prevalence is the number of people alive on a certain day who were diagnosed with the disease during a specified number of years (such as the last 5 or 16 years).
In this report, limited-duration prevalence was calculated using SEER*Stat software, which estimates, among the people diagnosed with cancer in the last 5 or 16 years, the proportion who were still alive on January 1, 2017 [1,2]. The start of follow-up (month, day, and year) was set to the date of diagnosis. The date of last follow-up (month, day, and year) was set to the date of last contact if the case was actively followed, or to the date of death if the case was matched to the state death files or to the National Death Index. Cases that did not link to the state death files or to the National Death Index were presumed to be alive on the prevalence date.
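The counting logic described above can be sketched for a single case as follows. This is a simplified illustration, not the SEER*Stat implementation: the function names are hypothetical, the window is assumed to start on January 1 of the year x years before the prevalence date, and a death recorded on the prevalence date itself is treated as alive on that date.

```python
from datetime import date
from typing import Optional

PREVALENCE_DATE = date(2017, 1, 1)


def alive_on_prevalence_date(death_date: Optional[date]) -> bool:
    """A case with no linked death record is presumed alive."""
    # Assumption: a death on the prevalence date still counts as alive that day.
    return death_date is None or death_date >= PREVALENCE_DATE


def counts_toward(diagnosis_date: date,
                  death_date: Optional[date],
                  years: int) -> bool:
    """True if the case is prevalent under an x-year limited-duration window."""
    # Assumed window: [January 1 of (2017 - years), prevalence date).
    window_start = date(PREVALENCE_DATE.year - years, 1, 1)
    diagnosed_in_window = window_start <= diagnosis_date < PREVALENCE_DATE
    return diagnosed_in_window and alive_on_prevalence_date(death_date)
```

Under these assumptions, a case diagnosed in March 2010 with no linked death counts toward the 16-year estimate but not the 5-year estimate.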
For patients diagnosed with multiple tumors, prevalence calculations include the first tumor of each cancer type in the previous x years (where x = 5 or 16 in this report). For example, assume a woman was diagnosed first with thyroid cancer 9 years ago and then with breast cancer 3 years ago. The thyroid cancer would contribute to the 16-year limited-duration prevalence estimates for all cancer sites and for thyroid cancer. The breast cancer would contribute to the 5-year limited-duration prevalence estimates for all cancer sites and for breast cancer, and to the 16-year estimate for breast cancer, but not to the 16-year limited-duration prevalence estimate for all cancer sites, because it was not her first tumor in the previous 16 years; she is already counted in that estimate for thyroid cancer.
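One way to read the rule "first tumor of each cancer type in the previous x years" is sketched below for a single patient who is alive on the prevalence date. The function name, the tuple layout, the "all sites" label, and the example diagnosis dates are assumptions made for illustration; the window is assumed to start on January 1 of the year x years before January 1, 2017.

```python
from datetime import date
from typing import Dict, List, Tuple

PREVALENCE_DATE = date(2017, 1, 1)


def contributing_tumors(tumors: List[Tuple[str, date]],
                        years: int) -> Dict[str, date]:
    """Map each category to the diagnosis date of the tumor that counts.

    Categories are each cancer site plus "all sites". Within the window,
    only the earliest tumor in each category contributes.
    """
    window_start = date(PREVALENCE_DATE.year - years, 1, 1)
    in_window = sorted(
        (dx, site) for site, dx in tumors
        if window_start <= dx < PREVALENCE_DATE
    )
    first: Dict[str, date] = {}
    for dx, site in in_window:
        first.setdefault(site, dx)          # first tumor of this site
        first.setdefault("all sites", dx)   # first tumor of any site
    return first


# Hypothetical dates matching the example: thyroid about 9 years before the
# prevalence date, breast about 3 years before it.
woman = [("thyroid", date(2008, 1, 15)), ("breast", date(2014, 1, 15))]

for x in (5, 16):
    print(x, contributing_tumors(woman, x))
```

Each person appears at most once per category, so summing over people gives a count of prevalent people rather than of tumors.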
NPCR prevalence proportions were calculated for each combination of age, sex, and race group. For this report, race was categorized as white, black, and other races. The other races group contains Indian Health Service-linked American Indian and Alaska Native cases and Asian/Pacific Islander cases. Cases with unknown race were combined with white race. Then, cancer prevalence counts as of January 1, 2017, for the U.S. population were estimated by multiplying the age-, sex-, and race-specific NPCR prevalence proportions by the corresponding U.S. population estimates, which were based on the average of the 2016 and 2017 population estimates from the U.S. Census Bureau [3]. U.S. cancer prevalence counts for all races combined were estimated by summing the counts for whites/unknown, blacks, and other races.
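The projection step can be sketched as follows. The function names, the stratum key layout, and every number in the toy example are made up for illustration; none of them are NPCR or Census Bureau figures.

```python
from typing import Dict, Tuple

# A stratum is one (age group, sex, race group) combination.
Stratum = Tuple[str, str, str]


def project_us_counts(proportions: Dict[Stratum, float],
                      pop_2016: Dict[Stratum, float],
                      pop_2017: Dict[Stratum, float]) -> Dict[Stratum, float]:
    """Multiply each stratum's prevalence proportion by the U.S. population,
    taken as the average of the 2016 and 2017 estimates."""
    counts = {}
    for stratum, proportion in proportions.items():
        avg_population = (pop_2016[stratum] + pop_2017[stratum]) / 2
        counts[stratum] = proportion * avg_population
    return counts


def total_all_races(counts: Dict[Stratum, float]) -> float:
    """Sum the stratum-level counts to get the all-races estimate."""
    return sum(counts.values())


# Toy inputs with invented values, for illustration only.
toy_proportions = {
    ("60-64", "female", "white/unknown"): 0.01,
    ("60-64", "female", "black"): 0.02,
    ("60-64", "female", "other"): 0.005,
}
toy_pop_2016 = {s: 1000.0 for s in toy_proportions}
toy_pop_2017 = {s: 1200.0 for s in toy_proportions}

toy_counts = project_us_counts(toy_proportions, toy_pop_2016, toy_pop_2017)
print(total_all_races(toy_counts))
```

Averaging the two annual population estimates approximates the population on January 1, 2017, assuming those estimates refer to mid-year dates on either side of it.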
References
1. Surveillance Research Program, National Cancer Institute. SEER*Stat software version 8.3.6.
2. Gail MH, Kessler L, Midthune D, Scoppa S. Two approaches for estimating disease prevalence from population-based registries of incidence and total mortality. Biometrics 1999;55(4):1137-1144.
3. National Program of Cancer Registries SEER*Stat Database: NPCR Prevalence Analytic file 2001–2016 (45 NPCR central cancer registries). United States Department of Health and Human Services, Centers for Disease Control and Prevention. Released June 2020, based on the November 2019 submission.