Centers for Disease Control and Prevention

HIV Prevalence Estimates and AIDS Case Projections for the United States: Report Based upon a Workshop -- Appendix A

Appendix A HIV-Prevalence Estimates and AIDS Case Projections: Statistical Methods

Back-calculation and extrapolation, the statistical methods used to estimate HIV prevalence and to predict future AIDS cases, are based on surveillance data on AIDS. Preliminary analyses of the surveillance data must be carried out before using these methods for making estimates and projections. This appendix summarizes these analyses and explains the methods then used to derive estimates of HIV prevalence and AIDS case projections.

Section 1 below describes the methods by which AIDS surveillance data were adjusted for reporting delays, the methods used to account for the effect of the 1987 change in the AIDS surveillance definition, and the methods used to examine time trends in adjusted data on AIDS incidence. Sections 2 and 3 summarize the statistical methods used to estimate cumulative HIV incidence from back-calculation and to predict future AIDS cases, respectively. Section 4 describes the methods by which HIV-prevalence estimates were derived from estimates of cumulative HIV incidence. Section 5 contains estimates of length of period of survival after an AIDS diagnosis. These estimates were used to predict future deaths among persons diagnosed as having AIDS, numbers of persons alive with AIDS, and deaths associated with HIV infection.

1. Analysis of AIDS surveillance data

Adjusting for reporting delays.

AIDS cases reported through September 1989 were adjusted for reporting delays estimated from a maximum likelihood statistical procedure (A1). This procedure is based on the assumption that reporting delays are independent of time.

Nationally the proportion of cases reported less than 3 months after diagnosis changes over time, but the proportions reported with delays of greater than 1 quarter are nearly constant over calendar time. Because of this pattern, only AIDS cases diagnosed through June 1989 were used. Back-calculation analyses were based on quarterly incidence, with delays estimated for each risk group separately. The CDC extrapolation analyses were based on monthly incidence, with delays estimated for all risk groups combined. (Separating the risk groups changed the figures only slightly.) Analyses of AIDS cases in individual cities, in risk-behavior groups, and in demographic groups were based on reported cases adjusted for reporting delays in the corresponding groups.
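The adjustment can be sketched as follows, assuming the delay distribution has already been estimated (the maximum likelihood estimation described in A1 is not reproduced here). Each reported count is divided by the estimated probability that a case diagnosed in that period has been reported by the cutoff. The function name `adjust_for_reporting_delay` and all numbers are hypothetical and are not taken from the report.

```python
def adjust_for_reporting_delay(reported, delay_pmf):
    """Inflate reported counts by the estimated probability of having been reported.

    reported  -- counts by diagnosis period, oldest first; the last entry is the
                 most recent period before the reporting cutoff
    delay_pmf -- delay_pmf[d] is the assumed probability that a case is reported
                 d periods after diagnosis (assumed not to change over time)
    """
    n = len(reported)
    adjusted = []
    for i, count in enumerate(reported):
        periods_elapsed = n - 1 - i  # periods available for reporting so far
        p_reported = sum(delay_pmf[: periods_elapsed + 1])
        adjusted.append(count / p_reported)
    return adjusted


# Hypothetical delay distribution and quarterly counts (illustration only).
delay_pmf = [0.5, 0.25, 0.15, 0.10]
reported = [1000, 900, 750, 500]
adjusted = adjust_for_reporting_delay(reported, delay_pmf)
```

In this toy example the most recent quarter is doubled because only half of its cases are assumed to have been reported so far, while the oldest quarter is left unchanged.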

Accounting for the revision in the AIDS case definition.

Data on AIDS incidence were analyzed both for all cases and separately for cases with diagnoses consistent with the pre-1987 case definition. Consistent cases are those with a diagnosis (definitive or presumptive) of a disease that fit the pre-1987 definition. Cases not consistent with the pre-1987 definition are mainly those diagnosed on the basis of wasting syndrome, HIV-associated encephalopathy, or disseminated tuberculosis.

Examining time trends in AIDS incidence.

To choose the time period to model and the type of model to fit, it is important to identify changes in trends of AIDS incidence, especially in projecting AIDS cases by extrapolation. Because variations in the numbers of cases diagnosed from period to period can obscure changing trends, adjusted data on incidence were plotted (Figures 1-3) with smoothed curves obtained from the lowess procedure (A2). This procedure requires no assumption about the overall trend in the data. A fitted value is computed for each month by weighted-least-squares linear regression by using the adjusted number of cases diagnosed during an interval around the month (here, the 25% of months closest to the chosen month). Adjusted data on incidence, not the smoothed data, were used in back-calculation and extrapolation analyses.
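The local fitting step can be illustrated with a simplified smoother. This is only a sketch: it uses tricube weights and a local weighted linear fit over the nearest 25% of months, but it omits the robustness iterations of the lowess procedure in A2. The function name `local_linear_smooth` and the monthly counts are hypothetical.

```python
def local_linear_smooth(x, y, frac=0.25):
    """Tricube-weighted local linear fit at each x (no robustness iterations)."""
    n = len(x)
    k = min(n, max(3, int(round(frac * n))))  # points in each local window
    fitted = []
    for x0 in x:
        # Indices of the k points closest to x0.
        nearest = sorted(range(n), key=lambda j: abs(x[j] - x0))[:k]
        h = max(abs(x[j] - x0) for j in nearest) or 1.0  # window half-width
        w = [(1.0 - (abs(x[j] - x0) / h) ** 3) ** 3 for j in nearest]
        sw = sum(w)
        mx = sum(wi * x[j] for wi, j in zip(w, nearest)) / sw
        my = sum(wi * y[j] for wi, j in zip(w, nearest)) / sw
        sxx = sum(wi * (x[j] - mx) ** 2 for wi, j in zip(w, nearest))
        sxy = sum(wi * (x[j] - mx) * (y[j] - my) for wi, j in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical adjusted monthly counts with an alternating fluctuation.
months = list(range(24))
cases = [2000 + 50 * m + (30 if m % 2 else -30) for m in months]
smoothed = local_linear_smooth(months, cases)
```

Because each fitted value comes from a local straight line, data that lie exactly on a line are returned unchanged; only the period-to-period fluctuation is damped.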

2. Statistical methods used to estimate cumulative HIV infections from back-calculation

Back-calculation can be used to estimate cumulative HIV incidence (i.e., the total number of past HIV infections needed to account for the observed number of AIDS cases). The resulting estimate of cumulative HIV incidence can then be used to estimate HIV prevalence.
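The relationship that back-calculation inverts can be sketched as a convolution: expected AIDS diagnoses in a period are the sum, over earlier periods, of infections multiplied by the probability of the corresponding incubation time. The sketch below shows only this forward model and a squared-error measure of fit; it does not reproduce any analyst's estimation procedure. All names and values are hypothetical.

```python
def expected_aids_cases(infections, incubation_pmf):
    """Convolve HIV incidence with an incubation distribution.

    infections[s]     -- new HIV infections in period s
    incubation_pmf[d] -- assumed probability that AIDS is diagnosed d periods
                         after infection (may sum to less than 1 over the window)
    """
    n = len(infections)
    cases = [0.0] * n
    for s, infected in enumerate(infections):
        for d, p in enumerate(incubation_pmf):
            if s + d < n:
                cases[s + d] += infected * p
    return cases


def sum_squared_error(observed, infections, incubation_pmf):
    """Discrepancy between observed AIDS incidence and the model's expectation."""
    expected = expected_aids_cases(infections, incubation_pmf)
    return sum((o - e) ** 2 for o, e in zip(observed, expected))


# Hypothetical toy values (illustration only).
infections = [100, 200, 0, 0]
incubation_pmf = [0.0, 0.1, 0.2, 0.3]
cases = expected_aids_cases(infections, incubation_pmf)
```

A back-calculation searches for the infection counts whose expected AIDS incidence best matches the adjusted observed incidence under the assumed incubation distribution.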

Five analysts used back-calculation to estimate cumulative HIV infections (Appendix B, Table B1). All analyses summarized in Table B1, with the exception of Brookmeyer's, used a standard incubation period distribution (median, approximately 10 years) that did not change with time (A3). Brookmeyer, Harris, and Rosenberg used adjusted AIDS incidence through either mid-1987 or mid-1989 and a maximum likelihood procedure to estimate the total number infected and the parameters in a probability distribution, with an assumed form for HIV incidence (A4,A5). Hyman fit adjusted AIDS incidence through mid-1989 extended by his extrapolation projections through 1993. He used numerical quadrature methods to estimate the parameters in the distribution of the dates of HIV infection among persons who are now infected (A6). Each of these analyses was based on CDC estimates of reporting delays for AIDS cases.

Hay fit his estimates of adjusted AIDS incidence among adults and adolescents through 1989 by using a least squares method, subject to the constraint that HIV incidence in each year must be nonnegative (A7,A8). He excluded pediatric cases because the incubation time may be shorter for children (A9). Hay's estimates of adjusted incidence are less than the corresponding CDC estimates. Using the CDC estimates increased the estimates of cumulative HIV infections 14%-25% (A8) and therefore might be expected to increase his AIDS case projections by approximately 20%.

To account for the effect of recent changes in the distribution of incubation periods on back-calculation analyses, Brookmeyer used a method that he developed with Liao (A10). They extended the back-calculation procedure to permit the distribution of incubation periods to change for a specified proportion of those infected, with the change starting at a specified calendar time. They also extended the procedure to include a model for a two-stage incubation period; stage one was HIV seroconversion to CD4+ cell depletion, and stage two was CD4+ cell depletion to diagnosed disease meeting the AIDS case definition. Brookmeyer applied this method to AIDS cases diagnosed through June 1989, assuming a change for the incubation distribution starting in July 1987. On the basis of data from clinical trials, he assumed that therapy reduces the risk of progression within each stage by 65%. During the second stage, in a clinical trial setting, the risk of progression to AIDS among treated patients with CD4+ cell counts of less than 500/mm3 has been estimated to be approximately one-third the risk for untreated patients (A11); this estimate is consistent with data presented at the workshop. Brookmeyer obtained good fits to recent data on AIDS incidence under the assumption that 10% of those in stage one and 50% of those in stage two began receiving treatment in July 1987.

3. Statistical methods used to make AIDS case projections

AIDS case projections were derived both from extrapolation from recent data on AIDS incidence and from back-calculation. Projections based on consistent cases (see Section 1) were increased by 18% because analyses of surveillance data show that approximately 15% of cases diagnosed during the last year are not consistent with the pre-1987 definition (1/0.85 = 1.18). Projections based on reported AIDS cases were also increased by 18%, corresponding to CDC's estimate that nationally approximately 85% of all diagnosed cases are ultimately reported to the surveillance system.
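The 18% inflation is simple arithmetic; a minimal sketch, in which the function name and the projected count are hypothetical:

```python
def inflation_factor(fraction_captured):
    """Multiplier that scales a partial count up to the full count."""
    return 1.0 / fraction_captured


factor = inflation_factor(0.85)           # about 1.18, as stated in the text
projected_consistent = 50000              # hypothetical projection
projected_all = projected_consistent * factor
```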

Projections derived from extrapolation.

Two extrapolation models were fit to adjusted AIDS incidence. Hyman fit a quadratic spline function of time, with a knot during 1987. The model was nearly linear after 1987. T. Green (CDC) fit a function of time to adjusted monthly incidence for July 1987 through June 1989 using weighted least squares (A1). The weights incorporated uncertainties in the estimates of reporting delays; the variability in the number of cases reported was assumed to be proportional to this number of cases. A linear function of time gave an adequate fit to the data, as suggested by the smoothed trend in Figure 1. Unweighted least squares gave results nearly identical to those from the weighted analyses. The predictions from the consistent case series were nearly the same as the projections from all cases after the former were inflated by 18%.
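A linear extrapolation by weighted least squares can be sketched as below. As a simplifying assumption, the weights are taken as the reciprocal of each count (variance proportional to the count); the additional uncertainty from estimated reporting delays that the actual analysis incorporated is omitted. The function names and data are hypothetical.

```python
def weighted_linear_fit(t, y, w):
    """Return (intercept, slope) minimizing sum of w * (y - a - b*t)**2."""
    sw = sum(w)
    mt = sum(wi * ti for wi, ti in zip(w, t)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    stt = sum(wi * (ti - mt) ** 2 for wi, ti in zip(w, t))
    sty = sum(wi * (ti - mt) * (yi - my) for wi, ti, yi in zip(w, t, y))
    slope = sty / stt
    return my - slope * mt, slope


def extrapolate(intercept, slope, t_future):
    """Predicted incidence at a future time under the fitted line."""
    return intercept + slope * t_future


# Hypothetical adjusted monthly incidence over 24 months (illustration only).
t = list(range(24))
y = [2000.0 + 50.0 * ti for ti in t]
w = [1.0 / yi for yi in y]  # assumed: variance proportional to the count
intercept, slope = weighted_linear_fit(t, y, w)
prediction = extrapolate(intercept, slope, 30)
```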

Projections derived from back-calculation.

With the exception of Hyman, the analysts listed in Table B1 also predicted future AIDS cases from back-calculation. In addition, Gail made predictions by using a modification of back-calculation (Gail, written communication). He first estimated HIV incidence through June 1987 by applying standard back-calculation to AIDS cases diagnosed through that date. He then calculated future AIDS cases under the assumption of a distribution of incubation periods reflecting increasing proportions of infected persons without AIDS who were receiving therapy, with an effect starting in July 1987 (implying secular changes in the distribution of incubation periods beginning then). Gail assumed, as did Brookmeyer, that improved medical care can reduce the risk of progression to AIDS by 65%. From fitting adjusted AIDS incidence data, Gail estimated that improved care gradually phased in to cover 60% of those in groups with good access to health care (e.g., homosexual men, transfusion recipients, persons with hemophilia) and 20% of those in other groups. Improved care does not require treating all infected persons in these groups. It may require only more effective patient management (e.g., close follow-up of patients until they are at high risk of developing AIDS before giving treatments such as zidovudine or prophylactic pentamidine).

Hay's back-calculation results in Table B1 imply that approximately 10,000 HIV infections occurred among adults and adolescents from 1986 through 1988. In an alternative analysis, Hay assumed that there were 15,000 new infections in both 1986 and 1987; 20,000, in 1988; 25,000, in 1989; and 30,000, in 1990. Because data from active-duty U.S. military personnel indicate that many more than 10,000 HIV infections occurred among adult U.S. residents during 1986-1988, CDC used the results from this alternative analysis in making projections.

For each year, the plausible range for AIDS case projections (Tables B2 and B3) consists of the smallest and largest predictions from extrapolation (Hyman; CDC), from back-calculation without therapy effects fit to adjusted AIDS incidence (through June 1987 by Rosenberg; June 1989 by Harris; December 1989 by Hay), and from Brookmeyer's and Gail's modifications of back-calculation incorporating effects of improved medical care. These modifications assume 0 to 150,000 new HIV infections per year since July 1987. Projections obtained from reported cases (Table B2) were increased by 18% to obtain projections for all diagnosed cases (Table B3), corresponding to the estimate that 85% of all cases diagnosed are eventually reported.

Projections for cases within groups defined by risk behavior, by race/ethnicity, and by gender were based on projections for all cases and projections for the proportions of cases within each group. Projections of the proportions of cases in a group were made by extrapolating the trend in the monthly proportion of cases in that group (after adjusting for reporting delays) observed for July 1987 through June 1989 (A1). The predicted range for each month in the group was then obtained by multiplying the predicted range for all cases by the predicted proportion for that group. The annual prediction is the sum of the monthly predictions.
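The group calculation described above can be sketched as follows. The function name and the numbers are hypothetical; the extrapolation of the monthly proportions themselves is assumed to have been done already.

```python
def annual_group_projection(monthly_total_range, monthly_group_proportion):
    """Sum monthly (low, high) projections for one group into an annual range.

    monthly_total_range      -- list of (low, high) predictions for all cases
    monthly_group_proportion -- predicted proportion of cases in the group,
                                one value per month
    """
    low = sum(lo * p for (lo, _), p in zip(monthly_total_range, monthly_group_proportion))
    high = sum(hi * p for (_, hi), p in zip(monthly_total_range, monthly_group_proportion))
    return low, high


# Hypothetical values: 12 months, constant range and constant proportion.
total_range = [(4000, 5000)] * 12
proportion = [0.25] * 12
annual_low, annual_high = annual_group_projection(total_range, proportion)
```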

4. Estimation of HIV prevalence from back-calculation estimates of cumulative HIV incidence

Because back-calculation is based on numbers of reported AIDS cases, this method estimates cumulative HIV incidence--the total number of persons infected with HIV in the past--for those persons who have been or will be diagnosed with AIDS and whose diagnosis will ultimately be reported. This estimate includes persons diagnosed as having AIDS who have died, but excludes two groups of infected persons who will never be reported as AIDS cases: a) all diagnosed but unreported AIDS patients, and b) all those infected with HIV who are never diagnosed as having AIDS. These patients are never diagnosed as having AIDS because they either have an unrecognized disease or they die from another cause (e.g., pneumococcal pneumonia, endocarditis, influenza (A12)) before developing a disease that fits within the surveillance definition for AIDS.

Estimates of HIV prevalence can be obtained from back-calculation results by making the corresponding adjustments, i.e., by subtracting deaths associated with reported AIDS cases and then adjusting the resulting estimate for the number of persons with HIV infection who will never be reported as having AIDS.

Adjusting for deaths among persons reported as having AIDS.

The first step in estimating HIV prevalence from the back-calculation estimates of cumulative HIV incidence is to subtract deaths among patients reported to have AIDS. This step gives an estimate of HIV prevalence for infected persons who will ultimately have a reported diagnosis of AIDS. The survival analyses described in Section 5 of this appendix yield estimates of approximately 12,000 deaths for persons diagnosed with AIDS through 1985; 61,000 through 1988; and 74,000 through June 1989. The last two estimates are approximately 4% higher than the corresponding numbers of reported deaths, in part due to reporting delays.

Adjustment for incomplete ascertainment of life-threatening symptomatic HIV infection by AIDS surveillance data.

If ascertainment of severe HIV infection has not been constant, a correct estimate of cumulative HIV incidence would require adjusting for under-ascertainment before carrying out back-calculation. Although the analysis by Buehler et al. (A13) suggests that ascertainment improved in the mid-1980s, the information on trends in ascertainment is not sufficiently reliable to adjust reported AIDS cases before applying back-calculation. CDC made the simplifying assumption that ascertainment has been approximately constant and adjusted the estimates from back-calculation based on AIDS incidence for reported cases to account for incomplete ascertainment. Because data are not available on the time from HIV infection to life-threatening symptomatic HIV infection (resulting in death) among persons who die without meeting the AIDS surveillance definition, CDC also assumed that this time period has the same distribution as that of the time from HIV infection to AIDS. On the basis of the results of Buehler et al. (A13) and the following analyses, CDC assumed that reported AIDS cases represent 75% of all severe health problems associated with HIV infection. Because 1/0.75 = 1.33, CDC estimated HIV prevalence as 1.33 times the back-calculation estimate of cumulative HIV incidence minus deaths. Note that this adjustment includes diagnosed but unreported AIDS cases.
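The two-step adjustment (subtract deaths, then divide by the assumed ascertainment) can be sketched as below. The cumulative incidence value is hypothetical; the figure of 74,000 deaths is the estimate through June 1989 quoted earlier in this section. The sketch divides by 0.75 directly rather than multiplying by the rounded factor 1.33, so results differ slightly from a calculation that uses 1.33.

```python
def hiv_prevalence(cumulative_incidence, deaths_reported_aids, ascertainment=0.75):
    """Prevalence = (cumulative incidence - deaths) / ascertainment."""
    return (cumulative_incidence - deaths_reported_aids) / ascertainment


# Hypothetical back-calculation estimate of cumulative HIV incidence.
cumulative_incidence = 800000
prevalence = hiv_prevalence(cumulative_incidence, 74000)
```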

Although definitions of life-threatening symptomatic HIV infection may vary, the most important measure of such disease not meeting the AIDS surveillance definition is the extent of HIV-related death. The CDC estimate of ascertainment is based on comparing surveillance data with vital statistics data. Buehler et al. (A13) estimated that, for male U.S. residents 25-44 years of age (the group most affected by the HIV/AIDS epidemic), at least 70%-90% of the actual number of deaths that occurred in 1987 and were attributable to HIV infection were reported to the CDC AIDS surveillance system. A reasonable estimate, then, is that in 1987 approximately three-fourths of deaths attributable to HIV infection were reported. Similar analyses indicate that at least two-thirds of such deaths during 1985-1986 were reported; part of the improvement in reporting from 1985-1986 to 1987 could be due to the expansion of the surveillance definition in 1987. Preliminary analyses, based on provisional mortality data, suggest that the proportion of deaths that were attributable to HIV infection and reported to CDC remained at approximately three-fourths in 1988. These results can be used to estimate HIV prevalence (by adjusting prevalence estimates obtained from back-calculation) if nearly all deaths of persons reported to have AIDS are reported to the surveillance system and if ascertainment is similar for other men and women.

Surveillance data indicate that most deaths of persons whose AIDS diagnoses had been reported earlier are themselves reported. Among all adults and adolescents with reported diagnoses, at least 90% of those diagnosed before 1984 have been reported as having died. Approximately 89% and 87% of those diagnosed as having AIDS in 1984 and 1985, respectively, are also known to have died (A14). The corresponding percentages of male U.S. residents 25-44 years of age at diagnosis and known to have died are about the same. Reporting of deaths is likely to have improved recently because many local surveillance units now routinely use death certificates as an adjunct to both case finding and mortality follow-up. Because these data indicate that at least 90% of deaths associated with reported AIDS cases are reported, an analysis of ascertainment of life-threatening symptomatic HIV infection based on deaths is indicative of comparable HIV-associated health problems detected with the CDC AIDS surveillance system.

Ascertainment estimates based on male U.S. residents who are 25-44 years of age at the time of death can be applied to all persons with life-threatening symptomatic HIV infection if ascertainment is similar for most other HIV-infected persons. For example, ascertainment may be less complete for IVDUs in New York City (A12) and for children less than 13 years of age. Male U.S. residents 25-44 years of age at diagnosis (representing approximately two-thirds of all AIDS cases diagnosed through 1986; Table A1) are used as a reference group because nearly all male U.S. residents 25-44 years of age at death were already members of this age group at diagnosis (22,540 (98%) of 22,965, based on deaths reported through September 1990). Of the remaining cases, approximately two-thirds represent adult and adolescent male U.S. residents of other ages; the proportion of IVDUs who are members of these latter age groups is lower than that in the reference group. Most other AIDS cases are among females and among male nonresidents (i.e., males living in U.S. territories); the proportion of IVDUs who are members of these latter groups is higher than that for the reference group (Table A1). Because resident males in groups other than the 25-44 age group may have more complete ascertainment than the reference group but females, children, and nonresidents may have less complete ascertainment, average ascertainment for HIV-infected persons (ascertainment without regard to age, gender, or mode of transmission) may very well be similar to that for men 25-44 years of age at diagnosis.

Table A1 illustrates that ascertainment of life-threatening symptomatic HIV infection by the AIDS surveillance data among male U.S. residents 25-44 years of age at death may be similar to that for all persons infected with HIV. Table A1 shows the distribution of reported AIDS cases diagnosed through 1986 among the three groups described above, as well as the proportion of cases in each group (i.e., IVDUs, children) assumed to have less complete ascertainment. Assuming arbitrarily that ascertainment is 50% for IVDUs and children and 80% for other HIV-infected persons, the level of ascertainment for male U.S. residents 25-44 years of age at diagnosis would be estimated as 75%. Because this estimate is nearly the same as that of Buehler et al. (A13), the assumptions of 50% and 80% ascertainment in the two groups are plausible. Because overall ascertainment is also very close to 75%, it is reasonable to apply the estimate of 75% ascertainment to all persons infected with HIV, as was proposed in the adjustment method.

5. Survival after a diagnosis of AIDS

Using the Kaplan-Meier procedure and AIDS surveillance data, CDC estimated survival after a diagnosis of AIDS. Follow-up time was censored at the end of 1988 to allow most deaths that had occurred in 1988 to be reported. (Analyses of deaths from October 1987 to September 1990 indicate that more than 85% of deaths are reported within 1 year of death.) The resulting survival estimates were used to predict future deaths among persons diagnosed with AIDS, numbers of persons alive with AIDS, and deaths associated with symptomatic HIV infection (Tables 2, B2, and B3), and to estimate HIV prevalence (Section 4).

The AIDS surveillance data show that 13% of persons reported to have AIDS die in the same month as the diagnosis of AIDS was made; an unknown proportion of these persons represent late diagnoses of AIDS (perhaps even at death) rather than rapid progression of disease. Survival patterns associated with the other 87% of AIDS cases diagnosed in the period 1984 through the first half of 1987 seem to follow an exponential distribution for the first 24 months after diagnosis, with the risk of death depending on the date of diagnosis. For persons diagnosed as having AIDS in 1984 or 1985, approximately 45% survived at least 12 months and 50% survived at least 11 months; for those whose AIDS was diagnosed in the period 1986 through the first half of 1987, 55% survived at least 12 months and 50% survived at least 14 months. Survival greater than 24 months after diagnosis among these persons appears to be better than that predicted by an exponential distribution, in part due to delays in reporting deaths and to deaths never reported. Improved survival among persons diagnosed with AIDS in 1986 and 1987, compared with the survival period for persons whose AIDS was diagnosed in earlier years, has also been found in San Francisco (A15) and in another analysis of national AIDS surveillance data (A16), especially for patients with P. carinii pneumonia.

Because there are limited data on survival for persons whose AIDS has been diagnosed since mid-1987, CDC estimated survival for these persons by modifying death rates for persons diagnosed as having AIDS from January 1986 through June 1987. The modification was based on the study in San Francisco (A15), which found that patients taking zidovudine survived longer after a diagnosis of AIDS than patients not on antiviral therapy (median survival, 21 vs. 14 months; 1-year survival, 85% vs. 56%). Under an assumption of exponential survival, these median survival times imply a reduction in mortality risk of 35% for patients taking zidovudine.
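Under an exponential survival model the hazard is ln(2) divided by the median, so the relative reduction in risk follows directly from the two medians. The sketch below, with hypothetical function names, gives a reduction of about one-third for medians of 14 and 21 months, which is close to the 35% figure used in the text.

```python
import math


def hazard_from_median(median_months):
    """Constant monthly hazard of an exponential distribution with this median."""
    return math.log(2) / median_months


def relative_risk_reduction(median_untreated, median_treated):
    """Proportional reduction in hazard implied by two exponential medians."""
    return 1.0 - hazard_from_median(median_treated) / hazard_from_median(median_untreated)


reduction = relative_risk_reduction(14, 21)  # equals 1 - 14/21
```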

The number of persons who die each year after being diagnosed as having AIDS was estimated by applying the estimated survival distribution both to the adjusted AIDS incidence through June 1989 and to the range of predicted AIDS incidence from July 1989 through December 1993 (Table 2). The assumption was that 13% of persons with AIDS receive the diagnosis and die in the same month and that the remaining 87% have an exponential distribution of survival, with median survival of 11 months for patients diagnosed through December 1985, 14 months for patients diagnosed in January 1986-June 1987, and 16.3-18.7 months for patients diagnosed after that date. This change in survival corresponds to assumptions that patients taking zidovudine have a 35% reduction in the risk of death and that 40%-75% of persons with AIDS are being treated. Total deaths associated with HIV infection were estimated as 1.33 x the number of deaths among persons with reported diagnoses of AIDS, reflecting the estimate that approximately 75% of all HIV-associated deaths are deaths of persons who have reported AIDS diagnoses (1/0.75 = 1.33).
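The survival assumption can be sketched as below: 13% of persons die in the month of diagnosis, and the remaining 87% follow an exponential distribution with a given median. The function names and the cohort size are hypothetical, and the sketch applies to one diagnosis cohort at a time rather than to the full incidence series.

```python
def fraction_alive(months, median_months, same_month_death_fraction=0.13):
    """Fraction of a diagnosis cohort still alive `months` (>= 1) after diagnosis."""
    survivors_of_first_month = 1.0 - same_month_death_fraction
    return survivors_of_first_month * 0.5 ** (months / median_months)


def expected_deaths(cohort_size, months, median_months):
    """Expected deaths in the cohort within `months` of diagnosis."""
    return cohort_size * (1.0 - fraction_alive(months, median_months))


def total_hiv_deaths(deaths_reported_aids, ascertainment=0.75):
    """Scale deaths among reported AIDS cases up to all HIV-associated deaths."""
    return deaths_reported_aids / ascertainment


# Hypothetical cohort of 1,000 persons diagnosed with a 14-month median survival.
alive_12 = fraction_alive(12, 14)
deaths_12 = expected_deaths(1000, 12, 14)
```

Applying these functions to each month's adjusted or predicted incidence and summing over diagnosis cohorts gives the kind of annual death totals described in the text.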


A1. Karon JM, Devine OJ, Morgan WM. Predicting AIDS incidence by extrapolating from recent trends. In: Mathematical and Statistical Approaches to AIDS Epidemiology, C. Castillo-Chavez, ed. Lecture Notes in Biomathematics, vol. 83, Berlin: Springer-Verlag, 1989:58-88.
A2. Cleveland WS. Robust locally weighted regression and smoothing scatterplots. J Amer Statist Assoc 1979;74:829-36.
A3. Brookmeyer R, Goedert JJ. Censoring in an epidemic with an application to hemophilia-associated AIDS. Biometrics 1989;45:325-35.
A4. Brookmeyer R, Gail MH. A method for obtaining short-term projections and lower bounds on the size of the AIDS epidemic. J Amer Statist Assoc 1988;83:301-8.
A5. Brookmeyer R, Damiano A. Statistical methods for short-term projections of AIDS incidence. Stat Med 1989;8:23-34.
A6. Hyman JM, Stanley EA. Using mathematical models to understand the AIDS epidemic. Math Biosci 1988;90:415-73.
A7. Hay JW. Projecting the medical costs of HIV/AIDS: an update with focus on epidemiology. In: New Perspectives on HIV-Related Illness: Progress in Health Services Research -- Conference Proceedings. National Center for Health Services Research, Pub. No. DHHS (PHS) 89-3449. Rockville, MD: September, 1989:84-97.
A8. Hay JW, Wolak FA. Bootstrapping HIV/AIDS projection models: back-calculation with linear inequality-constrained regression. Working Paper in Economics E-90-5, Hoover Institution, Stanford, California, 1990.
A9. Auger I, Thomas P, De Gruttola V, et al. Incubation periods for paediatric AIDS patients. Nature 1988;336:575-7.
A10. Brookmeyer R, Liao J. Statistical modelling of the AIDS epidemic for forecasting health care needs. Biometrics 1990;46:1151-63.
A11. Volberding PA, Lagakos SW, Koch MA, et al. Zidovudine in asymptomatic human immunodeficiency virus infection: a controlled trial in persons with fewer than 500 CD4-positive cells per cubic millimeter. N Engl J Med 1990;322:941-9.
A12. Stoneburner RL, Des Jarlais DC, Benezra D, et al. A larger spectrum of severe HIV-1 related disease in intravenous drug users in New York City. Science 1988;242:916-9.
A13. Buehler JW, Devine OJ, Berkelman RL, Chevarley FM. Impact of the human immunodeficiency virus epidemic on mortality trends in young men, United States. Am J Public Health 1990;80:1080-6.
A14. CDC. HIV/AIDS Surveillance Report, October 1990: Centers for Disease Control, Atlanta, Georgia: pp. 1-18.
A15. Lemp GF, Payne SF, Neal D, et al. Survival trends for patients with AIDS. JAMA 1990;263:402-6.
A16. Harris JE. Improved short-term survival from AIDS among patients initially diagnosed with Pneumocystis carinii pneumonia, 1984 through 1987. JAMA 1990;263:397-401.

Disclaimer   All MMWR HTML documents published before January 1993 are electronic conversions from ASCII text into HTML. This conversion may have resulted in character translation or format errors in the HTML version. Users should not rely on this HTML document, but are referred to the original MMWR paper copy for the official text, figures, and tables. An original paper copy of this issue can be obtained from the Superintendent of Documents, U.S. Government Printing Office (GPO), Washington, DC 20402-9371; telephone: (202) 512-1800. Contact GPO for current prices.


Page converted: 08/05/98



Morbidity and Mortality Weekly Report
Centers for Disease Control and Prevention
1600 Clifton Rd, MailStop E-90, Atlanta, GA 30333, U.S.A


Department of Health
and Human Services

This page last reviewed 5/2/01