Centers for Disease Control and Prevention
Safer Healthier People
MMWR

A Method for Classification of HIV Exposure Category for Women Without HIV Risk Information

Amy Lansky, Ph.D., M.P.H.
Patricia L. Fleming, Ph.D., M.S.
Robert H. Byers, Jr, Ph.D.
John M. Karon, Ph.D.
Pascale M. Wortley, M.D., M.P.H.
Division of HIV/AIDS Prevention --- Surveillance and Epidemiology
National Center for HIV, STD, and TB Prevention


An increasing number of cases of human immunodeficiency virus (HIV) and acquired immunodeficiency syndrome (AIDS) among women are reported to state and territorial health departments without exposure risk information (i.e., no documented exposure to HIV through any of the recognized routes of HIV transmission). Because surveillance data are used to plan prevention and other services for HIV-infected persons, it is important to develop methods that accurately estimate exposure risk for HIV and AIDS cases initially reported without risk information and to assist states in analyzing and interpreting trends in the HIV epidemic by exposure risk category. In this report, a classification model using discriminant function analysis is described. The purpose of the classification model is to develop a proportionate distribution of exposure risk category for cases among women reported without risk information. The distribution was estimated based on behavioral and demographic data obtained from interviews with HIV-infected women; the interviews were conducted in 12 states during 1993--1996. Variables used in the analysis were alcohol abuse, noninjection-drug use, and crack use; year of HIV/AIDS diagnosis; age; employment; and region. As a result of the classification procedure, nearly all cases among women with no reported risk were classified into an exposure risk category: 81%, heterosexual contact; and 16%, injection-drug use. The proportion classified as heterosexual contact is higher than under the current redistribution fractions (calculated from risk reclassification patterns and weighted by demographic characteristics) and reflects the increasing proportion of cases among women attributable to heterosexual contact with an infected partner.

This report provides one method that could be applied to HIV surveillance data at the national level to estimate the proportion of cases in exposure risk categories. However, because the study in this report is limited in sample size and geographic representativeness, other models are also needed for adjusting risk exposure data at the national, state, and local levels.


Women account for a steadily increasing proportion of cases of acquired immunodeficiency syndrome (AIDS), representing 23% of cases reported to CDC in 1999 (1). Since 1995, an average of approximately 11,600 cases of AIDS have been diagnosed in women each year. The expansion of the AIDS case definition in 1993 (2) was associated with a large increase in the total (i.e., men and women) number of reported cases, from 42,290 reported cases in 1992 (3) to 72,967 reported cases in 1995 (4). An increase in the number of reported cases of human immunodeficiency virus (HIV) infection also is expected as additional states implement reporting of HIV infection, including cases among women with a previous diagnosis of HIV infection (5).


For surveillance purposes, each reported case of HIV or AIDS is counted once in a list of exposure risk categories (i.e., men who have sex with men; injection-drug users; men who have sex with men and are injection-drug users; recipients of clotting factor for hemophilia or other coagulation disorders; persons who have had heterosexual contact with a partner who is HIV-infected or who has one of the risks already listed; or recipients of HIV-infected blood or blood components other than clotting factor or of HIV-infected tissue [1]). Among persons who have AIDS and are reported as having multiple possible routes of HIV acquisition, a single exposure risk category is assigned based on the most probable or efficient mode of transmission (1). However, all risk information is retained in the database.
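The single-category rule described above can be illustrated with a minimal Python sketch. It assumes that precedence follows the order in which the categories are listed in the preceding paragraph; that ordering, the short category labels, and the function and field names are hypothetical choices made for illustration, not the HARS implementation.

```python
# Precedence used to pick one category per case. ASSUMPTION: it follows the
# order in which the report lists the categories; labels are hypothetical.
PRECEDENCE = ("msm", "idu", "clotting_factor", "heterosexual", "transfusion")


def assign_exposure_category(risks):
    """Return a single exposure risk category for a collection of documented risks."""
    risks = set(risks)
    # The combined category applies when both risks are documented.
    if {"msm", "idu"} <= risks:
        return "msm_and_idu"
    for category in PRECEDENCE:
        if category in risks:
            return category
    # No documented exposure through any listed route.
    return "no_reported_risk"


def build_case_record(case_id, risks):
    """Count the case once, but retain every documented risk in the record."""
    return {
        "case_id": case_id,
        "all_risks": sorted(set(risks)),
        "exposure_category": assign_exposure_category(risks),
    }


record = build_case_record("example-001", ["heterosexual", "idu"])
```

In this sketch, a woman with both injection-drug use and heterosexual contact documented is counted under injection-drug use, while both risks remain available in `all_risks`.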

In 1994, the proportion of women with AIDS infected through heterosexual contact surpassed the proportion infected through injection-drug use; overall, heterosexual transmission accounted for 40% of AIDS cases reported among women in 1999 (1). Of all women reported with AIDS in 1999, 11% reported heterosexual contact with an injection-drug user, and 29% reported sexual contact with men of unspecified or other risks (e.g., men who have sex with men and women). Thirty-two percent of the cases among women were initially reported with no exposure risk category, which is common for recently reported cases.

HIV and AIDS cases reported without exposure risk information (i.e., no documented exposure to HIV through any of the routes listed in the exposure risk categories) are assigned to a "no reported risk" category. Cases in this category might be reclassified into a defined category after follow-up by the local health department as part of routine surveillance or a supplemental surveillance project.

The proportion of all reported AIDS cases in the United States initially reported without exposure risk information increased from 5% in the early 1980s (3) to approximately 20% in 1999 (1). The proportion of HIV cases reported without exposure risk information is higher among women than men; in 1998, a total of 51% of HIV cases among women and 37% of HIV cases among men were reported without exposure risk information (1). Tracking trends in the proportionate distribution of cases by exposure risk category is an important step in understanding the dynamics of HIV transmission and in planning effective prevention programs at the state and local levels.

Because of the number of cases reported without exposure risk information, local health departments are unable to conduct follow-up and ascertain risk information for all cases. To analyze surveillance trends, CDC has used a statistical adjustment to assign a risk for cases reported without exposure risk information (6). This adjustment method has been based on historical patterns of reclassification of AIDS cases initially reported without risk, which accounts for sex, race/ethnicity, and geographic region (7). However, these adjustments from the AIDS case surveillance database might be biased because an increasing proportion of recent AIDS cases are not followed up to ascertain risk. A method based on AIDS case surveillance also might not be representative of HIV cases. As more states implement HIV reporting, several strategies will need to be used to track epidemiologic trends for newly reported cases of HIV infection or AIDS without exposure risk information.

In this report, methods of making statistical adjustments to the HIV infection surveillance data for cases reported without exposure risk information are described. The information used for the adjustments included behavioral and demographic data from interviews with women who have HIV infection but not AIDS and women with a recent AIDS diagnosis. Using discriminant function analysis, cases were classified into an exposure risk category. Results from the classification were compared with the exposure risk category noted in the case report. For cases reported without exposure risk information, the redistribution fractions derived from the classification model were also compared with those fractions derived from the current method of redistribution, which is based primarily on demographic information.

Statistical Redistribution of Exposure Risk

Materials and Methods
All states and territories in the United States report cases of AIDS to CDC through the HIV/AIDS Reporting System (HARS); as of December 1999, a total of 33 states and territories also reported cases of HIV infection without AIDS (1). HIV and AIDS cases are reported to state health departments, which forward the data to CDC with no personally identifying information. The Supplement to HIV/AIDS Surveillance (SHAS) is a surveillance project in which persons who have been reported to state or local health departments in 12 states are interviewed using a standardized, confidential questionnaire. Participants must be aged ≥18 years, give consent, and be able to complete the interview. SHAS has been ongoing since 1990; detailed methods of this project have been described elsewhere (8). Data from HARS and SHAS are linked by using an identification number assigned by the state health department.

Data from women who completed a SHAS interview from January 1993 through December 1996 were analyzed. The analysis was restricted to women with a diagnosis of HIV infection (not AIDS), regardless of when they learned of their diagnosis, and women who had learned of their AIDS diagnosis within the 12 months before the interview. Women whose exposure risk category was transfusion or hemophilia were excluded because these categories account for a small proportion of cases.

Trained interviewers administered a 45-minute standardized questionnaire to eligible persons who gave oral consent to be interviewed. The instrument included, but was not limited to, questions regarding sociodemographics, sexual behaviors during the previous year, and substance use during the previous 5 years. Each health department ensured privacy during the interview. The SHAS project was approved by local human subjects review boards. Names and other personal identifiers were removed before data were sent to CDC.

The variables for exposure risk category used in this analysis came from HARS. In most instances, exposure risk information in HARS came from medical records; however, in some states, risk information obtained during the SHAS interview might be used to determine exposure risk category for cases initially reported without exposure risk information. The race/ethnicity variable also came from HARS. Behavioral and demographic data that were used as independent variables in the model to predict exposure risk category came from SHAS. A history of the following behaviors and diseases was examined: crack use (previous 5 years, >5 years ago, or never); noninjection-drug use, including crack but excluding marijuana (previous 5 years); sexually transmitted disease (previous 10 years); alcohol abuse (as defined by the CAGE questions*) (9); exchange of sex for money or drugs (previous 5 years); and number of male sex partners (previous 5 years). Demographic variables that came from SHAS were age, years of education, household income in the previous year, employment status, year of interview, disease status (AIDS or HIV infection) at the time of the interview, and region of the country where the person lived at the time of the interview.

We used the chi-square test to assess the bivariate relation between each of the independent variables and exposure risk category. Discriminant function analysis was used to classify respondents by exposure category using SPSS version 7.5 (10). Variables were entered into the analysis using a backward elimination procedure to select a minimum subset of predictors. Data from the interview and case report form were used to predict membership in three exposure risk categories (injection-drug users, heterosexual contact, and no reported risk). The data were randomly split into two parts with an equal number of observations in each. With one part of the data, a discriminant function analysis was conducted to identify the classification model, which was then applied to the other part of the data to classify exposure risk category. These analyses were conducted repeatedly using split random samples (Table 1). The overall correct classification never varied by more than two percentage points across the randomly generated split samples; the data from one analysis are presented in this report.
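The split-sample procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SPSS analysis: a nearest-centroid rule stands in for the discriminant function (the two coincide only under restrictive conditions such as uncorrelated, equal-variance predictors and equal prior probabilities), the records are synthetic, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def split_halves(records, seed):
    """Randomly split records into two parts of equal size (an odd leftover goes to the second)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fit_centroids(train):
    """Compute the mean predictor vector for each exposure category."""
    sums, counts = {}, defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, features):
    """Assign the category whose centroid is nearest (squared Euclidean distance)."""
    def distance(label):
        return sum((x - c) ** 2 for x, c in zip(features, centroids[label]))
    return min(centroids, key=distance)


def percent_correct(centroids, test):
    """Percentage of held-out records assigned to their recorded category."""
    hits = sum(classify(centroids, features) == label for features, label in test)
    return 100.0 * hits / len(test)


def repeated_split_validation(records, n_repeats=5):
    """Fit on one random half, score on the other, and repeat with new splits."""
    results = []
    for seed in range(n_repeats):
        train, test = split_halves(records, seed)
        results.append(percent_correct(fit_centroids(train), test))
    return results


# Synthetic, well-separated records: (predictor vector, recorded category).
records = [((1.0 + 0.01 * i, 1.0), "injection-drug use") for i in range(20)]
records += [((0.0 + 0.01 * i, 0.0), "heterosexual contact") for i in range(20)]
rates = repeated_split_validation(records)
```

Comparing the values in `rates` across repetitions mirrors the stability check described above, in which the overall correct classification was compared across randomly generated split samples.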


Of 1,297 women who were interviewed, 410 (32%) had injection-drug use as their exposure risk category in HARS; 638 (49%), heterosexual contact; and 249 (19%), no reported risk (Table 1). Women whose exposure risk category was injection-drug use were more likely than those whose exposure risk category was heterosexual contact and those with no reported risk to be white, older, and unemployed; have lower income; and have received their diagnosis before 1993. In addition, most other risk behaviors were more prevalent among injection-drug users than among other groups (Table 1).

In the discriminant function analysis, the variables selected by backward elimination as having the ability to discriminate among the exposure categories were alcohol abuse, noninjection-drug use, crack use, diagnosis year, age, employment, and region. The classification resulting from the discriminant function analysis was able to correctly categorize 72% of women (not including those whose HARS exposure risk category was no reported risk). A classification was considered "correct" if the woman was assigned to the same category as the exposure risk category reported in HARS.

Nearly all (97.5%) women with no reported risk were classified to heterosexual contact (81%) or injection-drug use (16%) (Table 2). These proportions (redistribution fractions) are the basis for adjustments made to the no reported risk category when analyzing trends.
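The way redistribution fractions adjust counts can be shown with a short Python sketch. It applies the rounded fractions reported above (81% and 16%) to the category counts of the interviewed women purely as an arithmetic illustration; the function name is hypothetical, and applying the fractions to this particular sample is an assumption of the sketch rather than an analysis performed in this report.

```python
def redistribute_no_reported_risk(counts, fractions, nrr_key="no reported risk"):
    """Move a share of no-reported-risk cases into each category given by `fractions`."""
    nrr_cases = counts[nrr_key]
    adjusted = {category: float(n) for category, n in counts.items() if category != nrr_key}
    assigned = 0.0
    for category, fraction in fractions.items():
        share = nrr_cases * fraction
        adjusted[category] = adjusted.get(category, 0.0) + share
        assigned += share
    # Cases not covered by the fractions stay in the no-reported-risk category.
    adjusted[nrr_key] = nrr_cases - assigned
    return adjusted


# Counts by HARS exposure risk category among the 1,297 interviewed women.
counts = {
    "injection-drug use": 410,
    "heterosexual contact": 638,
    "no reported risk": 249,
}
# Rounded redistribution fractions from the classification model.
fractions = {"heterosexual contact": 0.81, "injection-drug use": 0.16}

adjusted = redistribute_no_reported_risk(counts, fractions)
```

With these inputs, the adjusted counts are approximately 840 for heterosexual contact and 450 for injection-drug use, and the total of 1,297 is preserved. Because the rounded fractions sum to 0.97, about 3% of the no-reported-risk cases remain unassigned in this illustration.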

Redistribution fractions were compared with those from the current method of adjusting exposure risk information, which is based on data from reclassified AIDS cases initially reported with no reported risk. The redistribution fractions derived from this study, which included HIV and AIDS cases, would distribute a higher proportion of women with no reported risk into the heterosexual contact category (81%) than the current method (69%--70%) (Tables 2 and 3). This difference is consistent with trends toward an increasing proportion of women with known risk being classified in the heterosexual contact category (1,11).


Exposure risk information is used to monitor trends in routes of HIV transmission, to plan prevention programs, and to allocate resources to priority populations at risk for HIV infection. However, with an increasing proportion of cases reported with no exposure risk information, statistical adjustments must be made to the surveillance data to monitor trends. Findings in this report indicate that a statistical model based on data reported in interviews and case report forms of persons with HIV/AIDS can be used to classify most cases among women into the same exposure risk category recorded on their case report form. In addition, the model can classify nearly all cases among women reported without risk. Behaviors, including crack use, other noninjection-drug use, and alcohol abuse, were stronger predictors of exposure risk category than demographic characteristics. These findings emphasize the need for behavioral surveillance to improve HIV prevention planning at the state and local levels.

The findings indicate that use of crack and other noninjection drugs was more prevalent among injection-drug users than among women in the heterosexual contact exposure category (Table 1). Crack use is a risk factor for heterosexual transmission of HIV because of its association with risky sexual behaviors (12). Injection-drug use in combination with crack use has also been associated with a higher prevalence of risky sexual behaviors (13). Given that the model in this report would likely classify crack users into the injection-drug use exposure category, rather than the heterosexual contact category, the link between crack use and heterosexual transmission of HIV should be further explored.


Additional research is needed to address the limitations of this risk adjustment method. Classification results based on the method used in this report might differ in other populations according to background prevalence of infection and risk behaviors and the distribution of exposure risk categories. The data available for analysis from the 12 states participating in SHAS were too sparse to make reliable estimates by sex, region, and race at the national level, which is currently done with the AIDS surveillance data; SHAS data would be needed from a large number of additional states to be able to make such estimates. Therefore, this method will not be adopted at present to adjust HIV and AIDS surveillance data at the national level for examining trends in exposure risk category; adoption would require SHAS interview data on all, or a representative sample of, new HIV/AIDS cases in additional states. In the meantime, this method could be used in areas that conduct SHAS to make estimates for classifying cases reported without exposure risk information.

The study in this report highlights the complexities of estimating exposure risk without a reference method for comparison. The self-reported data from SHAS might be biased by recall or social desirability, which might result in over- or underreporting of risk behaviors. During the study period, when the proportion of cases reported without exposure information was lower, some states updated a small proportion of HARS records with data obtained from SHAS. Thus, the findings in this report might overestimate the proportion of cases "correctly" classified. HARS data abstracted from medical record review might be biased by what health-care providers document or how the record abstractor interprets the documentation. Without a reference --- for example, knowing whether self-report or chart review provides more accurate and valid risk data --- results from different methods of adjusting risk can only be compared with one another, and the choice among methods must be made on practical grounds. A suitable reference might be a combination of interview and chart review cross-validated with biological tests (e.g., testing for sexually transmitted diseases, including HIV). If exposure risk information were obtained from SHAS interviews and from medical chart reviews on a representative sample of cases in all states with high or moderate prevalence of HIV, an accurate probability distribution for exposure risk category for HIV infection at the state and national levels could be produced.

As more states adopt HIV reporting and the volume of reported cases increases, an important task for CDC is to develop methods to accurately estimate risk for HIV and AIDS cases initially reported without risk exposure information and to assist states in analyzing and interpreting trends in risk exposures. The methods in this report provide one possible solution that can be applied to HIV surveillance data at the national level. However, until these methods are evaluated and verified in additional states, the method of applying demographic data and the risk reclassification information from investigated HIV cases to cases with no reported risk (the method currently used with the AIDS data for cases with no reported risk) will be used as a short-term, retrospective adjustment to the HIV data. The future application of the discriminant function analysis and classification will depend on having complete, high-quality data from chart reviews and interviews with representative samples of cases as described in this report.


Women account for a steadily increasing proportion of AIDS cases in the United States. At the same time, the proportion of cases among women reported without exposure risk information is increasing, and collecting this information on all cases is not practical. Therefore, statistical adjustments to surveillance data are needed to monitor trends in exposure categories. Reliable data on exposure risk categories are crucial for HIV prevention because allocation and direction of resources are based on this kind of risk information. The model presented in this report is one potential option for making the needed statistical adjustments to exposure risk information, particularly at the state and local levels. Given the limitations of sample size and geographic representativeness of the data, other options, including investigations on a sample of cases and statistical estimation, should continue to be explored.


The authors thank Allyn K. Nakashima, M.D., and Teresa A. Hammett, M.P.H., for contributing to the design of this analysis.


  1. CDC. HIV/AIDS Surveillance Report. Atlanta, GA: US Department of Health and Human Services, CDC, 1999. (vol 11, no. 2).
  2. CDC. 1993 Revised classification system for HIV infection and expanded surveillance case definition for AIDS among adolescents and adults. MMWR 1992;41(No. RR-17).
  3. Hammett TA, Ciesielski CA, Bush TJ, Fleming PL, Ward JW. Impact of the 1993 expanded AIDS surveillance case definition on reporting of persons without HIV risk information. J Acquir Immune Defic Syndr 1997;14:259--62.
  4. CDC. HIV/AIDS Surveillance Report. Atlanta, GA: US Department of Health and Human Services, Public Health Service, CDC, 1996:10. (vol 8, no. 2).
  5. CDC. CDC guidelines for national human immunodeficiency virus case surveillance, including monitoring for human immunodeficiency virus infection and acquired immunodeficiency syndrome. MMWR 1999;48(No. RR-13).
  6. Fleming PL, Jaffe HW. AIDS among heterosexuals in surveillance reports [Letter]. New Engl J Med 2001;344:612--3.
  7. Green TA. Using surveillance data to monitor trends in the AIDS epidemic. Stat Med 1998;17:143--54.
  8. Buehler JW, Diaz T, Hersh BS, Chu SY. The supplement to HIV-AIDS Surveillance Project: an approach for monitoring HIV risk behaviors. Public Health Rep 1996;111(suppl 1):133--7.
  9. Ewing JA. Detecting alcoholism: The CAGE Questionnaire. JAMA 1984;252:1905--7.
  10. SPSS, Inc. SPSS Base 7.5 for Windows. Chicago, IL: SPSS, Inc.;1997:245--52.
  11. CDC. Diagnosis and reporting of HIV and AIDS in states with integrated HIV and AIDS surveillance---United States, January 1994--June 1997. MMWR 1998;47:309--14.
  12. Edlin BR, Irwin KL, Faruque S, et al. Intersecting epidemics---crack cocaine use and HIV infection among inner-city young adults. New Engl J Med 1994;331:1422--7.
  13. Booth RE, Watters JK, Chitwood DD. HIV risk-related sex behaviors among injection drug users, crack smokers, and injection drug users who smoke crack. Am J Public Health 1993;83:1144--8.

* The CAGE questions ask whether respondents had ever felt they should Cut down on their drinking, been Annoyed by people criticizing their drinking, felt Guilty about their drinking, or needed a drink in the morning as an Eye-opener.

Table 1

Table 2

Disclaimer   All MMWR HTML versions of articles are electronic conversions from ASCII text into HTML. This conversion may have resulted in character translation or format errors in the HTML version. Users should not rely on this HTML document, but are referred to the electronic PDF version and/or the original MMWR paper copy for the official text, figures, and tables. An original paper copy of this issue can be obtained from the Superintendent of Documents, U.S. Government Printing Office (GPO), Washington, DC 20402-9371; telephone: (202) 512-1800. Contact GPO for current prices.


Page converted: 5/10/2001



Morbidity and Mortality Weekly Report
Centers for Disease Control and Prevention
1600 Clifton Rd, MailStop E-90, Atlanta, GA 30333, U.S.A


Department of Health
and Human Services

This page last reviewed 5/10/2001