Task 1a: How to Generate Age-Adjusted Prevalence Rates and Means in SUDAAN

In this example, you will generate age-adjusted prevalence rates and standard errors for high blood pressure (HBP) by sex and race/ethnicity in persons 20 years and older. An optional second example demonstrates how to generate age-adjusted means and standard errors for Body Mass Index (BMI) by sex and race/ethnicity for persons 20 years and older.

To calculate age-adjusted prevalence rates, you need to choose the age standardizing proportions and then apply them to the populations under comparison. This is called the direct method of age standardization. Typically, Census data are used as the standard population structure. For age standardization in NHANES, NCHS recommends using the 2000 Census population. A spreadsheet with the year 2000 U.S. population structure by age is attached below. Calculate the standard age proportions by dividing the age-specific Census population (P) by the total Census population (T). The standardizing proportions (P/T) should sum to 1 (see the table below for the standard age proportions used in this module).



For your convenience, standard proportions for different NHANES population age groupings are provided in the Excel spreadsheet attached below. This file uses the 2000 Census as the standard population.  The adjustment factors were calculated for four age groupings:

  1. all ages,
  2. ages 6 years and older,
  3. ages 20 years and older using 10-year age intervals, and
  4. ages 20 years and older using 20-year age intervals (used for the blood pressure example in this module).

For other age groupings, you can combine the smaller age groups provided in order to reflect the age and subpopulation you are using in your analysis.
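As a rough illustration of combining age groups, the Python sketch below merges the 40-59 and 60+ groups from the 20-year table shown later in this module into a single 40+ group. The counts come from that table. The variable names are hypothetical, and the 40+ grouping is chosen only for demonstration. The point is that the proportion for a combined group equals the sum of the proportions of its component groups.

```python
# Census 2000 standard population counts (in thousands) for ages 20+,
# copied from the 20-year age group table in this module.
census_counts = {"20-39": 77670, "40-59": 72816, "60+": 45364}

total = sum(census_counts.values())  # T, the total standard population

# Combine two adjacent groups by adding their counts (illustrative grouping).
combined_count = census_counts["40-59"] + census_counts["60+"]
combined_prop = combined_count / total

# The same value is obtained by adding the two separate proportions.
sum_of_props = census_counts["40-59"] / total + census_counts["60+"] / total

print(f"40+ count (thousands): {combined_count}")
print(f"40+ proportion: {combined_prop:.6f}")
print(f"Sum of component proportions: {sum_of_props:.6f}")
```

The remaining 20-39 proportion is unchanged, so the two-group proportions still sum to 1.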


Standard Proportions for NHANES Population Groupings link: ageadjtwt.xls


Example of How to Calculate Standard Age Proportions


Here is an example of how to calculate the standard age proportions by dividing the age-specific Census population (P) by the total Census population number (T). The standardizing proportions should sum to 1.

Standard Proportions for 20-year Age Groups Based on the 2000 U.S. Census Standard Population

Age Group   Census Population   Total Census Population   Standard Age Proportion
            (in thousands)      (in thousands)
            P                   T                         P/T
20-39       77,670              195,850                   0.396579
40-59       72,816              195,850                   0.371795
60+         45,364              195,850                   0.231626
Total       195,850                                       Sum: 1


As you can see, each "standard age proportion", also referred to as an "age adjustment weight", is simply the proportion of people in the 2000 Census (the standard population) who fall in a specific age category. For example, the standard age proportion for people 20-39 years old is:

P / T = 77,670 thousand people aged 20-39 years / 195,850 thousand people aged 20 years and older = 0.396579
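The same calculation can be sketched in Python for all three age groups. This is an illustration only: the counts are those in the table above, and the variable names are hypothetical. Rounding to four decimal places reproduces the values used on the stdwgt statement later in this module.

```python
# Age-specific Census 2000 counts P (in thousands) from the table above.
census_counts = {"20-39": 77670, "40-59": 72816, "60+": 45364}

# Total standard population T for ages 20 and older.
total = sum(census_counts.values())

# Standard age proportion for each group: P / T.
proportions = {group: p / total for group, p in census_counts.items()}

# Four-decimal versions, as they would be typed on a stdwgt statement.
rounded = {group: round(prop, 4) for group, prop in proportions.items()}

for group in census_counts:
    print(f"{group}: P/T = {proportions[group]:.6f}, rounded = {rounded[group]:.4f}")

print(f"Sum of proportions: {sum(proportions.values()):.6f}")
```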





Klein RJ, Schoenborn CA. Age adjustment using the 2000 projected U.S. population. Healthy People Statistical Notes, no. 20. Hyattsville, Maryland: National Center for Health Statistics. January 2001.


Step 1: Generate Age-Adjusted Prevalence Rates

The SUDAAN procedure proc descript is used to generate age-adjusted percentages (prevalence rates) and standard errors. The age standardization variable and proportions are provided in the stdvar and stdwgt statements. The age-adjusted estimates are requested on the print statement, along with the sample size (nsum). The SUDAAN program below obtains weighted age-adjusted prevalence rates and standard errors for high blood pressure by sex and race/ethnicity among persons 20 years and older.


These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.


SUDAAN Procedure for Generating Age-Adjusted Prevalence Rates
Statements, each followed by its explanation:

proc sort data=analysis_data;
  by sdmvstra sdmvpsu;
run;

Use the proc sort procedure to sort the dataset by strata (sdmvstra) and PSU (sdmvpsu).

proc descript data=analysis_data design=wr;

Use the proc descript procedure to generate age-adjusted estimates and specify the sample design using the design option WR (with replacement).

subpopn ridageyr >=20 ;

Use the subpopn statement to select the sample persons 20 years and older (ridageyr >=20) because only those individuals are of interest in this example.

Note: For accurate estimates, it is preferable to use subpopn in SUDAAN to select a subpopulation for analysis, rather than select the study population in the SAS program while preparing the data file. 

nest sdmvstra sdmvpsu;

Use the nest statement with strata (sdmvstra) and PSU (sdmvpsu) to account for the design effects.

weight wtmec4yr;

Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for four years of data (wtmec4yr) is used.

subgroup riagendr age race;

Use the subgroup statement to list the categorical variables for which statistics are requested. These names will also appear in the table statement below. In this example, gender (riagendr), age (age), and race-ethnicity (race) are of interest.

levels 2 3 4;

Use the levels statement to define the number of categories in each of the subgroup variables. The level must be an integer greater than 0. In this example, there are two genders, three age groups, and four race-ethnicity groups.

var hbp;

Use the var statement to name the variable(s) to be analyzed. In this example, the high blood pressure variable (hbp) is used.

catlevel 1 ;

Use the catlevel statement to indicate the var statement variable(s) are categorical and select the level of each variable to be analyzed. In this example, you are interested in hbp=1, i.e., persons who have high blood pressure.

table riagendr * race;

Use the table statement to specify the cross-tabulations for which estimates are requested. If a table statement is not present, a one-dimensional distribution is generated for each variable on the subgroup statement. In this example, the estimates are for gender (riagendr) by race-ethnicity (race).

stdvar age;
stdwgt 0.3966 0.3718 0.2316 ;

Use the stdvar and stdwgt statements to yield standardized estimates of the mean. In the example, age is the standardizing variable as defined on the stdvar statement (note that age must also appear on the subgroup statement). The stdwgt statement specifies the population proportions based on the 2000 Census estimates. The number of proportions listed should equal the number of levels in the stdvar variable and should be listed in the same order as the respective level of the variable (see levels statement above). Their sum should equal 1.
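To make the role of the standard weights concrete, the Python sketch below shows the arithmetic behind a directly standardized point estimate: a weighted sum of age-specific estimates, with the standard proportions as weights. It is only an illustration of the point estimate. It does not reproduce SUDAAN's design-based standard errors. The age-specific prevalence values are HYPOTHETICAL, and the function names are invented for this example.

```python
# Standard proportions as listed on the stdwgt statement (ages 20-39, 40-59, 60+).
std_weights = [0.3966, 0.3718, 0.2316]

# Number of levels declared for the age variable on the levels statement.
n_age_levels = 3

# HYPOTHETICAL age-specific prevalence (%) for one gender-by-race cell,
# listed in the same order as the weights.
age_specific_pct = [10.0, 30.0, 65.0]


def check_weights(weights, n_levels, tol=1e-4):
    """Check that there is one weight per level and that the weights sum to 1."""
    return len(weights) == n_levels and abs(sum(weights) - 1.0) <= tol


def direct_adjusted(estimates, weights):
    """Direct standardization: weighted sum of age-specific estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


if check_weights(std_weights, n_age_levels):
    adjusted_pct = direct_adjusted(age_specific_pct, std_weights)
    print(f"Age-adjusted prevalence (hypothetical inputs): {adjusted_pct:.3f}%")
else:
    print("Weights do not match the number of levels or do not sum to 1.")
```

Listing the weights in a different order from the age levels would silently attach the wrong weight to each age group, which is why the order must match the levels of the stdvar variable.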

print nsum="Sample Size" percent="Percent" sepercent="SE";

Use the print statement to assign names and format the desired statistics and to view the output. If the statement print is used alone, all of the default statistics are printed with default labels and formats.

In this example, the sample size (nsum), adjusted percent (percent), and standard error of the percent (sepercent) are requested.

Note: For a complete list of statistics that can be requested on the print statement, see the SUDAAN Users Manual.

rtitle "Age-standardized prevalence of persons 20 years and older with high blood pressure: NHANES 1999-2002";

Use the rtitle statement to assign a heading to the output.

rfootnote "Age adjusted by the direct method to the year 2000 Census population projections using the age groups 20-39, 40-59, and 60+";

Use the rfootnote statement to specify a footnote to the tables.



Note: To calculate the unadjusted prevalence, use the program code above but omit the stdvar and stdwgt statements.
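The difference between unadjusted (crude) and age-adjusted estimates can be illustrated with a small Python sketch. All inputs other than the standard weights are HYPOTHETICAL: two made-up populations share the same age-specific prevalence but have different age structures. The crude estimate weights each age group by the population's own age distribution, while the adjusted estimate uses the standard proportions, so only the crude estimates differ.

```python
# Standard proportions from the stdwgt statement (ages 20-39, 40-59, 60+).
std_weights = [0.3966, 0.3718, 0.2316]

# HYPOTHETICAL age-specific prevalence (%), identical in both populations.
age_specific_pct = [10.0, 30.0, 65.0]

# HYPOTHETICAL age distributions: population A is younger, population B is older.
age_shares_a = [0.5, 0.3, 0.2]
age_shares_b = [0.2, 0.3, 0.5]


def weighted_prevalence(estimates, weights):
    """Weighted sum of age-specific estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Crude estimates use each population's own age distribution.
crude_a = weighted_prevalence(age_specific_pct, age_shares_a)
crude_b = weighted_prevalence(age_specific_pct, age_shares_b)

# Adjusted estimates use the same standard weights for both populations.
adjusted_a = weighted_prevalence(age_specific_pct, std_weights)
adjusted_b = weighted_prevalence(age_specific_pct, std_weights)

print(f"Crude:    A = {crude_a:.3f}%, B = {crude_b:.3f}%")
print(f"Adjusted: A = {adjusted_a:.3f}%, B = {adjusted_b:.3f}%")
```

In this made-up case the crude estimates differ only because population B is older, and age adjustment removes that difference.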


Highlights from the output include:


Step 2: Generate Age-Adjusted Means (Optional)

See the optional step, "Generate Age-Adjusted Means".



