Task 1b: How to Generate Age-Adjusted Prevalence Rates and Means Using SAS 9.2 Survey Procedures

In this task, you will generate age-adjusted prevalence rates and standard errors for high blood pressure (HBP) by sex and race in persons 20 years and older. An optional second example is available demonstrating how to generate age-adjusted means and standard errors for Body Mass Index (BMI) by sex and race/ethnicity for persons 20 years and older.

To calculate age-adjusted prevalence rates, you need to choose the age-standardizing proportions and then apply them to the populations under comparison. This is called the direct method of age standardization. Typically, Census data are used as the standard population structure. For age standardization in NHANES, NCHS recommends using the 2000 Census population. A spreadsheet with the year 2000 U.S. population structure by age is attached below. The standard age proportions are calculated by dividing the age-specific Census population (P) by the total Census population (T). The standardizing proportions (P/T) should sum to 1 (see the table below for the standard age proportions used in this module).

Attachment

For your convenience, standard proportions for different NHANES population age groupings are provided in the Excel spreadsheet attached below. This file uses the 2000 Census as the standard population.  The adjustment factors were calculated for four age groupings:

1. all ages,
2. ages 6 years and older,
3. ages 20 years and older using 10-year age intervals, and
4. ages 20 years and older using 20-year age intervals (used for the blood pressure example in this module).

For other age groupings, you can combine the smaller age groups provided in order to reflect the age and subpopulation you are using in your analysis.

Example of How to Calculate Standard Age Proportions

Here is an example of how to calculate the standard age proportions by dividing the age-specific Census population (P) by the total Census population number (T). The standardizing proportions should sum to 1.

Standard Proportions for 20-year Age Groups Based on the 2000 U.S. Census Standard Population

Age Group   Age-Specific Census Population   Total Census Population   Standard Age Proportion
            (in thousands), P                (in thousands), T         P/T
20-39       77,670                           195,850                   .396579
40-59       72,816                           195,850                   .371795
60+         45,364                           195,850                   .231626
Total       195,850                                                    Sum: 1

As you can see, each "standard age proportion", also referred to as an "age adjustment weight", is simply the proportion of people in the 2000 Census (the standard population) who fall in a specific age category. For example, the standard age proportion for people 20-39 years old is 77,670 / 195,850 = .396579.
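The same division can be scripted. The following is a minimal SAS sketch, not part of the original tutorial programs, that recomputes the three standard proportions from the Census counts in the table above. The dataset names std_pop and std_prop and the variable names agegrp, pop, total, and prop are hypothetical.

```sas
/* Census counts (in thousands) for the three 20-year age groups */
data std_pop;
   input agegrp $ pop;
   datalines;
20-39 77670
40-59 72816
60+   45364
;
run;

/* Divide each age-specific count (P) by the total (T).
   PROC SQL remerges the summary total onto every row. */
proc sql;
   create table std_prop as
   select agegrp,
          pop,
          sum(pop) as total,
          pop / sum(pop) as prop format=8.6
   from std_pop;
quit;

proc print data=std_prop noobs;
run;
```

The prop column should reproduce the P/T values in the table, and its three values should sum to 1.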

Reference

Klein RJ, Schoenborn CA. Age adjustment using the 2000 projected U.S. population. Healthy People Statistical Notes, no. 20. Hyattsville, Maryland: National Center for Health Statistics. January 2001.

Step 1: Recode High Blood Pressure Variable

You will recode the discrete variable, hbp, as (0, 100), for absence (0) or presence (100) of the health condition of interest, to use in the SAS Surveyreg procedure.

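data analysis_data;  /* dataset name assumed to match the DATA= option used in Step 2 */

set analysis_data;

if hbp=1 then hbpx=100;

else if hbp=2 then hbpx=0;

run;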
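The reason for the 0/100 coding is that the mean of such a variable equals the percentage of persons with the condition, so a procedure that estimates means returns a prevalence directly. A minimal unweighted illustration follows. The dataset toy and its five records are hypothetical and are not NHANES data.

```sas
/* Hypothetical toy data: hbp = 1 (condition present) or 2 (absent) */
data toy;
   input hbp @@;
   if hbp = 1 then hbpx = 100;
   else if hbp = 2 then hbpx = 0;
   datalines;
1 2 2 1 2
;
run;

/* Two of the five records have hbp = 1, so the mean of hbpx is 40,
   which reads directly as 40 percent. */
proc means data=toy mean;
   var hbpx;
run;
```

In the survey procedures the same logic applies, except that the mean is weighted and the standard error reflects the sample design.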

Step 2: Generate Age-Adjusted Prevalence Rates

The SAS Surveyreg procedure is used to generate age-adjusted percentages (prevalence rates) and standard errors. The SAS Survey program used to obtain weighted age-adjusted prevalence rates and standard errors for high blood pressure by race, among persons 20 years and older follows here.

 These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.
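The programs in this task assume that analysis_data already contains a domain indicator (sel) and a three-level age group variable (age). Their construction is not shown on this page, so the following is only a sketch of one way they could be derived. It assumes that age in years is held in the NHANES variable ridageyr, and the numeric codes chosen for sel and age are assumptions rather than the tutorial's own.

```sas
/* Sketch only: the codes assigned to sel and age are assumptions */
data analysis_data;
   set analysis_data;

   /* Domain indicator: 1 = aged 20 years and older, 2 = everyone else.
      All records stay in the file so the design information is complete. */
   if ridageyr >= 20 then sel = 1;
   else sel = 2;

   /* Three 20-year age groups, coded so they sort as 20-39, 40-59, 60+ */
   if 20 <= ridageyr < 40 then age = 1;
   else if 40 <= ridageyr < 60 then age = 2;
   else if ridageyr >= 60 then age = 3;
run;
```

The sort order of the age codes matters because the coefficients on the estimate statements are matched to the class levels in sorted order.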

SAS Survey Procedure for Generating Age-Adjusted Prevalence Rates

Statements Explanation
PROC SURVEYREG DATA=analysis_data nomcar;

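Use the SAS Survey procedure, proc surveyreg, to fit the regression model used to compute the age-adjusted estimates. Use the nomcar option so that observations with missing values are treated as not missing completely at random in the Taylor series variance estimation.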

STRATA sdmvstra;

Use the strata statement to specify the strata (sdmvstra) and account for design effects of stratification.

CLUSTER sdmvpsu;

Use the cluster statement to specify PSU (sdmvpsu) to account for design effects of clustering.

CLASS race age;

Use the class statement to identify the categorical variables in the model (i.e., race [race] and age [age]).

WEIGHT wtmec4yr;

Use the weight statement to account for the unequal probability of sampling and non-response.  In this example, the MEC weight for 4 years of data (wtmec4yr) is used.

DOMAIN sel;

Use the domain statement to specify the subpopulations of interest.

When using proc surveyreg, use a domain statement to select the population of interest. Do not use a where or by-group statement to analyze subpopulations with the SAS Survey Procedures, because subsetting the data discards design information needed for correct variance estimation.

MODEL hbpx=race age race*age /noint solution VADJUST=none;

Use a model statement with the noint option to produce HBP means for the 12 possible race and age combinations (race has four groups and age has three groups, so there are 4 × 3 = 12 groups). The solution option prints the estimated model parameters. The vadjust=none option requests that no degrees-of-freedom adjustment be applied to the variance estimate.

ESTIMATE 'NH White' race 1 0 0 0 age .3966 .3718 .2316 race*age .3966 .3718 .2316 0 0 0 0 0 0 0 0 0;

Use the estimate statement to produce the age-adjusted prevalence of HBP for non-Hispanic whites. Please refer to the estimate statement in the SAS Manual for more information about using vectors. The vector 1 0 0 0 (vectors are location indicators) points to the non-Hispanic whites; the values .3966, .3718, and .2316 are the proportions of adults aged 20-39, 40-59, and 60+ years in the 2000 U.S. standard population (Klein and Schoenborn, 2001).

ESTIMATE 'NH Black' race 0 1 0 0 age .3966 .3718 .2316 race*age 0 0 0 .3966 .3718 .2316 0 0 0 0 0 0;

Use the estimate statement to produce the age-adjusted prevalence of HBP for non-Hispanic blacks. The vector 0 1 0 0 points to the non-Hispanic blacks; the values .3966, .3718, and .2316 are the proportions of adults aged 20-39, 40-59, and 60+ years in the 2000 U.S. standard population (Klein and Schoenborn, 2001).

ESTIMATE 'Mex Amer' race 0 0 1 0 age .3966 .3718 .2316 race*age 0 0 0 0 0 0 .3966 .3718 .2316 0 0 0;

Use the estimate statement to produce the age-adjusted prevalence of HBP for Mexican-Americans. The vector 0 0 1 0 points to the Mexican-Americans; the values .3966, .3718, and .2316 are the proportions of adults aged 20-39, 40-59, and 60+ years in the 2000 U.S. standard population (Klein and Schoenborn, 2001).

TITLE 'Age-standardized prevalence of persons 20 years and older with high blood pressure: NHANES 1999-2002';

Use the title statement to label the output.

var estimate label estimate stderr;

title 'Age-standardized prevalence of persons 20 years and older with high blood pressure: NHANES 1999-2002';

run;

Use the proc print procedure to print the estimate and standard error.

 Note:  Program code to produce age-adjusted estimates by race-ethnicity is provided above. To see program code to produce age-adjusted estimates by race-ethnicity and gender and for gender only, please go to the Sample Code and Datasets page to download the programs.
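To see what the estimate statements compute, it can help to write out the linear combination. The notation below is introduced here for illustration and assumes the usual SAS GLM-style parameterization of the no-intercept model, with a race effect $\alpha_i$, an age effect $\beta_j$, and an interaction effect $\gamma_{ij}$; $w_j$ denotes the standard age proportions.

```latex
% Model-based prevalence for race group i and age group j
\hat{p}_{ij} = \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_{ij}

% Age-adjusted prevalence for race group i, with w = (.3966, .3718, .2316)
\hat{p}_i^{\,\mathrm{adj}}
  = \sum_{j=1}^{3} w_j \,\hat{p}_{ij}
  = \hat{\alpha}_i
    + \sum_{j=1}^{3} w_j \,\hat{\beta}_j
    + \sum_{j=1}^{3} w_j \,\hat{\gamma}_{ij},
\qquad \sum_{j=1}^{3} w_j = 1
```

Because the weights sum to 1, the coefficient on the race level is 1. Each estimate statement therefore places a 1 on the race level, the three weights on age, and the same three weights in the race*age positions that belong to that race level.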

The code for estimating the crude (unadjusted) prevalence for HBP by race/ethnicity and gender follows:

 These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

SAS Survey Procedure for Generating Unadjusted Prevalence Rates

Statements Explanation
proc surveymeans data=analysis_data mean nobs stderr;

Use the proc surveymeans procedure to obtain the number of observations, mean, and standard error.

strata sdmvstra;

Use the strata statement to define the strata variable (sdmvstra).

cluster sdmvpsu;

Use the cluster statement to define the PSU variable (sdmvpsu).

class riagendr race;

Use the class statement to specify the discrete variables used to select the subpopulations of interest (i.e., gender [riagendr] and race [race]).

weight wtmec4yr;

Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used.

var hbpx;

Use the var statement to specify which  variable(s) will be analyzed. In this example, the high blood pressure variable (hbpx) is used.

domain sel sel*riagendr sel*race sel*riagendr*race;

Use the domain statement to specify the subpopulations of interest.

run;

Use the ods statement to output SAS datasets of estimates from the subdomains listed on the domain statement. This set of commands will output four datasets, one for each domain specified in the domain statement above (unadj for sel, unadj1 for sel*riagendr, unadj2 for sel*race, and unadj3 for sel*riagendr*race).

data stats;

if sel=1;

Use the data statement to name the temporary SAS dataset (stats) and append the four datasets created in the previous step, keeping only the records for persons aged 20 years and older (sel=1).

proc print;

var race riagendr n mean stderr;

run;

Use the print statement to print the number of observations, the mean, and standard error of the mean in a printer-friendly format.
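Because the statements above are presented piece by piece, it may help to see them assembled into one program. The following is a sketch rather than the tutorial's exact code. In particular, the ods output statement and the set statement do not appear in the listing above, so the ODS table name Domain, the match_all option, and the list of datasets on the set statement are assumptions inferred from the dataset names described in the explanation.

```sas
/* Assumed ODS request: match_all writes one dataset per domain table,
   named unadj, unadj1, unadj2, unadj3 */
ods output domain(match_all)=unadj;

proc surveymeans data=analysis_data mean nobs stderr;
   strata sdmvstra;
   cluster sdmvpsu;
   class riagendr race;
   weight wtmec4yr;
   var hbpx;
   domain sel sel*riagendr sel*race sel*riagendr*race;
run;

/* Assumed set statement: stack the four domain datasets and
   keep only the domain of persons aged 20 years and older */
data stats;
   set unadj unadj1 unadj2 unadj3;
   if sel = 1;
run;

proc print data=stats;
   var race riagendr n mean stderr;
run;
```

Rows that come from a domain table without race or riagendr will show missing values for that variable, which identifies the overall and single-factor estimates in the printed output.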

Highlights from the output include:

• The output lists the adjusted percentages (prevalence rates) and their standard errors.
• Hypertension prevalence in Mexican Americans changed from a crude prevalence of 17% to 26% in the age-standardized estimate. The Mexican Americans are younger with a mean age of 38 years, and because hypertension increases with age, the age-adjusted estimate among Mexican Americans is higher than the crude estimate.
• Similarly, the non-Hispanic whites are somewhat older, so their age-adjusted prevalence is lower than the crude estimate.
• According to the unadjusted estimates, the difference between HBP prevalence for Mexican-American and non-Hispanic white groups is approximately 12 percentage points.
• However, the age-adjusted estimates show a difference of only about 2 percentage points between the Mexican-American and non-Hispanic white groups.
• Non-Hispanic blacks have a higher age-adjusted prevalence of HBP (41%) than other race/ethnicity groups.
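The direction of these shifts follows from how the two estimates weight the age-specific prevalences. In the notation below, which is introduced here for illustration, $p_{ij}$ is the prevalence in race/ethnicity group $i$ and age group $j$, $s_{ij}$ is the share of group $i$'s own population that falls in age group $j$, and $w_j$ is the standard age proportion.

```latex
% Crude prevalence: age-specific rates weighted by the group's own age structure
p_i^{\mathrm{crude}} = \sum_{j} s_{ij}\, p_{ij}

% Age-adjusted prevalence: the same rates weighted by the standard age structure
p_i^{\mathrm{adj}} = \sum_{j} w_j\, p_{ij}

% The gap depends only on how the two age structures differ
p_i^{\mathrm{adj}} - p_i^{\mathrm{crude}} = \sum_{j} \left( w_j - s_{ij} \right) p_{ij}
```

For a group that is younger than the standard population, $s_{ij}$ exceeds $w_j$ in the youngest age group, where prevalence is lowest, so the crude figure falls below the adjusted one. For an older group the reverse holds.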

Step 3: Generate Age-Adjusted Means (Optional)

The SAS Surveyreg procedure is used to generate age-adjusted means and standard errors. The SAS Survey program used to obtain weighted adjusted means and standard errors for BMI, by race among persons 20 years and older follows here.

 These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

SAS Survey Procedure for Generating Adjusted Means

Statements Explanation
PROC SURVEYREG DATA=analysis_data nomcar;

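Use the SAS Survey procedure, proc surveyreg, to fit the regression model used to compute the age-adjusted estimates. Use the nomcar option so that observations with missing values are treated as not missing completely at random in the Taylor series variance estimation.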

STRATA sdmvstra;

Use the strata statement to specify the strata (sdmvstra) and account for design effects of stratification.

CLUSTER sdmvpsu;

Use the cluster statement to specify PSU (sdmvpsu) to account for design effects of clustering.

CLASS race age;

Use the class statement to identify the categorical variables in the model (i.e., race [race] and age [age]).

WEIGHT wtmec4yr;

Use the weight statement to account for the unequal probability of sampling and non-response.  In this example, the MEC weight for 4 years of data (wtmec4yr) is used.

DOMAIN sel;

Use the domain statement to specify the subpopulations of interest.

MODEL bmxbmi=race age race*age /noint solution vadjust=none;

Use a model statement with the noint option to produce BMI means for the 12 possible race and age combinations (race has four groups and age has three groups, so there are 4 × 3 = 12 groups). The solution option prints the estimated model parameters. The vadjust=none option requests that no degrees-of-freedom adjustment be applied to the variance estimate.

ESTIMATE 'NH White' race 1 0 0 0 age .3966 .3718 .2316 race*age .3966 .3718 .2316 0 0 0 0 0 0 0 0 0;

Use the estimate statement to produce the age-adjusted mean BMI for non-Hispanic whites. Please refer to the estimate statement in the SAS Manual for more information about using vectors. The vector 1 0 0 0 (vectors are location indicators) points to the non-Hispanic whites; the values .3966, .3718, and .2316 are the proportions of adults aged 20-39, 40-59, and 60+ years in the 2000 U.S. standard population (Klein and Schoenborn, 2001).

ESTIMATE 'NH Black' race 0 1 0 0 age .3966 .3718 .2316 race*age 0 0 0 .3966 .3718 .2316 0 0 0 0 0 0;

Use the estimate statement to produce the age-adjusted mean BMI for non-Hispanic blacks. The vector 0 1 0 0 points to the non-Hispanic blacks; the values .3966, .3718, and .2316 are the proportions of adults aged 20-39, 40-59, and 60+ years in the 2000 U.S. standard population (Klein and Schoenborn, 2001).

ESTIMATE 'Mex Amer' race 0 0 1 0 age .3966 .3718 .2316 race*age 0 0 0 0 0 0 .3966 .3718 .2316 0 0 0;

Use the estimate statement to produce the age-adjusted mean BMI for Mexican-Americans. The vector 0 0 1 0 points to the Mexican-Americans; the values .3966, .3718, and .2316 are the proportions of adults aged 20-39, 40-59, and 60+ years in the 2000 U.S. standard population (Klein and Schoenborn, 2001).

TITLE 'Age-adjusted means & standard errors of body mass index: NHANES 1999-2002';

Use the title statement to label the output.

var estimatelabel estimate stderr;

title 'Age-adjusted means & standard errors of body mass index: NHANES 1999-2002';

run;

Use the proc print procedure to print the estimate and standard error.

 Note:  Program code to produce age-adjusted estimates by race-ethnicity is provided above. To see program code to produce age-adjusted estimates by race-ethnicity and gender and for gender only, please go to the Sample Code and Datasets page to download the programs.
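The model has four race levels, but estimate statements are shown for only three of them. A statement for the remaining level would follow the same pattern, as in the hedged sketch below. It assumes that the fourth level sorts last among the race class levels, and the label 'Other' is a placeholder rather than the tutorial's own wording. The statement is a fragment that belongs inside the proc surveyreg step, after the model statement.

```sas
/* Assumed: the fourth race level sorts last; 'Other' is a placeholder label.
   The 12 race*age coefficients are nine zeros (levels 1-3) followed by
   the three standard age proportions (level 4). */
ESTIMATE 'Other' race 0 0 0 1
                 age .3966 .3718 .2316
                 race*age 0 0 0 0 0 0 0 0 0 .3966 .3718 .2316;
```

Whether an estimate for a residual category is worth reporting depends on its sample size in each age group.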

The code for estimating the crude (unadjusted) mean Body Mass Index by race/ethnicity and gender follows:

SAS Survey Procedure for Generating Unadjusted Means

Statements Explanation
proc surveymeans data=ANALYSIS_DATA nobs mean stderr;

Use the proc surveymeans procedure to obtain the number of observations, mean, and standard error.

strata sdmvstra;

Use the strata statement to define the strata variable (sdmvstra).

cluster sdmvpsu;

Use the cluster statement to define the PSU variable (sdmvpsu).

class riagendr race;

Use the class statement to specify the discrete variables used to select the subpopulations of interest (i.e., gender [riagendr] and race [race]).

var bmxbmi;

Use the var statement to specify which  variable(s) will be analyzed. In this example, the Body Mass Index variable (bmxbmi) is used.

weight wtmec4yr;

Use the weight statement to account for the unequal probability of sampling and non-response. In this example, the MEC weight for 4 years of data (wtmec4yr) is used.

domain sel sel*riagendr sel*race sel*riagendr*race;

Use the domain statement to specify the subpopulations of interest.

run;

Use the ods statement to output SAS datasets of estimates from the subdomains listed on the domain statement. This set of commands will output four datasets, one for each domain specified in the domain statement above (unadj for sel, unadj1 for sel*riagendr, unadj2 for sel*race, and unadj3 for sel*riagendr*race).

data stats;

if sel=1;

Use the data statement to name the temporary SAS dataset (stats) and append the four datasets created in the previous step, keeping only the records for persons aged 20 years and older (sel=1).

proc print;

var race riagendr n mean stderr;

title "Mean Body Mass Index: NHANES 1999-2002";

run;

Use the print statement to print the number of observations, the mean, and standard error of the mean in a printer-friendly format.

Highlights from the output include:

• The output lists the sample sizes, adjusted means, and their standard errors.
• After adjusting for age, there appears to be no significant difference in BMI by race/ethnicity.
• The unadjusted and adjusted means for BMI by race/ethnicity do not appear to be different.