Task 1c: How to Generate Age-Adjusted Proportions or Prevalence Rates and Means Using Stata

In this module, you will generate age-adjusted prevalence rates and standard errors for high blood pressure (HBP) in persons 20 years and older in the United States, by sex and race/ethnicity. An optional second example demonstrates how to generate age-adjusted means and standard errors for body mass index (BMI) in the same population, by sex and race/ethnicity.

To calculate age-adjusted prevalence rates, you need to choose the age-standardizing proportions and then apply them to the populations being compared. This is called the direct method of age standardization. Typically, Census data are used as the standard population structure. For age standardization in NHANES, the National Center for Health Statistics (NCHS) recommends using the 2000 Census population. A spreadsheet with the year 2000 U.S. population structure by age is attached below. Calculate the standard age proportions by dividing the age-specific Census population (P) by the total Census population (T). The standardizing proportions (P/T) should sum to 1. See the table in Step 2 for the standard age proportions used in this module.
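As a sketch of the direct method described above, the age-adjusted rate can be written as a weighted sum of age-specific rates, with the standard proportions as the weights. The symbols r_i (the survey estimate of the rate in age group i) and k (the number of age groups) are notation introduced here for illustration; P_i and T are the Census counts defined above.

```latex
\[
\text{age-adjusted rate} \;=\; \sum_{i=1}^{k} \frac{P_i}{T}\, r_i ,
\qquad \text{where} \quad \sum_{i=1}^{k} \frac{P_i}{T} = 1
\]
```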

 

Step 1: Use svyset to define survey design variables

Remember that you must run svyset to define the survey design before using the svy series of commands. The general format of the command is:

svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(variance method)

 

To define the survey design variables for your high blood pressure analysis, use the weight variable for 4 years of MEC data (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default, "linearized", is Taylor linearization. Here is the svyset command for 4 years of MEC data:

svyset [w= wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
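The command above passes the PSU as an option. Current Stata documentation describes a form in which the PSU variable is listed first; the sketch below shows that form on a tiny made-up data set (the four rows are illustrative values, not NHANES data) and then redisplays the stored settings so you can check them.

```stata
* Toy data using the NHANES design variable names; the values are made up.
clear
input sdmvstra sdmvpsu wtmec4yr
1 1 1000
1 2 1500
2 1 1200
2 2  800
end

* PSU-first form; pweight marks wtmec4yr as a sampling weight.
svyset sdmvpsu [pweight = wtmec4yr], strata(sdmvstra) vce(linearized)

* Typing svyset alone redisplays the current settings.
svyset

* Summarize strata and PSU counts to confirm the design was read as intended.
svydescribe
```

With the real NHANES file in memory, only the svyset, svyset, and svydescribe lines would be needed.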

 

Step 2: Create age standard proportions

For age standardization in NHANES, NCHS recommends using the 2000 Census population. To get the correct Census age distribution, you need to know two things: the age group of interest (e.g., all ages, ages 6 and older, adults 20 and older) and how wide the age strata are for adjustment (e.g., 5-year, 10-year, or 20-year age intervals). In general, the more tightly you want to control for age, the narrower the age strata should be.

Attachment

For your convenience, standard proportions for different NHANES population age groupings are provided in the Excel spreadsheet attached below. This file uses the 2000 Census as the standard population.  The adjustment factors were calculated for four age groupings:

  1. all ages,
  2. ages 6 years and older,
  3. ages 20 years and older using 10-year age intervals, and
  4. ages 20 years and older using 20-year age intervals (used for the blood pressure example in this module).

For other age groupings, you can combine the smaller age groups provided in order to reflect the age and subpopulation you are using in your analysis.

Standard Proportions for NHANES Population Groupings link: ageadjtwt.xls

 

Example of How to Calculate Standard Age Proportions

Here is an example of how to calculate the standard age proportions by dividing the age-specific Census population (P) by the total Census population number (T). The standardizing proportions should sum to 1.

 

Standard Age Proportions
Standard Proportions for 20-year Age Groups Based on the 2000 U.S. Census Standard Population

Age Group | Age-Specific Census Population, P (in thousands) | Total Census Population, T (in thousands) | P/T
20-39 | 77,670 | 195,850 | .396579
40-59 | 72,816 | 195,850 | .371795
60+ | 45,364 | 195,850 | .231626
Total | 195,850 | | Sum: 1

 

As you can see, each "standard age proportion", also called an "age adjustment weight", is simply the proportion of people in the 2000 Census (the standard population) who fall in a specific age category. For example, the standard age proportion for people 20-39 years old is:

\[
\frac{77{,}670 \text{ thousand people aged 20-39 years}}{195{,}850 \text{ thousand people aged 20 years and older}} = 0.396579
\]
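The same arithmetic can be done for all three age groups at once. The following is a minimal sketch that types in the Census counts from the table above and divides each by their total; the variable names agegrp, P, T, and std_prop are chosen here for illustration.

```stata
* Census counts (in thousands) for the three 20-year age groups.
clear
input str5 agegrp P
"20-39" 77670
"40-59" 72816
"60+"   45364
end

* T is the total population aged 20 and older, repeated on every row.
egen double T = total(P)

* Standard age proportion for each group.
gen double std_prop = P / T
format std_prop %8.6f
list agegrp P T std_prop, noobs

* The proportions should sum to 1.
quietly summarize std_prop
display "Sum of proportions: " r(sum)
```

The values in std_prop, rounded to four decimal places, are the ones assigned to std_wgt in Step 3.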

Reference

Klein RJ, Schoenborn CA. Age adjustment using the 2000 projected U.S. population. Healthy People Statistical Notes, no. 20. Hyattsville, Maryland: National Center for Health Statistics. January 2001.

 

Step 3: Create new variables for analysis

You will need to create variables for age, race, standard weight, and high blood pressure. First, create the age and race/ethnicity group variables. Next, assign the Census proportion for each age stratum using the std_wgt variable; statistical manuals usually call this variable the standard weight. Finally, code the outcome as a dichotomous variable, where the absence of the outcome is coded as 0 and the presence of the outcome is coded as 100. Using 100 expresses the proportion as a percentage (e.g., 0.23 is reported as 23). The dichotomous variable hbp is already coded as 2 for the absence of the outcome and 1 for its presence, so it needs to be recoded into a new variable, hbpx. Here is the code for creating the variables:

Code to generate variables, listed by variable:
Age

gen age=1 if ridageyr >=20 & ridageyr <40
replace age=2 if ridageyr >=40 & ridageyr <60
replace age=3 if ridageyr >=60 & ridageyr <.

Race

gen race =1 if ridreth1 == 3
replace race =2 if ridreth1 == 4
replace race =3 if ridreth1 == 1
replace race =4 if ridreth1 == 2 | ridreth1 ==5

Standard Weight

gen std_wgt=.3966 if age==1
replace std_wgt=.3718 if age==2
replace std_wgt=.2316 if age==3

High Blood Pressure

gen hbpx=100 if hbp==1
replace hbpx=0 if hbp==2
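After creating the variables, it can help to confirm that the recodes behaved as intended. The sketch below repeats the age, standard weight, and high blood pressure recodes on six made-up records (illustrative values, not NHANES data) and runs a few checks; with the real data in memory, only the tabulate, assert, and display lines would be needed.

```stata
* Toy records; ridageyr and hbp follow the coding described above.
clear
input ridageyr hbp
19 2
25 2
45 1
59 2
60 1
85 1
end

gen age = 1 if ridageyr >= 20 & ridageyr < 40
replace age = 2 if ridageyr >= 40 & ridageyr < 60
replace age = 3 if ridageyr >= 60 & ridageyr < .

gen std_wgt = .3966 if age == 1
replace std_wgt = .3718 if age == 2
replace std_wgt = .2316 if age == 3

gen hbpx = 100 if hbp == 1
replace hbpx = 0 if hbp == 2

* People under 20 should be missing on age; every adult needs a standard weight.
tabulate age, missing
assert missing(age) if ridageyr < 20
assert !missing(std_wgt) if !missing(age)

* Each hbp code should map to exactly one hbpx value.
tabulate hbp hbpx, missing

* The three rounded standard weights should still sum to 1.
display .3966 + .3718 + .2316
```

If either assert fails, Stata stops with an error, which makes these checks convenient to leave in a do-file.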

 

Step 4: Generate age-adjusted proportions

The Stata command svy: mean is used to generate age-adjusted proportions and standard errors. Using svy: mean is not a mistake: a proportion is the mean of a dichotomous variable, and because hbpx is coded 0/100, its mean is the percentage with high blood pressure.

The general form of the command is just like the mean command from descriptive statistics but uses the stdize and stdweight options.

svy, subpop(condition):  mean depvar, stdize(agevar) stdweight(ageweightvar)
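To make the role of stdize and stdweight concrete, the sketch below compares the svy: mean result with a hand calculation: the weighted mean of the outcome within each age group, multiplied by that group's standard weight, then summed. The twelve records, and the names stratum, psu, and wt, are made up for illustration and are not NHANES data.

```stata
* Toy design: 2 strata, 2 PSUs per stratum, one person per age group in each PSU.
clear
input stratum psu wt age hbpx
1 1 10 1   0
1 1 12 2 100
1 1 11 3 100
1 2  9 1   0
1 2 14 2   0
1 2 10 3 100
2 1 13 1 100
2 1  8 2   0
2 1 12 3   0
2 2 10 1   0
2 2 11 2 100
2 2  9 3 100
end

gen double std_wgt = .3966 if age == 1
replace std_wgt = .3718 if age == 2
replace std_wgt = .2316 if age == 3

svyset psu [pweight = wt], strata(stratum) vce(linearized)

* Age-adjusted estimate from Stata.
svy: mean hbpx, stdize(age) stdweight(std_wgt)
scalar adj_svy = _b[hbpx]

* Hand calculation: sum over age groups of (standard weight x weighted mean).
scalar adj_hand = 0
forvalues a = 1/3 {
    quietly summarize hbpx if age == `a' [aweight = wt]
    scalar grp_mean = r(mean)
    quietly summarize std_wgt if age == `a'
    scalar adj_hand = adj_hand + r(mean) * grp_mean
}

display "svy: mean = " adj_svy "   by hand = " adj_hand
```

The hand calculation reproduces only the point estimate; the standard error still has to come from svy: mean, which accounts for the survey design.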

 

Here is the Stata command and output for the age-adjusted prevalence of high blood pressure, with standard errors, for men and women age 20 years and older:

svy, subpop(if ridageyr >=20 & ridageyr <.): mean hbpx, stdize(age) stdweight(std_wgt) over(riagendr)

[Figure: Stata output of mean high blood pressure by gender]

And here is the Stata command and output for the age-adjusted prevalence of high blood pressure, with standard errors, by race/ethnicity (non-Hispanic white, non-Hispanic black, Mexican American, and other) for persons age 20 years and older:

svy, subpop(if ridageyr >=20 & ridageyr <.): mean hbpx, stdize(age) stdweight(std_wgt) over(race)

[Figure: Stata output of mean high blood pressure by race/ethnicity]

IMPORTANT NOTE

To calculate the unadjusted prevalence, use the program code above, EXCEPT DO NOT USE the stdize and stdweight options.
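As an illustration of the note above, the sketch below runs the crude and the age-adjusted commands side by side on a small generated data set. The data, and the names stratum, psu, and wt, are made up for illustration; the subpop() restriction is left out only because every toy record is treated as an adult.

```stata
* Made-up design: 2 strata x 2 PSUs x 6 people; not NHANES data.
clear
set obs 24
gen stratum  = 1 + (_n > 12)
gen psu      = 1 + mod(ceil(_n / 6) - 1, 2)
gen riagendr = 1 + mod(_n, 2)
gen age      = 1 + mod(ceil(_n / 2) - 1, 3)
gen wt       = 10 + mod(_n, 5)
gen hbpx     = 100 * (mod(_n, 3) == 0)

gen double std_wgt = .3966 if age == 1
replace std_wgt = .3718 if age == 2
replace std_wgt = .2316 if age == 3

svyset psu [pweight = wt], strata(stratum) vce(linearized)

* Crude (unadjusted) prevalence by sex: no stdize() or stdweight().
svy: mean hbpx, over(riagendr)
matrix crude = e(b)

* Age-adjusted prevalence by sex, for comparison.
svy: mean hbpx, stdize(age) stdweight(std_wgt) over(riagendr)
matrix adjusted = e(b)

matrix list crude
matrix list adjusted
```

Saving e(b) after each command keeps both sets of estimates available for a comparison table like the one in Step 5.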

 

Step 5: Compare results of crude and age-adjusted estimates

To understand how much age standardization matters, it is helpful to compare the estimates from the crude and age-adjusted analyses. The following tables summarize the results:

Crude proportion with hypertension

Variable | Mean Age | % with hypertension | Standard error
Gender | | |
Male | 45 | 27% | 1.22
Female | 47 | 31% | 1.07
Race | | |
Non-Hispanic white | 48 | 30% | 1.15
Non-Hispanic black | 43 | 37% | 1.51
Mexican American | 38 | 17% | 1.21
Other | 43 | 28% | 2.30

Age-adjusted proportion with hypertension

Variable | Mean Age | % with hypertension | Standard error
Gender | | |
Male | 45 | 28% | 1.21
Female | 47 | 30% | 0.71
Race | | |
Non-Hispanic white | 48 | 28% | 0.97
Non-Hispanic black | 43 | 41% | 1.05
Mexican American | 38 | 26% | 0.98
Other | 43 | 31% | 2.25

 

Highlights from the output include:

 

Optional Step: Generate age-adjusted means

See the optional step, "Generate Age-Adjusted Means".
