Task 2: How to Generate Percentiles in SUDAAN

In this example, you will use SAS-callable SUDAAN to generate percentiles and standard errors for total cholesterol levels of persons 20 years and older by sex and age group.

Step 1: Sort data

To calculate the percentiles and standard errors, you will use SAS-callable SUDAAN because this software takes into account the complex survey design of NHANES data when determining variance estimates. The dataset analysis_data must be sorted by strata first and then by PSU (unless it has already been sorted by PSU within strata). The SAS proc sort statement must precede the SUDAAN statements.

WARNING

The design variables, sdmvstra and sdmvpsu, are provided in the demographic data files and are used to calculate variance estimates. Before you call SUDAAN from SAS, the data must first be sorted by these variables.

Step 2:  Use proc descript to generate percentiles in SUDAAN

The SUDAAN procedure proc descript is used to generate percentiles and standard errors. These estimates are requested on the print statement along with the sample size (nsum). The general program for obtaining weighted percentiles and standard errors is below.

IMPORTANT NOTE

These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.
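As a rough illustration of what formatting the variables the same way can involve, the sketch below defines two formats and derives a three-level age variable coded 1 to 3, matching the levels statement used later. The format names sexfmt and agefmt and the age cut points (20-39, 40-59, 60 and older) are assumptions made for illustration only; take the actual definitions from the Tutorial Formats page. The coding of riagendr (1 = male, 2 = female) follows the NHANES demographic file.

```sas
/* Hypothetical formats; the names and the age labels are assumptions. */
proc format;
  value sexfmt 1 = 'Male'
               2 = 'Female';
  value agefmt 1 = '20-39'
               2 = '40-59'
               3 = '60+';
run;

/* Derive the categorical age variable used on the subgroup statement.  */
/* The cut points are assumed. Persons under 20 keep a missing value    */
/* and are excluded later by the subpopn statement, not deleted here.   */
data analysis_data;
  set analysis_data;
  if      20 <= ridageyr <= 39 then age = 1;
  else if 40 <= ridageyr <= 59 then age = 2;
  else if ridageyr >= 60       then age = 3;
run;
```

Keeping every record in the dataset at this stage, rather than deleting persons under 20, leaves the full design information available for variance estimation.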

Generate Percentiles in SUDAAN
Statements Explanation

PROC SORT DATA =analysis_data;
BY sdmvstra sdmvpsu ;

RUN ;

Use proc sort to sort the dataset by strata (sdmvstra) and PSU (sdmvpsu). The data= option names the dataset, analysis_data.

proc descript data=analysis_data design=wr;

Use the proc descript procedure to generate percentiles and specify the sample design using the design option WR (with replacement).

subpopn ridageyr >= 20 ;

Use the subpopn statement to select the sample persons 20 years and older (ridageyr >= 20) because only those individuals are of interest in this example. Please note that for accurate variance estimates, it is preferable to use subpopn in SUDAAN to select a subpopulation for analysis, rather than to select the study population in the SAS program while preparing the data file. Deleting records beforehand can remove design information (strata and PSUs) that SUDAAN needs to compute standard errors.

NEST  sdmvstra sdmvpsu;

Use the nest statement with strata (sdmvstra) and PSU (sdmvpsu) to account for the design effects.

weight wtmec4yr;

Use the weight statement to account for the unequal probability of sampling and non-response.  In this example, the MEC weight for 4 years of data (wtmec4yr) is used.

subgroup  riagendr age ;

Use the subgroup statement to list the categorical variables for which statistics are requested. This example uses gender (riagendr) and age (age). These variables will also appear in the table statement.

levels      2   3   ;

Use the levels statement to define the number of categories in each of the subgroup variables. The level must be an integer greater than 0. This example uses two genders and three age groups.

var lbxtc;

Use the var statement to name the variable(s) to be analyzed. In this example, the total cholesterol variable (lbxtc) is used.

percentile 5 25 50 75 95 ;

Use the percentile statement to request the percentiles to be estimated. This example requests the 5th, 25th, 50th, 75th, and 95th percentiles.

table   riagendr * age;

Use the table statement to specify cross-tabulations for which estimates are requested. If a table statement is not present, a one-dimensional distribution is generated for each variable in the subgroup statement. In this example, the estimates are for gender (riagendr) by age (age).

PRINT

nsum= "Sample Size"

qtile= "Quantile"

/ style=nchs

nsumfmt= F7.0

qtilefmt= F9.2

;

Use the print statement to assign names, format the statistics desired, and view the output. If the statement print is used alone, all of the default statistics are printed with default labels and formats.

In this example, the sample size (nsum) and quantile (qtile) are requested.

Note: For a complete list of statistics that can be requested on the print statement, see the SUDAAN User's Manual.

Use the style option equal to NCHS to produce output which parallels a table style used at NCHS.

rtitle "Percentiles of total cholesterol by sex and age: NHANES 1999-2002" ;

Use the rtitle statement to assign a heading for each page of output.
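For reference, the statements described above can be assembled into one program, sketched below. It assumes that analysis_data already contains the categorical variable age coded 1 to 3 and the other variables named in the table. The closing run statement and the slash that separates the requested statistics from the print options follow usual SAS-callable SUDAAN syntax; treat both as assumptions and check them against the SUDAAN manual for your version.

```sas
/* Sort by strata, then PSU, before calling SUDAAN. */
proc sort data=analysis_data;
  by sdmvstra sdmvpsu;
run;

/* Weighted percentiles of total cholesterol by sex and age group. */
proc descript data=analysis_data design=wr;
  subpopn ridageyr >= 20;          /* analyze persons 20 years and older  */
  nest sdmvstra sdmvpsu;           /* strata and PSU design variables     */
  weight wtmec4yr;                 /* 4-year MEC examination weight       */
  subgroup riagendr age;           /* categorical variables               */
  levels   2        3;             /* two genders, three age groups       */
  var lbxtc;                       /* total cholesterol                   */
  percentile 5 25 50 75 95;        /* percentiles to estimate             */
  table riagendr * age;            /* gender by age cross-tabulation      */
  print nsum="Sample Size" qtile="Quantile"
        / style=nchs nsumfmt=F7.0 qtilefmt=F9.2;  /* slash assumed, see note above */
  rtitle "Percentiles of total cholesterol by sex and age: NHANES 1999-2002";
run;
```

The subpopn statement, rather than a where clause or a subsetting data step, restricts the analysis to adults so that the full sample design remains available for variance estimation.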

Step 3: Review output

The output will list the sample sizes, percentiles and their standard errors.

• When reviewing the output, note that the 50th percentile is the median: an estimated 50% of the population represented by the sample has a total cholesterol measurement less than this value, and 50% has a measurement greater than it.