## Task 4a: How to Generate Proportions Using SUDAAN

In this example, you will look at the proportion of examined persons 20 years and older with measured high blood pressure by sex, age, and race-ethnicity.

### Step 1: Determine variables of interest

According to the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure, a person with hypertension is defined as either having elevated blood pressure (systolic pressure of at least 140 mmHg or diastolic pressure of at least 90 mmHg) or taking antihypertensive medication. You will need to define a categorical variable (hbp) indicating persons with high blood pressure (1 = high blood pressure; 2 = no high blood pressure).
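As a minimal sketch of that definition, a SAS data step along the following lines could create hbp. The input names mean_sbp, mean_dbp, and on_bp_med are hypothetical placeholders (they are not NHANES variable names) for the mean systolic reading, the mean diastolic reading, and a 1/0 medication indicator; substitute the variables in your own analysis file. The treatment of missing readings is also an assumption, not something this tutorial specifies.

```sas
data analysis_data;
  set analysis_data;
  /* mean_sbp, mean_dbp, on_bp_med: hypothetical input names */
  /* A missing value never satisfies >= in SAS, so it cannot trigger hbp = 1 */
  if mean_sbp >= 140 or mean_dbp >= 90 or on_bp_med = 1 then hbp = 1;
  /* Assign "no high blood pressure" only when both readings are present */
  else if not missing(mean_sbp) and not missing(mean_dbp) then hbp = 2;
  /* Otherwise leave hbp missing (an assumed rule) */
  else hbp = .;
run;
```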

### Step 2: Sort data

To calculate the proportions and standard errors, use SAS-callable SUDAAN because the software takes into account the complex survey design of NHANES data when determining variance estimates. If the standard errors are not needed, you could simply use a SAS procedure such as proc freq with the weight statement. The data from analysis_data must be sorted by strata first and then by PSU (unless the data have already been sorted by PSU within strata). The SAS proc sort statement must precede the SUDAAN statements.
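For the case where only the weighted percents are wanted, a proc freq call might look like the sketch below. It assumes hbp has already been created and reuses the weight and age variables from this example; the nocol and nopercent options are only there to keep the row percents. Because proc freq ignores the strata and PSU, any standard errors it would imply are not valid for NHANES.

```sas
proc freq data=analysis_data;
  where ridageyr >= 20;                   /* adults 20 years and older */
  tables riagendr*hbp / nocol nopercent;  /* weighted row percents of hbp within sex */
  weight wtmec4yr;                        /* 4-year MEC exam weight */
run;
```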

WARNING

The design variables sdmvstra and sdmvpsu are provided in the demographic data files and are used to calculate variance estimates. Before you call SUDAAN from SAS, the data must be sorted by these variables.

### Step 3: Use proc descript to generate proportions

In this example, you will use proc descript in SUDAAN to generate proportions. Previously, you created a categorical variable, hbp, to indicate whether or not a person had high blood pressure. That categorical variable will be identified in the procedure and the weighted percent (prevalence) of sample persons with the value hbp=1 (high blood pressure) will be estimated along with the standard error.

You can code your variables in this example in two possible ways. Using the catlevel option in SUDAAN, persons with high blood pressure, as defined above, are assigned a value of 1. All other sample persons are assigned a value of 2. The weighted percentage of sample persons with a value equal to 1 is an estimate of the prevalence of high blood pressure in the U.S. An alternate method of coding the variable is to assign persons with high blood pressure, as defined above, a value of 100, and persons without high blood pressure a value of 0. The weighted mean of this 0/100 variable across all sample persons (which is expressed as a percent) is an estimate of the prevalence of high blood pressure in the U.S. To see this method in SAS Survey Procedures, without the catlevel option, see Task 4b: How to Generate Proportions Using SAS Survey Procedures.
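The alternate 0/100 coding could be sketched as follows. The variable name hbp100 is hypothetical, and the mean and semean print keywords are used here on the understanding that they are the proc descript keywords for a mean and its standard error; check them against the SUDAAN manual before relying on this. The remaining statements mirror the program explained in this step.

```sas
data analysis_data;
  set analysis_data;
  /* hbp100 is a hypothetical name for the 0/100 recode of hbp */
  if hbp = 1 then hbp100 = 100;
  else if hbp = 2 then hbp100 = 0;
  else hbp100 = .;
run;

proc sort data=analysis_data;
  by sdmvstra sdmvpsu;
run;

proc descript data=analysis_data design=wr;
  subpopn ridageyr >= 20;
  nest sdmvstra sdmvpsu;
  weight wtmec4yr;
  subgroup riagendr age race;
  levels 2 3 4;
  var hbp100;                   /* no catlevel statement: the mean is the percent */
  table riagendr * age * race;
  print nsum="Sample Size" mean="Percent" semean="SE" / style=NCHS;
run;
```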

The SUDAAN procedure, proc descript, is used to generate percents and standard errors.  You request those estimates on the print statement along with the sample size (nsum). The general program for obtaining weighted percents and standard errors is shown below.

IMPORTANT NOTE

These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

Generate Proportions in SUDAAN

In the listing below, each statement (or group of statements) is followed by its explanation.

PROC SORT DATA =analysis_data;

BY sdmvstra sdmvpsu ;

RUN ;

Use the proc sort procedure to sort the dataset by strata (sdmvstra) and PSU (sdmvpsu). The data= option names the dataset, analysis_data.

PROC descript data= analysis_data design=wr ;

Use the proc descript procedure to generate proportions, and specify the sample design using the design option WR (with replacement).

subpopn ridageyr >=20 ;

Use the subpopn statement to select sample persons 20 years and older (ridageyr >=20) because only those individuals are of interest in this example. Please note that for accurate variance estimates, it is preferable to use subpopn in SUDAAN to select a subpopulation for analysis, rather than to select the study population in the SAS program while preparing the data file.

NEST  sdmvstra sdmvpsu;

Use the nest statement with strata (sdmvstra) and PSU (sdmvpsu) to account for the design effects.

weight wtmec4yr;

Use the weight statement to account for the unequal probability of sampling and non-response.  In this example, the MEC weight for 4 years of data (wtmec4yr) is used.

subgroup   riagendr age race;

Use the subgroup statement to list the categorical variables for which statistics are requested. This example uses gender (riagendr), age (age), and race/ethnicity (race). These variables also appear in the table statement.

levels 2   3   4   ;

Use the levels statement to define the number of categories in each of the subgroup variables. The level must be an integer greater than 0. This example uses two genders, three age groups, and four race/ethnicity categories.

var hbp;

Use the var statement to name the variable(s) to be analyzed. In this example, the high blood pressure variable (hbp) is used.

catlevel 1 ;

Use the catlevel statement to indicate that the variable(s) on the var statement are categorical and to select the level of each variable to be analyzed. This example indicates that the variable hbp is categorical and selects the level hbp=1, i.e., persons who have high blood pressure.

IMPORTANT NOTE

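Note that the catlevel statement may be omitted if you code the variable as 100 (has high blood pressure) and 0 (does not have high blood pressure). In that case the weighted mean of the variable, rather than the percent, is the prevalence estimate.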

table riagendr * age * race ;

Use the table statement to specify the cross-tabulations for which estimates are requested. In this example, estimates are requested for gender (riagendr) by age (age) by race/ethnicity (race).

print nsum= "Sample Size"

percent="Percent"

sepercent="SE" /

style=NCHS

nsumfmt=f8.0

percentfmt=f8.4

sepercentfmt=f8.4    ;

Use the print statement to assign names, format the statistics desired, and view the output. If the statement print is used alone, all of the default statistics are printed with default labels and formats.

In this example, sample size (nsum), percent (percent), and standard error of the percent (sepercent) are requested. The percent represents the proportion of persons with hbp=1, that is, persons with high blood pressure.

Note: For a complete list of statistics that can be requested on the print statement, see the SUDAAN User's Manual.

Use the style option equal to NCHS to produce output which parallels a table style used at NCHS.

rtitle "Prevalence of SPs with measured high blood pressure : NHANES 1999-2002" ;

run ;

Use the rtitle statement to assign a heading for each page of output.
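For convenience, the statements explained above can be read together as one program. The listing below only consolidates those statements. It assumes that analysis_data already contains hbp, age, and race and that the tutorial formats have been applied.

```sas
/* Sort by strata and PSU before calling SUDAAN */
proc sort data=analysis_data;
  by sdmvstra sdmvpsu;
run;

proc descript data=analysis_data design=wr;
  subpopn ridageyr >= 20;          /* analyze adults 20 years and older */
  nest sdmvstra sdmvpsu;           /* strata and PSU */
  weight wtmec4yr;                 /* 4-year MEC exam weight */
  subgroup riagendr age race;      /* categorical table variables */
  levels 2 3 4;                    /* number of categories in each */
  var hbp;
  catlevel 1;                      /* estimate the percent with hbp = 1 */
  table riagendr * age * race;
  print nsum="Sample Size"
        percent="Percent"
        sepercent="SE" /
        style=NCHS
        nsumfmt=f8.0
        percentfmt=f8.4
        sepercentfmt=f8.4;
  rtitle "Prevalence of SPs with measured high blood pressure : NHANES 1999-2002";
run;
```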

### Step 4: Review Output

The percents in the output are the proportions of sample persons with high blood pressure.

• Reviewing the output, you will see tables for both genders, males only, and females only sorted by age group followed by race/ethnicity.
• The "Other" race/ethnicity category is only included to complete the totals. It is not reported.
• In the table for females, notice that the proportion of black females with high blood pressure is twice that of other races in the 20-39 years age group, and nearly twice that of other races in the 40-59 years age group.
• Given the low proportion of high blood pressure in the 20-39 years age group, you will also want to consider using an arcsine transformation or the Clopper-Pearson method when estimating standard errors and confidence intervals.
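
To illustrate the last point, the data step below sketches one way a Clopper-Pearson interval is sometimes adapted to survey data: the design-based standard error is converted into an effective sample size, and the exact binomial limits are then taken from beta quantiles. This is a simplified sketch, not the tutorial's procedure. It omits the degrees-of-freedom adjustment used in some published guidelines, and the input values of p and se are purely illustrative placeholders, not NHANES results.

```sas
data cp_ci;
  p     = 0.05;    /* illustrative proportion; replace with your estimate */
  se    = 0.01;    /* illustrative design-based standard error */
  alpha = 0.05;    /* 95% interval */

  /* Effective sample size implied by the design-based SE (assumed approach) */
  n_eff = p * (1 - p) / se**2;
  x     = p * n_eff;          /* effective number of "successes" */

  /* Clopper-Pearson limits expressed as beta quantiles (requires 0 < p < 1) */
  lower = quantile('BETA', alpha / 2,     x,     n_eff - x + 1);
  upper = quantile('BETA', 1 - alpha / 2, x + 1, n_eff - x);
run;

proc print data=cp_ci;
  var p se n_eff lower upper;
run;
```

Because the limits are computed on the proportion scale, convert a SUDAAN percent and its standard error to proportions (divide by 100) before using them as p and se.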