## Task 4c: How to Generate Proportions using Stata

Stata can be used to calculate proportions and standard errors for NHANES data because it takes the complex survey design of NHANES into account when determining variance estimates. If the standard errors are not needed, you could simply use a standard Stata command, i.e., proportion with a weight specification. In this example, you will look at the proportion of examined persons 20 years and older with measured high blood pressure, by sex, age, and race-ethnicity.

WARNING

There are several things you should be aware of while analyzing NHANES data with Stata. Please see the Stata Tips page to review them before continuing.

### Step 1: Determine variables of interest

According to the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure, a person with hypertension is defined as either having elevated blood pressure (systolic pressure of at least 140 mmHg or diastolic of at least 90 mmHg) or taking antihypertensive medication.

You can code your variables in this example in one of two ways. In the first, persons with high blood pressure, as defined above, are assigned a value of 1, and all other sample persons are assigned a value of 2. The weighted percentage of sample persons with a value equal to 1 is an estimate of the prevalence of high blood pressure in the U.S.

IMPORTANT NOTE

An alternate method of coding the variables is to assign persons with high blood pressure, as defined above, a value of 100, and persons without high blood pressure a value of 0. The weighted mean of this variable is then the percentage of persons with high blood pressure, which is an estimate of the prevalence of high blood pressure in the U.S. This method can be used with SAS Survey Procedures.
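Both coding schemes can be sketched in a short do-file. This is only an illustration on made-up data: the input variables mean_sbp, mean_dbp, and on_bp_meds are hypothetical names (they are not NHANES variable names), and hbp100 is a hypothetical name for the 0/100 version. The comparisons are guarded with `< .` because Stata treats missing values as larger than any number.

```stata
* Toy data; mean_sbp, mean_dbp (mmHg) and on_bp_meds (1 = yes) are hypothetical names
clear
input mean_sbp mean_dbp on_bp_meds
150 85 0
120 70 0
118 76 1
.   .  0
130 95 0
end

* First coding: 1 = high blood pressure, 2 = all other classifiable persons
generate byte hbp = 2
replace hbp = 1 if (mean_sbp >= 140 & mean_sbp < .) | ///
                   (mean_dbp >= 90  & mean_dbp < .) | ///
                   on_bp_meds == 1

* Leave hbp missing when a reading is missing and nothing else qualifies
replace hbp = . if hbp == 2 & (missing(mean_sbp) | missing(mean_dbp))

* Alternate coding: 100 = high blood pressure, 0 = not, missing stays missing
generate hbp100 = 100 * (hbp == 1) if !missing(hbp)

tabulate hbp, missing
list
```

How to treat persons with missing readings is an analytic decision; the rule above is one assumption, not the tutorial's stated rule.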

### Step 2: Use svyset to define survey design variables

Remember that you need to declare the survey design with svyset before using the svy series of commands. The general format of this command is below:

svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)

To define the survey design variables for your blood pressure analysis, use the weight variable for four years of MEC data (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor linearization. Here is the svyset command for four years of MEC data:

svyset [w= wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
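In current Stata releases the same design is usually declared with the PSU variable named first and the weight given as a pweight. The sketch below uses a few made-up rows in place of the NHANES file (only the three design variable names come from the text) and then displays the stored settings to confirm the declaration.

```stata
* Toy stand-in for the NHANES design variables (values are invented)
clear
input sdmvstra sdmvpsu wtmec4yr
1 1 1000
1 1 1200
1 2  900
2 1 1500
2 2 1300
2 2 1000
end

* PSU first, sampling weight as pweight, strata as an option
svyset sdmvpsu [pweight = wtmec4yr], strata(sdmvstra) vce(linearized)

* Typing svyset alone redisplays the current settings
svyset

* Summarize strata and PSU counts to check the declared design
svydescribe
```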

### Step 3: Use svy:proportion to generate proportions

In this example, you will use svy: proportion in Stata to generate proportions. You created a categorical variable, hbp, to indicate whether or not a person had high blood pressure. That categorical variable will be identified in the procedure and the weighted percent (prevalence) of sample persons with the value hbp=1 (high blood pressure) will be estimated along with the standard error.

The general format of the svy:proportion command is:

svy, subpop(if condition) vce(linearized): proportion varname

To generate the proportion of persons aged 20 years and older (ridageyr >=20 & ridageyr <.) with high blood pressure (hbp), the command would  be:

svy, subpop(if ridageyr >=20 & ridageyr <. ) vce(linearized): prop hbp
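The command above uses subpop() rather than an if qualifier so that all sample persons stay in the design for variance estimation while the estimate itself is restricted to adults. A minimal end-to-end sketch on invented data (the rows are made up; only the variable names come from the text) might look like this:

```stata
* Toy data: two strata, two PSUs each; values are invented for illustration
clear
input sdmvstra sdmvpsu wtmec4yr ridageyr hbp
1 1 1000 25 2
1 1 1200 45 1
1 2  900 17 2
1 2 1100 63 1
2 1 1500 34 2
2 1  800 71 1
2 2 1300 52 2
2 2 1000 19 2
end

* Declare the design once
svyset sdmvpsu [pweight = wtmec4yr], strata(sdmvstra) vce(linearized)

* Weighted proportion of each hbp level among persons aged 20 and older
svy, subpop(if ridageyr >= 20 & ridageyr < .) vce(linearized): proportion hbp

* Number of observations inside the subpopulation
display e(N_sub)
```

With so few PSUs the standard errors here are meaningless; the point is only the order of the commands.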

### Step 4: Use over option of svy:proportion command to generate proportions and standard errors for different subgroups in Stata

The general format of the svy:proportion command with the over option is:

svy, subpop(if condition) vce(linearized): proportion varname, over(var1)

Here is the command to generate the proportion of people aged 20 years and older (ridageyr >=20 & ridageyr <.) by gender (riagendr) with hypertension (hbp):

svy, subpop(if ridageyr >=20 & ridageyr <. ) vce(linearized): proportion hbp, over(riagendr)

#### Output of svy:prop by Gender

Here is the command to generate the proportion of people aged 20 years and older (ridageyr >=20 & ridageyr <.) by gender (riagendr), race-ethnicity (race), and age (ridageyr) with hypertension (hbp):

svy, subpop(if ridageyr >=20 & ridageyr <. ) vce(linearized): proportion hbp, over(riagendr race ridageyr)
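Because ridageyr records age in single years, passing it to over() yields one group per year of age. Output organized into a few age bands needs a grouped age variable, which this section does not name. The sketch below therefore assumes a hypothetical variable agegrp built with recode, uses invented data, and leaves out race to keep the toy data small. Persons under 20 get their own category rather than a missing value so that they remain in the design and are excluded only by subpop().

```stata
* Toy data; values are invented, agegrp is a hypothetical variable name
clear
input sdmvstra sdmvpsu wtmec4yr riagendr ridageyr hbp
1 1 1000 1 25 2
1 1 1200 2 45 1
1 1  950 1 67 1
1 2  900 2 17 2
1 2 1100 1 63 1
1 2 1050 2 31 2
2 1 1500 1 34 2
2 1  800 2 71 1
2 1 1250 2 48 2
2 2 1300 1 52 1
2 2 1000 2 19 2
2 2 1150 1 41 2
end

* Group single years of age into bands; under-20s keep a nonmissing code
recode ridageyr (min/19 = 0 "Under 20") (20/39 = 1 "20-39") ///
                (40/59 = 2 "40-59") (60/max = 3 "60+"), generate(agegrp)

svyset sdmvpsu [pweight = wtmec4yr], strata(sdmvstra) vce(linearized)

* Proportions of hbp within each gender-by-age-band group, adults only
svy, subpop(if ridageyr >= 20 & ridageyr < .) vce(linearized): ///
    proportion hbp, over(riagendr agegrp)
```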

#### Output of svy:prop by Gender, Age, and Race-Ethnicity

Highlights from the output include:

• The output shows proportions for all persons, for both genders, for the four race categories, for the three age groups, and finally for the 24 gender-race-age groups.
• The percents in the output are the proportions of sample persons with high blood pressure.
• The "Other" race/ethnicity category is only included to complete the totals. It is not reported.
• In the groups for females, notice that the proportion of black females with high blood pressure is twice that of other races in the 20-39 years age group, and nearly twice that of other races in the 40-59 years age group.
• Given the low proportion of high blood pressure in the 20-39 years age group, you will also want to consider using an arcsine or Clopper-Pearson transformation for standard error estimation.