Task 1c: How to Set Up a t-test in NHANES Using Stata

In this task, you will use Stata commands to calculate a t-statistic and assess whether the mean systolic blood pressures (SBP) in males and females age 20 years and older are statistically different.

 

Step 1: Set Up Stata to Produce Means

Follow the steps below to produce the mean SBP and to test whether the difference in mean SBP between males and females is statistically significant, using the Stata command svy:mean.

 

WARNING

There are several things you should be aware of while analyzing NHANES data with Stata. Please see the Stata Tips page to review them before continuing.

 

Step 2: Use svyset to define survey design variables

Remember that you need to define the survey design with svyset before using the svy series of commands. The general format of this command is below:

svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)

 

To define the survey design variables for your SBP analysis, use the weight variable for 4 years of MEC data (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor series linearization. Here is the svyset command for four years of MEC data:

svyset [w=wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
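Before moving on, it can help to confirm that the design settings were stored as intended. The following is a minimal sketch, assuming the NHANES analysis file is already loaded in memory; it repeats the svyset command above and then uses two standard Stata commands to inspect the result.

```stata
* Assumes the NHANES analysis file is already loaded in memory.
svyset [w=wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)

* Redisplay the current survey settings to confirm they were stored.
svyset

* List the strata and the number of PSUs and observations in each.
svydescribe
```

If the strata or PSU counts look unexpected, recheck the variable names before running any svy estimation command.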

 

Step 3:  Use svy:mean to generate means and standard errors in Stata

Now that the survey design has been defined with svyset, you can use the Stata command svy: mean to generate means and standard errors. The general command for obtaining weighted means and standard errors of a subpopulation is below.

svy: mean varname, subpop(if condition)

 

Use the svy: mean command with the systolic blood pressure variable (bpxsar) to estimate the mean systolic blood pressure for people age 20 years and older. Use the subpop( ) option to select a subpopulation for analysis, rather than restricting the study population when you prepare the data file. This example uses an if statement to define the subpopulation based on the value of the age variable (ridageyr). Another option is to create a dichotomous variable in which the subpopulation of interest is assigned a value of 1 and everyone else is assigned a value of 0.

svy: mean bpxsar, subpop(if ridageyr>=20 & ridageyr<.)
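The dichotomous-variable alternative mentioned above could look roughly like the sketch below. It assumes svyset has already been run as in Step 2; the variable name adult20 is hypothetical and is not part of the NHANES files. The subpop( ) option is written here on the svy prefix, the same placement used with svy:reg in Step 5b.

```stata
* Assumes svyset has already been run as in Step 2.
* adult20 is a hypothetical variable name, not part of the NHANES files.
* It equals 1 for respondents age 20 and older and 0 otherwise
* (including respondents with missing age).
generate byte adult20 = (ridageyr >= 20 & ridageyr < .)

* Pass the indicator to subpop(); observations with adult20 == 1
* form the subpopulation.
svy, subpop(adult20): mean bpxsar
```

Either approach keeps all observations in the data file, so the variance estimates still reflect the full survey design.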

 

Output of svy:mean


 

Step 4: Use the over option of the svy:mean command to generate means and standard errors for different subgroups in Stata

You can also add the over() option to the svy:mean command to generate the means for different subgroups.  When you do this, you can type a second command, estat size, to have the output display the subgroup observation numbers.  Here is the general format of these commands for this example:

svy: mean varname, subpop(if condition) over(var1 var2)

estat size

 

Use the svy: mean command with the systolic blood pressure variable (bpxsar) to estimate the mean systolic blood pressure for people age 20 years and older. As in Step 3, use the subpop( ) option to select the subpopulation for analysis, rather than restricting the study population when you prepare the data file; this example again uses an if statement based on the value of the age variable (ridageyr). Use the over option to get stratified results. This example produces estimates by gender. Use the estat size postestimation command to display the number of subpopulation observations and the weighted numbers.

svy: mean bpxsar, subpop(if ridageyr>=20 & ridageyr<.) over(riagendr)

estat size, obs size

 

Output of svy:mean with over option


 

Step 5a: Test the hypothesis using the lincom post estimation command

If you have already run an estimation command, you can use the lincom command to test the hypothesis that the difference between the means for the subpopulations equals 0. Put square brackets around the variable you are estimating. After the variable in square brackets, put the value of the stratifier that you want to test (i.e., a value of the variable in the over option). If you used labels for the variable, you can use the labels instead of the coded values. Here is the general format of this command for this example:

lincom [varname]stratval1 - [varname]stratval2

 

Because you have already run the estimation in Step 4, you can use the lincom postestimation command to test the hypothesis that the difference between the mean SBP (bpxsar) for males and females equals 0. This example uses labeled values (male, female) instead of the coded values (1, 2) for the gender variable (riagendr).

lincom [bpxsar]male - [bpxsar]female
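As a cross-check, the same hypothesis can be written as an equality and passed to Stata's test command. This is only a sketch: it assumes the svy:mean command with the over option from Step 4 was the last estimation command run, and that the coefficients are named exactly as in the lincom command above (naming can differ across Stata versions).

```stata
* Assumes the svy: mean ... over(riagendr) command from Step 4 was the
* last estimation command, and that its coefficients are named as in
* the lincom command above.
test [bpxsar]male = [bpxsar]female
```

After a svy estimation, test reports an adjusted Wald F statistic. With a single constraint, F should equal the square of the t-statistic reported by lincom, so the two p values should agree.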

 

Output of lincom post estimation command


Step 5b: Test the hypothesis using svy:reg command

The svy:reg command can also be used to calculate the t-statistic. The difference between using svy:reg and lincom is that svy:reg can be used without prior estimation. The xi prefix is placed before the command so that categorical variables are expanded into indicator variables, and the i. prefix marks each categorical variable. Here is the general format of these commands for this example:

xi: svy, subpop(if condition): reg dependentvar i.varname

 

Use the svy:reg command with the xi prefix to calculate the t-statistic and assess whether the mean SBP (bpxsar) for males and females age 20 years and older is statistically different. The i. prefix denotes the categorical variable, which in this example is riagendr. Use the char command to choose the reference group for the categorical variable; here the omitted (reference) category is 2, female.

char riagendr[omit]2

xi: svy, subpop(if ridageyr>=20 & ridageyr<.): reg bpxsar i.riagendr
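In Stata 11 and later, factor-variable notation offers another way to set the reference group without the xi prefix or the char command. The sketch below assumes such a version and that svyset has already been run; it is an alternative formulation, not the command used to produce the output shown in this task.

```stata
* Assumes Stata 11 or later and that svyset has already been run.
* ib2. sets level 2 of riagendr (female) as the base category, so the
* coefficient on level 1 is the male minus female difference in mean SBP.
svy, subpop(if ridageyr>=20 & ridageyr<.): regress bpxsar ib2.riagendr
```

The t-statistic and p value on the riagendr coefficient test the same hypothesis as the lincom command in Step 5a.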

 

Output of svy:reg command


Step 6: Review Stata means and t-test output

Here is a table summarizing the results of the previous analyses:

Summary of Results

Variable                           Subpopulation analyzed     Number of respondents with data   Mean   p value
Systolic blood pressure (bpxsar)   Adults age 20 and older    9,056                             123    n/a
                                   Men age 20 and older       4,301                             124    0.0132 (men vs women)
                                   Women age 20 and older     4,755                             122

 

A total of 9,056 respondents age 20 and older had information on systolic blood pressure (SBP). According to the stratified analysis, men's mean SBP is about 2 mm Hg higher than women's (124 vs. 122). This difference is statistically significant (p = 0.0132): if there were no true difference between men and women, a difference this big or bigger would arise by chance in a sample of this size only about 1.3% of the time.

 
