Task 3c: How to Perform Chi-Square Test Using Stata

In this task, you will use the chi-square test in Stata to determine whether gender and blood pressure cuff size are independent of each other. The chi-square statistic is requested with the Stata command svy:tabulate.


WARNING

There are several things you should be aware of while analyzing NHANES data with Stata. Please see the Stata Tips page to review them before continuing.

Step 1: Use svyset to define survey design variables

Remember that you need to define the survey design with svyset before using the svy series of commands. The general format of this command is below:

svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)


To define the survey design variables for your blood pressure cuff size (bpacsz) analysis, use the weight variable for four years of MEC data (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor linearization. Here is the svyset command for four years of MEC data:

svyset [w= wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
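Before moving on, it can help to confirm that the design was stored as intended. The following is a minimal sketch, assuming the NHANES analysis file containing wtmec4yr, sdmvpsu, and sdmvstra is already in memory. It declares the same design with the PSU variable given positionally, which is the form shown in current Stata documentation, and then displays the result.

```stata
* Assumption: the NHANES analysis file is already loaded in memory.
* Same design as above, with the PSU variable listed before the weight
svyset sdmvpsu [pweight = wtmec4yr], strata(sdmvstra) vce(linearized)

* Typing svyset with no arguments replays the stored design settings
svyset

* Summarize the number of strata and PSUs under the declared design
svydescribe
```

If the replayed settings do not list wtmec4yr, sdmvpsu, and sdmvstra, rerun svyset before using any svy command.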

Step 2: Regroup blood pressure cuff size variable

In this example, a new variable (cuff_size) is created to regroup blood pressure cuff size (bpacsz) from five categories to four categories. This collapses the infant (1) and child (2) groups. Use the gen command to create a new variable.

gen cuff_size=1 if bpacsz==1 | bpacsz==2
replace cuff_size=2 if bpacsz==3
replace cuff_size=3 if bpacsz==4
replace cuff_size=4 if bpacsz==5
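It is worth checking the regrouping before running any tests. The sketch below assumes that bpacsz and the cuff_size variable created above are in memory. The label name cuff_lbl and the label text for the combined category are illustrative choices, not taken from the NHANES documentation.

```stata
* Assumption: bpacsz and cuff_size (created above) are in memory.
* cuff_lbl is a hypothetical label name; the label text is illustrative
label define cuff_lbl 1 "Infant/Child" 2 "Adult" 3 "Large" 4 "Thigh"
label values cuff_size cuff_lbl

* Cross-tabulate old against new codes, showing missing values as well
tabulate bpacsz cuff_size, missing

* Stop with an error if any record was regrouped incorrectly
assert cuff_size == 1 if inlist(bpacsz, 1, 2)
assert cuff_size == bpacsz - 1 if inlist(bpacsz, 3, 4, 5)
assert missing(cuff_size) if missing(bpacsz)
```

Each assert is silent when the condition holds for every record and halts with an error otherwise.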


Step 3: Generate chi-square statistics using svy:tabulate

Now that the survey design has been defined with svyset, you can use the Stata command svy: tabulate to produce two-way tabulations with tests of independence.

The general command for generating two-way tabulations is below.

svy:tabulate varname1 varname2, subpop(if condition) options


Use the svy: tabulate command to produce two-way tabulations for gender (riagendr) and blood pressure cuff size (cuff_size) with tests of independence for people age 20 years and older. (See Section 5.4 of Korn and Graubard, Analysis of Data from Health Surveys, pp. 207-211.) Use the subpop( ) option to select a subpopulation for analysis, rather than selecting the study population in the Stata program while preparing the data file. This example uses an if statement to define the subpopulation based on the value of the age variable (ridageyr). Another option is to create a dichotomous variable where the subpopulation of interest is assigned a value of 1 and everyone else is assigned a value of 0. This example specifies the column, row, obs, percent, pearson, null, and wald options.

svy:tab riagendr cuff_size, subpop (if ridageyr >=20 & ridageyr<.) column row obs percent pearson null wald
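The dichotomous-variable approach mentioned above can be sketched as follows. This assumes svyset has been run and cuff_size exists, as in the earlier steps. The variable name adult20 is hypothetical, and subpop( ) is written here as an option of the svy prefix, which is the placement shown in current Stata documentation.

```stata
* Assumption: svyset has been run and cuff_size exists (Steps 1 and 2).
* adult20 is a hypothetical name: 1 = age 20 or older, 0 = everyone else
generate byte adult20 = (ridageyr >= 20 & !missing(ridageyr))

* Same tabulation, with the subpopulation given by the indicator variable
svy, subpop(adult20): tabulate riagendr cuff_size, column row obs percent pearson null wald
```

Because adult20 flags the same records as the if condition, the two forms should select the same subpopulation.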


Output of svy:tabulate command with column, row, obs, percent, pearson, null and wald options


Step 4: Review output

Here is a table summarizing the output: 

Variable       Men                 Women               p value
               age 20 and older    age 20 and older
Cuff size
(1) Infant     0%                  0%                  <0.0001
(2) Child      1.5%                5%
3 Adult        29%                 44%
4 Large        58%                 41%
5 Thigh



Men tend to have a larger cuff size than women; for example, 70% of men had a cuff size of 4 or 5 compared to 51% of women. Cuff size varies significantly by gender (p<0.0001). NOTE: The grayed cells have too few observations to create stable estimates and should probably not be reported.
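The test results can also be read back from Stata's stored results rather than copied from the output window. The sketch below assumes the svy: tabulate command from Step 3 was the last estimation command run. The result name e(p_Pear) is an assumption about how the Pearson p-value is stored and should be confirmed against the ereturn list output.

```stata
* Assumption: the svy: tabulate command from Step 3 was run just before this.
* List everything the command stored, including test statistics and p-values
ereturn list

* e(p_Pear) is assumed to hold the p-value for the design-corrected
* Pearson test; check the name against the list printed above
display "Pearson p-value: " %9.6f e(p_Pear)
```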


