## Task 3c: How to Perform a Chi-Square Test Using Stata

In this task, you will use the chi-square test in Stata to determine whether gender and blood pressure cuff size are independent of each other. The chi-square statistic is requested with the Stata command svy: tabulate.

WARNING

There are several things you should be aware of while analyzing NHANES data with Stata. Please see the Stata Tips page to review them before continuing.

### Step 1: Use svyset to define survey design variables

Remember that you need to declare the survey design with svyset before using the svy series of commands. The general format of this command is below:

svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)

To define the survey design variables for your blood pressure cuff size (bpacsz) analysis, use the weight variable for four years of MEC data (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor linearization. Here is the svyset command for four years of MEC data:

svyset [w=wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
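
Before moving on, it can be worth confirming that the design was registered as intended. The following is a minimal sketch, assuming the NHANES data file is already in memory and the svyset command above has been run. Typing svyset with no arguments redisplays the current settings, and svydescribe summarizes the strata and PSUs.

```stata
* Redisplay the survey design settings currently in effect
svyset

* Summarize the strata and the number of PSUs within each stratum
svydescribe
```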

### Step 2: Regroup blood pressure cuff size variable

In this example, a new variable (cuff_size) is created to regroup blood pressure cuff size (bpacsz) from five categories to four categories. This collapses the infant (1) and child (2) groups. Use the gen command to create a new variable.

gen cuff_size=1 if bpacsz==1 | bpacsz==2
replace cuff_size=2 if bpacsz==3
replace cuff_size=3 if bpacsz==4
replace cuff_size=4 if bpacsz==5
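
One way to check the regrouping is sketched below. It assumes the four commands above have just been run and that bpacsz takes only the values 1 through 5 or missing. The cross-tabulation shows how each original category maps to the new one, and each assert stops with an error if the mapping is not what was intended.

```stata
* Cross-tabulate the original and regrouped variables, including missing values
tabulate bpacsz cuff_size, missing

* Infant (1) and child (2) should both map to the new category 1
assert cuff_size == 1 if inlist(bpacsz, 1, 2)

* Original categories 3, 4, and 5 should each shift down by one
assert cuff_size == bpacsz - 1 if inlist(bpacsz, 3, 4, 5)

* Records with no cuff size recorded should stay missing
assert missing(cuff_size) if missing(bpacsz)
```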

### Step 3: Generate chi-square statistics using svy:tabulate

Now that the survey design has been defined with svyset, you can use the Stata command svy: tabulate to produce two-way tabulations with tests of independence. Some of the options for the tabulate command include:

• column and row display column and row percentages (if you specify neither, you will get cell proportions);
• obs lists the number of observations in each cell;
• count lists the weighted n in each cell, and adding format(%11.0fc) displays the counts with commas rather than in scientific notation;
• ci gives the confidence interval around each estimate, but can only be used with either row or column, not both; and
• pearson, null, and wald request the Pearson chi-square (reported with the Rao-Scott correction as an F statistic), null-based, and Wald test statistics.
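
To make these options concrete, here is a brief sketch. It assumes svyset has already been run and that the regrouped variable cuff_size from Step 2 exists. It covers the whole sample rather than a subpopulation. The first command shows weighted counts alongside unweighted cell sizes. The second requests confidence intervals with row percentages only, since ci cannot be combined with both row and column.

```stata
* Weighted counts and unweighted cell sizes, with counts displayed with commas
svy: tabulate riagendr cuff_size, count obs format(%11.0fc)

* Row percentages with confidence intervals (ci used with row only)
svy: tabulate riagendr cuff_size, row percent ci
```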

The general command for generating two-way tabulations is below.

svy: tabulate varname1 varname2, subpop(if condition) options

Use the svy: tabulate command to produce two-way tabulations for gender (riagendr) and blood pressure cuff size (cuff_size) with tests of independence for people age 20 years and older. (See Section 5.4 of Korn and Graubard, Analysis of Data from Health Surveys, pp 207-211.) Use the subpop( ) option to select a subpopulation for analysis, rather than selecting the study population in the Stata program while preparing the data file. This example uses an if statement to define the subpopulation based on the value of the age variable (ridageyr). Another option is to create a dichotomous variable where the subpopulation of interest is assigned a value of 1, and everyone else is assigned a value of 0. This example specifies the column, row, obs, percent, pearson, null, and wald options.

svy: tab riagendr cuff_size, subpop(if ridageyr>=20 & ridageyr<.) column row obs percent pearson null wald
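
The dichotomous-variable alternative mentioned above might look like the following sketch. The variable name adult20 is hypothetical and does not come from the NHANES files. Here subpop() is given on the svy prefix, a placement described in Stata's svy documentation, and fewer display options are requested for brevity.

```stata
* Indicator: 1 for age 20 and older, 0 otherwise (including missing age)
gen adult20 = (ridageyr >= 20 & ridageyr < .)

* Restrict estimation to the subpopulation while keeping the full
* survey design available for variance estimation
svy, subpop(adult20): tabulate riagendr cuff_size, column obs percent pearson
```

Because the indicator is 0 rather than missing when age is missing, it selects the same records as the if condition used in the command above.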

### Step 4: Review output

Here is a table summarizing the output:

| Variable | Men age 20 and older (n=4312) | Women age 20 and older (n=4782) | p value |
|---|---|---|---|
| Cuff size | | | |
| (1) Infant | 0% | 0% | <0.0001 |
| (2) Child | 1.5% | 5% | |