Task 2: Key Concepts about Examining the Relationship Between Supplement Use and a Categorical Outcome Using a Chi-Square Test

The chi-square test is used to test the independence of two variables cross-classified in a two-way table. For example, suppose we wish to test the hypothesis that calcium supplement use is independent of osteoporosis treatment status among women, and that cross-classifying osteoporosis treatment status by supplement use yields the following observed frequencies.

Osteoporosis Treatment Status and Supplement Use


                        Osteoporosis Treatment Status - Yes    Osteoporosis Treatment Status - No
Supplement Use - Yes
Supplement Use - No

In a simple random sample setting (unweighted data), the expected cell frequencies under the null hypothesis that osteoporosis treatment status and calcium supplement use are independent are obtained by multiplying the marginal total for the ith row by the proportion of individuals in the jth column (the jth column total divided by the overall total).

For example, the expected number of supplement users who received treatment for osteoporosis would be 721*(202/1187)=123, and the expected number of supplement users who did not receive treatment for osteoporosis would be 721*(985/1187)=598 (both rounded to the nearest whole number).
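The arithmetic above can be checked with a short Python sketch. It uses only the marginal totals quoted in the text (721, 202, 985, and 1187); the function and variable names are illustrative choices, not part of the original tutorial.

```python
# Marginal totals quoted in the text above.
row_total_supplement_yes = 721          # all supplement users
col_totals = {"treated": 202, "not_treated": 985}
grand_total = 1187                      # 202 + 985

def expected_count(row_total, col_total, n):
    """Expected cell count under independence: row total times column proportion."""
    return row_total * col_total / n

expected = {
    status: expected_count(row_total_supplement_yes, total, grand_total)
    for status, total in col_totals.items()
}

for status, value in expected.items():
    print(f"{status}: {value:.1f} (rounds to {round(value)})")
# treated: 122.7 (rounds to 123)
# not_treated: 598.3 (rounds to 598)
```

The two expected counts add back up to the row total of 721, which is a quick sanity check on any expected-frequency calculation.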

Let Oij = the observed frequency in the ith row and jth column, and Eij = the expected frequency in the ith row and jth column, where i=1,2,…,r and j=1,2,…,c for a table with r rows and c columns. Then the null hypothesis of independence can be tested using the chi-square statistic:

Equation 1. Equation to Test the Null Hypothesis

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

That is, the chi-square statistic is the squared difference between the observed and expected frequencies, divided by the expected frequency, summed over all rows and columns.

This statistic has degrees of freedom equal to the number of rows minus 1, multiplied by the number of columns minus 1.
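As a minimal sketch of the unweighted (simple random sample) calculation described above, the following Python function computes the statistic and its degrees of freedom for any two-way table of counts. The example counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the NHANES frequencies, and the function name is an illustrative choice.

```python
def pearson_chi_square(observed):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected count under independence for cell (i, j).
            e_ij = row_totals[i] * col_totals[j] / n
            statistic += (o_ij - e_ij) ** 2 / e_ij

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts for illustration only (NOT the NHANES table).
hypothetical = [[30, 20],
                [20, 30]]
stat, df = pearson_chi_square(hypothetical)
print(f"chi-square = {stat:.2f}, df = {df}")
# chi-square = 4.00, df = 1
```

This sketch ignores survey weights and design, so it is appropriate only for the unweighted setting.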

In a complex sample setting, you would use a statistic similar to Equation 1, modified to account for the survey design, with degrees of freedom equal to the number of primary sampling units (PSUs) minus the number of strata containing observations. This statistic can be obtained through SAS proc surveyfreq (the chisq option, which is based on the Rao-Scott chi-square with an adjusted F statistic). The analogous procedure in SUDAAN version 10.0 (proc crosstab) provides limited chi-square statistics based on the Wald chi-square and does not provide an F-adjusted p-value. However, SUDAAN regression models do provide F-adjusted chi-square statistics, which are recommended for analyzing NHANES data.
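The design degrees of freedom mentioned above (PSUs minus strata containing observations) can be counted as in the following Python sketch. It assumes each observation in the analysis carries a stratum identifier and a PSU identifier, with PSU identifiers nested within strata; the record layout, the function name, and the sample records are hypothetical and do not come from an NHANES file.

```python
def design_degrees_of_freedom(records):
    """Number of PSUs minus number of strata among the given observations.

    records: iterable of (stratum_id, psu_id) pairs, one per observation.
    PSU ids are assumed to be nested within strata, so a PSU is identified
    by the (stratum_id, psu_id) pair rather than by psu_id alone.
    """
    psus = {(stratum, psu) for stratum, psu in records}
    strata = {stratum for stratum, _ in psus}
    return len(psus) - len(strata)

# Hypothetical observations: 3 strata, each with 2 PSUs.
hypothetical_records = [
    (1, 1), (1, 1), (1, 2),
    (2, 1), (2, 2),
    (3, 1), (3, 2), (3, 2),
]
print(design_degrees_of_freedom(hypothetical_records))
# 3
```

Only strata that actually contain observations are counted, so restricting the analysis to a subgroup can reduce the degrees of freedom.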

The Cochran-Mantel-Haenszel test, an extension of the Pearson chi-square test, can be applied to stratified two-way tables to test for homogeneity or independence in a non-survey setting. For a complex sample, its analogue can be obtained in SUDAAN proc crosstab (cmh).
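For the non-survey setting, the following Python sketch shows one common form of the Cochran-Mantel-Haenszel statistic for a set of 2x2 tables, one per stratum, computed without a continuity correction. It is an illustration under those assumptions, not the SUDAAN procedure; the function name and the example counts are hypothetical.

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel statistic (no continuity correction).

    tables: list of 2x2 tables [[a, b], [c, d]], one per stratum.
    Under the null hypothesis the statistic is approximately
    chi-square distributed with 1 degree of freedom.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        expected_a = row1 * col1 / n
        variance_a = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
        diff_sum += a - expected_a
        var_sum += variance_a
    return diff_sum ** 2 / var_sum

# Hypothetical counts for two strata (illustration only).
hypothetical_strata = [
    [[10, 5], [5, 10]],
    [[10, 5], [5, 10]],
]
cmh = cmh_statistic(hypothetical_strata)
print(f"CMH statistic = {cmh:.3f}")
# CMH statistic = 6.444
```

Some software applies a continuity correction by default, so results from other tools may differ slightly from this uncorrected version.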


Agresti A. An Introduction to Categorical Data Analysis. Wiley Series in Probability and Statistics. 1996. New York.

