Task 2b: How to Calculate a Chi-Square Test Using SAS

In this task, you will use the chi-square test in SAS to determine whether calcium supplement use and treatment for osteoporosis are independent of each other for men and women ages 50 years and older.


Step 1: Determine variables of interest

This example uses the demoadv dataset (download at Sample Code and Datasets). This dataset contains a created variable, anycalsup, that has a value of 1 for those who report calcium supplement use and a value of 2 for those who do not. A participant was considered not to use calcium supplements if the daily average amount of calcium supplement use was zero; otherwise, the participant was considered a supplement user (see Supplement Code under Sample Code and Module 9, Task 4 for more information). The variable treatosteo indicates treatment for osteoporosis. A participant was coded as having had treatment for osteoporosis if he or she responded “yes” to OSQ.070 (“{Were you/Was SP} treated for osteoporosis?”) on the osteoporosis questionnaire, and as not having had treatment if he or she responded “no” to either OSQ.070 or OSQ.060 (“Has a doctor ever told {you/SP} that {you/s/he} had osteoporosis, sometimes called thin or brittle bones?”). (The SAS code to create this variable is found in the “Supplement Program” sample SAS code.) The demoadv dataset for this example includes only those with MEC weights (wtmec2yr>0).
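The coding rule for treatosteo described above can be sketched as a short data step. This is only an illustration, not the code from the “Supplement Program”: the input dataset name osq_raw is hypothetical, and the sketch assumes that OSQ060 and OSQ070 use the usual questionnaire coding of 1 for “yes” and 2 for “no”, and that treatosteo takes the values 1 (treated) and 2 (not treated).

```sas
/* Illustrative sketch only; osq_raw is a hypothetical dataset name.   */
/* Assumes osq060 and osq070 are coded 1 = yes, 2 = no.                */
data osteo;
  set osq_raw;
  if osq070 = 1 then treatosteo = 1;                    /* treated for osteoporosis   */
  else if osq070 = 2 or osq060 = 2 then treatosteo = 2; /* not treated or never told  */
  else treatosteo = .;                                  /* refused, don't know, blank */
run;
```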


Step 2: Create Variable to Subset Population

In order to subset the data in SAS Survey Procedures, you will need to create a variable for the population of interest. In this example, the sel variable is set to 1 if the sample person is age 50 years or older, and 2 if the sample person is younger than 50 years.
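A minimal sketch of this step follows. It assumes that age in years is stored in a variable named ridageyr, which the text above does not name, so substitute the age variable in your copy of the dataset if it differs.

```sas
/* Sketch: add the subsetting variable sel to demoadv.                 */
/* ridageyr (age in years) is an assumed variable name.                */
data demoadv;
  set demoadv;
  if ridageyr >= 50 then sel = 1;               /* age 50 years or older  */
  else if not missing(ridageyr) then sel = 2;   /* younger than 50 years  */
run;
```

The explicit missing-value check matters because SAS treats a missing numeric value as smaller than any number, so a plain "less than 50" comparison would place records with missing age in the younger group.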


Step 3: Set Up SAS to Perform Chi-Square Test

Request the chi-square statistic from the SAS surveyfreq procedure. The summary table below shows how to code a chi-square test in SAS.

Calculating the Chi-Square Test Using the SAS surveyfreq Procedure

Statements Explanation

proc surveyfreq data=demoadv;

Use the SAS Survey procedure, proc surveyfreq, to examine the relationship between two categorical variables.

strata sdmvstra;

Use the strata statement to specify the strata variable (sdmvstra) and account for design effects of stratification.

cluster sdmvpsu;  

Use the cluster statement to specify the PSU variable (sdmvpsu) and account for design effects of clustering.

weight wtmec2yr;

Use the weight statement to account for the unequal probability of sampling and non-response.  In this example, the MEC weight for 2 years of data (wtmec2yr) is used.

table sel*riagendr*anycalsup*treatosteo/col row nostd nowt wchisq wllchisq chisq chisq1;  

Use the table statement to specify cross-tabulations for which estimates are requested. In the example, the estimates are for age group (sel, where 1 indicates age 50 years or older) by gender (riagendr), calcium supplement use (anycalsup), and osteoporosis treatment (treatosteo). Because the last two variables in the request form the two-way table, the chi-square tests compare anycalsup with treatosteo within each combination of sel and riagendr. The options after the slash will output the column percent (col), row percent (row), Wald chi-square (wchisq), and Wald log linear chi-square (wllchisq), and suppress the standard deviation (nostd) and weighted sums (nowt). Use the chisq option to obtain the Rao-Scott chi-square and the chisq1 option to obtain the Rao-Scott modified chi-square.

format riagendr gender. anycalsup yesnos. treatosteo yesno.; run;

Use the format statement to apply the SAS formats to the variables, and the run statement to execute the procedure.
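Put together, the statements in the table form a single procedure step, shown here as a sketch. It assumes that sel has already been added to demoadv (Step 2) and that the formats gender., yesnos., and yesno. have been defined earlier in the session. Their definitions are not shown in this task, so drop the format statement if they are unavailable.

```sas
/* Assembled from the statements in the table above.                   */
/* Assumes the formats gender., yesnos., and yesno. already exist.     */
proc surveyfreq data=demoadv;
  strata sdmvstra;      /* stratification variable          */
  cluster sdmvpsu;      /* primary sampling unit (PSU)      */
  weight wtmec2yr;      /* 2-year MEC examination weight    */
  table sel*riagendr*anycalsup*treatosteo
        / col row nostd nowt wchisq wllchisq chisq chisq1;
  format riagendr gender. anycalsup yesnos. treatosteo yesno.;
run;
```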



For complex survey data such as NHANES, we recommend using the Rao-Scott F-adjusted chi-square statistic because it yields a more conservative interpretation than the Wald chi-square.


SAS version 9.2 and version 9.1.3 produce different estimates of the Rao-Scott Chi-Square test. This is because in version 9.1.3 SURVEYFREQ uses the total (over all tables) sample size in the Rao-Scott computations. In version 9.2, the procedure uses the individual two-way sample size. SAS recommends the use of version 9.2.
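Because the Rao-Scott results depend on the release, it can help to confirm which version is running before comparing output. One way to do this, offered as a suggestion and not as part of the original sample code, is to print the automatic macro variables sysver and sysvlong to the log:

```sas
/* Write the running SAS release to the log. */
%put NOTE: Running SAS version &sysver (&sysvlong);
```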


Step 4: Review output


