## Task 3b: How to Perform Chi-Square Test Using SAS

In this task, you will use the chi-square test to determine whether age group and osteoporosis treatment status are independent of each other.

### Step 1: Examine Relationship Between Two Categorical Variables

PROC SURVEYFREQ is used in SAS to examine the relationship between two categorical variables and to obtain chi-square statistics. Use the STRATA statement to specify the strata variable and account for the design effects of stratification. Use the CLUSTER statement to specify the primary sampling unit (PSU) and account for the design effects of clustering. Use the WEIGHT statement to account for unequal probability of sampling and non-response. Use the WHERE statement to specify the subpopulation of interest.

Use the TABLE statement to create a cross-tabulation of the categorical variables age group (AGEGRP) and osteoporosis treatment status (TREATOSTEO). The options after the slash (/) instruct SAS to output the column percent (COL), row percent (ROW), Wald chi-square (WCHISQ), and Wald log-linear chi-square (WLLCHISQ), and to suppress the standard deviation (NOSTD) and weighted frequencies (NOWT). The CHISQ option requests the Rao-Scott chi-square, and the CHISQ1 option requests the Rao-Scott modified chi-square. Use the FORMAT statement to apply the SAS formats to the table variables.
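
The FORMAT statement assumes that the AGEGRP. and YESNO. formats already exist in the SAS session. A minimal sketch of how such formats might be defined with PROC FORMAT follows. The numeric codes (1, 2, 3 and 1, 2) are assumptions made for illustration; only the labels are taken from the output of this task.

```sas
* Hypothetical format definitions; the numeric codes are assumed, ;
* so check them against the coding used in your own dataset.      ;
proc format;
    value AGEGRP
        1 = '20-39'
        2 = '40-59'
        3 = '>= 60';
    value YESNO
        1 = 'Yes'
        2 = 'No';
run;
```

If the formats are stored in a permanent format catalog instead, they would need to be made available to the session before PROC SURVEYFREQ runs.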

#### Sample Code

```sas
*-------------------------------------------------------------------------;
* Use the PROC SURVEYFREQ procedure to perform a chi-square test in SAS.  ;
* This test will be used to determine whether age group and treatment for ;
* osteoporosis are independent of each other in respondents aged 20 and   ;
* over.                                                                   ;
*-------------------------------------------------------------------------;

proc surveyfreq data=DEMOOSTS;
    strata SDMVSTRA;
    cluster SDMVPSU;
    weight WTINT2YR;
    where RIDAGEYR >= 20;
    table AGEGRP*TREATOSTEO / col row nostd nowt wchisq wllchisq
                              chisq chisq1;
    format AGEGRP AGEGRP. TREATOSTEO YESNO.;
run;
```

IMPORTANT NOTE

For complex survey data such as NHANES, using the Rao-Scott F adjusted chi-square statistic is recommended since it yields a more conservative interpretation than the Wald chi-square.
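
The relationship between the Rao-Scott statistics can be checked by hand. The worked arithmetic below is a sketch that assumes the first-order Rao-Scott formulas as described in the SAS documentation for PROC SURVEYFREQ, and plugs in the values from the Rao-Scott Chi-Square Test in the output of this task. The symbols $Q_P$ (Pearson chi-square), $D$ (design correction), $R$ (number of age groups), and $C$ (number of treatment categories) are notation introduced here, not names that appear in the SAS output.

$$
Q_{RS} = \frac{Q_P}{D}, \qquad F = \frac{Q_{RS}}{(R-1)(C-1)}
$$

$$
Q_{RS} = \frac{341.6678}{0.6712} \approx 509.04, \qquad F = \frac{509.0778}{(3-1)(2-1)} = 254.5389
$$

The small gap between 509.04 and the reported 509.0778 presumably comes from the design correction being printed to four decimal places. The denominator degrees of freedom equal $(R-1)(C-1)$ times the number of clusters minus the number of strata: $2 \times (30 - 15) = 30$.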

#### Output of Program

```
The SURVEYFREQ Procedure

Data Summary

Number of Strata                  15
Number of Clusters                30
Number of Observations          5041
Sum of Weights             205284669

Table of AGEGRP by treatOSTEO

                                                      Row     Column
AGEGRP     treatOSTEO     Frequency    Percent    Percent    Percent
--------------------------------------------------------------------
20-39      Yes                    2     0.0924     0.2375     2.2097
           No                  1738    38.8105    99.7625    40.5042

           Total               1740    38.9029    100.000
--------------------------------------------------------------------
40-59      Yes                   36     1.0062     2.6126    24.0624
           No                  1358    37.5077    97.3874    39.1446

           Total               1394    38.5139    100.000
--------------------------------------------------------------------
>= 60      Yes                  227     3.0831    13.6521    73.7279
           No                  1662    19.5001    86.3479    20.3512

           Total               1889    22.5832    100.000
--------------------------------------------------------------------
Total      Yes                  265     4.1817               100.000
           No                  4758    95.8183               100.000

           Total               5023    100.000
--------------------------------------------------------------------
Frequency Missing = 18

Rao-Scott Chi-Square Test

Pearson Chi-Square    341.6678
Design Correction       0.6712

Rao-Scott Chi-Square  509.0778
DF                           2
Pr > ChiSq              <.0001

F Value               254.5389
Num DF                       2
Den DF                      30
Pr > F                  <.0001

Sample Size = 5023

Rao-Scott Modified Chi-Square Test

Pearson Chi-Square    341.6678
Design Correction       1.5353

Rao-Scott Chi-Square  222.5434
DF                           2
Pr > ChiSq              <.0001

F Value               111.2717
Num DF                       2
Den DF                      30
Pr > F                  <.0001

Sample Size = 5023

Wald Chi-Square Test

Chi-Square      91.2484

F Value         45.6242
Num DF                2
Den DF               15
Pr > F           <.0001

Num DF                2
Den DF               14

Sample Size = 5023

Wald Log-Linear Chi-Square Test

Chi-Square    1216.9520

F Value        608.4760
Num DF                2
Den DF               15
Pr > F           <.0001

Num DF                2
Den DF               14