## Task 2a: How to Identify and Recode Skip Patterns Using SAS

The second task is to check the data for skip patterns. To do this, you will:

• consult the sample person questionnaire,
• use the SAS proc freq procedure to obtain cross tabulation tables, and
• recode as necessary.

### Step 1: Check codebook for skip patterns

Check the codebook to determine if a skip pattern affects the variables in your analysis. See the Locate Variables module Task 1 for more information on how to locate background information on variables in the documentation.

### Step 2: Check data for skip patterns

After you have used the codebook to determine whether a skip pattern affects variables in your analysis, use cross tabulations from the SAS proc freq procedure to confirm the presence of skip patterns in the data.

#### Program to Check Data for Skip Patterns

| Statements | Explanation |
|---|---|
| `proc freq data=demo_BP1;` | Use the proc freq procedure to determine the frequency of each value of the variables listed. |
| `where ridstatr=2 and ridageyr>=20;` | Use the where statement to select participants who were interviewed and examined in the MEC and who were age 20 years and older. |
| `table BPQ020 BPQ030 BPQ050a BPQ020*(BPQ030 BPQ050a) / list missing;`<br>`title 'Check skip pattern for BP questionnaire';`<br>`run;` | Use the table statement to list the variables to be included in the output frequency table and the cross tabulation frequency table for the skip patterns. Note that a star (*) indicates that a crosstab will be constructed with BPQ.020 as the row variable and BPQ.030 and BPQ.050a as the column variables. |

Highlighted items from the proc freq output for skip patterns:

• Notice the high number of missing values (n=6,646) for BPQ.030 compared to the much lower number of missing values in the prior question BPQ.020 (n=106).
• Further down, the output includes a cross tabulation of BPQ.020 responses by BPQ.030 responses. Note the large number of missing values in BPQ.030 (n=6,540) for those who responded with a "No," coded as "2," for BPQ.020. These respondents were not asked BPQ.030 because of a skip pattern; therefore, these responses need to be recoded before the data are analyzed further.
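To see what such a cross tabulation looks like on a small scale, here is a minimal sketch using a hypothetical five-row dataset (`toy_bp` and its values are invented for illustration and are not NHANES records). It assumes that a "No" (2) on BPQ020 leaves BPQ030 missing, as described above.

```sas
/* Hypothetical toy data, not NHANES records.              */
/* A period (.) marks a missing numeric value.             */
data toy_bp;
    input seqn BPQ020 BPQ030;
    datalines;
1 1 1
2 1 2
3 2 .
4 2 .
5 . .
;
run;

/* LIST prints each BPQ020*BPQ030 combination on one row;  */
/* MISSING counts missing values as a category of their    */
/* own instead of leaving them out of the table.           */
proc freq data=toy_bp;
    table BPQ020*BPQ030 / list missing;
    title 'Toy example: skip pattern between BPQ020 and BPQ030';
run;
```

In the output, the row with BPQ020=2 and a missing BPQ030 should show a frequency of 2, which is the small-scale analogue of the 6,540 skipped respondents noted above.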

### Step 3: Recode data as necessary

To recode the missing data due to skip patterns, you can either:

• directly recode the variable, or
• create a derived variable.

Using the SAS if, then, and else statements, you can either recode the variable directly or create a new variable derived from the values of the variables in the skip pattern sequence.

#### Option 1 - Directly Recode Variable and Check After Recode
| Statements | Explanation |
|---|---|
| `data demo_BP2a;`<br>`set demo_BP1;` | Use the data and set statements to refer to your analytic dataset. |
| `if BPQ030=1 then BPQ030=1;`<br>`else if BPQ020 in (1,2) and BPQ030<7 then BPQ030=2;`<br>`else BPQ030=.;` | Use the if, then, and else statements to directly recode BPQ.030 values based on the BPQ.020 values. Because SAS treats a missing numeric value as smaller than any number, the condition BPQ030<7 is also true when BPQ.030 is missing, so respondents with a BPQ.020 value of 1 or 2 and a missing BPQ.030 are recoded to 2. If you recode this way, the original variable BPQ.030 is modified and the original values will no longer be available should you need to use this variable again somewhere else. |
| `proc freq data=demo_BP2a;`<br>`where ridstatr=2 and ridageyr>=20;`<br>`table BPQ020*BPQ030 / list missing;`<br>`title 'Check recode BPQ030';`<br>`run;` | Use the proc freq procedure to determine the frequency of each value of the variables listed; use the data= option to refer to your analytic dataset; use the where statement to select participants who were interviewed and examined in the MEC (ridstatr=2) and who were age 20 years and older (ridageyr>=20); use the table statement to indicate variables of interest for the output. |
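The recode in Option 1 depends on how SAS compares missing numeric values: a missing value sorts below every number, so `BPQ030 < 7` is true when BPQ030 is missing. The following minimal sketch (the variable `x` is purely illustrative) writes the result of that comparison to the SAS log.

```sas
/* _NULL_ runs the step without creating a dataset.        */
data _null_;
    x = .;                              /* missing numeric value */
    if x < 7 then put 'missing < 7 is TRUE';
    else put 'missing < 7 is FALSE';
run;
```

This is why the condition `BPQ020 in (1,2) and BPQ030 < 7` picks up respondents whose BPQ030 was left missing, while codes of 7 or higher fall through to the final else branch.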

#### Option 2 - Create Derived Variable (diagHTN) 1-Yes, 2-No

| Statements | Explanation |
|---|---|
| `data demo_BP2b;`<br>`set demo_BP1;` | Use the data and set statements to refer to your analytic dataset. |
| `if BPQ030=1 then diagHTN=1;`<br>`else if BPQ020 in (1,2) and BPQ030<7 then diagHTN=2;` | Use the if, then, and else statements to create a new, derived variable (diagHTN) based on the BPQ.030 and BPQ.020 values. Respondents who meet neither condition keep a missing value for diagHTN, and the original BPQ.030 values are left unchanged. |
| `proc freq data=demo_BP2b;`<br>`where ridstatr=2 and ridageyr>=20;`<br>`table diagHTN*BPQ020*BPQ030 / list missing;`<br>`title 'Check derived variable diagHTN';`<br>`run;` | Use the proc freq procedure to check the derived variable (diagHTN) against the original variables (BPQ.020 and BPQ.030); use the data= option to refer to your analytic dataset; use the where statement to select participants who were interviewed and examined in the MEC (ridstatr=2) and who were age 20 years and older (ridageyr>=20); use the table statement to indicate variables of interest for the output. |

Highlighted items from the recode output for skip patterns:

• Options 1 and 2 produce the same results: 6,540 respondents are coded as "2," i.e., a "No" response, instead of a missing response, for BPQ.030. Similarly, 6,540 are coded as "2" for the derived variable, diagHTN.
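One way to confirm that the two options agree is to compare their results row by row. The sketch below is an illustration under stated assumptions, not part of the original program: the datasets `toy_in`, `toy_a`, and `toy_b` are hypothetical stand-ins for demo_BP1, demo_BP2a, and demo_BP2b, and the four rows of data are invented.

```sas
/* Hypothetical toy input standing in for demo_BP1. */
data toy_in;
    input seqn BPQ020 BPQ030;
    datalines;
1 1 1
2 1 2
3 2 .
4 1 9
;
run;

/* Option 1: recode BPQ030 in place. */
data toy_a;
    set toy_in;
    if BPQ030 = 1 then BPQ030 = 1;
    else if BPQ020 in (1,2) and BPQ030 < 7 then BPQ030 = 2;
    else BPQ030 = .;
run;

/* Option 2: create the derived variable diagHTN. */
data toy_b;
    set toy_in;
    if BPQ030 = 1 then diagHTN = 1;
    else if BPQ020 in (1,2) and BPQ030 < 7 then diagHTN = 2;
run;

/* Rename the recoded BPQ030 to diagHTN so that PROC COMPARE */
/* matches it against the derived variable, pairing rows by  */
/* seqn (both datasets are already in seqn order).           */
proc compare base=toy_a(keep=seqn BPQ030 rename=(BPQ030=diagHTN))
             compare=toy_b(keep=seqn diagHTN);
    id seqn;
run;
```

If the two approaches agree, the proc compare report should list no unequal values for diagHTN. The same pattern could be applied to the full datasets, provided they share an identifier such as seqn and are sorted by it.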