## Task 2: How to Identify and Recode Skip Patterns in NHANES III

The second task is to check the data for skip patterns. To do this, you will check the codebook, check the data, and recode the data as necessary.

### Step 1: Check codebook for skip patterns

Check the codebook to determine if a skip pattern affects the variables in your analysis. See the Locate Variables module Task 1 for more information on how to locate background information on variables in the documentation.

Figure: Skip Patterns in Blood Pressure Questionnaire Codebook

### Step 2: Check data for skip patterns

After you have used the codebook to find out whether a skip pattern affects variables in your analysis, use cross tabulations produced by the SAS proc freq procedure to confirm how the skip pattern appears in the data.

#### Program to Check Data for Skip Patterns

```sas
proc freq data=demo2_nh3;
  where hsageu=2 and hsageir>=20 and dmpstat=2;
  table hae2 hae3 hae5a hae2*(hae3 hae5a) hae6 hae7 hae9d hae6*(hae7 hae9d) har1 har3 har1*har3 / list missing;
  title 'Check skip patterns for BP, cholesterol, and smoking questions';
run;
```

Use the proc freq procedure to determine the frequency of each value of the variables listed.

Use the where statement to select participants who were age 20 years and older (hsageu=2 and hsageir>=20) and who had both the home interview and the MEC exam (dmpstat=2).

Use the table statement to list the variables for the one-way frequency tables and the cross tabulations for the skip patterns. The star (*) requests a cross tabulation: hae2*(hae3 hae5a) is shorthand for hae2*hae3 and hae2*hae5a, and the same applies to hae6 with hae7 and hae9d, and to har1 with har3. The list option prints each cross tabulation as a list of value combinations rather than a grid, and the missing option counts missing values as a level, so skipped items show up in the output.
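To see what a skip pattern looks like in this kind of output, here is a minimal sketch on a handful of made-up records. The dataset names skipdemo and skipcounts and all values are hypothetical, not NHANES III data; the sketch simply assumes that hae2=2 routes a respondent past hae3, leaving hae3 blank. The out= option saves the list-format counts to a dataset so they can be inspected or checked programmatically.

```sas
/* Hypothetical records (not NHANES III data): hae2 is the gate question,
   hae3 is asked only when hae2=1; '.' marks a skipped (missing) item */
data skipdemo;
  input hae2 hae3;
  datalines;
1 1
1 2
1 1
2 .
2 .
2 .
. .
;
run;

/* list: one row per value combination; missing: keep '.' as a level;
   out=: save the counts of this (single) table request */
proc freq data=skipdemo;
  table hae2*hae3 / list missing out=skipcounts;
  title 'Skip pattern demo: hae2 by hae3';
run;

proc print data=skipcounts noobs;
run;
```

In skipcounts, the row with hae2=2 and a missing hae3 holds all three skipped records, which is the same signature the highlighted output below describes for the full dataset.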

Highlighted items from the proc freq output for skip patterns:

• Notice the high number of missing values for hae3 (n=12,029) compared with the much lower number of missing values for the preceding question, hae2 (n=111).
• Further down, the output includes a cross tabulation of hae2 responses by hae3 responses. Note the large number of missing values for hae3 (n=11,874) among those who answered "No" (coded as 2) to hae2. These respondents were not asked hae3 because of the skip pattern, so their responses need to be recoded before the data are analyzed further.
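Not every missing hae3 comes from the skip pattern; some respondents who were asked hae3 may still have a blank. A minimal sketch for separating the two, using hypothetical records and a hypothetical variable name (miss_reason), classifies each missing hae3 by the hae2 answer. With the real data you would read from demo2_nh3 with a set statement instead of datalines.

```sas
/* Hypothetical records (not NHANES III data). In practice, replace the
   input/datalines with: set demo2_nh3; */
data hae3_missing;
  input hae2 hae3;
  length miss_reason $ 20;
  if missing(hae3) then do;
    /* blank because hae2=No routed the respondent past hae3 */
    if hae2 = 2 then miss_reason = 'Skipped (hae2=No)';
    /* blank for some other reason (hae2 missing, other codes, etc.) */
    else miss_reason = 'Other missing';
  end;
  else miss_reason = 'Nonmissing code';
  datalines;
1 1
1 .
2 .
2 .
9 .
;
run;

proc freq data=hae3_missing;
  table miss_reason / missing;
run;
```

Only the "Skipped (hae2=No)" group should be recoded on the basis of the skip pattern; the "Other missing" group is genuinely missing.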

### Step 3: Recode data as necessary

To recode the missing data due to skip patterns, you can either:

• directly recode the variable, or
• create a derived variable.

Both approaches use the SAS if, then, and else statements: you either overwrite the original variable or create a new variable derived from the values of the variables in the skip pattern sequence.

#### Option 1 - Directly Recode Variable and Check After Recoding

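```sas
data temp3_nh3;
  set demo2_nh3;
  if hae3=1 then hae3=1;
  else if hae2 in (1,2) and hae3<8 then hae3=2;
  else hae3=.;
run;
```

Use the data and set statements to refer to your analytic dataset.

Use the if, then, and else statements to recode hae3 directly based on the hae2 values: a hae3 value of 1 is kept; for respondents who answered hae2 with 1 or 2, a hae3 value of 2 or a skipped (missing) hae3 becomes 2; everything else, including hae3 codes of 8 or higher, is set to missing. The condition hae3<8 captures the skipped records because SAS treats a numeric missing value as smaller than any number. If you recode this way, the original variable hae3 is overwritten, and its original values will no longer be available should you need this variable again elsewhere.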
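Because the hae3<8 test relies on how SAS orders missing values, it can help to run the Option 1 logic on a few hypothetical combinations and look at the result. In this sketch the dataset recode_demo, the helper variable hae3_orig, and the values are illustrative only; hae3_orig keeps each input value so the recode can be compared with it.

```sas
/* Hypothetical combinations of hae2 and hae3 (not NHANES III data) */
data recode_demo;
  input hae2 hae3;
  hae3_orig = hae3;   /* keep the input value for comparison */
  /* Option 1 logic: a numeric missing value is smaller than any number,
     so hae3<8 is true for a blank hae3 and skipped records become 2 */
  if hae3 = 1 then hae3 = 1;
  else if hae2 in (1, 2) and hae3 < 8 then hae3 = 2;
  else hae3 = .;
  datalines;
1 1
1 2
2 .
1 9
9 .
;
run;

proc print data=recode_demo noobs;
  var hae2 hae3_orig hae3;
run;
```

The skipped record (hae2=2, hae3 blank) ends up as 2, while a hae3 code of 9 and a record with hae2 outside 1 and 2 both end up missing.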

```sas
proc freq data=temp3_nh3;
  where hsageu=2 and hsageir>=20 and dmpstat=2;
  table hae2*hae3 / list missing;
  title 'Check recode hae3';
run;
```

Use the proc freq procedure to check the recoded hae3 against hae2. The data= option refers to your analytic dataset; the where statement selects participants who were age 20 years and older and who had both the home interview and the MEC exam (hsageu=2 and hsageir>=20 and dmpstat=2); and the table statement lists the variables of interest for the output.

#### Option 2 - Create Derived Variable (diagHTN) 1-Yes, 2-No

```sas
data demo3_nh3;
  set demo2_nh3;

  /* Diagnosed hypertension */
  if hae3=1 then diagHTN=1;
  else if hae2 in (1,2) and hae3<8 then diagHTN=2;

  /* Hypertension medication */
  if hae5a=1 then HTNMED=1;
  else if hae2 in (1,2) and hae5a<8 then HTNMED=2;

  /* Diagnosed hyperlipidemia */
  if hae7=1 then diagCHOL=1;
  else if hae6 in (1,2) and hae7<8 then diagCHOL=2;

  /* Cholesterol medication */
  if hae9d=1 then CHOLMED=1;
  else if hae6 in (1,2) and hae9d<8 then CHOLMED=2;

  /* Cigarette smoking */
  if har3=1 then CIGSMOK=1;
  else if har1=1 and har3=2 then CIGSMOK=2;
  else if har1=2 then CIGSMOK=3;
run;
```

Use the data and set statements to refer to your analytic dataset.

Use the if, then, and else statements to create a new, derived variable (diagHTN) based on the hae2 and hae3 values; the original hae3 is left unchanged. Because a new variable starts out missing for each observation, records that meet neither condition (for example, hae3 codes of 8 or higher) stay missing without an explicit else. Repeat the same pattern to create derived variables for hypertension medication (HTNMED), diagnosed hyperlipidemia (diagCHOL), cholesterol medication (CHOLMED), and cigarette smoking (CIGSMOK), as these variables will be used in later examples.
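CIGSMOK has three levels rather than two, so a format makes its frequency tables easier to read. The sketch below is an illustration under stated assumptions: the format name cigsmokf, the dataset smoke_demo, and the records are hypothetical, and the labels assume that har1 asks whether the respondent ever smoked at least 100 cigarettes and har3 whether they smoke now. Confirm those meanings in the NHANES III codebook before relying on the labels.

```sas
/* Hypothetical labels; they ASSUME har1 = ever smoked 100+ cigarettes
   and har3 = smokes cigarettes now (check the codebook) */
proc format;
  value cigsmokf 1 = 'Current smoker'
                 2 = 'Former smoker'
                 3 = 'Never smoker';
run;

/* Hypothetical records (not NHANES III data) */
data smoke_demo;
  input har1 har3;
  if har3 = 1 then cigsmok = 1;
  else if har1 = 1 and har3 = 2 then cigsmok = 2;
  else if har1 = 2 then cigsmok = 3;   /* har3 skipped when har1 = 2 */
  format cigsmok cigsmokf.;
  datalines;
1 1
1 2
2 .
9 .
;
run;

proc freq data=smoke_demo;
  table cigsmok*har1*har3 / list missing;
run;
```

The record with har1=2 and a skipped har3 lands in category 3, while the record with an unusable har1 code stays missing.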

```sas
proc freq data=demo3_nh3;
  where hsageu=2 and hsageir>=20 and dmpstat=2;
  table diagHTN*hae2*hae3 / list missing;
  title 'Check derived variable diagHTN';
run;
```

Use the proc freq and table statements to check the derived variable (diagHTN) against the original variables (hae2 and hae3). The data= option refers to your analytic dataset, and the where statement selects participants who were age 20 years and older and who had both the home interview and the MEC exam (hsageu=2 and hsageir>=20 and dmpstat=2).

Highlighted items from the recode output for skip patterns:

• Options 1 and 2 produce the same results: 12,811 respondents are coded as "2" (a "No" response) for hae3 instead of missing, and likewise 12,811 are coded as "2" for the derived variable diagHTN.
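Matching frequency counts are a good sign, but the two options can also be compared record by record. This minimal sketch applies both recodes to the same hypothetical rows in one data step and flags any disagreement; the dataset compare_opts and the variables hae3_opt1 and mismatch are illustrative names, and Option 1 is written to a copy here so hae3 itself stays untouched.

```sas
/* Hypothetical rows (not NHANES III data) */
data compare_opts;
  input hae2 hae3;
  /* Option 1 logic, written to a copy instead of overwriting hae3 */
  if hae3 = 1 then hae3_opt1 = 1;
  else if hae2 in (1, 2) and hae3 < 8 then hae3_opt1 = 2;
  else hae3_opt1 = .;
  /* Option 2 logic: diagHTN starts missing for each observation */
  if hae3 = 1 then diagHTN = 1;
  else if hae2 in (1, 2) and hae3 < 8 then diagHTN = 2;
  /* two missing values compare as equal, so blanks do not count */
  mismatch = (hae3_opt1 ne diagHTN);
  datalines;
1 1
1 2
2 .
1 9
. .
;
run;

proc freq data=compare_opts;
  table mismatch;
run;
```

Every row should have mismatch=0; the explicit else in Option 1 only restates what already happens to a newly created variable in Option 2.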