Task 2: How to Identify and Recode Skip Patterns in NHANES II

The second task is to check the data for skip patterns. To do this, complete the following steps:

Step 1 Check codebook for skip patterns

Check the codebook and the appendix containing the data collection forms in the Plan and Operation Report to determine if a skip pattern affects the variables in your analysis. The Plan and Operation Report link is the first bullet under the Data and Documentation/Codebook Files heading on the NHANES II page. See the Locate Variables module Task 1 for more information on how to locate background information on variables in the documentation.

Skip Patterns in Blood Pressure Questionnaire in Appendix of Plan and Operation Report

Step 2 Check data for skip patterns

After you have used the codebook to discover if a skip pattern affects variables in your analysis, you will use cross tabulations obtained by the SAS proc freq procedure to determine the effect of skip patterns.

Program to Check Data for Skip Patterns

Statements Explanation

Proc freq data =demo2_nh2;

Use the proc freq procedure to determine the frequency of each value of the variables listed.

where n2ah0047>=20 ;

Use the where statement to select participants who were age 20 years and older.

table n2ah1059*n2ah1060 n2ah1060*(n2ah1067 n2ah1068 n2ah1069) n2ah1068*n2ah1069/ list missing ;

title 'Check skip patterns for BP questions' ;

run ;

Use the table statement to list the variables to include in the frequency tables and cross-tabulations that check the skip patterns. The list option prints each cross-tabulation as one row per combination of values, and the missing option includes missing values as a category. A star (*) requests a cross-tabulation: the variable before the star is the row variable and the variable after it is the column variable, so n2ah1059*n2ah1060 uses n2ah1059 as the row variable and n2ah1060 as the column variable. Parentheses group variables, so n2ah1060*(n2ah1067 n2ah1068 n2ah1069) crosses n2ah1060 with each of the three variables in turn.

Highlighted items from the proc freq output for skip patterns:

• In the cross-tabulation of n2ah1059 by n2ah1060, there are 10,745 observations where the response is "no" to both questions on diagnosis of hypertension.
• In the same cross-tabulation, there are 4,302 observations where the response to n2ah1059 is "yes" and n2ah1060 is missing, and 294 observations where the response to n2ah1059 is "no" and the response to n2ah1060 is "yes." These responses will need to be recoded, or a new variable created, in order to estimate the total percent of persons who have diagnosed hypertension.
• Further down, the output includes cross-tabulations of n2ah1060 by n2ah1067, n2ah1068, and n2ah1069. Note that there are exactly 10,745 observations where the response to n2ah1060 is "no" and the response is missing for n2ah1067, n2ah1068, and n2ah1069. These respondents were not asked these questions because of a skip pattern.
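The cells described above can be easier to read when the codes are labeled. The following is a minimal sketch, not part of the original program: it assumes that n2ah1059 and n2ah1060 are numeric with 1 meaning "yes" and 2 meaning "no" (as the recodes later in this task suggest), and the format name ynf is a hypothetical name chosen for illustration. Any other codes are printed unformatted.

```sas
/* Hypothetical format: assumes 1 = yes and 2 = no for these questions */
proc format;
  value ynf 1 = 'Yes'
            2 = 'No'
            . = 'Missing';
run;

/* Same skip pattern check as above, with labeled values */
proc freq data=demo2_nh2;
  where n2ah0047 >= 20;
  format n2ah1059 n2ah1060 ynf.;
  table n2ah1059*n2ah1060 / list missing;
  title 'BP skip pattern check with labeled values';
run;
```

Confirm the value labels against the codebook before relying on them, since the labels here are an assumption.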

Step 3 Recode data as necessary

To recode the missing data due to skip patterns, you can either:

• directly recode the variable, or
• create a derived variable.

Both approaches use the SAS if, then, and else statements: you either recode the variable directly or create a new variable derived from the values of the variables in the skip pattern sequence.

Option 1 - Directly Recode Variable and Check After Recoding
Statements Explanation

Data temp3_nh2;
set demo2_nh2;

Use data and set statements to refer to your analytic dataset.

If n2ah1059= 1 then n2ah1059= 1;

Else if n2ah1060= 1 then n2ah1059=1 ;

Use the if, then, and else statements to directly recode n2ah1059 values based on the n2ah1060 values. If you recode this way, the original variable n2ah1059 is modified and the original values will no longer be available should you need to use this variable again somewhere else.

Proc freq data =temp3_nh2;
where n2ah0047>=20 ;
table n2ah1059*n2ah1060/ list missing ;
title 'Check recode n2ah1059 ' ;
run ;

Use the proc freq procedure to check the recoded variable: the data= option refers to your analytic dataset, the where statement selects participants who were age 20 years and older (n2ah0047>=20), and the table statement lists the variables of interest for the output.
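Because Option 1 overwrites n2ah1059, one way to keep the original responses available is to copy them to a second variable before recoding. This is a sketch of that idea rather than part of the original program; n2ah1059_orig is a hypothetical variable name.

```sas
data temp3_nh2;
  set demo2_nh2;
  /* n2ah1059_orig is a hypothetical name for the untouched copy */
  n2ah1059_orig = n2ah1059;
  /* Same recode as Option 1 */
  if n2ah1059 = 1 then n2ah1059 = 1;
  else if n2ah1060 = 1 then n2ah1059 = 1;
run;

/* Compare the saved original with the recoded variable */
proc freq data=temp3_nh2;
  where n2ah0047 >= 20;
  table n2ah1059_orig*n2ah1059*n2ah1060 / list missing;
  title 'Check recode of n2ah1059 against saved original';
run;
```

In the list output, the only rows where n2ah1059_orig and n2ah1059 should differ are those where n2ah1060 is 1.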

Option 2 - Create Derived Variable (diagHTN) 1-Yes, 2-No

Statements Explanation

Data demo3_nh2;
set demo2_nh2;

Use the data and set statements to refer to your analytic dataset.

If n2ah1059= 1 or n2ah1060=1 then diagHTN= 1;
Else if n2ah1059= 2 or n2ah1060=2 then diagHTN= 2;

Else if n2ah1059= . and n2ah1060= . then diagHTN= .;

If n2ah1069= 1 then HTNMED= 1;

Else if n2ah1059 in ( 1,2) and n2ah1069 <8 then HTNMED= 2;

If n2ah0625= 1 then CIGSMOK= 1;

Else if n2ah0626= 1 and n2ah0625=2 then CIGSMOK= 2;

else if n2ah0626= 2 then CIGSMOK= 3;

Use the if, then, and else statements to create a new, derived variable (diagHTN) based on the n2ah1059 and n2ah1060 values. The derived variables for hypertension medication (HTNMED) and cigarette smoking (CIGSMOK) are created in similar fashion, as these variables will be used in later examples.

Proc freq data =demo3_nh2;
where n2ah0047>=20 ;
table diagHTN*n2ah1059*n2ah1060/ list missing ;
title 'Check derived variable diagHTN' ;
run ;

Use the proc freq procedure to check the derived variable (diagHTN) against the original variables (n2ah1059 and n2ah1060): the data= option refers to your analytic dataset, the where statement selects participants who were age 20 years and older (n2ah0047>=20), and the table statement lists the variables of interest for the output.

Highlighted items from the recode output for skip patterns:

• Options 1 and 2 produce the same results: 10,761 respondents are coded as "2," for the derived variable, diagHTN.
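To see how the diagHTN derivation treats each combination of responses, it can help to run it on a few invented rows before applying it to the full file. The sketch below uses made-up values, not NHANES II records, and the dataset names toy and toy_derived are hypothetical.

```sas
/* Toy data: invented response combinations, not NHANES II records */
data toy;
  input n2ah1059 n2ah1060;
  datalines;
1 .
2 1
2 2
2 .
. .
;
run;

data toy_derived;
  set toy;
  /* Same derivation as Option 2 */
  if n2ah1059 = 1 or n2ah1060 = 1 then diagHTN = 1;
  else if n2ah1059 = 2 or n2ah1060 = 2 then diagHTN = 2;
  else if n2ah1059 = . and n2ah1060 = . then diagHTN = .;
run;

proc print data=toy_derived noobs;
run;
```

The first two rows are assigned diagHTN = 1, the next two diagHTN = 2, and the last row stays missing, which mirrors the "yes then skipped" and "no then yes" cases seen in the Step 2 output.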