Task 2c: How to Identify and Recode Skip Patterns Using Stata

The second task is to check the data for skip patterns. To do this, follow the three steps below.

 

Step 1: Check codebook for skip patterns

Check the codebook to determine if a skip pattern affects the variables in your analysis. See the Locate Variables module Task 1 for more information on how to locate background information on variables in the documentation.

 

Skip Pattern in Blood Pressure Questionnaire Codebook



Step 2: Check data for skip patterns

After you have used the codebook to find out whether a skip pattern affects variables in your analysis, use cross tabulations from the Stata tabulate command to confirm that the skip pattern appears in the data.

 

Program to Check Data for Skip Patterns

Use the tabulate command to determine the frequency of each value of the listed variables for participants who were interviewed and examined in the MEC (ridstatr==2) and who were age 20 years and older (ridageyr >=20 & ridageyr <.). Use the missing option to display the missing values. Listing two variables on the tabulate command line creates a crosstab, here with BPQ020 as the row variable and BPQ030 or BPQ050a as the column variable.

tabulate bpq020 if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
tabulate bpq030 if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
tabulate bpq050a if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
tabulate bpq020 bpq030 if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
tabulate bpq020 bpq050a if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
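
The crosstabs above can be complemented by a direct count of records that break the expected pattern. The following is a minimal sketch on a few made-up rows (not NHANES data). It assumes that a BPQ020 answer of 2 (No) routes the respondent past BPQ030; the variable name skipped is hypothetical.

```stata
* Toy data (made-up rows, not NHANES); ridstatr==2 means interviewed and examined
clear
input seqn ridstatr ridageyr bpq020 bpq030
1 2 45 1 1
2 2 52 1 2
3 2 33 2 .
4 2 61 2 .
5 2 18 2 .
6 1 70 1 1
end

* Flag examined adults whose bpq030 is missing because bpq020 was No
* (skipped is left missing for rows outside the analysis population)
generate byte skipped = (bpq020 == 2 & missing(bpq030)) if (ridageyr >= 20 & ridageyr < .) & ridstatr == 2

* Count records that violate the skip pattern: bpq020 is No but bpq030 was answered
count if bpq020 == 2 & !missing(bpq030) & (ridageyr >= 20 & ridageyr < .) & ridstatr == 2

tabulate bpq020 skipped, missing
```

A nonzero count from the count command would suggest either a data problem or a skip rule that differs from the one assumed here.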

 

Highlighted items from the tabulate output for skip patterns:

 

 

Step 3: Recode data as necessary

To recode the missing data due to skip patterns, use the Stata if qualifier to either recode the variable directly or create a new variable (derived from the values of the variables in the skip pattern sequence).
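
Because both recoding approaches rely on the if qualifier, it helps to recall how Stata compares missing values: a numeric missing value (.) is treated as larger than any number. A condition such as x < 7 therefore excludes missing values, while x != 9 includes them. A minimal sketch on made-up values (the variable name x is hypothetical):

```stata
* Toy data: two valid answers, a refusal (7), a don't know (9), and a missing value
clear
input x
1
2
7
9
.
end

* Matches 1 and 2 only: missing is not less than 7
count if x < 7

* Matches 1, 2, 7, and the missing value
count if x != 9

* Matches 1, 2, and the missing value
count if x < 7 | missing(x)
```

This is why the age restriction used throughout this task is written as ridageyr >=20 & ridageyr <. rather than ridageyr >=20 alone.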

 

Option 1 - Directly Recode Variable and Check After Recode

Use the replace command with the if qualifier to directly recode BPQ030 values based on the BPQ020 values. Use the tabulate command to determine the frequency of each value of the variables listed for participants who were interviewed and examined in the MEC (ridstatr==2) and who were age 20 years and older (ridageyr >=20 & ridageyr <.). Use the missing option to display the missing values. Use the save command to create a new dataset, demo_bp2a, with the recoded values.

WARNING

If you recode this way, the original variable BPQ030 is modified and the original values will no longer be available should you need to use this variable again somewhere else.

 

***bpq030=1 if bpq030==1***
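* Stata treats missing (.) as larger than any number, so skipped values must be tested explicitly
replace bpq030=2 if bpq030!=1 & (bpq020==1 | bpq020==2) & (bpq030 <7 | missing(bpq030))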
* Set every value other than 1 or 2 to missing (with | instead of & the condition is always true)
replace bpq030=. if bpq030!=1 & bpq030!=2
 
tabulate bpq020 bpq030 if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
save C:\Nhanes\Data\demo_bp2a, replace
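
One way to soften the warning above is to keep a copy of the original responses before recoding in place. This is a sketch on made-up rows, not part of the original program; the name bpq030_orig is hypothetical. The recode here tests missing() explicitly so that a skipped BPQ030 is treated as No, since Stata does not count a missing value as less than 7.

```stata
* Toy data (made-up rows, not NHANES)
clear
input seqn bpq020 bpq030
1 1 1
2 1 2
3 2 .
4 1 9
end

* Keep an untouched copy of the original responses (hypothetical name)
clonevar bpq030_orig = bpq030

* Recode in place: skipped or answered No becomes 2, anything else becomes missing
replace bpq030 = 2 if bpq030 != 1 & inlist(bpq020, 1, 2) & (bpq030 < 7 | missing(bpq030))
replace bpq030 = . if bpq030 != 1 & bpq030 != 2

* Compare the recoded values with the preserved originals
tabulate bpq030_orig bpq030, missing
```

The clonevar command copies the values together with the variable's storage type, labels, and format, so the copy can stand in for the original later.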

 

Option 2 - Create Derived Variable (diagHTN) 1-Yes, 2-No

Use the generate command to create a new, derived variable (diagHTN) based on the BPQ030 and BPQ020 values. Use the tabulate command with the bysort prefix and the row option to create a 3-way frequency table that checks the derived variable (diagHTN) against the original variables (BPQ020 and BPQ030) for participants who were interviewed and examined in the MEC (ridstatr==2) and who were age 20 years and older (ridageyr >=20 & ridageyr <.). Use the missing option to display the missing values. Use the save command to create a new dataset, demo_bp2b, with the recoded values.

use C:\Nhanes\Data\demo_bp1,clear
 
gen     diaghtn=.
replace diaghtn=1 if bpq030==1
* Exclude both 7 (refused) and 9 (don't know) so the result matches the Option 1 recode
replace diaghtn=2 if diaghtn !=1 & (bpq020==1 | bpq020==2) & !inlist(bpq030, 7, 9)
 
bysort diaghtn: tab bpq020 bpq030 if (ridageyr >=20 & ridageyr <.) & ridstatr==2, row missing
save C:\Nhanes\Data\demo_bp2b, replace
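
In addition to reading the 3-way table, the derived variable can be checked with the assert command, which stops a do-file with an error when a condition fails. The following is a minimal sketch on made-up rows (not NHANES data). The label name yesno is hypothetical, and leaving both 7 (refused) and 9 (don't know) out of the No category is an assumption about the intended coding.

```stata
* Toy data (made-up rows, not NHANES)
clear
input seqn bpq020 bpq030
1 1 1
2 1 2
3 2 .
4 1 9
5 9 .
end

* Derive diaghtn: 1 = Yes, 2 = No, missing otherwise
gen     diaghtn = .
replace diaghtn = 1 if bpq030 == 1
replace diaghtn = 2 if diaghtn != 1 & inlist(bpq020, 1, 2) & !inlist(bpq030, 7, 9)

* Attach value labels (hypothetical label name) so tables are self-describing
label define yesno 1 "Yes" 2 "No"
label values diaghtn yesno

* Spot checks on the recode logic
assert diaghtn == 1 if bpq030 == 1
assert diaghtn == 2 if bpq020 == 2 & missing(bpq030)
assert missing(diaghtn) if !inlist(bpq020, 1, 2)

tabulate diaghtn, missing
```

Because the original BPQ020 and BPQ030 values are untouched, the checks can be rerun at any point in the analysis.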

 

Highlighted items from the recode output for skip patterns:

 
