## Task 4: How to Recode Variables Based on Alternate Definitions

This task reviews how to recode variables so they are appropriate for your analytic needs and how to check your derived variables.

### Step 1 Recode based on alternate definitions

Recoding is an important step for preparing an analytical dataset.  In this step, you will view programs that recode variables using different techniques for each of the scenarios listed on the Clean & Recode Data: Key Concepts about Recoding Variables in NHANES I page. In the summary table below, each statement required for recoding is listed on the left with explanations on the right.

#### Program to Recode as Necessary Based on Alternate Definitions

Statements Explanation

data demo3_nh1;

set demo2_nh1;

Use the data and set statements to refer to your analytic dataset.

if (20 <= n1bm0101 <= 39) then age3cat = 1;
else if (40 <= n1bm0101 <= 59) then age3cat = 2;
else if n1bm0101 >= 60 then age3cat = 3;

Use the if, then, and else statements to create a categorical age variable (age3cat) from a continuous variable.
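To see how the chained comparisons behave at the category boundaries, a minimal sketch on a few made-up ages may help. The dataset name `age_demo` and the input values are hypothetical; only `n1bm0101` and `age3cat` come from the program above. Ages under 20 and missing ages match none of the conditions, so `age3cat` stays missing for those records.

```sas
/* Hypothetical test values, not NHANES I records */
data age_demo;
  input n1bm0101 @@;
  if (20 <= n1bm0101 <= 39) then age3cat = 1;
  else if (40 <= n1bm0101 <= 59) then age3cat = 2;
  else if n1bm0101 >= 60 then age3cat = 3;
  datalines;
19 20 39 40 59 60 74 .
;
run;

/* List each test age next to its assigned category */
proc print data=age_demo noobs;
run;
```

The boundary values 20, 39, 40, 59, and 60 are the ones most likely to expose an off-by-one error in the cut-off points.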

if N1ME0228>=140 then SBP=1; else SBP=0;

if N1ME0231>=90 then DBP=1 ; else DBP=0;

if N1AH0423 in ( 1, 2) or SBP=1 or DBP=1 then HBP= 1;
else HBP= 0;

Use the if, then, and else statements to define a new variable, hbp (high blood pressure = 1 or 0), based on a series of conditions that indicate hypertension from the questionnaire and examination variables.
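One point worth checking: in SAS, a missing numeric value compares as smaller than any number, so a comparison such as `N1ME0228 >= 140` is false for a participant with no systolic reading, and the `else` branch then assigns 0 rather than missing. If that is not what your analysis needs, one possible variant keeps missing readings missing. This is a sketch under assumptions, not part of the original program: the dataset name `bp_demo` and the `_m` suffixed variables are hypothetical, the 140 systolic cut-off is the one named in the check title in Step 2, and missing readings are assumed to be stored as SAS missing values rather than numeric fill codes.

```sas
/* Sketch only: bp_demo and the *_m variable names are hypothetical */
data bp_demo;
  set demo2_nh1;

  /* Leave the indicator missing when the reading is missing */
  if missing(N1ME0228) then SBP_m = .;
  else SBP_m = (N1ME0228 >= 140);

  if missing(N1ME0231) then DBP_m = .;
  else DBP_m = (N1ME0231 >= 90);

  /* Any positive indicator is enough for HBP_m = 1;
     HBP_m = 0 requires that neither reading is missing */
  if N1AH0423 in (1, 2) or SBP_m = 1 or DBP_m = 1 then HBP_m = 1;
  else if missing(SBP_m) or missing(DBP_m) then HBP_m = .;
  else HBP_m = 0;
run;
```

A missing questionnaire response (`N1AH0423`) is handled here exactly as in the original statements; whether it should also force a missing result depends on your analytic definition.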

if n1lb0237 >= 240 then HLP= 1;
else HLP= 0;

Use the if, then, and else statements to define a new variable, hlp (hyperlipidemia = 1 or 0), based on high lipid levels from the biochemistry variable.

if N1BM0112>= 0 then do ;

if N1BM0112<34 then HIGHSCHL= 1;

else if N1BM0112=34 then HIGHSCHL= 2;

else if N1BM0112>34 then HIGHSCHL= 3;

else HIGHSCHL= .; end;

Use the if, then, and else statements to create a simple high school education categorical variable (HIGHSCHL) from a complex categorical education variable.

BMI = n1bm0260 / ((n1bm0266/100)**2);

Use the formula kg/m² and the weight and height variables from the anthropometry file to calculate body mass index (BMI). Dividing the height variable by 100 converts it to meters before squaring.
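The calculation can be sanity-checked on a hand-computable value. The sketch below assumes weight in kilograms and height in centimeters, as the division by 100 implies; the dataset name `bmi_demo` and the input values are hypothetical. The guard on height is an addition for illustration and is not part of the original statement.

```sas
/* Hypothetical values: 70 kg at 175 cm, plus a record with missing height */
data bmi_demo;
  input n1bm0260 n1bm0266;
  /* Compute BMI only when height is present and positive */
  if n1bm0266 > 0 then BMI = n1bm0260 / ((n1bm0266 / 100) ** 2);
  else BMI = .;
  datalines;
70 175
70 .
;
run;

proc print data=bmi_demo noobs;
run;
```

For the first record, 70 / 1.75² is about 22.86; the second record keeps a missing BMI because its height is missing.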

If n1ah0288 >= 0 then n1ah0287 = 1 ;

If n1ah0294 >= 0 then n1ah0293 = 1 ;

run ;

Use if...then statements to set the heart attack and stroke variables to 1 (yes) whenever the respondent reported how long it had been since they experienced the event.

### Step 2 Check recodes

In this step, you will check to confirm that derived and recoded variables correctly correspond to the original variables.

#### Program to Check Recodes using Cross Tabulations or proc means

Statements Explanation

proc freq data =demo3_nh1;

where n1bm0101>= 20;

table n1ah0423*n1ah0290
HBP*n1ah0423*SBP*DBP /list missing ;

table n1bm0112*highschl / list missing;

table n1ah0287*n1ah0288
n1ah0293*n1ah0294 / list missing;

title 'Check regroup/recode/definitions categorical variables' ;

run ;

Use proc freq to cross-tabulate the original high blood pressure variables against their recoded variables. Use the where statement to select the participants who were age 20 years and older. Do the same for education. In addition, check that heart attack and stroke were recoded correctly.
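Reading a long list-format table by eye can miss a stray row. As a complement, a data step can recompute the definition and report any record that disagrees with it. This is a minimal sketch, assuming the HBP definition given in Step 1; the names `expected`, `n_bad`, and `last` are hypothetical helpers.

```sas
/* Sketch: write any record whose HBP disagrees with its definition to the log */
data _null_;
  set demo3_nh1 end=last;
  where n1bm0101 >= 20;

  /* Recompute the definition as a 1/0 value */
  expected = (n1ah0423 in (1, 2) or SBP = 1 or DBP = 1);

  if HBP ne expected then do;
    n_bad + 1;  /* sum statement: starts at 0 and is retained across records */
    put 'Mismatch: ' n1ah0423= SBP= DBP= HBP=;
  end;

  if last then put 'Records with inconsistent HBP: ' n_bad;
run;
```

A final count of 0 in the log is consistent with a correct recode; any mismatch lines show the offending combination of values.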

proc means data =demo3_nh1 N min max ;
where n1bm0101 >= 20 ;
var n1bm0101;
class age3cat;
title 'Check if each age category contains the correct age range' ;

proc means data =demo3_nh1 N min max ;
where n1bm0101 >= 20 ;
var n1me0228;
class SBP;
title 'Check if SBP >=140 is defined correctly' ;

proc means data =demo3_nh1 N min max ;
where n1bm0101 >= 20 ;
var n1me0231;
class DBP;
title 'Check if DBP >=90 is defined correctly' ;

proc means data =demo3_nh1 N min max ;
where n1bm0101 >= 20 ;
var n1lb0237;
class HLP;
title 'Check if TC>=240 is defined correctly' ;

proc means data =demo3_nh1 N min max ;
where n1bm0101>=20 ;
var BMI;
title 'Check if BMI is calculated correctly' ;

run ;

Use proc means to report the number of nonmissing observations (N), minimum, and maximum values for the original continuous variables. Use the where statement to select the participants who were age 20 years and older. The class statement splits the original continuous variable by the categories of the derived variable, so you can confirm that the minimum and maximum in each category fall on the correct side of the cut-off points. For BMI, no class statement is used because BMI is itself a continuous variable calculated from two continuous variables.
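Because N counts only nonmissing values, the checks above do not show how many participants in each category had no reading at all. One possible variant, sketched here for the systolic check, adds the `nmiss` statistic; the title text is an assumption, and the 140 cut-off is the one named in the check titles above.

```sas
/* Sketch: the systolic check with a count of missing readings per group */
proc means data=demo3_nh1 n nmiss min max;
  where n1bm0101 >= 20;
  var n1me0228;
  class SBP;
  title 'Check SBP cut-off and count missing systolic readings';
run;
```

If the recode is correct, the minimum for SBP = 1 is at least 140 and the maximum for SBP = 0 is below 140. A nonzero `nmiss` in the SBP = 0 row indicates participants without a systolic reading who were nevertheless coded 0.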

Highlighted items comparing recoded or derived variables to original variables:

• The output from the frequency tables (proc freq) shows that the derived categorical variables were assigned correctly. For example, 8,775 survey participants who were not taking medication for hypertension (n1ah0423=3), who had systolic blood pressures less than 140 (SBP = 0), and who had diastolic blood pressures less than 90 (DBP = 0) were all assigned to HBP = 0 (no high blood pressure). Any observation with a value of "1" for SBP or DBP, or a value of "1" or "2" for n1ah0423, was assigned a value of "1" for the derived variable HBP, indicating the presence of a condition used to define high blood pressure.
• The output from proc means shows that the newly derived categorical variables were assigned correctly, based on the cut-off points of the original continuous variables. For example, the derived categorical variable, age3cat, had values of "1," "2," and "3," which correspond correctly to the selected cut-off points for age (20-39, 40-59, and 60-90) of the original continuous variable.