This task reviews how to recode variables so they are appropriate for your analytic needs and how to check your derived variables.
Recoding is an important step for preparing an analytical dataset. In this step, you will view programs that recode variables using different techniques for each of the scenarios listed on the Clean & Recode Data: Key Concepts about Recoding Variables in NHANES II page. In the summary table below, each statement required for recoding is listed on the left with explanations on the right.
| Statements | Explanation |
|---|---|
| data demo4_nh2; set demo3_nh2; | Use the data and set statements to create a new dataset (demo4_nh2) from your analytic dataset (demo3_nh2). |
| if n2ah0064=1 then do; if n2ah0062<=33 then HIGHSCHL=1; else if n2ah0062=34 then HIGHSCHL=2; else if 41<=n2ah0062<=45 then HIGHSCHL=3; else HIGHSCHL=.; end; if n2ah0064=2 then do; if n2ah0062=34 then HIGHSCHL=1; if n2ah0062=41 then HIGHSCHL=2; end; | Use the if, then, and else statements to create a simple high school education categorical variable (HIGHSCHL) from two categorical education variables. |
| if (20<=n2ah0047<=39) then age3cat=1; else if (40<=n2ah0047<=59) then age3cat=2; else if n2ah0047>=60 then age3cat=3; | Use the if, then, and else statements to create an age categorical variable (age3cat) from a continuous variable. |
| mean_sbp = mean(of n2pe0411 n2pe0771); mean_dbp = mean(of n2pe0414 n2pe0774); | Use the mean function to calculate mean systolic and diastolic blood pressures. |
| if mean_sbp>=140 then SBP140=1; else SBP140=0; if mean_dbp>=90 then DBP90=1; else DBP90=0; if HTNMED>=0 and SBP140>=0 and DBP90>=0 then do; if HTNMED=1 or SBP140=1 or DBP90=1 then HBP=1; else HBP=0; end; | Use the if, then, and else statements to define a new variable, HBP (high blood pressure = 1 or 0), based on a series of conditions that indicate hypertension from the questionnaire and examination variables. |
| if n2lb0421>=240 then HLP=1; | Use the if, then, and else statements to define a new variable, HLP (hyperlipidemia = 1 or 0), based on high lipid levels from the biochemical measurement variable. |
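
One point to keep in mind when adapting recodes like these: SAS treats a numeric missing value (`.`) as lower than any number in a comparison. A statement of the form `if x>=140 then flag=1; else flag=0;` therefore assigns 0 when `x` is missing. The following is a minimal sketch of one way to guard against this using the `missing` function. The dataset `toy` and its values are hypothetical and are not NHANES II data. NHANES II files may also use special numeric codes for blank or unknown responses, and those would need separate handling based on the codebook.

```sas
/* Hypothetical toy data: values are made up for illustration only */
data toy;
  input n2ah0047 mean_sbp n2lb0421;
  datalines;
25 150 250
45 120 180
65 . .
;
run;

data toy_recode;
  set toy;

  /* Age category: age3cat stays missing if age is missing or under 20 */
  if 20 <= n2ah0047 <= 39 then age3cat = 1;
  else if 40 <= n2ah0047 <= 59 then age3cat = 2;
  else if n2ah0047 >= 60 then age3cat = 3;

  /* Check for missing first so that missing readings are not coded as 0 */
  if missing(mean_sbp) then SBP140 = .;
  else if mean_sbp >= 140 then SBP140 = 1;
  else SBP140 = 0;

  if missing(n2lb0421) then HLP = .;
  else if n2lb0421 >= 240 then HLP = 1;
  else HLP = 0;
run;

proc print data=toy_recode;
run;
```

In the printed output, the third record has no blood pressure or cholesterol value, so SBP140 and HLP remain missing rather than being set to 0.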
In this step, you will confirm that the derived and recoded variables correctly correspond to the original variables.
| Statements | Explanation |
|---|---|
| proc freq data=demo4_nh2; where n2ah0047>=20; table n2ah1059*n2ah1060*DiagHTN*n2ah1069*HTNMED HBP*HTNMED*SBP140*DBP90 /list missing; | Use the proc freq procedure to create a cross tabulation of the original categorical variable for high blood pressure by its respective recoded variables. Use the where statement to select the participants who were age 20 years and older. |
| proc means data=demo4_nh2 N min max; proc means data=demo4_nh2 N min max; proc means data=demo4_nh2 N min max; proc means data=demo4_nh2 N min max; | Use the proc means procedure to calculate the number of observations (N), minimum, and maximum values for the original continuous variables. Use the where statement to select the participants who were age 20 years and older. The class statement will separate the original continuous variable into categories of the derived variables. This is done to check that coding of the derived variable, based on cut-off points of the continuous variable, is correct. |
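
As a minimal sketch of how a where and class check of this kind might look for the age recode, the example below runs proc means on a small made-up dataset. The dataset name `check_toy` and its values are hypothetical stand-ins for demo4_nh2. Pairing n2ah0047 with age3cat is an assumption based on the recoding step, not a copy of the original program.

```sas
/* Hypothetical toy data standing in for demo4_nh2; values are made up */
data check_toy;
  input n2ah0047;
  /* Same age recode as in the recoding step */
  if 20 <= n2ah0047 <= 39 then age3cat = 1;
  else if 40 <= n2ah0047 <= 59 then age3cat = 2;
  else if n2ah0047 >= 60 then age3cat = 3;
  datalines;
18
20
39
40
59
60
74
;
run;

proc means data=check_toy N min max;
  where n2ah0047 >= 20;   /* participants age 20 years and older */
  class age3cat;          /* one row of statistics per derived category */
  var n2ah0047;           /* original continuous variable */
run;
```

If the recode is correct, the minimum and maximum within each category fall inside that category's cut-off points: 20 to 39 for age3cat=1, 40 to 59 for age3cat=2, and 60 or older for age3cat=3.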
Highlighted items comparing recoded or derived variables to original variables: