Task 4: How to Create New Variables

This task reviews how to recode or derive new variables so they are appropriate for your analytic needs and how to check recoded or derived variables.

The steps below assume that you are already familiar with the SAS code used to create new variables in NHANES datasets. If you need more detailed instructions, please review the Clean & Recode Data module in the Continuous NHANES Web Tutorial before continuing.


Phthalate Data (1999–2004):

Step 1: Recode or Derive New Variables

Creating new variables, by recoding or deriving, is an important step in preparing your analytic dataset, particularly when the dataset combines environmental chemical data from different survey cycles. In some survey cycles, the data files do not include an indicator variable that identifies whether a value is at or above, or below, the limit of detection (LOD), so you may need to create one. You may also want to create a new variable from multiple existing variables and different cut points. In addition, if a chemical is measured in urine, you may want to create a creatinine-adjusted version of the variable of interest.

The sample code below shows how to recode and derive new variables in multiple scenarios, using the SAS DATA step.


Program to Recode or Derive New Variables



 /* Create age categorical variable: 1=6-11, 2=12-19, 3=20-39, 4=40-59, 5=60+ years. */
 /* Recode race/ethnicity categorical variable: 1=NHW, 2=NHB, 3=MA, 4=Other.         */
 /* Create categorical variable for the LOD: 1=at or above LOD, 2=below LOD.         */
 /* A variable indicating at/above or below LOD is not available in NHANES 1999-2002.*/
 /* Variable URDMHPLC indicates at/above or below LOD in NHANES 2003-2004.           */
 /* The LOD for urinary mono-(2-ethyl)-hexyl phthalate is constant within a cycle.   */
 /* The lowest value of URXMHP in a cycle (0.8 in 1999-2000, 0.7 in 2001-2002)       */
 /* represents results below the LOD.                                                */
 /* Create a creatinine-corrected variable to adjust for urine dilution.             */

data Phthalate;
      set Phthalate;

      /* Each true comparison adds 1, so Age5cat runs from 1 to 5 */
      Age5cat=1+(ridageyr>=12)+(ridageyr>=20)+(ridageyr>=40)+(ridageyr>=60);

      /* ridreth1: 3=non-Hispanic white, 4=non-Hispanic black, 1=Mexican American */
      if ridreth1=3 then reth4cat=1;
      else if ridreth1=4 then reth4cat=2;
      else if ridreth1=1 then reth4cat=3;
      else reth4cat=4;

      /* sddsrvyr: 1=1999-2000, 2=2001-2002, 3=2003-2004 */
      if (sddsrvyr=1 and URXMHP>0.8) or
         (sddsrvyr=2 and URXMHP>0.7) or
         (sddsrvyr=3 and URDMHPLC=0)
         then MHP_aLOD=1;
      else if (sddsrvyr=1 and URXMHP=0.8) or
              (sddsrvyr=2 and URXMHP=0.7) or
              (sddsrvyr=3 and URDMHPLC=1)
              then MHP_aLOD=2;

      /* Computed only when both measurements are present and positive */
      if URXMHP>0 and URXUCR>0 then MHP_UCR=100*URXMHP/URXUCR;
run;
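
The age recode above works because SAS evaluates a true comparison as 1 and a false comparison as 0. The creatinine correction multiplies by 100 so that, assuming URXMHP is reported in ng/mL and URXUCR in mg/dL, the result is in micrograms per gram of creatinine. The following is a minimal sketch on made-up values, not NHANES data. The dataset name Demo, its values, and the guard for a missing age are illustrative assumptions and are not part of the program above.

```sas
/* Hypothetical test values, not NHANES data */
data Demo;
      input ridageyr URXMHP URXUCR;

      /* A true comparison evaluates to 1 and a false one to 0 */
      Age5cat=1+(ridageyr>=12)+(ridageyr>=20)+(ridageyr>=40)+(ridageyr>=60);

      /* A missing age fails every comparison and would otherwise be coded 1 */
      if missing(ridageyr) then Age5cat=.;

      /* Left missing when either measurement is missing or not positive */
      if URXMHP>0 and URXUCR>0 then MHP_UCR=100*URXMHP/URXUCR;

      datalines;
6 3.2 160
11 0.8 40
12 5.0 100
19 . 80
20 2.4 .
39 1.5 150
40 7.0 70
59 0.7 35
60 4.0 200
. 3.0 60
;
run;

proc print data=Demo noobs;
      title 'Demonstration of age recode and creatinine correction';
run;
title;
```

In this sketch, ages 6 and 11 fall in category 1, ages 12 and 19 in category 2, and so on up to age 60 in category 5. The first row gives MHP_UCR = 100*3.2/160 = 2. Whether your analysis needs the missing-age guard depends on the data; it is shown only because a missing value compares as smaller than any number in SAS.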


Step 2: Check Recoded or Derived Variables

In this step, you will use the FREQ, MEANS, and PRINT procedures in SAS to confirm that the derived and recoded variables correspond correctly to the original variables.


Program to Check Recoded or Derived Variables


 /* Use PROC MEANS to check the created age categorical variable.  */
 /* Use PROC FREQ to check the recoded race/ethnicity variable.    */
 /* Use PROC FREQ to check the created LOD categorical variable.   */
 /* Use PROC MEANS to check the created LOD categorical variable.  */
 /* Use PROC PRINT to check the creatinine-corrected variable.     */

proc means data=Phthalate N min max maxdec=0;
      var ridageyr;
      class Age5cat;
      title 'Check created age categorical variable';
run;

proc freq data=Phthalate;
      table reth4cat*ridreth1 / list missing;
      title 'Check recoded race/ethnicity variable';
run;

proc freq data=Phthalate;
      table MHP_aLOD*URDMHPLC*sddsrvyr / list missing;
      where WTSPH6YR>0 and URXMHP>0;
      title 'Check created LOD categorical variable';
run;

proc means data=Phthalate N min max maxdec=1;
      var URXMHP;
      class MHP_aLOD sddsrvyr;
      title 'Check created LOD categorical variable';
run;

proc print data=Phthalate (obs=10);
      id seqn;
      /* Show the two inputs next to the derived variable */
      var URXMHP URXUCR MHP_UCR;
      title 'Check creatinine-corrected variable';
run;
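
Printing the first ten observations shows whether the arithmetic looks right, but it does not show how often MHP_UCR was left missing. One possible supplementary check, offered as a sketch rather than as part of the tutorial's program, is to tabulate the missing pattern of the two inputs against the derived variable. The format name missfmt is a hypothetical name chosen for this example.

```sas
/* Hypothetical format that collapses values to Missing or Present */
proc format;
      value missfmt
            .     = 'Missing'
            other = 'Present';
run;

/* PROC FREQ groups by formatted value, so each row is one missing pattern */
proc freq data=Phthalate;
      table URXMHP*URXUCR*MHP_UCR / list missing;
      format URXMHP URXUCR MHP_UCR missfmt.;
      title 'Check missing pattern of creatinine-corrected variable';
run;
title;
```

If the recode behaved as intended, MHP_UCR should be Present only in the row where both URXMHP and URXUCR are Present. A row with both inputs Present but MHP_UCR Missing would point to an input value of zero or below, since the derivation requires both values to be greater than 0.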


Additional Resources


In each of the modules on Preparing an Analytic Dataset, you will be working with temporary datasets, which are saved in the WORK library of your SAS session. A temporary dataset exists only for the duration of your SAS session and is deleted when you exit the program. If you would like to save these datasets so that you can return to them later, you can learn how to do this in the Save a Dataset module at the end of this course.
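
As a rough illustration of what saving a dataset involves, the sketch below copies the temporary dataset into a permanent library. The libref mylib and the folder path are hypothetical placeholders, and the Save a Dataset module may describe a different approach.

```sas
/* Hypothetical folder; replace with a directory that exists on your system */
libname mylib 'C:\NHANES\Data';

/* Copy the temporary WORK dataset into the permanent library */
data mylib.Phthalate;
      set work.Phthalate;
run;
```

In a later session, you would reissue the same LIBNAME statement and refer to the dataset as mylib.Phthalate.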

