Task 2: How to Append NHANES Dietary Data

Here are the steps for appending NHANES data:

 

Step 1: Compare Variable Names and Labels

The first step before appending data is to examine the contents of the data files. The PROC CONTENTS procedure lists the variable names and variable labels in each data file you select. As you review its output, compare the variable names and labels to see whether anything changed from cycle to cycle.

The example below uses the sample "Food Sources" program. Notice that the variable label “Calcium (mg)” is the same in 2001-2002 and 2003-2004, but the variable names are different. Additionally, comparing the documentation for vitamins A and E between 2001-2002 and 2003-2004 shows that although the variable names remain the same, the units of measure are different, a change that only careful examination of the documentation will reveal. Before appending, check that variable names, labels, and units are consistent between datasets.

 

Program to Check Datasets' Contents and Compare Variable Names and Labels

Sample Code

*--------------------------------------------------------;
* Use the LIBNAME statement to refer to the folder where ;
* the data files are stored.                             ;
*                                                        ;
* Use the PROC CONTENTS procedure to list the contents   ;
* of each dataset:                                       ;
*   2001-2002 Dietary Interview (Individual Foods File), ;
*     Examination File                                   ;
*   2003-2004 Dietary Interview (Individual Foods File), ;
*     Examination File                                   ;
*   2001-2002 Demographic File                           ;
*   2003-2004 Demographic File                           ;
*                                                        ;
* Use the VARNUM option to list the variables according  ;
* to their position in the dataset.                      ;
*--------------------------------------------------------;

libname NH "C:\NHANES\DATA" ;

proc contents data=NH.DRXIFF_B varnum ;
run ;
proc contents data=NH.DR1IFF_C varnum ;
run ;
proc contents data=NH.DEMO_B varnum ;
run ;
proc contents data=NH.DEMO_C varnum ;
run ;
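Reading PROC CONTENTS output by eye works, but it is easy to miss a renamed variable in a long file. As an alternative sketch, the OUT= option can write each file's variable list to a dataset, and PROC SQL can then line up the two cycles by label so that renamed variables (same label, different name) stand out. The dataset names VARS_B and VARS_C and the column NAME_DIFFERS are illustrative choices, not part of the "Food Sources" program. Matching on labels will not catch unit changes such as the vitamin A and E example, and a label that occurs more than once in a file produces extra rows, so the documentation still has to be read.

```sas
*--------------------------------------------------------;
* Sketch: write each variable list to a dataset, then   ;
* match the two cycles on variable label.               ;
* VARS_B and VARS_C are illustrative dataset names.     ;
*--------------------------------------------------------;
proc contents data=NH.DRXIFF_B out=VARS_B (keep=NAME LABEL) noprint;
run;
proc contents data=NH.DR1IFF_C out=VARS_C (keep=NAME LABEL) noprint;
run;

proc sql;
    title 'Variables matched by label: 2001-2002 vs 2003-2004';
    select coalesce(b.LABEL, c.LABEL) as LABEL,
           b.NAME as NAME_0102,
           c.NAME as NAME_0304,
           case when b.NAME = c.NAME then 'No' else 'Yes' end
               as NAME_DIFFERS
        from VARS_B as b
             full join VARS_C as c
             on upcase(b.LABEL) = upcase(c.LABEL)
        order by 1;
quit;
title;
```

Rows with a blank name in one column point to labels that exist in only one cycle; rows flagged Yes point to candidates for renaming, such as DRXICALC and DR1ICALC.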


 

IMPORTANT NOTE

Most dietary variables from 2001-2002 begin with the prefix DRXT, and most dietary variables from 2003-2004 begin with the prefix DR1T (for Day 1 data) or DR2T (for Day 2 data). Because these variables are continuous (as opposed to categorical), you can simply rename them to make them identical, provided their units of measure also match.
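Because the 2001-2002 and 2003-2004 names differ only in their prefix, the rename list does not have to be typed by hand. The sketch below assumes the 2001-2002 total nutrient file is stored in the NH library as DRXTOT_B, and it builds a DRXT-to-DR1T rename list from the DICTIONARY.COLUMNS table. The macro variable RENLIST and the dataset TOT_B_RENAMED are illustrative names. Review the generated list in the log before relying on it: not every DRXT variable necessarily has a DR1T counterpart, and a matching name does not guarantee matching units.

```sas
*--------------------------------------------------------;
* Sketch: build DRXTxxxx=DR1Txxxx pairs for every DRXT  ;
* variable in the (assumed) file NH.DRXTOT_B.           ;
* RENLIST and TOT_B_RENAMED are illustrative names.     ;
*--------------------------------------------------------;
proc sql noprint;
    select cats(NAME, '=DR1T', substr(NAME, 5))
        into :RENLIST separated by ' '
        from dictionary.columns
        where libname = 'NH'
          and memname = 'DRXTOT_B'
          and upcase(NAME) like 'DRXT%';
quit;

%put NOTE: Rename list is &RENLIST;

data TOT_B_RENAMED;
    set NH.DRXTOT_B (rename=(&RENLIST));
run;
```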

 

Step 2: Append Directly, If Variables are Identical

After carefully reviewing the demographic files, you will find that the variables of interest are the same in both cycles. Therefore, you can append the files directly without any further changes.

Because you are interested only in a subset of the variables, you can use the KEEP= data set option to select the relevant variables.

 

IMPORTANT NOTE

When appending NHANES data you should always include the sequence number (SEQN). Failing to do so will lead to problems if you want to sort or merge your data files at a later time.
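Because SEQN identifies each participant, it is also worth confirming that no SEQN value appears in both cycles before you append. A minimal sketch that counts SEQN values shared by the two demographic files; you would expect a count of 0, since shared values would make later sorts and merges by SEQN ambiguous. N_SHARED_SEQN is an illustrative column name.

```sas
* Sketch: count SEQN values that occur in both cycles ;
proc sql;
    title 'SEQN values shared by DEMO_B and DEMO_C (expect 0)';
    select count(*) as N_SHARED_SEQN
        from NH.DEMO_B as b
             inner join NH.DEMO_C as c
             on b.SEQN = c.SEQN;
quit;
title;
```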

 

As a reminder, the sample code below is taken from the "Food Sources" program. This DATA step produces no printed output, so check the SAS log to make sure that it completed successfully. You can also use SAS Explorer to confirm that the new 4-year dataset (DEMO_4YR) is in your WORK library, which is the default temporary library created for each SAS session. This library is deleted when the SAS session ends. (To find out how to save the dataset to a permanent library, see the Save a Dataset module.)

 

Program to Directly Append Datasets

Sample Code

*-------------------------------------------------------------------------;
* The DATA step creates a dataset for your 4 years of demographic data    ;
* (DEMO_4YR).                                                             ;
*                                                                         ;
* The SET statement appends the 2003-2004 demographic data file           ;
* (NH.DEMO_C) to the 2001-2002 demographic data file (NH.DEMO_B).         ;
*                                                                         ;
* The KEEP= data set options select the variables of interest. Notice     ;
* that the sequence number (SEQN) is included. Always keep SEQN when      ;
* appending datasets.                                                     ;
*                                                                         ;
* The SDMVPSU and SDMVSTRA variables are kept so that survey design       ;
* information can be incorporated in later analyses. The release cycle    ;
* variable SDDSRVYR is kept so that weights can be rescaled by cycle      ;
* as described in Step 4.                                                 ;
*                                                                         ;
* WTMEC2YR is the weight variable for all persons examined in the MEC     ;
* and is appropriate for use with dietary recall data. Weights must be    ;
* used in order for your analysis to be generalizable to the total        ;
* population. For more information on weighting, see the Overview of      ;
* NHANES Survey Design and Weights module in the NHANES Dietary Data      ;
* Survey Orientation Course.                                              ;
*-------------------------------------------------------------------------;

data DEMO_4YR;
    set NH.DEMO_B (keep=SEQN SDDSRVYR RIDAGEYR SDMVPSU SDMVSTRA WTMEC2YR)
        NH.DEMO_C (keep=SEQN SDDSRVYR RIDAGEYR SDMVPSU SDMVSTRA WTMEC2YR);
run ;

 

Step 3: Rename Variables and/or Recode Variables Before Appending, If Variables are Different

If the variables in your datasets differ, you will need to rename and/or recode them before you append them. For example, the 2001-2002 total nutrient intake files contain variables that were renamed in 2003-2004, so if you append files from these survey cycles, you will need to rename the variables first and then append the data. If the response categories of a variable differ between cycles, you will also need to recode it.
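Recoding follows the same pattern as renaming: bring one cycle's codes in line with the other cycle's before the SET statement stacks the files. The sketch below is entirely hypothetical: the variable XVAR, its codes (1 = yes, 2 = no in the earlier cycle, 1 = yes, 0 = no in the later cycle), and the datasets CYCLE_B, CYCLE_B_RECODED, CYCLE_C, and COMBINED are invented for illustration and are not NHANES variables or files.

```sas
*--------------------------------------------------------;
* HYPOTHETICAL example: XVAR and its codes are invented ;
* for illustration and are not NHANES variables.        ;
*--------------------------------------------------------;
data CYCLE_B;            * earlier cycle: 1 = yes, 2 = no ;
    input SEQN XVAR;
    datalines;
101 1
102 2
;
run;

data CYCLE_C;            * later cycle: 1 = yes, 0 = no ;
    input SEQN XVAR;
    datalines;
201 1
202 0
;
run;

* Recode the earlier cycle first, then append ;
data CYCLE_B_RECODED;
    set CYCLE_B;
    if XVAR = 2 then XVAR = 0;   * no becomes 0 to match CYCLE_C ;
run;

data COMBINED;
    set CYCLE_B_RECODED CYCLE_C;
run;
```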

You will see in the sample code from the "Food Sources" program that the variables DRDDRSTZ, DRXICALC, and DRDIFDCD in the 2001-2002 individual foods file are renamed to DR1DRSTZ, DR1ICALC, and DR1IFDCD, respectively, to match the variable names in the 2003-2004 data file. After renaming the 2001-2002 variables, you are ready to append the data files with the selected variables of interest.

 

Program to Rename Variables and Append

Sample Code

*-------------------------------------------------------------------------;
* The DATA step creates the dataset for your 4 years of dietary data      ;
* (IFF_4YR).                                                              ;
*                                                                         ;
* The KEEP= option on the DATA statement keeps only the variables of      ;
* interest in the new dataset.                                            ;
*                                                                         ;
* The SET statement appends the 2003-2004 individual foods file           ;
* (NH.DR1IFF_C) to the 2001-2002 individual foods file (NH.DRXIFF_B).     ;
*                                                                         ;
* The RENAME= option renames the variables DRDDRSTZ, DRXICALC, and        ;
* DRDIFDCD in the 2001-2002 file to DR1DRSTZ, DR1ICALC, and DR1IFDCD,     ;
* the names given to the same variables in the 2003-2004 file.            ;
*-------------------------------------------------------------------------;

data IFF_4YR (keep=SEQN DR1IFDCD DR1ICALC DR1DRSTZ WTDRD1);
    set NH.DRXIFF_B (rename=(DRDDRSTZ=DR1DRSTZ
                             DRXICALC=DR1ICALC
                             DRDIFDCD=DR1IFDCD))
        NH.DR1IFF_C;
run ;

 

This DATA step produces no printed output, so check the SAS log to make sure that it completed successfully. You can also use SAS Explorer to confirm that the new 4-year dataset (IFF_4YR) is in your WORK library.
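With both 4-year files built, SEQN is the key that links each food record to the participant's demographic and design variables. A minimal sketch, assuming IFF_4YR and DEMO_4YR from the programs above are in the WORK library; IFF_DEMO_4YR and the flag IN_IFF are illustrative names. The individual foods file has several records per SEQN and the demographic file has one, so this is a one-to-many merge, and the subsetting IF keeps only food records so that participants without a dietary recall are not added as empty rows.

```sas
*--------------------------------------------------------;
* Sketch: attach demographic and design variables to    ;
* each food record. Both inputs must be sorted by SEQN. ;
*--------------------------------------------------------;
proc sort data=IFF_4YR;
    by SEQN;
run;

proc sort data=DEMO_4YR;
    by SEQN;
run;

data IFF_DEMO_4YR;
    merge IFF_4YR (in=IN_IFF)
          DEMO_4YR;
    by SEQN;
    if IN_IFF;          * keep food records only ;
run;
```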

 

Step 4: Construct Weights for NHANES Analyses across Multiple Survey Cycles

In general, when combining multiple survey cycles, the basic sample weight variable for each cycle should be divided by the number of cycles in the combined data set. Then, these rescaled weights can be summed to form a new weight for the combined survey cycles.  The following examples show how to construct weights for multiple survey cycles for NHANES 2001-2002 and beyond.
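The same rule applies to any 2-year weight, not only WTMEC2YR. As a sketch, the step below rescales the Day 1 dietary weight WTDRD1 (which the individual foods program above keeps) by the number of cycles, and uses IN= flags to record which cycle each record came from. The macro variable NCYCLES and the names CYCLE, WTDR4YR, and IFF_4YR_WT are illustrative; whether WTDRD1 or WTMEC2YR is the right weight depends on your analysis.

```sas
%let NCYCLES = 2;                 * number of 2-year cycles combined ;

data IFF_4YR_WT;
    set NH.DRXIFF_B (in=IN_B keep=SEQN WTDRD1)
        NH.DR1IFF_C (in=IN_C keep=SEQN WTDRD1);
    length CYCLE $ 9;
    if IN_B then CYCLE = '2001-2002';
    else if IN_C then CYCLE = '2003-2004';
    WTDR4YR = WTDRD1 / &NCYCLES;  * rescaled weight for the 4-year data ;
run;
```

Changing NCYCLES to 3 (and adding the third file to the SET statement) gives the 6-year version of the same calculation.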

 

Combining 2001-2002 and 2003-2004 to Produce a 4-Year Dataset

For 4 years of data from 2001-2004, construct a weight variable as follows:

Sample Code

if SDDSRVYR=2 or SDDSRVYR=3 then MEC4YR = WTMEC2YR/2;
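The statement above is meant to run inside a DATA step that reads the appended demographic data, and it needs SDDSRVYR and WTMEC2YR to be present. A self-contained sketch that reads both demographic files directly; DEMO_4YR_WT is an illustrative name. The PROC MEANS step is a quick check: within each cycle, the sum of MEC4YR should be half the sum of WTMEC2YR, so the combined 4-year total equals the average of the two 2-year totals.

```sas
data DEMO_4YR_WT;
    set NH.DEMO_B (keep=SEQN SDDSRVYR RIDAGEYR SDMVPSU SDMVSTRA WTMEC2YR)
        NH.DEMO_C (keep=SEQN SDDSRVYR RIDAGEYR SDMVPSU SDMVSTRA WTMEC2YR);
    * SDDSRVYR = 2 for 2001-2002 and 3 for 2003-2004 ;
    if SDDSRVYR in (2, 3) then MEC4YR = WTMEC2YR / 2;
run;

proc means data=DEMO_4YR_WT n sum;
    class SDDSRVYR;
    var WTMEC2YR MEC4YR;
run;
```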

 

Combining 2001-2002, 2003-2004, and 2005-2006 to Obtain 6 Years of Data

For 6 years of data from 2001-2006, construct a weight variable as follows:

Sample Code

if SDDSRVYR in (2,3,4) then MEC6YR = WTMEC2YR/3;

 

IMPORTANT NOTE

Certain survey components were completed on subsamples, which have their own subsample weights. Subsample weights are not designed to be combined; in fact, many subsamples are mutually exclusive. If your analysis requires combining two or more subsamples, appropriate weights would need to be recalculated, and how to do that is beyond the scope of this tutorial. It is therefore strongly advised that you do not combine subsamples in any analysis.

 

Step 5: Check Results

After appending the data files, check the contents again to make sure that the files were appended correctly. Run PROC CONTENTS on the combined files, as demonstrated in the Program to Check Datasets' Contents and Compare Variable Names and Labels in Step 1.

Double-check variable names and labels, and make sure that variables were renamed correctly. Pay special attention to the number of observations in the combined dataset, which should equal the sum of the observations in the two source data files.
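The observation-count check can also be done in a single query. A minimal sketch that reads NOBS from the DICTIONARY.TABLES table for the two source demographic files and the appended DEMO_4YR; the value for DEMO_4YR should equal the sum of the other two. The same query works for the dietary files if you change the member names.

```sas
proc sql;
    title 'Observation counts before and after appending';
    select libname, memname, nobs
        from dictionary.tables
        where (libname = 'NH'   and memname in ('DEMO_B', 'DEMO_C'))
           or (libname = 'WORK' and memname = 'DEMO_4YR');
quit;
title;
```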
