Task 3: How to Merge and Append NHANES Data for PAXRAW Analyses

To merge and append NHANES data for the PAX analysis, you will need to:

Sort Data Files by a Unique Identifier

The first step in merging data is to sort each data file by a unique identifier. Each study participant is assigned a unique identifier, stored in the variable SEQN. Use PROC SORT to sort the DEMO and PAXRAW files by SEQN. In this segment of code, the OUT= option stores each sorted dataset in the SAS temporary library, WORK. You can explore the WORK library by opening the SAS Explorer and navigating to “Libraries”.

Sample Code

proc sort data = demo_c.demo_c out = demo_c;
by seqn;
run ;

proc sort data = paxraw_c.paxraw_c out = paxraw_c;
by seqn;
run ;

proc sort data = demo_d.demo_d out = demo_d;
by seqn;
run ;

proc sort data = paxraw_d.paxraw_d out = paxraw_d;
by seqn;
run ;

Generate a Count of the Minutes in Data for Each Study Participant

PROC MEANS is used to count the number of minutes (observations) for each participant. In the next step, we will keep only participants who have 10080 observations (7 days x 24 hours x 60 minutes = 10080). Only participants whose data are deemed reliable (PAXSTAT=1) and whose monitors were in calibration (PAXCAL=1) will be included as we proceed.

Sample Code

proc means noprint data = paxraw_c(where=(PAXSTAT= 1 and PAXCAL= 1 ));
    by SEQN;
    var PAXN;
    output out =chka n = n ;
run ;

proc means noprint data = paxraw_d(where=(PAXSTAT= 1 and PAXCAL= 1 ));
    by SEQN;
    var PAXN;
    output out =chkb n = n ;
run ;

The number of minutes is used to create two files: one for participants with 10080 minutes and one for participants with fewer than 10080 minutes. Only participants in the file with 10080 minutes will be used for analysis.

Sample Code

data OK notOK;
    set chka chkb;
    if (n= 10080 ) then output OK;
    else output notOK;
run ;
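
As a quick check on the split, the sketch below counts the participants in each file. It assumes the OK and notOK datasets created above are in the WORK library; the column names n_complete and n_incomplete are illustrative only. Because each file has one row per SEQN, the row count is the participant count. Participants with no records meeting PAXSTAT=1 and PAXCAL=1 never reach chka or chkb, so they appear in neither file.

```sas
/* Count participants with complete and incomplete minute records. */
/* OK and notOK each hold one row per SEQN.                        */
proc sql;
    select count(*) as n_complete   from OK;
    select count(*) as n_incomplete from notOK;
quit;
```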

Append the PAXRAW Data for Analysis and the DEMO Files

First, the PAXRAW files from both cycles are appended, keeping only records with PAXSTAT=1 and PAXCAL=1. The appended file is then merged with the OK file (participants with 10080 minutes) to create the analytic PAXRAW data. The sequential day (1-7) variable DAY is created in that merge step because it will be needed for the processing to follow.

Sample Code

data temp;
    set paxraw_c(where=(PAXSTAT= 1 and PAXCAL= 1 ))
        paxraw_d(where=(PAXSTAT= 1 and PAXCAL= 1 ));
run ;

Create the accelerometer dataset with complete records by merging the TEMP and OK datasets, adding a variable for the sequential day of data collection

Sample Code

data monitors;
  merge temp OK( in =inOK drop=n _TYPE_ _FREQ_);
  by SEQN;
  day=ceil(paxn/1440 );
  label day= 'Sequential Day' ;
  if inOK;
run ;
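
The DAY calculation can be checked with a short sketch like the one below. It assumes PAXN runs from 1 to 10080 for each participant kept in MONITORS, so that minutes 1-1440 fall on day 1, minutes 1441-2880 on day 2, and so on. The first step writes a few boundary cases to the SAS log; the second tabulates DAY in the dataset created above.

```sas
/* Boundary cases for the day formula, written to the log. */
data _null_;
    do paxn = 1, 1440, 1441, 10080;
        day = ceil(paxn/1440);
        put paxn= day=;
    end;
run;

/* Under the assumption above, each value of DAY (1-7) should have */
/* a frequency equal to 1440 times the number of participants.     */
proc freq data = monitors;
    tables day;
run;
```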

Append the DEMO datasets from the 2003-2004 and 2005-2006 cycles

Sample Code

data demo;
  set demo_c demo_d;
run;

proc sort data=demo;
  by seqn;
run;

Merge only the age variable from the demo file because it will be needed to assign certain age-specific criteria

The remainder of the demographic file is not included because of the intensive data manipulation steps to follow. Other demographic variables will be added later.

Sample Code

data monitors;
  merge demo ( in =d keep=seqn ridageyr) monitors ( in =m);
  by seqn;
  if m and d;
run ;
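
A brief check of the merged file might look like the following sketch. It assumes the MONITORS dataset produced by the merge above; the output column names are illustrative. If every retained participant has a complete record, the number of minutes should equal 10080 times the number of participants.

```sas
/* Summarize the merged accelerometer file. */
proc sql;
    select count(distinct seqn) as participants,
           count(*)             as minutes,
           min(ridageyr)        as min_age,
           max(ridageyr)        as max_age
    from monitors;
quit;
```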

   

IMPORTANT NOTE

To save space in the final PAM data file, SEQN is stored there as a 4-byte SAS numeric variable. In the DEMO file, SEQN is an 8-byte variable. SAS may generate a warning when the BY variable has different lengths in the two files. If the dataset with the 4-byte SEQN is listed first in your MERGE statement, SEQN in the merged dataset will be 4 bytes long and SAS will issue a warning about the different lengths. The datasets will still merge correctly, because none of the values in the 8-byte SEQN field exceeds the 4-byte maximum value (2,097,152). If the dataset with the 8-byte SEQN is listed first, SEQN in the merged dataset will be 8 bytes long. The datasets will again merge correctly, and no warning is issued because there is no potential for truncation.
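
If you prefer not to depend on the order of the datasets in the MERGE statement, one common approach is to declare the length of SEQN before the MERGE statement. The sketch below is an alternative form of the age merge shown earlier, not part of the original sample code. It assumes the MONITORS and DEMO datasets from the preceding steps. Declaring the length first should make SEQN 8 bytes in the output and avoid the warning, but check your SAS log to confirm this in your environment.

```sas
/* Alternative form of the age merge with SEQN declared up front. */
data monitors;
    length seqn 8;   /* set SEQN to 8 bytes before any dataset is read */
    merge monitors (in = m) demo (in = d keep = seqn ridageyr);
    by seqn;
    if m and d;
run;
```

Because the LENGTH statement comes first, SEQN also becomes the first variable in the output dataset.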