Example 2 - One-to-many merge using SEQN as the unique identifier

This example is taken from the "Milk" program and shows SAS code for merging the demographic data file with the Day 1 Individual Food file.  The unique identifier used to merge the data is the SEQN.  This example shows a one-to-many merge, meaning that each observation in the demographic data file may have multiple matching records in the Day 1 Individual Food file, depending on the number of foods that a person reported on their dietary recall. 

Program to Merge and Check Contents of Data

Sample Code

*-------------------------------------------------------------------------;
* Use the PROC SORT procedure to sort the data files, by sequence number  ;
* (SEQN) that will be merged.  Note that the files come from the NH       ;
* library assigned in the previous step                                   ;
*                                                                         ;
* Merge the 2003-2004 Day 1 Individual Foods File (IFF) with the          ;
* demographics file. Use the KEEP statement to retain food-level records  ;
* of interest (fluid milk consumed not in combination with another food ) ;
* from the IFF file by selecting on food code and combination number.     ;
*-------------------------------------------------------------------------;

proc sort data =NH.DR1IFF_C out =DR1IFF_C;
    by SEQN;
run ;

proc sort data =NH.DEMO_C out =DEMO_C;
    by SEQN;
run ;

data DEMO1IFF;
    merge DR1IFF_C ( in =i keep = SEQN DR1CCMNM DR1IFDCD DR1IGRMS
        where = ((11100000 <= DR1IFDCD <= 11299999 ) and
        (DR1CCMNM eq 0 )))
        DEMO_C (keep=SEQN RIDAGEYR);
    by SEQN;
    < if i;
run ;

*-------------------------------------------------------------------------;
* Use the PROC CONTENTS procedure to list the contents of the DEMO1IFF    ;  
* dataset — the new merged dataset.  Use the VARNUM option to list the    ;
* variables according to their position in the dataset.                   ;
*                                                                         ;
* Use the PROC MEANS procedure to determine the number of non-missing     ;
* values (N), number of missing values (Nmiss), minimum (min) and maximum ;
*(max) values, to two decimal places (maxdec=2), for the variables in the ;
* merged dataset.                                                         ; 
*-------------------------------------------------------------------------;

proc contents
data =DEMO1IFF
varnum ;

proc means data = DEMO1IFF N< Nmiss min max maxdec = 2 ;
run ;

 

Output of Program

Click here to view program output and highlights

 

 

close window icon Close Window to return to module page.