Example 3 - One-to-many merge using supplement ID number as the unique identifier

This example is taken from the "Supplement" program and shows SAS code for merging the Supplement files 1 (one record per person) and 2 (multiple records per person) with Supplement file 3 (one record per supplement) file.  The unique identifier used to merge the data is the supplement ID number (DSDSUPID).  This example shows a one-to-many merge, meaning that each observation in one file may have multiple matching records in the other file.

Program to Merge and Check Contents of Data

Sample Code

*-------------------------------------------------------------------------;
* Use the PROC SORT procedure and BY statement to sort supplement files 1 ;
* (NH.DSQ1_C) and 2 (NH.DSQ2_C) by SEQN.                                  ;
*                                                                         ;
* Use the DATA statement to name the new dataset, SUP12, and the MERGE    ;
* statement to merge supplement files 1 and 2 by SEQN.                    ;
*-------------------------------------------------------------------------;

proc sort data =NH.DSQ1_C out =DSQ1_C;
    by SEQN;
run ;

proc sort data =NH.DSQ2_C out =DSQ2_C;
    BY SEQN;
run ;

data SUP12;
    merge DSQ1_C DSQ2_C;
    by SEQN;
run ;

*-------------------------------------------------------------------------;
* Use the PROC SORT procedure and BY statement to sort SUP12, which was   ;
* created in the previous step, and supplement file 3 (NH.DSQ3_C) by      ;
* DSDSUPID.                                                               ;
*                                                                         ;
* Use the DATA statement to name the new dataset, SUP123, and use the     ;
* MERGE statement to merge SUP12 and NH.DSQ3_C by DSDSUPID.  Note that    ;
* these files must be merged by DSDSUPID because this is the unique       ;
* identifier common to these two datasets.                                ;
*-------------------------------------------------------------------------;

proc sort data =SUP12;
    by DSDSUPID;
run ;

proc sort data =NH.DSQ3_C out =DSQ3_C;
    by DSDSUPID;
run ;

data SUP123;
    merge SUP12 DSQ3_C;
    by DSDSUPID;
run ;

*-------------------------------------------------------------------------;
* Use the PROC CONTENTS procedure to list the contents of the SUP123      ;  
* dataset — the new merged dataset.  Use the VARNUM option to list the    ;
* variables according to their position in the dataset.                   ;
*                                                                         ;
* Use the PROC MEANS procedure to determine the number of non-missing     ;
* values (N), number of missing values (Nmiss), minimum (min) and maximum ;
*(max) values, to two decimal places (maxdec=2), for the variables in the ;
* merged dataset.                                                         ; 
*-------------------------------------------------------------------------;

proc contents data = SUP123
varnum ;

proc means data = SUP123 Nmiss min max maxdec = 2 ;
run ;

Output of Program

Click here to view program output and highlights

 

 

close window icon Close Window to return to module page.