Example 1 - One-to-one merge using SEQN as the unique identifier

This example is taken from the sample "Supplement" program and shows SAS code for merging the Osteoporosis file, which is a Questionnaire component, and the Demographics file.    The unique identifier used to merge the data is the SEQN.  This example shows a 1:1 merge, meaning that each data file contains one record per person.

 

Program to Merge and Check Contents of Data

Sample Code

*-------------------------------------------------------------------------;
* Use the PROC SORT procedure and BY statement to sort the 2003-2004      ;
* demographic data (DEMO_C) and the osteoporosis data (OSQ_C) by SEQN.    ;
*-------------------------------------------------------------------------;

proc sort data =NH.DEMO_C out =DEMO_C;
    by SEQN;
run ;

proc sort data =NH.OSQ_C out =OSQ_C;
    by SEQN;
run;

*-------------------------------------------------------------------------;
* Merge the DEMO_C and OSQ_C datasets by their linking variable, SEQN.    ;
* Use the KEEP statement to retain selected variables of interest in the  ;
* dataset.                                                                ;
*                                                                         ;
* Use the FORMAT statement to apply the GENDER format to the RIAGENDR     ;
* variable                                                                ;
*-------------------------------------------------------------------------;

data DEMOOST (keep=SEQN RIAGENDR RIDAGEYR OSQ060 SDMVSTRA SDMVPSU
    WTINT2YR OSQ070);
    merge DEMO_C OSQ_C;
    by
    format RIAGENDR GENDER. ;
run ;

*-------------------------------------------------------------------------;
* Use the PROC CONTENTS procedure to list the contents of the DEMOOST     ;
* dataset — the new merged dataset.  Use the VARNUM option to list the    ;
* variables according to their position in the dataset.                   ;
*                                                                         ;
* Use the PROC MEANS procedure to determine the number of non-missing     ;
* values (N), number of missing values (Nmiss), minimum (min) and maximum ;
*(max) values, to two decimal places (maxdec=2), for the variables in the ;
* merged dataset.                                                         ;
*-------------------------------------------------------------------------;

proc contents
data =DEMOOST
varnum ;

proc means data = DEMOOST N Nmiss min max maxdec = 2 ;
run ;

 

Output of Program

Click here to view program output and highlights

 

close window icon Close Window to return to module page.