Here are the steps for identifying and recoding missing data:
In this step, you will use the PROC MEANS procedure to check for missing, minimum, and maximum values of continuous variables, and the PROC FREQ procedure to look at the frequency distribution of categorical variables in your analytic dataset. The output from these procedures provides the number and frequency of missing values for each variable listed in the procedure statement.
The following examples will use the PROC MEANS and PROC FREQ procedures on the same set of variables without distinguishing continuous and categorical variables. However, if you use the PROC FREQ procedure on a continuous variable with many values, the output could be extensive.
In the examples below, you will check for missing values, as well as minimum and maximum values, for the osteoporosis variables used in the sample "Supplement" program.
*----------------------------------------------------------------;
* Use the PROC MEANS procedure to determine the number of        ;
* observations (N), the number of missing observations (Nmiss),  ;
* minimum values (min), and maximum values (max) for the         ;
* selected variables. Use the WHERE statement to select the INT  ;
* sample weight and to select females who are 20 years of age    ;
* and older. Use the VAR statement to list the variables of      ;
* interest.                                                      ;
*----------------------------------------------------------------;
proc means data=DEMOOST N Nmiss min max;
    where WTINT2YR > 0 and RIAGENDR = 2 and RIDAGEYR >= 20;
    var OSQ060 OSQ070;
run;
*-----------------------------------------------------------------;
* Use the PROC FREQ procedure to determine the frequency of each  ;
* value of the variables listed. Use the WHERE statement to       ;
* select the INT sample weight and to select females who are 20   ;
* years of age and older. Use the TABLES statement to list the    ;
* variables of interest. Use the list missing statement option    ;
* to display the missing values.                                  ;
*-----------------------------------------------------------------;
proc freq data=DEMOOST;
    where WTINT2YR > 0 and RIAGENDR = 2 and RIDAGEYR >= 20;
    tables OSQ060 OSQ070 / list missing;
run;
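As noted above, running PROC FREQ on a continuous variable with many values can produce extensive output. One common way to keep the output short is to apply a format that collapses every value into "Missing" or "Not missing" before tabulating. The following is a minimal sketch under stated assumptions: the dataset TOY, the variable BMI_TOY, and the format MISSFMT are hypothetical names made up for illustration, and the data lines are invented, not NHANES records.

```sas
* Toy data for illustration only, not NHANES records              ;
data toy;
    input OSQ060 BMI_TOY;
    datalines;
1 23.4
2 .
9 31.0
. 27.8
;
run;

* Hypothetical format that labels each value as missing or not    ;
proc format;
    value missfmt
        .     = 'Missing'
        other = 'Not missing';
run;

* PROC FREQ groups by formatted value, so each variable is        ;
* summarized by its format labels rather than by distinct value   ;
proc freq data=toy;
    format OSQ060 BMI_TOY missfmt.;
    tables OSQ060 BMI_TOY / missing;
run;
```

This summary shows only how many values are missing. It hides the individual codes, so a value such as 9 is counted as "Not missing" until it has been recoded.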
To recode unavailable responses (such as "don't know") as missing, assign missing values one variable at a time using an IF…THEN statement, as demonstrated in the excerpt of the "Supplement" program below.
*-------------------------------------------------------------;
* Recode DONT KNOW responses to missing for OSQ060 and OSQ070 ;
*-------------------------------------------------------------;
data DEMOOST;
    set DEMOOST;
    if OSQ060 = 9 then OSQ060 = .;
    if OSQ070 = 9 then OSQ070 = .;
run;
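The "Supplement" program recodes one variable at a time. When many variables share the same code for an unavailable response, an ARRAY with a DO loop is a possible alternative that applies the same IF-THEN rule to each variable in turn. This is a sketch of that alternative, not the method used in the "Supplement" program. The datasets TOY and TOY_RECODED, the array name OSQ, and the index variable I are hypothetical, and the data lines are invented for illustration.

```sas
* Toy data for illustration only, not NHANES records          ;
data toy;
    input SEQN OSQ060 OSQ070;
    datalines;
1 1 2
2 9 1
3 2 9
4 . 2
;
run;

* Recode the value 9 to missing for every variable in the array ;
data toy_recoded;
    set toy;
    array osq{*} OSQ060 OSQ070;
    do i = 1 to dim(osq);
        if osq{i} = 9 then osq{i} = .;
    end;
    drop i;
run;

proc print data=toy_recoded noobs;
run;
```

Writing the result to a new dataset, rather than overwriting the input as the excerpt above does, keeps the original values available for comparison. This approach only fits variables that use the same code for the response being recoded, so check each variable's codebook values before adding it to the array.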
Step 3: Evaluate Extent of Missing Data
In this step, you will use the PROC FREQ procedure to confirm that the variables were recoded correctly in the previous step. This example is from the "Supplement" program.
*-------------------------------------------------------------------;
* Use the PROC FREQ procedure to determine the frequency of each    ;
* value of the variables listed. Use the WHERE statement to select  ;
* the INT sample weight and to select females who are 20 years of   ;
* age and older. Use the TABLES statement to list the variables of  ;
* interest. Use the list missing statement option to display the    ;
* missing values.                                                   ;
*-------------------------------------------------------------------;
proc freq data=DEMOOST;
    where WTINT2YR > 0 and RIAGENDR = 2 and RIDAGEYR >= 20;
    tables OSQ060 OSQ070 / list missing;
run;
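A frequency table of the recoded variable shows that the code 9 no longer appears, but it does not show which original values became missing. One way to check the mapping directly is to keep a copy of the original variable and cross-tabulate it against the recoded one. The sketch below assumes a hypothetical copy named OSQ060_ORIG and hypothetical datasets TOY and TOY_RECODED. The data lines are invented for illustration.

```sas
* Toy data for illustration only, not NHANES records          ;
data toy;
    input SEQN OSQ060;
    datalines;
1 1
2 9
3 2
4 .
;
run;

* Keep a copy of the original value before recoding            ;
data toy_recoded;
    set toy;
    OSQ060_ORIG = OSQ060;
    if OSQ060 = 9 then OSQ060 = .;
run;

* Each original value should map to exactly one recoded value  ;
proc freq data=toy_recoded;
    tables OSQ060_ORIG * OSQ060 / list missing;
run;
```

In the listing, rows where OSQ060_ORIG is 9 should show OSQ060 as missing, and all other original values should be unchanged. The LIST option prints the cross-tabulation as one row per combination, which is easier to scan than a two-way grid.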