The first task is to identify missing data and recode it. Here are the steps:
In this step, you will use the proc means procedure to check for missing, minimum and maximum values of continuous variables, and the proc freq procedure to look at the frequency distribution of categorical variables in your master analytic dataset. The output from these procedures provides the number and frequency of missing values for each variable listed in the procedure statement.
| Statements | Explanation |
|---|---|
| Proc means data=demo1_nh3 N Nmiss min max; | Use the proc means procedure to determine the number of missing observations (Nmiss), minimum values (min), and maximum values (max) for the selected variables. |
| where hsageu=2 and hsageir>=20 and dmpstat=2; | Use the where statement to select the participants who were age 20 years and older, and had the home interview and the MEC exam. |
| var PEP6G1 PEP6H1 PEP6I1 PEPMNK1R PEP6G3 PEP6H3 PEP6I3 PEPMNK5R BMPBMI TCP TGP; run; | Use the var statement to indicate the variables of interest. |

| Statements | Explanation |
|---|---|
| Proc freq data=demo1_nh3; | Use the proc freq procedure to determine the frequency of each value of the variables listed. |
| where hsageu=2 and hsageir>=20 and dmpstat=2; | Use the where statement to select the participants who were age 20 years and older, and who had both the home interview and the MEC exam. |
| Table haf10 hac1c hac1d har1 har3 hfa8r mapf12r hssex dmarethn hae1 hae2 hae3 hae5a hae6 hae7 hae9d /list missing; run; | Use the table statement to indicate the variables of interest. Use the list missing option to display the missing values. |
Highlighted items from proc means and proc freq output:
Two options can be used to recode the missing data:
Option 1: if…then statements

| Statement | Explanation |
|---|---|
| Data temp2_nh3; | Use the data statement to create a new dataset from your existing dataset; the name of the existing dataset is listed after the set statement. |
| if hae1 in (8, 9) then hae1=.; if hae3 in (8, 9) then hae3=.; | Use the if…then statement to recode "8" and "9" values of a variable as missing. |
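The if…then statements can be assembled into a complete data step as follows. This is a minimal sketch: the input dataset name on the set statement (demo1_nh3) is an assumption borrowed from the earlier proc means and proc freq examples, since the statements shown here do not name it.

```sas
/* Sketch of the if...then approach as one complete data step. */
data temp2_nh3;
  set demo1_nh3;   /* assumption: input dataset name is not given in the source */

  /* Recode the "8" and "9" values of each variable as missing. */
  if hae1 in (8, 9) then hae1 = .;
  if hae3 in (8, 9) then hae3 = .;
run;
```

This form is easy to read, but every variable needs its own statement, so it is most practical when only a few variables need recoding.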
Option 2: array statement

| Statement | Explanation |
|---|---|
| Data NH3.demo2_nh3; set NH3.demo1_nh3; | Use the data statement to create a new dataset from your existing dataset; the name of the existing dataset is listed after the set statement. |
| array _rdmiss hae1 hae3 hae5a hae7 hae9d hac1c haf10 hac1d mapf12r ; do over _rdmiss; if _rdmiss in (8, 9) then _rdmiss=.; end; array _rgmiss pep6g1 pep6h1 pep6i1 pepmnk1r pep6g3 pep6h3 pep6i3 pepmnk5r tcp ; do over _rgmiss; if _rgmiss in (888) then _rgmiss=.; end; if bmpbmi = 8888 then bmpbmi=.; if tgp = 8888 then tgp=.; if hfa8r in (88, 99) then hfa8r=.; run; | Use the array statement to recode the "8" and "9" values (or other special codes, such as 888) of several variables as missing. In this example, _rdmiss and _rgmiss are the names of the arrays, and each do over loop is closed with its own end statement. Use this option when you want to recode multiple variables that use the same numeric values for "refused" and "don't know". Assign missing values to the remaining variables one at a time. |
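The same recoding can also be written with an explicitly subscripted array and an indexed do loop, which some programmers prefer to the older do over form. The sketch below is illustrative only: the output dataset name work.demo2_sketch, the array name rdmiss, and the index variable i are hypothetical, and only the first group of variables is shown.

```sas
/* Sketch: explicit array with an indexed loop instead of do over. */
data work.demo2_sketch;            /* hypothetical output dataset name */
  set NH3.demo1_nh3;

  /* Variables that use 8 and 9 for "refused" and "don't know". */
  array rdmiss{*} hae1 hae3 hae5a hae7 hae9d hac1c haf10 hac1d mapf12r;

  do i = 1 to dim(rdmiss);
    if rdmiss{i} in (8, 9) then rdmiss{i} = .;
  end;

  drop i;                          /* keep the loop index out of the output */
run;
```

Using dim() means the loop bound follows the variable list automatically, so adding or removing a variable from the array statement requires no other change.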
In this step, you will use the proc freq procedure to verify that the recoding in the previous step was done correctly. As a general rule, if 10% or less of the data for a variable are missing from your analytic dataset, it is usually acceptable to continue your analysis without further evaluation or adjustment. However, if more than 10% of the data for a variable are missing, you may need to determine whether the missing values are distributed equally across socio-demographic characteristics, and decide whether imputation of missing values or use of adjusted weights is necessary. (Please see Analytic Guidelines for more information.)
| Statement | Explanation |
|---|---|
| Proc freq data=demo2_nh3; | Use the proc freq procedure to determine the frequency of each value of the variables listed. |
| where hsageu=2 and hsageir>=20 and dmpstat=2; | Use the where statement to select the study group who were age 20 years and older, and who had both the home interview and the MEC exam. |
| table hae1--hae3 hae5a hae6 hae7 hae9d hac1c haf10 hac1d mapf12r /list missing; run; | Use the table statement to indicate the variables of interest. |
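To compare each variable against the 10% guideline described above, it can help to collapse every value to either missing or not missing, so that each one-way table reports the percent missing directly. The following is a minimal sketch, not part of the original program: the format name missfmt is hypothetical, the variable list is a subset chosen for illustration, and the dataset name demo2_nh3 follows the proc freq statement above.

```sas
/* Hypothetical format: groups a numeric variable into two levels. */
proc format;
  value missfmt
    .     = 'Missing'
    other = 'Not missing';
run;

/* One-way tables showing the count and percent missing per variable. */
proc freq data=demo2_nh3;
  where hsageu=2 and hsageir>=20 and dmpstat=2;
  format hae1 hae3 hae5a hae7 hae9d missfmt.;
  tables hae1 hae3 hae5a hae7 hae9d / missing;
run;
```

The missing option includes the missing level in the percentages, so the "Missing" row of each table can be read against the 10% threshold.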
Highlighted items from the proc freq output for recoding missing values: