In this example, you will use SAS Survey Procedures to combine age subgroups and generate population estimates for high blood pressure (HBP) by sex and race/ethnicity for persons 20 years and older. The method outlined in this module uses a SAS data file with CPS population totals. The process for combining subgroups and calculating population estimates is then automated using the code outlined below.
Alternatively, you can take the CPS population totals from the NHANES web page for the respective survey cycle (referred to in Key Concepts), combine them with the results from proc surveymeans, and calculate population estimates manually in a spreadsheet. If you choose this option, you will need to define the age, race/ethnicity, and sex subgroups of interest and calculate their population totals in the spreadsheet yourself.
The SAS Survey Procedure, proc surveymeans, is used to generate population estimates. The general program for obtaining population estimates is outlined in the 3-step process below:
In the first step, you will calculate the prevalence of the health condition (here, HBP) by subdomains of interest. You will need to use appropriate weights, especially when combining across survey cycles.
The health outcome must be coded as a dichotomous (0, 100) variable for absence (0) or presence (100) of the health condition of interest, so that the mean of the recoded variable is the percent prevalence. In this example, the original variable is hbp and the recoded variable is hbpx.
hbpx=.;
if hbp=1 then hbpx=100; /* assumes hbp=1 codes presence of HBP */
else if hbp=2 then hbpx=0; /* assumes hbp=2 codes absence of HBP */
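Coding the outcome as 0/100 makes the mean of the recoded variable read directly as a percentage. The following is a minimal sketch on a made-up five-row dataset; the dataset name toy and its values are hypothetical, and hbp is assumed to be coded 1 = yes, 2 = no.

```sas
/* Hypothetical toy data; hbp is assumed to be coded 1 = yes, 2 = no */
data toy;
  input hbp @@;
  /* recode to 0/100 so that the mean is a percentage */
  hbpx = .;
  if hbp = 1 then hbpx = 100;
  else if hbp = 2 then hbpx = 0;
  datalines;
1 2 2 1 2
;
run;

/* Unweighted mean of hbpx: 2 of the 5 rows have hbp = 1 */
proc means data=toy mean;
  var hbpx;
run;
```

Two of the five rows have the condition, so the mean of hbpx is 40, that is, 40 percent. In the actual analysis the mean is weighted and design-based, but it is read the same way.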
A new variable (sel) will be created to flag the study subpopulation of interest (age 20 years and older). It is used in the domain statement of proc surveymeans.
sel=.;
if ridageyr ge 20 then sel=1;
else sel=2;
Population estimates will not be age standardized, so the estimates reflect the true population sampled. The results will be output to a SAS data file using the ods output statement below.
These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.
Statements  Explanation

proc surveymeans data=ANALYSIS_DATA nobs mean stderr clm;
Use the proc surveymeans procedure to obtain number of observations, mean, standard error and confidence intervals.
strata sdmvstra;
Use the strata statement to define the strata variable (sdmvstra).
cluster sdmvpsu;
Use the cluster statement to define the PSU variable (sdmvpsu).
class riagendr race;
Use the class statement to specify the discrete variables used to select the subpopulations of interest (i.e., gender [riagendr] and race [race]).
var hbpx;
Use the var statement to specify which variable(s) will be analyzed. In this example, the HBP variable (hbpx) is used.
weight wtmec4yr;
Use the weight statement to account for the unequal probability of sampling and nonresponse. In this example, the MEC weight for 4 years of data (wtmec4yr) is used.
domain sel sel*riagendr sel*race sel*riagendr*race;
Use the domain statement to specify the subpopulations of interest.
ods output domain(match_all)=unadj; run;
Use the ods statement to output SAS datasets of estimates for the subdomains listed on the domain statement. This set of commands will output four datasets, one for each domain specified in the domain statement above (unadj for sel, unadj1 for sel*riagendr, unadj2 for sel*race, and unadj3 for sel*riagendr*race).
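Assembled into one step, the statements in the table above might look like the sketch below. So that the block can run on its own, it first builds a small artificial dataset; every value in that data step is made up for illustration and is not NHANES data. In a real analysis, ANALYSIS_DATA would be your prepared NHANES analysis file.

```sas
/* Hypothetical toy data standing in for the NHANES analysis file */
data analysis_data;
  do sdmvstra = 1 to 2;              /* 2 strata */
    do sdmvpsu = 1 to 2;             /* 2 PSUs per stratum */
      do i = 1 to 24;                /* 24 persons per PSU */
        riagendr = 1 + mod(i, 2);
        race     = 1 + mod(int(i/2), 4);
        ridageyr = 12 + 2*i;         /* ages 14 to 60 */
        wtmec4yr = 1000 + 10*i + 100*sdmvpsu;
        hbpx     = 100 * (mod(i + sdmvpsu, 3) = 0);
        if ridageyr ge 20 then sel = 1;
        else sel = 2;
        output;
      end;
    end;
  end;
  drop i;
run;

/* Prevalence of hbpx overall, by sex, by race, and by sex and race */
proc surveymeans data=analysis_data nobs mean stderr clm;
  strata sdmvstra;
  cluster sdmvpsu;
  class riagendr race;
  var hbpx;
  weight wtmec4yr;
  domain sel sel*riagendr sel*race sel*riagendr*race;
  ods output domain(match_all)=unadj;
run;
```

Because sel is used in the domain statement rather than in a where statement, persons under 20 stay in the dataset and contribute to the variance estimation. Their rows (sel=2) are filtered out later, when the output datasets are combined.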
Statements  Explanation

data bp_stats; set unadj unadj1 unadj2 unadj3;
Use the data statement to create a new dataset (bp_stats) from the SAS datasets created previously (unadj, unadj1, unadj2, and unadj3).
if sel=1; if race=. then race=0; if riagendr=. then riagendr=0;
Use the if statement to select the subgroups of interest. Use if-then statements to recode missing values to 0 for race and riagendr.
ll=round(lowerclmean,.01); ul=round(upperclmean,.01);
Use these statements to round and rename the lower limit (lowerclmean to ll) and upper limit (upperclmean to ul) of the Wald 95% confidence intervals.
percent=round(mean,.01); sepercent=round(stderr,.01); run;
Use these statements to round the mean and standard error estimates and rename them to percent and sepercent, respectively.
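The recode of missing values to 0 is needed because the stacked datasets do not all contain the same domain variables. A minimal sketch with two hypothetical miniature tables (t_all and t_sex, with invented means) shows the effect:

```sas
/* Hypothetical miniature versions of two domain output tables */
data t_all;
  sel = 1; mean = 30.126; output;
run;

data t_sex;
  sel = 1; riagendr = 1; mean = 31.504; output;
  sel = 1; riagendr = 2; mean = 28.871; output;
run;

/* Stacking leaves riagendr missing on the row that came from t_all */
data stacked;
  set t_all t_sex;
  if riagendr = . then riagendr = 0;   /* 0 marks the "both sexes" row */
  percent = round(mean, .01);
run;

proc print data=stacked noobs;
run;
```

The row from t_all has no riagendr value, so after stacking it is missing and is recoded to 0. That 0 later serves as the merge key for the matching "all persons" population total.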
In Step 2, you will combine appropriate CPS population totals across survey cycles AND across years of age to reflect the subpopulation of interest (i.e., those 20 and older).
In this module, CPS population totals are supplied as a SAS dataset with values for: age (CTUTAGE), ranging from 0 to 85+ years; gender (CTUTGNDR); race/ethnicity (CTUTRACE), where 1=non-Hispanic white, 2=non-Hispanic black, 3=Mexican American, and 4=other; race/ethnicity (CTUTRETH), where 1=Mexican American, 2=non-Hispanic other, 3=non-Hispanic white, 4=non-Hispanic black, and 5=other Hispanic; ethnicity (CTUTHISP), where 1=Hispanic and 2=non-Hispanic; survey cycle (CTUTSRVY); and the population total (CTUTPOPT). Appropriate age, race/ethnicity, and gender groups were created in a previous step.
The proc means procedure for simple random samples in SAS will be used to calculate CPS population totals for the subdomains of interest (i.e., sex and race) for the subpopulation of interest (age 20 and older). In this case, no sample design factors or weights need to be used. Subgroup totals are output to another SAS data set (saspt9902) for use in Step 3.
Statements  Explanation

proc means data=nh.cpstot9902; where ctutage >= 20;
Use the proc means procedure and the where statement to calculate totals for persons 20 years of age and older.
var ctutpopt;
Use the var statement to select the variable of interest (ctutpopt).
output out=d1 n=n sum=sum; run;
Use the output statement to create a dataset (d1) for the population totals (sum).
proc sort data=nh.cpstot9902; by ctutgndr; run;
Use the proc sort procedure to sort the dataset by sex.
proc means data=nh.cpstot9902; where ctutage >= 20;
Use the proc means procedure and the where statement to calculate totals for persons 20 years of age and older.
var ctutpopt;
Use the var statement to select the variable of interest (ctutpopt).
by ctutgndr;
Use the by statement to generate population totals by sex (ctutgndr).
output out=d2 n=n sum=sum; run;
Use the output statement to create a dataset (d2) for the population totals (sum).
proc sort data=nh.cpstot9902; by ctutrace; run;
Use the proc sort procedure to sort the dataset by race.
proc means data=nh.cpstot9902; where ctutage >= 20;
Use the proc means procedure and the where statement to calculate totals for persons 20 years of age and older.
var ctutpopt;
Use the var statement to select the variable of interest (ctutpopt).
by ctutrace;
Use the by statement to generate population totals by race.
output out=d3 n=n sum=sum; run;
Use the output statement to create a dataset (d3) for the population totals (sum).
proc sort data=nh.cpstot9902; by ctutgndr ctutrace; run;
Use the proc sort procedure to sort the dataset by sex and race.
proc means data=nh.cpstot9902; where ctutage >= 20;
Use the proc means procedure and the where statement to calculate totals for persons 20 years of age and older.
var ctutpopt; by ctutgndr ctutrace;
Use the var statement to select the variable of interest (ctutpopt). Use the by statement to generate population totals by sex and race.
output out=d4 n=n sum=sum; run;
Use the output statement to create a dataset (d4) for the population totals (sum).
data saspt9902; set d1 d2 d3 d4; if ctutrace=. then ctutrace=0; if ctutgndr=. then ctutgndr=0; run;
This data step consolidates the datasets created above into a single dataset (saspt9902) for use in the next step. Missing values of race and sex are recoded to 0 so that the marginal totals match the coding used in bp_stats.
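As an alternative to the four sort-and-summarize passes above (not the method used in this tutorial), proc means with a class statement writes the overall, by-sex, by-race, and by-sex-and-race totals to one output dataset in a single pass. The sketch below assumes a tiny hypothetical dataset, cps_toy, in place of nh.cpstot9902; its values are invented.

```sas
/* Hypothetical stand-in for nh.cpstot9902 */
data cps_toy;
  input ctutage ctutgndr ctutrace ctutpopt;
  datalines;
19 1 1 500
20 1 1 1000
20 2 1 1100
45 1 2 300
45 2 2 400
;
run;

/* Without the nway option, the output dataset holds every combination */
/* of the class variables, including the margins and the grand total   */
proc means data=cps_toy noprint;
  where ctutage >= 20;
  class ctutgndr ctutrace;
  var ctutpopt;
  output out=saspt_toy n=n sum=sum;
run;

/* Marginal rows have missing class values; recode them to 0 */
data saspt_toy;
  set saspt_toy;
  if ctutgndr = . then ctutgndr = 0;
  if ctutrace = . then ctutrace = 0;
run;

proc print data=saspt_toy noobs;
  var _type_ ctutgndr ctutrace n sum;
run;
```

The row with _type_=0 is the total for everyone aged 20 and older (2800 in this toy data, since the 19-year-old row is excluded). The result would still need to be sorted by sex and race before being merged with the prevalence estimates.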
In this last step, you will multiply prevalence estimates by the corresponding CPS population totals to estimate the total number of noninstitutionalized U.S. residents with HBP.
The datasets produced in Step 1 and Step 2 will be sorted on the subdomain variables and merged. The new dataset will be used in the final SAS program. Percent prevalence estimates, as well as lower and upper 95% confidence limits, will be multiplied by the corresponding population total for that subgroup. Results will be rounded, formatted, and printed in SAS.
Statements  Explanation

proc sort data=bp_stats; by riagendr race; run;
proc sort data=saspt9902(rename=(ctutgndr=riagendr ctutrace=race)); by riagendr race; run;
Use the proc sort procedure to sort the two datasets by sex and race. In the second dataset, rename the CPS total race and gender variables (ctutrace and ctutgndr) to match the variable names used in the original dataset.
data comb; merge bp_stats(in=a) saspt9902(drop=n); by riagendr race; if a;
Use the data statement to create a new dataset (comb) by merging the SAS datasets created previously (bp_stats and saspt9902). Keep only the rows whose sex and race values exist in bp_stats (in=a). The drop=n option keeps the CPS cell count from overwriting the sample count (n) carried in bp_stats.
popmean=(percent/100)*sum; popl=ll/100*sum; popu=ul/100*sum;
Use these statements to calculate the population counts by applying the population totals (sum) to the prevalence estimate (percent) and the 95% confidence interval limits.
poplr=round(popl,1000); popur=round(popu,1000); popmeanr=round(popmean,1000); totalr=round(sum,1000); run;
Use these statements to round the estimates and the population total to the nearest thousand.
proc print data=comb double noobs split='/';
var riagendr race percent sepercent ll ul n totalr popmeanr poplr popur;
format race racefmt. riagendr sexfmt. n 5.0 percent 5.2 sepercent 5.2 ll 4.2 ul 4.2;
label percent='%/with/high/bp' n='Num/bp/status' sepercent='Std/error' ll='Lower/95 %/Wald/CI' ul='Upper/95 %/Wald/CI' popmeanr='Pop/Est/US/with/high/bp' totalr='Pop/total/US' poplr='Pop Est/Lower/95 %/WALD/CI' popur='Pop Est/Upper/95 %/WALD/CI';
title1 'Prevalence of persons with high BP - US, 1999-2002';
title2 'Percent and population estimates of number with high BP - Wald CI';
run;
Use the proc print procedure to print the variables of interest with formats, column labels (split at each '/'), and titles.
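The arithmetic in this step can be checked by hand for a single subgroup. The sketch below uses invented values for the prevalence, its confidence limits, and the population total; none of them are NHANES or CPS results.

```sas
/* Hypothetical values chosen only to illustrate the arithmetic */
data _null_;
  percent = 25.54;     /* prevalence, in percent */
  ll      = 23.10;     /* lower 95% confidence limit, in percent */
  ul      = 27.98;     /* upper 95% confidence limit, in percent */
  sum     = 1234000;   /* population total for the subgroup */

  /* apply the population total to the estimate and its limits */
  popmean = (percent/100)*sum;
  popl    = ll/100*sum;
  popu    = ul/100*sum;

  /* round to the nearest thousand */
  popmeanr = round(popmean, 1000);
  poplr    = round(popl, 1000);
  popur    = round(popu, 1000);

  put popmeanr= poplr= popur=;
run;
```

The log shows popmeanr=315000 poplr=285000 popur=345000: about 315,000 persons with the condition, with limits of roughly 285,000 and 345,000. The limits reflect only the sampling error of the prevalence estimate, because the population total is treated as a fixed number.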
Highlights from the output include: