In this example, you will use SUDAAN to combine age subgroups and generate population estimates for high blood pressure (HBP) by sex and race/ethnicity for persons 20 years and older. The method outlined in this module uses a SAS data file with CPS population totals. The process for combining subgroups and calculating population estimates is then automated using the code outlined below.
Alternatively, you can use the CPS population totals located on the respective survey cycle NHANES web page (referred to in Key Concepts), plus the results from a proc descript or proc crosstab procedure and manually calculate population estimates within a spreadsheet. If you choose this option, you will need to define the age, race/ethnicity and gender subgroups of interest and calculate population totals within the spreadsheet on your own.
The proc descript procedure in SUDAAN and additional SAS code will be used to generate population estimates using the 3step process outlined below.
In Step 1 of this method you will calculate the prevalence of the health condition of interest (i.e., HBP) by each subdomain of interest (i.e., sex and race/ethnicity) using the proc descript procedure. You will need to use appropriate weights, especially when combining across survey cycles.
The health outcome must be coded as a dichotomous variable (0, 100) for absence (0) or presence (100) of the health condition of interest. (HBPX).
hbpx= . ;
if hbp= 1 then hbpx= 100 ;
e lse if hbp= 2 then hbpx= 0 ;
Similar to the proc descript procedure used in Task 1, you will output the results to a SAS data file using the output statement, as shown below. Population estimates will not be age standardized so that they reflect the true population sampled.
ese programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.
Statements  Explanation 

by ; 
data=analysis_data ;
Use the proc sort procedure to sort the dataset by strata (sdmvstra) and PSU (sdmvpsu). 
data=analysis_data design=wr noprint mean atlevel1=1 atlevel2= 2 ; 
Use proc descript to generate means and specify the sample design using the design option WR (with replacement). 
subpopn ridageyr >=20 ; 
Use the subpopn statement to select sample persons 20 years and older (ridageyr >=20) because only those individuals are of interest in this example. Note: For accurate estimates, it is preferable to use the subpopn statement in SUDAAN to select a subpopulation for analysis, rather than to select the study population in the SAS program while preparing the data file. 
NEST sdmvstra sdmvpsu;  Use the nest statement with strata (sdmvstra) and PSU (sdmvpsu) to account for the design effects. 
WEIGHT wtmec4yr;  Use the weight statement to account for the unequal probability of sampling and nonresponse. In this example, the MEC weight for 4 years of data (wtmec4yr) is used. 
subgroup riagendr age race ; 
Use the subgroup statement to list the categorical variables for which estimates are requested. These names will also appear in the table statement below. In this example, gender (riagendr), age (age), and raceethnicity (race) are of interest. 
levels 2 3 4; 
Use the levels statement to define the number of categories in each of the subgroup variables. The level must be an integer greater than 0. This example has two genders, three age groups, and four raceethnicity groups. 
var hbpx ; 
Use the var statement to name the variable(s) to be analyzed. In this example, the high blood pressure variable (hbpx) is used. 
table riagendr * race ; 
Use the table statement to specify crosstabulations for which estimates are requested. If a table statement is not present, a onedimensional distribution is generated for each variable in the subgroup statement. In this example, the estimates are for gender (riagendr) by raceethnicity (race). 
OUTPUT NSUM MEAN SEMEAN atlev2 atlev1 / FILENAME=nh9902 FILETYPE=SAS REPLACE; ; 
Use an output statement to output the number of observations (nsum), mean (mean), standard error of the mean (semean), number of strata (atlev1), and number of PSUs (atlev2) to a SAS file named nh9902. 
These data will be fed into a SAS program where degrees of freedom, tstatistics, and 95% confidence limits will be calculated for each prevalence estimate, as shown below. Values also will be labeled and formatted.
Statements  Explanation 

SET nh9902; nh9902c;df=atlev2atlev1; 
Use the data statement to create a new dataset (nH9902c) from the SAS dataset created in Step 1 above (nh9902). Calculate the degrees of freedom (df) by subtracting the number of strata (atlev1) from the number of PSU (atlev2). 
tcritl=tinv( .025 ,df); tcritu=tinv(.975 ,df); 
Use these statements to calculate the tstatistics (tcritl and tcritu) based on the calculated degrees of freedom (df). 
ll=round((mean+tcritl*semean),.01 ); ul=round((mean+tcritu*semean),.01 ); 
Use these statements to calculate the lower limit (ll), and upper limit (ul) for the Wald 95% confidence intervals. 
percent=round(mean,.01 ); sepercent=round(semean,.01 ); run ; 
Use these statements to round the estimates and rename them to percent and sepercent. 
In this step, you will combine appropriate CPS population totals across survey cycles AND across years of age to reflect the subpopulation of interest (i.e., those 20 and older by sex and race).
In this module, CPS population totals are supplied as a SAS dataset with values for: age (CTUTAGE) ranging from 0 to 85+ years ; gender (CTUTGNDR); race/ethnicity (CTUTRACE), where 1= nonHispanic white, 2=nonHispanic black, 3=Mexican American and 4=other; race/ethnicity (CTUTRETH), where 1=Mexican American, 2=nonHispanic other, 3=nonHispanic white, 4=nonHispanic black, 5=other Hispanic; ethnicity (CTUTHISP) where 1=Hispanic and 2=nonHispanic; survey cycle (CTUTSRVY); and the population total (CTUTPOPT). Appropriate age, race/ethnicity, and gender groups were created in a previous step.
The proc descript procedure in SUDAAN will be used to calculate CPS population totals for the subdomains of interest (i.e., sex and race/ethnicity) for the subpopulation of interest (age 20 and older). In this case, no sample design factors or weights need to be used. The design specified is SRS (simple random sampling). The variable is CTUTPOPT (the population totals). Subgroup totals are output to another SAS data set (pt9902) for use in the next step. Nothing is printed.
Statements  Explanation 

data= nh.cpstot9902 design=srs noprint means; 
Use proc descript to generate means and specify the sample design using the design option SRS (simple random sample). Use the SASsupplied dataset (cpstot9902) to read the CPS population totals. 
SUBPOPN ctutage >=
20 ;

Use the subpopn statement to select sample persons 20 years and older (ctutage>=20) because only those individuals are of interest in this example. Note: For accurate estimates, it is preferable to use the subpopn statement in SUDAAN to select a subpopulation for analysis, rather than to select the study population in the SAS program while preparing the data file. 
SUBGROUP ctutgndr ctutrace ; 
Use the subgroup statement to list the categorical variables for which estimates are requested. These names will also appear in the table statement below. In this example, gender (ctutgndr) and raceethnicity (ctutrace) are of interest. 
LEVELS 2 4 ; 
Use the levels statement to define the number of categories in each of the subgroup variables. The level must be an integer greater than 0. This example has two genders and four raceethnicity groups. 
VAR ctutpopt; 
Use the var statement to name the variable(s) to be analyzed. In this example, the population total variable (ctutpopt) is used. 
TABLES ctutgndr*ctutrace ; 
Use the table statement to specify crosstabulations for which estimates are requested. If a table statement is not present, a onedimensional distribution is generated for each variable on the subgroup statement. In this example, the estimates are for gender (ctutgndr) by raceethnicity (ctutrace). 
OUTPUT
total / FILENAME=pt9902 FILETYPE=SAS REPLACE;
; 
Use an output statement to output the total number of observations to a SAS file named pt9902. 
In this last step, you will multiply prevalence estimates with corresponding CPS population totals to estimate the total number of noninstitutionalized U.S. citizens affected with HBP.
Note that the datasets produced in Step 1 and Step 2 will be sorted on the subdomain variables and merged. The new dataset will be used in the final SAS program. Percent prevalence estimates as well as lower and upper 95% confidence limits will be multiplied to the corresponding population total for that subgroup. Results will be rounded, formatted, and printed in SAS.
Statements  Explanation 

data =nh9902c; by riagendr race; ;  Use the proc sort procedure to sort the two datasets by sex and race. In the second dataset, rename the CPS total race and gender (ctutrace and ctutgndr) variables to match the variable names used in the original dataset. 
comb; merge nh9902c( in =a) pt9902 ; by riagendr race ; if a ; 
Use the data statement to create a new dataset (comb) by merging the SAS datasets created previously (nh9902c and pt9902).

popmean=(percent/100 )*total ; popl=ll/100 *total ; popu=ul/100 *total ; 
Use these statements to calculate the population counts by applying the population totals (total) to the prevalence estimate (percent) and the 95% confidence interval limits. 
poplr=round(popl,1000 ); popur=round(popu,1000 ); popmeanr=round(popmean,1000 ); totalr=round(total,1000 ) ; 
Use these statements to round and format the estimates to the nearest thousand. 
noobs split = '/' double ; var riagendr race percent sepercent ll ul df nsum totalr popmeanr poplr popur ; format race racefmt. riagendr sexfmt. nsum 5.0 percent 5.2 sepercent 5.2 ll 4.2 ul 4.2 df 2.0 ; label percent='%' / 'with' / 'high' / 'bp' nsum='Num' / 'bp' / 'status' sepercent='Std' / 'error' ll='Lower' / '95 %' / 'Wald' / 'CI' ul='Upper' / '95 %' / 'Wald' / 'CI' df='degrees' / 'of' / 'freedom' popmeanr='Pop' / 'Est' / 'US' / 'with' / 'high' / 'bp' totalr='Pop' / 'total' / 'US' poplr='Pop Est' / 'Lower' / '95 %' / 'WALD' / 'CI' popur='Pop Est' / 'Upper' / '95 %' / 'WALD' / 'CI' ; title1 'Prevalence of persons with high Bp  US, 19992002' ; title2 'Percent and population estimates of number with high BpWald CI' ; ; 
Use the proc print procedure to print the variables of interest. 
Highlights from the output
include: