Task 2: How to Calculate Confidence Intervals for Geometric Means Using SUDAAN


Step 1: Calculating Confidence Intervals for Geometric Means Using SUDAAN

This task outlines how to calculate confidence intervals for geometric means. See "How to Perform Statistical Tests and Calculate Confidence Limits with Degrees of Freedom" in the Variance Estimation module for the basic programming steps for calculating confidence limits.

When the data are highly skewed, you will need to transform them before analysis. For example, applying a log transformation to the data, averaging the logged values, and exponentiating the result yields the geometric mean.
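As a brief illustration of the log transformation described above, the textbook definition of the geometric mean of n positive values can be written as follows. The symbols x_i and n are introduced here for illustration only. In a weighted survey analysis the simple average of the logs would presumably be replaced by a weighted average, but that detail is an assumption and is not stated in this task.

```latex
\mathrm{GM}
  = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
  = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln x_i \right),
  \qquad x_i > 0
```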

In this example, you will calculate geometric means for the fasting serum triglyceride variable. Obtain the geometric mean and its standard error directly from the SUDAAN procedure proc descript, and then output them to a SAS dataset where the confidence interval (CI) can be constructed directly. The explanations in the summary table below provide an example that you can follow.



These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

Generating the Geometric Mean and Standard Error from SUDAAN
Statements Explanation
proc sort data=analysis_data; by sdmvstra sdmvpsu;

Use the SAS procedure, proc sort, to sort the data by strata (sdmvstra) and PSU (sdmvpsu).

proc descript data=analysis_data geometric atlevel1=1 atlevel2=2;

Use proc descript to specify the dataset (analysis_data).

Use the geometric option to compute and print geometric means and their standard errors.

The atlevel1=1 and atlevel2=2 options specify the sampling stages for which you want counts per table cell; in NHANES, strata are level 1 and PSUs are level 2. The resulting output variables are atlev1, the number of strata with at least one valid observation, and atlev2, the number of PSUs with at least one valid observation. These counts are used to calculate degrees of freedom.

nest sdmvstra sdmvpsu;

Use the nest statement with strata (sdmvstra) and PSU (sdmvpsu) to account for the design effects.

weight wtsaf4yr;

Use the morning fasting sample weight for 4 years of data (wtsaf4yr) because serum triglyceride was obtained in persons examined in the morning who fasted for 9+ hours.

subpopn ridageyr>=20 /name= "Adults 20 years and older" ;

Use the subpopn statement to select the subpopulation of interest. In this example, sample persons 20 years and older (ridageyr>=20) are used.

class age1 riagendr/nofreq;

Use a class statement to list discrete variables upon which subgroups are based. In this example, gender (riagendr) and age (age1) are used.

var lbxtr;

Use a var statement to select the serum triglyceride variable (lbxtr) as your variable of interest.

table riagendr*age1;

Use the table statement to request geometric means of serum triglyceride for each combination of gender (riagendr) and age group (age1).

print nsum geomean segeomean/style=NCHS geomeanfmt=f6.0 segeomeanfmt=f6.1;

Use the print statement to print the number of observations (nsum), geometric means (geomean), and standard errors of geometric means (segeomean).

output nsum geomean segeomean atlev1 atlev2/filename=tg9902 replace;

run;

Use an output statement to output the number of observations (nsum), geometric mean (geomean), standard error of the geometric mean (segeomean), number of strata (atlev1), and number of PSUs (atlev2) to a SAS file named tg9902.


Calculate Confidence Intervals from SAS Output Dataset
Statements Explanation
data newtg9902;

set tg9902;


Use the data statement to create a new dataset (newtg9902) from the SAS dataset created previously (tg9902).

Calculate the degrees of freedom (df) as the number of PSUs (atlev2) minus the number of strata (atlev1), that is, df=atlev2-atlev1.


Use a drop statement to drop the selected variables from the dataset.

ll=round(geomean+tinv(.025,df)*segeomean);

ul=round(geomean+tinv(.975,df)*segeomean);

geomean=round(geomean);

segeomean=round(segeomean,.1);


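Use these statements to calculate the lower limit (ll) and upper limit (ul) of the 95% confidence interval, to round the geometric mean (geomean) and its standard error (segeomean), and to obtain the confidence interval width (ciwidth).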
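The data step described in this table can be sketched end to end as follows. This is a minimal illustration, not the tutorial's official program. The dataset tg_demo and its single row of values are hypothetical stand-ins for the SUDAAN output dataset. The df statement follows the explanation above. The ciwidth formula (ul minus ll) and the variables listed on the drop statement are assumptions, because those statements are not shown in the table.

```sas
/* Hypothetical stand-in for the SUDAAN output dataset.
   The values are made up for illustration only. */
data tg_demo;
  input age1 riagendr nsum geomean segeomean atlev1 atlev2;
  datalines;
1 1 500 120 3.0 28 57
;
run;

data newtg_demo;
  set tg_demo;

  /* Degrees of freedom: number of PSUs minus number of strata */
  df = atlev2 - atlev1;

  /* 95% limits from the t distribution, computed before geomean and
     segeomean are rounded. tinv(.025, df) is negative, so adding it
     gives the lower limit. */
  ll = round(geomean + tinv(.025, df) * segeomean);
  ul = round(geomean + tinv(.975, df) * segeomean);

  /* Round the estimates for display */
  geomean   = round(geomean);
  segeomean = round(segeomean, .1);

  /* Assumption: the width is the distance between the two limits */
  ciwidth = ul - ll;

  /* Assumption: the stage counts are the variables being dropped */
  drop atlev1 atlev2;
run;

proc print data=newtg_demo noobs;
run;
```

With the made-up row above, df is 29 and tinv(.975, 29) is about 2.045, so the limits come out at roughly 114 and 126. Note that this construction is symmetric around the geometric mean on the original scale, which matches the ll and ul statements shown in the table.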

proc print split='/' noobs;
format age1 age1fmt. riagendr sexfmt. nsum 7.0 geomean 6.0 segeomean 6.1 df 2.0;

label ll='Lower/Limit' ul='Upper/limit' df='Degrees/of/freedom'

ciwidth='Confidence/interval/width';

title1 'Geometric mean of serum triglyceride and 95% confidence';
title2 'interval of adults 20 years and older:';
title3 'United States, 1999-2002';
run;

Use the proc print procedure to output the age group (age1), gender (riagendr), number of observations (nsum), geometric means (geomean), standard error of the geometric mean (segeomean), and degrees of freedom (df).


Step 2: Review Output

