## Task 2: How to Calculate Confidence Intervals for Geometric Means Using SUDAAN

### Step 1: Calculating Confidence Intervals for Geometric Means Using SUDAAN

This task shows how to calculate confidence intervals for geometric means. See "How to Perform Statistical Tests and Calculate Confidence Limits with Degrees of Freedom" in the Variance Estimation module for the basic programming steps for calculating confidence limits.

When the data are highly skewed, you will need to transform them. For example, you can obtain the geometric mean by log-transforming the data, averaging the logged values, and exponentiating (back-transforming) that average.
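As a quick illustration of the log-transform route, the Python sketch below computes the arithmetic mean, median, and geometric mean of a small right-skewed sample. The values are made up, not NHANES data. `geometric_mean` is an illustrative helper, not part of SUDAAN or SAS. The sketch is unweighted, so it shows only the transformation itself, not the survey-weighted estimate that SUDAAN produces.

```python
import math
import statistics

def geometric_mean(values):
    """Geometric mean: exponentiate the average of the natural logs."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(statistics.fmean([math.log(v) for v in values]))

# Hypothetical right-skewed values in mg/dL (illustration only).
sample = [60, 75, 90, 110, 130, 160, 420]

print(round(statistics.fmean(sample), 1))   # arithmetic mean
print(round(statistics.median(sample), 1))  # median
print(round(geometric_mean(sample), 1))     # geometric mean
```

The single large value pulls the arithmetic mean well above the median. The geometric mean sits much closer to the bulk of the data.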

In this example, you will calculate geometric means for the fasting serum triglyceride variable. Obtain the geometric mean and its standard error directly from the SUDAAN proc descript procedure, then output them to a SAS dataset in which the confidence interval (CI) can be constructed. The explanations in the summary tables below provide an example that you can follow.
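The quantities used in this task can be summarized as follows.

- The sums run over sample persons in the subpopulation, and $w_i$ is the sample weight.
- $\mathrm{SE}$ is the standard error reported by SUDAAN.
- $t_{p,\mathrm{df}}$ is the $p$-th quantile of the t distribution, which is what SAS `tinv(p, df)` returns.

The weighted-log expression for the geometric mean is a summary of the standard definition, not a formula quoted from SUDAAN documentation.

$$
\widehat{\mathrm{GM}} = \exp\!\left(\frac{\sum_i w_i \ln x_i}{\sum_i w_i}\right),
\qquad
\mathrm{df} = \#\text{PSUs} - \#\text{strata},
$$

$$
\mathrm{LL} = \widehat{\mathrm{GM}} + t_{0.025,\,\mathrm{df}}\,\mathrm{SE}(\widehat{\mathrm{GM}}),
\qquad
\mathrm{UL} = \widehat{\mathrm{GM}} + t_{0.975,\,\mathrm{df}}\,\mathrm{SE}(\widehat{\mathrm{GM}}).
$$

Because $t_{0.025,\,\mathrm{df}} = -t_{0.975,\,\mathrm{df}}$, the interval is symmetric around $\widehat{\mathrm{GM}}$.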

IMPORTANT NOTE

These programs use variable formats listed in the Tutorial Formats page. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.

Generating the Geometric Mean and Standard Error from SUDAAN
Statements Explanation
proc sort data=analysis_data; by sdmvstra sdmvpsu;

Use the SAS procedure proc sort to sort the data by strata (sdmvstra) and PSU (sdmvpsu). SUDAAN procedures expect the input data to be sorted by the variables on the nest statement.

proc descript data=analysis_data geometric atlevel1=1 atlevel2=2;

Use proc descript with the data= option to specify the dataset (analysis_data).

Use the geometric option to compute and print geometric means and their standard errors.

The ATLEVEL1=1 and ATLEVEL2=2 options specify the sampling stages for which you want counts per table cell. In NHANES, strata are the first stage (level 1) and PSUs are the second stage (level 2). The resulting statistics are:

- ATLEV1, the number of strata with at least one valid observation;
- ATLEV2, the number of PSUs with at least one valid observation.

These counts are used to calculate degrees of freedom.

nest sdmvstra sdmvpsu;

Use the nest statement with strata (sdmvstra) and PSU (sdmvpsu) to account for the complex sample design (stratification and clustering) when estimating standard errors.

weight wtsaf4yr;

Use the morning fasting subsample weight for 4 years of data (wtsaf4yr). Serum triglycerides were obtained from persons examined in the morning session who had fasted for 9 or more hours.

subpopn ridageyr>=20 /name= "Adults 20 years and older" ;

Use the subpopn statement to select the subpopulation of interest. In this example, sample persons 20 years and older (ridageyr>=20) are used.

class age1 riagendr/nofreq;

Use a class statement to list discrete variables upon which subgroups are based. In this example, gender (riagendr) and age (age1) are used.

var lbxtr;

Use a var statement to select the serum triglyceride variable (lbxtr) as your variable of interest.

table riagendr*age1;

Use the table statement to request geometric means of serum triglyceride for gender (riagendr) within each age group (age1).

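print nsum geomean segeomean/style=NCHS geomeanfmt=f6.0 segeomeanfmt=f6.1;

Use the print statement to print:

- the number of observations (nsum);
- the geometric means (geomean);
- the standard errors of the geometric means (segeomean).

The geomeanfmt= and segeomeanfmt= options set the display formats: whole numbers for geomean and one decimal place for segeomean.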

output nsum geomean segeomean atlev1 atlev2/filename=tg9902 replace;

run;

Use an output statement to write these statistics to a SAS file named tg9902:

- the number of observations (nsum);
- the geometric mean (geomean);
- the standard error of the geometric mean (segeomean);
- the number of strata (atlev1);
- the number of PSUs (atlev2).
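The proc descript step above can be made less of a black box with a Python sketch of one standard way to produce a domain geometric mean and its standard error. The sketch assumes:

- Taylor linearization, with PSUs treated as sampled with replacement within strata (the structure the nest statement describes).
- At least two PSUs in every stratum.
- A domain indicator playing the role of subpopn.
- A delta-method standard error for the geometric mean.

This is not SUDAAN's implementation, and its output may differ from SUDAAN's in detail. `domain_geomean_se` and its arguments are hypothetical names.

```python
import math
from collections import defaultdict

def domain_geomean_se(x, w, strata, psus, in_domain):
    """Weighted geometric mean of x in a domain, plus a linearized SE.

    Assumptions (sketch only): with-replacement PSUs within strata, at
    least two PSUs per stratum, and a delta-method SE for exp(mean log x).
    """
    # Zero weight outside the domain, but keep every record in the file.
    wd = [wi if d else 0.0 for wi, d in zip(w, in_domain)]
    total = sum(wd)
    logs = [math.log(xi) if d else 0.0 for xi, d in zip(x, in_domain)]
    mean_log = sum(wi * li for wi, li in zip(wd, logs)) / total

    # Linearized values summed to PSU totals; out-of-domain records add 0,
    # yet their PSUs still count toward the stratum variance.
    psu_tot = defaultdict(float)
    for wi, li, h, j in zip(wd, logs, strata, psus):
        psu_tot[(h, j)] += wi * (li - mean_log) / total

    by_stratum = defaultdict(list)
    for (h, _), z in psu_tot.items():
        by_stratum[h].append(z)

    # Between-PSU variance within each stratum, summed over strata.
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        z_bar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)

    gm = math.exp(mean_log)
    return gm, gm * math.sqrt(var)  # delta method: d exp(m)/dm = exp(m)
```

Dropping out-of-domain records instead of zeroing their weights would give the same point estimate. However, it could remove PSUs from the variance calculation, which is why a subpopulation statement is used rather than subsetting the file.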

Calculate Confidence Intervals from the SAS Output Dataset
Statements Explanation
data newtg9902;

set tg9902;

df=atlev2-atlev1;

Use the data statement to create a new dataset (newtg9902) from the SAS dataset created previously (tg9902).

Calculate the degrees of freedom (df) as the number of PSUs (atlev2) minus the number of strata (atlev1).

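drop PROCNUM TABLENO VARIABLE _C1 _C2 ATLEV1 ATLEV2;

Use a drop statement to remove variables that are not needed in the final table. The drop statement affects only the dataset being written, so atlev1 and atlev2 are still available for computing df in this step.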

ll=round(geomean+tinv(.025 ,df)*segeomean);

ul=round(geomean+tinv(.975 ,df)*segeomean);

geomean=round(geomean);segeomean=round(segeomean,.1 );

ciwidth=ul-ll;

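Use these statements to do the following:

- Calculate the lower limit (ll) and upper limit (ul) of the 95% confidence interval. Each limit is the geometric mean plus the 2.5th or 97.5th percentile of the t distribution with df degrees of freedom (tinv), multiplied by the standard error. Because tinv(.025, df) is negative, adding it yields the lower limit.
- Round the geometric mean (geomean) to a whole number and its standard error (segeomean) to one decimal place.
- Compute the confidence interval width (ciwidth).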

proc print data=newtg9902 split='/' noobs; format age1 age1fmt. riagendr sexfmt. nsum 7.0 geomean 6.0 segeomean 6.1 df 2.0;

label ll='Lower/limit' ul='Upper/limit' df='Degrees/of/freedom'

ciwidth='Confidence/interval/width';

title1 'Geometric mean of serum triglyceride and 95% confidence';
title2 'interval of adults 20 years and older:';
title3 'United States, 1999-2002';
run;

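Use the proc print procedure to print these variables:

- age group (age1) and gender (riagendr);
- number of observations (nsum);
- geometric mean (geomean) and its standard error (segeomean);
- degrees of freedom (df);
- confidence limits (ll, ul) and confidence interval width (ciwidth).

The split='/' option breaks column labels at each slash, and noobs suppresses the observation-number column.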
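The data step's arithmetic can be sketched in Python. The sketch counts strata and PSUs with valid data to get df, then forms the limits the same way as ll and ul. Python has no built-in equivalent of SAS `tinv`, so the t quantile is approximated numerically with Simpson integration and bisection to keep the example dependency-free. In practice a library routine such as `scipy.stats.t.ppf` would be the usual choice. `design_df`, `t_quantile`, and `t_interval` are hypothetical helpers, and SAS rounding is not reproduced.

```python
import math

def design_df(strata, psus, values):
    """PSUs minus strata, counting only units with a non-missing value.

    Mirrors atlev2 - atlev1. PSU codes repeat across NHANES strata, so a
    PSU is identified by its (stratum, PSU) pair. None marks a missing value.
    """
    valid = [(h, j) for h, j, v in zip(strata, psus, values) if v is not None]
    return len(set(valid)) - len({h for h, _ in valid})

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps
    s = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + s * h / 3

def t_quantile(p, df):
    """Inverse t CDF by bisection; plays the role of SAS tinv(p, df)."""
    if p < 0.5:
        return -t_quantile(1 - p, df)  # the t distribution is symmetric
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(geomean, segeomean, df):
    """95% limits computed the same way as ll and ul in the data step."""
    ll = geomean + t_quantile(0.025, df) * segeomean
    ul = geomean + t_quantile(0.975, df) * segeomean
    return ll, ul
```

With few degrees of freedom the t quantile is noticeably larger than the normal value of about 1.96, which widens the interval. This is why df is carried through from the design counts.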

### Step 2: Review Output

• If you ran proc univariate on fasting serum triglycerides and compared the mean and median values, you would see a substantial difference, because triglyceride is a highly skewed variable. Therefore, you should use geometric means.
• The geometric mean for males increases up to age 40-49 years and then declines.
• The geometric mean for females increases up to age 60-69 years and then declines.
• The confidence interval (CI) is wider for males than for females and is widest for males 40-49 years. This indicates that the geometric mean serum triglyceride is estimated less precisely in this group.
• Confidence intervals can also be used as a first check of whether two groups differ. For example, the CIs for geometric mean serum triglycerides for total males (124, 137) and total females (111, 118) do not overlap, suggesting that the two groups are likely to be different. However, you should perform a formal test of the difference between the two population subgroups, such as a t-test, to determine whether the difference is statistically significant.
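The overlap check in the last bullet amounts to a simple interval comparison. The Python sketch below applies it to the two CIs quoted there; `intervals_overlap` is an illustrative helper. Non-overlap is only a screening heuristic, and the sketch does not replace the formal test the text recommends.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

# 95% CIs for total males and total females, as quoted in the text.
males = (124, 137)
females = (111, 118)
print(intervals_overlap(males, females))  # False
```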