This task outlines how to calculate confidence intervals for geometric means. For the basic programming steps for calculating confidence limits, see "How to Perform Statistical Tests and Calculate Confidence Limits with Degrees of Freedom" in the Variance Estimation module.
When the data are highly skewed, you will need to transform them. For example, you can obtain the geometric mean by applying a log transformation to the data: average the log-transformed values, then exponentiate the result.
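As a rough illustration of the log-transformation idea, the sketch below computes an unweighted geometric mean in Python. It is not the survey-weighted estimate that SUDAAN produces; the function name `geometric_mean` is hypothetical and the values are made up for illustration.

```python
import math


def geometric_mean(values):
    """Unweighted geometric mean: exponentiate the mean of the natural logs."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive to take logs")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


# Made-up, right-skewed values (not NHANES data).
sample = [50.0, 100.0, 200.0]
gm = geometric_mean(sample)
arithmetic = sum(sample) / len(sample)

print(f"geometric mean:  {gm:.1f}")
print(f"arithmetic mean: {arithmetic:.1f}")
```

For right-skewed data such as these, the geometric mean sits below the arithmetic mean because the log transformation reduces the influence of the largest values.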
In this example, you will calculate geometric means for the fasting serum triglyceride variable. Obtain the geometric mean and its standard error directly from the SUDAAN proc descript procedure, then output them to a SAS dataset where the confidence interval (CI) can be constructed. The explanations in the summary tables below provide an example that you can follow.

| Statements | Explanation |
|---|---|
| `proc sort data=analysis_data; by sdmvstra sdmvpsu;` | Use the SAS procedure, proc sort, to sort the data by strata (sdmvstra) and PSU (sdmvpsu). |
| `proc descript data=analysis_data geometric atlevel1=1 atlevel2=2;` | Use proc descript to specify the dataset (analysis_data). Use the geometric option to compute and print geometric means and their standard errors. The ATLEVEL1=1 and ATLEVEL2=2 options specify the sampling stages (in NHANES, strata are level 1 and PSUs are level 2) for which you want counts per table cell. ATLEV1 is the number of strata with at least one valid observation and ATLEV2 is the number of PSUs with at least one valid observation. These numbers are used to calculate degrees of freedom. |
| `nest sdmvstra sdmvpsu;` | Use the nest statement with strata (sdmvstra) and PSU (sdmvpsu) to account for the design effects. |
| `weight wtsaf4yr;` | Use the morning fasting sample weight for 4 years of data (wtsaf4yr) because serum triglyceride was obtained in persons examined in the morning who fasted for 9+ hours. |
| `subpopn ridageyr>=20/name="Adults 20 years and older";` | Use the subpopn statement to select the subpopulation of interest. In this example, sample persons 20 years and older (ridageyr>=20) are used. |
| `class age1 riagendr/nofreq;` | Use a class statement to list discrete variables upon which subgroups are based. In this example, gender (riagendr) and age (age1) are used. |
| `var lbxtr;` | Use a var statement to select the serum triglyceride variable (lbxtr) as your variable of interest. |
| `table riagendr*age1;` | Use the table statement to request serum triglyceride estimates for each combination of gender (riagendr) and age group (age1). |
| `print nsum geomean segeomean/style=NCHS geomeanfmt=f6.0 segeomeanfmt=f6.1;` | Use the print statement to print the number of observations (nsum), geometric means (geomean), and standard errors of geometric means (segeomean). |
| `output nsum geomean segeomean atlev1 atlev2/filename=tg9902 replace; run;` | Use an output statement to output the number of observations (nsum), geometric mean (geomean), standard error of the geometric mean (segeomean), number of strata (atlev1), and number of PSUs (atlev2) to a SAS file named tg9902. |

| Statements | Explanation |
|---|---|
| `data newtg9902; set tg9902; df=atlev2-atlev1;` | Use the data statement to create a new dataset (newtg9902) from the SAS dataset created previously (tg9902). Calculate the degrees of freedom (df) as the number of PSUs (atlev2) minus the number of strata (atlev1). |
| `drop PROCNUM TABLENO VARIABLE _C1 _C2 ATLEV1 ATLEV2;` | Use a drop statement to drop the selected variables from the new dataset. |
| `ll=round(geomean+tinv(.025,df)*segeomean); ul=round(geomean+tinv(.975,df)*segeomean); geomean=round(geomean); segeomean=round(segeomean,.1); ciwidth=ul-ll;` | Use these statements to calculate the lower limit (ll) and upper limit (ul) of the 95% confidence interval, round the geometric mean (geomean) and its standard error (segeomean), and calculate the confidence interval width (ciwidth). |
| `proc print split='/' noobs; format age1 age1fmt. riagendr sexfmt. nsum 7.0 geomean 6.0 segeomean 6.1 df 2.0; label ll='Lower/Limit' ul='Upper/Limit' df='Degrees/of/freedom' ciwidth='Confidence/interval/width'; title1 'Geometric mean of serum triglyceride and 95% confidence'; title2 'interval of adults 20 years and older:'; title3 'United States, 1999-2002'; run;` | Use the proc print procedure to output the age group (age1), gender (riagendr), number of observations (nsum), geometric mean (geomean), standard error of the geometric mean (segeomean), and degrees of freedom (df), along with the confidence limits (ll, ul) and the confidence interval width (ciwidth). |
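
The confidence-limit arithmetic in the data step above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the tutorial's program: the function name `geomean_ci` and every input number are hypothetical. The caller passes in the critical t value (for example, 2.045 for 29 degrees of freedom, from a standard t table) because the Python standard library has no equivalent of the SAS `tinv` function.

```python
def geomean_ci(geomean, segeomean, atlev1, atlev2, t_crit):
    """Mirror the data-step arithmetic for a 95% confidence interval.

    atlev1 = number of strata, atlev2 = number of PSUs (as output by SUDAAN).
    t_crit = upper 97.5th percentile of the t distribution for df degrees of
             freedom; tinv(.025, df) in SAS equals -t_crit by symmetry.
    """
    df = atlev2 - atlev1                       # degrees of freedom: PSUs minus strata
    ll = round(geomean - t_crit * segeomean)   # lower limit
    ul = round(geomean + t_crit * segeomean)   # upper limit
    # Note: Python rounds exact halves to the nearest even integer, while the
    # SAS round function rounds them away from zero; the two can differ
    # only when a limit falls exactly on .5.
    return {"df": df, "ll": ll, "ul": ul, "ciwidth": ul - ll}


# Hypothetical inputs for illustration only (not actual NHANES results).
result = geomean_ci(geomean=120.4, segeomean=3.2, atlev1=28, atlev2=57, t_crit=2.045)
print(result)  # {'df': 29, 'll': 114, 'ul': 127, 'ciwidth': 13}
```

As in the data step, the limits are computed from the unrounded geometric mean, and the interval width is taken from the rounded limits.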