Task 2c: How to Obtain Confidence Intervals for Geometric Means Using Stata

This task will provide you with a method to obtain confidence intervals for geometric means.

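When the data are highly skewed, the arithmetic mean can be a poor summary of central tendency, and you will usually need to transform the data. A common choice is the log transformation: the geometric mean is the exponentiated mean of the log-transformed values.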

In this example, you will obtain geometric means for the fasting serum triglyceride variable (lbxtr). You can see that fasting triglycerides are right-skewed by looking at the weighted distribution with this command:

sum lbxtr [w=wtsaf4yr], det

The output shows a median of 106 but a mean of 135. Because of this skew, the geometric mean is a better measure of central tendency than the arithmetic mean.

Obtain the mean and standard error of the log-transformed fasting serum triglyceride variable with the Stata command svy:mean, and then use ereturn display, eform( ) to display the exponentiated results (geometric mean, standard error, and confidence interval). The steps and summary table below provide an example that you can follow.
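The back-transformation described above can be written out as follows. This is a sketch in notation chosen here for illustration (it does not appear in the original task): x_i is a triglyceride value, w_i is its sample weight, and t is the critical value of the t distribution on the survey design degrees of freedom.

```latex
% Weighted mean of the logged values, and the geometric mean (GM)
\[
\bar{y} = \frac{\sum_i w_i \ln x_i}{\sum_i w_i}, \qquad
\mathrm{GM} = \exp(\bar{y})
\]
% The confidence limits are computed on the log scale and then exponentiated
\[
95\%\ \mathrm{CI} = \left[\, \exp\!\left(\bar{y} - t\,\mathrm{SE}(\bar{y})\right),\
                          \exp\!\left(\bar{y} + t\,\mathrm{SE}(\bar{y})\right) \right]
\]
```

Because the limits are exponentiated, the interval is generally not symmetric around the geometric mean.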

 

WARNING

There are several things you should be aware of while analyzing NHANES data with Stata. Please see the Stata Tips page to review them before continuing.

 

Step 1: Use svyset to define survey design variables

Remember that you need to declare the survey design with svyset before using the svy series of commands. The general format of this command is below:

svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)

 

To define the survey design variables for your fasting serum triglyceride analysis, use the four-year MEC weight for persons who fasted nine hours and were examined in the morning (wtsaf4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor series linearization. Here is the svyset command for this analysis:

svyset [w=wtsaf4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
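Newer Stata releases document a form of svyset in which the PSU variable comes first and the weight type is spelled out. The following is a minimal sketch of that form, assuming the weight is meant as a sampling (probability) weight and that the NHANES variables named in this task are in the loaded data set. The two follow-up commands are one way to check the declared design.

```stata
* Sketch only: the same design declared with the PSU variable first.
svyset sdmvpsu [pweight = wtsaf4yr], strata(sdmvstra) vce(linearized)

* Show the current survey settings, then list strata and PSU counts.
svyset
svydescribe
```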

 

Step 2: Create log transformed variable

The gen command is used to create new variables. The ln() function returns the natural log of the variable of interest. The general format of this command is below:

gen logvar=ln(var)

 

In this example, you will create the log transformed triglycerides variable (lnlbxtr) for the triglycerides variable (lbxtr) using this command:

gen lnlbxtr=ln(lbxtr)
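Because ln() returns a missing value for zero or negative arguments, it can be worth checking the variable around the transformation. The lines below are a sketch of such a check, assuming lbxtr and the newly created lnlbxtr are in memory. They are not part of the original task.

```stata
* Count observations whose value cannot be logged (zero or negative).
count if lbxtr <= 0

* Count observations with a recorded lbxtr value but a missing log.
count if missing(lnlbxtr) & !missing(lbxtr)

* Compare the weighted distributions before and after the transformation.
summarize lbxtr lnlbxtr [aweight = wtsaf4yr], detail
```

If both counts are zero, the log transformation did not drop any observations.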

 

Step 3:  Use svy:mean to generate geometric means and standard errors in Stata

Now that the survey design has been declared with svyset, you can use the Stata command svy: mean to generate means and standard errors. To display the geometric mean in the original units of the variable, use the ereturn display command with the eform option. The general format of the commands for obtaining weighted means and standard errors for a subpopulation is below:

svy: mean varname, subpop(if condition)

ereturn display, eform(varname)

 

Use the svy: mean command with the log-transformed triglyceride variable (lnlbxtr) to estimate the mean of the logged values for people age 20 years and older; exponentiating this mean gives the geometric mean of triglycerides. Use the subpop( ) option to select a subpopulation for analysis, rather than dropping observations from the data file, so that the full survey design is retained for variance estimation. This example uses an if statement to define the subpopulation based on the value of the age variable (ridageyr). Another option is to create a dichotomous variable where the subpopulation of interest is assigned a value of 1 and everyone else is assigned a value of 0. Use ereturn display, eform( ) to display the geometric mean in the original units of triglyceride (i.e., the exponentiated coefficient), along with its standard error and confidence interval. The text inside eform( ), here geo_mean, is the label for the column of exponentiated estimates.

svy:mean lnlbxtr, subpop(if ridageyr>=20 & ridageyr<.)

ereturn display, eform(geo_mean)
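The dichotomous-variable alternative mentioned above can be sketched as follows. The indicator name adult20 is hypothetical and chosen only for illustration. The last three lines back-transform the stored results by hand as a check on the exponentiated display, assuming that svy stores the design degrees of freedom in e(df_r).

```stata
* Hypothetical indicator: 1 = age 20 or older, 0 = everyone else.
gen byte adult20 = (ridageyr >= 20 & ridageyr < .)

* Same analysis as above, with the indicator variable in subpop( ).
svy: mean lnlbxtr, subpop(adult20)
ereturn display, eform(geo_mean)

* Back-transform by hand: geometric mean and 95% confidence limits.
display "Geometric mean:  " exp(_b[lnlbxtr])
display "Lower 95% limit: " exp(_b[lnlbxtr] - invttail(e(df_r), 0.025) * _se[lnlbxtr])
display "Upper 95% limit: " exp(_b[lnlbxtr] + invttail(e(df_r), 0.025) * _se[lnlbxtr])
```

The hand-computed values should agree with the geo_mean column and the confidence interval shown by ereturn display.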

 

Output of svy:mean


 

Step 4:  Use over option of svy:mean command to generate geometric means and standard errors for different subgroups in Stata

You can also add the over() option to the svy:mean command to generate the means for different subgroups. To display  the geometric mean in the original units of the variable, use the ereturn display command with the eform option.  Here is the general format of these commands for this example:

svy: mean varname, subpop(if condition) over(var1 var2)

ereturn display, eform(varname)

 

Use the svy: mean command with the log-transformed triglyceride variable (lnlbxtr) to estimate the mean of the logged values for people age 20 years and older; exponentiating this mean gives the geometric mean of triglycerides. Use the subpop( ) option to select a subpopulation for analysis, rather than dropping observations from the data file, so that the full survey design is retained for variance estimation. This example uses an if statement to define the subpopulation based on the value of the age variable (ridageyr). Another option is to create a dichotomous variable where the subpopulation of interest is assigned a value of 1 and everyone else is assigned a value of 0. Use the over option to get stratified results. This example produces estimates by gender (riagendr) and age group (age1). Use ereturn display, eform( ) to display the geometric means in the original units of triglyceride (i.e., the exponentiated coefficients), along with their standard errors and confidence intervals. The text inside eform( ), here geo_mean, is the label for the column of exponentiated estimates.

svy:mean lnlbxtr, subpop(if ridageyr>=20 & ridageyr<.) over(riagendr age1)

ereturn display, eform(geo_mean)
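The task does not show how the age-group variable age1 was created. One way to build a comparable variable is sketched below. The name agegrp, the label name agegrp_lbl, and the cut points are assumptions that mirror the age groups in the summary table, not a documented definition of age1.

```stata
* Hypothetical age-group variable; ages under 20 and missing ages are set to missing.
recode ridageyr (20/29 = 1) (30/39 = 2) (40/49 = 3) (50/59 = 4) ///
    (60/69 = 5) (70/max = 6) (else = .), generate(agegrp)

* Attach readable labels to the six groups.
label define agegrp_lbl 1 "20-29" 2 "30-39" 3 "40-49" 4 "50-59" 5 "60-69" 6 "70+"
label values agegrp agegrp_lbl

* Stratified geometric means by gender and the hypothetical age-group variable.
svy: mean lnlbxtr, subpop(if ridageyr>=20 & ridageyr<.) over(riagendr agegrp)
ereturn display, eform(geo_mean)
```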

 

Output of svy:mean command with over option


Step 5: Review Output

Here is a table summarizing the output for the variable fasting triglyceride (lbxtr): 

Summary output for the variable fasting triglyceride (lbxtr)

Subpopulation analyzed    Number of respondents with data  Geometric mean  95% confidence interval
Adults age 20 and older   3,982                            122             118-126
Men age 20 and older      1,893                            130             124-137
Women age 20 and older    2,089                            114             111-118
Men age 20-29                                              103             96-111
Men age 30-39                                              122             115-129
Men age 40-49                                              153             136-172
Men age 50-59                                              148             135-162
Men age 60-69                                              141             129-154
Men age 70+                                                125             117-134
Women age 20-29                                            97              91-104
Women age 30-39                                            102             96-107
Women age 40-49                                            104             96-112
Women age 50-59                                            133             123-143
Women age 60-69                                            144             136-152
Women age 70+                                              142             133-151

 

According to the stratified analysis, the geometric mean of fasting triglycerides is 16 points higher for men (130) than for women (114). Confidence intervals can also be used as a first glance to see whether two groups differ. For example, the CIs for the geometric mean of serum triglycerides for all males (124-137) and all females (111-118) do not overlap, suggesting that the two groups are likely to be different. However, a formal statistical test, such as a t-test, should be performed to determine whether the difference between two population subgroups is statistically significant. The geometric mean for males increases up to age 40-49 years and then declines. The geometric mean for females increases up to age 60-69 years and then declines slightly. The confidence interval (CI) is wider for males than for females and is widest for males age 40-49 years, indicating that the geometric mean is estimated less precisely in this group.
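One possible way to carry out such a test is sketched below. It is not part of the original task. It regresses the logged variable on gender within the adult subpopulation, and it assumes Stata 11 or later (for the factor-variable notation i.riagendr) and the usual NHANES coding of riagendr (1 = male, 2 = female).

```stata
* Survey-weighted regression of log triglycerides on gender, adults only.
svy, subpop(if ridageyr >= 20 & ridageyr < .): regress lnlbxtr i.riagendr

* The coefficient on 2.riagendr is the female-minus-male difference in mean
* log triglycerides; its t statistic tests whether that difference is zero.
* Exponentiating it gives the ratio of geometric means (female / male).
lincom 2.riagendr, eform
```

A ratio whose confidence interval excludes 1 points to a statistically significant difference between the two geometric means.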

 
