This task will provide you with a method to obtain confidence intervals for geometric means.
When the data are highly skewed, the arithmetic mean can be a poor summary and a transformation is often needed. For example, you can obtain the geometric mean by log-transforming the data, taking the mean of the logs, and exponentiating the result.
In this example, you will obtain geometric means for the fasting serum triglyceride variable. You can see that fasting triglycerides are right-skewed by looking at the distribution with this command: sum lbxtr [w=wtsaf4yr], det. The output shows that the median is 106 but the mean is 135. Because the mean is pulled upward by the long right tail, the geometric mean is a better representation of central tendency than the arithmetic mean.
Use the Stata command svy:mean to obtain the mean of the log-transformed fasting serum triglyceride variable and its standard error, and then use ereturn display, eform( ) to display the exponentiated results (geometric mean, standard error, and confidence interval). The explanations in the summary table below provide an example that you can follow.
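As a reminder of what the log transformation accomplishes, the relationship can be written out as follows. The symbols x_i (observed values), n (number of observations), and w_i (sample weights) are notation introduced here for illustration and do not appear in the original tutorial.

```latex
\[
\mathrm{GM} = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
            = \exp\Bigl(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\Bigr),
\qquad
\mathrm{GM}_w = \exp\Bigl(\frac{\sum_{i} w_i \ln x_i}{\sum_{i} w_i}\Bigr)
\]
```

The second expression is the weighted form: the weighted mean of the logged values, exponentiated back to the original units.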
There are several things you should be aware of while analyzing NHANES data with Stata. Please see the Stata Tips page to review them before continuing.
Remember that you need to define the SVYSET before using the SVY series of commands. The general format of this command is below:
svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)
To define the survey design variables for your fasting serum triglyceride analysis, use the weight variable for four years of MEC data obtained from persons who fasted nine hours and were examined in the morning at the MEC (wtsaf4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor linearization. Here is the svyset command for this analysis:
svyset [w= wtsaf4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
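Before moving on, it can be worth confirming that the design was registered as intended. The following is a minimal sketch using two standard Stata commands; it is an optional check and not part of the original tutorial.

```stata
* Show the survey settings currently in effect (weight, PSU, strata, vce)
svyset

* Summarize the design: number of strata and number of PSUs per stratum
svydescribe
```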
The gen command is used to create new variables. The ln( ) function returns the natural log of the variable of interest. The general format of this command is below.
gen logvar=ln(var)
In this example, you will create the log transformed triglycerides variable (lnlbxtr) for the triglycerides variable (lbxtr) using this command:
gen lnlbxtr=ln(lbxtr)
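Because ln( ) returns a missing value for zero or negative input, a quick check after creating the variable can be useful. This is a sketch of one possible check, assuming lbxtr and lnlbxtr exist as created above.

```stata
* Count observations for which the log is undefined (zero or negative values)
count if lbxtr <= 0

* Compare the raw and log-transformed distributions (unweighted)
summarize lbxtr lnlbxtr, detail
```

If the transformation has done its job, the skewness reported for lnlbxtr should be much closer to zero than that for lbxtr.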
Now that the survey design has been defined with svyset, you can use the Stata command svy: mean to generate means and standard errors. To display the geometric mean in the original units of the variable, use the ereturn display command with the eform option. The general command for obtaining weighted means and standard errors of a subpopulation is below.
svy: mean varname, subpop(if condition)
ereturn display, eform(varname)
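The back-transformation performed by the eform option can be summarized as follows. Here \bar{y} denotes the estimated mean of the log-transformed variable, SE(\bar{y}) its standard error, and t the critical value used for the confidence interval; these symbols are introduced here for illustration only.

```latex
\[
\widehat{\mathrm{GM}} = e^{\bar{y}}, \qquad
\mathrm{CI}_{95\%} = \bigl[\, e^{\bar{y} - t\,\mathrm{SE}(\bar{y})},\;
                              e^{\bar{y} + t\,\mathrm{SE}(\bar{y})} \,\bigr], \qquad
\mathrm{SE}(\widehat{\mathrm{GM}}) \approx e^{\bar{y}}\,\mathrm{SE}(\bar{y})
\]
```

Because the interval endpoints are exponentiated separately, the resulting confidence interval is not symmetric around the geometric mean.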
Use the svy: mean command with the log-transformed triglyceride variable (lnlbxtr) to estimate the geometric mean of triglycerides for people age 20 years and older. Use the subpop( ) option to select a subpopulation for analysis, rather than restricting the data file to the study population beforehand. This example uses an if statement to define the subpopulation based on the value of the age variable (ridageyr). Another option is to create a dichotomous variable where the subpopulation of interest is assigned a value of 1, and everyone else is assigned a value of 0. Use ereturn display, eform( ) to display the geometric mean in the original units of triglyceride (i.e., the exponentiated coefficient), along with its standard error and confidence interval. The text inside eform( ), here geo_mean, is only the label for the column of exponentiated estimates.
svy:mean lnlbxtr, subpop(if ridageyr>=20 & ridageyr<.)
ereturn display, eform(geo_mean)
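The dichotomous-variable alternative mentioned above might look like the following sketch. The indicator name adult20 is hypothetical and chosen only for illustration; the rest uses the variables already defined in this example.

```stata
* Hypothetical indicator: 1 = age 20 or older, 0 = everyone else
* (ridageyr < . keeps missing ages out of the subpopulation)
gen byte adult20 = (ridageyr >= 20 & ridageyr < .)

* Same analysis as above, with the subpopulation given as a variable
svy: mean lnlbxtr, subpop(adult20)
ereturn display, eform(geo_mean)
```

Either form keeps all observations in the data so that the full design is available for variance estimation, which is the reason for using subpop( ) rather than dropping records.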
You can also add the over() option to the svy:mean command to generate the means for different subgroups. To display the geometric mean in the original units of the variable, use the ereturn display command with the eform option. Here is the general format of these commands for this example:
svy: mean varname, subpop(if condition) over(var1 var2)
ereturn display, eform(varname)
Use the svy: mean command with the log-transformed triglyceride variable (lnlbxtr) to estimate the geometric mean of triglycerides for people age 20 years and older, as in the previous step, with the same subpop( ) option to select the subpopulation. Add the over option to get stratified results. This example produces estimates by gender (riagendr) and age group (age1). Use ereturn display, eform( ) to display the geometric means in the original units of triglyceride (i.e., the exponentiated coefficients), along with their standard errors and confidence intervals.
svy:mean lnlbxtr, subpop(if ridageyr>=20 & ridageyr<.) over(riagendr age1)
ereturn display, eform(geo_mean)
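The age1 variable used in over( ) is not defined in this section and is presumably created in an earlier step. If a comparable grouping had to be built from scratch, one possible sketch is shown here. The variable name agegrp is hypothetical, and the cut points are assumed from the 10-year age bands reported in the results for this example.

```stata
* Hypothetical age-group variable; ages under 20 are set to missing
recode ridageyr (min/19 = .) (20/29 = 1) (30/39 = 2) (40/49 = 3) (50/59 = 4) (60/69 = 5) (70/max = 6), generate(agegrp)

* Check the unweighted cell counts by age group and gender
tabulate agegrp riagendr
```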
Here is a table summarizing the output for the variable fasting triglyceride (lbxtr):
Subpopulation analyzed   | Number of respondents with data | Geometric mean (mg/dL) | 95% confidence interval
Adults age 20 and older  | 3,982                           | 122                    | 118-126
Men age 20 and older     | 1,893                           | 130                    | 124-137
Women age 20 and older   | 2,089                           | 114                    | 111-118
Men age 20-29            |                                 | 103                    | 96-111
Men age 30-39            |                                 | 122                    | 115-129
Men age 40-49            |                                 | 153                    | 136-172
Men age 50-59            |                                 | 148                    | 135-162
Men age 60-69            |                                 | 141                    | 129-154
Men age 70+              |                                 | 125                    | 117-134
Women age 20-29          |                                 | 97                     | 91-104
Women age 30-39          |                                 | 102                    | 96-107
Women age 40-49          |                                 | 104                    | 96-112
Women age 50-59          |                                 | 133                    | 123-143
Women age 60-69          |                                 | 144                    | 136-152
Women age 70+            |                                 | 142                    | 133-151
According to the stratified analysis, the geometric mean of fasting triglycerides is 16 mg/dL higher in men (130) than in women (114). Confidence intervals (CIs) can also be used as a first glance to see whether two groups differ. For example, the CIs for all males (124, 137) and all females (111, 118) do not overlap, indicating that the two groups are likely to be different. However, a formal statistical test, such as a t-test, should be performed to determine whether the difference between the means of two population subgroups is significant. The geometric mean for males increases up to age 40-49 years and then declines. The geometric mean for females increases up to age 60-69 years and then declines slightly. The CI is wider for males than for females, and is widest for males age 40-49 years, indicating greater uncertainty in the estimated mean for this group.
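The formal test mentioned above can be carried out in more than one way. One possibility, which is not taken from the original tutorial, is a survey-weighted regression of the log-transformed variable on gender: the t statistic on the gender coefficient tests whether the log-scale means (and therefore the geometric means) differ. The sketch assumes svyset has been run and lnlbxtr exists as created earlier; the label gm_ratio is hypothetical.

```stata
* Sketch only: regression-based test of equal log-scale means by gender
svy, subpop(if ridageyr>=20 & ridageyr<.): regress lnlbxtr i.riagendr

* Optional: exponentiate, so the gender coefficient reads as a ratio
* of geometric means (second gender category relative to the first)
ereturn display, eform(gm_ratio)
```

On the exponentiated scale, a ratio whose confidence interval excludes 1 corresponds to a significant difference between the two geometric means.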