Estimating mean intakes of selected foods is one of the most commonly conducted analyses of NHANES dietary data.  To obtain complete and accurate results from your analysis, consider the following issues before you begin.

#### Addressing Random Error and Bias

As noted in an earlier module, the mean of the usual intake distribution in the population is almost always the measure of interest, and it is estimated using dietary recall data.  Although dietary recall data are known to contain random errors, especially large day-to-day variability, these errors can generally be assumed to cancel out when intakes are averaged across the population.  Therefore, the mean of 1-day intakes can be used as an estimate of the mean of the usual intake distribution in the population without specific statistical adjustment, provided the data are collected evenly throughout the year and the days of the week are evenly represented.
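
The claim that random day-to-day error drops out of the mean can be checked with a small simulation. This is a sketch under stated assumptions, not NHANES data. Usual intakes are drawn from a gamma distribution, and each reported day is the usual intake multiplied by a lognormal day effect with mean 1. Both distributions and all parameter values are arbitrary illustrative choices. The simulation also ignores day-of-week and seasonal effects, which the text handles by requiring evenly distributed data collection.

```python
import random
import statistics

# Synthetic data for illustration only (not NHANES values).
random.seed(1)
N_PERSONS = 20_000
SIGMA = 0.6  # assumed size of day-to-day variability (log scale)

# Each person's usual (long-run average) intake of a food, in grams/day.
usual = [random.gammavariate(2.0, 50.0) for _ in range(N_PERSONS)]

# One recall day per person: usual intake times a random day effect.
# lognormvariate(-SIGMA**2 / 2, SIGMA) has mean 1, so the error is random
# but unbiased; this is NOT a model of underreporting.
one_day = [u * random.lognormvariate(-SIGMA**2 / 2, SIGMA) for u in usual]

mean_usual = statistics.fmean(usual)
mean_one_day = statistics.fmean(one_day)

print(f"mean, usual intake: {mean_usual:6.1f} g")
print(f"mean, 1-day intake: {mean_one_day:6.1f} g")
print(f"SD,   usual intake: {statistics.stdev(usual):6.1f} g")
print(f"SD,   1-day intake: {statistics.stdev(one_day):6.1f} g")
```

The two means should agree closely, while the standard deviation of 1-day intakes is inflated by within-person variability. That is why 1-day data can support estimates of the mean but not of the spread of usual intake.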

Dietary recall data are also known to contain bias, at least in the form of a tendency to underreport energy intake.  Little is known about the extent to which energy underreporting extends to the underreporting of foods.  For that reason, and for practical purposes, the current statistical convention is to assume that the recalls are not biased (i.e., that no underreporting occurs).  However, this assumption is more troubling than the one regarding random error and should be noted as a limitation or caveat in any analysis of this type.

IMPORTANT NOTE

When estimating the mean of the population distribution of usual dietary intakes from 24-hour recalls, single-day data are sufficient and no specific statistical adjustment is necessary, but an assumption of no bias is required and should be acknowledged. The second day of dietary recall is generally not used to estimate means; it is used in more advanced analyses, such as estimating the distribution of usual intake.

#### Interpreting Measures of Central Tendency

If the data are highly skewed, as dietary data often are, the mean may not be a good representation of central tendency.  In such cases, you may want to use the median instead of, or in addition to, the mean.  However, the simple median of reported intakes based on a single 24-hour recall per person is not clearly interpretable with regard to usual intake, because it represents the median intake on a given day.  For more information on how to obtain the distribution of usual intake and its associated median, see the Advanced Dietary Analyses course.
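
The gap between the 1-day median and the median of usual intake is easiest to see for a food that is eaten episodically. The sketch below uses purely synthetic data and rests on several assumptions. Each person's daily probability of eating the food comes from a Beta(2, 5) distribution, with mean 2/7. Amounts on eating days come from a gamma distribution with mean 150 g. Usual intake is then that probability times 150 g.

```python
import random
import statistics

# Synthetic data for an episodically eaten food (illustration only).
random.seed(2)
N = 20_000
MEAN_AMOUNT = 150.0  # assumed mean grams on days the food is eaten

# Assumed probability that each person eats the food on a given day.
prob = [random.betavariate(2.0, 5.0) for _ in range(N)]

# Usual intake = probability of eating it x mean amount when eaten.
usual = [p * MEAN_AMOUNT for p in prob]

# One recall day per person: an amount on eating days, zero otherwise.
one_day = [
    random.gammavariate(3.0, MEAN_AMOUNT / 3.0) if random.random() < p else 0.0
    for p in prob
]

for label, values in (("usual", usual), ("1-day", one_day)):
    print(f"{label:>5}: mean {statistics.fmean(values):6.1f} g, "
          f"median {statistics.median(values):6.1f} g")
```

Fewer than half of the simulated persons eat the food on any given day, so the 1-day median is zero. Yet every simulated person has a positive usual intake, and the two means still agree closely. The 1-day median therefore describes a single day, not usual intake.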

#### Grouping Foods for Analysis

Because NHANES uses more than 7,000 food codes, food intake analysis almost always involves grouping similar foods together.  Analysts can define groups for their own purposes or use previously developed grouping schemes.  One such scheme is the set of food groups defined by the Food Surveys Research Group (FSRG), in which foods are measured in grams.  Another is the MyPyramid food groups, in which amounts are expressed as food group equivalents as defined by the MyPyramid Equivalents Database.  For more information about FSRG-defined food groups, the USDA Food Coding Scheme, or the MyPyramid Equivalents Database, see the Resources for Dietary Data Analysis module of the Survey Orientation Course.
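
Mechanically, grouping means mapping each food code to a group and summing grams within person and group. The sketch below is a minimal illustration, not an FSRG or MyPyramid grouping. It assumes, as in the USDA food coding scheme, that the first digit of an 8-digit food code identifies one of nine major food groups. The group labels are paraphrased and should be checked against the official documentation. The records and food codes are hypothetical placeholders, and `group_totals` is a helper introduced only for this example.

```python
from collections import defaultdict

# Major food groups keyed by the first digit of an 8-digit USDA food code
# (assumed mapping; labels paraphrased, verify against official documentation).
MAJOR_GROUPS = {
    "1": "Milk and milk products",
    "2": "Meat, poultry, fish, and mixtures",
    "3": "Eggs",
    "4": "Legumes, nuts, and seeds",
    "5": "Grain products",
    "6": "Fruits",
    "7": "Vegetables",
    "8": "Fats, oils, and salad dressings",
    "9": "Sugars, sweets, and beverages",
}

# Hypothetical food-level records: (respondent id, food code, grams).
# The codes are illustrative placeholders, not looked-up foods.
records = [
    (1001, "11111000", 244.0),
    (1001, "63101000", 138.0),
    (1001, "61210000", 249.0),
    (1002, "51000100", 26.0),
    (1002, "11111000", 122.0),
]


def group_totals(rows):
    """Sum grams per (respondent, major food group); hypothetical helper."""
    totals = defaultdict(float)
    for resp_id, code, grams in rows:
        group = MAJOR_GROUPS.get(code[:1], "Unclassified")
        totals[(resp_id, group)] += grams
    return dict(totals)


for key, grams in group_totals(records).items():
    print(key, grams)
```

A finer scheme would replace the first-digit lookup with a table keyed on the full food code. The summing step stays the same.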

#### Choosing Whether to Include Non-Consumers

Another consideration when estimating mean food intake is whether you are interested in the mean amount among all persons in the population or only among a given day’s consumers of the food.  That is, you need to decide whether non-consumers should be included in the estimation.  If you are interested in the per capita amount consumed, include the non-consumers with an intake value of zero.  If you are interested in the average amount consumed by users of the food on days when the food is consumed, exclude the non-consumers.
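
The difference between the two estimates can be shown with a few made-up values. This sketch assumes a data layout in which only persons who reported the food have a record. Non-consumers must then be added back explicitly with zero, or the "per capita" mean silently becomes a consumers-only mean.

```python
# Hypothetical grams of one food group on the recall day, by respondent.
# Only consumers appear, as in a file that lists reported foods only (assumed).
reported = {1: 200.0, 3: 100.0, 4: 300.0}
respondents = [1, 2, 3, 4, 5]  # everyone with a valid recall

# Add non-consumers explicitly with an intake of zero.
intakes = {r: reported.get(r, 0.0) for r in respondents}

per_capita = sum(intakes.values()) / len(intakes)    # 600 / 5 = 120.0
consumers = [g for g in intakes.values() if g > 0]
consumer_mean = sum(consumers) / len(consumers)      # 600 / 3 = 200.0
share_consuming = len(consumers) / len(intakes)      # 3 / 5 = 0.6

print(per_capita, consumer_mean, share_consuming)
```

The two estimates are linked: the per capita mean equals the proportion consuming times the consumers-only mean (0.6 × 200 = 120). The sketch is unweighted. NHANES estimates also require the sample weights and variance procedures described under Using Appropriate Statistical Procedures.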

#### Using Appropriate Statistical Procedures

Means should be examined along with their standard errors, which indicate how precisely each mean is estimated.  Special statistical procedures are required to obtain appropriate standard errors when using data from a complex sample design such as NHANES.  In addition, appropriate sample weights should be applied so that the estimates represent the population as a whole.  See Module 13, Estimate Variance, Analyze Subgroups, and Calculate Degrees of Freedom, for further information.
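
The following sketch shows what the "special statistical procedures" involve: a weighted mean with a Taylor-series (linearization) standard error. It assumes a stratified design with at least two PSUs per stratum. PSUs are treated as sampled with replacement within strata, which is a common approximation. The records are made up. The stratum, PSU, and weight fields stand in for the design variables on the NHANES files, such as SDMVSTRA, SDMVPSU, and the day-1 dietary weight WTDRD1; confirm the names and the correct weight for the cycle you use. The helper `weighted_mean_se` is hypothetical and is not a substitute for survey software such as SUDAAN, SAS PROC SURVEYMEANS, Stata's `svy` commands, or the R `survey` package.

```python
import math
from collections import defaultdict

# Made-up records: (stratum, psu, sample weight, grams of the food on day 1).
records = [
    (1, 1, 1500.0, 120.0),
    (1, 1, 2300.0, 0.0),
    (1, 2, 1800.0, 80.0),
    (1, 2, 900.0, 200.0),
    (2, 1, 2100.0, 50.0),
    (2, 1, 1200.0, 0.0),
    (2, 2, 1600.0, 150.0),
    (2, 2, 2500.0, 90.0),
]


def weighted_mean_se(rows):
    """Weighted mean of y and its Taylor-linearized standard error.

    rows: iterable of (stratum, psu, weight, y). PSUs are treated as sampled
    with replacement within strata (an approximation).
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w

    # Linearized values of the ratio estimator, summed within each PSU.
    psu_totals = defaultdict(float)
    for stratum, psu, w, y in rows:
        psu_totals[(stratum, psu)] += w * (y - mean) / total_w

    # Collect PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)

    return mean, math.sqrt(variance)


mean, se = weighted_mean_se(records)
print(f"weighted mean {mean:.1f} g, SE {se:.1f} g")
```

Because the linearized values are divided by the total weight, rescaling all weights leaves both the mean and the SE unchanged. A subgroup estimate, such as a consumers-only mean, should generally be computed as a domain estimate that keeps every PSU in the design. Deleting records before calling a function like this one can misstate the SE; Module 13 covers subgroup analysis.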