The NHANES sample is designed to be nationally representative of the civilian, non-institutionalized U.S. population; it does not include persons residing in nursing homes, other institutionalized persons, or U.S. nationals living abroad. Thus, for NHANES 1999-2010, each year's sample, and any combination of samples from consecutive years, is a nationally representative sample of the resident, non-institutionalized U.S. population.
To obtain sample sizes large enough for stable estimates for population subgroups of common interest, NHANES data are released in 2-year cycles. Just as each year's sample is representative of the resident, non-institutionalized U.S. population, so is each 2-year cycle of data.
NHANES data are not obtained from a simple random sample. Instead, participants are selected using a complex, multistage, probability sampling design. The NHANES sampling procedure consists of 4 stages, described below.
Stage 1: Primary sampling units (PSUs) are selected from strata defined by geography and proportions of minority populations. PSUs are mostly single counties or, in a few cases, groups of contiguous counties, and they are selected with probability proportional to a measure of size (PPS). Most strata contain two PSUs. Additional stages of sampling are performed to select various types of secondary sampling units (SSUs), namely the segments, households, and individuals selected in Stages 2, 3, and 4.
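As a rough illustration of what "selected with PPS" means, the sketch below uses systematic PPS sampling, one common way to draw units with probability proportional to size. It is not presented as the exact NHANES procedure; the county names, population counts, and the function names `pps_systematic` and `inclusion_probabilities` are hypothetical. The key property is that a unit's chance of selection is n times its share of the stratum's total size.

```python
import random

def pps_systematic(units, sizes, n, rng):
    """Draw n units with probability proportional to size (systematic PPS).

    Assumes every size is at most sum(sizes) / n, so no unit can be hit twice.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.uniform(0, interval)  # random start within the first interval
    points = [start + k * interval for k in range(n)]
    selected, cumulative, i = [], 0.0, 0
    for unit, size in zip(units, sizes):
        cumulative += size
        # A unit is selected when a selection point falls in its size range
        while i < n and points[i] < cumulative:
            selected.append(unit)
            i += 1
    return selected

def inclusion_probabilities(sizes, n):
    """Under PPS, unit j is included with probability n * size_j / total."""
    total = sum(sizes)
    return [n * s / total for s in sizes]

# Hypothetical stratum of five counties with made-up population sizes
counties = ["A", "B", "C", "D", "E"]
populations = [120_000, 80_000, 50_000, 30_000, 20_000]

print(pps_systematic(counties, populations, n=2, rng=random.Random(2003)))
print(inclusion_probabilities(populations, n=2))  # probabilities sum to n = 2
```

Larger counties are more likely to be drawn, and a randomly selected stratum with two sampled PSUs mirrors the "most strata contain two PSUs" design noted above.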
Stage 2: The PSUs are divided into segments (generally city blocks or their equivalent). As with the PSUs, sample segments are selected with PPS.
Stage 3: Households within each segment are listed, and a sample is randomly drawn. In geographic areas with a high proportion of the age, ethnic, or income groups selected for oversampling, the probability of selection for those groups is greater than in other areas.
Stage 4: Individuals are chosen to participate in NHANES from a list of all persons residing in selected households. Individuals are drawn at random within designated age-sex-race/ethnicity screening sub-domains. On average, 1.6 persons are selected per household.
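Because each stage's selection is conditional on the stages before it, a person's overall probability of selection is the product of the four stage-wise probabilities, and the starting (base) design weight is its inverse. The sketch below shows that arithmetic with made-up probabilities; the helper names are hypothetical, and the published NHANES weights apply further adjustments on top of this base weight.

```python
from math import prod

def overall_selection_probability(stage_probs):
    """Unconditional probability that a person ends up in the sample.

    Each stage's probability is conditional on the earlier stages,
    so the overall probability is their product.
    """
    return prod(stage_probs)

def base_weight(stage_probs):
    """Design (base) weight: the inverse of the overall selection probability."""
    return 1.0 / overall_selection_probability(stage_probs)

# Hypothetical conditional probabilities for one person:
# PSU (stage 1), segment within PSU (stage 2),
# household within segment (stage 3), person within household (stage 4)
person_a = [0.02, 0.10, 0.05, 0.5]

# An otherwise similar person in an oversampled subdomain has a higher
# stage-4 probability and therefore a smaller base weight
person_b = [0.02, 0.10, 0.05, 1.0]

print(base_weight(person_a))  # roughly 20,000 people represented
print(base_weight(person_b))  # roughly 10,000 people represented
```

This is why oversampled groups end up with smaller weights: each of them stands in for fewer people in the population.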
Further details about the sampling plan for NHANES 2003-2004 can be found in the 2003-2004 Interviewer Procedures Manual.
Pregnant women are sampled slightly differently at Stage 4. From 1999-2006, all women of childbearing age (15-39 years of age) were asked during the screening interview whether they were pregnant, and a supplementary sample of pregnant women was included in the sample.
The NHANES sampling design has consequences for two issues that are critical to accurate dietary data analysis: the use of sample weights and special procedures for variance estimation.
To make the participants selected under the complex NHANES survey design represent the U.S. civilian, non-institutionalized population, each sampled person is assigned a numerical sample weight. The weight is the number of people in the population represented by that sampled person. Sample weights for NHANES participants incorporate adjustments for unequal selection probabilities and certain types of non-response, as well as an adjustment to independent estimates (called control totals) of population sizes for specific age, sex, and race/ethnicity categories. Sample weights must be used to obtain correct national estimates from the NHANES data. For more information about the construction and use of sample weights in NHANES, see Task 2 in this tutorial or the Survey Design Factors course in the Continuous NHANES Web Tutorial.
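The sketch below shows, with hypothetical numbers, how a weighted estimate differs from an unweighted one when a group has been oversampled, and how a simple post-stratification step can rescale weights to match control totals. The intake values, weights, cell labels, and control totals are all made up, and `poststratify` is a simplified stand-in for the adjustment described above, not the actual NHANES weighting procedure.

```python
def weighted_total(values, weights):
    """Estimated population total: each person's value counts w_i times."""
    return sum(w * y for y, w in zip(values, weights))

def weighted_mean(values, weights):
    """Estimated population mean: sum(w_i * y_i) / sum(w_i)."""
    return weighted_total(values, weights) / sum(weights)

def poststratify(weights, cells, control_totals):
    """Scale weights so they sum to a known population count in each cell.

    cells[i] is the (hypothetical) age-sex-race/ethnicity cell of person i;
    control_totals maps each cell to its independent population estimate.
    """
    cell_sums = {}
    for w, c in zip(weights, cells):
        cell_sums[c] = cell_sums.get(c, 0.0) + w
    return [w * control_totals[c] / cell_sums[c] for w, c in zip(weights, cells)]

# Hypothetical daily energy intakes (kcal); the G2 group was oversampled,
# so its members carry smaller weights
values = [2400, 2200, 1800, 1700]
weights = [30_000, 30_000, 5_000, 5_000]
cells = ["G1", "G1", "G2", "G2"]
control_totals = {"G1": 66_000, "G2": 9_000}

print(sum(values) / len(values))        # unweighted mean: 2025.0
print(weighted_mean(values, weights))   # weighted mean, about 2221.4
adjusted = poststratify(weights, cells, control_totals)
print(weighted_mean(values, adjusted))
```

Ignoring the weights here would give the oversampled, lower-intake group far more influence than its share of the population warrants.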
Under the multistage, probability sampling design of NHANES, individuals are selected as members of groups defined by the strata and by the primary and secondary sampling units, rather than as individuals drawn one at a time. Although this makes NHANES data collection more efficient, it means that, in statistical terms, the sampling variance of NHANES estimates depends on the sampled groups rather than on the number of individuals. Special variance estimation techniques that account for the sampling design (implemented in several major statistical analysis software packages, such as SUDAAN, the SAS survey procedures, and Stata) must be used to obtain correct measures of sampling variance.
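A simplified version of one such technique, Taylor series linearization under the common with-replacement ("ultimate cluster") approximation, is sketched below for a weighted mean. Only each person's stratum and PSU enter the calculation (in NHANES public files these are the masked variance units, e.g. SDMVSTRA and SDMVPSU), and every stratum must contain at least two PSUs. The function names and toy data are hypothetical, and real software handles details such as subpopulations and degrees of freedom that this sketch omits.

```python
from collections import defaultdict
from math import sqrt

def linearized_se_of_mean(values, weights, strata, psus):
    """Taylor-linearization standard error of a weighted mean.

    With-replacement approximation: variance comes from the spread of
    PSU-level totals within each stratum.
    """
    total_w = sum(weights)
    mean = sum(w * y for y, w in zip(values, weights)) / total_w

    # Linearized contribution of each person, summed within each PSU
    psu_totals = defaultdict(float)
    for y, w, h, p in zip(values, weights, strata, psus):
        psu_totals[(h, p)] += w * (y - mean) / total_w

    # Group the PSU totals by stratum
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    variance = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than 2 PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, sqrt(variance)

def srs_se_of_mean(values):
    """SE from the simple random sample formula (ignores the design)."""
    n = len(values)
    m = sum(values) / n
    s2 = sum((y - m) ** 2 for y in values) / (n - 1)
    return sqrt(s2 / n)

# Hypothetical toy data: 2 strata x 2 PSUs x 2 people, equal weights
values = [10, 12, 20, 22, 15, 15, 16, 14]
weights = [1.0] * 8
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psus = [1, 1, 2, 2, 1, 1, 2, 2]

print(linearized_se_of_mean(values, weights, strata, psus))  # (15.5, 2.5)
print(srs_se_of_mean(values))  # about 1.39, understating the uncertainty
```

Because people in the same PSU resemble each other, treating the 8 values as independent draws makes the estimate look almost twice as precise as the design actually supports.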
For more information about variance estimation, please see the Estimate Variance and Analyze Subgroups module of the Basic Dietary Analyses course (coming soon).