NHANES I data are NOT obtained using a simple random sample. Rather, a complex, multistage, probability sampling design was used to select participants representative of the civilian, non-institutionalized population of the coterminous United States, excluding Indian reservations. The sample does not include persons residing in nursing homes, members of the armed forces, institutionalized persons, or U.S. nationals living abroad.
The sampling procedures for NHANES I and the NHANES I Augmentation Survey are similar but differ in several respects. Each involves multiple stages, which are outlined below.
A sample weight is assigned to each sample person. It is a measure of the number of people in the population represented by that sample person in NHANES I, reflecting the unequal probability of selection, non-response adjustment, and adjustment to independent population controls. Because NHANES I participants were selected with unequal probabilities, the sample weights must be used to produce unbiased national estimates. More information about sample weights and how they are created can be found in the Weighting module.
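As a minimal sketch of why the weights matter, the following Python example compares an unweighted mean with a weighted mean. The values and weights are hypothetical toy numbers, not NHANES I data, and the function name weighted_mean is introduced here only for illustration.

```python
def weighted_mean(values, weights):
    """Estimate a population mean from sample values and their sample weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the sum of the weights must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical toy data: three sample persons. The third person stands in
# for twice as many people in the population as each of the other two.
values = [10.0, 20.0, 30.0]
weights = [1000.0, 1000.0, 2000.0]

unweighted = sum(values) / len(values)      # treats every person equally
weighted = weighted_mean(values, weights)   # accounts for unequal representation

print(unweighted, weighted)
```

In this toy case the unweighted mean understates the contribution of the person who represents more of the population, which is the kind of distortion the sample weights are meant to correct.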
NHANES was designed to sample larger numbers of certain subgroups of particular public health interest. Oversampling was done to increase the reliability and precision of estimates of health status indicators for these population subgroups.
For NHANES I, people of very low income, preschool children, women of childbearing age, and the elderly were oversampled because of concerns that these subgroups were at greater risk of malnutrition than the general population. For later NHANES surveys, different subgroups were oversampled depending on public health trends of concern at that time.
NHANES I was conducted on a nationwide probability sample of approximately 32,000 persons ages 1-74 years.
For your own analyses, it is critical to carefully review the documentation for each survey cycle to determine which subgroups were oversampled.
The NHANES I sample represented the total civilian, non-institutionalized population, ages 1-74, in the coterminous United States. Selection began with the 1960 Census, which was used to divide the entire United States into nearly 1900 primary sampling units (PSUs). Each PSU was either a standard metropolitan statistical area (SMSA), a single county, or a group of two or three contiguous counties. These PSUs were then grouped into 357 strata, which were subsequently collapsed into 40 super strata. Fifteen of the 40 super strata, each containing a single metropolitan area of 2 million or more persons, were selected with certainty. The remaining 25 super strata were classified into regions, and two PSUs were selected from each super stratum with probability proportionate to size, using a controlled selection technique based on the 1960 Census population for stands 1-44 and the 1970 Census population for stands 45-100. This yielded a total of 65 PSUs for the NHANES I general (or nutrition) sample.
The Augmentation Survey was similar to the general sample, but only 5 of the 15 certainty super strata were selected with certainty. The remaining 10 were collapsed into five groups of two each, and only one super stratum was selected from each group. When considered as part of the 100-stand design, these super strata were selected with certainty. Finally, only one PSU, rather than two, was selected from each of the other 25 super strata, for a total of 35 PSUs.
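The PSU counts described above can be checked with simple arithmetic. The short Python sketch below only restates the numbers given in the text; the variable names are chosen here for illustration.

```python
# Counts taken from the design description in the text.
certainty_super_strata = 15       # each holds one metro area of 2 million or more
noncertainty_super_strata = 25

# General (nutrition) sample: every certainty super stratum counts as one PSU,
# and two PSUs are drawn from each non-certainty super stratum.
general_sample_psus = certainty_super_strata + 2 * noncertainty_super_strata

# Augmentation Survey: 5 certainty super strata are kept, the other 10 are
# paired into 5 groups with one selected per group, and one PSU is drawn
# from each of the 25 non-certainty super strata.
augmentation_certainty = 5
augmentation_groups = (certainty_super_strata - augmentation_certainty) // 2
augmentation_psus = (
    augmentation_certainty + augmentation_groups + noncertainty_super_strata
)

total_stands = general_sample_psus + augmentation_psus
print(general_sample_psus, augmentation_psus, total_stands)
```

The two totals, 65 and 35, add up to the 100 stands referred to in the text.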
NHANES I was conducted from April 1971 through September 1975, including both the general sample and the Augmentation Survey. For data analysis purposes, the strata and PSU variables on the data files were modified. Several of the 15 certainty strata were combined to form only 10 strata. Each of these "certainty PSUs" (i.e., strata) consists of 235 enumeration districts, which were treated as pseudo-PSUs, whereas each of the remaining 25 strata can be considered as being composed of exactly two PSUs.
Unlike continuous NHANES, which uses masked variance units (MVUs), NHANES I did not create MVUs. Instead, these pseudo-PSU and stratification variables are provided. In the NHANES I dataset, 35 pseudo strata and 235 pseudo PSUs were created for variance estimation. Therefore, the strata variable has values ranging from 1 to 35, and the PSU variable has values ranging from 1 to 235 (when using the 1-65 or 1-100 stands analytic approach). Together, the strata and PSUs represent the variance units (the sampling units used to estimate the sampling error). For tutorial exercise purposes, the stratum variable name for NHANES I is N1BM0194, and the PSU variable name is N1BM0196.
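To illustrate how strata and PSU variables enter a variance calculation, here is a minimal Python sketch of a Taylor-linearization standard error for a weighted mean, computed from between-PSU variation within each stratum. It assumes the common with-replacement approximation for PSU selection and is not presented as the procedure used by any particular survey package. The function name and the toy records are hypothetical; in a real analysis the stratum and PSU columns would come from N1BM0194 and N1BM0196, and dedicated survey software would normally be used.

```python
from collections import defaultdict


def design_based_mean_and_se(records):
    """Return (weighted mean, linearized standard error).

    records: iterable of (stratum, psu, weight, value) tuples.
    Assumes PSUs are treated as sampled with replacement within strata.
    """
    records = list(records)
    total_weight = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_weight

    # Sum the linearized scores w * (y - mean) / W within each PSU.
    psu_scores = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_scores[stratum][psu] += w * (y - mean) / total_weight

    variance = 0.0
    for stratum, psus in psu_scores.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two PSUs")
        scores = list(psus.values())
        center = sum(scores) / n_h
        # Between-PSU variation within the stratum.
        variance += n_h / (n_h - 1) * sum((z - center) ** 2 for z in scores)

    return mean, variance ** 0.5


# Hypothetical toy data: two strata with two PSUs each.
toy_records = [
    (1, 1, 1.0, 1.0),
    (1, 2, 1.0, 3.0),
    (2, 1, 1.0, 2.0),
    (2, 2, 1.0, 2.0),
]
toy_mean, toy_se = design_based_mean_and_se(toy_records)
print(toy_mean, toy_se)
```

The sketch raises an error for a stratum with a single PSU because between-PSU variation cannot be computed there, which is one reason each variance stratum needs at least two (pseudo) PSUs.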
Computing national estimates using either the 1-65 location design for the general sample or the 1-100 location design for the detailed subsample is the preferred option. The smaller sample designs (e.g., 66-100 or 1-35), while nationally representative, may yield highly variable estimates. The strata and PSU variables provided on the data files are appropriate only for analyses using the 1-65 or 1-100 sample design. Analyses requiring the smaller sample designs will require the researcher to recode the strata and PSU variables, following the instructions provided in the file documentation. For the six different survey samples that can be analyzed, see the Weighting module.