## Key Concepts About Weighting in NHANES I

Weights are created in NHANES I to account for the complex survey design (including oversampling), survey non-response, and post-stratification. When the NHANES I sample is weighted, it is representative of the civilian non-institutionalized Census population in the coterminous United States, excluding persons living on American Indian reservations.

### How weights are created in NHANES I

Each sample person in the NHANES I dataset is assigned a sample weight. This sample weight is created in three steps:

### (1) Calculating the base weight

In general, a sample person is assigned a weight equal to the reciprocal of his or her probability of selection. In other words:

$$\text{base weight} = \frac{1}{\text{probability of selection}}$$
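As a small illustration of this reciprocal relationship, the sketch below computes a base weight from a single selection probability. The helper name `base_weight` and the probability used are hypothetical, chosen only for illustration; they are not NHANES I values, and the sketch ignores the multistage design discussed next.

```python
def base_weight(selection_probability: float) -> float:
    """Return the base weight as the reciprocal of the selection probability."""
    if not 0.0 < selection_probability <= 1.0:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


# Hypothetical example: a person selected with probability 1 in 4,000
# stands in for roughly 4,000 people in the target population.
example_weight = base_weight(1 / 4000)
print(round(example_weight))
```

The smaller the chance of selection, the larger the weight, which is why oversampled groups (selected with higher probability) carry smaller base weights.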

However, calculating the base weight for a sample person in NHANES I is much more complicated due to the survey's complex, multistage design. In NHANES I, the following equation, which takes into account the survey design, is used to determine the base weight for a sample person:

where

### (2) Adjusting for Non-response in NHANES I

#### Adjustment to the interview or exams

The base weights were adjusted for non-response to the MEC exam. The reciprocal of the probability of selection of each sample person is multiplied by a factor that brings the estimates based on examined persons up to the level that would have been attained if all sample persons had been examined. This non-response adjustment factor was computed separately within relatively homogeneous classes defined by five income groups (under \$3,000; \$3,000-\$6,999; \$7,000-\$9,999; \$10,000-\$14,999; \$15,000 or more) within each stand. Within each class, the factor is the ratio of the sum of the sample weights for all sample persons to the sum of the sample weights for all responding (examined) sample persons.
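The ratio described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to produce the NHANES I files: the record layout (`stand`, `income_group`, `base_weight`, `examined`) and the function names are assumptions, and the data are invented.

```python
from collections import defaultdict


def nonresponse_factors(persons):
    """Compute the adjustment factor for each (stand, income group) class.

    factor = sum of base weights over all sample persons in the class
             / sum of base weights over examined sample persons in the class
    """
    total = defaultdict(float)
    examined = defaultdict(float)
    for p in persons:
        cell = (p["stand"], p["income_group"])
        total[cell] += p["base_weight"]
        if p["examined"]:
            examined[cell] += p["base_weight"]
    # Classes with no examined persons have no defined factor and are skipped;
    # handling them (e.g. by collapsing classes) is outside this sketch.
    return {c: total[c] / examined[c] for c in total if examined[c] > 0}


def adjusted_weights(persons):
    """Return non-response-adjusted weights for examined persons only."""
    factors = nonresponse_factors(persons)
    return [
        p["base_weight"] * factors[(p["stand"], p["income_group"])]
        for p in persons
        if p["examined"]
    ]


# Invented records for one stand and one income group.
sample = [
    {"stand": 1, "income_group": 1, "base_weight": 1000.0, "examined": True},
    {"stand": 1, "income_group": 1, "base_weight": 1000.0, "examined": False},
    {"stand": 1, "income_group": 1, "base_weight": 2000.0, "examined": True},
]
factors = nonresponse_factors(sample)
weights = adjusted_weights(sample)
print(factors[(1, 1)], sum(weights))
```

In this toy class the factor is 4000 / 3000, so the two examined persons together carry the full weight of all three sample persons.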

In NHANES I, the response rates for the in-home interview were over 98%. Thus, non-response is defined with respect to the exam: a sample person who completed the interview but did not agree to, or did not come in for, the MEC portion of the survey was classified as an exam non-respondent. Adjustments made for survey non-response account only for exam non-response, not for component or item non-response (e.g., a sample person declined to have their blood pressure measured but completed all other examination components). Under certain conditions, missing data were imputed. The data files contain variables that indicate whether a value was imputed. Please see the specific file description for the criteria used to create the imputations.

Finally, some variables appear to have a great deal of missing data, but that is only because the NHANES I design dictated that the item was to be obtained only for a particular subsample. One asterisk means the data were obtained only on examinees at stands 1-65. Two asterisks denote that the data were obtained from examinees in stands 66-100. Three asterisks denote that the data were obtained only for examinees receiving the detailed examination. Four asterisks denote items obtained only for detailed examinees from stands 1-35, and five asterisks denote items obtained only on respondents aged 1-17 years.

Overall, 32,331 persons were selected into the NHANES I sample, 31,973 completed the interview, and 23,808 were examined in the MEC. For detailed response rates by age and other selected demographic characteristics, see the Table of Response Rates for NHANES I.

For more information on sample design, weighting, non-response, variance estimation, and other important analytical considerations, see:

1. Landis, J. R., Lepkowski, J. M., Eklund, S. A., Stehouwer, S. A. A Statistical Methodology for Analyzing Data From a Complex Survey: The First National Health and Nutrition Examination Survey. September 1982. 58 pp. (PHS) 82-1366. PB88-226949. PC A04 MF A01.

2. Lohr, Sharon L. Sampling: Design and Analysis, pp. 265-272. Duxbury Press, 1999.

#### Adjustment for NHANES I subsample components

NHANES I was conducted in several stages. The initial design from April 1971 through June 1974 provided for the selection of a representative sample of the target population 1-74 years of age, in 65 survey locations (or stands).  All 20,749 examined persons received a specifically designed nutrition-related examination.  In addition, approximately 20% of those ages 25-74 years (3,854 persons) received a more detailed examination and questionnaire.   Furthermore, an additional 3,059 persons ages 25-74 were examined in the subsequent 35 locations in an Augmentation Survey which was conducted from July 1974 to September 1975.  Finally, the first 35 stands of the initial 65 location survey became an independent, representative sample in order to provide for early estimates of a number of nutrition-related factors, and because some components of the initial examination could no longer be continued.

In summary, there are 6 different survey samples which can be analyzed:

1. The original 65 stand survey sample which oversampled for persons of low income, preschool children, women of child-bearing age, and the elderly;
2. The initial 35 stand survey sample which includes the oversampling design and is also representative of the U.S. non-institutionalized, civilian population;
3. The Detailed Subsample from the first 65 stands which does NOT include any oversampling but is representative of the civilian non-institutionalized population ages 25-74;
4. The Detailed Subsample from the first 35 stands, similar to the previous detailed subsample otherwise;
5. The Augmentation Survey from stands 66-100 which also does NOT include any oversampling but is representative of the civilian, non-institutionalized population ages 25-74; and
6. The sample from stands 1-100 which combines the respondents from the detailed subsample from stands 1-65 with the respondents from the Augmentation Survey from stands 66-100.

For response rates by total survey and subsample components, see the tables below.

Full Surveys - Sample Sizes and Response Rates

| Sample | Sample Size | Number Interviewed | Percent Interviewed | Number Examined | Percent Examined |
|---|---|---|---|---|---|
| Original Sample (Stands 1-65) | 28,043 | 27,753 | 99.0 | 20,749 | 74.0 |
| Augmentation Survey (Stands 66-100) | 4,288 | 4,220 | 98.4 | 3,059 | 71.3 |
| Total Participants | 32,331 | 31,973 | 98.9 | 23,808 | 73.6 |

Detailed Surveys - Sample Sizes and Response Rates

| Sample | Sample Size | Number Interviewed | Percent Interviewed | Number Examined | Percent Examined |
|---|---|---|---|---|---|
| Detailed Subsample (Stands 1-65) | 5,593 | 5,522 | 98.7 | 3,854 | 68.9 |
| Augmentation Survey (Stands 66-100) | 4,288 | 4,220 | 98.4 | 3,059 | 71.3 |
| Total Detailed Subsample (Stands 1-100) | 9,881 | 9,742 | 98.6 | 6,913 | 70.0 |

Half Samples - Sample Sizes and Response Rates

| Sample | Sample Size | Number Interviewed | Percent Interviewed | Number Examined | Percent Examined |
|---|---|---|---|---|---|
| Initial (35 Stands) | 14,147 | 13,969 | 98.7 | 10,127 | 71.6 |
| Detailed Subsample (Stands 1-35) | 2,798 | 2,753 | 98.4 | 1,892 | 67.6 |
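The percentages in these tables are simple ratios of the interviewed or examined counts to the sample size. The sketch below recomputes them for the two full-survey samples and their total, using the counts from the Full Surveys table; the helper name `response_rate` is an assumption introduced for illustration.

```python
def response_rate(respondents: int, sample_size: int) -> float:
    """Return respondents as a percentage of the sample, to one decimal place."""
    return round(100.0 * respondents / sample_size, 1)


# (sample size, number interviewed, number examined) from the Full Surveys table.
full_surveys = {
    "Original Sample (Stands 1-65)": (28043, 27753, 20749),
    "Augmentation Survey (Stands 66-100)": (4288, 4220, 3059),
}

# The total row is the column-wise sum of the two samples.
total = tuple(sum(column) for column in zip(*full_surveys.values()))
full_surveys["Total Participants"] = total

rates = {
    name: (response_rate(interviewed, n), response_rate(examined, n))
    for name, (n, interviewed, examined) in full_surveys.items()
}
for name, (pct_interviewed, pct_examined) in rates.items():
    print(f"{name}: {pct_interviewed}% interviewed, {pct_examined}% examined")
```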

Each of these samples has its own designated weight, which accounts for the specific probability of selection into that sample, as well as the appropriate non-response.

These sample weights are not designed to be combined. In fact, they are mutually exclusive. If it is necessary to combine two or more samples for your analyses, the appropriate weights would need to be recalculated. However, details on how to recalculate weights when combining samples go well beyond the scope of this tutorial. Therefore, it is strongly advised that you do not attempt to combine samples in any analysis of NHANES I data.

### (3) NHANES I post-stratification adjustment to match 1972 U.S. Census population control totals

In addition to accounting for sample person non-response, weights are also post-stratified to match the population control totals for each sampling subdomain. This additional adjustment makes the weighted counts the same as independent controls prepared by the U.S. Bureau of the Census for the non-institutionalized population of the United States as of November 1, 1972 (the approximate mid-point of the survey).
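A post-stratification adjustment of this kind can be sketched as a ratio adjustment within each subdomain. The example below is an illustration under assumptions: the field names (`subdomain`, `weight`), the function name `poststratify`, and all numbers are invented, and the actual NHANES I subdomains and control totals are not reproduced here.

```python
from collections import defaultdict


def poststratify(persons, control_totals):
    """Scale weights so each subdomain's weighted count equals its control total."""
    weighted_count = defaultdict(float)
    for p in persons:
        weighted_count[p["subdomain"]] += p["weight"]
    adjusted = []
    for p in persons:
        # Ratio of the independent control total to the current weighted count.
        ratio = control_totals[p["subdomain"]] / weighted_count[p["subdomain"]]
        adjusted.append({**p, "weight": p["weight"] * ratio})
    return adjusted


# Invented non-response-adjusted weights and control totals.
persons = [
    {"subdomain": "A", "weight": 900.0},
    {"subdomain": "A", "weight": 1100.0},
    {"subdomain": "B", "weight": 3000.0},
]
controls = {"A": 2500.0, "B": 2700.0}

final = poststratify(persons, controls)
print([round(p["weight"], 1) for p in final])
```

After the adjustment, the weights in subdomain A sum to 2,500 and the weight in subdomain B is 2,700, matching the (invented) control totals.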

### Summary

In summary, it is important to utilize the weights in analyses to account for the complex survey design (including oversampling), survey non-response, and post-stratification in order to ensure that calculated estimates are truly representative of the U.S. civilian non-institutionalized population.
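To see why the weights matter, the sketch below compares an unweighted and a weighted mean on invented data in which one group is heavily oversampled. The function name `weighted_mean` and all values are assumptions for illustration. The sketch produces point estimates only; variance estimation for a complex design requires the design information discussed in the references above and is not shown.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values using the given sample weights."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Invented data: three persons from an oversampled group (small weights)
# and one person from a group sampled at a much lower rate (large weight).
values = [10.0, 10.0, 10.0, 20.0]
weights = [100.0, 100.0, 100.0, 900.0]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(unweighted, weighted)
```

The unweighted mean is pulled toward the oversampled group, while the weighted mean reflects each person's share of the target population.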