Task 2: Key Concepts about Appending Data in NHANES

Each 2-year cycle of NHANES, and any combination of 2-year cycles, is a nationally representative sample. However, in some situations, such as estimating the average serving size of a rarely consumed food, the sample from a single 2-year cycle is too small to produce statistically reliable estimates. The NHANES sample design makes it possible to combine data from multiple survey cycles to increase the sample size for an analysis. A larger sample improves the statistical power, reliability, and stability of estimates for population subdomains, such as racial and ethnic groups, and of estimates for rare events.

The process of combining data for multiple survey cycles or years is called appending. This is similar to adding rows to a table.
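As a rough illustration of appending as "adding rows," the sketch below stacks two small, made-up extracts into one dataset using plain Python. The variable name KCAL, the CYCLE label, the function name append_cycles, and all values are hypothetical and chosen only for illustration; SEQN is the participant identifier described in this module.

```python
# Hypothetical miniature extracts from two 2-year cycles (all values are made up).
cycle_2003_2004 = [
    {"SEQN": 1, "KCAL": 1850.0},
    {"SEQN": 2, "KCAL": 2210.0},
]
cycle_2005_2006 = [
    {"SEQN": 3, "KCAL": 1990.0},
    {"SEQN": 4, "KCAL": 2405.0},
]


def append_cycles(*labeled_cycles):
    """Stack the rows of every cycle into one list, tagging each row with its cycle."""
    combined = []
    for label, rows in labeled_cycles:
        for row in rows:
            # Copy the row and record which cycle it came from.
            combined.append({**row, "CYCLE": label})
    return combined


combined = append_cycles(
    ("2003-2004", cycle_2003_2004),
    ("2005-2006", cycle_2005_2006),
)
```

Tagging each row with its cycle is a design choice rather than a requirement; it makes it easier to check later that every cycle is present in the appended data.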

Always check the contents of each data file before appending the data files because some components or questions are not collected in every survey cycle.  For example, food frequency data are collected only in the 2003-04 and 2005-06 cycles.  In addition, variable names may be different from cycle to cycle, and recoded or derived variables may be added in different cycles.

NHANES adds or deletes survey items from time to time. If the added or deleted variables are not relevant to your analysis, you can simply append the data files as described and use only the variables of interest for your analysis. The extra variables will not affect your analysis if you do not include them in your dataset.
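One way to check file contents before appending is to compare the variable lists of the cycles you plan to combine. The sketch below is a minimal example under stated assumptions: the variable names KCAL and FFQ_ITEM, the function name compare_variables, and the cycle labels are hypothetical, and the lists would in practice come from the data files themselves.

```python
# Hypothetical variable lists for three cycles (names are illustrative only).
columns_by_cycle = {
    "2003-2004": ["SEQN", "KCAL", "FFQ_ITEM"],
    "2005-2006": ["SEQN", "KCAL", "FFQ_ITEM"],
    "2007-2008": ["SEQN", "KCAL"],
}


def compare_variables(columns_by_cycle):
    """Return the variables shared by all cycles and, per cycle, those not shared."""
    names = {label: set(cols) for label, cols in columns_by_cycle.items()}
    common = set.intersection(*names.values())
    not_shared = {label: sorted(cols - common) for label, cols in names.items()}
    return sorted(common), not_shared


common, not_shared = compare_variables(columns_by_cycle)
```

Variables that appear in not_shared need attention only if they are relevant to your analysis. Note that a matching name does not guarantee matching content, so the documentation for each cycle should still be reviewed.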



When extracting variables from an NHANES data file or appending NHANES data you should always include the SEQN variable, which is the unique identifier for each participant in NHANES.  Failing to include this variable in your dataset will lead to problems when you sort or merge your data files at a later time.
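A small helper can make it harder to drop SEQN by accident when extracting variables. The following is a sketch only; the function name extract_variables and the variables KCAL and EXTRA are hypothetical.

```python
def extract_variables(rows, variables):
    """Keep only the requested variables, always retaining SEQN."""
    # Put SEQN first and avoid listing it twice if the caller already asked for it.
    keep = ["SEQN"] + [name for name in variables if name != "SEQN"]
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical rows with one variable of interest and one that is not needed.
rows = [
    {"SEQN": 1, "KCAL": 1850.0, "EXTRA": "a"},
    {"SEQN": 2, "KCAL": 2210.0, "EXTRA": "b"},
]
subset = extract_variables(rows, ["KCAL"])
```

Because SEQN is always added to the list of variables to keep, the extracted data can still be sorted or merged by participant later.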


When combining two or more 2-year cycles of continuous NHANES from 2001-2002 onward, you must create a new weight variable by rescaling the existing weight variables before beginning any analyses. When survey cycles are combined, estimates weighted with the new variable are representative of the population at the midpoint of the combined survey period. The new weight variable simply rescales the weights from the separate cycles so that the sum of the new weights matches the survey population size at the midpoint of that period.
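The rescaling can be sketched as follows. This example assumes that every combined cycle is a 2-year cycle from 2001-2002 or later and that each cycle contributes equally, so each 2-year weight is divided by the number of cycles combined; please confirm the appropriate formula for your cycles in the NHANES analytic guidelines. The names WT2YR and WT_COMBINED and the function add_combined_weight are hypothetical, and the weights are made up.

```python
def add_combined_weight(rows, n_cycles, weight_name="WT2YR", new_name="WT_COMBINED"):
    """Add a rescaled weight: the 2-year weight divided by the number of cycles.

    Assumes equal-length 2-year cycles from 2001-2002 onward (an assumption of
    this sketch, not a general rule for every combination of survey years).
    """
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    return [{**row, new_name: row[weight_name] / n_cycles} for row in rows]


# Hypothetical appended data: each cycle's weights sum to 4000.
appended = [
    {"SEQN": 1, "CYCLE": "2003-2004", "WT2YR": 1000.0},
    {"SEQN": 2, "CYCLE": "2003-2004", "WT2YR": 3000.0},
    {"SEQN": 3, "CYCLE": "2005-2006", "WT2YR": 2000.0},
    {"SEQN": 4, "CYCLE": "2005-2006", "WT2YR": 2000.0},
]
weighted = add_combined_weight(appended, n_cycles=2)
total = sum(row["WT_COMBINED"] for row in weighted)
```

In this toy example the new weights sum to 4000, the average of the two per-cycle totals, rather than to 8000, which is what the unscaled weights would give.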

When combining data cycles, it is extremely important to:

  1. verify that data items collected in all combined years are comparable in wording and methods, and
  2. select the same type of sample weight from each cycle when constructing the new weight variable in the combined data set.

For more information about determining the compatibility of datasets, please see the Locate Variables and Structure & Contents modules.



Because the data collection protocol changed significantly in 2002, it is recommended that you not combine dietary recall data from survey cycles before 2001-2002 with data from subsequent cycles.


After appending the data, you will need to check the results. You should check that all your variables of interest were included and that any variables you renamed or recoded are correct and include all the years of data.
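The checks described above might be automated along the following lines. This is a sketch under several assumptions: the appended data have one row per participant (so SEQN should not repeat), each row carries a hypothetical CYCLE label, missing values are stored as None, and the function name check_appended and the variable KCAL are illustrative.

```python
from collections import Counter


def check_appended(combined, expected_counts, required_variables):
    """Return a list of problems found in the appended data (empty if none)."""
    problems = []

    # 1. Every cycle contributed the expected number of rows.
    counts = Counter(row["CYCLE"] for row in combined)
    for label, expected in expected_counts.items():
        if counts.get(label, 0) != expected:
            problems.append(f"{label}: expected {expected} rows, found {counts.get(label, 0)}")

    # 2. SEQN is not repeated (assumes one row per participant).
    seqn_counts = Counter(row["SEQN"] for row in combined)
    duplicates = sorted(seqn for seqn, n in seqn_counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate SEQN values: {duplicates}")

    # 3. Each variable of interest has data in every cycle.
    for name in required_variables:
        for label in expected_counts:
            values = [row.get(name) for row in combined if row["CYCLE"] == label]
            if all(value is None for value in values):
                problems.append(f"{name}: no data for {label}")

    return problems


# Hypothetical appended data in which KCAL was not carried over for 2005-2006.
combined = [
    {"SEQN": 1, "CYCLE": "2003-2004", "KCAL": 1850.0},
    {"SEQN": 2, "CYCLE": "2003-2004", "KCAL": 2210.0},
    {"SEQN": 3, "CYCLE": "2005-2006", "KCAL": None},
    {"SEQN": 4, "CYCLE": "2005-2006", "KCAL": None},
]
problems = check_appended(combined, {"2003-2004": 2, "2005-2006": 2}, ["KCAL"])
```

Here the only reported problem is that KCAL has no data for 2005-2006, the kind of result you would expect if a variable had a different name in one cycle and was not renamed before appending.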


