Creating Your Analytic Data Set

Compiling the public-use dataset yourself gives you the opportunity to become familiar with the data and speeds up the data creation process. We provide access to the data needed to answer your research questions but limit access to any variables beyond that need. It is important to remember:

  • You provide the public-use and non-NCHS data files.
  • Non-NCHS data are data collected by you, another government agency, or a private institution that you wish to merge with NCHS data, often by geographic information.
  • Do not provide masked PSU and strata variables when requesting true geography.
  • Any attempt to include variables that may lead to re-identification of subjects or establishments will result in the termination of your project and possible legal action.
  • The RDC Analyst merges all data files to create the final analytic dataset.
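Before sending a public-use file, you may want to check that masked design variables are absent when you have requested true geography. The minimal Python sketch below assumes the file is a CSV and uses the NHANES names for the masked variance pseudo-PSU and pseudo-stratum (SDMVPSU, SDMVSTRA) as the example. Other surveys name these variables differently, so the variable set and the function name drop_masked_design_vars are illustrative assumptions, not an NCHS tool; confirm the correct names with your RDC Analyst.

```python
import csv
import io

# Assumption: masked design variable names as used in NHANES public-use files.
# Other surveys use different names; confirm the list with your RDC Analyst.
MASKED_DESIGN_VARS = {"SDMVPSU", "SDMVSTRA"}


def drop_masked_design_vars(src, dst, masked=MASKED_DESIGN_VARS):
    """Copy CSV rows from src to dst, omitting masked PSU/stratum columns.

    Returns the names of the columns that were removed.
    """
    reader = csv.DictReader(src)
    kept = [name for name in reader.fieldnames if name.upper() not in masked]
    removed = [name for name in reader.fieldnames if name.upper() in masked]
    # extrasaction="ignore" silently skips the removed keys in each row dict.
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return removed


if __name__ == "__main__":
    sample = io.StringIO("SEQN,RIDAGEYR,SDMVPSU,SDMVSTRA\n1,34,1,125\n")
    out = io.StringIO()
    print("removed:", drop_masked_design_vars(sample, out))
    print(out.getvalue())
```

Copying to a new file, rather than editing in place, leaves your original download untouched so you can still reproduce the extract later.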



  1. Discuss with your RDC Analyst the preferred format for any merge variables. This is especially important for complex merges that involve multiple data sets and multiple merge variables.
  2. Include only the public-use and non-NCHS variables specified in your proposal.
       • Do not include variables that are not listed in your approved proposal unless you first update your proposal and discuss the change with your RDC Analyst.
       • Researchers using NHDS, NAMCS, NHAMCS, and some DHHS-hosted data do not need to provide a public-use dataset. Your RDC Analyst will provide an extract from the restricted files that includes all of the variables specified in your proposal.
       • Researchers using the restricted Mortality files must not include any public-use mortality variables, or variables derived from the public-use mortality data, in their dataset. The RDC Analyst will include these variables in the restricted data merge.
  3. Create the variables as your RDC Analyst requests. This expedites the merge and improves data quality. Be prepared to provide the code you used to create the variables or the dataset.
       • If you would like to rename NCHS public-use variables, include the original variable name in the variable description.
       • Derived variables must be clearly defined. The variable description should include the original variable name(s) from which the variable was derived and explain any arithmetic manipulation.
       • Discuss the creation of derived variables that include restricted data with your RDC Analyst in advance. You may be asked to provide a definition of the derived variable or the code that creates it.
  4. Email the data files, along with a list of the variables, to your RDC Analyst. If your data files are too large to email, discuss other options with your RDC Analyst.
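The steps above can be sketched in Python for a CSV public-use extract. Everything in the specification is hypothetical: the kept variables (SEQN, RIDAGEYR), the rename of RIDAGEYR to AGE, and the derived BMI_CALC (computed from NHANES-style BMXWT in kg and BMXHT in cm) stand in for whatever your approved proposal lists, and build_extract is an illustrative helper, not an NCHS tool. The sketch writes the data file plus a matching variable list in which renamed variables cite their original names and derived variables cite their sources and arithmetic. File and merge-variable formats should follow what your RDC Analyst requests.

```python
import csv
import io

# Hypothetical specification for illustration only. Use the variables listed
# in your approved proposal and the formats agreed with your RDC Analyst.
KEEP = ["SEQN", "RIDAGEYR"]      # approved public-use variables to keep
RENAMES = {"RIDAGEYR": "AGE"}    # original name -> new name


def bmi(row):
    """Derived: BMXWT (kg) / (BMXHT (cm) / 100) ** 2, rounded to 1 decimal."""
    height_m = float(row["BMXHT"]) / 100
    return round(float(row["BMXWT"]) / height_m ** 2, 1)


# New name -> (function computing it, plain-language definition for the list)
DERIVED = {"BMI_CALC": (bmi, "BMXWT / (BMXHT / 100) ** 2 rounded to 1 decimal")}


def build_extract(src, data_out, varlist_out):
    """Write the approved, renamed, and derived variables plus a variable list."""
    reader = csv.DictReader(src)
    out_names = [RENAMES.get(name, name) for name in KEEP] + list(DERIVED)
    writer = csv.DictWriter(data_out, fieldnames=out_names)
    writer.writeheader()
    for row in reader:
        record = {RENAMES.get(name, name): row[name] for name in KEEP}
        for new_name, (func, _) in DERIVED.items():
            record[new_name] = func(row)
        writer.writerow(record)

    # Variable list: renamed variables cite the original name; derived
    # variables cite their source variables and arithmetic.
    varlist = csv.writer(varlist_out)
    varlist.writerow(["variable", "description"])
    for name in KEEP:
        new_name = RENAMES.get(name, name)
        note = f"renamed from {name}" if new_name != name else "original public-use name"
        varlist.writerow([new_name, note])
    for new_name, (_, definition) in DERIVED.items():
        varlist.writerow([new_name, f"derived: {definition}"])
```

Keeping the specification in one place at the top means the same script doubles as the code you may be asked to provide for how the variables were created.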

Merging the Data

Analytic files are created from the specifications in the Data Requirements section (E.4) of your proposal. We strongly encourage you to discuss the merge with your RDC Analyst throughout the process to ensure that the dataset is created to your specifications. Datasets will be made available as SAS data sets unless otherwise specified.

  • RDC Analysts follow the policies outlined in the NCHS Disclosure Manual to protect geographic, temporal, and perturbed/masked data.


Page last reviewed: February 26, 2015