Frequently Asked Questions
- Where can I find data from NHANES 1999-2000?
- Where can I find the data files and list of data items that are available from NHANES 1999-2000?
- Why are there so many data files?
- On the NHANES 1999-2000 data page I see links for data. How do I access the data from these links?
- Next to each questionnaire section, laboratory component, or exam component on the NHANES 1999-2000 data page there are links that appear as follows: [Codebook, Doc, Freqs, Data]. What are these links for?
- What format are the data files in? Can they be used with SAS, SPSS, or STATA?
- Where can I find a description of the codebook contents?
- Where can I find the analytic guidelines (weighting, variance estimation, sample design)?
- What is the sample size for a particular data item, questionnaire section, examination component, or laboratory analyte?
- How do I determine the skip patterns for a questionnaire section?
- How are missing values, "blank but applicable", "don't know" and other values coded?
- I have questions about using the data, protocols, etc. Where can I get help?
- When will other data be available from NHANES 1999-2000?
- Why isn't the adolescent data on alcohol use, smoking, sexual behavior, reproductive health and drug use available as a public release file?
- Will data and weights be available on public use files for single years such as 1999, 2000, or 2001?
- Will data and weights be available on public use files in combined datasets for three year and six year periods such as 1999-2001, 2002-2004, or 1999-2004?
- Do I need to use SAS software to view NHANES data?
- Are there variables which can identify whether survey participants are family members and/or live in the same household?
- Can I identify what region of the country or what state or county a survey participant resides within?
- I am interested in one or more questions which appear in the survey questionnaire but I cannot find the question in a codebook or data file available on your Internet site. What happened to it?
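2. Where can I find the data files and list of data items that are available from NHANES 1999-2000?
- List/description of Mobile Examination Center (MEC) exam data items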
- List/description of MEC lab data items
- List/description of household interview data items
- List/description of demographic data items
- List/description of dietary data items
3. Why are there so many data files?
The data files have been separated to reduce the time needed to download data and documentation from the Internet and to make producing, editing, and validating the data files easier. As a result, you must merge files together for analysis. Please refer to the following SAS code examples to learn how to merge files together:
- MEC exam merge example [TXT - 3 KB]
- Lab merge example [TXT - 3 KB]
- Household interview merge example [TXT - 3 KB]
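The SAS examples listed above are not reproduced here. As a rough illustration of the same idea in Python, the sketch below joins two small in-memory tables on a shared participant identifier. It assumes each file carries a common identifier, called `SEQN` here; the other variable names (`AGE`, `GRIP`) and all values are hypothetical and invented only for this example.

```python
def merge_on_key(left, right, key="SEQN"):
    """Left-join two lists of dict records on a shared key.

    Every record in `left` is kept. Fields from `right` are set to None
    when no matching record exists.
    """
    right_by_key = {row[key]: row for row in right}
    right_fields = sorted({f for row in right for f in row if f != key})
    merged = []
    for row in left:
        match = right_by_key.get(row[key], {})
        combined = dict(row)
        for field in right_fields:
            combined[field] = match.get(field)
        merged.append(combined)
    return merged


# Hypothetical demographic and exam records (not real NHANES values).
demographics = [
    {"SEQN": 1, "AGE": 34},
    {"SEQN": 2, "AGE": 61},
    {"SEQN": 3, "AGE": 15},
]
exam = [
    {"SEQN": 1, "GRIP": 41.5},
    {"SEQN": 2, "GRIP": 33.0},
]

merged = merge_on_key(demographics, exam)
for row in merged:
    print(row)
```

The left join keeps every participant from the first file, so a participant with no exam record still appears, with the exam field left empty.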
4. On the NHANES 1999-2000 data page I see links for data. How do I access the data from these links?
The Doc files are webpages, so you should be able to view these directly in your browser. A webpage can be saved using the "File/Save As..." menu and specifying a location on your local computer or network to store the file. Or you can right-click the file name directly on the webpage and select "Save Target As..." from the popup box, then specify a location to save the file on your computer.
Clicking on the Data link will open a dialog box from which you can specify a location to store the file (using the "Save" button) or open it directly with SAS (using the "Open" button).
5. Next to each questionnaire section, laboratory component, or exam component on the NHANES 1999-2000 data page there are links for Doc and Data. What are these links for?
These links allow you to view the documentation for a particular data file, including the codebook with the frequency distribution for each item. The frequency distributions can be used to verify the sample size for any particular data item. The documentation files are webpages, so you should be able to view them directly in your browser.
6. What format are the data files in? Can they be used with SAS, SPSS, or STATA?
The files are in SAS transport file format. They can be used with any package that supports this file format. For statistical/analytical packages that do not support SAS transport file format, you need to convert the file to a different format using an appropriate software package. Please note that NHANES 1999-2000 is a complex probability sample and proper analysis of the data usually requires statistical software that specifically incorporates sample design complications such as weighting and clustering.
8. Where can I find the analytic guidelines (weighting, variance estimation, sample design)?
The analytic guidelines provide information on the sample design and on recommended methodologies for analyzing the data. In particular, the guidelines provide information on how the sample persons were selected, how the various survey weights were calculated, what particular survey weight should be used to provide survey estimates, how to compute sampling variances for those estimates, and recommended sample sizes for analysis.
9. What is the sample size for a particular data item, questionnaire section, examination component, or laboratory analyte?
For any particular questionnaire section, examination component, or laboratory data file, you will find records only for survey participants who were eligible. For example, suppose 6,000 people were eligible for an examination in the MEC and only 5,000 were eligible for the muscular strength component due to age restrictions. Of the 5,000, suppose only 4,500 participated in the examination; the other 500 either refused or did not have enough time to participate in the exam. The data file would have 5,000 records, 500 of which have missing data. For further details refer to the "frequency" counts document for each of the data files.
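The distinction between the number of records in a file and the number of records with usable data can be checked with a few lines of Python. This is only a sketch that rebuilds the hypothetical figures from the example above; the field names `id` and `result` are invented for illustration and are not NHANES variables.

```python
# Hypothetical component file: one record per eligible participant.
# None stands for a missing (blank) value.
eligible = 5000
participated = 4500

records = [
    {"id": i, "result": 1.0 if i < participated else None}
    for i in range(eligible)
]

n_records = len(records)  # every eligible participant has a record
n_with_data = sum(1 for r in records if r["result"] is not None)
n_missing = n_records - n_with_data

print(n_records, n_with_data, n_missing)  # 5000 4500 500
```

The analytic sample size for the item is the count of non-missing values, not the number of records in the file.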
10. How do I determine the skip patterns for a questionnaire section?
The first step is to review all of the documentation for the questionnaires. To review skip patterns look at the complete questionnaire instrument. Please note that not all questionnaire items are released due to small sample sizes and confidentiality/sensitivity issues, but all skip pattern integrity was maintained and validated.
11. How are missing values, "blank but applicable", "don't know" and other values coded?
There are codes for refused (7-fill: that is 7, or 77, or 777, …, depending on the number of digits required for a particular data value), don't know (9-fill), and missing values (a blank field), which means the person was not asked the question or given the test. There is no longer a specific code for those cases where the variable response is “blank but applicable”; for such cases the values are designated as missing values. For laboratory data there are special considerations. When a laboratory value was less than the lower limit of detection (LOD), a “fill” value based on the LOD was used instead of the sample value, as the sample value was deemed “not detectable.” An indicator variable (taking the value 0 or 1) identifies which values are real and which values are fill values.
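The 7-fill and 9-fill convention can be expressed as a small helper. The sketch below is one possible reading of the coding rules above, not code supplied by NHANES. The function name `classify` and its `width` argument (the number of digits the field requires) are hypothetical, and it assumes values arrive as integers, strings, or `None` for a blank field.

```python
def classify(value, width):
    """Classify a coded response for a field that is `width` digits wide.

    Returns "missing" for a blank field, "refused" for a 7-fill,
    "don't know" for a 9-fill, and "valid" otherwise.
    """
    if value is None or str(value).strip() == "":
        return "missing"
    text = str(value).strip()
    if text == "7" * width:
        return "refused"
    if text == "9" * width:
        return "don't know"
    return "valid"


print(classify(7, 1))     # refused
print(classify(7, 2))     # valid: 7 is an ordinary value in a two-digit field
print(classify(99, 2))    # don't know
print(classify(None, 3))  # missing
```

Because the fill code depends on the field width, the same number can be a real answer in one variable and a refusal code in another, so check the codebook for each variable before recoding.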
12. I have questions about using the data, protocols, etc. Where can I get help?
First, and most important, refer to the questionnaire, exam component, or laboratory descriptions. If you need help beyond this you can pose your question to the NHANES listserv. Please note, however, that the NHANES program staff do not routinely provide technical responses to questions posted to the listserv.
13. When will other data be available from NHANES 1999-2000?
A second release of data from NHANES 1999-2000 is scheduled for spring 2003. As other data are processed and become ready for public release, they will be posted on the NHANES website. Certain data will be available only at the NCHS Research Data Center (RDC). The RDC data consist of adolescent data (people less than 20 years old) such as youth conduct disorder, sexual behavior, drug use, alcohol use, and CDISC. Please refer to the NHANES What's New page for further details.
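14. Why isn't the adolescent data on alcohol use, smoking, sexual behavior, reproductive health and drug use available as a public release file?
These files have not been released on the NHANES website due to confidentiality concerns. Adolescent data files containing this sensitive information will be made available at the NCHS Research Data Center.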
15. Will data and weights be available on public use files for single years such as 1999, 2000, or 2001?
No. The continuous NHANES will be grouped in two-year periods for public release (1999-2000, 2001-2002, 2003-2004, etc.). Combining two or more two-year periods is possible (for example, 1999-2002). The two-year sample weights should be used for separate analyses of NHANES 1999-2000, NHANES 2001-2002, NHANES 2003-2004, and NHANES 2005-2006. The four-year sample weights should be used for combined analyses of NHANES 1999-2000 and NHANES 2001-2002 data.
Six-year sample weights for NHANES 1999-2004 should be calculated by researchers as follows. Because the weights for the first two datasets (NHANES 1999-2002) have already been combined into a four-year sample weight, the six-year weight is WT99-04 = (2/3) x WT99-02 + (1/3) x WT03-04, where WT99-02 is the variable WTMEC4YR from the NHANES 2001-2002 demographic file and WT03-04 is the variable WTMEC2YR from the NHANES 2003-2004 demographic file. Eight-year sample weights for NHANES 1999-2006 are calculated similarly: WT99-06 = (1/2) x WT99-02 + (1/4) x WT03-04 + (1/4) x WT05-06, where WT05-06 is the variable WTMEC2YR from the NHANES 2005-2006 demographic file.
Six-year sample weights for 2001-2006 can be calculated by combining the two-year weights found in the demographic files. For example, WT01-06 = (1/3) x WT01-02 + (1/3) x WT03-04 + (1/3) x WT05-06. Please refer to the NHANES Analytic Guidelines provided with the data release files to determine the appropriate methodology for analyses of combined years of data.
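The weight formulas above can be sketched in Python as a lookup of scaling factors. This is an illustration under stated assumptions rather than an official procedure. It assumes each participant belongs to exactly one release, so only one term of a formula applies to any given person and the combined weight is that person's source weight multiplied by the matching fraction. The names `FACTORS` and `combined_weight` and the example weight values are hypothetical. Consult the Analytic Guidelines before using combined weights in an analysis.

```python
import math

# Fraction applied to each source weight, taken from the formulas in the text.
# "1999-2002" refers to the four-year weight (WTMEC4YR); the other keys refer
# to two-year weights (WTMEC2YR) from the named release.
FACTORS = {
    "1999-2004": {"1999-2002": 2 / 3, "2003-2004": 1 / 3},
    "1999-2006": {"1999-2002": 1 / 2, "2003-2004": 1 / 4, "2005-2006": 1 / 4},
    "2001-2006": {"2001-2002": 1 / 3, "2003-2004": 1 / 3, "2005-2006": 1 / 3},
}


def combined_weight(period, source, weight):
    """Scale one participant's weight for a combined analysis period.

    period: the combined period being analyzed, e.g. "1999-2004".
    source: the release the participant's weight comes from, e.g. "2003-2004".
    weight: that participant's sample weight from the source release.
    """
    return FACTORS[period][source] * weight


# Hypothetical weights for two participants in a 1999-2004 analysis.
w_a = combined_weight("1999-2004", "1999-2002", 30000.0)
w_b = combined_weight("1999-2004", "2003-2004", 12000.0)
print(round(w_a, 2), round(w_b, 2))

# The fractions for each combined period should sum to one.
for period, parts in FACTORS.items():
    assert math.isclose(sum(parts.values()), 1.0)
```

Keeping the fractions in one table makes it easy to confirm that they sum to one for each combined period, which is a quick sanity check on any new combination.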
17. Do I need to use SAS software to view NHANES data?
No. You can view NHANES data with the SAS System Viewer, a free download from SAS Institute. Currently, most NHANES data are available in the SAS transport format (.xpt), which can be used in several statistical software programs, including SUDAAN and SPSS. Users who want alternate data formats can use the SAS Viewer to convert the transport file into a comma-delimited text file (.csv) for use in other software programs, such as Microsoft Excel.
Learn more about SAS System Viewer.
18. Are there variables which can identify whether survey participants are family members and/or live in the same household?
In continuous NHANES 1999+, there is no way to identify whether one or more survey participants are related. However, it is possible to identify whether they may live in the same household, but that information is only available through the Research Data Center. Analysts should bear in mind that sharing a household can mean a rental arrangement or other non-family circumstance.
19. Can I identify what region of the country or what state or county a survey participant resides within?
Geographic identifiers are available but only through the Research Data Center (RDC), in order to protect the confidentiality of our participants. A list of NHANES Geocode variables is available through the RDC website.
20. I am interested in one or more questions which appear in the survey questionnaire but I cannot find the question in a codebook or data file available on your Internet site. What happened to it?
Sometimes the data are not yet ready to be publicly released. Other times, the staff have determined that a question poses a risk of disclosure to our survey participants. Under these circumstances the data are made available only through the Research Data Center. Documentation for some of these datasets is available on the Limited Access Datasets page. Please send an email to firstname.lastname@example.org to inquire about the status of a particular question.