NHIS - 2007 Data Release

Missing data on family income and personal earnings in the 2007 NHIS were imputed using multiple imputation. Five ASCII data sets containing imputed values for the 2007 survey year are included in the compressed data file (INCMIMP.EXE), which can be downloaded via the Datasets link below. For analyses involving other variables in addition to family income or personal earnings, each set of imputed values can be merged with other data from the 2007 NHIS to create a single completed data set.

Multiple imputation is a technique that allows analysts to incorporate the extra variability due to imputation into their analyses. This is accomplished by analyzing EACH OF THE FIVE completed data sets separately, using methods and software that are appropriate for survey data, and then combining the estimates and standard errors using the combining rules described in Section 2.2 and Appendix A of the document available via the Technical Documentation link below. The extra variability due to imputation CANNOT be incorporated by simply analyzing a SINGLE completed data set as if the imputed values were true values. Moreover, analysts SHOULD NOT create a single completed data set using the AVERAGE of the five sets of imputed values. Examples of correct data analyses using SAS-callable SUDAAN and SAS-callable IVEware are provided in Section 4 of the document available via the Technical Documentation link below; the document also provides information on the procedures used to create the imputations.

The Dataset Documentation link below opens a document containing both the file layout description and the frequency counts (on the last page) of the variables in the data sets containing imputed values for the 2007 survey year. Users interested in data for several years should note that, to date, multiple imputation has been carried out for the 1997-2007 NHIS, and that the file layout description is identical for years 1997-2003. Beginning with 2004, the person number variable changed to FPX, which is unique within each family. Beginning with 2007, the variable names INCGRP_F, INCGRP_I, RAT_CATF, and ERNYRG_I changed to INCGRPF2, INCGRPI2, RATCATF2, and ERNYRGI2, respectively, due to questionnaire and response category changes. Users are also encouraged to check the NHIS website for updates and to subscribe to the NHIS Listserv to receive notices of any corrections/updates.
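
As an illustration of the combining step, the following Python sketch applies the standard multiple-imputation combining rules (often called Rubin's rules) to five estimates and their standard errors. It assumes these standard rules correspond to those in Section 2.2 and Appendix A of the technical documentation, which remains the authoritative source. The function name and all numbers are hypothetical, and each standard error is assumed to have been computed beforehand with software appropriate for the survey design.

```python
import math

def combine_mi(estimates, std_errors):
    """Combine per-data-set results using the standard combining rules.

    estimates  -- one point estimate per completed data set
    std_errors -- the matching design-based standard errors
    Returns (combined estimate, combined standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching results from at least two data sets")
    # Combined estimate: average of the per-data-set estimates.
    q_bar = sum(estimates) / m
    # Within-imputation variance: average of the squared standard errors.
    u_bar = sum(se ** 2 for se in std_errors) / m
    # Between-imputation variance: sample variance of the estimates.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance adds the extra variability due to imputation.
    t = u_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(t)

# Made-up results from five completed data sets (illustration only).
estimates = [10.0, 12.0, 11.0, 13.0, 14.0]
std_errors = [2.0, 2.0, 2.0, 2.0, 2.0]
combined_estimate, combined_se = combine_mi(estimates, std_errors)
print(combined_estimate, combined_se)
```

In this made-up example the combined standard error is larger than any of the five individual standard errors, which is the extra variability that analyzing a single completed data set would miss.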

The 2007 NHIS Paradata File contains data about the NHIS data collection process. It may be used as a stand-alone data file or linked to the NHIS 2007 health data files.

The Paradata File Description Document gives an overview of the 2007 Paradata File, including information about the sample design, weighting, and variables found on the file. Appendix I of this Description Document contains an example of SAS code that can be used to link the 2007 Paradata File with the 2007 regular health data files.

An ASCII data set containing paradata for the 2007 survey year (PARADATA.EXE) can be downloaded via the Dataset link below.

Dataset documentation for the Paradata File consists of a variable summary, variable layout and variable frequencies. A sample SAS input program is also provided.

Users are encouraged to check the NHIS website for updates and to subscribe to the NHIS Listserv to receive notices of any corrections/updates.

Since the 1997 NHIS, income dollar amount ranges have been provided on the NHIS public-use data files. However, survey data may be more useful for policy analysis when income dollar amounts (instead of income ranges) are available in public-use data files. To increase the usability of NHIS income data, NHIS staff created a set of supplemental imputed family income and personal earnings files that contain income dollar amounts.

Because respondent confidentiality must be balanced against providing more detailed information, the variables containing the dollar amounts for personal earnings and family income have been top-coded to the 95th percentile of the appropriate distribution. The 95th percentile was calculated separately for each of the 5 imputed family income/personal earnings datasets, and a weighted average of the 5 individual 95th percentile amounts was then calculated. This weighted average was rounded to the nearest $1,000 and used to top-code all 5 supplemental imputed personal earnings and family income datasets. The same procedure was used for family income and personal earnings dollar amounts. For all observations that were top-coded, the family income or personal earnings dollar amount was replaced with the top-code value.

Also, since the 1997 NHIS, poverty ratio ranges have been provided on the NHIS public-use data files. The poverty ratio is the ratio of the family’s income to the appropriate Federal poverty threshold. In the supplemental imputed family income and personal earnings files, the poverty ratio is calculated using top-coded family income, and the final calculated poverty ratio value is rounded to 2 decimal places.
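
The top-coding steps described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure NHIS staff actually ran. The text does not state the weights used in the weighted average, whether amounts exactly equal to the top-code value are flagged, or how ties are rounded, so the equal weights, the strict "greater than" comparison, and the half-up rounding below are all assumptions. All function names and dollar figures are hypothetical.

```python
def common_top_code(percentiles_95, weights):
    """Weighted average of per-data-set 95th percentiles, to the nearest $1,000."""
    if not weights or len(percentiles_95) != len(weights):
        raise ValueError("need one weight per percentile")
    weighted_avg = sum(p * w for p, w in zip(percentiles_95, weights)) / sum(weights)
    # Half-up rounding to the nearest 1,000 (assumed; amounts are positive).
    return int(weighted_avg / 1000 + 0.5) * 1000

def top_code(amounts, cap):
    """Replace amounts above the cap with the cap; also return top-coding flags."""
    capped = [min(a, cap) for a in amounts]
    flags = [a > cap for a in amounts]  # assumption: strictly above the cap
    return capped, flags

def poverty_ratio(top_coded_income, poverty_threshold):
    """Ratio of top-coded family income to the threshold, to 2 decimal places."""
    return round(top_coded_income / poverty_threshold, 2)

# Made-up 95th percentiles from the five imputed data sets, equal weights assumed.
percentiles = [148200, 151900, 149700, 150400, 150800]
cap = common_top_code(percentiles, [1, 1, 1, 1, 1])

# The same cap is applied to every one of the five data sets.
capped, flags = top_code([42000, 150000, 310000], cap)
ratio = poverty_ratio(capped[2], 21000)  # 21000 is a made-up threshold
print(cap, capped, flags, ratio)
```

Because the poverty ratio is computed from the top-coded income, its largest possible value for a given threshold is capped as well.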

Five ASCII data sets containing the supplemental top-coded values for the 2007 survey year are included in the compressed data file (INCIMPS.EXE), which can be downloaded via the Datasets link below. For analyses involving other previously released variables in addition to family income or personal earnings, each set of values from the supplemental imputed family income and personal earnings files can be merged with other data from the 2007 NHIS to create a single completed data set. The values on the supplemental imputed family income and personal earnings files were imputed using multiple imputation, a technique that allows analysts to incorporate the extra variability due to imputation into their analyses. For more information about the correct analysis of multiply imputed data, please refer to the 2007 Imputed Family Income/Personal Earnings Files.

The Dataset Documentation link below opens a document containing both the file layout description and the frequency counts (on the last page) of selected variables (survey year, imputation number, top-coding flags) for the 2007 survey year. A sample SAS program that can be used to create SAS datasets for each of the 5 supplemental imputed family income and personal earnings data files is available from the Sample SAS Input Program link below. Users are also encouraged to check the NHIS website for updates and to subscribe to the NHIS Listserv to receive notices of any corrections/updates.

For the 2007 NHIS Family File, an error in the variable RAT_CAT3 was discovered after the data file was released. RAT_CAT3 is a recoded poverty ratio variable created from several different variables. When the family size was 2 or fewer and the family’s income was at least $35,000 but less than $50,000, RAT_CAT3 should have been set to “99 Unknown”. However, in the released 2007 NHIS Family File, these families instead have a RAT_CAT3 value of “17 2.00 – 3.99 (no further detail)”. This data release, which consists of several components, corrects the error.

Listed below are links for a revised ASCII file, a SAS program that can be used to create a SAS dataset, and frequency counts for the modified RAT_CAT3 variable. Data users should note that the revised ASCII file contains only the following variables: survey year, household number, family number, and the modified RAT_CAT3 variable. These identifiers allow the revised ASCII file to be merged with the previously released family file. Data users should also note that the previously released 2007 Imputed Family Income/Personal Earnings files are not affected by this change.
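
The merge of the revised file onto the previously released family file can be sketched in Python as below. This is only an illustration under stated assumptions, not the SAS program linked on this page. It assumes both files have already been read into lists of records, and that the survey year, household number, and family number are stored under the names SRVY_YR, HHX, and FMX. Those three names, the function name, and the sample records are assumptions and should be checked against the file layout.

```python
def apply_rat_cat3_revision(family_records, revised_records,
                            keys=("SRVY_YR", "HHX", "FMX")):
    """Return copies of the family records with RAT_CAT3 taken from the revised file.

    Records are matched on the identifier fields in `keys` (assumed names).
    Families missing from the revised file keep their original RAT_CAT3 value.
    """
    # Index the revised values by the identifier tuple.
    revised = {tuple(r[k] for k in keys): r["RAT_CAT3"] for r in revised_records}
    merged = []
    for rec in family_records:
        out = dict(rec)  # copy, so the original records are left untouched
        key = tuple(rec[k] for k in keys)
        if key in revised:
            out["RAT_CAT3"] = revised[key]
        merged.append(out)
    return merged

# Made-up records: the first family is affected by the error, the second is not.
family = [
    {"SRVY_YR": 2007, "HHX": "000001", "FMX": "01", "RAT_CAT3": 17},
    {"SRVY_YR": 2007, "HHX": "000002", "FMX": "01", "RAT_CAT3": 17},
]
revised_file = [
    {"SRVY_YR": 2007, "HHX": "000001", "FMX": "01", "RAT_CAT3": 99},
    {"SRVY_YR": 2007, "HHX": "000002", "FMX": "01", "RAT_CAT3": 17},
]
corrected = apply_rat_cat3_revision(family, revised_file)
print([r["RAT_CAT3"] for r in corrected])
```

Matching on all three identifiers matters because family numbers repeat across households, so a family number alone would not identify a unique record.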