Variance Estimation for the National Health Interview Survey (NHIS) Public-Use Person Data Files, 1985-94
Design Information Available on the NHIS Public Use Data Files.
Not all of the following variables are used for all methods of variance approximation. Field locations may change on some files; the user should check the data format for each file.
|Variable name||Tape location||Field label||Values|
|STRATUM||179-181||‘FULL SAMPLE STRATUM IDENTIFIER’|| |
|CSTRATUM||187-188||‘PSEUDO PSU CODES’, first two columns||1, 2, …, 62|
|CPSU||189||‘PSEUDO PSU CODES’, last column||1, 2, 3, 4 (1985: 1, 2, 3) (1986: 2, 3)|
|SUB||178||‘TYPE OF SUBSTRATUM’||0, 1, 2|
|SSU||5-12||concatenation of ‘PROCESSING QUARTER’, ‘RANDOM RECODE OF PSU NUMBER’, ‘WEEK-CENSUS CODE’ and ‘SEGMENT|| |
|TYPE_PSU||185||‘TYPE OF PSU’|| |
|WTF||219-227||‘FINAL BASIC WEIGHT’|| |
Method 1 – Single-Stage Design, PSUs Sampled With Replacement within Strata, for the 1985-1994 NHIS.
This method is statistically less efficient than Method 2, described below, but is more flexible. It requires no recoding of design variables, can be used with many software packages for complex survey designs, and covers the 1985-1994 NHIS survey years. Using the variables CSTRATUM, CPSU, and WTF, the PSU variable CPSU is treated as being sampled with replacement within the stratum variable CSTRATUM. The data file needs to be sorted only by CSTRATUM and CPSU before invoking SUDAAN.
For the above simplification of the NHIS sample design structure, use the following SUDAAN design statements:
PROC (procedure name) DESIGN = WR;
NEST CSTRATUM CPSU;
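To make the with-replacement assumption concrete, the following is a minimal Python sketch of the textbook between-PSU variance estimator for a weighted total: within each stratum, the weighted PSU totals are compared with their stratum mean, and the squared deviations are scaled by n_h / (n_h - 1). It is an illustration under stated assumptions, not a reproduction of SUDAAN's internal computations (which also handle means, ratios, and other statistics). The function name, the record layout (a list of dictionaries), and the analysis variable `Y` are hypothetical.

```python
from collections import defaultdict


def wr_total_and_variance(records, y="Y", weight="WTF",
                          stratum="CSTRATUM", psu="CPSU"):
    """Weighted total of `y` and its with-replacement variance estimate.

    `records` is a list of dicts, one per person (hypothetical layout).
    """
    # Sum the weighted values within each (stratum, PSU) cell.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        psu_totals[r[stratum]][r[psu]] += r[weight] * r[y]

    total = 0.0
    variance = 0.0
    for h, cells in psu_totals.items():
        n_h = len(cells)
        if n_h < 2:
            # The between-PSU estimator needs at least two PSUs per stratum.
            raise ValueError(f"stratum {h!r} has fewer than 2 PSUs")
        t = list(cells.values())
        mean_h = sum(t) / n_h
        total += sum(t)
        variance += n_h / (n_h - 1) * sum((x - mean_h) ** 2 for x in t)
    return total, variance
```

Because only the stratum and PSU identifiers and the weight enter the calculation, no recoding of the design variables is needed, which is the flexibility Method 1 trades for efficiency.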
For information about corresponding statements for other software packages (Stata, SPSS, SAS survey procedures, R, VPLX), and guidance for analyses of subgroups, refer to:
Method 2 – Multistage stratified sampling design for the 1987-1994 NHIS.
This design provides more statistically efficient variance estimation than Method 1. Method 2 makes fewer simplifications of the NHIS sample design structure but can be used only with SUDAAN. It also requires recodes of the design variables and applies only to NHIS person data for survey years 1987-94, because the variables STRATUM and SUB used in this method are not available on the 1985-86 public-use files.
Prior to use of this method the following recoding must be done on the NHIS file. This example is in SAS but other programming languages may be used.
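/* Self-representing units (TYPE_PSU = 1 or 4) */
if (TYPE_PSU = 1 or TYPE_PSU = 4) then do;
  PSU = 1;
  POPPSU = 0;
end;
/* TYPE_PSU = 3 or 6: keep the pseudo-PSU code */
else if (TYPE_PSU = 3 or TYPE_PSU = 6) then do;
  PSU = CPSU;
  POPPSU = -1;
end;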
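Since other programming languages may be used for the recode, here is a minimal Python sketch of the same logic applied to one record at a time. The function name and the dictionary-based record layout are hypothetical. Raising an error for an unexpected TYPE_PSU value is an assumption of this sketch, made so that unrecoded records are noticed rather than silently passed on.

```python
def recode_design(record):
    """Return a copy of `record` with PSU and POPPSU added."""
    out = dict(record)
    if record["TYPE_PSU"] in (1, 4):
        out["PSU"] = 1
        out["POPPSU"] = 0
    elif record["TYPE_PSU"] in (3, 6):
        out["PSU"] = record["CPSU"]
        out["POPPSU"] = -1
    else:
        # Assumption of this sketch: flag values the recode does not cover.
        raise ValueError(f"unexpected TYPE_PSU: {record['TYPE_PSU']!r}")
    return out
```

For example, `recode_design({"TYPE_PSU": 3, "CPSU": 2})` returns the record with `PSU` set to 2 and `POPPSU` set to -1.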
This recode creates two new variables on each record, PSU and POPPSU, for use by SUDAAN’s NEST and TOTCNT statements. For more information on the purpose of these statements, refer to the SUDAAN documentation. With these additional variables, NHIS data sets can be analyzed in SUDAAN under a multistage stratified sampling design.
Before running SUDAAN, sort the input file by the NEST variables (STRATUM, PSU, SUB, and SSU).
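If the file is prepared outside SAS, the sort can be sketched in Python as below. The record layout (a list of dictionaries carrying the four NEST variables) and the names `NEST_VARS` and `sort_by_nest` are hypothetical. Within each variable the values are assumed to share one type, so that they compare cleanly.

```python
NEST_VARS = ("STRATUM", "PSU", "SUB", "SSU")


def sort_by_nest(records):
    """Return the records ordered by STRATUM, then PSU, SUB, and SSU."""
    return sorted(records, key=lambda r: tuple(r[v] for v in NEST_VARS))
```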
For SUDAAN, describe the NHIS sampling design as follows:
PROC (procedure name) DESIGN = WOR;
NEST STRATUM PSU SUB SSU /MISSUNIT;
TOTCNT POPPSU _ZERO_ _MINUS1_ _ZERO_;
Caution. This method assumes that ALL records on the BASIC HEALTH AND DEMOGRAPHIC (BHD) or CURRENT HEALTH TOPIC (CHT) file are being used. If, however, you keep only selected records on a file, e.g., persons aged 65+ (which may delete all records from one or more SSUs), care must be taken to preserve the integrity of the sampling design.
All Self-Representing (SR) SSUs (TYPE_PSU = 1 or 4) listed on the full NHIS file (BHD or CHT) must have at least one record on the data file used for input to SUDAAN. Moreover, every Nonself-representing (NSR) PSU must have at least one person record on this data file. The NEST variables (STRATUM, PSU, SUB, and SSU), along with the weight variable, must be on each record of the analysis file.
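One way to check these conditions before running SUDAAN is to compare the design units on the full file with those on the analysis file. The Python sketch below is illustrative only. The function name and record layout are hypothetical, the records are assumed to already carry the recoded PSU variable, and every record whose TYPE_PSU is not 1 or 4 is assumed to belong to an NSR PSU.

```python
def missing_design_units(full_records, subset_records):
    """List design units on the full file that are absent from the subset."""

    def sr_ssus(records):
        # SSUs within self-representing units (TYPE_PSU = 1 or 4).
        return {(r["STRATUM"], r["PSU"], r["SUB"], r["SSU"])
                for r in records if r["TYPE_PSU"] in (1, 4)}

    def nsr_psus(records):
        # Assumption: all other TYPE_PSU values mark NSR PSUs.
        return {(r["STRATUM"], r["PSU"])
                for r in records if r["TYPE_PSU"] not in (1, 4)}

    return {
        "sr_ssus": sorted(sr_ssus(full_records) - sr_ssus(subset_records)),
        "nsr_psus": sorted(nsr_psus(full_records) - nsr_psus(subset_records)),
    }
```

Two empty lists indicate that the subset still has at least one record for every SR SSU and every NSR PSU found on the full file.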