Task 3c: Create NHANES Datasets in Stata

There are several steps to loading the text data files and saving them as permanent Stata datasets:

Step 1 Locate data documentation

To decide which variables you need (or just to see what is available), go to the NHANES documentation. In this Stata example, you will use the Adult documentation file downloaded in Task 2 to learn about the adult data. The documentation tells you how to decipher the data file.

In the Adult documentation file, search for "item description". The second instance should be the beginning of the data file index. This list describes each item. There are four columns: "Positions/SAS name", "Counts", "Item description and code", and "Notes". The "Positions/SAS name" column gives the location of the variable in the raw data file and the SAS name for the item. The "Counts" column tells you how many observations are available. The "Item description and code" column gives the English text and codes used for that item. The "Notes" column points to additional notes for that item, which follow the data file index.

Figure: Screenshot of the NHANES III Adult Data File Documentation data file index, showing the Positions/SAS name, Counts, Item description and code, and Notes columns.



Step 2 Create a data dictionary

As the name suggests, the dictionary defines the data. It tells the computer where to look in those blocks of numbers to find the variables you want in the dataset and how to name the variables. It even lets you give the variables more descriptive labels. You will create the data dictionary in Stata's do-file editor.

The dictionary gives Stata instructions for reading the raw data file.  A simple data dictionary looks like this:

dictionary {
_column(1)   seqn        %5.0f       "id number"
_column(15)  hssex       %1.0f       "sex"
_column(18)  hsageir     %2.0f       "age in years"
}


There are two types of variables in Stata: numeric and string. Because NHANES codes data using only numbers, this example only shows you how to read in numeric variables. Now, let's look at each part of the dictionary in more detail:

The dictionary { } block notifies Stata that the enclosed code is a dictionary. The _column() directive tells Stata where to look for the variable in the raw data file; the number in the parentheses is the variable's beginning location. The beginning location of the variable within each record can be found in the data file index of the documentation under the "Positions" heading. (Note: What the data file index refers to as a "position" is referred to as a "column" in Stata.)

Then, you provide a name for the variable. In Stata 9 and 10, variable names can be up to 32 characters long. NHANES assigns names to all variables, and to maintain continuity with the documentation, you may choose to name your variables the same way. However, you may also create your own names following the naming rules outlined in the Stata manual. NHANES variable names can be found under the "SAS name" heading of the data file index.


WARNING

Stata is case-sensitive.  For example, as far as Stata is concerned, feh ≠ Feh.  If you use capital letters in the dictionary, you will always have to type capital letters to refer to that variable.
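
The short sketch below illustrates the warning. It is not part of the NHANES workflow: it builds a throwaway one-observation dataset (so it clears any data in memory) and creates two variables whose names differ only in case. The values 1 and 2 are arbitrary.

```stata
* CAUTION: clear drops any data currently in memory
clear
set obs 1

* These are two different variables as far as Stata is concerned
generate feh = 1
generate Feh = 2

* Both names appear in the variable list
describe feh Feh

* Each variable must be referred to with the exact case used to create it
display feh[1]
display Feh[1]
```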


Next, tell Stata how wide the variable is using the %X.Yf format. X is the width of the variable, that is, how many columns (digits) it occupies in the raw data file. Y indicates the minimum number of digits to the right of the decimal point. Enter 0 for Y, as this only affects how the output looks and does not change the actual values. To calculate the width of the variable, use this formula:

variable width = (end position - beginning position) + 1

For example, the data file index says that the variable SEQN (the person's identification number) is in position 1-5.  That means the variable starts at position 1 and goes to position 5 and is five columns wide.
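
As a quick check of the width formula, Stata's display command can do the arithmetic. The positions for SEQN (1-5) come from the text above. The end positions for hssex (15) and hsageir (19) are not taken from the documentation; they are inferred from the beginning columns and widths in the example dictionary, so confirm them against the data file index.

```stata
* variable width = (end position - beginning position) + 1

* seqn: positions 1-5, so the format is %5.0f
display (5 - 1) + 1

* hssex: assumed positions 15-15, so the format is %1.0f
display (15 - 15) + 1

* hsageir: assumed positions 18-19, so the format is %2.0f
display (19 - 18) + 1
```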

Finally, save the data dictionary in your C:\NHANES III\Data folder. Use the dictionary option in the Save As menu to automatically add the .dct extension.
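
Putting the pieces of Step 2 together, a saved adult.dct file might look like the sketch below. Lines beginning with * are comments. The storage types (long, byte) are an optional addition that the original example does not use; they are chosen here only from the field widths, on the assumption that a five-digit value needs long while one- and two-digit values fit in byte. If you omit them, Stata picks a default type.

```stata
dictionary {
    * adult.dct: reads three variables from the NHANES III adult data file
    * layout of each line: _column(begin)  [type]  name  %width.0f  "label"
    _column(1)   long  seqn      %5.0f   "id number"
    _column(15)  byte  hssex     %1.0f   "sex"
    _column(18)  byte  hsageir   %2.0f   "age in years"
}
```

The variable names are kept in lowercase so that they can be typed in lowercase in later commands.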


Step 3 Load and save the dataset

Once you have a dictionary, you can use it to load the data into Stata.  The general syntax is:

infile using <path to dictionary>, using(<path to data file>)


Using the dictionary file you created in the previous step and the data file you downloaded in the previous module, your example should look like this:

infile using "c:\nhanes iii\data\adult.dct", using("c:\nhanes iii\data\adult.dat")


WARNING

Pay attention to the syntax.  The first using does not need parentheses around the file name, but the second one, which is an option placed after the comma, does.
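
A minimal sketch of the load step as it might appear in a do-file, assuming the dictionary and data file are in the folder used throughout this task. The clear option and the describe and list commands are additions for illustration: clear discards any data already in memory, and the other two commands give a quick look at what was read.

```stata
* Load the raw data through the dictionary
* clear drops any dataset currently in memory before reading
infile using "c:\nhanes iii\data\adult.dct", using("c:\nhanes iii\data\adult.dat") clear

* Confirm that the variables defined in the dictionary were created
describe

* Look at the first five records
list seqn hssex hsageir in 1/5
```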



To ensure that the paths to the dictionary and data files are correct, you can locate both files through the File pull-down menu (top left of the Stata window) by clicking Filename.  To use Filename, you need to write the infile command in the Command window, not the do-file editor.  It is good practice to keep track of these commands in the do-file editor, so paste the entire command there after running it in the Command window.  Press Enter in the Command window to run the infile command.

Alternative Approach:  If you want to type only the simple file name (e.g., adult.dct), you can set the working directory so that Stata always knows where to look for the file.  To do this, open the File pull-down menu and select Change Working Directory.  Then select the folder that holds all of your adult files.
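
The same alternative can be sketched with commands instead of the menu, assuming the files are in the folder used throughout this task. The cd command changes the working directory and pwd displays it; the clear option is an addition that discards any data already in memory.

```stata
* Set the working directory to the folder that holds the adult files
cd "c:\nhanes iii\data"

* Display the current working directory to confirm the change
pwd

* File names alone are now enough
infile using adult.dct, using(adult.dat) clear
```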


Use the save command to save a dataset in Stata. The general format of the command is below.

save filename [, save_options]

You will use the save command to save your loaded data as a permanent Stata dataset named adult.dta.

save "c:\NHANES III\DATA\adult.dta"

If a filename is specified without an extension, .dta is assumed.
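
If you rerun your program, adult.dta will already exist and a plain save will stop with an error rather than overwrite it. A minimal sketch using the replace option, which is one of the save options, is shown here. It assumes you do want to overwrite the earlier copy.

```stata
* Overwrite an existing adult.dta with the data currently in memory
save "c:\NHANES III\DATA\adult.dta", replace
```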


Step 4 Check results

To check the results of your program, open Windows Explorer and go to your C:\NHANES III\Data folder. You should now see adult.dta in the folder. You now have the adult dataset.
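
You can also check the result from within Stata. The sketch below assumes the dataset was saved to the folder used above and contains the three variables from the example dictionary. It lists the .dta files in the folder, reloads the saved dataset, and summarizes the variables so you can compare the observation counts with the "Counts" column of the data file index.

```stata
* List the Stata datasets in the data folder
dir "c:\NHANES III\DATA\*.dta"

* Reload the saved dataset (clear drops any data in memory)
use "c:\NHANES III\DATA\adult.dta", clear

* Review variable names, formats, and labels
describe

* Check observation counts and value ranges
summarize seqn hssex hsageir
```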


