Stata Tips


Tip 1: There are two series of commands.

There are two series of commands you can use to analyze NHANES data in Stata.

SVY Commands  

SVY commands are a series of commands designed specifically for analyzing data from complex survey designs such as NHANES.  To calculate means and standard errors, use the Stata survey (svy) commands, because they account for the complex survey design of NHANES when estimating variances. These commands can also be used for simple random samples.


Whenever you want to use SVY commands, you need to set up Stata by defining the survey design variables using the svyset command.  This command has the general structure:

svyset [w= weight], psu(psu variable) strata(strata variable)


Here is the command using the 4-year weight for data collected in the MEC and the output:


svyset [w=wtmec4yr], psu(sdmvpsu) strata(sdmvstra)

(sampling weights assumed)

      pweight: wtmec4yr
          VCE: linearized
  Single unit: missing
     Strata 1: sdmvstra
         SU 1: sdmvpsu
        FPC 1: <zero>


Once you do this, Stata remembers these variables and applies them to every subsequent SVY command. If you save the dataset, Stata will remember these variables and apply them automatically when you reopen the dataset.

You can change these variables at any time by typing a new svyset command.
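
As a minimal sketch of how the design setup and an SVY command fit together, the lines below declare the design and then request a design-based mean. The analysis variable bmxbmi is an illustrative assumption that does not appear in the text above; the svy: prefix form assumes Stata 9 or later.

```stata
* Declare the survey design (same variables as in the svyset example above)
svyset [w=wtmec4yr], psu(sdmvpsu) strata(sdmvstra)

* Equivalent declaration in the newer svyset syntax (Stata 9 or later):
* svyset sdmvpsu [pweight=wtmec4yr], strata(sdmvstra)

* Mean with a standard error that reflects weights, strata, and PSUs
* (bmxbmi is an assumed variable name, used only for illustration)
svy: mean bmxbmi
```

Typing svyset with no arguments displays the design settings currently stored with the data.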


Standard commands

Standard commands are regular Stata commands that can incorporate sampling weights. For example, if standard errors are not needed, you can calculate means with a regular Stata command and the weight variable (e.g., mean with the weight variable).
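
A weighted mean from a standard command might look like the sketch below. The variable bmxbmi is an illustrative assumption; wtmec4yr is the MEC weight used earlier. Only the point estimate should be taken from this output, since the command is not told about the strata or PSUs.

```stata
* Weighted point estimate of the mean using a standard command
* (bmxbmi is an assumed variable name, used only for illustration)
mean bmxbmi [pweight=wtmec4yr]
```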

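You only need to use these commands when there is no corresponding SVY command.  When you use them, keep in mind that they do not account for the complex survey design of NHANES, so the standard errors they report should not be used.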


Tip 2: Make sure Stata's memory size is large enough.

NHANES data files are very big; you will encounter memory problems unless you change some of Stata's default settings.  If you don't, you'll be plagued by messages like:


. use "/WoloHD/Teaching/CECS/ECS 122 2005/Classes/Week 6/lab6/"

no room to add more observations
    An attempt was made to increase the number of observations beyond what is
    currently possible.  You have the following alternatives:

     1.  Store your variables more efficiently; see help compress.  (Think of
         Stata's data area as the area of a rectangle; Stata can trade off width
         and length.)

     2.  Drop some variables or observations; see help drop.

     3.  Increase the amount of memory allocated to the data area using the set
         memory command; see help memory.


The solution is simple: tell Stata how much memory to set aside for data.  Functionally, the only limit is the size of your hard drive.


set memory 1g


The 1g means 1 gigabyte; you could try a smaller value, such as 100m (m for megabyte), or a larger one.  Don't be scared to experiment.  If you want to set the memory "permanently" (that is, until you manually reset it), type:


set memory 1g, permanently
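
Put together, a session might start as in the sketch below. The file name is hypothetical, and the sketch assumes a version of Stata that still uses set memory (Stata 11 or earlier; later versions manage memory automatically). In those versions the memory setting cannot be changed while data are loaded, so the sketch clears memory first.

```stata
* Remove any data currently in memory so the allocation can be changed
clear

* Reserve 1 gigabyte for data
set memory 1g

* Hypothetical file name; substitute the path to your own NHANES dataset
use "nhanes_demo.dta"

* Store each variable in the smallest type that holds its values;
* no values are changed and nothing is dropped
compress
```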

WARNING

Do not drop observations from the dataset. This may affect variance estimation.
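
One common way to analyze a subgroup while keeping every observation is the subpop() option of svy. The sketch below assumes svyset has already been run; the variable names riagendr and bmxbmi, and the coding of 2 as female, are illustrative assumptions rather than details taken from this page.

```stata
* Flag the subgroup of interest instead of dropping everyone else
* (riagendr == 2 is assumed here to identify females)
generate byte female = (riagendr == 2)

* Estimate the mean within the subgroup; all observations stay in the
* dataset, so the full design is still available for variance estimation
svy, subpop(female): mean bmxbmi
```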


Tip 3: Stata is case-sensitive.

Stata distinguishes between uppercase and lowercase letters, so if your dictionary uses all capital letters, you will always need to type variable names in capitals, and vice versa.  The only requirement is that you use the NHANES variable names, so when you write the data dictionary you can choose either all capitals or all lowercase letters.  If you select variables by clicking on them in the "variable box", you don't need to worry about case.
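
If a dataset arrives with names in mixed or upper case, one possible way to standardize them is a short loop like the sketch below, which renames every variable in memory to its lowercase form. It assumes that no two variable names differ only by case.

```stata
* Rename every variable in memory to lowercase
foreach v of varlist _all {
    local newname = lower("`v'")
    * Skip names that are already lowercase
    if "`v'" != "`newname'" {
        rename `v' `newname'
    }
}
```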



Tip 4: Missing numeric values are represented by large numeric values.

Stata treats a missing numeric value (".") as larger than any nonmissing number. So, unlike SAS Survey Procedures or SUDAAN, which place missing values at the bottom of the range, Stata places them at the top of the range.
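
This matters most in conditions that use "greater than". The sketch below shows the pitfall and one way around it; bmxbmi and the cutoff of 30 are illustrative assumptions.

```stata
* This count also includes observations where bmxbmi is missing,
* because missing is treated as larger than 30
count if bmxbmi >= 30

* Excluding missing values explicitly gives the intended count
count if bmxbmi >= 30 & !missing(bmxbmi)

* When creating an indicator, keep missing values missing rather than
* letting them fall into the top category
generate byte highbmi = (bmxbmi >= 30) if !missing(bmxbmi)
```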


