Key Concepts About Outliers in NHANES Data

Before you analyze your data, check the distribution and normality of each continuous variable and identify outliers with a univariate analysis. If the distribution is highly skewed, you can transform the data to bring the distribution closer to normal, because many common statistical procedures assume that the data are approximately normally distributed. Common transformations include LOGIT, LOG, LOG10, SQRT, INVERSE, and ARCSIN. Data transformation is covered in most basic biostatistics texts and is not covered in detail in this tutorial.
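As a rough illustration of the skewness check and transformation described above, the following Python sketch computes a moment-based skewness coefficient before and after a natural-log transformation. The data values and the helper name skewness are hypothetical and chosen only for illustration; they are not NHANES data, and this is not the tutorial's own procedure.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed measurements (made-up values, not NHANES data).
values = [1.2, 1.5, 1.7, 2.0, 2.3, 2.9, 3.4, 4.8, 7.5, 19.0]

# Natural-log transformation; every value must be strictly positive.
log_values = [math.log(v) for v in values]

skew_raw = skewness(values)
skew_log = skewness(log_values)

print(f"skewness before transformation: {skew_raw:.2f}")
print(f"skewness after log transformation: {skew_log:.2f}")
```

A skewness near zero suggests a roughly symmetric distribution. In this made-up example the log transformation reduces the skewness but does not remove it, so a histogram or normal probability plot would still be worth inspecting.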

After checking the distribution and normality of the data, plot the survey weight against the variable to determine which of the extreme values identified in the univariate analysis are outliers. You must also determine whether each outlier represents a valid value and, if so, whether it also carries an extremely large survey weight. Outliers with extremely large weights can have a strong influence on your estimates. You then have to decide whether to keep these influential outliers in your analysis; that decision is up to the analyst. Please consult the Analytical Guidelines for more information on this topic.
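The screening step above can be sketched in Python as a tabular check that complements the plot. This sketch assumes the common 1.5 x IQR fence rule for "extreme", which is an illustrative choice and not a rule stated in this tutorial. The records, weights, and the helper name iqr_fences are all hypothetical.

```python
import statistics

# Hypothetical records: (measured value, survey weight). Made-up numbers.
records = [
    (2.1, 12000.0), (2.4, 15000.0), (2.6, 9000.0), (2.8, 11000.0),
    (3.0, 14000.0), (3.1, 10000.0), (3.3, 13000.0), (3.6, 12500.0),
    (9.8, 95000.0),   # extreme value that also carries a very large weight
    (10.5, 8000.0),   # extreme value with an ordinary weight
]


def iqr_fences(data, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


values = [v for v, _ in records]
weights = [w for _, w in records]

low, high = iqr_fences(values)
_, weight_high = iqr_fences(weights)

# Extreme values of the analysis variable (assumed rule: outside the fences).
extreme = [(v, w) for v, w in records if v < low or v > high]

# Extreme values that also carry an unusually large survey weight.
influential = [(v, w) for v, w in extreme if w > weight_high]


def weighted_mean(rows):
    """Weighted mean of the values, using the survey weights."""
    return sum(v * w for v, w in rows) / sum(w for _, w in rows)


mean_all = weighted_mean(records)
mean_without = weighted_mean([r for r in records if r not in influential])

print("extreme values:", extreme)
print("influential outliers:", influential)
print(f"weighted mean with all records: {mean_all:.2f}")
print(f"weighted mean without influential outliers: {mean_without:.2f}")
```

Comparing the two weighted means shows how much a single heavily weighted record can move an estimate. Whether to keep such a record remains the analyst's decision, and the flagged values should first be checked for validity.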
