Key Concepts about Outliers in NHANES Data

Outliers, or extreme values in the data, are common in surveys such as NHANES. They can occur as a result of errors in data collection or recording, or for other reasons. Because the data were reviewed carefully before release, data collection and recording errors should be minimal within the publicly available NHANES PAQ data. Problematic outliers are the legitimate self-reported values that are far outside the range of other values in the data. Examples of these might include a report of physical activity engagement for 18 hours or more per day, which does not take into account the study participant’s sleep time; or a report of more than 10,080 minutes of activity per week, which exceeds the number of minutes in a week.
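
The two examples above can be expressed as simple screening rules. The following is a minimal Python sketch, not an official NHANES procedure. The function name and its two arguments are hypothetical and do not correspond to actual PAQ variable names, and the 18-hour cutoff is taken from the example in the text rather than from any published standard.

```python
# Minutes in one week: 7 days x 24 hours x 60 minutes = 10,080.
MINUTES_PER_WEEK = 7 * 24 * 60

# Assumed cutoff taken from the example in the text (18 hours per day).
MAX_PLAUSIBLE_MINUTES_PER_DAY = 18 * 60


def flag_activity_report(minutes_per_day=None, minutes_per_week=None):
    """Return a list of reasons a self-reported activity value looks implausible.

    Both arguments are hypothetical analysis variables, not PAQ column names.
    An empty list means neither rule was triggered.
    """
    reasons = []
    if minutes_per_day is not None and minutes_per_day >= MAX_PLAUSIBLE_MINUTES_PER_DAY:
        reasons.append("daily activity of 18 hours or more")
    if minutes_per_week is not None and minutes_per_week > MINUTES_PER_WEEK:
        reasons.append("weekly activity exceeds 10,080 minutes")
    return reasons


# Example: a report of 19 hours per day and 11,000 minutes per week.
print(flag_activity_report(minutes_per_day=19 * 60, minutes_per_week=11000))
```

A flag raised by rules like these only marks a value for review. Whether to keep, transform, or delete it remains an analytic decision.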

Consider outliers carefully, as their presence may substantially affect your results, especially if the sample weight associated with the outlying value is large. In some types of analysis, outliers have the potential to distort statistical estimates, alter apparent relationships, and lead to faulty conclusions. In these cases, the outliers may be deleted or the data transformed to lessen their impact. On the other hand, if the data are assumed to be correct and the statistical methods are robust in dealing with outlying values, outliers may sometimes be accommodated.
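
To illustrate why the sample weight matters, the sketch below compares a weighted mean with and without a single outlying value, and shows a log transformation as one way to lessen its impact. All numbers are invented for illustration and are not NHANES data. The helper names are hypothetical, and the example computes point estimates only, without any variance estimation.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum(value * weight) / sum(weight)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def back_transformed_log_mean(values, weights):
    """Weighted mean on the log(1 + x) scale, converted back to minutes."""
    logged = [math.log1p(v) for v in values]
    return math.expm1(weighted_mean(logged, weights))


# Invented daily activity minutes; the last value is the outlier.
minutes = [30, 45, 60, 20, 1200]

# The same outlier paired with a small and a large invented sample weight.
weights_small = [1000, 1000, 1000, 1000, 100]
weights_large = [1000, 1000, 1000, 1000, 5000]

without_outlier = weighted_mean(minutes[:-1], weights_small[:-1])
with_small_weight = weighted_mean(minutes, weights_small)
with_large_weight = weighted_mean(minutes, weights_large)
log_scale_estimate = back_transformed_log_mean(minutes, weights_large)

print(f"Outlier deleted:             {without_outlier:.2f}")
print(f"Outlier kept, small weight:  {with_small_weight:.2f}")
print(f"Outlier kept, large weight:  {with_large_weight:.2f}")
print(f"Large weight, log transform: {log_scale_estimate:.2f}")
```

In this invented example the same outlying value shifts the estimate much further when its weight is large, and the log-scale summary is pulled less than the untransformed mean.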

Please consult the Analytical Guidelines for more information on this topic.