# Assess Normality and Estimate Percentiles, Geometric Means, and Proportions

### Purpose

NHANES data are often used to provide national estimates on important public health issues. This module introduces how to generate the descriptive statistics for NHANES data that are most often used to obtain these estimates. Topics covered in this module include checking frequency distribution and normality, generating percentiles, generating geometric means, and generating proportions.

### Task 1: Check Frequency Distribution and Normality

It is highly recommended that you examine the frequency distribution and normality of the data before starting any analysis. These descriptive statistics are useful in determining whether parametric or non-parametric methods are appropriate to use, and whether you need to recode or transform data to account for extreme values and outliers.

IMPORTANT NOTE

Because the procedure is the same as it would be for any dataset, links to the Continuous NHANES Web Tutorial are provided here for your convenience. If you use the code, please be sure to change it to match the variables in your environmental analytic dataset.
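As a rough illustration of the kind of check described in this task, the sketch below computes a moment-based skewness statistic before and after a log transformation. It is written in Python rather than SAS or SUDAAN, it ignores survey weights, and the function name `skewness` and the data values are hypothetical; it is not the code from the Continuous NHANES Web Tutorial.

```python
import math

def skewness(values):
    """Moment-based sample skewness: 0 for a symmetric distribution,
    positive when the distribution has a long upper tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical (made-up) measurements with a long upper tail
raw = [0.5, 0.7, 0.8, 1.0, 1.1, 1.3, 1.6, 2.4, 4.9, 12.0]
logged = [math.log(v) for v in raw]

print("skewness, raw values:   ", round(skewness(raw), 2))
print("skewness, log-transformed:", round(skewness(logged), 2))
```

A skewness well above zero on the raw scale that shrinks after the log transformation suggests that a transformation, or a statistic such as the geometric mean, may be worth considering.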

### Task 2: Generate Percentiles

Percentiles are used to indicate the relative position of an individual within a given dataset. Frequency distributions and percentiles can also be used to describe the characteristics and shape of a distribution and to check for outliers.

IMPORTANT NOTE

Although SAS has commands for calculating estimates of weighted percentiles, it does not have commands that directly produce standard errors for the percentiles. For that reason, this tutorial does not provide sample SAS programs for percentiles and their standard errors. Please refer to the SUDAAN program for reference.
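To make the idea of a weighted percentile concrete, here is a minimal Python sketch of a point estimate only. It assumes one simple definition (the smallest value at which the cumulative share of the sample weights reaches the requested percentile); survey software may use a different definition or interpolate, and this sketch does not produce standard errors, which require the survey design information. The function name `weighted_percentile` and the data are hypothetical.

```python
def weighted_percentile(values, weights, pct):
    """Smallest value whose cumulative weight reaches pct percent of the total weight.

    Assumes non-negative weights and 0 < pct <= 100.
    """
    pairs = sorted(zip(values, weights))       # order observations by value
    total = sum(w for _, w in pairs)
    target = pct / 100 * total
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]                        # guard against rounding at pct = 100

# Hypothetical measurements and sample weights
values = [10, 20, 30, 40]
weights = [1, 1, 1, 5]

print(weighted_percentile(values, weights, 50))
```

Because the last observation carries most of the weight in this made-up example, the weighted median differs from the unweighted one, which is why the sample weights cannot be ignored when estimating percentiles.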

### Task 3: Generate Geometric Means

A geometric mean provides a better estimate of central tendency for data that are distributed with a "long tail" at the upper end of the distribution, which is very common in measurements of environmental chemicals in blood or urine.
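The geometric mean is the exponential of the mean of the log-transformed values; with sample weights, the mean of the logs becomes a weighted mean. The Python sketch below shows that calculation as a point estimate only. The function name `weighted_geometric_mean` and the data are hypothetical, all values are assumed to be strictly positive, and no standard error is computed.

```python
import math

def weighted_geometric_mean(values, weights):
    """exp of the weighted mean of log(values); values must be > 0."""
    total = sum(weights)
    mean_log = sum(w * math.log(v) for v, w in zip(values, weights)) / total
    return math.exp(mean_log)

# Hypothetical concentrations with one very high value, equal weights
values = [0.5, 0.7, 0.8, 1.0, 12.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

arithmetic_mean = sum(w * v for v, w in zip(values, weights)) / sum(weights)
print("arithmetic mean:", arithmetic_mean)
print("geometric mean: ", weighted_geometric_mean(values, weights))
```

In this made-up example the single high value pulls the arithmetic mean well above most of the observations, while the geometric mean stays closer to the bulk of the data.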

### Task 4: Generate Proportions

Proportions are used for prevalence estimates of an event or trait (e.g., the prevalence of persons with high blood pressure (HBP) in the United States).

IMPORTANT NOTE

Because the procedure is the same as it would be for any dataset, links to the Continuous NHANES Web Tutorial are provided here for your convenience. If you use the code, please be sure to change it to match the variables in your environmental analytic dataset.
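For orientation, a weighted proportion is the sum of the sample weights of the persons who have the trait divided by the sum of the weights of all persons in the analysis. The Python sketch below illustrates that point estimate under stated assumptions: the function name `weighted_proportion`, the indicator coding (1 = has the trait, 0 = does not), and the data are all hypothetical, and it is not the code from the linked tutorial. Standard errors again require the survey design information and are not computed here.

```python
def weighted_proportion(indicators, weights):
    """Share of the total sample weight carried by observations with the trait."""
    total = sum(weights)
    with_trait = sum(w for flag, w in zip(indicators, weights) if flag)
    return with_trait / total

# Hypothetical data: 1 = has high blood pressure, 0 = does not
hbp = [1, 0, 0, 1, 0]
weights = [3000.0, 1500.0, 2500.0, 1000.0, 2000.0]

print("unweighted proportion:", sum(hbp) / len(hbp))
print("weighted proportion:  ", weighted_proportion(hbp, weights))
```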