Advance Data No. 314 December 4, 2000

Statistical smoothing procedures

Data were grouped by single month of age from 1 through 11 months, by 3-month intervals from 12 through 23 months, and by 6-month intervals from 24 months through 19 years. Data for weight-for-length and weight-for-stature were grouped by 2-cm intervals. Weighted empirical percentile estimates were obtained for each group by applying the survey-specific sample weights, and the resulting data points were plotted at the midpoint of each age group (or the midpoint of each 2-cm interval for length or stature). When the observed percentile points are plotted on a graph and connected, the resulting lines are jagged or irregular, in part because of sampling variability. Because of these irregularities, statistical smoothing procedures were applied to the observed data to generate smoothed curves for selected percentiles and to generate parameters that can be used to produce additional percentiles.
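The grouping and weighting steps above can be illustrated with a minimal sketch. The function names, the interval conventions (each group starting at a multiple of its width), and the percentile definition (the smallest value whose cumulative sample weight reaches the requested share of the total) are assumptions made for illustration; the report does not state the exact definitions that were used.

```python
def age_group_midpoint(age_months):
    """Midpoint (in months) of the age group containing age_months.

    Assumed grouping: 1-month groups below 12 months, 3-month groups
    from 12 through 23 months, and 6-month groups from 24 months on.
    """
    if age_months < 12:
        width = 1
    elif age_months < 24:
        width = 3
    else:
        width = 6
    lower = (age_months // width) * width
    return lower + width / 2


def weighted_percentile(values, weights, pct):
    """Weighted empirical percentile (pct in 0-100) of values.

    Assumed definition: the smallest value whose cumulative sample
    weight reaches pct percent of the total weight.
    """
    pairs = sorted(zip(values, weights))
    target = sum(w for _, w in pairs) * pct / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]
```

Under these assumptions, a child measured at 13.2 months falls in the 12-to-15-month group and is plotted at 13.5 months, and an observation with a sample weight of 2 counts twice as much toward the cumulative total as an observation with a weight of 1.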

The smoothed percentile curves were developed in two stages, each described in more detail below. In the first stage, selected percentiles were smoothed with a variety of parametric and nonparametric procedures. In the second stage, the smoothed curves were approximated using a modified LMS estimation procedure to provide associated z-scores that closely match the empirically smoothed percentile curves.

In the first stage of smoothing, smoothed percentile curves were created from the empirical data points. The method of smoothing empirical percentiles for infant weight, length, and head circumference was based upon a family of three-parameter linear models (Guo et al., 1988, 1990, 1991; Roche et al., 1989). The method of smoothing the empirical percentiles for older children differed among the growth variables.

For the smoothing of weight-for-age percentiles, a locally weighted regression procedure was first applied to better discern the patterns of change over time in the empirical percentile curves. This procedure applies a weight function to the data in the neighborhood of the age at which a value is to be estimated, so that measurements taken at ages close to that age receive larger weights than measurements taken at ages further away. The intermediate results generated by the locally weighted regression were further smoothed using a family of parametric models. The smoothed weight-for-age percentiles for infants and the smoothed percentiles for older children were combined in a manner that resulted in a continuous transition between these two sets of percentile curves.
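The locally weighted regression step described above can be sketched as follows. The report does not specify the weight function, the bandwidth, or the degree of the local fit; the tricube weight and the local straight-line fit used here are common choices and are assumptions, as are the function names.

```python
def tricube(u):
    """Tricube weight: 1 at u = 0, decreasing to 0 for |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear_fit(ages, values, age0, bandwidth):
    """Estimate the value at age0 from a weighted least-squares line.

    Points whose age is close to age0 receive larger weights; points
    further than `bandwidth` from age0 receive zero weight.
    """
    ws = [tricube((a - age0) / bandwidth) for a in ages]
    total = sum(ws)
    if total == 0:
        raise ValueError("no data within the bandwidth of age0")
    a_bar = sum(w * a for w, a in zip(ws, ages)) / total
    v_bar = sum(w * v for w, v in zip(ws, values)) / total
    s_aa = sum(w * (a - a_bar) ** 2 for w, a in zip(ws, ages))
    s_av = sum(w * (a - a_bar) * (v - v_bar)
               for w, a, v in zip(ws, ages, values))
    slope = s_av / s_aa if s_aa > 0 else 0.0
    return v_bar + slope * (age0 - a_bar)
```

Evaluating `local_linear_fit` at each age-group midpoint, one percentile at a time, would give an intermediate smoothed curve of the kind that the text says was then passed to a parametric model.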

Smoothing of the empirical percentiles for stature-for-age was based upon a nonlinear model that ensured a monotonic increase in stature during the growth period; this captures early childhood growth, pubertal growth, and post-pubertal growth patterns. Weight-for-length empirical data were adjusted and merged with the weight-for-stature data. These combined data were smoothed with a polynomial regression model.

Empirical percentile curves for BMI-for-age were considerably more irregular than those for stature-for-age and weight-for-age. As with weight-for-age, locally weighted regression was applied to the BMI empirical percentile curves to discern the shape of the curve. The intermediate smoothed percentile curves were then fit by a polynomial regression to achieve reasonably smoothed curves and to summarize the BMI-for-age percentile curves in polynomial equations.

For each set of percentile curves, the initial smoothing methods were applied to the nine empirical percentiles (3rd, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 97th) for each age group. In addition, the 85th percentile was included in the BMI-for-age charts because the 85th percentile of BMI has been recommended as a cutoff threshold to identify children and adolescents at risk for overweight (Himes, Dietz, 1994; Barlow, Dietz, 1998). The initial smoothing procedures are summarized in table 3. A detailed description of these procedures will be presented in future reports.

In the second stage, a modified LMS statistical smoothing procedure was applied to the smoothed curves generated in the first stage of the process. For ease of interpolation between percentiles, a normal transformation of the curves is useful. A normal transformation makes it possible to estimate any percentile and allows the calculation of standard deviation units (SDU) and z-scores.
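The link between percentiles and z-scores under a normal transformation can be shown with a short sketch that uses the standard normal distribution from the Python standard library. The function names are illustrative only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_for_percentile(pct):
    """z-score whose cumulative probability is pct percent."""
    return STANDARD_NORMAL.inv_cdf(pct / 100.0)


def percentile_for_z(z):
    """Percentile (0-100) corresponding to a z-score."""
    return 100.0 * STANDARD_NORMAL.cdf(z)
```

Because the mapping is continuous, any percentile, not only the nine that were smoothed, corresponds to a z-score; the 97th percentile, for example, corresponds to a z-score of about 1.88.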

Stature tends to be normally distributed, but for most other anthropometric measures neither the empirical nor the smoothed data strictly follow a normal distribution. Rather, the distributions contain some degree of skewness. To remove skewness, a power transformation can be used so that one tail of the distribution is stretched while the other tail is shrunk. One means of doing this is to apply a Box-Cox transformation to transform the data to a nearly normal distribution. When applied to percentile curves, this is known as the LMS technique (Cole, 1988). The assumption is that after the appropriate power transformation, the data are closely approximated by a normal distribution (Cole, 1990). The transformation does not adjust for kurtosis, but kurtosis is a less important contributor than skewness to nonnormality (Cole, 1992).
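The effect of a power transformation on skewness can be illustrated with a minimal sketch. The sample values below are invented for illustration and are not survey data, and the moment-based skewness coefficient is only one of several possible measures.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if lam == 0:
        return math.log(x)  # limiting case as lam approaches 0
    return (x ** lam - 1) / lam


def skewness(data):
    """Moment coefficient of skewness (0 for a symmetric sample)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample, for illustration only.
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 15]
transformed = [box_cox(v, 0) for v in raw]
```

For this sample the long right tail is pulled in by the transformation, so `skewness(transformed)` is much closer to zero than `skewness(raw)`. In practice the power is chosen to suit the data rather than fixed at 0.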

In the LMS technique, three parameters are estimated: the median (M), the generalized coefficient of variation (S), and the power in the Box-Cox transformation (L). The equation for the LMS is:

$$\text{Centile} = M\,(1 + L S Z)^{1/L}$$

where Z is the z-score that corresponds to the percentile. The usual practice is to use a penalized likelihood estimation procedure applied to the empirical data to generate age-specific estimates of L, M, and S. These age-specific estimates of L, M, and S are then smoothed. A smoothed percentile curve or an individual standardized score can be obtained from the smoothed values of L, M, and S (Cole, 1988, 1990). However, a smoothed percentile curve based on this type of LMS estimation procedure can be somewhat different from the curve that is obtained by smoothing empirical data points.
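The LMS equation and its inverse can be written as a short sketch. The function names and the parameter values in the usage note are hypothetical and are not taken from the published tables. The branch for L = 0 is the standard limiting form of the Box-Cox transformation, which the equation above does not cover.

```python
import math


def lms_centile(L, M, S, z):
    """Measurement value at z-score z for given L, M, and S."""
    if L == 0:
        return M * math.exp(S * z)  # limiting case as L approaches 0
    return M * (1 + L * S * z) ** (1 / L)


def lms_zscore(L, M, S, x):
    """z-score of measurement x for given L, M, and S (inverse of lms_centile)."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)
```

With hypothetical values L = -1.6, M = 16.0, and S = 0.08, `lms_centile(-1.6, 16.0, 0.08, 0)` returns the median, 16.0, and passing any centile value back through `lms_zscore` recovers the z-score that produced it.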

A modified estimation procedure was used to increase the agreement between the empirically smoothed curves and the LMS smoothed curves. In the modified LMS approach used for the present analyses, observed percentile curves were initially smoothed, as described above. Then, the Box-Cox power transformation (Box, Cox, 1964) was used to specify an equation at each of the previously smoothed major percentiles. A simultaneous solution for the three parameters was generated using the SAS procedure NLIN (SAS, 1988). The set of L, M, and S parameters that best matched the set of smoothed percentiles was obtained as a solution to a system of equations rather than as likelihood-based estimates from empirical data. These parameters allowed final curves to be produced that are extremely close to the curves smoothed for each major percentile from the first stage of curve smoothing. The advantage is that the final curves retain a nearly identical appearance to the initially smoothed percentiles, and the z-scores can be obtained in a continuous manner. The final set of percentile curves presented in this report was produced using the modified LMS estimation procedure.
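The idea of choosing L, M, and S so that the LMS equation reproduces a set of already smoothed percentiles can be sketched as below. This is not the SAS NLIN solution used for the report; it is a simplified stand-in that makes three assumptions: M is fixed at the smoothed 50th percentile, S is computed in closed form for each candidate L, and L is chosen by a grid search that minimizes the squared differences between the LMS centiles and the smoothed percentiles. All names, the grid limits, and the grid step are hypothetical.

```python
import math
from statistics import NormalDist

PERCENTILES = [3, 5, 10, 25, 50, 75, 90, 95, 97]
Z = [NormalDist().inv_cdf(p / 100) for p in PERCENTILES]


def centile(L, M, S, z):
    """LMS centile, with the limiting form used when L is 0."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1 + L * S * z) ** (1 / L)


def transformed(x, M, L):
    """Box-Cox transform of x / M; equals S * z when the LMS model holds."""
    if abs(L) < 1e-12:
        return math.log(x / M)
    return ((x / M) ** L - 1) / L


def fit_lms(smoothed, l_min=-3.0, l_max=3.0, steps=601):
    """Return (L, M, S) matching nine smoothed percentiles at one age.

    `smoothed` lists the smoothed values in PERCENTILES order.
    """
    M = smoothed[PERCENTILES.index(50)]  # assumption: M is the smoothed median
    zz = sum(z * z for z in Z)
    best = None
    for i in range(steps):
        L = l_min + (l_max - l_min) * i / (steps - 1)
        g = [transformed(x, M, L) for x in smoothed]
        # Least-squares S for this L, from g = S * z.
        S = sum(z * gi for z, gi in zip(Z, g)) / zz
        if S <= 0 or any(1 + L * S * z <= 0 for z in Z):
            continue  # outside the range where the LMS equation is defined
        sse = sum((centile(L, M, S, z) - x) ** 2 for z, x in zip(Z, smoothed))
        if best is None or sse < best[0]:
            best = (sse, L, M, S)
    if best is None:
        raise ValueError("no admissible (L, S) found on the grid")
    return best[1], best[2], best[3]
```

Applied age by age, a fit of this kind yields L, M, and S values from which any percentile or z-score can be computed, while the centiles at the nine fitted percentiles stay close to the first-stage curves. A general nonlinear least-squares routine, as in the report, would estimate all three parameters simultaneously rather than fixing M.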