The SAS Program for CDC Growth Charts that Includes the Extended BMI Calculations

(Updated Jan 9, 2023)

Overview

Note that the calculations for BMI z-scores and percentiles for 2- to 19-year-olds with obesity (BMI ≥ 95th percentile for a child’s sex and age) changed on Dec 15, 2022. See the section on the extended BMI percentiles and z-scores for more information.

This SAS program calculates percentiles and z-scores (standard deviations) for a child’s sex and age for BMI, weight, height, and head circumference from the CDC growth charts (1). It also calculates weight-for-height z-scores and percentiles and flags outliers. These extreme values, however, are not necessarily incorrect and should be reviewed before deciding whether to include or exclude them.

Although the SAS program calculates z-scores and percentiles for children up to 20 years of age, the World Health Organization (WHO) growth charts are recommended for children < 24 months of age. Several programs on the WHO and CDC websites are based on the WHO growth charts.

As of Dec 15, 2022, BMI z-scores and percentiles for 2- to 19-year-olds with obesity (BMI ≥ 95th percentile, which corresponds to a z-score of 1.645) are calculated as extended BMIz and extended BMI percentiles. See the section on the extended BMI percentiles and z-scores for more information.

The SAS program, cdc-source-code.sas (the files are listed in Step 1), calculates these z-scores and percentiles for the children in your data. It uses the reference data in CDCref_d.sas7bdat for children without obesity and the extended BMI percentiles and z-scores for children with obesity. Note that the z-scores and percentiles calculated for children with obesity will differ from earlier (pre-2022) versions of this SAS program. If you’re not using SAS or R, you can download CDCref_d.csv and create a program based on cdc-source-code.sas.

Instructions for SAS users

Step 1: Download the SAS program (cdc-source-code.sas) and the reference data file (CDCref_d.sas7bdat). Move these files to a folder (directory) that SAS can access. For the following example, the files are in c:\sas_growth_charts.

The following example SAS code corresponds to Steps 2 to 6 below. After downloading the SAS code and the reference data, you can cut and paste the following 4 lines into your SAS program. You’ll likely need to change the libname and %include statements to point at the folder/directory for the downloaded files. You’ll also probably have to rename and recode variables, as explained in Steps 2 to 6.

libname refdir 'c:\sas_growth_charts';
data mydata; set whatever-your-original-dataset-is-named; run;
%include 'c:\sas_growth_charts\cdc-source-code.sas';
proc means data=_cdcdata; run;

Step 2: Create a libname statement in your SAS program to point at the folder location of CDCref_d.sas7bdat. An example would be:
libname refdir 'c:\sas_growth_charts';

Note: the SAS code expects this name to be refdir – make sure you specify this in the libname statement.

Step 3: Set your existing data that contains height, weight, sex, age, and other variables into a temporary dataset named mydata. Rename and code the variables as follows (Table 1):

Table 1. Instructions for SAS users (Step 3), guidance on renaming and coding variables in your dataset.
Variable Description of variables and coding in the input dataset, mydata
agemos Months of age. Agemos must be in your dataset, and the program assumes that you know the number of months to the nearest day. For example, if a child were born on Oct 1, 2007, and examined on Nov 15, 2011, the child’s age would be 1506 days or 49.48 (1506 / 30.4375) months. In everyday usage, this child’s age would be 4 years or 49 months. However, if 49 months were used for all children between 49.0 and < 50 months of age, then most of the calculated z-scores would be too high because, on average, these children would be taller and heavier than children who are 49.0 months of age.
If only the completed number of months is known (as in NHANES), add 0.5 to the age so that the maximum error would be 15 days. If age represents the completed years (e.g., 13 years), multiply by 12 and add 6. If age is in days, divide by 30.4375.
sex Sex must be coded as 1 for boys and 2 for girls.
height Height (cm). Height is either standing height (for children ≥ 24 months of age) or recumbent length (for children < 24 months). If standing height was measured for children under 24 months of age, you should add 0.8 cm to these values (see page 8 of https://www.cdc.gov/nchs/data/series/sr_11/sr11_246.pdf). If recumbent length was measured for children ≥ 24 months, subtract 0.8 cm.
weight Weight (kg)
bmi BMI [weight (kg) / height (m)²]. The program calculates BMI if it is not present in your data but will not overwrite BMI if present.
headcir Head circumference (cm)

Z-scores and percentiles for the anthropometric variables that are not in mydata (or that are missing) will be coded as missing (.) in the output dataset, _cdcdata. It’s unlikely that the SAS code will overwrite variables in your dataset, but you should avoid having variable names that begin with an underscore or with ‘mod_’.
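
The age conversion described for agemos in Table 1 could be sketched as follows. This is only an illustration: the variable names dob and exam_date are hypothetical and are not required by the CDC program. The dates are the ones from the Table 1 example.

```sas
/* Sketch only: dob and exam_date are hypothetical variable names. */
data mydata;
  /* Table 1 example: born Oct 1, 2007; examined Nov 15, 2011 */
  dob = '01OCT2007'd;
  exam_date = '15NOV2011'd;
  /* Age in days divided by 30.4375 gives age in months */
  agemos = (exam_date - dob) / 30.4375;   /* 1506 / 30.4375 = 49.48 */
  /* If only completed months are known: agemos = completed months + 0.5 */
  /* If only completed years are known:  agemos = completed years * 12 + 6 */
  format dob exam_date date9.;
run;
```

In a real dataset you would replace the two date assignments with a set statement that reads your own data.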

Step 4: Copy and paste the following line into your SAS program after the line (or lines) in Step #3.
%include 'c:\sas_growth_charts\cdc-source-code.sas'; run;

If necessary, change this statement to point at the folder/directory containing the downloaded cdc-source-code.sas file. The %include will run your data through cdc-source-code.sas and create a dataset named _cdcdata.

Step 5: The output dataset, _cdcdata, contains your original data and z-scores, percentiles, and flags for extreme values shown in Table 2. Additional information on the extreme z-scores is given in the Extreme Values, Implausible Values, and Data Errors section.

Step 6: Examine the new dataset, _cdcdata, to verify that the z-scores and other variables have been created. If z-scores and percentiles for a variable in your dataset are unexpectedly missing, (1) make sure your input dataset is named mydata, and (2) check that the variables are named and coded as shown in Table 1. The program does not modify your original data; it adds new variables to a copy of it. Table 2 shows the names and descriptions of several variables in _cdcdata.

Table 2. Z-Scores, percentiles, and extreme (possibly implausible, BIV) values in the output dataset, _cdcdata
Description | Percentile | Z-score | Modified z-score to identify extreme values (b) | Flag for extreme values | Cutoff for extreme z-scores (a): low (flag = -1) | Cutoff for extreme z-scores (a): high (flag = +1)
Weight-for-age for children aged 0 to < 240 months | wapct | waz | mod_waz | _bivwt | < -5 | > 8
Height-for-age for children aged 0 to < 240 months | hapct | haz | mod_haz | _bivht | < -5 | > 4
Weight-for-height for children with heights from 45 to 121 cm (these heights approximately correspond to ages 0 to 6 years) | whpct | whz | mod_whz | _bivwh | < -4 | > 8
BMI-for-age for children aged 24 to < 240 months | bmipct | bmiz | mod_bmiz | _bivbmi | < -4 | > 8
Head circumference-for-age for children aged 0 to < 36 months | headcpct | headcz | mod_headcz | _bivhc | < -5 | > 5
Original calculations for BMI (c) | original_bmipct | original_bmiz | | | |

a. Several cut points were changed in 2016.
b. The names of the modified z-scores were changed in Dec 2022. Previously, they began with ‘_F’.
c. See the sections on the LMS Method and Extended BMI percentiles and z-scores.

Other variables in _cdcdata are shown in Table 3.

Table 3. Additional variables in the output dataset, _cdcdata
Variable Description
bmi50 and bmi95 Sex- and age-specific 50th and 95th percentiles of BMI in the CDC growth charts
bmip50 and bmip95 BMI expressed as a percentage of CDC’s 50th and 95th percentiles
bmi120 The BMI value that is 120% of the CDC 95th percentile

LMS Method

The LMS (lambda, mu, sigma) method calculates BMI z-scores as

Z-score = ((BMI / M)^L - 1) / (L × S)  [equation 1]

The L (transformation for normality), M (median), and S (coefficient of variation) values for the CDC growth charts, which vary by sex and month of age, are in CDCref_d.sas7bdat. These z-scores are then transformed into percentiles with the SAS probnorm function. For example, a z-score of 1.645 is the 95th percentile. For more information on the LMS method, developed by Tim Cole and PJ Green in the 1990s, see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC27365 and http://www.ncbi.nlm.nih.gov/pubmed/1518992. Sex- and age-specific L, M, and S values are at https://www.cdc.gov/growthcharts/percentile_data_files.htm.
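
As a minimal sketch of equation 1 and the probnorm step, the following data step computes a z-score and percentile for a single BMI. The L, M, S, and BMI values are made-up placeholders chosen for illustration; they are not taken from CDCref_d.sas7bdat, and real values must be looked up by sex and month of age.

```sas
/* Sketch of equation 1. The L, M, and S values below are placeholders, */
/* NOT values from CDCref_d.sas7bdat.                                   */
data _null_;
  l = -2.0;  m = 16.5;  s = 0.08;    /* hypothetical LMS values */
  bmi = 18.0;                         /* hypothetical BMI (kg/m^2) */
  z = ((bmi / m)**l - 1) / (l * s);   /* equation 1 (requires L not equal to 0) */
  pct = 100 * probnorm(z);            /* z-score to percentile */
  put z=6.3 pct=5.1;                  /* write both values to the SAS log */
run;
```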

The LMS method for BMI results in a curvilinear relation between BMI and BMIz, as shown in the following figure for children aged 5, 12, and 18 years. The range of BMIs (x-axis) corresponds to those observed in NHANES 1999-2000 through 2017-2018. At low BMIs, a small change in BMI results in a sizeable BMIz change. In contrast, at very high BMIs, the same BMI change results in a much smaller BMIz change – this is most evident among 12-year-old boys and 18-year-old females.

Figure 1. Relation of BMI to BMIz at three ages. BMIz was calculated using the LMS values and equation #1.

Further, if a child’s BMI is very large relative to the median BMI, (BMI ÷ M)^L in the LMS equation approaches 0 when L is negative, and the maximum BMIz value that is possible at that sex/age is (-1) ÷ (L × S). For most ages over 5 years, the maximum possible BMIz, regardless of the magnitude of the BMI, is < 4.0 SDs. Further, among 7- to 15-year-old males and 15- to 19-year-old females, BMIz cannot be > 3.3 SDs, limiting the usefulness of these z-scores in characterizing the extremely high BMIs (e.g., ≥ 40 kg/m²) shown in the figure above.
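
The ceiling can be written out from equation 1. The derivation below assumes L < 0, which is the case the paragraph above describes; the numbers in the last line are hypothetical and serve only to show the arithmetic.

```latex
\lim_{\mathrm{BMI}\to\infty} \frac{(\mathrm{BMI}/M)^{L} - 1}{L \times S}
  = \frac{0 - 1}{L \times S}
  = \frac{-1}{L \times S}
  \qquad (L < 0)

% Hypothetical illustration: L = -2.5, S = 0.12
\frac{-1}{(-2.5)(0.12)} = \frac{-1}{-0.30} \approx 3.33
```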

The CDC 2000 growth charts were based on data collected from 1963 to 1980 for most children, and it was advised that extrapolation beyond the 97th percentile be done cautiously (1). Further, the 2000 CDC growth charts’ BMI z-scores were not intended for use among children with extremely high BMI values (2,3).  Several other studies have also highlighted the limitations of LMS-calculated BMIz in characterizing very high BMIs (4–6).

Extended BMI percentiles and z-scores

To explore alternative metrics for BMI, NCHS convened a workshop in 2018 and published a 2022 report (7) that evaluated several alternatives to LMS-BMIz.  This report recommended that ‘extended BMIz’ and ‘extended BMI percentiles’ be used to characterize the BMIs of children with obesity (BMI ≥ 95th percentile for a child’s sex and age). These extended metrics were constructed from the BMIs of children with obesity in the CDC growth chart reference population and more recent NHANES surveys (through 2015-2016). These BMI data were modeled within each sex and 6-month age group as a half-normal distribution, a truncated normal distribution with only values at or to the right of the peak having a probability density greater than 0 (8). Characterizing these distributions’ shape parameter, sigma, allows calculating BMI percentiles for children with obesity, even those with extremely high BMIs. These percentiles can then be transformed into z-scores.
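
One way to read the half-normal description is sketched below. It assumes that the peak of the half-normal distribution sits at the sex- and age-specific 95th percentile, so that the fraction of children with obesity below a given BMI is 2 × Φ((BMI - 95th percentile) / sigma) - 1, and the overall percentile is 95 plus 5 times that fraction. The bmi95, sigma, and bmi values are placeholders, not reference values, and the variable names ext_pct and ext_z are hypothetical. Consult cdc-source-code.sas and references 7 and 8 for the calculation the program actually performs.

```sas
/* Sketch under stated assumptions; all numeric inputs are placeholders. */
data _null_;
  bmi95 = 25.0;   /* hypothetical 95th percentile (kg/m^2) */
  sigma = 4.0;    /* hypothetical half-normal scale parameter */
  bmi   = 31.0;   /* hypothetical BMI at or above bmi95 */
  /* Half-normal CDF to the right of the peak: 2*Phi(x/sigma) - 1 */
  frac = 2 * probnorm((bmi - bmi95) / sigma) - 1;
  ext_pct = 95 + 5 * frac;           /* percentile on the 0-100 scale */
  ext_z   = probit(ext_pct / 100);   /* percentile to z-score */
  put ext_pct=7.3 ext_z=6.3;         /* write both values to the SAS log */
run;
```

Under this reading, a BMI exactly at the 95th percentile gives a fraction of 0, a percentile of 95, and a z-score of 1.645, so the extended values join the LMS-based values at the obesity threshold.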

To facilitate the use of these extended metrics, as of Dec 15, 2022 (7), the calculated values for BMIz and BMI percentile (bmiz and bmipct) in the SAS program combine the LMS-based values for children without obesity with the extended values for children with obesity. Therefore, the original BMI metrics, constructed using only the L, M, and S parameters, have been renamed as original_bmiz and original_bmipct. Note that bmiz and original_bmiz (and bmipct and original_bmipct) are identical for children without obesity.

The following figure shows the relation of BMI to both the original and new (extended) values of BMIz. The dashed lines represent the original, LMS-based BMI z-scores from the 2000 CDC growth charts, whereas the solid lines represent the extended bmiz values for BMIs ≥ 95th percentile (z-score = 1.645 SDs).  Among children without obesity, the LMS-based z-scores and the new BMI z-score are identical. At higher BMIs, the relation of BMI to bmiz is fairly linear and does not approach a horizontal asymptote. However, the extended BMIz values are lower than the original values for some BMIs above the 95th percentile, which is most evident for 5- and 18-year-old males in the figure. These lower values arise because children with obesity in more recent NHANES surveys have higher BMIs than those in the original CDC reference population.

Figure 2. Relation of BMI to Original and New (Extended) BMIz at three ages. Dashed lines represent the original z-scores; solid lines are the new z-scores

Severe Obesity

Because the original LMS-based z-scores for very high BMIs resulted in percentiles that differ from those estimated from the data (3), a BMI ≥ 120% of the CDC 95th percentile has been widely used for the classification of severe obesity since 2013 (9). This cut-point is approximately equal to the empirical 99th percentile in the growth charts (3). However, among older adolescents, a BMI can be ≥ 35 kg/m² but be less than 120% of the 95th percentile. Therefore, severe obesity is defined as either a BMI ≥ 120% of the 95th percentile or a BMI ≥ 35 kg/m²; this aligns with guidelines from the American Heart Association (9) and the American Academy of Pediatrics (10).

The program outputs the variable bmip95, which expresses a child’s BMI as a percentage of the CDC 95th percentile and can range from below 50 to over 220. For example, a bmip95 of 140 would indicate that the child has a BMI equal to 1.4 times the 95th percentile. If desired, one can also calculate the arithmetic difference between a child’s BMI and the CDC 95th percentile. For example, the CDC 95th percentile for a 60-month-old boy is 17.9 kg/m². If this 5-year-old had a BMI of 21.3 kg/m², the arithmetic difference would be 3.4 kg/m² (21.3 – 17.9), and bmip95 would be 119% (100 × 21.3/17.9).
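
The severe obesity definition and the arithmetic difference could be derived from the output dataset along these lines. This is a sketch, not part of the CDC program: the output dataset name classified and the variable names sev_obesity and bmi_diff95 are hypothetical, while bmi, bmi95, and bmip95 are the variables described in this document.

```sas
/* Sketch: classified, sev_obesity, and bmi_diff95 are hypothetical names. */
data classified;
  set _cdcdata;
  /* Leave the flag missing when BMI or bmip95 is unavailable */
  if missing(bmi) or missing(bmip95) then sev_obesity = .;
  /* 1 = BMI >= 120% of the 95th percentile or BMI >= 35 kg/m^2; else 0 */
  else sev_obesity = (bmip95 >= 120 or bmi >= 35);
  /* Arithmetic difference between BMI and the CDC 95th percentile */
  if not missing(bmi) and not missing(bmi95) then bmi_diff95 = bmi - bmi95;
run;
```

For the 60-month-old boy in the example above, bmip95 is 119 and BMI is 21.3 kg/m², so sev_obesity would be 0.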

Extreme Values, Implausible Values, and Data Errors

As explained in the Modified z-scores documentation, the SAS code also calculates modified z-scores that can be used to identify extreme values that may be errors. These modified z-scores were computed by extrapolating one-half of the distance between 0 and +2 (or between 0 and -2) z-scores to the distribution’s tails. Although these z-scores were developed to identify outliers at a single examination, they have been incorporated into algorithms for cleaning longitudinal data (11).
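
The idea can be sketched by inverting equation 1 to obtain the measurement at z = +2 and z = -2, value = M × (1 + L × S × z)^(1/L), and then using half of the distance from the median as one modified-z unit. This is an illustration of the description above, not a copy of cdc-source-code.sas; the L, M, S, and BMI values are placeholders, and the variable names val_p2, val_m2, and mod_z are hypothetical.

```sas
/* Sketch only: LMS values and BMI are placeholders, not reference data. */
data _null_;
  l = -2.0;  m = 16.5;  s = 0.08;   /* hypothetical LMS values */
  bmi = 30.0;                        /* hypothetical measurement */
  /* Measurements at z = +2 and z = -2, from inverting equation 1 */
  val_p2 = m * (1 + l * s * 2)**(1 / l);
  val_m2 = m * (1 + l * s * (-2))**(1 / l);
  /* One modified-z unit is half the distance between the median and +/-2 */
  if bmi >= m then mod_z = (bmi - m) / ((val_p2 - m) / 2);
  else mod_z = (bmi - m) / ((m - val_m2) / 2);
  put mod_z=6.2;                     /* write the value to the SAS log */
run;
```

Because the unit is fixed, the modified z-score keeps increasing linearly with the measurement instead of flattening out as the LMS z-score does.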

The output from the SAS program contains biologically implausible value (BIV) flag variables for weight, height, and BMI that are coded as -1 (modified z-score is very low), +1 (modified z-score is very high), or 0 (modified z-score is between these 2 cut points). These BIV flags in the output dataset, _cdcdata, were included in Table 2.  It is essential to realize that an extreme value is not necessarily incorrect, but the value should be further examined, possibly in conjunction with other characteristics of the child.

The upper thresholds for the modified z-score cut-points were initially based on a 1995 WHO publication (12) but were changed in 2016 after several papers (13–15) showed that the original cut points excluded many children whose weight, height, or BMI were very likely to have been recorded correctly. These BIVs can flag potentially problematic data points, but the BIV cut points are not a gold standard. The cut points were chosen to balance the inclusion of extreme values that are likely to be correct and the exclusion of those that are likely to be incorrect (14,15).

Based on the results of these papers, the upper cut points were increased in 2016 from
(1) +5 to +8 for modified z-scores for weight and BMI, and
(2) +3 to +4 for modified z-scores for height.
These new z-score cut points roughly correspond to the modified z-scores for the maximum values of the body size measures among 2- to 18-year-olds in NHANES at many, but not all, ages. However, please be careful in using these cut points to exclude data, as different decisions could alter the prevalence of obesity and severe obesity by up to 1% (14,15).

Other cut points for the modified z-scores may be more appropriate based on additional information in your data. For example, does a child with an extremely high BMI also have a high skinfold thickness or arm circumference, or is the child very tall? If so, the very high BMI value is more likely to be correct. Similarly, in a longitudinal study or an analysis of electronic health records (EHR), one could assess whether a child has extreme values of weight and BMI at multiple examinations.

Although +8 SDs is the threshold for a high BMI BIV, two young (< 5 years) boys in NHANES (2005-2006 and 2017-2018) have a modified BMIz > 11 SDs. Further, electronic health record datasets that comprise millions of children indicate that many children consistently have a modified BMIz between 10 and 12 SDs at consecutive examinations. Growthcleanr (11), an R package, helps identify errors in longitudinal datasets containing multiple records for each child.

The modified z-scores can be used to construct other cut points for extreme values rather than relying on the BIV flag variables. For example, if you feel using a BMI-for-age cut point of +8 SDs would exclude many values likely to be correct, then you could use mod_bmiz > 10 as the definition of a high BMI BIV. This could be recoded as:

data _cdcdata; set _cdcdata;
  if -4 <= mod_bmiz <= 10 then _bivbmi=0; *plausible;
  else if mod_bmiz > 10 then _bivbmi=1; *high BIV;
  else if . < mod_bmiz < -4 then _bivbmi= -1; *low BIV, cut point as in Table 2;
run;

References
  1. Kuczmarski RJ, Ogden CL, Guo SS, Grummer-Strawn LM, Flegal KM, Mei Z, Wei R, Curtin LR, Roche AF, Johnson CL. 2000 CDC Growth Charts for the United States: methods and development. Vital Health Stat 11 2002;11:1–190.
  2. Flegal KM, Cole TJ. Construction of LMS parameters for the Centers for Disease Control and Prevention 2000 Growth Charts. Natl Health Stat Rep 2013;9:1–3.
  3. Flegal KM, Wei R, Ogden CL, Freedman DS, Johnson CL, Curtin LR. Characterizing extreme values of body mass index-for-age by using the 2000 Centers for Disease Control and Prevention growth charts. Am J Clin Nutr 2009;90:1314–20.
  4. Woo JG. Using body mass index Z-score among severely obese adolescents: a cautionary note. Int J Pediatr Obes 2009;4:405–10.
  5. Freedman DS, Butte NF, Taveras EM, Lundeen EA, Blanck HM, Goodman AB, Ogden CL. BMI z-Scores are a poor indicator of adiposity among 2- to 19-year-olds with very high BMIs, NHANES 1999-2000 to 2013-2014. Obesity (Silver Spring) 2017;25:739–46.
  6. Freedman DS, Berenson GS. Tracking of BMI z Scores for Severe Obesity. Pediatrics 2017;140:e20171072.
  7. Hales C, Freedman DS, Akinbami L, Wei R, Ogden CL. Using CDC growth charts to assess and monitor weight status in children and adolescents with extremely high BMI. Vital Health Stat 2 2022;197.
  8. Wei R, Ogden CL, Parsons VL, Freedman DS, Hales CM. A method for calculating BMI z-scores and percentiles above the 95th percentile of the CDC growth charts. Ann Hum Biol 2020;47:514–21.
  9. Kelly AS, Barlow SE, Rao G, Inge TH, Hayman LL, Steinberger J, Urbina EM, Ewing LJ, Daniels SR, American Heart Association Atherosclerosis, Hypertension, and Obesity in the Young Committee of the Council on Cardiovascular Disease in the Young, Council on Nutrition, Physical Activity and Metabolism, and Council on Clinical Cardiology. Severe obesity in children and adolescents: identification, associated health risks, and treatment approaches: a scientific statement from the American Heart Association. Circulation 2013;128:1689–712.
  10. Armstrong SC, Bolling CF, Michalsky MP, Reichard KW, Haemer MA, Muth ND, Rausch JC, Rogers VW, Heiss KF, Besner GE, et al. Pediatric Metabolic and Bariatric Surgery: Evidence, Barriers, and Best Practices. Pediatrics 2019;144:e20193223.
  11. Daymont C, Ross ME, Russell Localio A, Fiks AG, Wasserman RC, Grundmeier RW. Automated identification of implausible values in growth data from pediatric electronic health records. J Am Med Inform Assoc 2017;24:1080–7.
  12. World Health Organization (WHO). Physical status: the use and interpretation of anthropometry. Report of a WHO Expert Committee. World Health Organ Tech Rep Ser 1995;854:1–452.
  13. Lawman HG, Ogden CL, Hassink S, Mallya G, Vander Veur S, Foster GD. Comparing methods for identifying biologically implausible values in height, weight, and Body Mass Index among youth. Am J Epidemiol 2015;182:359–65.
  14. Freedman DS, Lawman HG, Skinner AC, McGuire LC, Allison DB, Ogden CL. Validity of the WHO cutoffs for biologically implausible values of weight, height, and BMI in children and adolescents in NHANES from 1999 through 2012. Am J Clin Nutr 2015;102:1000–6.
  15. Freedman DS, Lawman HG, Pan L, Skinner AC, Allison DB, McGuire LC, Blanck HM. The prevalence and validity of high, biologically implausible values of weight, height, and BMI among 8.8 million children. Obesity (Silver Spring) 2016;24:1132–9.