
A SAS Program for the 2000 CDC Growth Charts (ages 0 to <20 years)

Note that the cut points for biologically implausible values (BIVs) were changed in 2016. These changes did not affect the calculation of any of the z-scores or percentiles, or the subsequent classification of overweight or obesity.

The purpose of this SAS program is to calculate the percentiles and z-scores (standard deviations) for a child’s sex and age for BMI, weight, height, and head circumference based on the CDC growth charts. Weight-for-height percentiles and z-scores are also calculated. Observations that contain extreme values are flagged as being biologically implausible. These extreme values, however, are not necessarily incorrect.

Although the SAS program can be used to calculate z-scores and percentiles for children up to 20 years of age, the World Health Organization (WHO) growth charts are recommended for children <24 months of age. Several computer programs that use the WHO growth charts are available on the WHO and CDC sites; the CDC program follows the same steps as this SAS program for the CDC growth charts.

The SAS program, cdc-source-code.sas [SAS - 8KB] (files are below, in step #1), calculates these z-scores and percentiles for children in your data based on reference data in CDCref_d.sas7bdat. If you're not using SAS, you can download CDCref_d.csv [CSV - 160KB] and create a program based on cdc-source-code.sas [SAS - 8KB] to do the calculations.
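If you are writing your own program from CDCref_d.csv, one step is to look up the L, M, and S parameters at the child's exact age. The Python sketch below assumes linear interpolation between the two tabulated reference ages that bracket the child's age. Check cdc-source-code.sas to confirm how the SAS program does this. The function name interpolate_lms and the reference rows are hypothetical, and the numbers are made up for illustration. They are not taken from CDCref_d.csv.

```python
from bisect import bisect_right

# Hypothetical reference rows for one sex: (agemos, L, M, S).
# The numbers are made up for illustration; real values would be read
# from CDCref_d.csv.
REF_ROWS = [
    (48.5, -1.50, 15.80, 0.070),
    (49.5, -1.52, 15.78, 0.071),
    (50.5, -1.54, 15.76, 0.072),
]


def interpolate_lms(agemos, rows=REF_ROWS):
    """Return (L, M, S) at agemos, linearly interpolated between the two
    tabulated ages that bracket it (an assumed method, not confirmed CDC code)."""
    ages = [row[0] for row in rows]
    if not ages[0] <= agemos <= ages[-1]:
        raise ValueError("agemos is outside the tabulated age range")
    i = bisect_right(ages, agemos) - 1
    if i == len(rows) - 1:  # exactly at the last tabulated age
        return rows[-1][1:]
    age0, *lms0 = rows[i]
    age1, *lms1 = rows[i + 1]
    w = (agemos - age0) / (age1 - age0)  # fraction of the way from age0 to age1
    return tuple(p0 + w * (p1 - p0) for p0, p1 in zip(lms0, lms1))


# Example: a child aged 49.0 months falls halfway between 48.5 and 49.5.
print(interpolate_lms(49.0))
```

Interpolation is one reason exact age matters: a child examined between two tabulated ages gets parameters between those of the neighboring rows, not the parameters of the nearest whole month.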

Instructions for SAS users

Step 1: Download the SAS program (cdc-source-code.sas [SAS -8KB]) and the reference data file (CDCref_d.sas7bdat). Do not alter these files, but move them to a folder (directory) that SAS can access.

For the following example, the files have been saved in c:\sas\growth charts\cdc\data.

Step 2: Create a libname statement in your SAS program to point at the folder location of ‘CDCref_d.sas7bdat’. An example would be:
libname refdir 'c:\sas\growth charts\cdc\data';

Note that the SAS code expects this name to be refdir; do not change it.

Step 3: Set your existing dataset containing height, weight, sex, age and other variables into a temporary dataset, named mydata. Variables in your dataset should be renamed and coded as follows:

Table 1

Variable Description
agemos Child's age in months; must be present. The program assumes that age in months is known to the nearest day, calculated from the dates of birth and examination. For example, if a child was born on Oct 1, 2007 and was examined on Nov 15, 2011, the child's age would be 1506 days or 49.48 months. In everyday usage, this age would be stated as 4 years or as 49 months. However, if 49 months were used as the age of all children who were between 49.0 and <50 months in your data, the estimated z-scores would be slightly too high because, on average, these children would be taller, weigh more, and have a higher BMI than children who are exactly 49.0 months of age. This bias would be greater if only completed years of age were known and the age of all children between 4 and <5 years was represented as 48 months.
If age is known only as the completed number of months (as with data from NHANES 1988-1994 and 1999-2010), consider adding 0.5 so that the maximum error is about 15 days. If age is known only as the completed number of years, multiply by 12 and consider adding 6.
sex Coded as 1 for boys and 2 for girls.
height Height in cm. This is either standing height (for children who are ≥ 24 months of age) or recumbent length (for children < 24 months of age); both are input as height. If standing height was measured for some children less than 24 months of age, add 0.8 cm to these values (see page 8 of http://www.cdc.gov/nchs/data/series/sr_11/sr11_246.pdf [PDF - 5.4MB]). If recumbent length was measured for some children who are ≥ 24 months of age, subtract 0.8 cm.
weight Weight (kg)
bmi BMI, calculated as weight (kg) / height (m)². If your data do not contain BMI, the program calculates it. If BMI is present in your data, the program will not overwrite it.
headcir Head circumference (cm)
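For readers preparing data outside SAS, the age and length/height rules in Table 1 can be sketched in Python as shown below. This is a minimal sketch and is not part of the CDC program. The function names are hypothetical. The conversion of 365.25/12 (30.4375) days per month is also an assumption, chosen because it reproduces the example in Table 1 (1506 days = 49.48 months).

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # 30.4375; assumed conversion


def agemos_from_dates(birth: date, exam: date) -> float:
    """Age in months, to the nearest day, from dates of birth and examination."""
    return (exam - birth).days / DAYS_PER_MONTH


def agemos_from_completed_months(months: int) -> float:
    """Only completed months known: add 0.5 (maximum error about 15 days)."""
    return months + 0.5


def agemos_from_completed_years(years: int) -> float:
    """Only completed years known: multiply by 12 and add 6."""
    return years * 12 + 6


def adjust_height(height_cm: float, agemos: float, standing: bool) -> float:
    """Make length/height consistent with the growth charts:
    add 0.8 cm to standing height measured before 24 months,
    subtract 0.8 cm from recumbent length measured at 24 months or older."""
    if standing and agemos < 24:
        return height_cm + 0.8
    if not standing and agemos >= 24:
        return height_cm - 0.8
    return height_cm


def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) / height (m) squared."""
    return weight_kg / (height_cm / 100) ** 2


# The example from Table 1: born Oct 1, 2007, examined Nov 15, 2011.
print(round(agemos_from_dates(date(2007, 10, 1), date(2011, 11, 15)), 2))  # 49.48
```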

Z-scores and percentiles for variables that are not in mydata will be coded as missing (.) in the output dataset (named _cdcdata). Sex (coded as 1 for boys and 2 for girls) and agemos must be in mydata. It’s unlikely that the SAS code will overwrite other variables in your dataset, but you should avoid having variable names that begin with an underscore, such as _bmi.

Step 4: Copy and paste the following line into your SAS program after the line (or lines) in step #3.
%include 'c:\sas\growth charts\cdc\data\CDC-source-code.sas'; run;

If necessary, change this statement to point at the folder containing the downloaded ‘CDC-source-code.sas’ file. This tells your SAS program to run the statements in ‘CDC-source-code.sas’.

Step 5: Submit the %include statement. This will create a dataset, named _cdcdata, which contains all of your original variables along with z-scores, percentiles, and flags for extreme values. The names and descriptions of these new variables in _cdcdata are in Table 2. Additional information on the extreme z-scores is given in a separate section that follows the “Example SAS Code”.


Table 2: Z-scores, percentiles, and extreme (biologically implausible, BIV) values in output dataset, _cdcdata

Description | Percentile | Z-score | Modified z-score to identify extreme values | Flag for extreme values | Low cutoff for extreme modified z-score (flag coded as -1) | High cutoff for extreme modified z-score (flag coded as +1)
Weight-for-age, children 0 to 239 months of age (inclusive) | wapct | waz | _Fwaz | _bivwt | < -5 | > 8*
Height-for-age, children 0 to 239 months of age (inclusive) | hapct | haz | _Fhaz | _bivht | < -5 | > 4*
Weight-for-height, children with heights between 45 and 121 cm (this height range approximately covers ages 0 to 6 years) | whpct | whz | _Fwhz | _bivwh | < -4 | > 8*
BMI-for-age, children 24 to 239 months of age | bmipct | bmiz | _Fbmiz | _bivbmi | < -4 | > 8*
Head circumference-for-age, children 0 to 35 months of age (inclusive) | headcpct | headcz | _Fheadcz | _bivhc | < -5 | > 5

* Changed in 2016. Additional information is below.

Step 6: Examine the new dataset, _cdcdata, with PROC MEANS or another procedure to verify that the z-scores and other variables have been created. If a variable in Table 1 was not in your original dataset (e.g., head circumference), all values of the percentiles and z-scores for that variable will be missing in the output dataset. If values for other variables are unexpectedly missing, make sure that you've renamed and recoded variables as indicated in Table 1 and that your SAS dataset is named mydata. The program does not modify your original dataset; the output dataset, _cdcdata, contains your original variables plus the new variables.

Example SAS code corresponding to steps 2 to 6. You can simply cut and paste these lines into a SAS program, but you’ll need to change the libname and %include statements to point at the folders containing the downloaded files.

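libname refdir 'c:\sas\growth charts\cdc\data';
data mydata;
  set whatever_your_original_dataset_is_named; * replace with the name of your dataset;
  * rename and recode variables here as described in Table 1;
run;
%include 'c:\sas\growth charts\cdc\data\CDC-source-code.sas';
run;
proc means data=_cdcdata; run;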

Additional Information

Z-scores are calculated as

Z = [((value / M)**L) - 1] / (S * L),

in which 'value' is the child's BMI, weight, height, or head circumference. The L, M, and S values are in CDCref_d.sas7bdat and vary according to the child's sex and age or, for weight-for-height, according to the child's sex and height. Percentiles are then calculated from the z-scores (for example, a z-score of 1.96 corresponds to the 97.5th percentile). For more information on the LMS method, see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC27365/
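For readers not using SAS, here is a minimal Python sketch of the LMS calculation. The function names are hypothetical. The L = 0 branch, Z = ln(value/M)/S, is the standard limiting form of the LMS formula and is included as an assumption for completeness. The function value_at_z inverts the formula to give the measurement at a given z-score. For weight-for-height, the L, M, and S values would be looked up by sex and height rather than by sex and age.

```python
import math
from statistics import NormalDist


def lms_z(value, L, M, S):
    """z-score from the LMS formula Z = ((value/M)**L - 1) / (S*L).
    For L == 0 the usual limiting form ln(value/M)/S is used."""
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1) / (S * L)


def percentile_from_z(z):
    """Percentile = 100 x standard normal cumulative probability of z."""
    return 100 * NormalDist().cdf(z)


def value_at_z(z, L, M, S):
    """Inverse of lms_z: the measurement that corresponds to a given z-score."""
    if L == 0:
        return M * math.exp(S * z)
    base = 1 + L * S * z
    if base <= 0:
        raise ValueError("z is outside the range these L, M, S values can represent")
    return M * base ** (1 / L)


print(round(percentile_from_z(1.96), 1))  # 97.5
```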


Extreme or Biologically Implausible Values

As explained in the Modified z-scores documentation [PDF - 367KB], the SAS code also calculates modified z-scores that can be used to identify extreme values that may be errors. These modified z-scores are based on extrapolating one-half of the distance between 0 and +2 z-scores (for values above the median) or between 0 and -2 z-scores (for values below the median) to the tails of the distribution. The output from the SAS program contains BIV flag variables that are coded as -1 (modified z-score is extremely low), +1 (modified z-score is extremely high), or 0 (modified z-score is between these 2 cut-points). These BIV flags (e.g., _bivbmi), along with the other variables in the output dataset, _cdcdata, are shown in Table 2. A biologically implausible value is not necessarily incorrect, but it should be examined further, possibly in conjunction with other characteristics of the child.
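The modified z-score and BIV flag described above can be sketched in Python as follows. This is a hedged illustration, not the CDC code. For values above the median, it assumes the unit of measure is half the distance from the median to the value at z = +2. For values below the median, it uses half the distance to the value at z = -2. The function names are hypothetical. The low and high cut-points would come from Table 2.

```python
import math


def value_at_z(z, L, M, S):
    """Measurement that corresponds to z-score z under the LMS parameters."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1 + L * S * z) ** (1 / L)


def modified_z(value, L, M, S):
    """Distance from the median M in units of half the distance between M and
    the +2 z-score value (above M) or the -2 z-score value (below M)."""
    if value >= M:
        unit = (value_at_z(2, L, M, S) - M) / 2
    else:
        unit = (M - value_at_z(-2, L, M, S)) / 2
    return (value - M) / unit


def biv_flag(mod_z, low, high):
    """-1 below the low cut-point, +1 above the high cut-point, else 0."""
    if mod_z < low:
        return -1
    if mod_z > high:
        return 1
    return 0
```

For BMI-for-age, for example, the flag would be biv_flag(modified_z(bmi, L, M, S), low=-4, high=8), using the cut-points in Table 2. Because the unit differs above and below the median, a skewed measure such as BMI gets a wider unit above the median than below it.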


2016 Change to BIV cut-points: Rationale

The modified z-score cut-points that define the upper limit of plausible values were changed in 2016 for several of the growth chart measures. Previously, the cut-points for extremely high values were based on recommendations from a 1995 WHO publication (1), but several papers (2–6) have since indicated that these cut-points were probably too restrictive. The WHO cut-points identified many values that were extremely high, but were probably not errors.

On the basis of analyses of 2- to 18-year-olds in NHANES 1999-2000 through 2011-2012 (3) and 2- to 4-year-olds in CDC's Pediatric Nutrition Surveillance System (PedNSS) (6), we now suggest that the upper BIV cut-points be increased
(1) from +5 to +8 for modified z-scores for weight and BMI, and
(2) from +3 to +4 for modified z-scores for height.
These new z-score cut-points roughly correspond to the modified z-scores for the maximum values of the body size measures among 2- to 18-year-olds in NHANES. We are not making changes to the cut-points for the extremely low values of the body size measurements.

If BIV cut-points are used to exclude data, this change will likely affect comparisons between data cleaned using the new BIV cut-points and data cleaned using the older (WHO 1995) values. The effects of these changes will likely differ across datasets, depending on the true prevalence of extreme values and the accuracy of the recorded data. In an analysis (3) of NHANES 1999-2012 data, for example, the use of the 2016 cut-points, compared with the WHO 1995 cut-points, increased the prevalence of obesity and extreme obesity (BMI ≥120% of the 95th percentile of BMI-for-age) by about 0.5 percentage points. (Because of the extensive data cleaning in NHANES, published estimates from these surveys do not exclude any of the extremely high values.) In an analysis of PedNSS (6), the use of the 2016 cut-points, compared with the WHO 1995 cut-points, increased the prevalence of both obesity and extreme obesity by 0.9 percentage points. Because extreme obesity is relatively uncommon among children, particularly preschool children, an increase of 0.5 to 0.9 percentage points is a large proportional change in its prevalence.

BIVs vs. Data Errors

These BIVs can be used to flag potentially problematic data points, and the 2016 cut-points were chosen to balance the inclusion of extreme values that are likely to be correct against the exclusion of those that are likely to be incorrect. However, other cut-points can be used and may be more appropriate given other information specific to your data. If desired, the modified z-scores (the modified z-score column of Table 2, e.g., _Fbmiz) can be used to construct other cut-points for extreme (or biologically implausible) values rather than relying on the BIV flag variables. For example, if you feel that the BMI-for-age cut-point of +8 would result in the inclusion of many values that are likely to be errors, you could use _Fbmiz > 6 as the definition of a high BMI BIV. This could be re-coded in the output dataset as:

data _cdcdata; set _cdcdata;
  if -4 <= _Fbmiz <= 6 then _bivbmi = 0;       * plausible;
  else if _Fbmiz > 6 then _bivbmi = 1;         * high BIV;
  else if . < _Fbmiz < -4 then _bivbmi = -1;   * low BIV (Table 2 cut-point for BMI);
run;

It would also be possible to use the modified z-scores to identify children who would have been flagged with the older WHO cut-points.
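For example, the Python sketch below identifies values that the older WHO cut-points would have flagged as high but the 2016 cut-points do not. The cut-points come from the 2016 change described above. Weight-for-height is left out because its earlier cut-point is not stated on this page. The dictionary and function names are hypothetical.

```python
# High cut-points for modified z-scores: (WHO 1995, 2016), from the text above.
HIGH_CUTOFFS = {
    "weight_for_age": (5, 8),
    "bmi_for_age": (5, 8),
    "height_for_age": (3, 4),
}


def newly_retained(mod_z, measure):
    """True if the value would have been flagged as a high BIV with the
    WHO 1995 cut-point but is not flagged with the 2016 cut-point."""
    old, new = HIGH_CUTOFFS[measure]
    return old < mod_z <= new


def count_newly_retained(mod_zs, measure):
    """Number of values affected by the 2016 change for one measure."""
    return sum(newly_retained(z, measure) for z in mod_zs)


# Hypothetical modified BMI z-scores (_Fbmiz values) for four children.
print(count_newly_retained([2.0, 5.5, 7.9, 8.5], "bmi_for_age"))  # 2
```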

Once a data point has been flagged as a potential problem, other information from the child, if available, could be used to help identify errors and help in the decision to include or exclude the value. For example, if a child with an extremely high BMI also has a high skinfold thickness or arm circumference, the BMI value is more likely to be correct than if the other measure is low. Similarly, in a longitudinal study, one could assess whether a child with an extreme value at one time point also has a high value at other examinations. If only weight and height are available at a single examination, one might consider whether a child who has an extremely high weight is also very tall, and whether there are other children who weigh nearly as much.


Defining Extreme Obesity (the 99th percentile of BMI-for-age)

The use of the LMS parameters of the CDC growth charts has been shown to result in inaccurate estimates of the empirical percentiles at very high BMI values (e.g., the 99th percentile); see http://www.ajcn.org/content/90/5/1314.full.pdf [PDF - 154KB]. Therefore, rather than using the BMI-for-age percentiles (and z-scores) to identify and track children who are extremely obese, it is recommended that these high BMI values be expressed as a percentage of the 95th percentile. A BMI value that is 20% greater than the 95th percentile (relative to the CDC reference population) is approximately equal to the 99th percentile of the reference population.

The SAS code creates a variable, bmipct95, to simplify the use of this definition. This variable expresses a child's BMI as a percentage of the 95th percentile for that child's sex and age. Values of bmipct95 can range from <50 (for very thin children) to >220 (for very heavy children). A child with a bmipct95 of 100 is at the 95th percentile of BMI-for-age. A value of 120 would indicate that the child's BMI is 20% greater than the 95th percentile.
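Here is a minimal Python sketch of bmipct95. It assumes the 95th percentile is computed from the child's BMI-for-age L, M, and S values at z ≈ 1.645, the 95th percentile of the standard normal distribution. The SAS program may obtain the 95th percentile differently, so check cdc-source-code.sas. The function names are hypothetical. The threshold of at least 120 in is_extremely_obese is an assumed reading of the 120% definition above.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # about 1.645


def bmi_p95(L, M, S):
    """95th percentile of BMI-for-age from the child's L, M, S values."""
    if L == 0:
        return M * math.exp(S * Z95)
    return M * (1 + L * S * Z95) ** (1 / L)


def bmipct95(bmi, L, M, S):
    """BMI expressed as a percentage of the 95th percentile."""
    return 100 * bmi / bmi_p95(L, M, S)


def is_extremely_obese(bmi, L, M, S):
    """Hypothetical helper: BMI at least 120% of the 95th percentile."""
    return bmipct95(bmi, L, M, S) >= 120
```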


References

  1. WHO Expert Committee. Physical status: the use and interpretation of anthropometry. WHO Tech. Rep. Ser. 1995;854:217–50.
  2. Lawman HG, Ogden CL, Hassink S, et al. Comparing methods for identifying biologically implausible values in height, weight, and Body Mass Index among youth. Am. J. Epidemiol. 2015;182(4):359–65.
  3. Freedman DS, Lawman HG, Skinner AC, et al. Validity of the WHO cutoffs for biologically implausible values of weight, height, and BMI in children and adolescents in NHANES from 1999 through 2012. Am. J. Clin. Nutr. 2015;102(5):1000–6.
  4. Lo JC, Maring B, Chandra M, et al. Prevalence of obesity and extreme obesity in children aged 3-5 years. Pediatr. Obes. 2014;9(3):167–75.
  5. Dennison BA, Edmunds LS, Stratton HH, et al. Rapid infant weight gain predicts childhood overweight. Obesity (Silver Spring). 2006;14(3):491–9.
  6. Freedman DS, Lawman HG, Pan L, et al. The prevalence and validity of high, biologically implausible values of weight, height and BMI among 8.8 million children. Obesity (Silver Spring). 2016 Mar 17. PMID: 26991694.