A SAS Program for the 2000 CDC Growth Charts (ages 0 to <20 years)
The purpose of this SAS program is to calculate the percentiles and z-scores (standard deviations) for a child’s sex and age for BMI, weight, height, and head circumference based on the CDC growth charts. Weight-for-height percentiles and z-scores are also calculated. Observations that contain extreme values are flagged as being biologically implausible. These extreme values, however, are not necessarily incorrect.
Although the SAS program can be used to calculate z-scores and percentiles for children up to 20 years of age, the World Health Organization (WHO) growth charts are recommended for children <24 months of age. Several computer programs that use the WHO growth charts are available on the WHO and CDC sites; the program on the CDC site follows the same steps as this SAS program for the CDC growth charts.
The SAS program, cdc-source-code.sas [SAS - 8KB] (files are below, in step #1), calculates these z-scores and percentiles for children in your data based on reference data in CDCref_d.sas7bdat. If you’re not using SAS, you can download CDCref_d.csv [CSV - 160KB], and create a program based on cdc-source-code.sas [SAS - 8KB] to do the calculations.
Instructions for SAS users
Step 1: Download the SAS program (cdc-source-code.sas [SAS -8KB]) and the reference data file (CDCref_d.sas7bdat). Do not alter these files, but move them to a folder (directory) that SAS can access.
For the following example, the files have been saved in c:\sas\growth charts\cdc\data.
Step 2: Create a libname statement in your SAS program to point at the folder location of ‘CDCref_d.sas7bdat’. An example would be:
libname refdir 'c:\sas\growth charts\cdc\data';
Note the SAS code expects this name to be refdir; do not change this name.
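Before going further, it can help to confirm that the libref resolves and that the reference file is readable. A minimal check, assuming the files were saved in the example folder used above:

```sas
* Assumes the example folder above; change the path to match your system;
libname refdir 'c:\sas\growth charts\cdc\data';

* Lists the variables in the reference data; an error here usually means the path is wrong;
proc contents data=refdir.CDCref_d;
run;
```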
Step 3: Set your existing dataset containing height, weight, sex, age and other variables into a temporary dataset, named mydata. Variables in your dataset should be renamed and coded as follows:
Table 1
Variable | Description |
---|---|
agemos | Child's age in months; must be present. The program assumes you know the number of months to the nearest day based on the dates of birth and examination. For example, if a child was born on Oct 1, 2007 and was examined on Nov 15, 2011, the child’s age would be 1506 days or 49.48 months. In everyday usage, this age would be stated as 4 years or as 49 months. However, if 49 months were used as the age of all children who were between 49.0 and <50 months in your data, the estimated z-scores would be slightly too high because, on average, these children would be taller, weigh more, and have a higher BMI than children who are exactly 49.0 months of age. This bias would be greater if only completed years of age were known, and the age of all children between 4 and <5 years was represented as 48 months. |
sex | Coded as 1 for boys and 2 for girls. |
height | Height in cm. This is either standing height (for children who are ≥ 24 months of age) or recumbent length (for children < 24 months of age); both are input as height. If standing height was measured for some children less than 24 months of age, you should add 0.8 cm to these values (see page 8 of http://www.cdc.gov/nchs/data/series/sr_11/sr11_246.pdf [PDF-5.4MB]). If recumbent length was measured for some children who are ≥ 24 months of age, subtract 0.8 cm. |
weight | Weight (kg) |
bmi | BMI (weight (kg) / height (m)²). If your data doesn’t contain BMI, the program calculates it. If BMI is present in your data, the program will not overwrite it. |
headcir | Head circumference (cm) |
Z-scores and percentiles for variables that are not in mydata will be coded as missing (.) in the output dataset (named _cdcdata). Sex (coded as 1 for boys and 2 for girls) and agemos must be in mydata. It’s unlikely that the SAS code will overwrite other variables in your dataset, but you should avoid having variable names that begin with an underscore, such as _bmi.
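As a minimal sketch of step 3, the data step below derives agemos from dates, recodes sex, and applies the 0.8 cm height adjustment described in Table 1. The input dataset (rawdata) and the variables birth_dt, exam_dt, gender, ht_cm, wt_kg, and ht_standing are hypothetical names; substitute the ones in your own data. The divisor 30.4375 (365.25 / 12) is an assumption, chosen because it reproduces the 1506 days = 49.48 months example in Table 1.

```sas
/* Sketch only: rawdata, birth_dt, exam_dt, gender, ht_cm, wt_kg, and
   ht_standing are hypothetical names, not part of the CDC program. */
data mydata;
  set rawdata;

  /* Age in months from SAS date values; 30.4375 = 365.25 / 12 (assumed) */
  if not missing(birth_dt) and not missing(exam_dt) then
    agemos = (exam_dt - birth_dt) / 30.4375;

  /* Recode sex to 1 = boys, 2 = girls (assumes gender is 'M' or 'F') */
  if gender = 'M' then sex = 1;
  else if gender = 'F' then sex = 2;

  height = ht_cm;
  weight = wt_kg;

  /* ht_standing: 1 = standing height measured, 0 = recumbent length measured */
  if not missing(height) and not missing(agemos) then do;
    if agemos < 24 and ht_standing = 1 then height = height + 0.8;
    else if agemos >= 24 and ht_standing = 0 then height = height - 0.8;
  end;
run;
```

Keeping the fractional part of agemos, rather than truncating to completed months, avoids the upward bias in z-scores described in Table 1.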
Step 4: Copy and paste the following line into your SAS program after the line (or lines) in step #3.
%include 'c:\sas\growth charts\cdc\data\CDC-source-code.sas'; run;
If necessary, change this statement to point at the folder containing the downloaded ‘CDC-source-code.sas’ file. This tells your SAS program to run the statements in ‘CDC-source-code.sas’.
Step 5: Submit the %include statement. This will create a dataset, named _cdcdata, which contains all of your original variables along with z-scores, percentiles, and flags for extreme values. The names and descriptions of these new variables in _cdcdata are in Table 2. Additional information on the extreme z-scores is given in a separate section that follows the “Example SAS Code”.
Table 2: Z-Scores, percentiles, and extreme (biologically implausible, BIV) values in output dataset, _cdcdata
Description | Percentile | Z-score | Modified z-score to identify extreme values | Flag for extreme values | Cutoff for low modified z-score (flag coded as -1) | Cutoff for high modified z-score (flag coded as +1) |
---|---|---|---|---|---|---|
Weight-for-age for children between 0 and 239 (inclusive) months of age | wapct | waz | _Fwaz | _bivwt | < -5 | > 5 |
Height-for-age for children between 0 and 239 (inclusive) months of age | hapct | haz | _Fhaz | _bivht | < -5 | > 3 |
Weight-for-height for children with heights between 45 and 121 cm (this height range approximately covers ages 0 to 6 y) | whpct | whz | _Fwhz | _bivwh | < -4 | > 5 |
BMI-for-age for children between 24 and 239 months of age | bmipct | bmiz | _Fbmiz | _bivbmi | < -4 | > 5 |
Head circumference-for-age for children between 0 and 35 (inclusive) months of age | headcpct | headcz | _Fheadcz | _bivhc | < -5 | > 5 |
Step 6: Examine the new dataset, _cdcdata, with PROC MEANS or some other procedure to verify that the z-scores and other variables have been created. If a variable in Table 1 was not in your original dataset (e.g., head circumference), the output dataset will indicate that all values for the percentiles and z-scores of this variable are missing. If values for other variables are unexpectedly missing, make sure that you’ve renamed and recoded variables as indicated in Table 1 and that your SAS dataset is named mydata. The program should not modify your original data; it adds the new variables to a copy of your data in the output dataset, _cdcdata.
Example SAS code corresponding to steps 2 to 6. You can simply cut and paste these lines into a SAS program, but you’ll need to change the libname and %include statements to point at the folders containing the downloaded files.
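* Point refdir at the folder containing CDCref_d.sas7bdat;
libname refdir 'c:\sas\growth charts\cdc\data';
* Copy your data into mydata, renamed and recoded as in Table 1;
data mydata; set whatever-your-original-dataset-is-named; run;
* Run the CDC program, which creates the output dataset _cdcdata;
%include 'c:\sas\growth charts\cdc\data\CDC-source-code.sas'; run;
* Verify that the z-scores and other variables were created;
proc means data=_cdcdata; run;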
Additional Information
Z-scores are calculated as:
Z = [((value / M)**L) - 1] / (S * L),
in which ‘value’ is the child’s BMI, weight, height, etc. The L, M, and S values are in CDCref_d.sas7bdat and vary according to the child’s sex and age or according to the child’s sex and height. Percentiles are then calculated from the z-scores (for example, a z-score of 1.96 would be equal to the 97.5th percentile). For more information on the LMS method, see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC27365/
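The calculation above can be sketched for a single measurement as follows. The L, M, and S values here are made up for illustration and are not taken from CDCref_d.sas7bdat. The log branch is the standard limiting form of the LMS formula as L approaches 0; whether cdc-source-code.sas handles that case in the same way is an assumption.

```sas
/* Sketch only: value, l, m, and s are illustrative numbers,
   not values from the CDC reference data. */
data _null_;
  value = 17.0;              /* e.g., a child's BMI */
  l = -2.0;  m = 15.5;  s = 0.08;

  /* LMS z-score; the log form is the limit of the formula as L approaches 0 */
  if abs(l) > 1e-8 then z = ((value / m)**l - 1) / (s * l);
  else z = log(value / m) / s;

  /* Percentile from the standard normal cumulative distribution function */
  pct = 100 * probnorm(z);

  /* A z-score of 1.96 corresponds to approximately the 97.5th percentile */
  pct_196 = 100 * probnorm(1.96);

  put z= pct= pct_196=;
run;
```

In practice the L, M, and S values would come from the row of the reference data that matches the child's sex and age (or sex and height), rather than being typed in by hand.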
Extreme or Biologically Implausible Values
The SAS code also flags extreme values (biologically implausible values, or BIVs). As explained in the BIV cutoffs documentation [PDF - 27KB], these BIVs are based on modified z-scores that were calculated using a different method. These BIV flag variables are coded as -1 (modified z-score is extremely low), +1 (modified z-score is extremely high), or 0 (modified z-score is between these 2 cut-points). These BIV flags, along with other variables that are in the output dataset, _cdcdata, are shown in Table 2.
The modified z-scores (3rd column of Table 2) can be used to construct other cut-points for extreme (or biologically implausible) values. For example, if the distribution of BMI is strongly skewed to the right, you might use _Fbmiz > 8 (rather than 5) as the definition of an extremely high BMI-for-age. This could be recoded as:
if -4 <= _Fbmiz <= 8 then _bivbmi=0; *plausible;
else if _Fbmiz > 8 then _bivbmi=1; *high BIV;
else if . < _Fbmiz < -4 then _bivbmi= -1; *low BIV (same low cutoff as Table 2);
There are also 2 overall indicators of extreme values in the output dataset: _bivlow and _bivhigh. These 2 variables indicate whether any measurement is extremely high (_bivhigh=1) or extremely low (_bivlow=1). If a child does not have an extreme value for any measurement, both variables are coded as 0. A biologically implausible value is not necessarily incorrect, but the value should be studied further, possibly in conjunction with other characteristics of the child. For example, if a child’s weight is implausibly high, is the child also very tall and are there other children who weigh nearly as much?
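One way to review the flagged records is to tabulate the two overall indicators and then list the children with any extreme value. This is a sketch that assumes height and weight were present in mydata; adjust the VAR list to the measurements you actually have.

```sas
/* Cross-tabulate the overall indicators, including missing values */
proc freq data=_cdcdata;
  tables _bivlow * _bivhigh / missing;
run;

/* List records with any extreme value for closer inspection */
proc print data=_cdcdata;
  where _bivlow = 1 or _bivhigh = 1;
  var agemos sex height weight _bivwt _bivht _bivbmi;
run;
```

Listing the raw measurements next to the individual flags makes it easier to judge whether an extreme value is a likely data error or a real but unusual child.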
Defining Extreme Obesity (the 99th percentile of BMI-for-age)
The use of the LMS parameters of the CDC growth charts has been shown to result in inaccurate estimates of the empirical percentiles at very high BMI values (e.g., the 99th percentile) http://www.ajcn.org/content/90/5/1314.full.pdf [PDF - 154KB]. Therefore, rather than using the BMI-for-age percentiles (and z-scores) to identify and track children who are extremely obese, it is recommended that these high BMI values be expressed as a percentage of the 95th percentile. A BMI value that is 20% greater than the 95th percentile (relative to the CDC reference population) is approximately equal to the 99th percentile of the reference population.
The SAS code creates a variable, bmipct95, to simplify the use of this definition. This variable expresses a child’s BMI as a percentage of the 95th percentile for that child’s sex and age. The value of bmipct95 can range from <50 (for very thin children) to >220 (for very heavy children). A child with a bmipct95 of 100 is at the 95th percentile of BMI-for-age. A value of 120 would indicate that the child’s BMI is 20% greater than the 95th percentile.
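A minimal sketch of applying this definition is shown below. The output dataset name (cdc_obesity) and the indicator variable (extreme_ob) are hypothetical, and treating a value of exactly 120 as extreme obesity is an assumption about how the boundary is handled.

```sas
/* Sketch only: cdc_obesity and extreme_ob are hypothetical names */
data cdc_obesity;
  set _cdcdata;

  /* 1 if BMI is at least 120% of the 95th percentile, 0 otherwise;
     left missing when bmipct95 is missing */
  if missing(bmipct95) then extreme_ob = .;
  else extreme_ob = (bmipct95 >= 120);
run;

proc freq data=cdc_obesity;
  tables extreme_ob / missing;
run;
```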
- Page last reviewed: May 7, 2015
- Page last updated: May 7, 2015