How To... Review Data Quality
CDC Data Editing
CDC Editing of the Data for Completeness and Data Quality
The edit process at CDC begins with a transaction file from the
contributor that is a flat ASCII data file containing either PedNSS or
PNSS transaction records. The transaction file is received electronically
through the Secure Data Network (SDN) or on a mailed CD-ROM, on a monthly or quarterly schedule. Once the transaction
file is received by CDC, the automated editing process is initiated.
- First, the computer edit program counts the total number of
transaction records received and the record volume by month and year.
For PedNSS, the month and year are taken from the child's initial date
of visit. For PNSS, they are taken from the mother's expected date of
delivery (EDD), from the last menstrual period (LMP) if EDD is not
available, or from the infant's birth date.
- Next, the transaction file is edited for duplicate records and
errors in the critical fields of a record. Duplicate records and
records with critical errors are rejected by CDC and are not included in additional
editing for data quality.
- A duplicate transaction record is a record that is mostly or
entirely identical to another record in the same transaction file.
When duplicate records are identified the first reported record is
- A critical error is missing or invalid data in a field that is
considered critical for data analysis; that is, without that field, analysis
of the PedNSS or PNSS data is not possible. Therefore, records with
critical errors are rejected from the transaction file. Critical
fields are defined differently for PedNSS and PNSS; however, they
include fields such as state, substate, date of visit, and individual
- Then, the fields in each record are edited for data quality. The
data quality edits are conducted in the following order: missing,
biologically implausible values (BIVs), cross-check errors, unusual data distributions, and low and high
- When a data quality problem is identified in a field, the records
causing it are excluded from the next level of editing, so that the
same problem does not appear in more than one data quality
category in the report. For example,
missing data are not included in any of the remaining data quality
- One exception to this rule is the unusual data distribution edits for
PedNSS and PNSS. These errors are flagged only when the values for a
specific field across an entire PedNSS or PNSS transaction file are
unusually distributed. A field can therefore have, for example, a
miscode error as well as an unusual distribution error.
- Another exception to this rule is the completion code and record
linkage edit for PNSS. This edit is conducted after the missing data
edit. If a completion code and record linkage error is identified,
the individual fields in the record are still edited further until the next
data quality error in the field is found. So for PNSS, a field can have
a completion code and record linkage error as well as an additional
data quality error.
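The ordering rule above (a record flagged at one level is left out of the later levels) can be sketched as follows. This is only an illustration under stated assumptions: the field name height_cm, the check functions, and the plausibility cutoffs are hypothetical and are not CDC definitions. The file-level unusual distribution edit and the PNSS completion code and record linkage edit are not modeled.

```python
from typing import Callable, Optional

# Hypothetical checks: each returns True when the value has that problem.
def is_missing(value, record) -> bool:
    return value is None or value == ""

def is_biv(value, record) -> bool:
    # Illustrative cutoffs only, not a CDC-defined range.
    # Non-numeric values are also treated as implausible in this sketch.
    try:
        return not (30.0 <= float(value) <= 200.0)
    except (TypeError, ValueError):
        return True

# Checks run in this order; further levels (for example cross-checks)
# would be appended to the list.
ORDERED_CHECKS: list[tuple[str, Callable[[object, dict], bool]]] = [
    ("missing", is_missing),
    ("biv", is_biv),
]

def first_problem(field: str, record: dict) -> Optional[str]:
    """Return the first category that flags the field, or None if it is clean."""
    value = record.get(field)
    for name, check in ORDERED_CHECKS:
        if check(value, record):
            return name  # stop here: later levels never see this record
    return None

def tally(field: str, records: list[dict]) -> dict[str, int]:
    """Count each record at most once, under the first category that flags it."""
    counts = {name: 0 for name, _ in ORDERED_CHECKS}
    for record in records:
        problem = first_problem(field, record)
        if problem is not None:
            counts[problem] += 1
    return counts
```

Because first_problem returns at the first matching check, a missing value is counted only as missing and never reaches the BIV check, which mirrors the example given for missing data.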
- The transaction file is added to the master file and checked for
records that duplicate records already on the master file. If duplicate
records are identified, the record on the master file is replaced.
- Finally, the Periodic Summary of Record Volume and Data Quality
report is generated.
- The transaction file is then used to update the contributor's master
file of records. The master file is a cleaned file that CDC updates
after editing each transaction file from the contributor, and it is
saved for the next transaction file and update.
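As a rough sketch of three of the steps above (record volume by month and year, rejection of records with critical errors, and the master file update), the following Python may help. The record layout, the field names (visit_date, person_id), and the duplicate key are assumptions made for illustration. The check here covers only missing critical fields, not invalid ones, and duplicate handling within a single transaction file is left out.

```python
from collections import Counter
from datetime import date

# Hypothetical field names. The text names state, substate, and date of
# visit among the critical fields but does not give a record layout.
CRITICAL_FIELDS = ("state", "substate", "visit_date")
# Assumed key for deciding that two records are duplicates.
KEY_FIELDS = ("state", "substate", "person_id", "visit_date")


def volume_by_month(records):
    """Count records per (year, month) of the visit date, skipping records without one."""
    return Counter(
        (r["visit_date"].year, r["visit_date"].month)
        for r in records
        if isinstance(r.get("visit_date"), date)
    )


def split_critical(records):
    """Separate records with all critical fields present from those rejected."""
    accepted, rejected = [], []
    for r in records:
        ok = all(r.get(f) not in (None, "") for f in CRITICAL_FIELDS)
        (accepted if ok else rejected).append(r)
    return accepted, rejected


def update_master(master, accepted):
    """Add accepted records to the master; a record whose key already exists replaces it."""
    for r in accepted:
        master[tuple(r.get(f) for f in KEY_FIELDS)] = r
    return master
```

Keeping the master file as a dictionary keyed on the duplicate key makes the replacement rule a plain assignment: an incoming record with the same key overwrites the one already stored.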
See a graphic illustration of the CDC editing process.
Page last reviewed: May 1, 2009
Page last updated: May 1, 2009
Content Source: Division of Nutrition, Physical Activity and Obesity,
National Center for Chronic Disease
Prevention and Health Promotion