How To... Review Data Quality - Periodic Data
Summary and Data Quality Sections
The Summary section of the PNSS Periodic Summary of Record Volume and
Data Quality Report summarizes the errors identified in the Data Quality
Section of the report. The Summary includes a list of the types of data
quality problems identified in the report and the number of fields with
each type of data quality problem.
Data Quality Section
The data quality section includes:
Missing is used to measure the completeness of the data. The edit
criterion is a field with missing data on more than 10% of PedNSS records
or more than 20% of PNSS records.
If 100% of data are missing, ask the following questions: a) is the information
being collected in clinics, b) is the computer information system
capturing the information, and c) are the data being extracted from the
computer information system and included in the transaction file?
If more than 10% (PedNSS) or 20% (PNSS) but less than 100% of data are
missing, this indicates that the data are captured by the computer
information system, but not all clinics are collecting data or some clinics
collect data only some of the time. However, for PNSS, this may be the
result of data not being selected and extracted from
the computer information system for all the appropriate record types
(complete, prenatal only, or postpartum only records).
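The Missing edit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a list of dictionaries), the field name `birthweight`, and the treatment of `None` and blank strings as missing are hypothetical and are not taken from the PedNSS/PNSS file specifications.

```python
# Thresholds from the text: more than 10% (PedNSS) or more than 20% (PNSS).
MISSING_THRESHOLD_PCT = {"PedNSS": 10.0, "PNSS": 20.0}


def is_missing(value):
    """Assumption: None and blank strings count as missing."""
    return value is None or str(value).strip() == ""


def missing_report(records, field, system):
    """Return (percent missing, status) for one field of one system."""
    if not records:
        raise ValueError("no records to evaluate")
    n_missing = sum(1 for r in records if is_missing(r.get(field)))
    pct = 100.0 * n_missing / len(records)
    if n_missing == len(records):
        # 100% missing: check collection, capture, and extraction.
        status = "all missing"
    elif pct > MISSING_THRESHOLD_PCT[system]:
        # Captured by the system, but not collected or extracted everywhere.
        status = "partially missing, flagged"
    else:
        status = "ok"
    return pct, status


# Hypothetical example: 2 of 4 PNSS records lack a birthweight value.
example = [
    {"birthweight": "3200"},
    {"birthweight": ""},
    {"birthweight": None},
    {"birthweight": "2900"},
]
print(missing_report(example, "birthweight", "PNSS"))
```

Separating the "all missing" status from the partial case mirrors the two sets of follow-up questions in the text.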
- In PedNSS, missing data for hemoglobin and hematocrit are not
calculated since hematology assessment is not performed at every clinic
- When data are required for one field or the other, for example
hemoglobin or hematocrit, both fields are assessed together to determine
Review list of PedNSS fields edited for Missing.
Review list of PNSS fields edited for Missing.
Mis-codes are unacceptable data for a specific field. The
edit criteria for miscode errors are:
- A clinic code number on more than 10 records does not match a
clinic code number on the PedNSS/PNSS code file at the CDC. This code
file prepared by the contributor contains geographic codes for the
state, clinic/school, county (all required) and a choice of one or more
set of codes for local agencies, metro areas, and regions/school
districts (all optional). An updated code file should be sent to CDC
anytime there are changes in these geographic codes.
- A field that contains zero when zero is not an acceptable code or value on
more than 2% of records.
For example, if valid codes for the Food Stamps field are 1 = Yes, 2
= No, and blank or 9 = Unknown or refused, then a code of 0 (zero)
is a mis-code because it is an invalid code.
- A field that has unacceptable codes on more than 5% of the records.
For example, if valid codes for the Currently Breastfed field are
1 = Yes and 2 = No, then a code of 3 is an unacceptable value, or mis-code.
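The three mis-code criteria listed above can be illustrated with a short Python sketch. The field names, the string representation of codes (with a blank stored as an empty string), and the choice to count zeros only under the zero rule rather than also under the 5% rule are assumptions made for this example.

```python
from collections import Counter


def miscode_flags(records, field, valid_codes):
    """Apply the 2% zero rule and the 5% unacceptable-code rule to one field.

    Assumption: codes are stored as strings, and zeros are counted only
    under the zero rule, not again under the unacceptable-code rule.
    """
    n = len(records)
    values = [r.get(field) for r in records]
    zero_is_valid = "0" in valid_codes
    zero_pct = 100.0 * sum(1 for v in values if v == "0") / n
    invalid_pct = 100.0 * sum(
        1 for v in values if v not in valid_codes and v != "0"
    ) / n
    return {
        "zero_miscode": (not zero_is_valid) and zero_pct > 2.0,
        "invalid_code": invalid_pct > 5.0,
    }


def unmatched_clinics(records, code_file_clinics, max_records=10):
    """Return clinic codes absent from the code file on more than 10 records."""
    counts = Counter(
        r["clinic"] for r in records if r["clinic"] not in code_file_clinics
    )
    return {clinic: k for clinic, k in counts.items() if k > max_records}


# Hypothetical Food Stamps field: 1 = Yes, 2 = No, blank or 9 = Unknown.
food_stamps_valid = {"1", "2", "9", ""}
sample = [{"food_stamps": "0"}] * 3 + [{"food_stamps": "1"}] * 97
print(miscode_flags(sample, "food_stamps", food_stamps_valid))
```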
Review list of PedNSS fields edited for Mis-codes.
Review list of PNSS fields edited for Mis-codes.
A biologically implausible value (BIV) is a data value beyond the range
considered to be biologically plausible. These BIVs represent values that
are rarely observed, generally fewer than 1 in 10,000 records (0.01% of
records), and are therefore thought to be in error. When more than 3% of
records have a field with a BIV the field is reported as an error. CDC has
tried to develop a consistent definition for BIVs across the different
health indicators by using cut-off points that generally represent ± 4
For example, the biologically plausible range of prenatal hemoglobin (Hb)
is 8.0–17.0 g/dL, so the biologically implausible range is defined as < 8.0 g/dL
or >17.0 g/dL. Hemoglobin BIVs on the high side may in part reflect
hematocrits mistakenly entered in the Hb field. Similarly hematocrit (Hct)
BIVs on the low side may in part reflect Hb mistakenly entered in the Hct
Often reporting and recording errors contribute to a high proportion of
records with BIVs in a particular field. The BIV cut-offs selected for the
edit criteria for each field or indicator were based on a review of PedNSS
and PNSS data and external data sources. Additional information about how
the cut-offs for BIVs were developed for each field that is edited is
Review list of PedNSS fields edited and BIV
Review list of PNSS fields edited and BIV
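The BIV edit described in this section can be sketched in Python using the prenatal hemoglobin range quoted above (8.0 to 17.0 g/dL) and the 3% reporting criterion. Two assumptions are made for illustration only: values are held in a plain list, and the percentage is computed over records that have a value in the field.

```python
def biv_percent(values, low, high):
    """Percent of non-missing values outside the plausible range [low, high]."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    n_biv = sum(1 for v in present if v < low or v > high)
    return 100.0 * n_biv / len(present)


def biv_flagged(values, low, high, threshold_pct=3.0):
    """True when more than 3% of records carry a BIV in the field."""
    return biv_percent(values, low, high) > threshold_pct


# Hypothetical prenatal Hb values in g/dL. The 35.0 entries resemble
# hematocrit values mistakenly entered in the Hb field.
prenatal_hb = [12.0] * 95 + [35.0] * 5
print(biv_percent(prenatal_hb, 8.0, 17.0), biv_flagged(prenatal_hb, 8.0, 17.0))
```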
Cross-check errors are coding inconsistencies between specific fields.
The edit criteria for cross-check errors are:
- Field coding is inconsistent on more than 5% of the records.
For example, if Currently Breastfed = 1 (yes, infant is currently breastfed) and Length of
Time Breastfed = 5 (number of weeks breastfed for infant who has quit
breastfeeding), both fields are listed as a cross-check error.
- For PNSS only, field coding resulting in an invalid combination of
dates on more than 5% of the records. Dates in PNSS are expected to
follow certain patterns as shown in the examples below.
- For complete records, the initial visit date should be before the
infant's date of birth, and the infant's date of birth should be before
the postpartum visit date. An invalid date combination would be a date of
birth that occurs before the initial visit date on a complete record.
- The date of last menstrual period (LMP) should be before the
initial visit date, WIC enrollment date, estimated date of delivery
(EDD), date of birth, and postpartum visit date. An invalid date
combination would be an LMP date that does not precede all of
these dates.
- For postpartum only records, the initial visit date and postpartum
visit date should be the same and after the infant's date of birth. For CDC
data analysis purposes, the postpartum visit date is copied into the
initial visit date field of the postpartum only transaction record. An
invalid date combination would be if the date of initial visit and
postpartum visit were not the same for postpartum only records.
- PNSS date combinations that are considered invalid and are
cross-check errors are included in the list of PNSS fields that are
edited for cross-check errors below.
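The PNSS date patterns listed above can be expressed as a small Python check. This is a sketch, not the CDC edit program: the field names, the completion-code labels (`complete`, `postpartum_only`), and the decision to skip comparisons when a date is missing are all assumptions made for the example.

```python
from datetime import date

OTHER_DATES = ("initial_visit", "wic_enrollment", "edd",
               "infant_dob", "postpartum_visit")


def date_combination_errors(rec):
    """Return the invalid date combinations found in one PNSS record."""
    errors = []
    lmp = rec.get("lmp")
    initial = rec.get("initial_visit")
    dob = rec.get("infant_dob")
    pp = rec.get("postpartum_visit")

    # LMP should come before every other date on the record.
    if lmp is not None:
        for name in OTHER_DATES:
            other = rec.get(name)
            if other is not None and not lmp < other:
                errors.append(f"lmp not before {name}")

    if rec.get("completion") == "complete":
        # Expected order: initial visit < date of birth < postpartum visit.
        if initial is not None and dob is not None and not initial < dob:
            errors.append("initial_visit not before infant_dob")
        if dob is not None and pp is not None and not dob < pp:
            errors.append("infant_dob not before postpartum_visit")
    elif rec.get("completion") == "postpartum_only":
        # Initial and postpartum visit dates should match and follow birth.
        if initial is not None and pp is not None and initial != pp:
            errors.append("initial_visit differs from postpartum_visit")
        if dob is not None and pp is not None and not dob < pp:
            errors.append("infant_dob not before postpartum_visit")
    return errors


def date_cross_check_flagged(records, threshold_pct=5.0):
    """True when more than 5% of records have an invalid date combination."""
    bad = sum(1 for r in records if date_combination_errors(r))
    return 100.0 * bad / len(records) > threshold_pct


# Hypothetical complete record where birth precedes the initial visit.
bad_record = {"completion": "complete",
              "initial_visit": date(2008, 11, 1),
              "infant_dob": date(2008, 10, 5),
              "postpartum_visit": date(2008, 11, 20)}
print(date_combination_errors(bad_record))
```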
Review list of PedNSS fields edited for Cross-Check Errors.
Review list of PNSS fields edited for Cross-Check Errors.
Unusual data distributions are fields that have data following a
pattern that is not typical based on observations of national PedNSS and
The edit criteria for unusual data distribution errors are:
- A field containing no data in the acceptable ranges of the field
other than zero.
For example, zero is a valid code for Field 58, Drinks/Week-Last 3
Months, however when zero is coded on 100% of the records, this may
indicate that the field was initialized to zero and no valid data were
added to the field.
- Fields with values of unusual data distributions.
- Measured data in PNSS and PedNSS that are edited for unusual data
distributions include maternal weight and height, prepregnancy weight,
weight gain, birthweight, and hemoglobin and hematocrit. When these
data fields have more than 20% of values below the 5th or above the
95th percentile of the national data distribution, they have an
unusual data distribution compared to national data and are therefore
suspected of errors.
Hemoglobin (Hb) and hematocrit (Hct) values are also evaluated for
digit preference defined as rounding to the nearest integer or half
integer. Hemoglobin and Hematocrit values should be recorded as actual
values and not rounded values. Based on national PedNSS and PNSS data,
20% of Hb and Hct values are expected to fall on the integer and half
integer. A higher than expected percentage indicates excessive
rounding of the Hb or Hct values that results in an unusual
distribution of the data. The edit criterion for digit preference is
more than 30% of Hb or Hct values falling on the integer or half
integer (e.g., 11.0, 11.5, 12.0).
- Specific field edits for frequency of responses for data items
have been developed to identify data that do not follow the
distribution of coded responses in the national PedNSS and PNSS. For
example, the edits to identify unusual data distributions for maternal
education in PNSS are:
- more than 20% of women completed less than the 7th grade
(national data distribution is about 5% for women that completed
less than the 7th grade),
- more than 20% of women completed over 15 years of education
(national data distribution is about 5% for women that completed
over 15 years of education),
- fewer women completed 12 grades of education than completed any
other single grade (national data distribution is about 40% for
women that completed 12 grades of education, more than any other
single grade), or
- more than 1% of women received no education (national data
distribution is about 0.4% for women that received no education).
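Several of the unusual-distribution edits above reduce to simple proportions, sketched here in Python. The percentile cut-offs passed to `tail_shares` are placeholders, since the national 5th and 95th percentile values are not given in this section, and reading "below the 5th or above the 95th percentile" as a test on each tail separately is an assumption.

```python
def all_zero(values):
    """True when every value is zero, suggesting a zero-initialized field."""
    return len(values) > 0 and all(v == 0 for v in values)


def digit_preference_pct(values):
    """Percent of values falling exactly on an integer or half integer."""
    on_half = sum(1 for v in values if abs(v * 2 - round(v * 2)) < 1e-9)
    return 100.0 * on_half / len(values)


def tail_shares(values, p5, p95):
    """Percent of values below the 5th and above the 95th national percentile.

    p5 and p95 must be supplied by the caller; they are not in the text.
    """
    n = len(values)
    below = 100.0 * sum(1 for v in values if v < p5) / n
    above = 100.0 * sum(1 for v in values if v > p95) / n
    return below, above


def unusual_tails(values, p5, p95, threshold_pct=20.0):
    """Assumption: either tail exceeding 20% is treated as unusual."""
    below, above = tail_shares(values, p5, p95)
    return below > threshold_pct or above > threshold_pct


# Hypothetical Hb values (g/dL); half of them sit on .0 or .5.
hb = [11.0, 11.5, 12.0, 11.3, 12.7, 10.9, 11.0, 12.5, 11.2, 13.1]
share = digit_preference_pct(hb)
print(share, share > 30.0)  # more than 30% indicates digit preference
```

Doubling each value before rounding tests for integers and half integers in one step, and the small tolerance guards against floating-point noise in recorded values.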
Additional information about how the edits for unusual data
distributions were developed for each field that is edited is provided
Review list of PedNSS fields and edit criteria for Unusual Data
Review list of PNSS fields and edit criteria for Unusual Data
Standard deviation (SD) is a measure of the amount of variation among
values such as hemoglobin or weight-for-height in a
population. A low (smaller) standard deviation indicates data that are
less spread out (with less variation) than would be expected for
the population. A high (larger) standard deviation indicates data that are more
spread out than would be expected for the population.
In PNSS, the standard deviation of the prenatal hemoglobin (Hb)/hematocrit
(Hct) distribution compares the variability in the hemoglobin/hematocrit
measures reported in the PNSS to the variability observed for healthy,
iron-supplemented pregnant women measured in four European studies. Data from
the four studies are aggregated into a reference for hematologic status
during pregnancy. Because hemoglobin changes during pregnancy, and the PNSS
data reflect measures taken throughout pregnancy on iron-supplemented and
unsupplemented women, we expect greater variability in the PNSS data than
in the European reference (SD = 0.9 g/dL for hemoglobin and SD = 2.5% for
hematocrit). Therefore, the expected SD in PNSS is 0.9 to
1.2 g/dL for hemoglobin and 2.5% to 3.5% for hematocrit. The
cutoffs for low and high standard deviation were established slightly
outside these limits (Hb < 0.8 g/dL or > 1.3 g/dL and Hct < 2.4% or >
In PedNSS, the standard deviation of the hemoglobin/hematocrit
distribution compares the variability in Hb/Hct measures reported to the
PedNSS to the variability observed for Hbs and Hcts measured among
children 1-5 years old in the Second National Health and Nutrition
Examination Survey (NHANES II). We do not expect the PedNSS standard
deviations to be identical to the Hb/Hct SDs of NHANES II (SD = 0.8 g/dL
for hemoglobin and 2.3% for hematocrit). Therefore, the
expected SD in PedNSS is 0.8 to 1.1 g/dL for hemoglobin and 2.3% to 3.3% for
hematocrit. The cutoffs for low and high standard deviations
were established slightly outside these limits (Hb < 0.7 g/dL or > 1.2 g/dL
and Hct < 2.2% or > 3.4%).
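The PedNSS cutoffs above can be applied with the Python standard library, as in the sketch below. It assumes the sample standard deviation (`statistics.stdev`); the text does not say whether the sample or population formula is used, and the input values are made up for illustration.

```python
import statistics

# PedNSS cutoffs from the text: (low, high) standard deviation limits.
PEDNSS_SD_CUTOFFS = {
    "hb": (0.7, 1.2),   # g/dL
    "hct": (2.2, 3.4),  # percent
}


def sd_status(values, measure):
    """Classify the spread of a field as 'low', 'high', or 'ok'."""
    low, high = PEDNSS_SD_CUTOFFS[measure]
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    if sd < low:
        return "low"
    if sd > high:
        return "high"
    return "ok"


# Hypothetical Hb values (g/dL) with very little variation.
print(sd_status([11.0, 11.1, 11.2], "hb"))
```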
In PedNSS, the low and high standard deviation errors for growth
indicators including BMI-for-age, weight-for-length, weight-for-age and
height-for-age are identified only in the Annual Summary of Record
Volume and Data Quality report and will be discussed in that section.
Review list of PedNSS fields edited for Low or High Standard Deviation.
Review list of PNSS fields edited for Low or High Standard Deviation.
PNSS records contain prenatal and postpartum data that are recorded at
different times, i.e., during and after a pregnancy. Contributors are
expected to combine information from these two different time periods into
a single record. A completion code is assigned to a record to indicate
whether the record contains data from both time periods (prenatal and
postpartum), defined as a "complete record." Records with data from only the prenatal or
postpartum period are defined as "prenatal only" or "postpartum only" records.
This data quality error identifies problems with:
- assigning completion codes to PNSS records and
- linking prenatal and postpartum record information in PNSS records.
Completion Code or Record Linkage Errors are errors that result in
incorrect data for the record type or insufficient data for the record
type, or duplicate field values on a record. The errors that are reported
- Prenatal Only Records Containing Data in Postpartum (PP) Fields on >
2% of Records.
- Prenatal only records containing data in postpartum fields may result
from incorrect assignment of completion code. It is possible that these
records are really complete records, not prenatal only records.
Alternatively, if 100% of prenatal only records contain data in a
particular postpartum field, it may be a result of initializing the
field to zero when generating the PNSS record. For some postpartum
fields such as Multivitamin Consumption Prior to Pregnancy, zero is a valid value. Postpartum
program participation (WIC, Food Stamps, Medicaid, TANF) are examples of
other such fields that should be left blank on prenatal only records. If
infant fields, such as Infant's Date of Birth contain data on prenatal
only records, the records should be labeled as complete records, even if
not all infant fields are extracted. Lastly, a prenatal field value may
be incorrectly moved to a postpartum field when the record is extracted.
- Postpartum Only Records Containing Data in Prenatal Fields on > 2%
of Records that have incorrect data for the record type.
- Postpartum only records containing data in prenatal fields can be
caused by errors similar to those described above for prenatal only records,
that is, incorrect assignment of completion code, initializing prenatal
fields to zero, and incorrectly moving postpartum field values into
prenatal fields when extracting the record.
- Complete and Prenatal Only Records with Insufficient Prenatal Data
and Complete and Postpartum Only Records with Insufficient Postpartum
Data that have insufficient data for record type. The edit criteria are:
- More than 10% of complete and prenatal only records with fewer than
2 prenatal fields containing data values.
- More than 10% of complete and postpartum only records with fewer than
2 postpartum fields containing
Errors of insufficient data for the record type are most likely the
result of incorrect assignment of the Completion Codes. For example, a
prenatal only record with less than 2 prenatal fields containing data is
probably a postpartum only record that was incorrectly assigned the
Completion Code of Prenatal Only rather than Postpartum Only.
- Duplicate Field Values on >90% of Complete Records that include two
different PNSS fields that contain exactly the same data on the majority
of complete records, e.g. a woman's weight value at her prenatal visit is
the same as her weight at her postpartum visit. The edit criteria are:
- Complete records with duplicate field values on more than 90% of
Review list of PNSS fields edited for Completion Code or Record Linkage
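The completion code and record linkage edits in this section can be sketched as three percentage checks in Python. The completion-code labels, field names, and the rule that a blank or `None` value means "no data" are assumptions for illustration; the actual PNSS codes and field lists are not reproduced here.

```python
def has_data(value):
    """Assumption: None and blank strings mean the field holds no data."""
    return value is not None and str(value).strip() != ""


def pct(count, total):
    return 100.0 * count / total if total else 0.0


def prenatal_only_with_pp_data_pct(records, pp_field):
    """Percent of prenatal only records with data in a postpartum field."""
    pool = [r for r in records if r["completion"] == "prenatal_only"]
    return pct(sum(1 for r in pool if has_data(r.get(pp_field))), len(pool))


def insufficient_prenatal_pct(records, prenatal_fields):
    """Percent of complete/prenatal only records with < 2 prenatal values."""
    pool = [r for r in records
            if r["completion"] in ("complete", "prenatal_only")]
    short = sum(
        1 for r in pool
        if sum(1 for f in prenatal_fields if has_data(r.get(f))) < 2
    )
    return pct(short, len(pool))


def duplicate_value_pct(records, field_a, field_b):
    """Percent of complete records where two fields hold the same value."""
    pool = [r for r in records if r["completion"] == "complete"]
    dup = sum(1 for r in pool
              if has_data(r.get(field_a)) and r.get(field_a) == r.get(field_b))
    return pct(dup, len(pool))


# Hypothetical data: every complete record repeats the prenatal weight.
complete = [{"completion": "complete", "prenatal_wt": "150",
             "postpartum_wt": "150"}] * 10
print(duplicate_value_pct(complete, "prenatal_wt", "postpartum_wt") > 90.0)
```

The returned percentages would be compared with the criteria in the text: more than 2% for data in the wrong period's fields, more than 10% for insufficient data, and more than 90% for duplicate field values.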
Page last reviewed: May 1, 2009
Page last updated: May 1, 2009
Content Source: Division of Nutrition, Physical Activity and Obesity,
National Center for Chronic Disease
Prevention and Health Promotion