Centers for Disease Control and Prevention


Pediatric and Pregnancy Nutrition Surveillance System

How To... Review Data Quality - Periodic Data Quality
Summary and Data Quality Sections

 
   

Summary Section

The Summary section of the PNSS Periodic Summary of Record Volume and Data Quality Report summarizes the errors identified in the Data Quality Section of the report. The Summary includes a list of the types of data quality problems identified in the report and the number of fields with each type of data quality problem.


Data Quality Section

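The Data Quality section includes the following types of errors, each described below: Missing, Mis-codes, Biologically Implausible Values (BIVs), Cross-Check Errors, Unusual Data Distribution, Low and High Standard Deviation, and PNSS Completion Code or Record Linkage Errors.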


Missing

Missing is used to measure the completeness of the data. The edit criterion is a field with missing data on more than 10% of PedNSS records or more than 20% of PNSS records.

If 100% of data are missing, ask the following questions: a) is the information being collected in clinics, b) is the computer information system capturing the information, and c) are the data being extracted from the computer information system and included in the transaction file?

If more than 10% (PedNSS) or 20% (PNSS) but less than 100% of data are missing, this indicates that the data are captured by the computer information system, but either not all clinics are collecting the data or some clinics collect the data only some of the time. For PNSS, however, this may instead be the result of data not being selected and extracted from the computer information system for all the appropriate record types (complete, prenatal only, or postpartum only records).

Special Considerations:

  • In PedNSS, missing data for hemoglobin and hematocrit are not calculated since hematology assessment is not performed at every clinic visit.
  • When data are required for one field or the other, for example hemoglobin or hematocrit, both fields are assessed together to determine percent missing.
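The Missing edit can be sketched in a few lines of Python. This is only an illustration of the criteria stated above, not CDC's editing program: the record layout (a list of dictionaries), the field names, and the treatment of None or an empty string as "missing" are all assumptions.

```python
# Thresholds from the text: more than 10% missing for PedNSS, more than 20% for PNSS.
MISSING_THRESHOLD_PCT = {"PedNSS": 10.0, "PNSS": 20.0}


def percent_missing(records, fields):
    """Percent of records in which every field in `fields` is missing.

    Passing two fields (for example hemoglobin and hematocrit) assesses them
    together: a record counts as missing only when both are absent.
    Assumption: None or an empty string means "missing".
    """
    if not records:
        return 0.0
    missing = sum(
        1 for rec in records
        if all(rec.get(f) in (None, "") for f in fields)
    )
    return 100.0 * missing / len(records)


def flag_missing(records, fields, system):
    """True when the percent missing exceeds the threshold for the system."""
    return percent_missing(records, fields) > MISSING_THRESHOLD_PCT[system]
```

For an either/or pair, call flag_missing(records, ["hb", "hct"], "PNSS"), where "hb" and "hct" are hypothetical field names.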

Review list of PedNSS fields edited for Missing.
Review list of PNSS fields edited for Missing.



Mis-codes

Mis-codes are unacceptable data values for a specific field. The edit criteria for mis-code errors are:

  • A clinic code number on more than 10 records does not match a clinic code number on the PedNSS/PNSS code file at the CDC. This code file, prepared by the contributor, contains geographic codes for the state, clinic/school, and county (all required) and a choice of one or more sets of codes for local agencies, metro areas, and regions/school districts (all optional). An updated code file should be sent to CDC any time there are changes in these geographic codes.
  • A field that contains zero when zero is not an acceptable code or value on more than 2% of records.
    For example, if valid codes for the field Food Stamps are 1 = Yes, 2 = No, and blank or 9 = Unknown or refused, then a code of 0 (zero) is a mis-code because it is an invalid code.
  • A field that has unacceptable codes on more than 5% of the records.
    For example, if valid codes for the field, Currently Breastfed, are 1 = Yes and 2 = No, then a code of 3 is an unacceptable value or a mis-code.
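The zero and invalid-code criteria above can be illustrated with a small Python sketch. The valid-code sets come from the two examples in the text; the function name, the string representation of codes, and the choice to count zeros in the invalid-code total as well are assumptions made for illustration.

```python
# Valid codes taken from the examples in the text; blank is stored as "".
FOOD_STAMPS_VALID = {"1", "2", "9", ""}
CURRENTLY_BREASTFED_VALID = {"1", "2"}


def miscode_flags(values, valid_codes, zero_limit_pct=2.0, invalid_limit_pct=5.0):
    """Return (zero_flag, invalid_flag) for one coded field.

    zero_flag:    zero appears on more than 2% of records although zero is
                  not an acceptable code for the field.
    invalid_flag: unacceptable codes appear on more than 5% of records.
    Assumption: zeros are also counted among the unacceptable codes.
    """
    n = len(values)
    if n == 0:
        return (False, False)
    zeros = 0 if "0" in valid_codes else sum(1 for v in values if v == "0")
    invalid = sum(1 for v in values if v not in valid_codes)
    return (100.0 * zeros / n > zero_limit_pct,
            100.0 * invalid / n > invalid_limit_pct)
```

A field where zero is a valid code never raises the zero flag, because the check is skipped when "0" is in the valid set.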

Review list of PedNSS fields edited for Mis-codes.
Review list of PNSS fields edited for Mis-codes.



Biologically Implausible Values (BIVs)

A biologically implausible value (BIV) is a data value beyond the range considered to be biologically plausible. These BIVs represent values that are rarely observed, generally fewer than 1 in 10,000 records (0.01% of records), and are therefore thought to be in error. When more than 3% of records have a BIV in a field, the field is reported as an error. CDC has tried to develop a consistent definition for BIVs across the different health indicators by using cut-off points that generally represent 4 standard deviations.

For example, the biologically plausible range of prenatal hemoglobin (Hb) is 8.0 to 17.0 g/dL, so the biologically implausible range is defined as < 8.0 g/dL or > 17.0 g/dL. Hemoglobin BIVs on the high side may in part reflect hematocrits mistakenly entered in the Hb field. Similarly, hematocrit (Hct) BIVs on the low side may in part reflect Hb mistakenly entered in the Hct field.
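As a rough illustration of the BIV edit, the Python sketch below applies the prenatal hemoglobin range and the 3% criterion described in this section. The function names are hypothetical, and using only non-missing values as the denominator is an assumption; the text does not say how missing values are handled.

```python
HB_PLAUSIBLE_RANGE = (8.0, 17.0)  # g/dL, prenatal hemoglobin range from the text
BIV_LIMIT_PCT = 3.0               # a field is an error when more than 3% are BIVs


def is_biv(value, plausible_range=HB_PLAUSIBLE_RANGE):
    """True when the value falls outside the biologically plausible range."""
    low, high = plausible_range
    return value < low or value > high


def biv_percent(values, plausible_range=HB_PLAUSIBLE_RANGE):
    """Percent of non-missing values that are BIVs (denominator is an assumption)."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return 100.0 * sum(is_biv(v, plausible_range) for v in present) / len(present)


def biv_flag(values, plausible_range=HB_PLAUSIBLE_RANGE):
    """True when the field would be reported as a BIV error."""
    return biv_percent(values, plausible_range) > BIV_LIMIT_PCT
```

A value such as 36.0 in the hemoglobin field is counted as a BIV here, which matches the case of a hematocrit entered in the Hb field by mistake.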

Often reporting and recording errors contribute to a high proportion of records with BIVs in a particular field. The BIV cut-offs selected for the edit criteria for each field or indicator were based on a review of PedNSS and PNSS data and external data sources. Additional information about how the cut-offs for BIVs were developed for each field that is edited is provided below.

Review list of PedNSS fields edited and BIV cutoffs.
Review list of PNSS fields edited and BIV cutoffs.



Cross-Check Errors

Cross-check errors are coding inconsistencies between specific fields. The edit criteria for cross-check errors are:

  • Field coding is inconsistent on more than 5% of the records.
    For example, if Currently Breastfed = 1 (yes, infant is currently breastfed) and Length of Time Breastfed = 5 (number of weeks breastfed for infant who has quit breastfeeding), both fields are listed as a cross-check error.
  • For PNSS only, field coding results in an invalid combination of dates on more than 5% of the records. Dates in PNSS are expected to follow certain patterns, as shown in the examples below.
    • For complete records, the initial visit date should be before the infant's date of birth, and the infant's date of birth should come before the postpartum visit date. An invalid date combination would be if the date of birth occurs before the initial visit date for complete records.
    • The date of last menstrual period (LMP) should be before the initial visit date, WIC enrollment date, estimated date of delivery (EDD), date of birth, and postpartum visit date. An invalid date combination would be if the date of LMP does not precede these dates.
    • For postpartum only records, the initial visit date and postpartum visit date should be the same and after the infant's date of birth. For CDC data analysis purposes, the postpartum visit date is copied into the initial visit date field of the postpartum only transaction record. An invalid date combination would be if the date of initial visit and postpartum visit were not the same for postpartum only records.
    • PNSS date combinations that are considered invalid and are cross-check errors are included in the list of PNSS fields that are edited for cross-check errors below.
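The PNSS date patterns listed above can be expressed as a short Python check. This is a sketch under stated assumptions: the dictionary keys and record-type labels are hypothetical names, absent dates are skipped rather than treated as errors, and only the three example patterns from the text are covered, not the full list of PNSS cross-check edits.

```python
from datetime import date

# Dates that the LMP is expected to precede (hypothetical field names).
LATER_THAN_LMP = ("initial_visit", "wic_enrollment", "edd", "birth", "postpartum_visit")


def has_invalid_dates(rec):
    """True when a PNSS record breaks one of the date patterns in the text."""
    lmp = rec.get("lmp")
    if lmp is not None:
        for key in LATER_THAN_LMP:
            other = rec.get(key)
            if other is not None and not lmp < other:
                return True

    initial = rec.get("initial_visit")
    birth = rec.get("birth")
    postpartum = rec.get("postpartum_visit")
    record_type = rec.get("record_type")

    if record_type == "complete":
        # Expected order: initial visit < infant's date of birth < postpartum visit.
        if initial is not None and birth is not None and not initial < birth:
            return True
        if birth is not None and postpartum is not None and not birth < postpartum:
            return True
    elif record_type == "postpartum_only":
        # Initial and postpartum visit dates are the same and follow the birth.
        if initial is not None and postpartum is not None and initial != postpartum:
            return True
        if birth is not None and postpartum is not None and not birth < postpartum:
            return True
    return False


def invalid_date_percent(records):
    """Percent of records with an invalid date combination (criterion: > 5%)."""
    if not records:
        return 0.0
    return 100.0 * sum(has_invalid_dates(r) for r in records) / len(records)
```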

Review list of PedNSS fields edited for Cross-Check Errors.
Review list of PNSS fields edited for Cross-Check Errors.



Unusual Data Distribution

Unusual data distributions are fields that have data following a pattern that is not typical based on observations of national PedNSS and PNSS data.

The edit criteria for unusual data distribution errors are:

  • A field containing no data in the acceptable ranges of the field other than zero. For example, zero is a valid code for Field 58, Drinks/Week-Last 3 Months; however, when zero is coded on 100% of the records, this may indicate that the field was initialized to zero and no valid data were added to the field.
  • Fields with values of unusual data distributions.
    • Measured data in PNSS and PedNSS that are edited for unusual data distributions include maternal weight and height, prepregnancy weight, weight gain, birthweight, and hemoglobin and hematocrit. When these data fields have more than 20% of values below the 5th or above the 95th percentile of the national data distribution, they have an unusual data distribution compared to national data and are therefore suspected of errors.

      Hemoglobin (Hb) and hematocrit (Hct) values are also evaluated for digit preference, defined as rounding to the nearest integer or half integer. Hemoglobin and hematocrit values should be recorded as actual values and not rounded values. Based on national PedNSS and PNSS data, 20% of Hb and Hct values are expected to fall on the integer or half integer. A higher than expected percentage indicates excessive rounding of the Hb or Hct values that results in an unusual distribution of the data. The edit criterion for digit preference is more than 30% of Hb or Hct values falling on the integer or half integer (e.g., 11.0, 11.5, 12.0).
    • Specific field edits for frequency of responses for data items have been developed to identify data that do not follow the distribution of coded responses in the national PedNSS and PNSS. For example, the edits to identify unusual data distributions for maternal education in PNSS are:
      1. more than 20% of women completed less than the 7th grade (national data distribution is about 5% for women that completed less than the 7th grade),
      2. more than 20% of women completed over 15 years of education (national data distribution is about 5% for women that completed over 15 years of education),
      3. fewer women completed 12 grades of education than completed any other single grade (national data distribution is about 40% for women that completed 12 grades of education, more than any other single grade), or
      4. more than 1% of women received no education (national data distribution is about 0.4% for women that received no education).
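The digit preference edit for hemoglobin and hematocrit described above lends itself to a brief Python illustration. It is a sketch only: the function names are hypothetical, and it assumes values are recorded to one decimal place, so that a value falls on the integer or half integer when its tenths digit is 0 or 5.

```python
DIGIT_PREFERENCE_LIMIT_PCT = 30.0  # edit criterion from the text


def on_integer_or_half(value):
    """True for values such as 11.0, 11.5, 12.0 (assumes one decimal place)."""
    # Work in tenths and round to avoid floating point noise (11.3 * 10 is not exactly 113).
    return round(value * 10) % 5 == 0


def digit_preference_percent(values):
    """Percent of Hb or Hct values that fall on the integer or half integer."""
    if not values:
        return 0.0
    return 100.0 * sum(on_integer_or_half(v) for v in values) / len(values)


def digit_preference_flag(values):
    """True when more than 30% of values fall on the integer or half integer."""
    return digit_preference_percent(values) > DIGIT_PREFERENCE_LIMIT_PCT
```

About 20% is expected from national data, so a result well above that but below the 30% limit would not be flagged by this criterion.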

Additional information about how the edits for unusual data distributions were developed for each field that is edited is provided below.

Review list of PedNSS fields and edit criteria for Unusual Data Distributions.
Review list of PNSS fields and edit criteria for Unusual Data Distributions.



Low and High Standard Deviation

Standard deviation (SD) is a measure of the amount of variation among values such as hemoglobin or weight-for-height in a population. A low or smaller standard deviation describes data that are less spread out (with less variation) than would be expected for the population. A high or larger standard deviation describes data that are more spread out (with more variation) than would be expected for the population.

In PNSS, the standard deviation of the prenatal hemoglobin (Hb)/hematocrit (Hct) distribution compares the variability in the Hb/Hct measures reported in the PNSS to the variability observed for healthy, iron-supplemented pregnant women measured in four European studies. Data from the four studies are aggregated into a reference for hematologic status during pregnancy. Because hemoglobin changes during pregnancy, and the PNSS data reflect measures taken throughout pregnancy on iron-supplemented and unsupplemented women, we expect greater variability in the PNSS data than in the European reference (SD = 0.9 g/dL for hemoglobin and SD = 2.5% for hematocrit). Therefore, the expected SD in PNSS is 0.9 to 1.2 g/dL for hemoglobin and 2.5% to 3.5% for hematocrit. The cutoffs for low and high standard deviation were established slightly outside these limits (Hb < 0.8 g/dL or > 1.3 g/dL and Hct < 2.4% or > 3.6%).

In PedNSS, the standard deviation of the hemoglobin/hematocrit distribution compares the variability in Hb/Hct measures reported to the PedNSS to the variability observed for Hb and Hct values measured among children 1-5 years old in the Second National Health and Nutrition Examination Survey (NHANES II). We do not expect the PedNSS standard deviations to be identical to the Hb/Hct SD of NHANES II (SD = 0.8 g/dL for hemoglobin and SD = 2.3% for hematocrit). Therefore, the expected SD in PedNSS is 0.8 to 1.1 g/dL for hemoglobin and 2.3% to 3.3% for hematocrit. The cutoffs for low and high standard deviation were established slightly outside these limits (Hb < 0.7 g/dL or > 1.2 g/dL and Hct < 2.2% or > 3.4%).
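The low and high standard deviation cutoffs given above for hemoglobin and hematocrit can be checked with a few lines of Python. The cutoff values are taken from the text; the function name and the use of the sample standard deviation (statistics.stdev) are assumptions, since the text does not say which estimator is used.

```python
from statistics import stdev

# (low cutoff, high cutoff) from the text; Hb in g/dL, Hct in percent.
SD_CUTOFFS = {
    ("PNSS", "hb"): (0.8, 1.3),
    ("PNSS", "hct"): (2.4, 3.6),
    ("PedNSS", "hb"): (0.7, 1.2),
    ("PedNSS", "hct"): (2.2, 3.4),
}


def sd_flag(values, system, measure):
    """Return "low", "high", or None for a list of Hb or Hct values.

    Assumption: the sample standard deviation is used, which needs at
    least two values.
    """
    low, high = SD_CUTOFFS[(system, measure)]
    sd = stdev(values)
    if sd < low:
        return "low"
    if sd > high:
        return "high"
    return None
```

A "low" result points to values that are more alike than expected, while a "high" result points to values that are more spread out than expected.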

In PedNSS, the low and high standard deviation errors for growth indicators including BMI-for-age, weight-for-length, weight-for-age and height-for-age are identified only in the Annual Summary of Record Volume and Data Quality report and will be discussed in that section.

Review list of PedNSS fields edited for Low or High Standard Deviation.
Review list of PNSS fields edited for Low or High Standard Deviation.



PNSS Completion Code or Record Linkage Errors

PNSS records contain prenatal and postpartum data that are recorded at different times, i.e., during and after a pregnancy. Contributors are expected to combine information from these two different time periods into a single record. A completion code is assigned to each record to indicate whether it contains data from both time periods (prenatal and postpartum), in which case it is defined as a "complete record." Records with data from only the prenatal or only the postpartum period are defined as "prenatal only" or "postpartum only" records.

This data quality error identifies problems with:

  1. assigning completion codes to PNSS records and
  2. linking prenatal and postpartum record information in PNSS records.

Completion Code or Record Linkage Errors are errors that result in incorrect data for the record type, insufficient data for the record type, or duplicate field values on a record. The errors that are reported include:
 

  • Prenatal Only Records Containing Data in Postpartum (PP) Fields on > 2% of Records.
    • Prenatal only records containing data in postpartum fields may result from incorrect assignment of the completion code. It is possible that these records are really complete records, not prenatal only records. Alternatively, if 100% of prenatal only records contain data in a particular postpartum field, it may be a result of initializing the field to zero when generating the PNSS record. For some postpartum fields, such as Multivitamin Consumption Prior to Pregnancy, zero is a valid value. Postpartum program participation fields (WIC, Food Stamps, Medicaid, TANF) are examples of other such fields that should be left blank on prenatal only records. If infant fields, such as Infant's Date of Birth, contain data on prenatal only records, the records should be labeled as complete records, even if not all infant fields are extracted. Lastly, a prenatal field value may be incorrectly moved to a postpartum field when the record is extracted.
  • Postpartum Only Records Containing Data in Prenatal Fields on > 2% of Records.
    • Postpartum only records containing data in prenatal fields can be caused by errors similar to those described above for prenatal only records: incorrect assignment of the completion code, initializing prenatal fields to zero, and incorrectly moving postpartum field values into prenatal fields when extracting the record.
  • Complete and Prenatal Only Records with Insufficient Prenatal Data, and Complete and Postpartum Only Records with Insufficient Postpartum Data. The edit criteria are:
    • More than 10% of complete and prenatal only records with fewer than 2 prenatal fields containing data values.
    • More than 10% of complete and postpartum only records with fewer than 2 postpartum fields containing data values.

    Errors of insufficient data for the record type are most likely the result of incorrect assignment of the Completion Codes. For example, a prenatal only record with fewer than 2 prenatal fields containing data is probably a postpartum only record that was incorrectly assigned the Completion Code of Prenatal Only rather than Postpartum Only.

  • Duplicate Field Values on > 90% of Complete Records. Two different PNSS fields contain exactly the same data on the majority of complete records, e.g., a woman's weight at her prenatal visit is the same as her weight at her postpartum visit. The edit criterion is:
    • Complete records with duplicate field values on more than 90% of records.
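Several of the completion code criteria above can be sketched together in Python. Everything about the data layout here is an assumption: the "completion" key, its three labels, and the field-name lists are hypothetical, and the > 2% checks are applied per record across the whole field group, whereas the actual edits may be applied field by field. The duplicate field value edit is not covered.

```python
def filled(rec, fields):
    """Number of the given fields that contain a data value (zero counts as data)."""
    return sum(1 for f in fields if rec.get(f) not in (None, ""))


def percent(subset, predicate):
    """Percent of records in `subset` for which `predicate` is true."""
    if not subset:
        return 0.0
    return 100.0 * sum(1 for r in subset if predicate(r)) / len(subset)


def linkage_error_flags(records, prenatal_fields, postpartum_fields):
    """Apply the > 2% and > 10% completion code criteria from the text."""
    complete = [r for r in records if r["completion"] == "complete"]
    prenatal_only = [r for r in records if r["completion"] == "prenatal_only"]
    postpartum_only = [r for r in records if r["completion"] == "postpartum_only"]
    return {
        "postpartum_data_on_prenatal_only": percent(
            prenatal_only, lambda r: filled(r, postpartum_fields) > 0) > 2.0,
        "prenatal_data_on_postpartum_only": percent(
            postpartum_only, lambda r: filled(r, prenatal_fields) > 0) > 2.0,
        "insufficient_prenatal_data": percent(
            complete + prenatal_only, lambda r: filled(r, prenatal_fields) < 2) > 10.0,
        "insufficient_postpartum_data": percent(
            complete + postpartum_only, lambda r: filled(r, postpartum_fields) < 2) > 10.0,
    }
```

Because a zero is counted as data, a postpartum field that was initialized to zero on prenatal only records raises the first flag, which is one of the causes described above.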

Review list of PNSS fields edited for Completion Code or Record Linkage Errors.


Page last reviewed: May 1, 2009
Page last updated: May 1, 2009
Content Source: Division of Nutrition, Physical Activity and Obesity, National Center for Chronic Disease Prevention and Health Promotion

 

 




United States Department of Health and Human Services
Centers for Disease Control and Prevention
National Center for Chronic Disease Prevention and Health Promotion
Division of Nutrition and Physical Activity