Centers for Disease Control and Prevention


Pediatric and Pregnancy Nutrition Surveillance System

How To... Review Data Quality - Periodic Report PNSS Case Study
Data Quality Section

 
   

The third section of the PNSS Periodic Summary of Record Volume and Data Quality Report identifies specific fields that contain data errors. Errors are broken down into types, for example Missing Data or Mis-codes, and the field number, name, and percent of records that contain errors for each error type are listed. An error threshold (percent of records) is identified for each type of error. Unusual distribution errors and standard deviation errors list the number and distribution of the records for each field in error.


Missing Data

The PNSS data quality report identifies fields with more than 20% of data missing. The threshold of 20% was established based on criteria used by other CDC surveillance systems such as the Behavioral Risk Factor Surveillance System. The percent of records missing data for a field is reported for each required record type. Supplemental (optional) fields are listed in the missing data section of the PNSS Data Quality Report only if more than 20% and less than 100% of records are missing data.

To identify supplemental fields, review the list of PNSS fields edited for Missing Data.

Information on whether data are missing on all records, missing on a particular record type, or missing on a portion of records and record types can be used to target efforts to improve the completeness of the data.
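The listing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and "missing" is assumed to mean a value that is absent or blank. It is not the code CDC uses to produce the report.

```python
from typing import Optional, Sequence

MISSING_THRESHOLD = 20.0  # percent of records, as stated in the text


def percent_missing(values: Sequence[Optional[str]]) -> float:
    """Percent of records whose value is None or blank (assumed meaning of 'missing')."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or str(v).strip() == "")
    return 100.0 * missing / len(values)


def is_listed(values: Sequence[Optional[str]], supplemental: bool = False) -> bool:
    """Apply the listing rule: required fields are listed above 20% missing;
    supplemental fields only when more than 20% and less than 100% are missing."""
    pct = percent_missing(values)
    if supplemental:
        return MISSING_THRESHOLD < pct < 100.0
    return pct > MISSING_THRESHOLD
```

Under this sketch, a supplemental field that is blank on every record is not listed, which matches the "less than 100%" condition in the text.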

Sample Data Quality Section:
Missing Data

Table: Sample Data Quality Section, PNSS Table Missing Data

1 The field numbers and names with Missing Data on more than 20% of Records are listed in the first two columns. Reading across the page, the record types checked are listed next. The Record Types Checked column refers to the types of PNSS records that should contain the data field; only those records are searched for missing data. "All" record types may be checked, or only certain record types. "All" includes complete, postpartum only, and prenatal only records. Completion codes 1, A, and B indicate complete records, completion codes 2-6 indicate prenatal only records, and completion codes 7-8 indicate postpartum only records.

For example, the Food Stamps - Prenatal Visit field should be reported for complete records and prenatal only records.

2 The % Records Missing Data column indicates that the Food Stamps - Prenatal Visit field was missing on 89.1% of the complete and prenatal only records that should contain the data.

Table: Sample Data Quality Section, PNSS Table Missing Data

3 The data field is further analyzed by percent missing for each record type. Food stamp data were missing on 100% of complete records and on 0% of prenatal only records.
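The breakdown by record type can be illustrated with a short Python sketch. The completion code groupings come from the text above; the record layout (dictionaries with a `completion_code` key) and the field name `food_stamps_prenatal` are assumptions made for this example only.

```python
def record_type(completion_code: str) -> str:
    """Map a PNSS completion code to its record type, per the groupings above."""
    code = str(completion_code).strip().upper()
    if code in {"1", "A", "B"}:
        return "complete"
    if code in {"2", "3", "4", "5", "6"}:
        return "prenatal only"
    if code in {"7", "8"}:
        return "postpartum only"
    raise ValueError(f"unrecognized completion code: {completion_code!r}")


def percent_missing_by_type(records, field, types_checked):
    """Percent of records missing `field`, for each record type that is checked."""
    counts = {}  # record type -> [missing, total]
    for rec in records:
        rtype = record_type(rec["completion_code"])
        if rtype not in types_checked:
            continue  # only records that should contain the field are searched
        tally = counts.setdefault(rtype, [0, 0])
        tally[1] += 1
        value = rec.get(field)
        if value is None or str(value).strip() == "":
            tally[0] += 1
    return {rtype: 100.0 * m / n for rtype, (m, n) in counts.items()}


# Hypothetical records: the field is blank on complete records only.
records = [
    {"completion_code": "1", "food_stamps_prenatal": ""},
    {"completion_code": "A", "food_stamps_prenatal": None},
    {"completion_code": "3", "food_stamps_prenatal": "1"},
    {"completion_code": "7", "food_stamps_prenatal": ""},  # not a checked type
]
by_type = percent_missing_by_type(
    records, "food_stamps_prenatal", {"complete", "prenatal only"}
)
```

A result of 100% for one record type and 0% for another, as in the sample table, suggests a problem in how that record type is assembled rather than in data collection generally.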


Completion Code and Record Linkage Errors

This data quality error identifies problems with 1) assigning completion codes to PNSS records, which specify whether the record is a complete, prenatal only, or postpartum only record, and 2) linking prenatal and postpartum record information in PNSS records.

Sample Data Quality Section:
Completion Code or Record Linkage Errors

Table: Sample Data Quality Section, PNSS Table Completion Code or Record Linkage Errors

1 One of the Completion Code or Record Linkage Errors listed in this report is Prenatal Only Records Containing Data in PP (postpartum) Fields on more than 2% of Records. Field numbers and names are listed in the first two columns.
2 The % of Prenatal Only Records Containing Data in PP (postpartum) Fields column shows the percent of records with postpartum data in a field. The two postpartum fields listed here, Cigarettes/Day-Last 3 Months and Ever Breastfed, relate to maternal postpartum data and infant health data collected at the postpartum visit. These fields were incorrectly populated on prenatal only records.
3 The other completion code or record linkage error listed in this report is Postpartum Only Records Containing Data in Prenatal Fields on more than 2% of Records. The percent of Postpartum Only Records Containing Data in Prenatal Fields shows the percent of records with prenatal data in a field. The three prenatal fields listed here, Woman's Weight English-Prenatal Visit, Hemoglobin/Hematocrit-Prenatal Visit, and Date of Hemoglobin/Hematocrit Measure, relate to maternal prenatal data collected at the prenatal visit. These fields were incorrectly populated on postpartum only records.

The other types of Completion Code and Record Linkage errors were not found on this transaction file. These errors include 1) Complete and Prenatal Only Records With Insufficient Prenatal Data, 2) Complete and Postpartum Only Records With Insufficient Postpartum Data, and 3) Duplicate Field Values on > 90% of Complete Records.
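As an illustration of the first two checks, the sketch below flags fields that are populated on records where they should be empty. The function names, the dictionary record layout, and the field names are hypothetical; only the 2% threshold comes from the text.

```python
LINKAGE_THRESHOLD = 2.0  # percent of records, as stated in the text


def has_data(value) -> bool:
    """True when a field holds anything other than None or blank."""
    return value is not None and str(value).strip() != ""


def misplaced_fields(records, fields, threshold=LINKAGE_THRESHOLD):
    """Return {field: percent populated} for fields populated on more than
    `threshold` percent of `records`. Pass records of a single type (for
    example prenatal only) and the fields that type should not contain."""
    flagged = {}
    if not records:
        return flagged
    for field in fields:
        populated = sum(1 for rec in records if has_data(rec.get(field)))
        pct = 100.0 * populated / len(records)
        if pct > threshold:
            flagged[field] = pct
    return flagged


# Hypothetical prenatal only records checked for two postpartum fields.
prenatal_only = [
    {"ever_breastfed": "1"},
    {"ever_breastfed": ""},
    {},
    {"ever_breastfed": None},
]
flagged = misplaced_fields(prenatal_only, ["ever_breastfed", "cigarettes_per_day_pp"])
```

The same function covers the reverse case by passing postpartum only records together with the prenatal field names.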


Mis-codes

Mis-codes are unacceptable or invalid codes for a specific field. Mis-codes in the data quality report include clinics with at least 10 records that have a clinic number not included in the PedNSS/PNSS Code File, fields containing zero as a value on more than 2% of records when zero is never valid, and fields with other unacceptable values on more than 5% of records.

Sample Data Quality Section:
Mis-codes

Table: Sample Data Quality Section, PNSS Table Mis-codes

1 Clinic Numbers Not on CDC's List of Numbers Provided by State/Contributor includes 1 clinic with at least 10 records whose clinic number was not on the PedNSS/PNSS Code File submitted to CDC by the contributor. The clinic number and the number of records for the clinic are listed in the first two columns.

This data error can be easily corrected. The contributor should send an updated PedNSS/PNSS Code File to CDC any time a change occurs in the code file, such as deleting clinics, adding new clinics, or changing the number or name of an existing clinic.

 

Table: Sample Data Quality Section, PNSS Table Mis-codes

2 Fields Containing Zero as a Value When Zero is Never Valid includes fields that have an invalid code of zero on more than 2% of the records. The Household Smoking - Prenatal Visit field contained zero as a value on 68.0% of the records. Acceptable codes for this field are 1 = Yes, 2 = No, and 9 or blank = Unknown. This field may have been either 1) initialized to zero, or 2) miscoded as 0 = No and 1 = Yes on complete and/or prenatal only records.

The Household Smoking - PP Visit field had about 73% of records in error. Many more records, 1,835, contain this error. This field may also have been either initialized to zero or miscoded on complete and/or postpartum only records. Fields With Other Unacceptable Values were not identified on this PNSS transaction file.
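A contributor could screen a field for this kind of mis-code before submitting a file. The sketch below is a minimal illustration: the acceptable codes (1, 2, 9, or blank) and the 2% threshold are taken from the text, while the function names and the choice to treat values as strings are assumptions.

```python
from collections import Counter

ZERO_THRESHOLD = 2.0  # percent of records, as stated in the text
VALID_CODES = frozenset({"1", "2", "9", ""})  # 1 = Yes, 2 = No, 9 or blank = Unknown


def normalize(value) -> str:
    """Treat None as blank and strip surrounding whitespace."""
    return "" if value is None else str(value).strip()


def percent_zero(values) -> float:
    """Percent of records on which the field is coded as zero."""
    if not values:
        return 0.0
    zeros = sum(1 for v in values if normalize(v) == "0")
    return 100.0 * zeros / len(values)


def invalid_code_counts(values, valid=VALID_CODES) -> Counter:
    """Count each code that is not in the acceptable set."""
    return Counter(c for c in map(normalize, values) if c not in valid)


# Hypothetical values for a Household Smoking field.
values = ["0", "0", "1", "2", "9", "", "0", "3"]
pct_zero = percent_zero(values)
counts = invalid_code_counts(values)
zero_is_flagged = pct_zero > ZERO_THRESHOLD
```

Listing the invalid codes alongside the zero percentage helps distinguish a field that was initialized to zero from one that was coded with a different scheme.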


Biologically Implausible Values

A biologically implausible value (BIV) is a data value beyond what is considered to be a biologically plausible range. BIVs are extremely rare and are therefore considered to be errors. The fields with BIVs on more than 3% of records are listed as errors in this report. Information on BIVs includes the field number and name, the values that are outside of the expected range, and the number and percent of records with the error.

Sample Data Quality Section:
BIVs

Table: Sample Data Quality Section, PNSS Table Biologically Implausible Values (BIVs)

1 Fields with BIVs on > 3% of Records includes the Hematocrit-Prenatal Visit field, coded as 20.0, 22.0, 23.0, and 25.0 (%) on a total of 3.1% of the records with data. The Edit Criteria, listed beneath the erroneous codes, indicate that values less than 24% (240) or greater than 51% (510) are considered biologically implausible. Because the contributor predominantly reports hemoglobin values to the PNSS, very few records on this file contained hematocrit values. Therefore, this is not a significant data quality problem.
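The BIV edit for hematocrit can be sketched as a simple range check. The cutoffs (below 24% or above 51%) and the 3% threshold come from the text; the function name is hypothetical, values are assumed to be expressed in percent rather than in tenths, and the denominator is assumed to be records with data, following the wording above.

```python
HCT_MIN, HCT_MAX = 24.0, 51.0  # plausible hematocrit range in percent, per the edit criteria
BIV_THRESHOLD = 3.0            # percent of records with data, as stated in the text


def biv_summary(values):
    """Return (distinct implausible values, count of BIV records, percent of records with data)."""
    with_data = [v for v in values if v is not None]
    bivs = [v for v in with_data if v < HCT_MIN or v > HCT_MAX]
    pct = 100.0 * len(bivs) / len(with_data) if with_data else 0.0
    return sorted(set(bivs)), len(bivs), pct


# Hypothetical hematocrit values; None marks a record without a hematocrit value.
hematocrit = [20.0, 36.0, 38.5, None, 52.0, 24.0, 51.0]
biv_values, biv_count, biv_pct = biv_summary(hematocrit)
field_is_flagged = biv_pct > BIV_THRESHOLD
```

When only a handful of records contain the field, as in the case study, a small number of BIVs can exceed the threshold, so the count is worth reading alongside the percent.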


Cross-Check Errors

Cross-check errors are coding inconsistencies between specific fields and, for PNSS only, invalid date combinations that occur on more than 5% of records.

Sample Data Quality Section:
Cross-Check Errors

Table: Sample Data Quality Section, PNSS Table Cross-Check Errors

1 Fields With Cross-Check Errors on > 5% of Records shows an inconsistency between the Ever Breastfed and the Currently Breastfeeding fields. The edit criteria explain that if Currently Breastfeeding = 1 (yes), then Ever Breastfed should be 9 or blank (not applicable) on complete or postpartum only records. However, this was not the case on more than 5% of the records.
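The cross-check between these two fields can be expressed as a single rule. The sketch below encodes the rule exactly as stated in the edit criteria; the function names and the string coding of values are assumptions for illustration.

```python
CROSS_CHECK_THRESHOLD = 5.0  # percent of records, as stated in the text


def normalize(value) -> str:
    """Treat None as blank and strip surrounding whitespace."""
    return "" if value is None else str(value).strip()


def is_cross_check_error(currently_breastfeeding, ever_breastfed) -> bool:
    """Rule from the edit criteria: when Currently Breastfeeding = 1 (yes),
    Ever Breastfed should be 9 or blank (not applicable)."""
    return normalize(currently_breastfeeding) == "1" and normalize(ever_breastfed) not in {"9", ""}


def percent_cross_check_errors(pairs) -> float:
    """Percent of (currently_breastfeeding, ever_breastfed) pairs that break the rule."""
    if not pairs:
        return 0.0
    errors = sum(1 for cur, ever in pairs if is_cross_check_error(cur, ever))
    return 100.0 * errors / len(pairs)


# Hypothetical complete or postpartum only records.
pairs = [("1", "1"), ("1", "9"), ("1", ""), ("2", "1")]
pct_errors = percent_cross_check_errors(pairs)
```

Only the first pair breaks the rule here: the infant is currently breastfeeding, yet Ever Breastfed holds a value other than 9 or blank.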


Unusual Data Distribution

An unusual data distribution error identifies fields whose data follow a distribution pattern that is not typical based on observations of national PNSS data.

Sample Data Quality Section:
Unusual Data Distribution

Table: Sample Data Quality Section, PNSS Table Unusual Data Distribution

1 Fields With No Acceptable Data Other Than Zeros shows that no such fields were identified.

 

Table: Sample Data Quality Section, PNSS Table Unusual Data Distribution

1 Fields With Other Unusual Data Distributions includes Infant's Date of Birth. Specific field edits for the frequency of certain codes or values have been developed to identify data that do not follow the national PNSS data distributions and are therefore considered to be a data quality error. The edit criteria for a field are listed when the data distribution is in error. Often there are several edits for the data distribution of a field, so it is necessary to review all the edits to determine which edit caused the field to be identified as an error in the report.
2 The edit criterion for Infant's Date of Birth is:

a. More than 10% of infants had a gestational age at birth of exactly 280 days (in the national data distribution, about 4% of infants have such a gestational age).

In this case, the error was the result of 11.4% of infants having a gestational age of exactly 280 days.

Gestational age of an infant is calculated in the PNSS by establishing the number of days between the mother's last menstrual period and the infant's date of birth. Only 4% of infants are born exactly on the mother's expected date of delivery (a full term pregnancy is 40 weeks or 280 days). If more than 10% of infants have such a gestational age, the PNSS contributor may have either 1) reported the infant's date of birth as the mother's expected date of delivery or vice versa (when one or the other is unknown), or 2) estimated the mother's last menstrual period (when unknown) by subtracting 280 days from the infant's date of birth.
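The gestational age calculation and the 280-day edit can be sketched with Python's standard `datetime` module. The day-difference calculation and the 10% cutoff follow the text; the function names and the sample dates are hypothetical.

```python
from datetime import date, timedelta

FULL_TERM_DAYS = 280      # 40 weeks
EDIT_CUTOFF_PERCENT = 10  # edit criterion stated in the text


def gestational_age_days(last_menstrual_period: date, date_of_birth: date) -> int:
    """Days between the mother's last menstrual period and the infant's date of birth."""
    return (date_of_birth - last_menstrual_period).days


def percent_exactly_full_term(pairs) -> float:
    """Percent of (LMP, date of birth) pairs with a gestational age of exactly 280 days."""
    if not pairs:
        return 0.0
    ages = [gestational_age_days(lmp, dob) for lmp, dob in pairs]
    return 100.0 * sum(1 for age in ages if age == FULL_TERM_DAYS) / len(ages)


# Hypothetical records: one of four infants has a gestational age of exactly 280 days.
lmp = date(2008, 1, 1)
pairs = [
    (lmp, lmp + timedelta(days=280)),
    (lmp, lmp + timedelta(days=273)),
    (lmp, lmp + timedelta(days=266)),
    (lmp, lmp + timedelta(days=287)),
]
pct_280 = percent_exactly_full_term(pairs)
distribution_is_flagged = pct_280 > EDIT_CUTOFF_PERCENT
```

A spike at exactly 280 days is the signature of a derived date: if the last menstrual period was back-calculated from the date of birth, the difference is 280 by construction.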


Low or High Standard Deviation

Standard deviation (SD) is a measure of the amount of variation among values, such as hemoglobin or hematocrit, in a population. Low or High Standard Deviation cutoffs are used to identify data that are less or more spread out than would be expected for the population. The field number and name, the number of records that were analyzed with the field, and the standard deviation (SD) for the data in the field are reported for each field with a low or high SD.

Sample Data Quality Section:
Low or High Standard Deviations

Table: Sample Data Quality Section, PNSS Table Standard Deviation

1 Fields With Low or High Standard Deviations includes the Hemoglobin - Prenatal Visit field. The SD is 1.32 g/dL, which is higher than the SD cutoff of 1.3 g/dL listed in the Edit Criteria.
2 The data distribution shown in the Percent of Records column indicates more records with both lower and higher hemoglobin values (i.e., < 10 g/dL or > 13 g/dL) when compared to the Expected Percent of Records column, which is based on the reference population.
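A standard deviation check of this kind can be sketched with Python's `statistics` module. The high cutoff of 1.3 g/dL for prenatal hemoglobin comes from the sample table; the text does not state a low cutoff, so it is left as a parameter. Using the sample SD (`statistics.stdev`) rather than the population SD is an assumption, since the report does not say which is used, and the function name is hypothetical.

```python
import statistics

HGB_PRENATAL_HIGH_SD = 1.3  # g/dL, high SD cutoff shown in the sample table


def sd_flag(values, low=None, high=None):
    """Return (sd, label) where label is 'high', 'low', or 'ok'.
    Needs at least two values; uses the sample standard deviation (an assumption)."""
    sd = statistics.stdev(values)
    if high is not None and sd > high:
        return sd, "high"
    if low is not None and sd < low:
        return sd, "low"
    return sd, "ok"


# Hypothetical prenatal hemoglobin values in g/dL.
hemoglobin = [9.0, 10.5, 12.0, 13.5, 15.0]
sd, label = sd_flag(hemoglobin, high=HGB_PRENATAL_HIGH_SD)
```

A high SD points to more values in both tails than expected, which is why the report pairs it with the observed and expected percent of records in each range.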


Page last reviewed: May 1, 2009
Page last updated: May 1, 2009
Content Source: Division of Nutrition, Physical Activity and Obesity, National Center for Chronic Disease Prevention and Health Promotion


 

 

 




United States Department of Health and Human Services
Centers for Disease Control and Prevention
National Center for Chronic Disease Prevention and Health Promotion
Division of Nutrition and Physical Activity