How To... Review Data Quality - Periodic Report
PNSS Case Study
The third section of the PNSS Periodic Summary of Record Volume and Data Quality Report identifies specific fields that contain data errors. Errors are broken down into types, for example Missing Data or Mis-codes, and the field number, name, and percent of records that contain errors for each error type are listed. An error threshold (percent of records) is identified for each type of error. Unusual distribution errors and standard deviation errors list the number and distribution of the records for each field in error.
The PNSS data quality report identifies fields with more than 20% of data missing. The threshold of 20% was established based on criteria used by other CDC surveillance systems such as the Behavioral Risk Factor Surveillance System. The percent of records missing data for a field is reported for each required record type. Supplemental (optional) fields are listed in the missing data section of the PNSS Data Quality Report only if more than 20% and less than 100% of records are missing data.
To identify supplemental fields, review list of PNSS fields edited for Missing
Information on whether data are missing on all records, missing on a particular record type, or missing on a portion of records and record types can be used to target efforts to improve the completeness of the data.
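The listing rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the code CDC uses to produce the report; the function name and the `supplemental` flag are hypothetical.

```python
MISSING_THRESHOLD = 20.0  # percent of records, as stated in the report description


def is_listed_as_missing(percent_missing: float, supplemental: bool) -> bool:
    """Return True if a field would appear in the Missing Data section.

    Assumption: required fields are listed whenever more than 20% of records
    are missing data; supplemental fields are listed only when more than 20%
    and less than 100% of records are missing data.
    """
    if percent_missing <= MISSING_THRESHOLD:
        return False
    if supplemental and percent_missing >= 100.0:
        # A supplemental field missing on every record is treated as not collected.
        return False
    return True
```

For example, a required field missing on 89.1% of records would be listed, while a supplemental field missing on 100% of records would not.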
Sample Data Quality Section:
Missing Data
The field numbers and names with Missing Data on more than 20% of Records are listed in the first two columns. Reading across the page, note that the record types checked are listed next. Only records that should contain the data are searched for missing data. "All" record types may be checked, or only certain record types may be checked. The Record Types Checked column refers to the type of PNSS records that should contain the data field. "All" includes complete, postpartum only, and prenatal only records. Completion codes 1, A, and B indicate complete records, completion codes 2-6 indicate prenatal only records, and completion codes 7-8 indicate postpartum only records. For example, the Food Stamps - Prenatal Visit field should be reported for complete records and prenatal only records.
The % Records Missing Data column indicates that 89.1% of the data for the Food Stamps - Prenatal Visit field was missing among complete and prenatal only records that should contain the data.
The data field is further analyzed by percent missing for each record type. The percent of complete records missing food stamp data was 100%; the percent of prenatal only records missing data was 0%.
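The breakdown by record type can be reproduced with a short Python sketch. The completion code groupings come from the description above; the record layout (a dictionary with a `completion_code` key) and the field name used in the example are hypothetical, and a blank is assumed to be stored as an empty string or `None`.

```python
from collections import defaultdict


def record_type(completion_code: str) -> str:
    """Map a PNSS completion code to its record type."""
    code = str(completion_code).strip().upper()
    if code in {"1", "A", "B"}:
        return "complete"
    if code in {"2", "3", "4", "5", "6"}:
        return "prenatal only"
    if code in {"7", "8"}:
        return "postpartum only"
    raise ValueError(f"unrecognized completion code: {completion_code!r}")


def percent_missing_by_type(records, field, types_checked):
    """Percent of records missing `field`, for each record type that is checked."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        rtype = record_type(rec["completion_code"])
        if rtype not in types_checked:
            continue  # only records that should contain the data are searched
        totals[rtype] += 1
        if rec.get(field) in (None, ""):
            missing[rtype] += 1
    return {t: 100.0 * missing[t] / totals[t] for t in totals}
```

A result such as 100% missing on complete records and 0% on prenatal only records points to a problem specific to how complete records are built, rather than to data that were never collected.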
This data quality error identifies problems with 1) assigning completion codes to PNSS records which specify whether the record is a complete record, prenatal only record, or postpartum only record and 2) linking prenatal and postpartum record information in PNSS records.
Sample Data Quality Section:
Completion Code or Record Linkage Errors
One of the Completion Code or Record Linkage Errors listed in this report is Prenatal Only Records Containing Data in PP (postpartum) Fields on more than 2% of Records. Field numbers and names are listed in the first two columns.
The % of Prenatal Only Records Containing Data in PP (postpartum) fields shows the percent of records with postpartum data in a field. The 2 postpartum fields listed here, Cigarettes/Day-Last 3 Months and Ever Breastfed, relate to maternal postpartum data and infant health data collected at the postpartum visit. These fields were incorrectly populated on prenatal only records.
The other completion code or record linkage error listed in this report is Postpartum Only Records Containing Data in Prenatal Fields on more than 2% of Records. The percent of Postpartum Only Records Containing Data in Prenatal Fields shows the percent of records with prenatal data in a field. The 3 prenatal fields listed here, Woman's Weight English-Prenatal Visit, Hemoglobin/Hematocrit-Prenatal Visit, and Date of Hemoglobin/Hematocrit Measure, relate to maternal prenatal data collected at the prenatal visit. These fields were incorrectly populated on postpartum only records.
The other types of Completion Code and Record Linkage errors were not found on this transaction file. These errors include 1) Complete and Prenatal Only Records With Insufficient Prenatal Data, 2) Complete and Postpartum Only Records With Insufficient Postpartum Data, and 3) Duplicate Field Values on > 90% of Complete Records.
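The two linkage checks found on this file follow the same pattern: count how many records of one type carry data in a field that belongs to the other visit. A minimal Python sketch, assuming each record is a dictionary with a hypothetical `record_type` key and that blank fields are stored as an empty string or `None`:

```python
LINKAGE_THRESHOLD = 2.0  # percent of records, from the report headings


def percent_with_unexpected_data(records, field, record_type):
    """Percent of records of `record_type` that contain data in `field`."""
    of_type = [r for r in records if r["record_type"] == record_type]
    if not of_type:
        return 0.0
    populated = sum(1 for r in of_type if r.get(field) not in (None, ""))
    return 100.0 * populated / len(of_type)


def is_linkage_error(records, field, record_type):
    """True when the field is populated on more than 2% of those records."""
    return percent_with_unexpected_data(records, field, record_type) > LINKAGE_THRESHOLD
```

Calling `is_linkage_error(records, "ever_breastfed", "prenatal only")` would correspond to the first error type, and the same call with a prenatal field and `"postpartum only"` to the second.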
Mis-codes are unacceptable or invalid codes for a specific field. Mis-codes in the data quality report include clinics with at least 10 records that have a clinic number not included in the PedNSS/PNSS Code File, fields containing zero as a value when zero is never valid on more than 2% of records, and fields with other unacceptable values on more than 5% of records.
Sample Data Quality Section:
Mis-codes
Clinic Numbers Not on CDC's List of Numbers Provided by State/Contributor includes 1 clinic with at least 10 records that was not on the PedNSS/PNSS Code File submitted to CDC by the contributor. The clinic number and the number of records included for the clinic are listed in the first two columns. This data error can be easily corrected. The contributor should send an updated PedNSS/PNSS Code File to CDC any time a change occurs in the code file, such as deleting clinics, adding new clinics, or changing the number or name of an existing clinic.
Fields Containing Zero as a Value When Zero is Never Valid includes fields that have an invalid code of zero on more than 2% of the records. The Household Smoking - Prenatal Visit field contained zero as a value on 68.0% of the records. Acceptable codes for this field are 1 = Yes, 2 = No, and 9 or blank = Unknown. This field may have been either 1) initialized to zero, or 2) miscoded as 0 = No and 1 = Yes on complete and/or prenatal only records. The Household Smoking - PP Visit field had about 73% of records in error. Many more records, 1,835, contain this error. This field may also have been either initialized to zero or miscoded on complete and/or postpartum only records.
Fields With Other Unacceptable Values were not identified on this PNSS transaction file.
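A contributor could screen for the zero mis-code before submitting a file with a check along these lines. This is a sketch only: it assumes field values are available as strings and that the percent is taken over all records passed in, which may differ from the denominator the report uses.

```python
ZERO_THRESHOLD = 2.0  # percent of records, from the report heading


def percent_zero(values):
    """Percent of values coded as zero in a field where zero is never valid."""
    if not values:
        return 0.0
    zeros = sum(1 for v in values if str(v).strip() == "0")
    return 100.0 * zeros / len(values)


def has_zero_miscode(values):
    """True when zero appears on more than 2% of the values."""
    return percent_zero(values) > ZERO_THRESHOLD
```

A very high percent, like the 68.0% seen for Household Smoking - Prenatal Visit, suggests a systematic cause such as initialization to zero or a 0 = No coding scheme rather than scattered keying errors.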
A biologically implausible value (BIV) is a data value outside the range considered biologically plausible. BIVs are extremely rare and are therefore considered to be errors. Fields with BIVs on more than 3% of records are listed as errors in this report. Information on BIVs includes the field number and name, the values that are outside of the expected range, and the number and percent of records with the error.
Sample Data Quality Section:
BIVs
Fields with BIVs on > 3% of Records includes Hematocrit-Prenatal Visit field coded as 20.0, 22.0, 23.0, and 25.0 (%) on a total of 3.1% of the records with data. The Edit Criteria, listed beneath the erroneous code, indicate that values less than 24% (240) or greater than 51% (510) are considered biologically implausible. Because the contributor predominantly reports hemoglobin values to the PNSS, very few records on this file contained hematocrit values. Therefore, this is not a significant data quality problem.
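The BIV rule amounts to a range test applied to records that contain data. A minimal Python sketch, using the hematocrit limits quoted in the Edit Criteria above and assuming values are expressed in percent with `None` for records without a hematocrit value:

```python
BIV_THRESHOLD = 3.0  # percent of records with data, from the report heading
HCT_LOW, HCT_HIGH = 24.0, 51.0  # hematocrit limits (%) quoted in the Edit Criteria


def percent_biv(values, low=HCT_LOW, high=HCT_HIGH):
    """Percent of records with data whose value falls outside [low, high]."""
    with_data = [v for v in values if v is not None]
    if not with_data:
        return 0.0
    bivs = sum(1 for v in with_data if v < low or v > high)
    return 100.0 * bivs / len(with_data)


def has_biv_error(values, low=HCT_LOW, high=HCT_HIGH):
    """True when BIVs occur on more than 3% of records with data."""
    return percent_biv(values, low, high) > BIV_THRESHOLD
```

Because the percent is computed only among records with data, a field that is rarely reported can exceed the threshold on the strength of a handful of records, which is the situation described for hematocrit on this file.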
Cross-check errors are coding inconsistencies between specific fields; for PNSS only, they also include invalid date combinations. They are reported when they occur on more than 5% of records.
Sample Data Quality Section:
Cross-Check Errors
Fields With Cross-Check Errors on > 5% of Records shows an inconsistency between the Ever Breastfed and the Currently Breastfeeding fields. The edit criteria explain that if Currently Breastfeeding = 1 (yes), then Ever Breastfed should = 9 or blank (not applicable) on complete or postpartum only records. However, this was not the case on over 5% of the records.
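The breastfeeding cross-check can be expressed as a simple rule on two fields. The sketch below assumes codes are stored as strings, that blanks are an empty string or `None`, and that the records passed in have already been limited to complete and postpartum only records; the dictionary keys are hypothetical.

```python
CROSS_CHECK_THRESHOLD = 5.0  # percent of records, from the report heading


def is_cross_check_error(currently_breastfeeding, ever_breastfed):
    """True when Currently Breastfeeding = 1 but Ever Breastfed is not 9 or blank."""
    return currently_breastfeeding == "1" and ever_breastfed not in ("9", "", None)


def percent_cross_check_errors(records):
    """Percent of records that fail the breastfeeding cross-check."""
    if not records:
        return 0.0
    errors = sum(
        1
        for r in records
        if is_cross_check_error(r.get("currently_breastfeeding"), r.get("ever_breastfed"))
    )
    return 100.0 * errors / len(records)
```

The field would be listed in the report when `percent_cross_check_errors(records)` exceeds `CROSS_CHECK_THRESHOLD`.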
Unusual data distributions are fields that have data with a
distribution pattern that is not typical based on observations of national
PNSS data.
Sample Data Quality Section:
Unusual Data Distribution
Fields With No Acceptable Data Other Than Zeros shows that no such fields were identified.
Fields With Other Unusual Data Distributions includes Infant's Date of Birth. Specific field edits for the frequency of certain codes or values have been developed to identify data that do not follow the national PNSS data distributions and are therefore considered to be a data quality error. The edit criteria for a field are listed when the data distribution is in error. Often there are several edits for the data distribution of a field, so it is necessary to review all the edits to determine which edit caused the field to be identified as an error in the report.
The edit criteria for Infant's Date of Birth is:
In this case, the error was the result of 11.4% of infants having a gestational age of exactly 280 days. Gestational age of an infant is calculated in the PNSS by establishing the number of days between the mother's last menstrual period and the infant's date of birth. Only 4% of infants are born exactly on the mother's expected date of delivery (a full term pregnancy is 40 weeks or 280 days). If more than 10% of infants have such a gestational age, the PNSS contributor may have either 1) reported the infant's date of birth as the mother's expected date of delivery or vice versa (when one or the other is unknown), or 2) estimated the mother's last menstrual period (when unknown) by subtracting 280 days from the infant's date of birth.
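The gestational age calculation and the 10% check described above can be illustrated in Python. This is a sketch under the assumption that the last menstrual period (LMP) and the infant's date of birth are both available as dates; the function names are hypothetical and are not part of the PNSS edit program.

```python
from datetime import date

FULL_TERM_DAYS = 280  # 40 weeks
EXACT_TERM_THRESHOLD = 10.0  # percent of infants, from the description above


def gestational_age_days(lmp: date, infant_dob: date) -> int:
    """Number of days between the mother's LMP and the infant's date of birth."""
    return (infant_dob - lmp).days


def percent_exactly_full_term(pairs):
    """Percent of (lmp, infant_dob) pairs with a gestational age of exactly 280 days."""
    if not pairs:
        return 0.0
    exact = sum(1 for lmp, dob in pairs if gestational_age_days(lmp, dob) == FULL_TERM_DAYS)
    return 100.0 * exact / len(pairs)
```

For instance, an LMP of January 1, 2008 and a date of birth of October 7, 2008 give a gestational age of exactly 280 days. A file where `percent_exactly_full_term` exceeds `EXACT_TERM_THRESHOLD` is worth checking for estimated dates.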
Standard deviation (SD) is a measure of the amount of variation among the values of a measure, such as hemoglobin or hematocrit, in a population. Low and high SD cutoffs are used to identify data that are less or more spread out than would be expected for the population. The field number and name, the number of records that were analyzed for the field, and the SD for the data in the field are reported for each field with a low or high SD.
Sample Data Quality Section:
Low or High Standard Deviations
Fields With Low or High Standard Deviations include the Hemoglobin - Prenatal Visit field. The SD is 1.32 g/dL, higher than the SD cutoff of 1.3 g/dL listed in the Edit Criteria.
The data distribution shown in the Percent of Records column indicates more records with both lower and higher hemoglobin values (i.e., < 10 g/dL or > 13 g/dL) when compared to the Expected Percent of Records column, which is based on the reference population.
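A high SD check like the one in this sample can be approximated with Python's standard library. The 1.3 g/dL cutoff is the one quoted for Hemoglobin - Prenatal Visit above; whether the report uses the sample or the population formula is not stated, so the use of the sample SD here is an assumption.

```python
import statistics

HGB_HIGH_SD_CUTOFF = 1.3  # g/dL, high SD cutoff quoted in the sample report


def sd_exceeds_cutoff(values, cutoff=HGB_HIGH_SD_CUTOFF):
    """Return the sample SD of the values and whether it exceeds the cutoff."""
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return sd, sd > cutoff
```

When the SD is high, comparing the observed distribution with the expected distribution shows where the extra spread comes from, for example an excess of records below 10 g/dL or above 13 g/dL.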
Page last reviewed: May 1, 2009
Page last updated: May 1, 2009
Content Source: Division of Nutrition, Physical Activity and Obesity,
National Center for Chronic Disease
Prevention and Health Promotion