Deduplication of immunization records is a two-fold problem: deduplication at the demographic/patient level (e.g., two records describe the same patient) and deduplication at the vaccination event level (e.g., two records describe the same immunization). The Patient Data Deduplication Expert Panel was established by the Centers for Disease Control and Prevention (CDC) to address deduplication at the demographic/patient level. This page will be updated with resources as they are completed.
Vaccine Level Deduplication
For best practices regarding vaccine level deduplication, see the Modeling of Immunization Registry Operations Workgroup (MIROW) Vaccination Level Deduplication in Immunization Information Systems [102 pages] guide.
Patient Level Deduplication
A toolkit to assist immunization information systems (IIS) in evaluating their deduplication algorithms was developed by the National Center for Immunization and Respiratory Diseases (NCIRD) of the CDC. The toolkit helps registries assess their system's ability to prevent or remove duplicate records, and its data and procedures can help identify strengths and weaknesses in the deduplication algorithms. The test data set consists of test cases that are representative of known duplicate record problems in real data, based on information provided by various IIS personnel. These test cases are fictitious examples; they do not correspond to information on real children. The evaluation tool application calculates sensitivity and specificity values for the IIS's algorithms based on the test results. Sensitivity measures how well the system recognizes known duplicate records. Specificity measures how well the system leaves non-duplicate records alone; the lower the specificity, the more often non-duplicate records are misidentified as duplicates.
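The two measures described above can be illustrated with a minimal sketch. It uses the standard textbook definitions of sensitivity and specificity; the page does not document how the Evaluation Tool computes its values internally, so this is not the tool's own code. The names `PairResult` and `sensitivity_specificity` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    """Outcome of one hypothetical test case (not the toolkit's own format)."""
    is_true_duplicate: bool      # ground truth recorded in the test data
    flagged_as_duplicate: bool   # decision made by the deduplication algorithm


def sensitivity_specificity(results):
    """Return (sensitivity, specificity) using the standard definitions.

    sensitivity = true positives / all true duplicates
    specificity = true negatives / all true non-duplicates
    A measure is NaN when its denominator is zero.
    """
    tp = sum(r.is_true_duplicate and r.flagged_as_duplicate for r in results)
    fn = sum(r.is_true_duplicate and not r.flagged_as_duplicate for r in results)
    tn = sum(not r.is_true_duplicate and not r.flagged_as_duplicate for r in results)
    fp = sum(not r.is_true_duplicate and r.flagged_as_duplicate for r in results)

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up counts for illustration: 4 true duplicates (3 caught),
    # 5 true non-duplicates (1 wrongly flagged).
    demo = (
        [PairResult(True, True)] * 3
        + [PairResult(True, False)]
        + [PairResult(False, False)] * 4
        + [PairResult(False, True)]
    )
    print(sensitivity_specificity(demo))  # (0.75, 0.8)
```

In this made-up example, one of five non-duplicate cases is wrongly flagged, so the misidentification rate is 1 - 0.8 = 0.2. Both measures are needed because an algorithm that flags every pair as a duplicate would reach a sensitivity of 1.0 while its specificity dropped to 0.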
NCIRD is currently working on a project to bring client-level deduplication across the IIS community up to best practice standards. This includes the development of a robust test suite and test cases to test both sensitivity and specificity. More information can be found on the Expert Panel Project page. Until these guidelines and test cases are available, please refer to the 2004 test cases below.
- Click on the link to download the zip file containing the toolkit components.
  Download deduplication toolkit [13.34 MB]
  WARNING: This zip file is extremely large and may take a while to download.
  - Toolkit User Manual (EvaluationToolManual.doc) - a document that will guide the user through installation of the tool and use of the test data set.
  - Evaluation Tool (DupEval.cab) - a program that will calculate sensitivity and specificity values for an IIS's deduplication algorithm.
  - Test Data Set (DupTestData.csw) - a file containing test cases representative of known duplicate record problems.
- Run the setup.exe program* to install the Evaluation Tool on a PC.
  *NOTE: for older Windows 95 systems you may first need to run DCOM98.exe which is included in the kit.
If you have any questions, please send them via email to firstname.lastname@example.org.
- Page last reviewed: December 12, 2012
- Page last updated: December 12, 2012