Registry Plus™ Link Plus Features and Future Plans
Features in Current Version (2.0)
- Improved file import process.
- Enhanced support for deduplication linkage.
- Ability to use nicknames in the First Name matching method.
- New and powerful manual review and file export functions.
- Improved context-sensitive online help.
- Supports input from North American Association of Central Cancer Registries (NAACCR) files, fixed-width files, delimited files, and CRS Plus databases.
- Computes probabilistic record linkage scores based on the theoretical framework developed by Fellegi and Sunter (Fellegi IP, Sunter AB. A theory for record linkage. Journal of the American Statistical Association 1969;64:1183–1210).
- Handles missing values of matching variables by treating null or empty values as missing data automatically, and allows the user to indicate additional values to treat as missing data.
- Provides a simple and efficient blocking mechanism ("OR blocking"): the blocking variables are indexed, and only pairs of records with identical values on at least one blocking variable are compared.
- Offers a choice of two phonetic coding systems (Soundex and NYSIIS), as well as several variable-specific matching methods that find partial, approximate, or fuzzy matches.
- Provides the following matching methods, or comparators. In addition to exact matching, several approximate methods find partial or fuzzy matches and are customized for the content of specific data items or types:
  - Value-specific (frequency-based): Sets weights for matching values based on the frequencies of values in the files being compared. A match on a frequent value receives a low weight, while a match on a rare value receives a high weight.
  - Last name and first name: Incorporates partial matching, value-specific matching, and NYSIIS phonetic coding to account for minor typographical errors, misspellings, and hyphenated names. For first names, nicknames are matched with formal names.
  - Middle name: Accounts for occurrence of the middle initial versus the full middle name.
  - Date: Incorporates partial matching on separate date components, and accounts for transposition of date components, as well as missing month or day values.
  - Social Security number: Accounts for typographical errors and transposition of digits. Also matches a 9-digit number in one file with a 4-digit number in another file.
  - Generic string: Uses an edit distance function and incorporates partial matching to account for typographical errors.
  - ZIP Code: Matches a 9-digit ZIP Code with a 5-digit ZIP Code.
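The scoring, missing-value, and blocking features listed above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not the Link Plus implementation: the m- and u-probabilities, the field names, the contents of `MISSING`, and the choice to give a missing value zero weight are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical m/u probabilities per field (assumed values, not Link Plus defaults).
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
M_U = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "zip": (0.90, 0.05),
}

# None and the empty string are always missing; the other codes stand in for
# user-supplied "treat as missing" values (assumed for this example).
MISSING = {None, "", "UNKNOWN", "99999"}


def is_missing(value):
    return value in MISSING


def field_weight(field, a, b):
    """Fellegi-Sunter log2 weight for one field; 0 if either value is missing (assumption)."""
    if is_missing(a) or is_missing(b):
        return 0.0
    m, u = M_U[field]
    if a == b:
        return math.log2(m / u)            # agreement weight (positive)
    return math.log2((1 - m) / (1 - u))    # disagreement weight (negative)


def score(rec1, rec2):
    """Total linkage score: sum of the per-field weights."""
    return sum(field_weight(f, rec1.get(f), rec2.get(f)) for f in M_U)


def or_blocking(file1, file2, blocking_fields):
    """Return index pairs (i, j) that share an identical, non-missing value
    on at least one blocking field."""
    pairs = set()
    for field in blocking_fields:
        index = defaultdict(list)          # value -> file 1 record indices
        for i, rec in enumerate(file1):
            if not is_missing(rec.get(field)):
                index[rec[field]].append(i)
        for j, rec in enumerate(file2):
            if not is_missing(rec.get(field)):
                for i in index.get(rec[field], ()):
                    pairs.add((i, j))
    return pairs


file1 = [
    {"last_name": "SMITH", "birth_date": "19500102", "zip": "30341"},
    {"last_name": "JONES", "birth_date": "19620730", "zip": "30301"},
]
file2 = [
    {"last_name": "SMITH", "birth_date": "19500102", "zip": ""},
    {"last_name": "JONES", "birth_date": "19710515", "zip": "30341"},
]

candidates = or_blocking(file1, file2, ["last_name", "zip"])
for i, j in sorted(candidates):
    print(i, j, round(score(file1[i], file2[j]), 2))
```

Only the candidate pairs produced by blocking are scored, which is what keeps the comparison count well below the full cross product of the two files.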
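The ZIP Code, Social Security number, and date comparators described above might look roughly like the following. The function names, the returned labels, the `MISSING_PARTS` codes, and the assumption that a 4-digit SSN holds the last four digits are all hypothetical; the sketch shows the general idea of field-specific partial matching rather than the rules Link Plus applies.

```python
def zip_agrees(a, b):
    """Compare ZIP Codes on their first five digits, so a 9-digit ZIP can match a 5-digit ZIP."""
    da = "".join(ch for ch in a if ch.isdigit())
    db = "".join(ch for ch in b if ch.isdigit())
    return len(da) >= 5 and len(db) >= 5 and da[:5] == db[:5]


def is_adjacent_transposition(a, b):
    """True if b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b):
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])


def ssn_compare(a, b):
    """Return 'exact', 'close', 'last4', or 'differ' for two digit-only SSN strings."""
    if len(a) == 9 and len(b) == 9:
        if a == b:
            return "exact"
        one_typo = sum(x != y for x, y in zip(a, b)) == 1
        if one_typo or is_adjacent_transposition(a, b):
            return "close"
        return "differ"
    # Assumption: a 4-digit value holds the last four digits of the SSN.
    if len(a) in (4, 9) and len(b) in (4, 9):
        return "last4" if a[-4:] == b[-4:] else "differ"
    return "differ"


MISSING_PARTS = {"", "99"}  # assumed codes for an unknown month or day


def date_compare(a, b):
    """Compare two YYYYMMDD strings component by component."""
    ya, ma, da = a[0:4], a[4:6], a[6:8]
    yb, mb, db = b[0:4], b[4:6], b[6:8]
    result = {"year": "agree" if ya == yb else "disagree"}
    parts = {"month": (ma, mb), "day": (da, db)}
    known = all(v not in MISSING_PARTS for pair in parts.values() for v in pair)
    # Month and day swapped between the two records (e.g. 0307 vs 0703).
    if known and ma != mb and (ma, da) == (db, mb):
        result["month"] = result["day"] = "transposed"
        return result
    for name, (x, y) in parts.items():
        if x in MISSING_PARTS or y in MISSING_PARTS:
            result[name] = "missing"
        else:
            result[name] = "agree" if x == y else "disagree"
    return result
```

In a full linkage, each label would map to its own weight, for example a reduced agreement weight for `"close"` or `"transposed"` and zero weight for `"missing"`.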
Features in Beta Version (3.0 Beta)
External Data Linkage
- Removes the limit on the number of records in file 2. The program handles any number of file 2 records as long as the computer has enough memory to read in the data from file 1, which is limited to 4.5–4.8 million records.
- Provides a new Best Match option to choose whether to write all potential matches (many:many linkages) or only the matches with the highest score to the linkage report (one:many linkages).
- Allows users to choose whether to output a non-match file.
- Provides a confirmation-like matching method for variables such as address, which contribute a positive weight to the linkage score on agreement but zero weight on disagreement.
- Provides a Social Security number-like matching method for a generic ID.
- Provides a new name matching method that is more robust against the frequency of names or outlier names such as misspelled names.
- Allows variables to be selected as matching variables multiple times to perform array comparisons automatically.
- Allows users to provide their own name frequency files for use by name matching methods.
- Allows users to use "Assign Set ID" to group matches into mutually exclusive match sets.
- Raises the maximum number of pairs on manual review forms from 30,000 to 300,000.
- Allows users to export the results of manual review to a delimited format file or to a fixed-width file (including a NAACCR-formatted file).
- Allows users to export all non-matching records from the linkage in a single export file. The export file includes all records from file 2 whose linkage score is lower than the cutoff value specified on the Linkage Configuration window, and any records that were assigned a false match status during review of potential matches on the Manual Review window.
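The Best Match and Assign Set ID options above can be illustrated with a short Python sketch. It rests on assumptions the text does not spell out: `best_matches` keeps the single highest-scoring pair for each file 2 record, and `assign_set_ids` treats a match set as a connected group of records (found here with a union-find structure). Both function names and the `(file1_index, file2_index, score)` tuple layout are hypothetical.

```python
def best_matches(scored_pairs):
    """Keep only the highest-scoring pair for each file 2 record.
    On a tie, the first pair seen is kept (assumption)."""
    best = {}
    for i, j, s in scored_pairs:
        if j not in best or s > best[j][2]:
            best[j] = (i, j, s)
    return sorted(best.values())


def assign_set_ids(pairs):
    """Group matched records into mutually exclusive sets.
    Records connected directly or through a chain of matches share one set ID."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        a, b = find(("file1", i)), find(("file2", j))
        if a != b:
            parent[b] = a                   # merge the two sets

    labels, set_ids = {}, {}
    for node in sorted(parent):
        root = find(node)
        labels.setdefault(root, len(labels) + 1)
        set_ids[node] = labels[root]
    return set_ids


# Hypothetical scored potential matches: (file 1 index, file 2 index, score).
scored = [(0, 0, 14.9), (0, 1, 3.2), (1, 1, 8.0), (2, 2, 5.0)]

top = best_matches(scored)
ids = assign_set_ids([(i, j) for i, j, _ in scored])
```

Because every record ends up with exactly one set ID, the resulting sets cannot overlap, which is the property the "mutually exclusive match sets" wording describes.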
The NPCR Registry Plus development team continually enhances its applications to meet the changing needs of the NPCR and the greater cancer registry community. Because no release date is currently scheduled for a production version of Link Plus 3.0, CDC is making the beta program available upon request. Please note that your organization may not allow installation of a beta program.
The Link Plus Development Priority List is a list of development tasks prioritized by the NPCR Registry Plus development team. Each task is the direct result of meetings with the Registry Plus User Group (RPUG) as well as requests from individual cancer registries and leaders in the cancer registry field. To obtain a copy of the current Link Plus Development Priority List, please contact firstname.lastname@example.org.
- Centers for Disease Control and Prevention
Division of Cancer Prevention and Control
c/o CDC Warehouse
3719 N Peachtree Rd
Building 100 MS F-76
Chamblee GA 30341
TTY: (888) 232-6348