Figure 2: Sources and Fixes for Geocoding Errors
| Potential Source of Error | Approach or Solution to Resolve |
| --- | --- |
| Addresses may be duplicated. More than one “Main Street” may exist in a state. | Use city names or ZIP codes to refine the possible area for address matching. |
| Addresses may be misspelled or inaccurately represented. For example, “Maine Street” would not match “Main Street.” | Software can be adjusted to overlook these minor differences in spelling (although it creates a risk when these really are different streets). |
| Address files may include post office boxes rather than addresses. Prefixes or suffixes may be missing altogether. Institutional names (e.g., a nursing home) or building numbers (e.g., apartment numbers) may not be included. | No easy fix. Need to find or generate an actual address. Adoption of an address standard that requires that certain fields be filled in could also assist. |
| “Northwest” in the address file will not match “NW” in the road database. | Develop an “alias table” where the software is told that “Northwest” and “NW” mean the same thing. |
| Road databases may not be geographically accurate. The accuracy depends primarily on how the road data were collected (e.g., via GPS, digitized from a map, hand drawn). | GPS tends to produce the most accurate geographic coordinates. Small-scale maps (of a state or the nation) are much less accurate geographically than larger-scale maps (of a neighborhood or city). The analysis being conducted determines the geographic accuracy needed. |
| Roads may be missing (e.g., new subdivisions). | Determine the currency of data and review metadata. Examine recent aerial photographs to identify missing features. |
| Road databases may have incorrect attributes. Street names may not be accurately encoded in the road database (missing or misspelled). Rural route addresses are not typically included in road databases. | Clean up the road database to meet the needs of the analysis. |
| ZIP code boundaries can change frequently. | Know the dates of both the address files and road databases and ensure they are appropriate timeframes for geocoding. |
| Geocoding against address ranges can introduce positional errors because the software assumes equal distribution of addresses on a block. This can be an issue in rural areas, where residences are not evenly distributed, or in urban areas that have significantly different lot sizes on a block. | Encoding exact addresses via GPS is one solution. |
| Geocoding software is based on proprietary approaches using various assumptions to solve address or matching problems. The approaches are not all the same, meaning that different coordinates may result when address files are geocoded with different software packages. | Know the vendor and the assumptions being made (algorithms being used) in the software. |
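
Two of the rows above can be made concrete with a short sketch: the “alias table” fix and the equal-distribution assumption behind address-range geocoding. The Python below is illustrative only and is not taken from any particular geocoding package. The names ALIASES, normalize, and interpolate are hypothetical, the alias entries are a small assumed sample, and the coordinates are treated as simple planar (x, y) pairs.

```python
import re

# Hypothetical alias table: spelled-out directionals and street types mapped
# to the abbreviations assumed to be used in the road database.
ALIASES = {
    "NORTHWEST": "NW", "NORTHEAST": "NE", "SOUTHWEST": "SW", "SOUTHEAST": "SE",
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD",
}


def normalize(address):
    """Uppercase the address, drop punctuation, and apply the alias table."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(ALIASES.get(token, token) for token in tokens)


def interpolate(house_number, low, high, start, end):
    """Estimate a position along a block by linear interpolation.

    Assumes house numbers are spread evenly between `low` and `high` along
    the straight segment from `start` to `end`, which is the assumption the
    table flags as a source of positional error.
    """
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's address range")
    fraction = 0.0 if high == low else (house_number - low) / (high - low)
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return (x, y)


# Both spellings reduce to the same string once the alias table is applied.
print(normalize("100 Main Street Northwest"))
print(normalize("100 Main St. NW"))

# Number 150 on a 100-200 block is placed at the midpoint of the segment,
# regardless of where the building actually sits.
print(interpolate(150, 100, 200, (0.0, 0.0), (100.0, 0.0)))
```

For the alias table to help, the same normalization would need to be applied to both the address file and the road database. The interpolation shows why uneven lot sizes matter: the estimate depends only on the house number, not on where the building actually stands.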