On This Page
Data Sources and Methodology for County-Level Estimates of Diagnosed Diabetes and Selected Risk Factors
- What method was used to create county-level estimates?
- Are the same types of data available for all years?
- Can I download the map data of county-level estimates?
- How do I access the various types of maps available?
- What factors do I need to choose to display a map?
- How can I use the maps to look at trends?
- How do I interpret the different colors in the maps of county-level estimates?
- Can I use the county maps and estimates to make comparisons or rank counties?
- How were ranks created for the data?
- How can we use the county ranks?
- How can we map the county ranks?
The model specification for incidence is given in Barker et al.5
Excel files with county estimates for the entire nation and for each state are available for download. Click the Download Data button, then select an indicator. Next, select either the nation, which contains data for all the states, or the individual state whose data you want to download. The files are saved in XML format but can be opened and viewed in Excel. To import the data into statistical software, first save the XML file as an XLS file in Excel.
Both national-level and state-level views are available for county estimates. When you open the County Data report, the national map is displayed. For a state-level view, click the "Select State" button and select a state; the map zooms in to that state. To remove the selected state, click the "Deselect State" button. To return to the national map, click the "Zoom Out" button. To select an indicator, click the "Indicator" button, which displays all available indicators, and choose one to display.
Click the "Indicator" button, then click an indicator. Next, select the data type (percentage, age-adjusted percentage, rate, or age-adjusted rate) and then the year. To change the data classification, click the "Legend Settings" button. You may change the number of classes, from a minimum of 2 to a maximum of 10. You can also change the classification method to equal interval, quantile, natural break, or continuous. The quartile cut-offs may differ from those presented elsewhere on the Data & Trends Web site because different software was used for data classification. To return to the original map settings, return to the "Indicator" button and reselect the indicator, data type, and year.
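As a rough illustration of why cut-offs depend on the classification method and the software, the sketch below computes equal-interval and quantile class breaks for a small set of made-up county percentages. The function names, the sample values, and the choice of the "inclusive" quantile rule are assumptions for illustration; the mapping tool's actual algorithms are not described here, and other software may interpolate quantiles differently.

```python
from statistics import quantiles

def equal_interval_breaks(values, n_classes):
    """Upper bounds of n_classes bins of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # The last bound is set to the maximum explicitly to avoid rounding drift.
    return [lo + width * i for i in range(1, n_classes)] + [hi]

def quantile_breaks(values, n_classes):
    """Upper bounds of n_classes bins holding roughly equal numbers of values."""
    # statistics.quantiles returns the n_classes - 1 interior cut points.
    cuts = quantiles(values, n=n_classes, method="inclusive")
    return cuts + [max(values)]

def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1

# Hypothetical county percentages, for illustration only.
percentages = [6.1, 7.4, 8.0, 8.3, 9.2, 9.9, 10.5, 11.8, 12.6, 15.0]
ei = equal_interval_breaks(percentages, 4)
qt = quantile_breaks(percentages, 4)
```

With these sample values, a county at 8.3 falls in the first class under equal-interval breaks but in the second class under quantile breaks, so the same estimate can be shaded differently depending on the legend settings.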
For the national map, click the play button on the time slider above the map to view trends over time for the nation. For a state map, select a state and then click the play button on the time slider to display trends at the state level.
Colors used in the shaded area maps represent the different levels of the scale. The lightest color represents the lowest level of the scale, whereas the darkest color represents the highest level.
Caution should be exercised in making comparisons based on the county maps and estimates. The estimates are intended as individual point estimates. Significance testing or hypothesis testing may be inappropriate. The maps are presented for displaying possible geographic patterns and stimulating further investigation, but are not intended as formal representations of similarities and differences.
Bayesian 95% confidence intervals and standard deviations are provided as precision indicators of the individual county-level point estimates and should be used in data analyses.
One should not assume that counties mapped in different colors have significantly different prevalence. The county estimates are grouped in categories by various methods to produce a state or national map. This grouping does not incorporate the standard deviation or confidence interval and does not imply any formal comparison between counties.
Ranks for county-level data of diagnosed diabetes and selected risk factors were based on age-adjusted prevalence rates. Models were fit using a Bayesian simulation method known as Markov Chain Monte Carlo.3-5 As part of the model fitting process we generated and saved 2,000 draws from the distribution of each county's age-adjusted prevalence rate. For each of these draws we sorted the counties by prevalence and saved the counties' ranks. This gave us 2,000 draws from the distribution of each county's rank. We then used the median for the rank estimate and the 5th and 95th percentiles for a 90% confidence interval. Note that ranks for Puerto Rico were not included with the national dataset because Puerto Rico ranks were not generated using the national data. Ranks for Puerto Rico are specific to that territory.
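The ranking procedure described above can be sketched in a few lines. This is a minimal illustration, not the code used to produce the published ranks: the simulated draws stand in for real posterior draws from the fitted model, the five "counties" and their means are hypothetical, and the convention that rank 1 is the lowest prevalence is an assumption.

```python
import random
from statistics import median, quantiles

def rank_summary(draws):
    """draws: list of posterior draws, each a list with one prevalence per county.
    Returns, per county, (median rank, 5th percentile, 95th percentile)."""
    n_counties = len(draws[0])
    ranks_by_county = [[] for _ in range(n_counties)]
    for draw in draws:
        # Sort county indices by prevalence within this draw.
        # Assumption: rank 1 is the county with the lowest prevalence.
        order = sorted(range(n_counties), key=lambda c: draw[c])
        for rank, county in enumerate(order, start=1):
            ranks_by_county[county].append(rank)
    summary = []
    for ranks in ranks_by_county:
        # n=20 yields cut points at 5%, 10%, ..., 95%.
        pct = quantiles(ranks, n=20, method="inclusive")
        summary.append((median(ranks), pct[0], pct[-1]))
    return summary

# Simulated stand-in for 2,000 saved draws (hypothetical values).
random.seed(0)
true_means = [8.0, 9.0, 10.0, 11.0, 14.0]
draws = [[random.gauss(m, 0.5) for m in true_means] for _ in range(2000)]
summary = rank_summary(draws)
```

Each tuple in `summary` gives a county's rank estimate and the bounds of its 90% interval. Counties whose prevalence distributions overlap heavily will show wide rank intervals.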
A county's rank is a reflection of relative burden. The associated confidence interval quantifies the uncertainty associated with a county's rank and determines the extent to which conclusions may be based on ranks. For example, if a county's rank confidence interval is entirely below 1571, which is the median rank for all counties, we could confidently place that county in the lower half of counties.
For each indicator, confidence intervals of counties' ranks were used to identify counties that were either below or above the median rank for all counties. You can obtain the maps showing counties above and below the median rank either by selecting the County Ranks report or by clicking the "Low-High Rank Maps" button in the County Data report. State-level maps are not available for ranks because the counties' ranks are based on the national estimates. For more information about mapping county ranks, see the related Morbidity and Mortality Weekly Report.
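The low-high grouping can be expressed as a simple rule on the rank interval. The sketch below assumes a county is placed in a group only when its whole interval lies on one side of the median rank; the function name, the category labels, and the example intervals are hypothetical, while the median rank of 1571 is the value quoted above.

```python
MEDIAN_RANK = 1571  # median rank for all counties, as stated in the text

def classify_by_rank_interval(lower, upper, median_rank=MEDIAN_RANK):
    """Place a county relative to the median rank using its rank interval."""
    if upper < median_rank:
        return "below median"
    if lower > median_rank:
        return "above median"
    # The interval contains the median rank, so no confident placement.
    return "not distinguishable from median"

# Hypothetical rank intervals (lower bound, upper bound), for illustration only.
examples = {
    "County A": (120, 640),
    "County B": (1400, 1900),
    "County C": (2500, 3050),
}
groups = {name: classify_by_rank_interval(lo, hi)
          for name, (lo, hi) in examples.items()}
```

Under this rule, a county such as the hypothetical County B, whose interval straddles 1571, would not be assigned to either half.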
Methodology for Mapping County-Level Estimates of Diagnosed Diabetes and Selected Risk Factors
The maps were created by merging the modeled estimates, stored in database format, with geographic boundary files called shapefiles. In this manner, the statistical data in the database were spatially referenced with their associated state and county boundaries. As a result, the data can be viewed as a map, and the user can interactively map the geospatially based data. The Albers Equal-Area (Continental United States) projection was used for the national maps and the NAD 1983 UTM Zone 14N map projection was used for the state maps.
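Conceptually, the merge is a join between estimate records and boundary features on a shared county identifier. The sketch below shows that join with plain Python dictionaries. The `county_code` key, the feature layout, and the sample values are assumptions for illustration only; reading real shapefiles and applying map projections would require GIS software or a geospatial library, which is not shown.

```python
def join_estimates_to_boundaries(features, estimates):
    """Attach an estimate record to each boundary feature by shared county code.

    features:  list of dicts with "properties" (including "county_code")
               and "geometry" entries (hypothetical layout).
    estimates: dict mapping county code -> dict of estimate fields.
    Returns (joined features, list of county codes with no estimate).
    """
    joined, unmatched = [], []
    for feature in features:
        code = feature["properties"]["county_code"]
        record = estimates.get(code)
        if record is None:
            unmatched.append(code)
            continue
        merged = dict(feature)
        # Copy the properties so the input features are left unchanged.
        merged["properties"] = {**feature["properties"], **record}
        joined.append(merged)
    return joined, unmatched

# Hypothetical boundary features and estimates, for illustration only.
features = [
    {"properties": {"county_code": "00001"}, "geometry": "polygon A"},
    {"properties": {"county_code": "00002"}, "geometry": "polygon B"},
]
estimates = {"00001": {"percentage": 9.4, "lower": 7.8, "upper": 11.2}}
joined, unmatched = join_estimates_to_boundaries(features, estimates)
```

Tracking unmatched codes is a design choice worth keeping in any real workflow, since boundary files and estimate tables can disagree on which counties exist in a given year.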
What color sequences were used for the maps?
Color schemes were chosen based on the number of data classes or categories, the types of data being mapped (e.g., number of adults versus percentage of adults), consideration of the display devices to be used for the resulting maps, and the need to avoid colors that cannot be differentiated by individuals with impaired color vision.6 The color schemes for the maps were selected by referring to ColorBrewer (http://www.colorbrewer.org), an online tool for selecting color schemes.
5. Barker LE, Thompson TJ, Kirtland KA, Boyle JP, Geiss LS, McCauley MM, Albright AL. Bayesian small area estimates of diabetes incidence by United States county, 2009. Journal of Data Science 2013;11:249–269.
6. Malec D, Sedransk J, Moriarity CL, LeClere FB. Small area inference for binary variables in the National Health Interview Survey. Journal of the American Statistical Association 1997;92(439):815–826.
7. Brewer CA. Basic mapping principles for visualizing cancer data using geographic information systems (GIS). American Journal of Preventive Medicine 2006;30(2S):S25–S36.