BRFSS Maps: Methods and Frequently Asked Questions (FAQs)
Some MMSAs are geographically large, comprising many cities and counties. The circle representing an MMSA on the map is placed at the MMSA’s geographic center, or centroid, occasionally placing the circle outside the actual city for which the MMSA is named.
For example, the circle representing the Washington-Arlington-Alexandria, DC-VA-MD-WV MMSA does not appear within the District of Columbia on the map; it appears in northern Virginia, because that is where the centroid of this statistical area lies. Additionally, because the Washington-Arlington-Alexandria, DC-VA-MD-WV MMSA encompasses a different and larger area than the actual boundaries of the District itself, the prevalence data for the metropolitan division and for the District itself will differ.
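To illustrate how a geographic center can be computed, here is a minimal sketch of the area-weighted centroid of a simple polygon (the shoelace formula). It assumes planar (already projected) coordinates and a non-self-intersecting boundary; the function name `polygon_centroid` and the sample coordinates are hypothetical, and the method actually used by the BRFSS Maps application is not stated in this document.

```python
def polygon_centroid(vertices):
    """Return the (x, y) centroid of a simple polygon.

    vertices: list of (x, y) pairs in boundary order, without repeating
    the first point at the end. Coordinates are assumed to be planar.
    """
    twice_area = 0.0
    sum_x = 0.0
    sum_y = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    # Centroid = sum / (6 * area), and area = twice_area / 2.
    return sum_x / (3 * twice_area), sum_y / (3 * twice_area)


# Hypothetical square region whose "named city" sits at the corner (0, 0):
# the centroid lands in the middle of the region, away from that corner.
region = [(0, 0), (4, 0), (4, 4), (0, 4)]
center = polygon_centroid(region)
```

In this toy case the circle would be drawn at the middle of the square rather than at the corner where the named city sits, which mirrors the situation described above.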
Several different map projections were used to present the information in BRFSS Maps.
- Maps of the continental United States, Alaska, and Hawaii were projected to the Albers Equal-Area (Continental United States) projection.
- Puerto Rico was projected to the Albers Equal-Area (North America) projection.
- Guam was projected to the World Mercator projection.
Alaska, Hawaii, Puerto Rico, and Guam are not shown at the same geographic scale as one another, nor at the same scale as the continental United States, in these maps.
The BRFSS Maps section offers several data classification methods so that users can choose the one they consider most appropriate. There is no single best data classification method; each has advantages and disadvantages. When creating a map, the map user should consider the purpose of the map, the data distribution (if known), and the knowledge level (i.e., mapping and statistical awareness) of the intended audience. The following are brief descriptions of the four data classification methods available to users of the SMART and BRFSS data used in the BRFSS Maps application.
Equal-interval: In equal-interval classifications, the data ranges for all classes are the same. In other words, the range of the entire dataset is divided by the desired number of data classes, such that each class occupies an equal interval along the range of data values. The major advantage of the equal-interval classification is that the resulting equal intervals may be easy for many map users to interpret. The major disadvantage of the equal-interval classification is that the data distribution is not considered when determining class breaks for the intervals (only the lower and upper data values are used).
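The equal-interval description above can be sketched as follows. This is an illustration only, not the application's code; the function names `equal_interval_bounds` and `assign_class` and the sample values are hypothetical.

```python
def equal_interval_bounds(values, n_classes):
    """Return the upper bound of each class for an equal-interval scheme.

    Only the minimum and maximum of the data are used, which is exactly
    why the distribution of the values in between is ignored.
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_classes
    return [lowest + width * k for k in range(1, n_classes + 1)]


def assign_class(value, upper_bounds):
    """Return the 0-based index of the first class whose upper bound holds value."""
    for index, bound in enumerate(upper_bounds):
        if value <= bound:
            return index
    # Guard against floating-point rounding at the very top of the range.
    return len(upper_bounds) - 1


# Hypothetical prevalence values (percent).
sample = [10.0, 12.5, 15.0, 20.0, 30.0]
bounds = equal_interval_bounds(sample, 4)
classes = [assign_class(v, bounds) for v in sample]
```

With these sample values the range of 20 is split into four intervals of width 5, and the third class (20 to 25) ends up empty, showing how equal intervals can leave classes unused when the data are unevenly spread.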
Quantiles: In quantile classifications, an equal number of observations is placed in each class. For example, if there are 50 observations, 10 observations would be placed in each class of a five-class (quintile) quantile map. The data are first rank-ordered, and then the appropriate observations are assigned to each class (class 1, class 2, class 3, etc.). The number of classes also determines the specific type of quantile map (three classes = tertile; four classes = quartile; five classes = quintile). Two major advantages of the quantile classification are that it is useful for ordinal data (because the data are rank-ordered) and that it can help facilitate map comparisons (as long as the same number of classes is used for all maps). The major disadvantage of the quantile classification is that it does not consider how the data are distributed. Therefore, if the data have a highly skewed distribution (e.g., many outliers), this classification will force observations into the same class (either the lowest or the highest, in this case) even where that is not appropriate; as a result, the quantile classification may give a false impression that the data are relatively normally distributed.
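A minimal sketch of the rank-then-assign procedure described above is shown here. The function name `quantile_classes` is hypothetical, and breaking ties by input order is an assumption made for this illustration; the document does not say how the application handles tied values.

```python
def quantile_classes(values, n_classes):
    """Return a 0-based class index for each value, in the original order.

    Values are rank-ordered and then split so that each class receives
    (as nearly as possible) the same number of observations.
    """
    n = len(values)
    # Indices of the observations, ordered from smallest to largest value.
    # sorted() is stable, so tied values keep their input order (assumption).
    ranked = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, original_index in enumerate(ranked):
        classes[original_index] = rank * n_classes // n
    return classes


# 50 distinct hypothetical observations in scrambled order.
observations = [(i * 7) % 50 for i in range(50)]
quintiles = quantile_classes(observations, 5)
counts = [quintiles.count(k) for k in range(5)]
```

Here each of the five classes receives 10 of the 50 observations, matching the quintile example in the text, regardless of how the values themselves are spaced.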
Standard Deviations: In standard deviations classifications, the data are assigned to classes based on where they fall relative to the mean and standard deviations of the data distribution. The major advantage of this classification method is that, by using the mean as a dividing point, a contrast of values above and below the mean is readily seen. This method works well only for a dataset that is normally distributed. An even number of classes should be used, such that the mean of the data serves as the dividing point, with the same number of classes above and below it. The major disadvantage of the standard deviations classification is that it requires a basic understanding of statistical concepts, and hence may be difficult for some map users to interpret.
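The following sketch shows one way to derive class breaks from the mean and standard deviation, with the mean as the central dividing point. It is illustrative only: the names `std_dev_breaks` and `std_dev_class` are hypothetical, and the use of the population standard deviation and of one-standard-deviation class widths are assumptions, since the document does not specify either.

```python
import bisect
import statistics


def std_dev_breaks(values, n_classes=4):
    """Return interior class breaks at mean + k * SD, centered on the mean."""
    if n_classes < 2 or n_classes % 2 != 0:
        raise ValueError("use an even number of classes so the mean divides them")
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)
    half = n_classes // 2
    # For 4 classes this yields breaks at mean - SD, mean, mean + SD.
    return [mean + k * sd for k in range(-(half - 1), half)]


def std_dev_class(value, breaks):
    """Return the 0-based class index; values equal to a break go to the upper class."""
    return bisect.bisect_right(breaks, value)


# Hypothetical values with mean 5 and population SD 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
breaks = std_dev_breaks(sample, 4)
classes = [std_dev_class(v, breaks) for v in sample]
```

Classes 0 and 1 lie below the mean and classes 2 and 3 lie at or above it, which is the above/below contrast the method is meant to highlight.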
Natural Breaks: In this classification method (also variously known as Optimal Breaks and Jenks’ Method), the data are assigned to classes based upon their position along the data distribution relative to all other data values. This classification uses an iterative algorithm to optimally assign data to classes such that the variances within all classes are minimized, while the variances among classes are maximized. In this manner, the data distribution is explicitly considered for determining class breaks; this is the major advantage of the Natural Breaks classification method. The major disadvantage is that the concept behind the classification may not be easily understood by all map users, and the legend values for the class breaks (e.g., the data ranges) may not be intuitive.
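As a rough illustration of the objective described above, the sketch below finds the grouping of sorted values that minimizes the total within-class sum of squared deviations. Note the swap in technique: it uses an exhaustive dynamic-programming search rather than the iterative algorithm mentioned in the text, and it is not the application's implementation. The function name `natural_breaks` and the sample data are hypothetical.

```python
def natural_breaks(values, n_classes):
    """Split sorted values into n_classes contiguous groups that minimize
    the total within-class sum of squared deviations from each class mean."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums let us compute the squared deviation of any slice quickly.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def within_ss(i, j):
        """Sum of squared deviations for data[i:j] (requires j > i)."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total for splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + within_ss(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk the stored split points backwards to recover the classes.
    groups = []
    j = n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        groups.append(data[i:j])
        j = i
    groups.reverse()
    return groups


# Hypothetical values with three obvious clusters.
clusters = natural_breaks([50, 1, 11, 2, 52, 10, 3, 12, 51], 3)
```

Because the class boundaries follow the gaps in the data, the resulting legend ranges (here 1 to 3, 10 to 12, and 50 to 52) are uneven, which is the interpretability trade-off noted above.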
Color schemes were chosen based upon the number of data classes, the types of data being mapped (e.g., sequential or diverging data), consideration of the display devices to be used for the resulting maps (e.g., computer CRT monitor, computer LCD monitor, LCD projector, and print copy), and the need to avoid colors that cannot be differentiated by individuals with impaired color vision (e.g., red-green color combinations).
The two color schemes for the BRFSS Maps were selected by consulting ColorBrewer.
The color scheme chosen for natural breaks, quantile, and equal interval maps is the Sequential Oranges scheme. This scheme works well with ordinal, interval, and ratio data, such as the prevalence data in the BRFSS. The color scheme chosen for the standard deviation maps is the Diverging Purple-Orange scheme. This scheme emphasizes the natural midpoint of a diverging dataset (e.g., the mean) and the diverging values from the mean (e.g., positive and negative standard deviations). The color schemes are automatically selected based upon the user’s statistical classification method selection for categorizing the data.
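The automatic selection described above amounts to a lookup from classification method to color scheme. A minimal sketch follows; the dictionary keys and the function name `color_scheme_for` are hypothetical identifiers chosen for this illustration, while the scheme names come from the text.

```python
# Mapping from classification method to color scheme, as described in the text.
# The method keys are hypothetical identifiers, not the application's own.
SCHEME_BY_METHOD = {
    "natural_breaks": "Sequential Oranges",
    "quantile": "Sequential Oranges",
    "equal_interval": "Sequential Oranges",
    "standard_deviation": "Diverging Purple-Orange",
}


def color_scheme_for(method):
    """Return the color scheme name for a classification method."""
    try:
        return SCHEME_BY_METHOD[method]
    except KeyError:
        raise ValueError(f"unknown classification method: {method!r}") from None


scheme = color_scheme_for("standard_deviation")
```

Only the standard deviation method maps to the diverging scheme, because it is the only method with a natural midpoint (the mean).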
The following Web sites feature reference maps with background data. You can use these resources to compare other sociodemographic data to health statistics.