Classifying Data
- In equal-interval classification, the range is found by subtracting the lowest value from the highest; the range is then divided by the desired number of classes, usually 4 or 5, to get the class width, which sets where each class begins and ends.
- The major advantage of equal-interval classification is that many map users find it easy to interpret; it works best with familiar data ranges such as percentages and temperatures.
- In quantile classification, each class contains the same number of observations (or geographic units); with quintiles, 1/5 of the observations fall in each of 5 classes; with quartiles, 1/4 of the observations fall in each of 4 classes.
- Works well when you want to show the top 25% or top 20% of the population, regardless of where the break points fall.
- In standard deviation classification, the mean is computed and established as the center of the distribution. Class intervals are determined by the standard deviation, a measure of the spread of the data around the mean.
- Works best with a normal distribution (bell-shaped curve).
- In natural-breaks classification, class breaks are placed where there are gaps in the distribution (i.e., few or no observations).
- This is ArcGIS’ default classification scheme.
- The data distribution is explicitly considered; this is the major advantage.
- The major disadvantage is that the concept behind the classification may not be easily understood by all map users, and the legend values for the class breaks (e.g., the data ranges) may not be intuitive.
- Consider the purpose of the map, the data distribution (if known), and the knowledge level (i.e., mapping and statistical awareness) of the intended audience.
- The range of the data class breaks shall not exceed the range of the data presented. If the data range from 44 to 354, then the first data class shall start with 44, not 0 (zero).
- Never have a data class break that ends with the same number that the next class break begins with. Example: 22–45, 45–77 should read 22–45, 46–77.
- For an evaluation of the use of various classification schemes in choropleth mapping, see this article by Cynthia Brewer and Linda Pickle [PDF-6M].
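
The equal-interval, quantile, and standard deviation schemes described above, along with the rule that adjacent class labels must not share a number, can be sketched in a few lines of Python. This is a minimal illustration, not how ArcGIS or any other GIS package computes its breaks. The function names and sample values are hypothetical; the sample uses the 44 to 354 range from the example above. The choice of the population standard deviation and of breaks at one standard deviation either side of the mean are assumptions. Natural breaks are not covered.

```python
import statistics


def equal_interval_breaks(values, k):
    """Upper limit of each of k equal-width classes spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # range divided by the number of classes
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper limit of each of k classes holding about len(values)/k observations.

    Assumes len(values) >= k; tied values can make class sizes unequal.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Position of the last observation that belongs to each class.
    return [ordered[(n * i) // k - 1] for i in range(1, k + 1)]


def std_dev_breaks(values, steps=(-1, 0, 1)):
    """Breaks at mean + s * standard deviation for each s in steps.

    Assumption: population standard deviation (statistics.pstdev).
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + s * sd for s in steps]


def integer_class_labels(lowest, upper_limits):
    """Legend labels for integer data; each class starts one above the previous end."""
    labels = []
    start = lowest
    for end in upper_limits:
        labels.append(f"{start}-{end}")
        start = end + 1
    return labels


# Hypothetical sample spanning 44..354, so the first class starts at 44, not 0.
sample = [44, 60, 75, 90, 120, 150, 200, 260, 300, 354]

print(equal_interval_breaks(sample, 5))
print(quantile_breaks(sample, 5))
print(std_dev_breaks(sample))
print(integer_class_labels(22, [45, 77]))  # ['22-45', '46-77']
```

For this sample the equal-interval class width is (354 - 44) / 5 = 62, so the classes end at 106, 168, 230, 292, and 354, while the quintile classes each hold two observations and end wherever the data happen to fall.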
Last Reviewed: August 16, 2017