# Lesson 2: Summarizing Data

## Section 6: Measures of Central Location

A measure of central location provides a single value that summarizes an entire distribution of data. Suppose you had data from an outbreak of gastroenteritis affecting 41 persons who had recently attended a wedding. If your supervisor asked you to describe the ages of the affected persons, you could simply list the ages of each person. Alternatively, your supervisor might prefer one summary number — a measure of **central location**. Saying that the mean (or average) age was 48 years rather than reciting 41 ages is certainly more efficient, and most likely more meaningful.

Measures of central location include the **mode**, **median**, **arithmetic mean**, **midrange**, and **geometric mean**. Selecting the best measure to use for a given distribution depends largely on two factors:

- The **shape or skewness** of the distribution, and
- The intended **use** of the measure.

Each measure — what it is, how to calculate it, and when best to use it — is described in this section.

### Mode

#### Definition of mode

The mode is the value that occurs most often in a set of data. It can be determined simply by tallying the number of times each value occurs. Consider, for example, the number of doses of diphtheria-pertussis-tetanus (DPT) vaccine each of seventeen 2-year-old children in a particular village received:

0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4

Two children received no doses; two children received 1 dose; three received 2 doses; six received 3 doses; and four received all 4 doses. Therefore, the mode is 3 doses, because more children received 3 doses than any other number of doses.

#### Method for identifying the mode

**Step 1.** Arrange the observations into a frequency distribution, indicating the values of the variable and the frequency with which each value occurs. (Alternatively, for a data set with only a few values, arrange the actual values in ascending order, as was done with the DPT vaccine doses above.)

**Step 2.** Identify the value that occurs most often.

#### EXAMPLES: Identifying the Mode

Example A: Table 2.8 (below) provides data from 30 patients who were hospitalized and received antibiotics. For the variable "length of stay" (LOS) in the hospital, identify the mode.

**Step 1.** Arrange the data in a frequency distribution.

LOS | Frequency | LOS | Frequency | LOS | Frequency |
---|---|---|---|---|---|
0 | 1 | 10 | 5 | 20 | 0 |
1 | 0 | 11 | 1 | 21 | 0 |
2 | 1 | 12 | 3 | 22 | 1 |
3 | 1 | 13 | 1 | ⋮ | 0 |
4 | 1 | 14 | 1 | ⋮ | 0 |
5 | 2 | 15 | 0 | 27 | 1 |
6 | 1 | 16 | 1 | ⋮ | 0 |
7 | 1 | 17 | 0 | ⋮ | 0 |
8 | 1 | 18 | 2 | 49 | 1 |
9 | 3 | 19 | 1 | | |

Alternatively, arrange the values in ascending order.

0, 2, 3, 4, 5, 5, 6, 7, 8, 9,

9, 9, 10, 10, 10, 10, 10, 11, 12, 12,

12, 13, 14, 16, 18, 18, 19, 22, 27, 49

**Step 2.** Identify the value that occurs most often.

Most values appear once, but the distribution includes two 5s, three 9s, five 10s, three 12s, and two 18s.

Because 10 appears most frequently, the mode is 10.

Example B: Find the mode of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.

**Step 1.** Arrange the values in ascending order.

15, 22, 27, 30, and 31 days

**Step 2.** Identify the value that occurs most often.

None

**Note:** When no value occurs more than once, the distribution is said to have no mode.

Example C: Find the mode of the following incubation periods for Bacillus cereus food poisoning:

**Step 1.** Arrange the values in ascending order.

Done

**Step 2.** Identify the values that occur most often.

Five 3s and five 12s

Example C illustrates the fact that a frequency distribution can have more than one mode. When this occurs, the distribution is said to be **bi-modal**. Indeed, *Bacillus cereus* is known to cause two syndromes with different incubation periods: a short-incubation-period (1–6 hours) syndrome characterized by vomiting; and a long-incubation-period (6–24 hours) syndrome characterized by diarrhea.
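The two-step method for the mode can be sketched in Python as follows. This is a minimal illustration, not part of the original lesson: the function name `find_modes` and the variable names are assumptions, and the function returns an empty list when no value occurs more than once, following the "no mode" convention noted above.

```python
from collections import Counter


def find_modes(values):
    """Return every value tied for the highest frequency, in ascending order.

    An empty list means that no value occurs more than once (no mode).
    """
    counts = Counter(values)            # Step 1: build the frequency distribution
    if not counts:
        return []
    highest = max(counts.values())      # Step 2: find the largest frequency
    if highest == 1:
        return []                       # every value appears once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


# Data taken from the examples in this section.
dpt_doses = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4]
length_of_stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
                  9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
                  12, 13, 14, 16, 18, 18, 19, 22, 27, 49]
hepatitis_a_days = [27, 31, 15, 30, 22]

print(find_modes(dpt_doses))         # [3]
print(find_modes(length_of_stay))    # [10]
print(find_modes(hepatitis_a_days))  # []
```

Returning a list rather than a single value lets the same function report one mode, several tied modes (as in a bimodal distribution), or none.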

Table 2.8 Sample Data from the Northeast Consortium Vancomycin Quality Improvement Project

ID | Admission Date | Discharge Date | LOS | DOB (mm/dd) | DOB (year) | Age | Sex | ESRD | No. Days Vancomycin | Vancomycin OK? |
---|---|---|---|---|---|---|---|---|---|---|

1 | 1/01 | 1/10 | 9 | 11/18 | 1928 | 66 | M | Y | 3 | N |

2 | 1/08 | 1/30 | 22 | 01/21 | 1916 | 78 | F | N | 10 | Y |

3 | 1/16 | 3/06 | 49 | 04/22 | 1920 | 74 | F | N | 32 | Y |

4 | 1/23 | 2/04 | 12 | 05/14 | 1919 | 75 | M | N | 5 | Y |

5 | 1/24 | 2/01 | 8 | 08/17 | 1929 | 65 | M | N | 4 | N |

6 | 1/27 | 2/14 | 18 | 01/11 | 1918 | 77 | M | N | 6 | Y |

7 | 2/06 | 2/16 | 10 | 01/09 | 1920 | 75 | F | N | 2 | Y |

8 | 2/12 | 2/22 | 10 | 06/12 | 1927 | 67 | M | N | 1 | N |

9 | 2/22 | 3/04 | 10 | 05/09 | 1915 | 79 | M | N | 8 | N |

10 | 2/22 | 3/08 | 14 | 04/09 | 1920 | 74 | F | N | 10 | N |

11 | 2/25 | 3/04 | 7 | 07/28 | 1915 | 79 | F | N | 4 | N |

12 | 3/02 | 3/14 | 12 | 04/24 | 1928 | 66 | F | N | 8 | N |

13 | 3/11 | 3/17 | 6 | 11/09 | 1925 | 69 | M | N | 3 | N |

14 | 3/18 | 3/23 | 5 | 04/08 | 1924 | 70 | F | N | 2 | N |

15 | 3/19 | 3/28 | 9 | 09/13 | 1915 | 79 | F | N | 1 | Y |

16 | 3/27 | 4/01 | 5 | 01/28 | 1912 | 83 | F | N | 4 | Y |

17 | 3/31 | 4/02 | 2 | 03/14 | 1921 | 74 | M | N | 2 | Y |

18 | 4/12 | 4/24 | 12 | 02/07 | 1927 | 68 | F | N | 3 | N |

19 | 4/17 | 5/06 | 19 | 03/04 | 1921 | 74 | F | N | 11 | Y |

20 | 4/29 | 5/26 | 27 | 02/23 | 1921 | 74 | F | N | 14 | N |

21 | 5/11 | 5/15 | 4 | 05/05 | 1923 | 72 | M | N | 4 | Y |

22 | 5/14 | 5/14 | 0 | 01/03 | 1911 | 84 | F | N | 1 | N |

23 | 5/20 | 5/30 | 10 | 11/11 | 1922 | 72 | F | N | 9 | Y |

24 | 5/21 | 6/08 | 18 | 08/08 | 1912 | 82 | M | N | 14 | Y |

25 | 5/26 | 6/05 | 10 | 09/28 | 1924 | 70 | M | Y | 5 | N |

26 | 5/27 | 5/30 | 3 | 05/14 | 1899 | 96 | F | N | 2 | N |

27 | 5/28 | 6/06 | 9 | 07/22 | 1921 | 73 | M | N | 1 | Y |

28 | 6/07 | 6/20 | 13 | 12/30 | 1896 | 98 | F | N | 3 | N |

29 | 6/07 | 6/23 | 16 | 08/31 | 1906 | 88 | M | N | 1 | N |

30 | 6/16 | 6/27 | 11 | 07/07 | 1917 | 77 | F | N | 7 | Y |

#### Properties and uses of the mode

To identify the mode from a data set in Analysis Module:

Epi Info does not have a Mode command. Thus, the best way to identify the mode is to create a histogram and look for the tallest column(s).

Select **graphs**, then choose histogram under **Graph Type.**

The tallest column(s) is(are) the mode(s).

*NOTE: The Means command provides a mode, but only the lowest value if a distribution has more than one mode.*

The mode is the easiest measure of central location to understand and explain. It is also the easiest to identify, and requires no calculations.

- The mode is the preferred measure of central location for addressing which value is the most popular or the most common. For example, the mode is used to describe which day of the week people most prefer to come to the influenza vaccination clinic, or the "typical" number of doses of DPT the children in a particular community have received by their second birthday.
- As demonstrated, a distribution can have a single mode. However, a distribution has more than one mode if two or more values tie as the most frequent values. It has no mode if no value appears more than once.
- The mode is used almost exclusively as a "descriptive" measure. It is almost never used in statistical manipulations or analyses.
- The mode is not typically affected by one or two extreme values (outliers).

### Exercise 2.3

Using the same vaccination data as in Exercise 2.2, find the mode. (If you answered Exercise 2.2, find the mode from your frequency distribution.)

2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1

### Median

To identify the median from a data set in Analysis Module:

Click on the **Means** command under the Statistics folder.

In the **Means Of** drop-down box, select the variable of interest

Select **Variable**

Click **OK**

You should see the list of the frequency by the variable you selected. Scroll down until you see the Median among other data.

#### Definition of median

The median is the middle value of a set of data that has been put into rank order. Similar to the median on a highway that divides the road in two, the statistical median is the value that divides the data into two halves, with one half of the observations being smaller than the median value and the other half being larger. The median is also the 50th percentile of the distribution. Suppose you had the following ages in years for patients with a particular illness:

4, 23, 28, 31, 32

The median age is 28 years, because it is the middle value, with two values smaller than 28 and two values larger than 28.

#### Method for identifying the median

**Step 1.** Arrange the observations into increasing or decreasing order.

**Step 2.** Find the middle position of the distribution by using the following formula:

Middle position = (n + 1) ⁄ 2

- If the number of observations (n) is **odd**, the middle position falls on a single observation.
- If the number of observations is **even**, the middle position falls between two observations.

**Step 3.** Identify the value at the middle position.

- If the number of observations (n) is **odd** and the middle position falls on a single observation, the median equals the value of that observation.
- If the number of observations is **even** and the middle position falls between two observations, the median equals the average of the two values.

#### EXAMPLES: Identifying the Median

Example A: Odd Number of Observations

Find the median of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.

**Step 1.** Arrange the values in ascending order.

15, 22, 27, 30, and 31 days

**Step 2.** Find the middle position of the distribution by using (n + 1) ⁄ 2.

Middle position = (5 + 1) ⁄ 2 = 6 ⁄ 2 = 3

Therefore, the median will be the value at the third observation.

**Step 3.** Identify the value at the middle position.

Third observation = 27 days

**Example B: Even Number of Observations**

Suppose a sixth case of hepatitis A was reported. Now find the median of the following incubation periods for hepatitis A: 27, 31, 15, 30, 22, and 29 days.

**Step 1.** Arrange the values in ascending order.

15, 22, 27, 29, 30, and 31 days

**Step 2.** Find the middle position of the distribution by using (n + 1) ⁄ 2.

Middle position = (6 + 1) ⁄ 2 = 7 ⁄ 2 = 3½

Therefore, the median will be a value halfway between the values of the third and fourth observations.

**Step 3.** Identify the value at the middle position.

The median equals the average of the values of the third (value = 27) and fourth (value = 29) observations:

Median = (27 + 29) ⁄ 2 = 28 days
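The three steps for the median translate directly into a short Python function. This is a sketch for illustration only; the function name `median_by_position` is an assumption, and the 1-based middle position from the formula is converted to Python's 0-based list indexing.

```python
def median_by_position(values):
    """Median found from the middle position (n + 1) / 2 of the ranked data."""
    ordered = sorted(values)                  # Step 1: rank the observations
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = (n + 1) / 2                      # Step 2: 1-based middle position
    if n % 2 == 1:
        # Step 3 (odd n): the middle position falls on a single observation.
        return ordered[int(middle) - 1]
    # Step 3 (even n): average the two observations around the middle position.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([27, 31, 15, 30, 22]))      # 27
print(median_by_position([27, 31, 15, 30, 22, 29]))  # 28.0
```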

#### Properties and uses of the median

- The median is a good descriptive measure, particularly for data that are skewed, because it is the central point of the distribution.
- The median is relatively easy to identify. It is equal to either a single observed value (if odd number of observations) or the average of two observed values (if even number of observations).
- The median, like the mode, is not generally affected by one or two extreme values (outliers). For example, if the values in the earlier example had been 4, 23, 28, 31, and 131 (instead of 32), the median would still be 28.
- The median has less-than-ideal statistical properties. Therefore, it is not often used in statistical manipulations and analyses.

### Exercise 2.4

Determine the median for the same vaccination data used in Exercises 2.2 and 2.3.

2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1

### Arithmetic mean

#### Definition of mean

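The arithmetic mean is a more technical name for what is more commonly called the **mean** or **average**. The arithmetic mean is the value that is, on balance, closest to all the values in a distribution: the deviations above and below it cancel exactly, and no other value gives a smaller sum of squared deviations.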

#### Method for calculating the mean

**Step 1.** Add all of the observed values in the distribution.

**Step 2.** Divide the sum by the number of observations.

#### EXAMPLE: Finding the Mean

Find the mean of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.

**Step 1.** Add all of the observed values in the distribution.

27 + 31 + 15 + 30 + 22 = 125

**Step 2.** Divide the sum by the number of observations.

125 ⁄ 5 = 25.0

Therefore, the mean incubation period is 25.0 days.
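The same two steps can be written as a small Python function. This is only an illustrative sketch; the function name `arithmetic_mean` is an assumption.

```python
def arithmetic_mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    total = sum(values)            # Step 1: add all of the observed values
    return total / len(values)     # Step 2: divide by the number of observations


incubation_days = [27, 31, 15, 30, 22]
print(sum(incubation_days))              # 125
print(arithmetic_mean(incubation_days))  # 25.0
```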

To identify the mean from a data set in Analysis Module:

Click on the **Means** command under the Statistics folder.

In the **Means Of** drop-down box, select the variable of interest.

Select **Variable**.

Click **OK**.

You should see the list of the frequency by the variable you selected. Scroll down until you see the Mean among other data.

#### Properties and uses of the arithmetic mean

- The mean has excellent statistical properties and is commonly used in additional statistical manipulations and analyses. One such property is called the *centering property of the mean*. When the mean is subtracted from each observation in the data set, the sum of these differences is zero (i.e., the negative sum is equal to the positive sum). For the data in the previous hepatitis A example:

Value minus Mean | Difference |
---|---|
15 − 25.0 | −10.0 |
22 − 25.0 | −3.0 |
27 − 25.0 | +2.0 |
30 − 25.0 | +5.0 |
31 − 25.0 | +6.0 |
125 − 125.0 = 0 | +13.0 − 13.0 = 0 |

**Mean: the center of gravity of the distribution**

This demonstrates that the mean is the arithmetic center of the distribution.

- Because of this centering property, the mean is sometimes called the **center of gravity** of a frequency distribution. If the frequency distribution is plotted on a graph, and the graph is balanced on a fulcrum, the point at which the distribution would balance would be the mean.
- The arithmetic mean is the best descriptive measure for data that are normally distributed.
- On the other hand, the mean is not the measure of choice for data that are severely skewed or have extreme values in one direction or another. Because the arithmetic mean uses all of the observations in the distribution, it is affected by any extreme value. Suppose that the last value in the previous distribution was 131 instead of 31. The mean would be 225 ⁄ 5 = 45.0 rather than 25.0. As a result of one extremely large value, the mean is much larger than all values in the distribution except the extreme value (the "outlier").
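Both the centering property and the mean's sensitivity to an extreme value can be checked numerically with the hepatitis A incubation periods. The snippet below is a minimal sketch; the variable names are illustrative assumptions.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


incubation_days = [15, 22, 27, 30, 31]
mean_days = arithmetic_mean(incubation_days)            # 25.0

# Centering property: the deviations from the mean sum to zero.
deviations = [x - mean_days for x in incubation_days]
print(deviations)        # [-10.0, -3.0, 2.0, 5.0, 6.0]
print(sum(deviations))   # 0.0

# Sensitivity to an outlier: replace the last value, 31, with 131.
with_outlier = [15, 22, 27, 30, 131]
print(arithmetic_mean(with_outlier))   # 45.0
```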

### Exercise 2.5

Determine the mean for the same set of vaccination data.

2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1

### The midrange (midpoint of an interval)

#### Definition of midrange

The midrange is the half-way point or the midpoint of a set of observations. The midrange is usually calculated as an intermediate step in determining other measures.

#### Method for identifying the midrange

**Step 1.** Identify the smallest (minimum) observation and the largest (maximum) observation.

**Step 2.** Add the minimum plus the maximum, then divide by two.

*Exception:* Age differs from most other variables because age does not follow the usual rules for rounding to the nearest integer. Someone who is 17 years and 360 days old cannot claim to be 18 years old for at least 5 more days. Thus, to identify the midrange for age (in years) data, you must add the smallest (minimum) observation plus the largest (maximum) observation plus 1, then divide by two.

Midrange (most types of data) = (minimum + maximum) ⁄ 2

Midrange (age data) = (minimum + maximum + 1) ⁄ 2

Consider the following example:

In a particular pre-school, children are assigned to rooms on the basis of age on September 1. Room 2 holds all of the children who were at least 2 years old but not yet 3 years old as of September 1. In other words, every child in room 2 was 2 years old on September 1. What is the midrange of ages of the children in room 2 on September 1?

For descriptive purposes, a reasonable answer is 2. However, recall that the midrange is usually calculated as an intermediate step in other calculations. Therefore, more precision is necessary.

Consider that children born in August have just turned 2 years old. Others, born in September the previous year, are almost but not quite 3 years old. Ignoring seasonal trends in births and assuming a very large room of children, birthdays are expected to be uniformly distributed throughout the year. The youngest child, born on September 1, is exactly 2.000 years old. The oldest child, whose birthday is September 2 of the previous year, is 2.997 years old. For statistical purposes, the mean and midrange of this theoretical group of 2-year-olds are both 2.5 years.

#### Properties and uses of the midrange

- The midrange is not commonly reported as a measure of central location.
- The midrange is more commonly used as an intermediate step in other calculations, or for plotting graphs of data collected in intervals.

#### EXAMPLES: Identifying the Midrange

**Example A:** Find the midrange of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.

**Step 1.** Identify the minimum and maximum values.

Minimum = 15, maximum = 31

**Step 2.** Add the minimum plus the maximum, then divide by two.

Midrange = (15 + 31) ⁄ 2 = 46 ⁄ 2 = 23 days

**Example B:** Find the midrange of the grouping 15–24 (e.g., number of alcoholic beverages consumed in one week).

**Step 1.** Identify the minimum and maximum values.

Minimum = 15, maximum = 24

**Step 2.** Add the minimum plus the maximum, then divide by two.

Midrange = (15 + 24) ⁄ 2 = 39 ⁄ 2 = 19.5

This calculation assumes that the grouping 15–24 really covers 14.50–24.49…. Since the midrange of 14.50–24.49… = 19.49…, the midrange can be reported as 19.5.

**Example C:** Find the midrange of the age group 15–24 years.

**Step 1.** Identify the minimum and maximum values.

Minimum = 15, maximum = 24

**Step 2.** Add the minimum plus the maximum plus 1, then divide by two.

Midrange = (15 + 24 + 1) ⁄ 2 = 40 ⁄ 2 = 20 years

Age differs from the majority of other variables because age does not follow the usual rules for rounding to the nearest integer. For most variables, 15.99 can be rounded to 16. However, an adolescent who is 15 years and 360 days old cannot claim to be 16 years old (and hence get his driver's license or learner's permit) for at least 5 more days. Thus, the interval of 15–24 years really spans 15.0–24.99… years. The midrange of 15.0 and 24.99… = 19.99… = 20.0 years.
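The two midrange formulas differ only in the extra 1 added for age in years. A minimal Python sketch follows; the function name `midrange` and the flag `is_age_in_years` are assumptions made for illustration.

```python
def midrange(minimum, maximum, is_age_in_years=False):
    """Midrange of an interval; age in years adds 1 before halving."""
    # Ages are truncated rather than rounded, so an age group such as
    # 15-24 years really spans 15.0 up to (but not including) 25.0.
    adjustment = 1 if is_age_in_years else 0
    return (minimum + maximum + adjustment) / 2


print(midrange(15, 31))                        # 23.0  (Example A)
print(midrange(15, 24))                        # 19.5  (Example B)
print(midrange(15, 24, is_age_in_years=True))  # 20.0  (Example C)
```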

### Geometric mean

#### Definition of geometric mean

The geometric mean is the mean or average of a set of data measured on a logarithmic scale. The geometric mean is used when the logarithms of the observations are distributed normally (symmetrically) rather than the observations themselves. The geometric mean is particularly useful in the laboratory for data from serial dilution assays (1/2, 1/4, 1/8, 1/16, etc.) and in environmental sampling data.

#### More About Logarithms

A logarithm is the power to which a base is raised.

To what power would you need to raise a base of 10 to get a value of 100?

Because 10 times 10 or 10^{2} equals 100, the log of 100 at base 10 equals 2. Similarly, the log of 16 at base 2 equals 4, because 2^{4} = 2 × 2 × 2 × 2 = 16.

2^{0} = 1 (anything raised to the 0 power is 1)

2^{1} = 2

2^{2} = 2 × 2 = 4

2^{3} = 2 × 2 × 2 = 8

2^{4} = 2 × 2 × 2 × 2 = 16

2^{5} = 2 × 2 × 2 × 2 × 2 = 32

2^{6} = 2 × 2 × 2 × 2 × 2 × 2 = 64

2^{7} = 2 × 2 × 2 × 2 × 2 × 2 × 2 = 128

and so on.

10^{0} = 1 (anything raised to the 0 power equals 1)

10^{1} = 10

10^{2} = 100

10^{3} = 1,000

10^{4} = 10,000

10^{5} = 100,000

10^{6} = 1,000,000

10^{7} = 10,000,000

and so on.

An antilog raises the base to the power (logarithm). For example, the antilog of 2 at base 10 is 10^{2}, or 100. The antilog of 4 at base 2 is 2^{4}, or 16. The majority of titers are reported as multiples of 2 (e.g., 2, 4, 8, etc.); therefore, base 2 is typically used when dealing with titers.
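The relationship between logs and antilogs can be confirmed with Python's standard `math` module. The helper names `log_base` and `antilog` below are assumptions introduced for this sketch; `math.log10`, `math.log2`, and `math.log` are the standard library functions.

```python
import math


def log_base(value, base):
    """The power to which `base` must be raised to obtain `value`."""
    if base == 10:
        return math.log10(value)
    if base == 2:
        return math.log2(value)
    return math.log(value, base)


def antilog(power, base):
    """Raise `base` to `power`, undoing the logarithm."""
    return base ** power


print(log_base(100, 10))   # 2.0
print(log_base(16, 2))     # 4.0
print(antilog(2, 10))      # 100
print(antilog(4, 2))       # 16

# Titers reported as 2, 4, 8, 16 are evenly spaced on a base-2 log scale.
print([log_base(t, 2) for t in [2, 4, 8, 16]])   # [1.0, 2.0, 3.0, 4.0]
```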

#### Method for calculating the geometric mean

There are two methods for calculating the geometric mean.

Method A

**Step 1.** Take the logarithm of each value.

**Step 2.** Calculate the mean of the log values by summing the log values, then dividing by the number of observations.

**Step 3.** Take the antilog of the mean of the log values to get the geometric mean.

Method B

**Step 1.** Calculate the product of the values by multiplying all of the values together.

**Step 2.** Take the nth root of the product (where n is the number of observations) to get the geometric mean.

#### EXAMPLES: Calculating the Geometric Mean

Example A: Using Method A

Calculate the geometric mean from the following set of data.

10, 10, 100, 100, 100, 100, 10,000, 100,000, 100,000, 1,000,000

Because these values are all multiples of 10, it makes sense to use logs of base 10.

**Step 1.** Take the log (in this case, to base 10) of each value.

log_{10}(x_{i}) = 1, 1, 2, 2, 2, 2, 4, 5, 5, 6

**Step 2.** Calculate the mean of the log values by summing and dividing by the number of observations (in this case, 10).

Mean of log_{10}(x_{i}) = (1+1+2+2+2+2+4+5+5+6) ⁄ 10 = 30 ⁄ 10 = 3

**Step 3.** Take the antilog of the mean of the log values to get the geometric mean.

Antilog_{10}(3) = 10^{3} = 1,000.

The geometric mean of the set of data is 1,000.

Example B: Using Method B

Calculate the geometric mean from the following 95% confidence limits of an odds ratio: 1.0, 9.0

**Step 1.** Calculate the product of the values by multiplying all values together.

1.0 × 9.0 = 9.0

**Step 2.** Take the square root of the product.

The geometric mean = square root of 9.0 = 3.0.
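Methods A and B can be compared side by side in Python. This is a sketch under stated assumptions: the function names are illustrative, Method A is written with base-10 logs (any base gives the same result as long as the antilog uses the same base), and `math.prod` requires Python 3.8 or later. Both methods assume all values are positive.

```python
import math


def geometric_mean_method_a(values):
    """Method A: antilog of the mean of the base-10 logs."""
    logs = [math.log10(v) for v in values]   # Step 1: log of each value
    mean_log = sum(logs) / len(logs)         # Step 2: mean of the log values
    return 10 ** mean_log                    # Step 3: antilog of that mean


def geometric_mean_method_b(values):
    """Method B: nth root of the product of the values."""
    product = math.prod(values)              # Step 1: multiply all values
    return product ** (1 / len(values))      # Step 2: take the nth root


example_a = [10, 10, 100, 100, 100, 100, 10_000, 100_000, 100_000, 1_000_000]
example_b = [1.0, 9.0]

print(round(geometric_mean_method_a(example_a), 6))   # 1000.0
print(round(geometric_mean_method_b(example_b), 6))   # 3.0

# The two methods agree up to floating-point rounding.
print(math.isclose(geometric_mean_method_a(example_a),
                   geometric_mean_method_b(example_a)))   # True
```

For long lists or very large values, Method A is usually the safer choice in floating-point arithmetic, because the product in Method B can grow beyond what a float can represent.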

#### Properties and uses of the geometric mean

**Scientific Calculator Tip**

On most scientific calculators, the sequence for calculating a geometric mean is:

- Enter a data point.
- Press either the <Log> or <Ln> function key.
- Record the result or store it in memory.
- Repeat for all values.
- Calculate the mean or average of these log values.
- Calculate the antilog value of this mean (<10^{x}> key if you used <Log> key, <e^{x}> key if you used <Ln> key).

Practice: Find the geometric mean of 10, 100 and 1000 using a scientific calculator.

Enter: | Calculator Displays: |
---|---|

10 | 10 |

LOG | 1 |

+ | 1 |

100 | 100 |

LOG | 2 |

+ | 3 |

1000 | 1000 |

LOG | 3 |

= | 6 |

÷ | 6 |

3 | 3 |

= | 2 |

10^{x} | 100 |

The geometric mean is the average of logarithmic values, converted back to the base. The geometric mean tends to dampen the effect of extreme values and, for positive data, is never larger than the corresponding arithmetic mean (the two are equal only when all of the values are identical). In that sense, the geometric mean is less sensitive than the arithmetic mean to one or a few extreme values.

- The geometric mean is the measure of choice for variables measured on an exponential or logarithmic scale, such as dilutional titers or assays.
- The geometric mean is often used for environmental samples, when levels can range over several orders of magnitude. For example, levels of coliforms in samples taken from a body of water can range from less than 100 to more than 100,000.
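To see how data spanning several orders of magnitude affect the two means, the sketch below compares them on the data set from Example A, using `fmean` and `geometric_mean` from Python's standard `statistics` module (Python 3.8 or later). The variable names are illustrative assumptions.

```python
from statistics import fmean, geometric_mean

# Example A data: values range over five orders of magnitude.
values = [10, 10, 100, 100, 100, 100, 10_000, 100_000, 100_000, 1_000_000]

arithmetic = fmean(values)
geometric = geometric_mean(values)

print(arithmetic)             # 121042.0
print(round(geometric, 6))    # 1000.0

# For positive data the geometric mean never exceeds the arithmetic mean.
print(geometric <= arithmetic)   # True
```

Here the single value of 1,000,000 pulls the arithmetic mean above eight of the ten observations, while the geometric mean stays near the middle of the data on the log scale.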

### Exercise 2.6

Using the dilution titers shown below, calculate the geometric mean titer of convalescent antibodies against tularemia among 10 residents of Martha's Vineyard. [Hint: Use only the second number in the ratio, i.e., for 1:640, use 640.]

ID # | Acute | Convalescent |
---|---|---|

1 | 1:16 | 1:512 |

2 | 1:16 | 1:512 |

3 | 1:32 | 1:128 |

4 | not done | 1:512 |

5 | 1:32 | 1:1024 |

6 | "negative" | 1:1024 |

7 | 1:256 | 1:2048 |

8 | 1:32 | 1:128 |

9 | "negative" | 1:4096 |

10 | 1:16 | 1:1024 |

### Selecting the appropriate measure

Measures of central location are single values that summarize the observed values of a distribution. The mode provides the most common value, the median provides the central value, the arithmetic mean provides the average value, the midrange provides the midpoint value, and the geometric mean provides the logarithmic average.

The mode and median are useful as descriptive measures. However, they are not often used for further statistical manipulations. In contrast, the mean is not only a good descriptive measure, but it also has good statistical properties. The mean is used most often in additional statistical manipulations.

While the arithmetic mean is the measure of choice when data are normally distributed, the median is the measure of choice for data that are not normally distributed. Because epidemiologic data tend not to be normally distributed (incubation periods, doses, ages of patients), the median is often preferred. The geometric mean is used most commonly with laboratory data, particularly dilution titers or assays and environmental sampling data.

The arithmetic mean uses all the data, which makes it sensitive to outliers. Although the geometric mean also uses all the data, it is not as sensitive to outliers as the arithmetic mean. The midrange, which is based on the minimum and maximum values, is more sensitive to outliers than any other measure. The mode and median tend not to be affected by outliers.
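The relative sensitivity of these measures to a single outlier can be illustrated with the hepatitis A incubation periods, replacing 31 with 131 as in the earlier discussion of the mean. This is a sketch only: `summarize` and `midrange` are hypothetical helper names, while `fmean`, `median`, and `geometric_mean` come from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import fmean, geometric_mean, median


def midrange(values):
    """Halfway point between the smallest and largest observations."""
    return (min(values) + max(values)) / 2


def summarize(values):
    """Collect several measures of central location for one data set."""
    return {
        "mean": fmean(values),
        "median": median(values),
        "midrange": midrange(values),
        "geometric mean": geometric_mean(values),
    }


before = summarize([15, 22, 27, 30, 31])
after = summarize([15, 22, 27, 30, 131])   # one extreme value

for name in before:
    shift = after[name] - before[name]
    print(f"{name}: {before[name]:.1f} -> {after[name]:.1f} (shift {shift:+.1f})")
```

In this run the median does not move, the geometric mean shifts least among the measures that use every value, the arithmetic mean shifts by 20 days, and the midrange shifts by 50 days.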

In summary, each measure of central location — mode, median, mean, midrange, and geometric mean — is a single value that is used to represent all of the observed values of a distribution. Each measure has its advantages and limitations. The selection of the most appropriate measure requires judgment based on the characteristics of the data (e.g., normally distributed or skewed, with or without outliers, arithmetic or log scale) and the reason for calculating the measure (e.g., for descriptive or analytic purposes).

### Exercise 2.7

For each of the variables listed below from the line listing in Table 2.9, identify which measure of central location is best for representing the data.

- Mode
- Median
- Mean
- Geometric mean
- No measure of central location is appropriate

- ____ Year of diagnosis
- ____ Age (years)
- ____ Sex
- ____ Highest IFA titer
- ____ Platelets × 10^{6}/L
- ____ White blood cell count × 10^{9}/L

Table 2.9 Line Listing for 12 Patients with Human Monocytotropic Ehrlichiosis — Missouri, 1998–1999

Patient ID | Year of Diagnosis | Age (years) | Sex | Highest IFA* Titer | Platelets × 10^{6}/L | White Blood Cell Count × 10^{9}/L |
---|---|---|---|---|---|---|

01 | 1999 | 44 | M | 1:1024 | 90 | 1.9 |

02 | 1999 | 42 | M | 1:512 | 114 | 3.5 |

03 | 1999 | 63 | M | 1:2048 | 83 | 6.4 |

04 | 1999 | 53 | F | 1:512 | 180 | 4.5 |

05 | 1999 | 77 | M | 1:1024 | 44 | 3.5 |

06 | 1999 | 43 | F | 1:512 | 89 | 1.9 |

10 | 1998 | 22 | F | 1:128 | 142 | 2.1 |

11 | 1998 | 59 | M | 1:256 | 229 | 8.8 |

12 | 1998 | 67 | M | 1:512 | 36 | 4.2 |

14 | 1998 | 49 | F | 1:4096 | 271 | 2.6 |

15 | 1998 | 65 | M | 1:1024 | 207 | 4.3 |

18 | 1998 | 27 | M | 1:64 | 246 | 8.5 |

Measure of Central Location | Year of Diagnosis | Age (years) | Sex | Highest IFA* Titer | Platelets × 10^{6}/L | White Blood Cell Count × 10^{9}/L |
---|---|---|---|---|---|---|

Mean: | 1998.5 | 50.92 | na | 1:976.00 | 144.25 | 4.35 |

Median: | 1998.5 | 51 | na | 1:512 | 128 | 3.85 |

Geometric Mean: | 1998.5 | 48.08 | na | 1:574.70 | 120.84 | 3.81 |

Mode: | none | none | M | 1:512 | none | 1.9, 3.5 |

* Immunofluorescence assay

Data Source: Olano JP, Masters E, Hogrefe W, Walker DH. Human monocytotropic ehrlichiosis, Missouri. Emerg Infect Dis 2003;9:1579-86.
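As an illustration of how the summary rows for the Highest IFA Titer column can be reproduced, the sketch below uses only the second number of each titer from Table 2.9. The variable names are assumptions, and base 2 is chosen for the logs because titers are reported as multiples of 2.

```python
import math
from statistics import fmean, median

# Second number of each "Highest IFA Titer" entry in Table 2.9, in row order.
titers = [1024, 512, 2048, 512, 1024, 512, 128, 256, 512, 4096, 1024, 64]

mean_titer = fmean(titers)
median_titer = median(titers)

# Geometric mean via base-2 logs: mean of the logs, then the antilog.
mean_log2 = sum(math.log2(t) for t in titers) / len(titers)
geometric_mean_titer = 2 ** mean_log2

print(mean_titer)                        # 976.0
print(median_titer)                      # 512.0
print(round(geometric_mean_titer, 2))    # 574.7
```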