# Lesson 2: Summarizing Data

## Exercise Answers

### Exercise 2.1

1. C
2. A
3. D
4. A
5. D

### Exercise 2.2

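| Previous Years | Frequency |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 4 |
| 3 | 3 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 0 |
| 8 | 1 |
| 9 | 0 |
| 10 | 0 |
| 11 | 0 |
| 12 | 1 |
| Total | 19 |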

### Exercise 2.3

1. Create frequency distribution (done in Exercise 2.2, above)
2. Identify the value that occurs most often.
Most common value is 1, so mode is 1 previous vaccination.
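
The tally and the mode can be checked with a short Python sketch. The variable names are illustrative, and the 19 observations are taken from the sum written out in Exercise 2.5.

```python
from collections import Counter

# The 19 observed counts of previous vaccinations (as listed in Exercise 2.5).
previous_vaccinations = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

# Tally how often each value occurs.
frequency = Counter(previous_vaccinations)

# Print every value from 0 to the maximum, including values never observed
# (Counter returns 0 for missing keys).
for value in range(max(previous_vaccinations) + 1):
    print(value, frequency[value])

# most_common(1) returns a list holding the (value, count) pair with the highest count.
mode_value, mode_count = frequency.most_common(1)[0]
print("mode:", mode_value, "observed", mode_count, "times")
```

This sketch reports a single mode only; a distribution with ties for the highest frequency would need all tied values listed.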

### Exercise 2.4

1. Arrange the observations in increasing order.
0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 12
2. Find the middle position of the distribution with 19 observations.
Middle position = (19 + 1) ⁄ 2 = 10
3. Identify the value at the middle position.
0, 0, 1, 1, 1, 1, 1, 2, 2, *2*, 2, 3, 3, 3, 4, 5, 6, 8, 12
Counting from the left or right to the 10th position, the value is 2. So the median = 2 previous vaccinations.
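
As a rough check of the three steps above, the following Python sketch sorts the data and reads off the middle position. It assumes an odd number of observations, as here; with an even count the two middle values would have to be averaged.

```python
import statistics

# The 19 observed counts of previous vaccinations (as listed in Exercise 2.5).
previous_vaccinations = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

# Step 1: arrange the observations in increasing order.
ordered = sorted(previous_vaccinations)

# Step 2: middle position, counted from 1; (n + 1) / 2 is a whole number only for odd n.
n = len(ordered)
middle_position = (n + 1) // 2

# Step 3: Python lists are indexed from 0, so subtract 1 from the position.
median = ordered[middle_position - 1]

# Cross-check against the standard library.
assert median == statistics.median(previous_vaccinations)
print("position:", middle_position, "median:", median)
```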

### Exercise 2.5

1. Add all of the observed values in the distribution.
2 + 0 + 3 + 1 + 0 + 1 + 2 + 2 + 4 + 8 + 1 + 3 + 3 + 12 + 1 + 6 + 2 + 5 + 1 = 57
2. Divide the sum by the number of observations.
57 ⁄ 19 = 3.0

So the mean is 3.0 previous vaccinations.
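
The same arithmetic as a minimal Python sketch (variable names are illustrative):

```python
# The 19 observed counts of previous vaccinations, in the order given above.
previous_vaccinations = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

# Step 1: add all of the observed values.
total = sum(previous_vaccinations)

# Step 2: divide the sum by the number of observations.
n = len(previous_vaccinations)
mean = total / n

print(f"{total} / {n} = {mean:.1f}")
```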

### Exercise 2.6

Using Method A:

1. Take the log (in this case, to base 2) of each value.

| ID # | Convalescent | Log base 2 |
|---|---|---|
| 1 | 1:512 | 9 |
| 2 | 1:512 | 9 |
| 3 | 1:128 | 7 |
| 4 | 1:512 | 9 |
| 5 | 1:1024 | 10 |
| 6 | 1:1024 | 10 |
| 7 | 1:2048 | 11 |
| 8 | 1:128 | 7 |
| 9 | 1:4096 | 12 |
| 10 | 1:1024 | 10 |

2. Calculate the mean of the log values by summing and dividing by the number of observations (10).
Mean of log2(xi) = (9 + 9 + 7 + 9 + 10 + 10 + 11 + 7 + 12 + 10) ⁄ 10 = 94 ⁄ 10 = 9.4
3. Take the antilog of the mean of the log values to get the geometric mean.
Antilog2(9.4) = 2^9.4 = 675.59. Therefore, the geometric mean dilution titer is 1:675.6.
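
Method A can be sketched in Python as follows. The list stores each titer as the reciprocal of its dilution (512 for 1:512), which is an assumption about representation made only for this illustration.

```python
import math

# Reciprocal convalescent titers for IDs 1 through 10 (512 stands for 1:512).
reciprocal_titers = [512, 512, 128, 512, 1024, 1024, 2048, 128, 4096, 1024]

# Step 1: take the log to base 2 of each value.
log2_values = [math.log2(t) for t in reciprocal_titers]

# Step 2: mean of the log values.
mean_log2 = sum(log2_values) / len(log2_values)

# Step 3: antilog (2 raised to the mean log) gives the geometric mean.
geometric_mean = 2 ** mean_log2

print(f"mean log2 = {mean_log2:.1f}, geometric mean titer = 1:{geometric_mean:.1f}")
```

Any log base gives the same geometric mean as long as the antilog uses the same base; base 2 is convenient here because every titer is a power of 2.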

### Exercise 2.7

1. E or A; equal number of patients in 1999 and 1998.
2. C or B; mean and median are very close, so either would be acceptable.
3. E or A; for a nominal variable, the most frequent category is the mode.
4. D
5. B; mean is skewed, so median is better choice.
6. B; mean is skewed, so median is better choice.

### Exercise 2.8

1. Arrange the observations in increasing order.
0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 12
2. Find the position of the 1st and 3rd quartiles. Note that the distribution has 19 observations.
Position of Q1 = (n + 1) ⁄ 4 = (19 + 1) ⁄ 4 = 5
Position of Q3 = 3(n + 1) ⁄ 4 = 3(19 + 1) ⁄ 4 = 15
3. Identify the value of the 1st and 3rd quartiles.
Value at Q1 (position 5) = 1
Value at Q3 (position 15) = 4
4. Calculate the interquartile range as Q3 minus Q1.
Interquartile range = 4 − 1 = 3
5. The median (at position 10) is 2. Note that the distance between Q1 and the median is 2 − 1 = 1. The distance between Q3 and the median is 4 − 2 = 2. This indicates that the vaccination data is skewed slightly to the right (tail points to greater number of previous vaccinations).
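
A short Python sketch of the position-based quartile steps above. It assumes that n + 1 is divisible by 4, as it is for n = 19; otherwise the positions fall between observations and some interpolation rule would be needed.

```python
# The 19 observed counts of previous vaccinations, arranged in increasing order.
ordered = sorted([2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1])
n = len(ordered)

# Positions are counted from 1; integer division is exact here because n + 1 = 20.
q1_position = (n + 1) // 4
median_position = (n + 1) // 2
q3_position = 3 * (n + 1) // 4

# Python lists are indexed from 0, so subtract 1 from each position.
q1 = ordered[q1_position - 1]
median = ordered[median_position - 1]
q3 = ordered[q3_position - 1]

interquartile_range = q3 - q1

# Unequal distances from the median to each quartile hint at skewness.
print("Q1:", q1, "median:", median, "Q3:", q3, "IQR:", interquartile_range)
print("median - Q1:", median - q1, "Q3 - median:", q3 - median)
```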

### Exercise 2.9

1. Calculate the arithmetic mean.
Mean = (2 + 0 + 3 + 1 + 0 + 1 + 2 + 2 + 4 + 8 + 1 + 3 + 3 + 12 + 1 + 6 + 2 + 5 + 1) ⁄ 19
= 57 ⁄ 19
= 3.0
2. Subtract the mean from each observation. Square the difference.
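3. Sum the squared differences.

| Value Minus Mean | Difference | Difference Squared |
|---|---|---|
| 2 − 3.0 | −1.0 | 1.0 |
| 0 − 3.0 | −3.0 | 9.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 0 − 3.0 | −3.0 | 9.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 4 − 3.0 | 1.0 | 1.0 |
| 8 − 3.0 | 5.0 | 25.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 12 − 3.0 | 9.0 | 81.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 6 − 3.0 | 3.0 | 9.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 5 − 3.0 | 2.0 | 4.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| Sum: 57 − 57.0 = 0 | 0.0 | 162.0 |

4. Divide the sum of the squared differences by n − 1.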
Variance = 162 ⁄ (19 − 1) = 162 ⁄ 18 = 9.0 previous vaccinations squared
5. Take the square root of the variance. This is the standard deviation.
Standard deviation = √9.0 = 3.0 previous vaccinations
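
The five steps can be reproduced with a brief Python sketch. It uses the n − 1 denominator from step 4, and the variable names are illustrative.

```python
import math

# The 19 observed counts of previous vaccinations, in the order given in step 1.
previous_vaccinations = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]
n = len(previous_vaccinations)

# Step 1: arithmetic mean.
mean = sum(previous_vaccinations) / n

# Step 2: subtract the mean from each observation and square the difference.
squared_differences = [(x - mean) ** 2 for x in previous_vaccinations]

# Step 3: sum the squared differences.
sum_of_squares = sum(squared_differences)

# Step 4: divide by n - 1 to get the variance.
variance = sum_of_squares / (n - 1)

# Step 5: the standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

print(sum_of_squares, variance, standard_deviation)
```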

### Exercise 2.10

Standard error of the mean = 42 ⁄ √4,462 = 0.629
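
A minimal Python check of this calculation, taking 42 as the standard deviation and 4,462 as the number of observations, as in the answer above:

```python
import math

standard_deviation = 42   # standard deviation used in the answer above
n = 4462                  # number of observations used in the answer above

# Standard error of the mean = standard deviation / square root of n.
standard_error = standard_deviation / math.sqrt(n)

print(f"{standard_error:.3f}")
```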

### Exercise 2.11

1. Summarize the blood lead level data with a frequency distribution.

Table 2.14 Frequency Distribution (1 mcg/dL Intervals) of Blood Lead Levels, Rural Village, 1996 (Intervals with No Observations Not Shown)

| Blood Lead Level (mcg/dL) | Frequency |
|---|---|
| 17 | 1 |
| 26 | 2 |
| 35 | 1 |
| 38 | 1 |
| 39 | 1 |
| 44 | 1 |
| 45 | 1 |
| 46 | 1 |
| 49 | 1 |
| 50 | 1 |
| 54 | 1 |
| 56 | 1 |
| 57 | 2 |
| 58 | 3 |
| 61 | 1 |
| 63 | 1 |
| 64 | 1 |
| 67 | 1 |
| 68 | 1 |
| 69 | 1 |
| 72 | 1 |
| 73 | 1 |
| 74 | 1 |
| 76 | 2 |
| 78 | 3 |
| 79 | 1 |
| 84 | 1 |
| 86 | 1 |
| 103 | 1 |
| 104 | 1 |
| Unknown | 48 |

To summarize the data further, you could use intervals of 5, 10, or perhaps even 20 mcg/dL. Table 2.15 below uses 10 mcg/dL intervals.

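Table 2.15 Frequency Distribution (10 mcg/dL Intervals) of Blood Lead Levels, Rural Village, 1996

| Blood Lead Level (mcg/dL) | Frequency |
|---|---|
| 0–9 | 0 |
| 10–19 | 1 |
| 20–29 | 2 |
| 30–39 | 3 |
| 40–49 | 6 |
| 50–59 | 8 |
| 60–69 | 6 |
| 70–79 | 9 |
| 80–89 | 2 |
| 90–99 | 0 |
| 100–110 | 2 |
| Total | 39 |

2. Calculate the arithmetic mean.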
Arithmetic mean = sum ⁄ n = 2,363 ⁄ 39 = 60.6 mcg/dL
3. Identify the median and interquartile range.
Median at (39 + 1) ⁄ 2 = 20th position. Median = value at 20th position = 58
Q1 at (39 + 1) ⁄ 4 = 10th position. Q1 = value at 10th position = 48
Q3 at 3 × Q1 position = 30th position. Q3 = value at 30th position = 76
4. Subtract the arithmetic mean (question 2) from each of the 39 observed blood lead levels.
Square each of these differences (“deviations”).
Sum the squared deviations: 14,577.59
Divide the sum of the squared deviations by n − 1 to find the variance.
14,577.59 ⁄ (39 − 1) = 14,577.59 ⁄ 38 = 383.62
Take the square root of the variance to find the standard deviation.
√383.62 = 19.6 mcg/dL
5. Calculate the geometric mean using the log lead levels provided.
Geometric mean = 10^(68.45 ⁄ 39) = 10^1.7551 = 56.9 mcg/dL
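
The summary figures in questions 2, 4, and 5 can be rechecked from the totals quoted in this answer. The sketch below does not re-enter the 39 individual blood lead levels; it relies only on the quoted sums, and the variable names are illustrative.

```python
import math

n = 39                                  # number of observed blood lead levels
sum_of_values = 2363                    # sum of the observations (question 2)
sum_of_squared_deviations = 14577.59    # sum of squared deviations (question 4)
sum_of_log10_values = 68.45             # sum of the base-10 logs (question 5)

# Question 2: arithmetic mean.
mean = sum_of_values / n

# Question 4: variance uses n - 1 in the denominator; the standard deviation is its square root.
variance = sum_of_squared_deviations / (n - 1)
standard_deviation = math.sqrt(variance)

# Question 5: geometric mean is 10 raised to the mean of the log10 values.
geometric_mean = 10 ** (sum_of_log10_values / n)

print(f"mean = {mean:.1f} mcg/dL")
print(f"variance = {variance:.2f}, standard deviation = {standard_deviation:.1f} mcg/dL")
print(f"geometric mean = {geometric_mean:.1f} mcg/dL")
```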

Page last reviewed: May 18, 2012