# Lesson 2: Summarizing Data


### Exercise 2.1

1. C
2. A
3. D
4. A
5. D

### Exercise 2.2

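| Previous Years | Frequency |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 4 |
| 3 | 3 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 0 |
| 8 | 1 |
| 9 | 0 |
| 10 | 0 |
| 11 | 0 |
| 12 | 1 |
| Total | 19 |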

### Exercise 2.3

1. Create frequency distribution (done in Exercise 2.2, above)
2. Identify the value that occurs most often.
Most common value is 1, so mode is 1 previous vaccination.
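
The tally and the mode can be checked with a short Python sketch. The variable names are illustrative, and the observations are the 19 values listed in Exercises 2.4 and 2.5.

```python
from collections import Counter

# The 19 observed numbers of previous vaccinations (from the exercise data).
previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

# Frequency distribution: how many times each value occurs.
frequency = Counter(previous_years)
for value in range(min(previous_years), max(previous_years) + 1):
    # Counter returns 0 for values that never occur (7, 9, 10, 11).
    print(value, frequency[value])

# The mode is the value with the highest frequency.
mode_value, mode_count = frequency.most_common(1)[0]
print("mode:", mode_value, "with frequency", mode_count)
```

If two values tied for the highest frequency, `most_common(1)` would report only one of them, so a tie would need to be checked separately.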

### Exercise 2.4

1. Arrange the observations in increasing order.
0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 12
2. Find the middle position of the distribution with 19 observations.
Middle position = (19 + 1) ⁄ 2 = 10
3. Identify the value at the middle position.
0, 0, 1, 1, 1, 1, 1, 2, 2, *2*, 2, 3, 3, 3, 4, 5, 6, 8, 12
Counting from the left or right to the 10th position, the value is 2. So the median = 2 previous vaccinations.
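
The same three steps can be sketched in Python. The function name is hypothetical, and the even-count branch (averaging the two middle values) is an assumption added for completeness, since this exercise has an odd number of observations.

```python
def median_by_position(values):
    """Median found by sorting and locating the middle position."""
    ordered = sorted(values)              # step 1: increasing order
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2           # step 2: 1-based middle position
        return ordered[position - 1]      # step 3: value at that position
    # Assumed handling for an even count: average the two middle values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]
median = median_by_position(previous_years)
print("median:", median)
```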

### Exercise 2.5

1. Add all of the observed values in the distribution.
2 + 0 + 3 + 1 + 0 + 1 + 2 + 2 + 4 + 8 + 1 + 3 + 3 + 12 + 1 + 6 + 2 + 5 + 1 = 57
2. Divide the sum by the number of observations.
57 ⁄ 19 = 3.0

So the mean is 3.0 previous vaccinations.
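
As a quick check of the arithmetic, a minimal Python sketch (variable names are illustrative):

```python
previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

total = sum(previous_years)   # step 1: add all observed values
n = len(previous_years)       # number of observations
mean = total / n              # step 2: divide the sum by n

print(total, n, mean)
```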

### Exercise 2.6

Using Method A:

1. Take the log (in this case, to base 2) of each value.
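
| ID # | Convalescent | Log base 2 |
|---|---|---|
| 1 | 1:512 | 9 |
| 2 | 1:512 | 9 |
| 3 | 1:128 | 7 |
| 4 | 1:512 | 9 |
| 5 | 1:1024 | 10 |
| 6 | 1:1024 | 10 |
| 7 | 1:2048 | 11 |
| 8 | 1:128 | 7 |
| 9 | 1:4096 | 12 |
| 10 | 1:1024 | 10 |
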
2. Calculate the mean of the log values by summing and dividing by the number of observations (10).
Mean of log₂(xᵢ) = (9 + 9 + 7 + 9 + 10 + 10 + 11 + 7 + 12 + 10) ⁄ 10 = 94 ⁄ 10 = 9.4
3. Take the antilog of the mean of the log values to get the geometric mean.
Antilog₂(9.4) = 2^9.4 = 675.59. Therefore, the geometric mean dilution titer is 1:675.6.
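
Method A can be reproduced with a few lines of Python. This is a sketch, and it assumes each titer is represented by its dilution denominator (512 for 1:512, and so on).

```python
import math

# Denominators of the convalescent dilution titers for IDs 1-10.
titers = [512, 512, 128, 512, 1024, 1024, 2048, 128, 4096, 1024]

# Step 1: log base 2 of each value.
log2_values = [math.log2(t) for t in titers]

# Step 2: mean of the log values.
mean_log2 = sum(log2_values) / len(log2_values)

# Step 3: antilog (2 raised to the mean) gives the geometric mean.
geometric_mean = 2 ** mean_log2

print(mean_log2, round(geometric_mean, 1))
```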

### Exercise 2.7

1. E or A; equal number of patients in 1999 and 1998.
2. C or B; mean and median are very close, so either would be acceptable.
3. E or A; for a nominal variable, the most frequent category is the mode.
4. D
5. B; mean is skewed, so median is better choice.
6. B; mean is skewed, so median is better choice.

### Exercise 2.8

1. Arrange the observations in increasing order.
0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 12
2. Find the position of the 1st and 3rd quartiles. Note that the distribution has 19 observations.
Position of Q1 = (n + 1) ⁄ 4 = (19 + 1) ⁄ 4 = 5
Position of Q3 = 3(n + 1) ⁄ 4 = 3(19 + 1) ⁄ 4 = 15
3. Identify the value of the 1st and 3rd quartiles.
Value at Q1 (position 5) = 1
Value at Q3 (position 15) = 4
4. Calculate the interquartile range as Q3 minus Q1.
Interquartile range = 4 − 1 = 3
5. The median (at position 10) is 2. Note that the distance between Q1 and the median is 2 − 1 = 1. The distance between Q3 and the median is 4 − 2 = 2. This indicates that the vaccination data is skewed slightly to the right (tail points to greater number of previous vaccinations).
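
The position-based quartile method can be sketched in Python as follows. The helper name is hypothetical. The linear interpolation used when a position is not a whole number is an assumption, because both positions in this exercise are whole numbers and no interpolation is needed.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data (assumes at least 3 values).

    Fractional positions are handled by linear interpolation between
    the two neighboring observations (an assumed convention).
    """
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    return lower + fraction * (upper - lower)


previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]
ordered = sorted(previous_years)
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)       # position 5
q3 = value_at_position(ordered, 3 * (n + 1) / 4)   # position 15
iqr = q3 - q1

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Library quartile functions may use other position conventions and can return slightly different values for the same data.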

### Exercise 2.9

1. Calculate the arithmetic mean.
Mean = (2 + 0 + 3 + 1 + 0 + 1 + 2 + 2 + 4 + 8 + 1 + 3 + 3 + 12 + 1 + 6 + 2 + 5 + 1) ⁄ 19
= 57 ⁄ 19
= 3.0
2. Subtract the mean from each observation. Square the difference.
3. Sum the squared differences.
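
| Value Minus Mean | Difference | Difference Squared |
|---|---|---|
| 2 − 3.0 | −1.0 | 1.0 |
| 0 − 3.0 | −3.0 | 9.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 0 − 3.0 | −3.0 | 9.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 4 − 3.0 | 1.0 | 1.0 |
| 8 − 3.0 | 5.0 | 25.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 12 − 3.0 | 9.0 | 81.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 6 − 3.0 | 3.0 | 9.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 5 − 3.0 | 2.0 | 4.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 57 − 57.0 = 0 | 0.0 | 162.0 |
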
4. Divide the sum of the squared differences by n − 1.
Variance = 162 ⁄ (19 − 1) = 162 ⁄ 18 = 9.0 previous vaccinations squared
5. Take the square root of the variance. This is the standard deviation.
Standard deviation = √9.0 = 3.0 previous vaccinations
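
The five steps map directly onto a short Python sketch (variable names are illustrative):

```python
import math

previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]
n = len(previous_years)

# Step 1: arithmetic mean.
mean = sum(previous_years) / n

# Step 2: subtract the mean from each observation and square the difference.
squared_differences = [(x - mean) ** 2 for x in previous_years]

# Step 3: sum the squared differences.
sum_of_squares = sum(squared_differences)

# Step 4: divide by n - 1 to get the sample variance.
variance = sum_of_squares / (n - 1)

# Step 5: the standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

print(sum_of_squares, variance, standard_deviation)
```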

### Exercise 2.10

Standard error of the mean = 42 divided by the square root of 4,462 = 0.629
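
The calculation divides the standard deviation by the square root of the sample size. A minimal Python check, using the two figures from the answer above (variable names are illustrative):

```python
import math

standard_deviation = 42   # standard deviation used in the answer
n = 4462                  # number of observations

# Standard error of the mean = standard deviation / sqrt(n)
standard_error = standard_deviation / math.sqrt(n)

print(round(standard_error, 3))
```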

### Exercise 2.11

1. Summarize the blood level data with a frequency distribution.
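Table 2.14 Frequency Distribution (1 mcg/dL Intervals) of Blood Lead Levels — Rural Village, 1996 (Intervals with No Observations Not Shown)

| Blood Lead Level (mcg/dL) | Frequency |
|---|---|
| 17 | 1 |
| 26 | 2 |
| 35 | 1 |
| 38 | 1 |
| 39 | 1 |
| 44 | 1 |
| 45 | 1 |
| 46 | 1 |
| 49 | 1 |
| 50 | 1 |
| 54 | 1 |
| 56 | 1 |
| 57 | 2 |
| 58 | 3 |
| 61 | 1 |
| 63 | 1 |
| 64 | 1 |
| 67 | 1 |
| 68 | 1 |
| 69 | 1 |
| 72 | 1 |
| 73 | 1 |
| 74 | 1 |
| 76 | 2 |
| 78 | 3 |
| 79 | 1 |
| 84 | 1 |
| 86 | 1 |
| 103 | 1 |
| 104 | 1 |
| Unknown | 48 |
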
To summarize the data further you could use intervals of 5, 10, or perhaps even 20 mcg/dL. Table 2.15 below uses 10 mcg/dL intervals.
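Table 2.15 Frequency Distribution (10 mcg/dL Intervals) of Blood Lead Levels — Rural Village, 1996

| Blood Lead Level (mcg/dL) | Frequency |
|---|---|
| 0–9 | 0 |
| 10–19 | 1 |
| 20–29 | 2 |
| 30–39 | 3 |
| 40–49 | 6 |
| 50–59 | 8 |
| 60–69 | 6 |
| 70–79 | 9 |
| 80–89 | 2 |
| 90–99 | 0 |
| 100–110 | 2 |
| Total | 39 |
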
2. Calculate the arithmetic mean.
Arithmetic mean = sum ⁄ n = 2,363 ⁄ 39 = 60.6 mcg/dL
3. Identify the median and interquartile range.
Median at (39 + 1) ⁄ 2 = 20th position. Median = value at 20th position = 58
Q1 at (39 + 1) ⁄ 4 = 10th position. Q1 = value at 10th position = 48
Q3 at 3 × Q1 position = 30th position. Q3 = value at 30th position = 76
4. Calculate the standard deviation.
Square of sum = 2,363² = 5,583,769
Sum of squares × n = 157,743 × 39 = 6,151,977
Difference = 6,151,977 − 5,583,769 = 568,208
Variance = 568,208 ⁄ (39 × 38) = 383.4062
Standard deviation = square root (383.4062) = 19.58 mcg/dL
5. Calculate the geometric mean using the log lead levels provided.
Geometric mean = 10^(68.45 ⁄ 39) = 10^(1.7551) = 56.9 mcg/dL
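
Steps 2, 4, and 5 work from summary totals rather than from the individual values, so they can be checked with a short Python sketch. The inputs below are the totals quoted in the steps (n, the sum, the sum of squares, and the sum of the base-10 logs). The variable names are illustrative.

```python
import math

n = 39                    # number of observations
sum_x = 2363              # sum of the blood lead levels
sum_x_squared = 157743    # sum of the squared blood lead levels
sum_log10 = 68.45         # sum of the base-10 logs of the lead levels

# Step 2: arithmetic mean.
mean = sum_x / n

# Step 4: variance = (n * sum of squares - square of sum) / (n * (n - 1)).
square_of_sum = sum_x ** 2
difference = n * sum_x_squared - square_of_sum
variance = difference / (n * (n - 1))
standard_deviation = math.sqrt(variance)

# Step 5: geometric mean = 10 raised to the mean of the log values.
geometric_mean = 10 ** (sum_log10 / n)

print(round(mean, 1), round(variance, 4),
      round(standard_deviation, 2), round(geometric_mean, 1))
```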
