Lesson 2: Summarizing Data

Section 1: Organizing Data

Whether you are conducting routine surveillance, investigating an outbreak, or conducting a study, you must first compile information in an organized manner. One common method is to create a line list or line listing. Table 2.1 is a typical line listing from an epidemiologic investigation of an apparent cluster of hepatitis A.

A variable can be any characteristic that differs from person to person, such as height, sex, smallpox vaccination status, or physical activity pattern. The value of a variable is the number or descriptor that applies to a particular person, such as 5'6" (168 cm), female, and never vaccinated.

The line listing is one type of epidemiologic database, and is organized like a spreadsheet with rows and columns. Typically, each row is called a record or observation and represents one person or case of disease. Each column is called a variable and contains information about one characteristic of the individual, such as race or date of birth. The first column or variable of an epidemiologic database usually contains the person's name, initials, or identification number. Other columns might contain demographic information, clinical details, and exposures possibly related to illness.

Table 2.1 Line Listing of Hepatitis A Cases, County Health Department, January–February 2004

ID | Date of | Town | Age (Years) | Sex | Hosp | Jaundice | Outbreak | IV Drugs | IgM Pos | Highest ALT*

* ALT = Alanine aminotransferase
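
The row-and-column structure described above can be sketched in code. The following is a minimal Python illustration, not part of the original lesson: the class name `CaseRecord`, the field names, and every value are hypothetical and are not the data from Table 2.1. Each object is one record (row), and each field is one variable (column).

```python
from dataclasses import dataclass, fields


@dataclass
class CaseRecord:
    """One row (record) of a line listing; each field is one variable."""
    case_id: int      # identification number, conventionally the first column
    town: str
    age_years: int
    sex: str          # "M" or "F"
    hosp: bool        # hospitalized?
    jaundice: bool


# Hypothetical values, invented for illustration only.
line_listing = [
    CaseRecord(1, "A", 34, "F", False, True),
    CaseRecord(2, "B", 52, "M", True, True),
    CaseRecord(3, "A", 19, "M", False, False),
]

# The variables are the columns; the records are the rows.
variables = [f.name for f in fields(CaseRecord)]
print(variables)
print(len(line_listing), "records")

# The value of the variable age_years for the second record.
print(line_listing[1].age_years)
```

Here `line_listing[1].age_years` is the value of one variable for one person, which matches the distinction between a variable and its value drawn earlier in this section.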

Some epidemiologic databases, such as line listings for a small cluster of disease, may have only a few rows (records) and a limited number of columns (variables). Such small line listings are sometimes maintained by hand on a single sheet of paper. Other databases, such as birth or death records for the entire country, might have thousands of records and hundreds of variables and are best handled with a computer. However, even when records are computerized, a line listing with key variables is often printed to facilitate review of the data.
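
Printing a line listing of key variables from a larger computerized database amounts to selecting a few columns and laying them out in aligned rows. The sketch below shows one way this might be done in Python; the function name `format_line_listing`, the variable names, and the records are all assumptions made for illustration, not part of any particular software package.

```python
def format_line_listing(records, key_variables):
    """Return a plain-text table that shows only the chosen variables."""
    # Each column is as wide as its header or its longest value.
    widths = {
        var: max([len(var)] + [len(str(rec[var])) for rec in records])
        for var in key_variables
    }
    lines = ["  ".join(var.ljust(widths[var]) for var in key_variables)]
    for rec in records:
        lines.append(
            "  ".join(str(rec[var]).ljust(widths[var]) for var in key_variables)
        )
    # Drop the padding after the last column on each line.
    return "\n".join(line.rstrip() for line in lines)


# Hypothetical records, invented for illustration only.
records = [
    {"ID": 1, "Town": "A", "Age": 34, "Sex": "F", "Hosp": "N", "Jaundice": "Y"},
    {"ID": 2, "Town": "B", "Age": 52, "Sex": "M", "Hosp": "Y", "Jaundice": "Y"},
    {"ID": 3, "Town": "A", "Age": 19, "Sex": "M", "Hosp": "N", "Jaundice": "N"},
]

# Keep every variable in the database, but print only the key ones for review.
printout = format_line_listing(records, ["ID", "Age", "Sex", "Jaundice"])
print(printout)
```

The full records stay intact; only the printed view is narrowed, so the reviewer sees the key variables without losing the others.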

Epi Info

One computer software package that is widely used by epidemiologists to manage data is Epi Info, a free package developed at CDC. Epi Info allows the user to design a questionnaire, enter data directly into the questionnaire, edit the data, and analyze the data. Two versions are available:

Epi Info 3 (formerly Epi Info 2000 or Epi Info 2002) is Windows-based, and continues to be supported and upgraded. It is the recommended version and can be downloaded from the CDC website.

Epi Info 6 is DOS-based and widely used, but it is being phased out.

This lesson includes Epi Info commands for creating frequency distributions and calculating some of the measures of central location and spread described in the lesson. Since Epi Info 3 is the recommended version, only commands for this version are provided in the text; corresponding commands for Epi Info 6 are offered at the end of the lesson.
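
For readers without Epi Info at hand, the same kinds of summaries can be illustrated in a general-purpose language. The following is a minimal Python sketch using only the standard library; it is not Epi Info syntax, and the ages and sexes are invented for illustration only.

```python
from collections import Counter
import statistics

# Hypothetical values for two variables from a small line listing.
ages = [34, 52, 19, 34, 41]
sexes = ["F", "M", "M", "F", "F"]

# Frequency distribution: how many records take each value of a variable.
sex_freq = Counter(sexes)
for value, count in sorted(sex_freq.items()):
    print(value, count)

# Two measures of central location.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

# One simple measure of spread: the range (maximum minus minimum).
age_range = max(ages) - min(ages)

print(mean_age, median_age, age_range)
```

A frequency distribution suits a categorical variable such as sex, while the mean, median, and range apply to a numeric variable such as age.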