Step 4: Gather Credible Evidence

Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide

Now that you have developed a logic model, chosen an evaluation focus, and selected your evaluation questions, your next task is to gather the evidence. Gathering evidence for an evaluation resembles gathering evidence for any research or data-oriented project, with a few exceptions noted below.

What’s Involved in Gathering Evidence?

Evidence gathering must include consideration of each of the following:

  • Indicators
  • Sources of evidence/methods of data collection
  • Quality
  • Quantity
  • Logistics

Developing Indicators

Because the components of our programs are often expressed in global or abstract terms, indicators (specific, observable, and measurable statements) help define exactly what we mean or are looking for. Outcome indicators provide clearer definitions of global outcome statements such as “Children receive medical treatment” or “Families adopt in-home techniques.” The medical treatment indicator might specify the type of medical treatment, the duration, or the adherence to the regimen. Likewise, the family indicator might specify the in-home techniques and/or the intensity or duration of their adoption, for example, “Families with EBLL children clean all window sills and floors with the designated cleaning solution each week” or “Families serve leafy green vegetables at three or more meals per week.” Outcome indicators such as these help guide the selection of data collection methods and the content of data collection instruments.

Process indicators help define global activity statements such as “good coalition,” “culturally competent training,” and “appropriate quality patient care.” What does “good” mean? What do “quality” and “appropriate” mean?

Keep the following tips in mind when selecting your indicators:

  • Indicators can be developed for activities (process indicators) and/or for outcomes (outcome indicators). [19]
  • There can be more than one indicator for each activity or outcome.
  • The indicator must be focused and must measure an important dimension of the activity or outcome.
  • The indicator must be clear and specific in terms of what it will measure.
  • The change measured by the indicator should represent progress toward implementing the activity or achieving the outcome.

Consider CDC’s immunization program, for example. The table below lists the components of the logic model included in our focus in Step 3. Each of these components has been defined in one or more indicators.

Table 4.1 – Provider Immunization Education Program: Indicators for Program Component in Our Evaluation Focus

Program Component | Indicator(s)
Provider training | A series of 3 trainings will be conducted in all 4 regions of the state
Nurse educator LHD presentations | Nurse educators will make presentations to the 10 largest local health departments (LHDs)
Physicians peer rounds | Physicians will host peer education rounds at the 10 largest hospitals
Providers attend trainings and rounds | Trainings will be well attended and reflect a good mix of specialties and geographic representation
Providers receive and use tool kits | 50%+ of providers who receive tool kit will report use of it (or “call to action” cards will be received from 25% of all providers receiving tool kit)
LHD nurses conduct private provider consults | Trained nurses in LHDs will conduct provider consults with largest provider practices in county
Provider KAB increases | Providers show increases in knowledge, attitudes, and beliefs (KAB) on selected key immunization items
Provider motivation increases | Provider intent to immunize increases
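
Indicators phrased as numeric targets can be checked directly once the data are in. The following is a minimal sketch, in Python, of how the tool kit indicator in Table 4.1 might be scored. The function name, argument names, and all counts are hypothetical and used only for illustration; only the 50% and 25% targets come from the table. The sketch assumes the use rate is computed among surveyed providers who received the tool kit, and the card rate among all tool kits distributed.

```python
def toolkit_indicator(surveyed_recipients, reported_use,
                      kits_distributed, cards_returned,
                      use_target=0.50, card_target=0.25):
    """Score the tool kit indicator from Table 4.1 (hypothetical helper).

    The indicator is treated as met if either target is reached:
    the share of surveyed recipients who report using the kit, or the
    share of distributed kits whose "call to action" card was returned.
    """
    if surveyed_recipients <= 0 or kits_distributed <= 0:
        raise ValueError("denominators must be positive")
    use_rate = reported_use / surveyed_recipients
    card_rate = cards_returned / kits_distributed
    return {
        "use_rate": use_rate,
        "card_rate": card_rate,
        "met": use_rate >= use_target or card_rate >= card_target,
    }


# Hypothetical counts, for illustration only.
result = toolkit_indicator(surveyed_recipients=200, reported_use=90,
                           kits_distributed=400, cards_returned=110)
print(result)
```

In this made-up case the survey-based rate falls short of its target while the card-based rate reaches its target, so the indicator counts as met through the alternative measure.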

You may need to develop your own indicators, or you may be able to draw on existing indicators developed by others. Some large CDC programs have developed indicator inventories tied to major activities and outcomes for the program. These indicator inventories offer several advantages:

  • They may have been pre-tested for relevance and accuracy.
  • They define the best data sources for collecting the indicator.
  • There are often many potential indicators for each activity or outcome, ensuring that at least one will be appropriate for your program.
  • Because many programs are using the same indicator(s), you can compare performance across programs or even construct a national performance summary.

Selecting Data Collection Methods and Sources

Now that you have determined the activities and outcomes you want to measure and the indicators you will use to measure progress on them, you need to select data collection methods and sources.

A key decision is whether there are existing data sources—secondary data collection—to measure your indicators or whether you need to collect new data—primary data collection.

Depending on your evaluation questions and indicators, some secondary data sources may be appropriate. Some existing data sources that often come into play in measuring outcomes of public health programs are:

  • Current Population Survey and other U.S. Census files
  • Behavioral Risk Factor Surveillance System (BRFSS)
  • Youth Risk Behavior Survey (YRBS)
  • Pregnancy Risk Assessment Monitoring System (PRAMS)
  • Cancer registries
  • State vital statistics
  • Various surveillance databases
  • National Health Interview Survey (NHIS)

Before using secondary data sources, ensure that they meet your needs. Although large ongoing surveillance systems have the advantages of collecting data routinely and having existing resources and infrastructure, some of them (e.g., Current Population Survey [CPS]) have little flexibility with regard to the questions asked in the survey, making it nearly impossible to use these systems to collect special data for your evaluation. By contrast, other surveys such as BRFSS or PRAMS are more flexible. You might be able to add program‑specific questions, or you might expand the sample size for certain geographic areas or target populations, allowing for more accurate estimates in smaller populations.

Primary data collection methods also fall into several broad categories. Among the most common are:

  • Surveys, including personal interviews, telephone interviews, and instruments completed by respondent, received through the mail or e-mail
  • Group discussions/focus groups
  • Observation
  • Document review, such as medical records, but also diaries, logs, minutes of meetings, etc.

Choosing the right method from the many secondary and primary data collection choices requires considering both the context (How much money can be devoted to collection and measurement? How soon are results needed? Are there ethical considerations?) and the content of the question (Is it a sensitive issue? Is it about a behavior that is observable? Is it something the respondent is likely to know?).

Some methods yield qualitative data and some yield quantitative data. If the question involves an abstract concept or one where measurement is poor, using multiple methods is often helpful. Insights from stakeholder discussions in Step 1 and the clarity on purpose/user/use obtained in Step 3 will help direct the choice of sources and methods. For example, stakeholders may know which methods will work best with some intended respondents and/or have a strong bias toward quantitative or qualitative data collection that must be honored if the results are to be credible. More importantly, the purpose and use/user may dictate the need for valid, reliable data that will withstand close scrutiny or may allow for less rigorous data collection that can direct managers.

Each method comes with advantages and disadvantages depending on the context and content of the data collection (see Table 4.2).

Table 4.2 – Advantages and Disadvantages of Various Survey Methods

Method: Personal interviews
Advantages:
  • Least selection bias: can interview people without telephones, even homeless people.
  • Greatest response rate: people are most likely to agree to be surveyed when asked face to face.
  • Visual materials may be used.
Disadvantages:
  • Most costly: requires trained interviewers and travel time and costs.
  • Least anonymity: therefore, most likely that respondents will shade their responses toward what they believe is socially acceptable.

Method: Telephone interviews
Advantages:
  • Most rapid method.
  • Most potential to control the quality of the interview: interviewers remain in one place, so supervisors can oversee their work.
  • Easy to select telephone numbers at random.
  • Less expensive than personal interviews.
  • Better response rate than for mailed surveys.
Disadvantages:
  • Most selection bias: omits homeless people and people without telephones.
  • Less anonymity for respondents than for those completing instruments in private.
  • As with personal interviews, requires a trained interviewer.

Method: Instruments to be completed by respondent
Advantages:
  • Most anonymity: therefore, least bias toward socially acceptable responses.
  • Cost per respondent varies with response rate: the higher the response rate, the lower the cost per respondent.
  • Less selection bias than with telephone interviews.
Disadvantages:
  • Least control over quality of data.
  • Dependent on respondent’s reading level.
  • Mailed instruments have lowest response rate.
  • Surveys using mailed instruments take the most time to complete because such instruments require time in the mail and time for respondent to complete.


Some Sources of Data

Who might you survey or interview?

  • Clients, program participants, nonparticipants
  • Staff, program managers, administrators
  • Partner agency staff
  • General public
  • Community leaders or key members of a community
  • Funders
  • Representatives of advocacy groups
  • Elected officials, legislators, policymakers
  • Local and state health officials

What might you observe?

  • Meetings
  • Special events or activities
  • On the job performance
  • Service encounters

Which documents might you analyze?

  • Meeting minutes, administrative records
  • Client medical records or other files
  • Newsletters, press releases
  • Strategic plans or work plans
  • Registration, enrollment, or intake forms
  • Previous evaluation reports
  • Records held by funders or collaborators
  • Web pages
  • Graphs, maps, charts, photographs, videotapes

The lists above give possible sources of information for evaluations, clustered in three broad categories: people, observations, and documents.

When choosing data collection methods and sources, select those that meet your project’s needs. Avoid choosing a data method/source that may be familiar or popular but does not necessarily answer your questions. Keep in mind that budget issues alone should not drive your evaluation planning efforts.

The four evaluation standards can help you reduce the enormous number of data collection options to a manageable number that best meet your data collection situation. Here is a checklist of issues, based on the evaluation standards, that will help you choose appropriately:

Utility

  • Purpose and use of data collection: Do you seek a point-in-time determination of a behavior, or to examine the range and variety of experiences, or to tell an in-depth story?
  • Users of data collection: Will some methods make the data more credible with skeptics or key users?

Feasibility

  • Resources available: Which methods can you afford?
  • Time: How long until the results are needed?
  • Frequency: How often do you need the data?
  • Your background: Are you trained in the method, or will you need help from an outside consultant?

Propriety

  • Characteristics of the respondents: Will issues such as literacy or language make some methods preferable to others?
  • Degree of intrusion to program/participants: Will the data collection method disrupt the program or be seen as intrusive by participants?
  • Other ethical issues: Are there issues of confidentiality or respondents’ safety in seeking answers to questions on this issue?

Accuracy

  • Nature of the issue: Is it about an observable behavior?
  • Sensitivity of the issue: How open and honest will respondents be answering questions on this issue?
  • Respondent knowledge: Is it something the respondent is likely to know?

Using Multiple Methods and Mixed Methods

Sometimes a single method is not sufficient to accurately measure an activity or outcome because the thing being measured is complex and/or the data method/source does not yield data that are reliable or accurate enough. Employing multiple methods (sometimes called “triangulation”) helps increase the accuracy of the measurement and the certainty of your conclusions when the various methods yield similar results. Mixed data collection refers to gathering both quantitative and qualitative data. Mixed methods can be used sequentially, when one method is used to prepare for the use of another, or concurrently. An example of sequential use of mixed methods is when focus groups (qualitative) are used to develop a survey instrument (quantitative), and then personal interviews (qualitative and quantitative) are conducted to investigate issues that arose during coding or interpretation of survey data. An example of concurrent use of mixed methods would be using focus groups or open-ended personal interviews to help affirm the response validity of a quantitative survey.

Different methods reveal different aspects of the program. Consider some interventions related to tobacco control:

  • You might include a group assessment of a school-based tobacco control program to hear the group’s viewpoint, as well as individual student interviews to get a range of opinions.
  • You might conduct a survey of all legislators in a state to gauge their interest in managed care support of cessation services and products, and you might also interview certain legislators individually to question them in greater detail.
  • You might conduct a focus group with community leaders to assess their attitudes regarding tobacco industry support of cultural and community activities. You might follow the focus group with individual structured or semi-structured interviews with the same participants.

When the outcomes under investigation are very abstract or no one quality data source exists, combining methods maximizes the strengths and minimizes the limitations of each method. Using multiple or mixed methods can increase the cross-checks on different subsets of findings and generate increased stakeholder confidence in the overall findings.

Illustrations from Cases

Table 4.3 presents data collection methods/sources for each of the indicators for the provider immunization education program. Table 4.4 shows both the indicators and the data sources for key components of the CLPP effort. Note that in both cases the methods/sources can vary widely and that in some cases multiple methods will be used and synthesized.

Table 4.3 – Provider Immunization Education Program: Data Collection Methods and Sources for Indicators

Indicator(s) | Data Collection Methods/Sources
A series of 3 trainings will be conducted in all 4 regions of the state | Training logs
Nurse educators will make presentations to the 10 largest local health departments (LHDs) | Training logs
Physicians will host peer education rounds at the 10 largest hospitals | Training logs
Trainings will be well-attended and reflect a good mix of specialties and geographic representation | Registration information
50%+ of providers who receive the tool kit will report use of it (or “call to action” cards will be received from 25% of all providers receiving tool kit) | Survey of providers; analysis/count of call-to-action cards
Trained nurses in LHDs will conduct provider consults with the largest provider practices in county | Survey of nurses, survey of providers, or training logs
Providers show increases in knowledge, attitudes, and beliefs (KAB) on selected key immunization items | Survey of providers, or focus groups, or intercepts
Provider intent to immunize increases | Survey of providers, or focus groups, or intercepts

Table 4.4 – CLPP: Indicators and Data Collection Methods/Sources

Logic Model Element | Indicator(s) | Data Source(s) and Method(s)
Outreach | High-risk children and families in the district have been reached with relevant information | Logs of direct mail and health fair contacts; demographic algorithm; Geographic Information System (GIS) algorithm
Screening | High-risk children have completed initial and follow-up screening | Logs and lab data
Environment assessment | Environments of all children over EBLL threshold have been assessed for lead poisoning | Logs of environmental health staff
Case management | All children over EBLL threshold have a case management plan including social, medical, and environmental components | Case file of EBLL child
Family training | Families of all children over EBLL threshold have received training on household behaviors to reduce EBLL | Logs of case managers; survey of families
“Leaded” houses referred | All houses of EBLL children with evidence of lead have been referred to housing authority | Logs and case files
“Leaded” houses cleaned | All referred houses have been cleaned up | Follow-up assessment by environmental health staff; housing authority logs

Quality of Data

A quality evaluation produces data that are reliable, valid, and informative. An evaluation is reliable to the extent that it repeatedly produces the same results, and it is valid if it measures what it is intended to measure. The advantage of using existing data sources such as the BRFSS, YRBS, or PRAMS is that they have been pretested and designed to produce valid and reliable data. If you are designing your own evaluation tools, you should be aware of the factors that influence data quality:

  • Design of the data collection instrument and how questions are worded
  • Data collection procedures
  • Training of data collectors
  • Selection of data sources
  • How the data are coded
  • Data management
  • Routine error checking as part of data quality control
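
The last item in the list above, routine error checking, can often be automated. The following Python sketch shows one possible shape for such a check: it flags duplicate IDs, missing required fields, and out-of-range numeric values. The field names, the 0 to 10 score range, and the sample records are all hypothetical; a real program would substitute the rules that fit its own instrument.

```python
def check_records(records, required, ranges):
    """Return a list of (row_index, field, problem) tuples.

    records  : list of dicts, one per completed instrument
    required : field names that must be present and non-empty
    ranges   : {field: (low, high)} inclusive limits for numeric fields
    """
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        rid = rec.get("id")
        if rid in seen_ids:
            problems.append((i, "id", "duplicate id"))
        seen_ids.add(rid)
        for field in required:
            if rec.get(field) in (None, ""):
                problems.append((i, field, "missing"))
        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value in (None, ""):
                continue  # already reported as missing if the field is required
            if not (low <= value <= high):
                problems.append((i, field, "out of range"))
    return problems


# Hypothetical provider survey records, for illustration only.
records = [
    {"id": "P001", "specialty": "pediatrics", "kab_score": 7},
    {"id": "P002", "specialty": "", "kab_score": 12},
    {"id": "P001", "specialty": "family medicine", "kab_score": 5},
]
problems = check_records(records,
                         required=["id", "specialty", "kab_score"],
                         ranges={"kab_score": (0, 10)})
for row, field, problem in problems:
    print(f"row {row}: {field} is {problem}")
```

Running a check like this after each batch of data entry, rather than once at the end, makes it easier to trace a problem back to its source while it can still be corrected.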

A key way to enhance the quality of primary data collection is through a pretest. The pretest need not be elaborate but should be extensive enough to uncover problems with the logistics of data collection or the intelligibility of instruments prior to rollout. Obtaining quality data involves trade-offs (e.g., breadth vs. depth). Thus, you and stakeholders must decide at the beginning of the evaluation process what level of quality is necessary to meet stakeholders’ standards for accuracy and credibility.
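
Reliability, in the sense defined above (repeatedly producing the same results), can be examined during a pretest by asking the same respondents the same item on two occasions and comparing their answers. The Python sketch below computes simple percent agreement and Cohen's kappa, a common chance-corrected agreement statistic. Kappa is brought in here as one possible measure and is not prescribed by this guide; the function names and the yes/no responses are hypothetical.

```python
from collections import Counter


def percent_agreement(first, second):
    """Share of respondents who gave the same answer both times."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty response lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


def cohens_kappa(first, second):
    """Agreement corrected for the agreement expected by chance."""
    n = len(first)
    observed = percent_agreement(first, second)
    counts_first = Counter(first)
    counts_second = Counter(second)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1.0:
        return 1.0  # everyone gave the same single answer on both occasions
    return (observed - expected) / (1 - expected)


# Hypothetical pretest: ten respondents answer the same yes/no item twice.
first = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
second = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
agreement = percent_agreement(first, second)
kappa = cohens_kappa(first, second)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

What counts as acceptable agreement is a judgment for you and your stakeholders; items with low agreement are candidates for rewording before rollout.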

Quantity of Data

You will also need to determine the amount of data you want to collect during the evaluation. There are cases where you will need data of the highest validity and reliability, especially when traditional program evaluation is being supplemented with research studies. But there are other instances where the insights from a few cases or a convenience sample may be appropriate. If you use secondary data sources, many issues related to the quality of data—such as sample size—have already been determined. If you are designing your own data collection tool and the examination of your program includes research as well as evaluation questions, the quantity of data you need to collect (i.e., sample sizes) will vary with the level of detail and the types of comparisons you hope to make. You will also need to determine the jurisdictional level for which you are gathering the data (e.g., state, county, region, congressional district). Counties often appreciate and want county-level estimates; however, this usually means larger sample sizes and more expense. Finally, consider the size of the change you are trying to detect. In general, detecting small amounts of change requires larger sample sizes. For example, detecting a 5% increase would require a larger sample size than detecting a 10% increase. You may need the help of a statistician to determine adequate sample size.
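
To make the relationship between effect size and sample size concrete, here is a rough Python sketch using a standard normal-approximation formula for comparing two independent proportions. Several assumptions are layered on top of the text: the "5%" and "10%" increases are read as percentage-point changes from a hypothetical 50% baseline, the before and after samples are independent and equal in size, the test is two-sided at alpha = 0.05, and the desired power is 80%. The function name is hypothetical. This is an illustration only and does not replace a statistician's advice for a real design.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate respondents needed per group to detect p1 vs. p2.

    Uses n = (z_alpha + z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2,
    where z_alpha is the two-sided critical value and z_beta is the
    normal quantile for the desired power.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must differ and lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical baseline of 50%, rising to 55% or to 60%.
n_small_change = sample_size_two_proportions(0.50, 0.55)
n_large_change = sample_size_two_proportions(0.50, 0.60)
print(n_small_change, n_large_change)
```

Under these assumptions, halving the size of the change to be detected roughly quadruples the required sample per group, which is why small expected changes and small-area (e.g., county-level) estimates drive up cost.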

Logistics and Protocols

Logistics are the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations have cultural preferences that dictate acceptable ways of asking questions and collecting information, and influence who is perceived as an appropriate person to ask the questions (i.e., someone known within the community versus a stranger from a local health agency). The techniques used to gather evidence in an evaluation must be in keeping with a given community’s cultural norms. Data collection procedures should also protect confidentiality.

In outlining procedures for collecting the evaluation data, consider these issues:

  • When will you collect the data? You will need to determine when (and at what intervals) to collect the information. If you are measuring whether your objectives have been met, your objectives will provide guidance as to when to collect certain data. If you are evaluating specific program interventions, you might want to obtain information from participants before they begin the program, upon completion of the program, and several months after the program. If you are assessing the effects of a community campaign, you might want to assess community knowledge, attitudes, and behaviors among your target audience before and after the campaign.
  • Who will be participating in the evaluation? Are you targeting a relatively specific group (e.g., African American young people), or are you assessing trends among a more general population (e.g., all women of childbearing age)?
  • Are you going to collect data from all participants or a sample? Some programs are community‑based, and surveying a sample of the population participating in such programs is appropriate. However, if you have a small number of participants (such as students exposed to a curriculum in two schools), you may want to survey all participants.
  • Who will collect the information? Are those collecting the data trained consistently? Will the data collectors uniformly gather and record information? Your data collectors will need to be trained to ensure that they all collect information in the same way and without introducing bias. Preferably, interviewers should be trained together by the same person.
  • How will the security and confidentiality of the information be maintained? It is important to ensure the privacy and confidentiality of the evaluation participants. You can do this by collecting information anonymously and making sure you keep data stored in a locked and secure place.
  • If the examination of your program includes research as well as evaluation studies, do you need approval from an institutional review board (IRB) before collecting the data? What will your informed consent procedures be?

You may already have answered some of these questions while selecting your data sources and methods.
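
The choice between surveying all participants and surveying a sample, raised in the list above, can be written down as a simple rule. The Python sketch below returns the whole roster when it is small and draws a simple random sample when it is large. The function name, the ID formats, the roster sizes, and the target of 100 respondents are hypothetical; the roster holds arbitrary study IDs rather than names, in keeping with the confidentiality point above.

```python
import random


def select_respondents(roster, sample_size, seed=None):
    """Return everyone on a small roster, or a simple random sample of a large one.

    A fixed seed makes the draw reproducible so it can be documented
    in the data collection protocol.
    """
    roster = list(roster)
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if len(roster) <= sample_size:
        return roster  # small group: survey all participants
    return random.Random(seed).sample(roster, sample_size)


# Hypothetical rosters of study IDs, for illustration only.
two_schools = [f"S{i:03d}" for i in range(1, 41)]       # 40 students
community = [f"C{i:04d}" for i in range(1, 2001)]       # 2,000 participants

school_respondents = select_respondents(two_schools, 100)
community_respondents = select_respondents(community, 100, seed=2024)
print(len(school_respondents), len(community_respondents))
```

Simple random sampling is only one option; a design that needs estimates for particular subgroups or areas may call for a stratified sample instead.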

Agreements: Affirming Roles and Responsibilities

Agreements summarize the evaluation procedures, clarify everyone’s role and responsibilities, and describe how the evaluation procedures will be implemented. Elements of an agreement include statements concerning the intended users, uses, purpose, questions, design, and methods, as well as a summary of the deliverables, timeline, and budget. An agreement might be a legal contract, a memorandum of understanding, or a detailed protocol. Creating an agreement establishes a mutual understanding of the activities associated with the evaluation. It also provides a basis for modification if necessary.

Standards for Step 4: Gather Credible Evidence

Standard: Utility
  • Have key stakeholders who can assist with access to respondents been consulted?
  • Are methods and sources appropriate to the intended purpose and use of the data?
  • Have key stakeholders been consulted to ensure there are no preferences for or obstacles to selected methods or sources?
  • Are there specific methods or sources that will enhance the credibility of the data with key users and stakeholders?

Standard: Feasibility
  • Can the data methods and sources be implemented within the time and budget for the project?
  • Does the evaluation team have the expertise to implement the chosen methods?
  • Are the methods and sources consistent with the culture and characteristics of the respondents, such as language and literacy level?
  • Are logistics and protocols realistic given the time and resources that can be devoted to data collection?

Standard: Propriety
  • Will data collection be unduly disruptive?
  • Are there issues of safety of respondents or confidentiality that must be addressed?
  • Are the methods and sources appropriate to the culture and characteristics of the respondents (that is, will they understand what they are being asked)?

Standard: Accuracy
  • Are appropriate QA procedures in place to ensure quality of data collection?
  • Are enough data being collected (i.e., to support chosen confidence levels or statistical power)?
  • Are methods and sources consistent with the nature of the problem, the sensitivity of the issue, and the knowledge level of the respondents?


Checklist for Step 4: Gathering Credible Evidence

  • Identify indicators for activities and outcomes in the evaluation focus.
  • Determine whether existing indicators will suffice or whether new ones must be developed.
  • Consider the range of data sources and choose the most appropriate one.
  • Consider the range of data collection methods and choose those best suited to your context and content.
  • Pilot test new instruments to identify and/or control sources of error.
  • Consider a mixed-method approach to data collection.
  • Consider quality and quantity issues in data collection.
  • Develop a detailed protocol for data collection.

Worksheet 4A – Evaluation Questions, Indicators, and Data Collection Methods/Sources

Logic Model Components in Evaluation Focus | Indicator(s) or Evaluation Questions | Data Method(s)/Source(s)

Worksheet 4B – Data Collection Logistics

Data Collection Method/Source | From whom will these data be collected | By whom will these data be collected and when | Security or confidentiality steps
