Step 3: Focus the Evaluation Design
Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide
- Types of Evaluations
- Exhibit 3.1
- Exhibit 3.2
- Determining the Evaluation Focus
- Are You Ready to Evaluate Outcomes?
- Illustrating Evaluation Focus Decisions
- Defining the Specific Evaluation Questions
- Deciding On the Evaluation Design
- Standards for Step 3: Focus the Evaluation Design
- Checklist for Step 3: Focusing the Evaluation Design
- Worksheet 3A - Focusing the Evaluation in the Logic Model
- Worksheet 3B - “Reality Checking” the Evaluation Focus
After completing Steps 1 and 2, you and your stakeholders should have a clear understanding of the program and should have reached consensus on that description. Now your evaluation team will need to focus the evaluation. This includes determining the most important evaluation questions and the appropriate design for the evaluation. Focusing the evaluation assumes that not every part of the program needs to be evaluated at every point in time. Rather, the right evaluation of the program depends on what question is being asked, who is asking the question, and what will be done with the information.
Since resources for evaluation are always limited, this chapter provides a series of decision criteria to help you determine the best evaluation focus at any point in time. These criteria are inspired by the evaluation standards: specifically, utility (who will use the results and what information will be most useful to them) and feasibility (how much time and resources are available for the evaluation).
The logic models developed in Step 2 set the stage for determining the best evaluation focus. The approach to evaluation focus in the CDC Evaluation Framework differs slightly from traditional evaluation approaches. Traditional approaches conduct a summative evaluation when the program has run its course and ask, “Did the program work?” In contrast, the CDC Framework views evaluation as an ongoing activity over the life of a program that asks, “Is the program working?”
Hence, a program is always ready for some evaluation. Because the logic model displays the program from inputs through activities/outputs through to the sequence of outcomes from short-term to most distal, it can guide a discussion of what you can expect to achieve at a given point in the life of your project. Should you focus on distal outcomes, or only on short- or mid-term ones? Or conversely, does a process evaluation make the most sense right now?
Types of Evaluations
Many different questions can be part of a program evaluation, depending on how long the program has been in existence, who is asking the question, and why the evaluation information is needed. In general, evaluation questions for an existing program fall into one of the following groups:
Implementation evaluations (process evaluations) document whether a program has been implemented as intended and, if not, why not. In process evaluations, you might examine whether the activities are taking place, who is conducting the activities, who is reached through the activities, and whether sufficient inputs have been allocated or mobilized. Process evaluation helps distinguish the causes of poor program performance: was the program a bad idea, or was it a good idea that could not reach the standard for implementation that you set? In all cases, process evaluations measure whether actual program performance was faithful to the initial plan. Such measurements might contrast actual and planned performance on all or some of the following:
- The locale where services or programs are provided (e.g., rural, urban)
- The number of people receiving services
- The economic status and racial/ethnic background of people receiving services
- The quality of services
- The actual events that occur while the services are delivered
- The amount of money the project is using
- The direct and in-kind funding for services
- The staffing for services or programs
- The number of activities and meetings
- The number of training sessions conducted
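The following short Python sketch illustrates how actual performance can be contrasted with planned performance. It compares a few process measures against their planned levels and flags any measure that falls short. The measure names, the numbers, and the 90% `tolerance` threshold are all hypothetical. They were chosen for illustration and are not CDC guidance.

```python
# Hypothetical process measures: names and numbers are illustrative only.
PLANNED = {
    "people_receiving_services": 500,
    "training_sessions_conducted": 12,
    "staff_fte": 4.0,
}
ACTUAL = {
    "people_receiving_services": 410,
    "training_sessions_conducted": 12,
    "staff_fte": 3.0,
}


def fidelity_report(planned, actual, tolerance=0.9):
    """Return (measure, actual/planned ratio, on_track) for each planned measure.

    A measure counts as on track when its actual level reaches at least
    `tolerance` (an assumed threshold) of the planned level.
    """
    report = []
    for measure, target in planned.items():
        ratio = actual.get(measure, 0) / target  # missing data counts as zero
        report.append((measure, round(ratio, 2), ratio >= tolerance))
    return report


for measure, ratio, on_track in fidelity_report(PLANNED, ACTUAL):
    status = "on track" if on_track else "below plan"
    print(f"{measure}: {ratio:.0%} of plan ({status})")
```

In a real evaluation, the planned values would come from the program's work plan, and the tolerance would be agreed on with stakeholders.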
When evaluation resources are limited, only the most important issues of implementation can be included. Here are some “usual suspects” that compromise implementation and might be considered for inclusion in the process evaluation focus:
- Transfers of Accountability: When a program’s activities cannot produce the intended outcomes unless some other person or organization takes appropriate action, there is a transfer of accountability.
- Dosage: The intended outcomes of program activities (e.g., training, case management, counseling) may presume a threshold level of participation or exposure to the intervention.
- Access: When intended outcomes require not only an increase in consumer demand but also an increase in supply of services to meet it, then the process evaluation might include measures of access.
- Staff Competency: The intended outcomes may presume well-designed program activities delivered by staff that are not only technically competent but also matched appropriately to the target audience. Measures of the match of staff and target audience might be included in the process evaluation.
Our childhood lead poisoning logic model illustrates such potential process issues. Reducing elevated blood lead levels (EBLL) presumes the house will be cleaned, medical care referrals will be fulfilled, and specialty medical care will be provided. These are transfers of accountability beyond the program to the housing authority, the parent, and the provider, respectively. For provider training to achieve its outcomes, it may presume completion of a three-session curriculum, which is a dosage issue. Case management results in medical referrals, but it presumes adequate access to specialty medical providers. And because lead poisoning tends to disproportionately affect children in low-income urban neighborhoods, many program activities presume cultural competence of the caregiving staff. Each of these components might be included in a process evaluation of a childhood lead poisoning prevention program.
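The dosage issue can be made concrete with a simple count. The sketch below counts how many trained providers completed every session of a curriculum like the three-session one in the example. The provider IDs and attendance records are hypothetical.

```python
# Hypothetical attendance records: provider ID -> number of sessions attended (0-3).
attendance = {"prov_01": 3, "prov_02": 2, "prov_03": 3, "prov_04": 1, "prov_05": 3}

REQUIRED_SESSIONS = 3  # the three-session curriculum from the example


def dosage_summary(records, required):
    """Return (count meeting the dosage threshold, total trainees, share meeting it)."""
    met = sum(1 for sessions in records.values() if sessions >= required)
    total = len(records)
    return met, total, met / total if total else 0.0


met, total, share = dosage_summary(attendance, REQUIRED_SESSIONS)
print(f"{met} of {total} providers completed the full curriculum ({share:.0%})")
```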
Outcome evaluations assess progress on the sequence of outcomes the program is intended to address. Programs often describe this sequence using terms like short-term, intermediate, and long-term outcomes, or proximal (close to the intervention) and distal (distant from the intervention). Depending on the stage of development of the program and the purpose of the evaluation, outcome evaluations may include any or all of the outcomes in the sequence, including:
- Changes in people’s attitudes and beliefs
- Changes in risk or protective behaviors
- Changes in the environment, including public and private policies, formal and informal enforcement of regulations, and influence of social norms and other societal forces
- Changes in trends in morbidity and mortality
While process and outcome evaluations are the most common, several other types of evaluation questions may be central to a specific program evaluation. These include the following:
- Efficiency: Are your program’s activities being produced with minimal use of resources such as budget and staff time? What is the volume of outputs produced by the resources devoted to your program?
- Cost-Effectiveness: Does the value or benefit of your program’s outcomes exceed the cost of producing them?
- Attribution: Can the outcomes be related to your program, as opposed to other things going on at the same time?
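Once inputs, outputs, and outcome values are quantified, the efficiency and cost-effectiveness questions above reduce to simple ratios and comparisons. In this sketch, the `ProgramYear` fields and all dollar figures are hypothetical. Putting a dollar value on outcomes is itself an assumption that a real analysis would need to justify.

```python
from dataclasses import dataclass


@dataclass
class ProgramYear:
    """Hypothetical one-year summary; every figure here is illustrative."""
    budget: float          # total cost of producing the activities, in dollars
    staff_hours: float     # staff time devoted to the program
    outputs: int           # e.g., number of home inspections completed (assumed output)
    outcome_value: float   # assumed dollar value of the outcomes achieved


def efficiency(year):
    """Return (outputs per staff hour, cost per output)."""
    return year.outputs / year.staff_hours, year.budget / year.outputs


def benefit_exceeds_cost(year):
    """True when the estimated value of outcomes exceeds the cost of producing them."""
    return year.outcome_value > year.budget


year = ProgramYear(budget=250_000, staff_hours=5_000, outputs=400, outcome_value=310_000)
per_hour, per_output = efficiency(year)
print(f"{per_hour:.2f} outputs per staff hour, ${per_output:,.0f} per output")
print("Benefit exceeds cost:", benefit_exceeds_cost(year))
```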
All of these types of evaluation questions relate to part, but not all, of the logic model. Exhibits 3.1 and 3.2 show where in the logic model each type of evaluation would focus. Implementation evaluations would focus on the inputs, activities, and outputs boxes and would not be concerned with performance on outcomes. Effectiveness (outcome) evaluations would do the opposite. They focus on some or all outcome boxes but not necessarily on the activities that produced them. Efficiency evaluations care about the arrows linking inputs to activities/outputs, that is, how much output is produced for a given level of inputs/resources. Attribution evaluations would focus on the arrows between specific activities/outputs and specific outcomes, that is, whether progress on the outcome is related to the specific activity/output.
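The parts of the logic model that each type of evaluation touches can be recorded in a simple lookup table. The component labels below paraphrase the description above and are illustrative only. For instance, an effectiveness evaluation may cover only some of the outcome boxes listed.

```python
# Illustrative labels for logic model boxes and arrows, paraphrasing the text above.
FOCUS_BY_TYPE = {
    "implementation": {"inputs", "activities", "outputs"},
    "effectiveness": {"short-term outcomes", "intermediate outcomes", "long-term outcomes"},
    "efficiency": {"inputs -> activities/outputs"},
    "attribution": {"activities/outputs -> outcomes"},
}


def components_in_focus(evaluation_types):
    """Union of the logic model boxes and arrows covered by the chosen evaluation types."""
    focus = set()
    for kind in evaluation_types:
        focus |= FOCUS_BY_TYPE[kind]
    return focus


print(sorted(components_in_focus(["implementation", "efficiency"])))
```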
Determining the Evaluation Focus
Determining the correct evaluation focus is a case-by-case decision. Several guidelines inspired by the “utility” and “feasibility” evaluation standards can help determine the best focus.
1) What is the purpose of the evaluation?
Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the evaluation questions, design, and methods. Some common purposes:
- Gain new knowledge about program activities
- Improve or fine-tune existing program operations (e.g., program processes or strategies)
- Determine the effects of a program by providing evidence concerning the program’s contributions to a long-term goal
- Affect program participants by acting as a catalyst for self-directed change (e.g., teaching)
2) Who will use the evaluation results?
Users are the individuals or organizations that will employ the evaluation findings. The users will likely have been identified during Step 1 in the process of engaging stakeholders. In this step, you need to secure their input into the design of the evaluation and the selection of evaluation questions. Support from the intended users will increase the likelihood that the evaluation results will be used for program improvement.
3) How will they use the evaluation results?
Many insights on use will have been identified in Step 1. Information collected may have varying uses, which should be described in detail when designing the evaluation. Some examples of uses of evaluation information:
- To document the level of success in achieving objectives
- To identify areas of the program that need improvement
- To decide how to allocate resources
- To mobilize community support
- To redistribute or expand the locations where the intervention is carried out
- To improve the content of the program’s materials
- To focus program resources on a specific population
- To solicit more funds or additional partners
4) What do other key stakeholders need from the evaluation?
Of course, the most important stakeholders are those who request or who will use the evaluation results. Nevertheless, in Step 1, you may also have identified stakeholders who, while not using the findings of the current evaluation, have key questions that may need to be addressed in the evaluation to keep them engaged. For example, a particular stakeholder may always be concerned about costs, disparities, or attribution. If so, you may need to add those questions to your evaluation focus.
The first four questions help identify the most useful focus of the evaluation, but you must also determine whether it is a realistic/feasible one. Three questions provide a reality check on your desired focus:
5) What is the stage of development of the program?
During Step 2, you will have identified the program’s stage of development. Programs pass through roughly three stages of development (planning, implementation, and maintenance), and each stage suggests a different focus. In the planning stage, a truly formative evaluation (Who is your target? How do you reach them? How much will it cost?) may be the most appropriate focus. An evaluation that included outcomes would make little sense at this stage. Conversely, an evaluation of a program in the maintenance stage would need to include some measurement of progress on outcomes, even if it also included measurement of implementation.
Are You Ready to Evaluate Outcomes?
Here are some handy rules to decide whether it is time to shift the evaluation focus toward an emphasis on program outcomes:
- Sustainability: Political and financial will exists to sustain the intervention while the evaluation is conducted.
- Fidelity: Actual intervention implementation matches intended implementation. Erratic implementation makes it difficult to know what “version” of the intervention was implemented and, therefore, which version produced the outcomes.
- Stability: Intervention is not likely to change during the evaluation. Changes to the intervention over time will confound understanding of which aspects of the intervention caused the outcomes.
- Reach: Intervention reaches a sufficiently large number of clients (sample size) to employ the proposed data analysis. For example, the number of clients needed may vary with the magnitude of the change expected in the variables of interest (i.e., effect size) and the power needed for statistical purposes.
- Dosage: Clients have sufficient exposure to the intervention to result in the intended outcomes. Interventions with limited client contact are less likely to result in measurable outcomes, compared to interventions that provide more in-depth intervention.
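The reach criterion depends on effect size and statistical power. One common way to estimate how many clients each group needs is the standard normal-approximation formula for comparing two proportions. This approach is assumed here and is not prescribed by the Framework. The proportions below are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate clients needed per group to detect a change from p1 to p2.

    Uses the normal-approximation formula for comparing two independent
    proportions with a two-sided test (no continuity correction).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # value giving the desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical: a risk behavior expected to drop from 30% to 20%, versus only to 25%.
print(n_per_group(0.30, 0.20))
print(n_per_group(0.30, 0.25))
```

With these inputs, the larger expected change needs a few hundred clients per group, and the smaller one needs over a thousand. This shows why an intervention expected to produce small changes needs much greater reach before an outcome evaluation is feasible.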
6) How intensive is the program?
Some programs are wide-ranging and multifaceted. Others may use only one approach to address a large problem. Some programs provide extensive exposure (“dose”) of the program, while others involve participants quickly and superficially. Simple or superficial programs, while potentially useful, cannot realistically be expected to make significant contributions to distal outcomes of a larger program, even when they are fully operational.
7) What are relevant resource and logistical considerations?
Resources and logistics may influence decisions about evaluation focus. Some outcomes are quicker, easier, and cheaper to measure, while others may not be measurable at all. These facts may tilt the decision about evaluation focus toward some outcomes as opposed to others.
Early identification of inconsistencies between utility and feasibility is an important part of the evaluation focus step. But we must also ensure a “meeting of the minds” on what is a realistic focus for program evaluation at any point in time.
The affordable housing example shows how reality might constrain the desired focus. The elaborated logic model was important in this case. Program staff were focused on the production of new houses. The model clarified that important stakeholders, such as community-based organizations and faith-based donors, were committed to more distal outcomes, such as changes in the life outcomes of families or the effects of outside investment in the community. The model led to a discussion of reasonable expectations. In the end, it led to expanded evaluation indicators that included some of the more distal outcomes, and to stakeholders’ greater appreciation of the intermediate milestones on the way to their preferred outcomes.
Illustrating Evaluation Focus Decisions
Because the appropriate evaluation focus is case-specific, let’s apply these focus issues to a few different evaluation scenarios for the childhood lead poisoning prevention (CLPP) program.
At the 1-year mark, a neighboring community would like to adopt your program but wonders, “What are we in for?” Here you might determine that questions of efficiency and implementation are central to the evaluation. You would likely conclude this is a realistic focus, given the stage of development and the intensity of the program. Questions about outcomes would be premature.
At the 5-year mark, the auditing branch of your government funder wants to know, “Did you spend our money well?” Clearly, this requires a much more comprehensive evaluation, and would entail consideration of efficiency, effectiveness, possibly implementation, and cost-effectiveness. It is not clear, without more discussion with the stakeholder, whether research studies to determine causal attribution are also implied. Is this a realistic focus? At year 5, probably yes. The program is a significant investment in resources and has been in existence for enough time to expect some more distal outcomes to have occurred.
Note that in either scenario, you must also consider questions of interest to key stakeholders who are not necessarily intended users of the results of the current evaluation. Here those would be advocates, concerned that families not be blamed for lead poisoning in their children, and housing authority staff, concerned that amelioration include estimates of costs and identification of less costly methods of lead reduction in homes. By year 5, these look like reasonable questions to include in the evaluation focus. At year 1, stakeholders might need assurance that you care about their questions, even if you cannot address them yet.
Defining the Specific Evaluation Questions
These focus criteria identify the components of the logic model to be included in the evaluation focus, i.e., these activities, but not these; these outcomes, but not these. At this point, you convert the components of your focus into specific questions, i.e., implementation, effectiveness, efficiency, and attribution. Were my activities implemented as planned? Did my intended outcomes occur? Were the outcomes due to my activities as opposed to something else? If the outcomes occurred at some but not all sites, what barriers existed at less successful locations and what factors were related to success? At what cost were my activities implemented and my outcomes achieved?
Deciding On the Evaluation Design
Besides determining the evaluation focus and specific evaluation questions, you also need to determine the appropriate evaluation design at this point. The chief consideration in choosing the design is what you are being asked to show. You may be asked only to monitor progress on outcomes. Or you may also be asked to show attribution, that is, that progress on outcomes is related to your program efforts. Depending on the level of scrutiny with which they are asked, attribution questions may be more appropriately viewed as research than as program evaluation.
Three general types of research designs are commonly recognized: experimental, quasi-experimental, and non-experimental/observational. Traditional program evaluation typically uses the third type, but all three are presented here because, over the life of the program, traditional evaluation approaches may need to be supplemented with other studies that look more like research.
Experimental designs use random assignment to compare the outcome of an intervention on one or more groups with an equivalent group or groups that did not receive the intervention. For example, you could select a group of similar schools and then randomly assign some schools to receive a prevention curriculum and other schools to serve as controls. All schools have the same chance of being selected as an intervention or control school. Random assignment reduces the chances that the control and intervention schools vary in any way that could influence differences in program outcomes. This allows you to attribute change in outcomes to your program. For example, if the students in the intervention schools delayed onset of risk behavior longer than students in the control schools, you could attribute the success to your program. However, in community settings it is hard, or sometimes even unethical, to have a true control group.
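Random assignment itself can be sketched in a few lines. This minimal example assumes a list of hypothetical school IDs. It shuffles the units with Python's standard `random` module and splits them into two groups. Recording the seed lets others reproduce and verify the assignment.

```python
import random


def randomize(units, seed=None):
    """Randomly split units into intervention and control groups.

    With an even number of units, each unit has the same chance of landing
    in either group; with an odd number, the control group gets the extra unit.
    """
    rng = random.Random(seed)  # a dedicated generator so the seed fully determines the split
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


schools = [f"school_{i:02d}" for i in range(1, 11)]  # hypothetical school IDs
intervention, control = randomize(schools, seed=2024)
print("Intervention:", sorted(intervention))
print("Control:     ", sorted(control))
```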
While there are some solutions that preserve the integrity of experimental design, another option is to use a quasi-experimental design. These designs make comparisons between nonequivalent groups and do not involve random assignment to intervention and control groups.
An example would be to assess adults’ beliefs about the harmful outcomes of environmental tobacco smoke (ETS) in two communities, then conduct a media campaign in one of the communities. After the campaign, you would reassess the adults and expect to find a higher percentage of adults believing ETS is harmful in the community that received the media campaign. Critics could argue that other differences between the two communities caused the changes in beliefs, so it is important to document that the intervention and comparison groups are similar on key factors such as population demographics and related current or historical events.
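A two-community, before-and-after comparison like this one can be summarized with a difference-in-differences estimate. This estimate is the change in the campaign community minus the change in the comparison community. The technique is named here for illustration and is not drawn from the source. The survey percentages are hypothetical. The estimate is credible only if the two communities would otherwise have changed in similar ways, which is why documenting their similarity matters.

```python
# Hypothetical survey results: share of adults who believe ETS is harmful.
beliefs = {
    "campaign":   {"before": 0.40, "after": 0.55},
    "comparison": {"before": 0.42, "after": 0.45},
}


def difference_in_differences(data, treated="campaign", comparison="comparison"):
    """Change in the campaign community minus change in the comparison community."""
    treated_change = data[treated]["after"] - data[treated]["before"]
    comparison_change = data[comparison]["after"] - data[comparison]["before"]
    return treated_change - comparison_change


print(f"Estimated campaign effect: {difference_in_differences(beliefs):+.1%}")
```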
Related to quasi-experimental design, comparing outcome data among states, or between one state and the nation as a whole, is a common way to evaluate public health efforts. Such comparisons will help you establish meaningful benchmarks for progress. States can compare their progress with that of states with a similar investment in their area of public health. Alternatively, they can contrast their outcomes with the results they could expect if their programs were similar to those of states with a larger investment.
Comparison data are also useful for measuring indicators in anticipation of new or expanding programs. For example, noting a lack of change in key indicators over time prior to program implementation helps demonstrate the need for your program and highlights the comparative progress of states with comprehensive public health programs already in place. A lack of change in indicators can be useful as a justification for greater investment in evidence-based, well-funded, and more comprehensive programs. Between-state comparisons can be highlighted with time–series analyses. For example, questions on many of the larger national surveillance systems have not changed in several years, so you can make comparisons with other states over time, using specific indicators. Collaborate with state epidemiologists, surveillance coordinators, and statisticians to make state and national comparisons an important component of your evaluation.
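Time–series comparisons often start by estimating the average yearly change in an indicator. The sketch below fits an ordinary least-squares slope to hypothetical state and national values of the same survey indicator. A flat state trend alongside a declining national one is the kind of pattern described above.

```python
def annual_trend(years, values):
    """Ordinary least-squares slope: average change in the indicator per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Hypothetical indicator (e.g., % of adults reporting a risk behavior),
# asked the same way each year so the series are comparable.
years = [2016, 2017, 2018, 2019, 2020]
state = [24.0, 23.9, 24.1, 24.0, 23.9]
nation = [24.5, 23.8, 23.0, 22.4, 21.6]

print(f"State trend:  {annual_trend(years, state):+.2f} points/year")
print(f"Nation trend: {annual_trend(years, nation):+.2f} points/year")
```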
Observational designs include, but are not limited to, time–series analysis, cross-sectional surveys, and case studies. Periodic cross-sectional surveys (e.g., the Youth Tobacco Survey [YTS] or the Behavioral Risk Factor Surveillance System [BRFSS]) can inform your evaluation. Case studies may be particularly appropriate for assessing changes in public health capacity in disparate population groups. Case studies are applicable when the program is unique, when an existing program is used in a different setting, when a unique outcome is being assessed, or when an environment is especially unpredictable. Case studies also allow you to explore community characteristics and how these may influence program implementation, and to identify barriers to and facilitators of change.
This issue of “causal attribution,” while often a central research question, may or may not need to supplement traditional program evaluation. The field of public health is under increasing pressure to demonstrate that programs are worthwhile, effective, and efficient. During the last two decades, knowledge and understanding about how to evaluate complex programs have increased significantly. Nevertheless, because programs are so complex, the traditional research designs described here may not be a good choice. As the World Health Organization notes, “the use of randomized control trials to evaluate health promotion initiatives is, in most cases, inappropriate, misleading, and unnecessarily expensive.”
Consider the appropriateness and feasibility of less traditional designs (e.g., simple before–after [pretest–posttest] or posttest-only designs). Depending on your program’s objectives and the intended use(s) for the evaluation findings, these designs may be more suitable for measuring progress toward achieving program goals. Even when there is a need to prove that the program was responsible for progress on outcomes, traditional research designs may not be the only or best alternative. Depending on how rigorous the proof needs to be, other evidence may be enough to persuade key stakeholders that the program is making a contribution. Such evidence includes proximity in time between program implementation and progress on outcomes, or systematic elimination of alternative explanations. These design alternatives often cost less and require less time. Keep in mind, however, that saving time and money should not be the main criterion for selecting an evaluation design. It is important to choose a design that will measure what you need to measure and that will meet both your immediate and long-term needs.
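In a simple pretest–posttest design, the core calculation is the average change within the same participants. This sketch computes the mean paired change and a paired t statistic from hypothetical knowledge scores. Without a comparison group, the calculation cannot by itself rule out alternative explanations for the change.

```python
import math
from statistics import mean, stdev


def pre_post_change(pre, post):
    """Mean paired change and paired t statistic for a pretest-posttest design."""
    diffs = [after - before for before, after in zip(pre, post)]
    avg = mean(diffs)
    t_stat = avg / (stdev(diffs) / math.sqrt(len(diffs)))  # mean change / its standard error
    return avg, t_stat


# Hypothetical knowledge scores (0-10) for the same participants before and after.
pre = [4, 5, 3, 6, 5, 4, 7, 5]
post = [6, 6, 5, 7, 6, 6, 8, 5]
change, t = pre_post_change(pre, post)
print(f"Mean change: {change:+.2f} points (paired t = {t:.2f}, df = {len(pre) - 1})")
```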
Another alternative to experimental and quasi-experimental models is a goal-based evaluation model. This model uses predetermined program goals and the underlying program theory as the standards for evaluation, thus holding the program accountable to prior expectations. The CDC Framework’s emphasis on program description and the construction of a logic model sets the stage for strong goal-based evaluations of programs. In such cases, evaluation planning uses the activities, outputs, and short-term, intermediate, and long-term outcomes outlined in a program logic model to direct the measurement activities.
The design you select influences the timing of data collection, how you analyze the data, and the types of conclusions you can make from your findings. A collaborative approach to focusing the evaluation provides a practical way to better ensure the appropriateness and utility of your evaluation design.
Checklist for Step 3: Focusing the Evaluation Design
- Define the purpose(s) and user(s) of your evaluation.
- Identify the use(s) of the evaluation results.
- Consider stage of development, program intensity, and logistics and resources.
- Determine the components of your logic model that should be part of the focus given these utility and feasibility considerations.
- Formulate the evaluation questions to be asked of the program components in your focus, i.e., implementation, effectiveness, efficiency, and attribution questions.
- Review evaluation questions with stakeholders, program managers, and program staff.
- Review options for the evaluation design, making sure that the design fits the evaluation questions.
Worksheet 3A - Focusing the Evaluation in the Logic Model
| | Question | If this is the situation… | Then these are the parts of the logic model I would include in my evaluation focus: |
|---|---|---|---|
| 1 | Who is asking evaluation questions of the program? | | |
| 2 | Who will use the evaluation results and for what purpose? | | |
| 3 | In Step 1, did we identify interests of other stakeholders that we must take into account? | | |
Worksheet 3B - “Reality Checking” the Evaluation Focus
| | Question | If this is my answer to these questions… | Then I would conclude the questions in my evaluation focus are/are not reasonable ones to ask right now. |
|---|---|---|---|
| 1 | How long has the intervention been underway? | | |
| 2 | How intensive/ambitious is the intervention? Multi-faceted effort or simple intervention? | | |
| 3 | How much (time and money) can be devoted to evaluation of this effort? | | |
There is another type of evaluation, “formative” evaluation, in which the purpose of the evaluation is to gain insight into the nature of the problem so that you can “formulate” a program or intervention to address it. While many steps of the Framework will be helpful for formative evaluation, this manual emphasizes instances in which the details of the program/intervention are already known, even though it may not yet have been implemented.