Evaluation Manual: Step 4: Gather Credible Evidence
Introduction to Program Evaluation for Public Health Programs
Evaluating Appropriate Antibiotic Use Programs
Released April 2006
Now that you have developed a logic model, chosen an evaluation focus, and selected your evaluation questions, your next task is to gather the evidence. The gathering of evidence for an evaluation resembles the gathering of evidence for any research or data-oriented project, with a few exceptions noted below.
What's Involved in Gathering Evidence?
Evidence gathering must include consideration of each of the following:
- Indicators
- Sources of evidence/methods of data collection
- Quality
- Quantity
- Logistics
Because the components of our programs are often expressed in global or abstract terms, we need indicators: specific, observable, and measurable statements that help define exactly what we mean or are looking for. For example, the childhood lead poisoning prevention (CLPP) model includes global statements such as “Children receive medical treatment” or “Families adopt in-home techniques.” The medical treatment indicator might specify the type of medical treatment, the duration, or perhaps the adherence to the regimen. Likewise, the family indicator might specify the in-home techniques or the intensity or duration of their adoption. For example, “Families with EBLL [elevated blood lead level] children clean all window sills and floors with the designated cleaning solution each week” or “Families serve leafy green vegetables at three or more meals per week.” Outcome indicators such as these provide clearer definitions of the global statement and help guide the selection of data collection methods and the content of data collection instruments.
The activities in your focus may also include global statements such as “good coalition,” “culturally competent training,” and “appropriate quality patient care.” These activities would benefit from elaboration into indicators, often called “process indicators.” What does “good” mean, what does “quality” or “appropriate” mean?
Keep the following tips in mind when selecting your indicators:
- Indicators can be developed for activities (process indicators) and/or for outcomes (outcome indicators).(45)
- There can be more than one indicator for each activity or outcome.
- The indicator must be focused and must measure an important dimension of the activity or outcome.
- The indicator must be clear and specific in terms of what it will measure.
- The change measured by the indicator should represent progress toward implementing the activity or achieving the outcome.
Consider CDC’s immunization program, for example. The table below lists the components of the logic model that were included in our focus in Step 3. Each of these components is then defined by one or more indicators.
Table 4.1 - Provider Immunization Program:
Indicators for Program Components in Our Evaluation Focus
|Program Component||Indicator(s)|
|Provider training||A series of 3 trainings will be conducted in all 4 regions of the state|
|Nurse educator LHD presentations||Nurse educators will make presentations to 10 largest local health departments (LHDs)|
|Physicians peer ed rounds||Physicians will host peer ed rounds at 10 largest hospitals|
|Providers attend trainings and rounds||Trainings will be well attended and reflect good mix of specialties and geographic representation|
|Providers receive and use tool kits||50%+ of providers who receive tool kit will report use of it (or “call to action” cards will be received from 25% of all providers receiving tool kit)|
|LHD nurses conduct private provider consults||Trained nurses in LHDs will conduct provider consults with largest provider practices in county|
|Provider KAB increases||Providers show increases in knowledge, attitudes, and beliefs (KAB) on selected key immunization items|
|Provider motivation increases||Provider intent to immunize increases|
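An indicator as specific as the tool kit row in Table 4.1 can be checked with simple arithmetic once the counts are in hand. The following is a minimal sketch in Python, not part of the program's own materials: the function name `toolkit_indicator_met` and all counts are hypothetical, and it assumes the reported-use rate is computed over every provider who received a tool kit (in practice it would likely be estimated from survey respondents).

```python
def toolkit_indicator_met(received, reported_use, cards_returned):
    """Check the tool kit indicator from Table 4.1.

    The indicator is treated as met if at least 50% of providers who
    received a tool kit report using it, or if call-to-action cards come
    back from at least 25% of them.
    """
    if received <= 0:
        raise ValueError("received must be a positive count")
    use_rate = reported_use / received
    card_rate = cards_returned / received
    met = use_rate >= 0.50 or card_rate >= 0.25
    return met, use_rate, card_rate


# Hypothetical counts, for illustration only.
met, use_rate, card_rate = toolkit_indicator_met(
    received=400, reported_use=180, cards_returned=110
)
print(f"use rate {use_rate:.0%}, card rate {card_rate:.1%}, indicator met: {met}")
```

Writing the threshold down this explicitly is a quick test of whether an indicator is truly "clear and specific": if the function cannot be written without guessing at a denominator, the indicator probably needs more elaboration.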
You may need to develop your own indicators or you may be able to draw on existing indicators developed by others. Some large CDC programs have developed indicator inventories that are tied to major activities and outcomes for the program. Advantages of these indicator inventories:
- They may have been pre-tested for “relevance” and accuracy.
- They define the best data sources for collecting the indicator.
- There are often many potential indicators for each activity or outcome, ensuring that at least one will be appropriate for your program.
- Because many programs are using the same indicator(s), you can compare performance across programs or even construct a national summary of performance.
Selecting Data Collection Methods and Sources
Now that you have determined the activities and outcomes you want to measure and the indicators you will use to measure progress on them, you need to select data collection methods and sources from which to gather information on your indicators.
A key decision is whether there are existing data sources—secondary data collection—to measure your indicators or whether you need to collect new data—primary data collection.
Depending on your evaluation questions and indicators, some secondary data sources may be appropriate. Existing data sources that often come into play in measuring outcomes of public health programs include:
- Current Population Survey and other U.S. Census files
- Behavioral Risk Factor Surveillance System (BRFSS)
- Youth Risk Behavior Survey (YRBS)
- Pregnancy Risk Assessment Monitoring System (PRAMS)
- Cancer registries
- State vital statistics
- Various surveillance databases
- National Health Interview Survey (NHIS)
Before using secondary data sources, ensure that they meet your needs. Although large ongoing surveillance systems have the advantages of collecting data routinely and having existing resources and infrastructure, some of them (e.g., Current Population Survey [CPS]) have little flexibility with regard to the questions asked in the survey, making it nearly impossible to use these systems to collect the special data you may need for your evaluation. By contrast, other surveys such as BRFSS or PRAMS are more flexible. For example, you might be able to add program-specific questions, or you might expand the sample size for certain geographic areas or target populations, allowing for more accurate estimates in smaller populations.
Primary data collection methods also fall into several broad categories. Among the most common are:
- Surveys, including personal interviews, telephone interviews, or instruments completed in person or received through the mail or e-mail
- Group discussions/focus groups
- Document review, such as medical records, but also diaries, logs, minutes of meetings, etc.
Choosing the “right” method from the many secondary and primary data collection choices requires considering both the context in which the question is asked (How much money can be devoted to collection and measurement? How soon are results needed? Are there ethical considerations?) and the content of the question (Is it a sensitive issue? Is it about a behavior that is observable? Is it something the respondent is likely to know?).
Some methods yield qualitative data and some yield quantitative data. If the question involves an abstract concept or one where measurement is poor, using multiple methods is often helpful. Insights from stakeholder discussions in Step 1 and the clarity on purpose/user/use obtained in Step 3 will usually help direct the choice of sources and methods. For example, stakeholders may know which methods will work best with some intended respondents and/or have a strong bias toward quantitative or qualitative data collection that must be honored if the results are to be credible. More importantly, the purpose and use/user may dictate the need for valid, reliable data that will withstand close scrutiny or may allow for less rigorous data collection that can direct managers.
Each method comes with advantages and disadvantages depending on the context and content of the data collection (see Table 4.2).
Table 4.2 - Advantages and Disadvantages of Various Survey Methods
|Instruments to be completed by respondent||
The text box below lists possible sources of information for evaluations clustered in three broad categories: people, observations, and documents.
Some Sources of Data
Who might you survey or interview?
- Clients, program participants, nonparticipants
- Staff, program managers, administrators
- Partner agency staff
- General public
- Community leaders or key members of a community
- Representatives of advocacy groups
- Elected officials, legislators, policymakers
- Local and state health officials
What might you observe?
- Special events or activities
- On the job performance
- Service encounters
Which documents might you analyze?
- Meeting minutes, administrative records
- Client medical records or other files
- Newsletters, press releases
- Strategic plans or work plans
- Registration, enrollment, or intake forms
- Previous evaluation reports
- Records held by funders or collaborators
- Web pages
- Graphs, maps, charts, photographs, videotapes
When choosing data collection methods and sources, select those that meet your project’s needs. Try to avoid choosing a data method/source that may be familiar or popular but does not necessarily answer your questions. Keep in mind that budget issues alone should not drive your evaluation planning efforts.
The four evaluation standards can help you reduce the enormous number of data collection options to a more manageable number that best meet your data collection situation. Here is a checklist of issues — based on the evaluation standards — that will help you choose appropriately:
- Purpose and use of data collection: Do you seek a “point in time” determination of a behavior, or to examine the range and variety of experiences, or to tell an in-depth story?
- Users of data collection: Will some methods make the data more credible with skeptics or key users than others?
- Resources available: Which methods can you afford?
- Time: How long until the results are needed?
- Frequency: How often do you need the data?
- Your background: Are you trained in the method, or will you need help from an outside consultant?
- Characteristics of the respondents: Will issues such as literacy or language make some methods preferable to others?
- Degree of intrusion to program/participants: Will the data collection method disrupt the program or be seen as intrusive by participants?
- Other ethical issues: Are there issues of confidentiality or safety of the respondent in seeking answers to questions on this issue?
- Nature of the issue: Is it about a behavior that is observable?
- Sensitivity of the issue: How open and honest will respondents be in responding to the questions on this issue?
- Respondent knowledge: Is it something the respondent is likely to know?
Using Multiple Methods and Mixed Methods
Sometimes a single method is not sufficient to accurately measure an activity or outcome because the thing being measured is complex and/or the data method/source does not yield data that are reliable or accurate enough. Employing multiple methods (sometimes called “triangulation”) helps increase the accuracy of the measurement and the certainty of your conclusions when the various methods yield similar results. Mixed data collection methods refers to gathering both quantitative and qualitative data. Mixed methods can be used sequentially, when one method is used to prepare for the use of another, or concurrently, when both methods are used in parallel. An example of sequential use of mixed methods is when focus groups (qualitative) are used to develop a survey instrument (quantitative), and then personal interviews (qualitative and quantitative) are conducted to investigate issues that arose during coding or interpretation of survey data. An example of concurrent use of mixed methods would be using focus groups or open-ended personal interviews to help affirm the response validity of a quantitative survey.
Different methods reveal different aspects of the program. Consider some interventions related to tobacco control:
- You might include a group assessment of a school-based tobacco control program to hear the group’s viewpoint, as well as individual student interviews to get a range of opinions.
- You might conduct a survey of all legislators in a state to gauge their interest in managed care support of cessation services and products, and you might also interview certain legislators individually to question them in greater detail.
- You might conduct a focus group with community leaders to assess their attitudes regarding tobacco industry support of cultural and community activities. You might follow the focus group with individual structured or semi-structured interviews with the same participants.
When the outcomes under investigation are very abstract or no one quality data source exists, combining methods maximizes the strengths and minimizes the limitations of each method. Using multiple or mixed methods can increase the cross-checks on different subsets of findings and generate increased stakeholder confidence in the overall findings.
Illustrations from Cases
Consider the provider immunization education and the childhood lead poisoning examples. Table 4.3 presents data collection methods/sources for each of the indicators presented earlier for the provider immunization education program. Table 4.4 shows both the indicators and the data sources for key components of the CLPP effort presented earlier. Note in both cases that the methods/sources can vary widely and that in some cases multiple methods will be used and synthesized.
Table 4.3 - Provider Immunization Education Program:
Data Collection Methods and Sources for Indicators
|Indicator(s)||Data Collection Methods/Sources|
|A series of 3 trainings will be conducted in all 4 regions of the state||Training logs|
|Nurse educators will make presentations to 10 largest local health departments (LHDs)||Training logs|
|Physicians will host peer ed rounds at 10 largest hospitals||Training logs|
|Trainings will be well-attended and reflect good mix of specialties and geographic representation||Registration information|
|50%+ of providers who receive tool kit will report use of it (or “call to action” cards will be received from 25% of all providers receiving tool kit)||Survey of providers; analysis/count of call-to-action cards|
|Trained nurses in LHDs will conduct provider consults with largest provider practices in county||Survey of nurses, survey of providers, or training logs|
|Providers show increases in knowledge, attitudes, and beliefs (KAB) on selected key immunization items||Survey of providers, or focus groups, or intercepts|
|Provider intent to immunize increases||Survey of providers, or focus groups, or intercepts|
Table 4.4 - CLPP: Indicators and Data Collection Methods/Sources
|Logic Model Element||Indicator(s)||Data Source(s) and Method(s)|
|Outreach||High-risk children and families in the district have been reached with relevant information||Logs of direct mail and health fair contacts; Geographic Information System (GIS) algorithm|
|Screening||High-risk children have completed initial and follow-up screening||Logs and lab data|
|Environment assessment||Environments of all children over EBLL threshold have been assessed for lead poisoning||Logs of environmental health staff|
|Case management||All children over EBLL threshold have a case management plan including social, medical, and environmental components||Case file of EBLL child|
|Family training||Families of all children over EBLL threshold have received training on household behaviors to reduce EBLL||Logs of case managers; survey of families|
|"Leaded" houses referred||All houses of EBLL children with evidence of lead have been referred to housing authority||Logs and case files|
|"Leaded" houses referred||All referred houses have been cleaned up||Follow-up assessment by environmental health staff; logs of housing authority|
Quality of Data
A quality evaluation produces data that are reliable, valid, and informative. An evaluation is reliable to the extent that it repeatedly produces the same results, and it is valid if it measures what it is intended to measure. The advantage of using existing data sources such as the BRFSS, YRBS, or PRAMS is that they have been pretested and designed to produce valid and reliable data. If you are designing your own evaluation tools, you should be aware of the factors that influence data quality:
- The design of the data collection instrument and how questions are worded
- The data collection procedures
- The selection of data sources
- How the data are coded
- Data management
- Routine error checking as part of data quality control
A key way to enhance the quality of primary data collection is through a pretest. The pretest need not be elaborate but should be extensive enough to surface problems with the logistics of data collection or the intelligibility of instruments prior to rollout. Obtaining quality data involves trade-offs (e.g., breadth vs. depth). Thus, you and stakeholders must decide at the beginning of the evaluation process what level of quality is necessary to meet stakeholders’ standards for accuracy and credibility.
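One way to put a number on reliability during a pretest is to have two people code the same set of responses and measure how often they agree beyond what chance would produce. The sketch below uses Cohen's kappa, a common agreement statistic that this manual does not itself prescribe; the function name and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders rating the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    # Share of items on which the two coders gave the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pretest: two coders classify four open-ended answers.
coder_1 = ["yes", "yes", "no", "no"]
coder_2 = ["yes", "no", "no", "no"]
print(cohens_kappa(coder_1, coder_2))
```

A value near 1 suggests the coding instructions are being applied consistently; a low value points back to the factors listed above, such as question wording or how the data are coded.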
Quantity of Data
You will also need to determine the amount of data you want to collect during the evaluation. There are cases where you will need data of the highest validity and reliability, especially when traditional program evaluation is being supplemented with research studies. But there are other instances where the insights from a few cases or a convenience sample may be appropriate. If you use secondary data sources, many issues related to quality of data—such as sample size—have already been determined. If you are designing your own data collection tool and your examination of your program includes research as well as evaluation questions, the quantity of data you need to collect (i.e., sample sizes) will vary with the level of detail and the types of comparisons you hope to make. You will also need to determine the jurisdictional level for which you are gathering the data (e.g., state, county, region, congressional district). Counties often appreciate and want county-level estimates; however, this usually means larger sample sizes and more expense. Finally, consider the size of the change you are trying to detect. In general, detecting small amounts of change requires larger sample sizes. For example, detecting a 5% increase would require a larger sample size than detecting a 10% increase. You may need the help of a statistician to determine adequate sample size.
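The relationship between the size of the change and the sample size can be made concrete with a textbook approximation. The sketch below is illustrative only and is not a substitute for a statistician's advice. It assumes a two-sided comparison of two independent proportions under the normal approximation, a baseline of 50%, a significance level of 0.05, and 80% power, and it reads "5%" and "10%" as percentage-point increases; all of these are assumptions, and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_before, p_after, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a change in a proportion.

    Uses the standard normal-approximation formula for comparing two
    independent proportions with a two-sided test.
    """
    if p_before == p_after:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_before * (1 - p_before) + p_after * (1 - p_after)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_after - p_before) ** 2)


# Assumed baseline of 50%; increases of 5 and 10 percentage points.
small_change = n_per_group(0.50, 0.55)
large_change = n_per_group(0.50, 0.60)
print(small_change, large_change)
```

Under these assumptions, halving the detectable change roughly quadruples the required sample, which is why county-level estimates or small expected effects drive up cost so quickly.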
Logistics and Protocols
Logistics are the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations have cultural preferences that dictate acceptable ways of asking questions and collecting information, and that influence who is perceived as an appropriate person to ask the questions (e.g., someone known within the community versus a stranger from a local health agency). The techniques used to gather evidence in an evaluation must be in keeping with a given community’s cultural norms. Data collection procedures should also protect confidentiality. In outlining procedures for collecting the evaluation data, consider these issues:
- When will you collect the data? You will need to determine when (and at what intervals) it is most appropriate to collect the information. If you are measuring whether your objectives have been met, your objectives will provide guidance as to when to collect certain data. If you are evaluating specific program interventions, you might want to obtain information from participants before they begin the program, upon completion of the program, and several months after the program. If you are assessing the effects of a community campaign, you might want to assess community knowledge, attitudes, and behaviors among your target audience before and after the campaign.
- Who will be considered a participant in the evaluation? Are you targeting a relatively specific group (African-American young people), or are you assessing trends among a more general population (all women of childbearing age)?
- Are you going to collect data from all participants or a sample? Some programs are community-based, and surveying a sample of the population participating in such programs is appropriate. However, if you have a small number of participants (such as students exposed to a curriculum in two schools), you may want to survey all participants.
- Who will collect the information? Are those collecting the data trained and trained consistently? Will the data collectors uniformly gather and record information? Your data collectors will need to be trained to ensure that they all collect information in the same way and without introducing bias. Preferably, interviewers should be trained together and by the same person.
- How will the security and confidentiality of the information be maintained? It is important to ensure the privacy and confidentiality of the evaluation participants. You can do this by collecting information anonymously and making sure you keep data stored in a locked and secure place.
- If your examination of your program includes research as well as evaluation studies: Do you need approval from an institutional review board (IRB) before collecting the data? What will be your informed consent procedures?
You may already have answered some of these questions while selecting your data sources and methods.
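The choice between surveying all participants and drawing a sample can be written down as a simple rule in a data collection protocol. The sketch below is a minimal illustration in Python: the function name `choose_respondents`, the participant IDs, and the target of 200 respondents are all hypothetical. Recording the random seed in the protocol lets someone else reproduce the same draw.

```python
import random


def choose_respondents(participant_ids, sample_size, seed=None):
    """Return everyone if the group is small, else a simple random sample."""
    ids = list(participant_ids)
    if len(ids) <= sample_size:
        return ids  # census: survey all participants
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(ids, sample_size)


# Hypothetical IDs: a two-school curriculum group and a larger community program.
students = [f"S{i:03d}" for i in range(1, 61)]
community = [f"C{i:04d}" for i in range(1, 2001)]

print(len(choose_respondents(students, 200)))           # every student is kept
print(len(choose_respondents(community, 200, seed=1)))  # a sample is drawn
```

Using study IDs like these instead of names in the sampling list is also one small step toward the confidentiality protections discussed above.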
Agreements: Affirming Roles and Responsibilities
Agreements summarize the evaluation procedures, clarify everyone’s role and responsibilities, and describe how the evaluation procedures will be implemented. Elements of an agreement include statements concerning the intended users, uses, purpose, questions, design, and methods, as well as a summary of the deliverables, timeline, and budget. An agreement might be a legal contract, a memorandum of understanding, or a detailed protocol. Creating an agreement establishes a mutual understanding of the activities associated with the evaluation. It also provides a basis for modification if necessary.
Standards for Step 4: Gather Credible Evidence
Checklist for Gathering Credible Evidence
__ Identify indicators for activities and outcomes in the evaluation focus.
__ Determine whether existing indicators will suffice or whether new ones must be developed.
__ Consider the range of data collection methods and choose those best suited to your context and content.
__ Pilot test new instruments to identify and/or control sources of error.
__ Consider a mixed-method approach to data collection.
__ Consider quality and quantity issues in data collection.
__ Consider the range of data sources and choose the most appropriate one.
__ Develop a detailed protocol for data collection.
Worksheet 4A - Evaluation Questions, Indicators, and Data Collection Methods/Sources
|Logic Model Components in Evaluation Focus||Indicator(s) or Evaluation Questions||Data Method(s) / Source(s)|
Worksheet 4B - Data Collection Logistics
|Data Collection||From whom will these data be collected||By whom will these data be collected and when||Security or confidentiality steps|
Gather Credible Evidence
Evaluating Appropriate Antibiotic Use Programs
The stakeholder discussions in Step 1 and the program description in Step 2 led to the selection of an evaluation focus in Step 3. At this point, you have a set of program components – activities and outcomes – that will be used in the evaluation. Next, you will need to develop tangible indicators (evaluation measures) for these components and identify data sources for each of the measures. The following table lists examples of indicators for selected appropriate antibiotic use activities and outcomes, as well as some associated data sources (Table 4.5).
Table 4.5: Appropriate Antibiotic Use Programs: Indicators and Data Sources
|Program Component||Indicator(s)||Data Source(s)|
|Formation of state or local coalition to develop and implement appropriate antibiotic use efforts|| ||Sign-in sheets and meeting minutes|
|Implementation of media campaign||Number of impressions for print, television, radio, and outdoor media ads||Media tracking reports|
|Development of health education materials||Number and type of materials||Program logs|
|Increased public knowledge and awareness of appropriate antibiotic use messages|| || |
|Increased knowledge and awareness among providers of appropriate antibiotic use messages|| || |
|Improved skills among providers to communicate appropriate antibiotic use messages to consumers|| ||Patient satisfaction surveys|
|Increased social norms favoring appropriate antibiotic prescribing||Percentage of providers who believe that their peers follow prescribing guidelines||Provider surveys|
|Increased adherence to appropriate antibiotic use guidelines||Percentage of providers who indicate that they follow appropriate antibiotic use guidelines (e.g., providers use rapid antigen test or throat culture to diagnose streptococcal pharyngitis)|| |
|Decreased patient demand for antibiotics|| || |
|Increased adherence to prescribed antibiotics among consumers|| || |
|Incorporation of prescribing guidelines by provider practices or organizations||Number of provider practices or organizations that adopt appropriate prescribing guidelines as policy||Surveys or interviews with practices or organizations|
|Changes in childcare or workplace policies supportive of appropriate antibiotic use||Number of childcare centers or work sites that do not require use of antibiotics before returning after an illness||Surveys or interviews with childcare centers or work site staffs|
|Decreased inappropriate antibiotic use|| ||Health plan data; Health Plan Employer Data and Information Set (HEDIS®) performance measures|
Secondary Data Sources
In some cases, data to evaluate the effectiveness of appropriate antibiotic use programs can be found in existing data sources. Three key secondary data sources are described below.
- Health plan data – Health plans can be an excellent source of population-based data on antibiotic prescribing and utilization. When data are combined from several health plans, it is possible to obtain a good representation of the entire population. In addition, for patients with pharmacy benefits, pharmacy dispensing can be captured and linked to visit data. However, working with health plan data has several limitations. Missing claims and misclassification of diagnoses are common. In addition, health plan data usually do not cover drugs not paid for by the plan (e.g., samples dispensed in the office or drugs paid for out-of-pocket). Furthermore, the Health Insurance Portability and Accountability Act of 1996 (HIPAA), which protects the confidentiality of individually identifiable health information, may limit the ability of health plans to share these data unless all personal identifiers can be removed. Despite these limitations, health plan data remain one of the most precise and useful sources of information on antibiotic prescribing. Coalitions that include health plans can not only explore the use of health plan data for evaluation, but can also use these data as part of their interventions (e.g., to provide prescribing feedback to providers or to support organizational changes).
- Pharmacy data – Several companies collect and process data from pharmaceutical records of a number of sources, including drug manufacturers, wholesalers, retailers, pharmacies, mail order, long-term care facilities, and hospitals. Both antibiotic prescribing data and antibiotic retail sales data can be purchased, and these data can be used to evaluate the impact of a program on antibiotic prescribing. Some systems allow for data to be broken down to the level of the individual provider, and this information can be shared with providers as part of an intervention to promote more appropriate prescribing. These data are primarily used by pharmaceutical companies, and costs may be prohibitive for appropriate antibiotic use programs.
- Medicaid data – Medicaid claims data have been used by some programs to assess changes in prescribing. These data are freely available and contain information on prescribing to Medicaid recipients. However, the same caveats apply as described above for health plan data regarding HIPAA regulations, difficulties in interpreting administrative data, and completeness of reporting. In addition, in some states, the privatization of Medicaid has made these data no longer centrally available.
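Whichever of these sources is used, the data usually have to be turned into a rate before periods or populations can be compared. The sketch below shows one simple way to do that in Python, assuming the program has obtained aggregate counts with no personal identifiers; the function name `prescriptions_per_1000` and every number are hypothetical.

```python
def prescriptions_per_1000(prescriptions, members):
    """Antibiotic prescriptions per 1,000 covered members."""
    if members <= 0:
        raise ValueError("members must be a positive count")
    return 1000 * prescriptions / members


# Hypothetical aggregate counts for a baseline year and a follow-up year.
baseline = prescriptions_per_1000(prescriptions=4200, members=10000)
follow_up = prescriptions_per_1000(prescriptions=3700, members=10500)
relative_change = (follow_up - baseline) / baseline
print(f"{baseline:.0f} -> {follow_up:.0f} per 1,000 ({relative_change:+.1%})")
```

Dividing by enrollment matters because membership changes from year to year. A falling rate on its own does not show that the program caused the change, and the caveats above about missing claims and drugs not paid for by the plan still apply.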
Data Collection Tools
In many cases, programs will not be able to obtain the necessary data from secondary data sources and will need to collect their own data for evaluation. Rather than developing entirely new data collection tools, programs can often use or adapt parts of existing tools. Many state and local programs have developed surveys to assess the knowledge, attitudes, and behaviors of both consumers and providers related to antibiotic use and prescribing. CDC has collected a number of these evaluation tools and has facilitated discussions of the strengths and limitations of tools and specific questions. Check the CDC Get Smart website (http://www.cdc.gov/getsmart) for a list of campaign partners and their current activities and evaluation plans. You can contact local program coordinators directly or request assistance through CDC.
In addition, questions on appropriate antibiotic use have been included in the population-based surveys described below. Programs may be able to access state or local data from these surveys. Programs can also model questions after these when designing their own questionnaires.
- Behavioral Risk Factor Surveillance System (BRFSS) – The BRFSS is a telephone survey conducted by the health departments of all states, the District of Columbia, Puerto Rico, the Virgin Islands, and Guam with assistance from CDC. The BRFSS is the primary source of information for states and the nation on the health-related behaviors of adults and includes questions related to behaviors associated with preventable chronic diseases, injuries, and infectious diseases. States can add questions specific to their needs, and in recent years, some states have added questions on appropriate antibiotic use. See http://www.cdc.gov/brfss/index.htm for more information.
- FoodNet Population Survey – The Foodborne Diseases Active Surveillance Network (FoodNet) is the principal foodborne disease component of CDC's Emerging Infections Program (EIP). FoodNet conducts population-based telephone surveys to estimate the burden of acute diarrheal illness in the United States and the frequency of important exposures. The 2002-2003 FoodNet Population Survey included several questions to assess knowledge, attitudes, and behaviors surrounding appropriate antibiotic use. EIP sites may be able to use these data to document the need for their programs or to assess changes over time in knowledge, attitudes, and behaviors. Other states can model questions after these for local use and may be able to compare local results with those from FoodNet sites. See http://www.cdc.gov/foodnet/studies_pages/pop.htm for more information.
(45) Note that if you are developing your evaluation after completing an evaluation plan, you may already have developed process or outcome objectives. If the objectives were written to be specific, measurable, action-oriented, realistic, and time-bound (so-called "SMART" objectives), then they may serve as indicators as well.