  What Is Program Evaluation?
  Evaluation Supplements Other Types of Reflection and Data Collection
  Distinguishing Principles of Research and Evaluation
  Why Evaluate Public Health Programs?
  CDC's Framework for Program Evaluation in Public Health
  How to Establish an Evaluation Team and Select a Lead Evaluator
  Organization of This Manual

Most program managers assess the value and impact of their work all the time when they ask questions, consult partners, make assessments, and obtain feedback. They then use the information collected to improve the program. Indeed, such informal assessments fit nicely into a broad definition of evaluation as the “examination of the worth, merit, or significance of an object.” [4] Throughout this manual, the term “program” is defined as “any set of organized activities supported by a set of resources to achieve a specific and intended result.” This definition is intentionally broad so that almost any organized public health action can be seen as a candidate for program evaluation:

  • Direct service interventions (e.g., a program that offers free breakfasts to improve nutrition for grade school children)
  • Community mobilization efforts (e.g., an effort to organize a boycott of California grapes to improve the economic well-being of farm workers)
  • Research initiatives (e.g., an effort to find out whether disparities in health outcomes based on race can be reduced)
  • Advocacy work (e.g., a campaign to influence the state legislature to pass legislation regarding tobacco control)
  • Training programs (e.g., a job training program to reduce unemployment in urban neighborhoods)

What distinguishes program evaluation from ongoing informal assessment is that program evaluation is conducted according to a set of guidelines. With that in mind, this manual defines program evaluation as “the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness, and/or inform decisions about future program development.” [5] Program evaluation does not occur in a vacuum; rather, it is influenced by real-world constraints. Evaluation should be practical and feasible and conducted within the confines of resources, time, and political context. Moreover, it should serve a useful purpose, be conducted in an ethical manner, and produce accurate findings. Evaluation findings should be used both to make decisions about program implementation and to improve program effectiveness.

Many different questions can be part of a program evaluation, depending on how long the program has been in existence, who is asking the question, and why the information is needed.

In general, evaluation questions fall into these groups:

  • Implementation: Were your program’s activities put into place as originally intended?
  • Effectiveness: Is your program achieving the goals and objectives it was intended to accomplish?
  • Efficiency: Are your program’s activities being produced with appropriate use of resources such as budget and staff time?
  • Cost-Effectiveness: Does the value or benefit of achieving your program’s goals and objectives exceed the cost of producing them?
  • Attribution: Can progress on goals and objectives be shown to be related to your program, as opposed to other things that are going on at the same time?

All of these are appropriate evaluation questions and might be asked with the intention of documenting program progress, demonstrating accountability to funders and policymakers, or identifying ways to make the program better.
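Once costs and outcomes have been measured, the efficiency and cost-effectiveness questions above come down to simple arithmetic. The sketch below is a minimal illustration only: the function names, the use of cost per child served as an efficiency measure, and every dollar figure are assumptions made for the example, not values or methods taken from this manual.

```python
def cost_per_outcome(total_cost: float, outcomes_achieved: int) -> float:
    """Efficiency-style measure: dollars spent per unit of outcome achieved."""
    if outcomes_achieved <= 0:
        raise ValueError("outcomes_achieved must be positive")
    return total_cost / outcomes_achieved


def benefit_cost_ratio(total_benefit: float, total_cost: float) -> float:
    """Cost-effectiveness-style measure: a ratio above 1.0 means the
    valued benefit exceeds the cost of producing it."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return total_benefit / total_cost


# Hypothetical figures for a free-breakfast program (illustration only).
program_cost = 50_000.0       # assumed annual budget in dollars
children_served = 400         # assumed number of children reached
estimated_benefit = 65_000.0  # assumed dollar value placed on the outcomes

per_child = cost_per_outcome(program_cost, children_served)
ratio = benefit_cost_ratio(estimated_benefit, program_cost)

print(f"Cost per child served: ${per_child:,.2f}")
print(f"Benefit-cost ratio: {ratio:.2f}")
```

In practice, the hard part is not the division but agreeing with stakeholders on how to count outcomes and how to value benefits.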

Planning asks, “What are we doing and what should we do to achieve our goals?” By providing information on progress toward organizational goals and identifying which parts of the program are working well and/or poorly, program evaluation sets up the discussion of what can be changed to help the program better meet its intended goals and objectives.

Increasingly, public health programs are accountable to funders, legislators, and the general public. Many programs do this by creating, monitoring, and reporting results for a small set of markers and milestones of program progress. Such “performance measures” are a type of evaluation—answering the question “How are we doing?” More importantly, when performance measures show significant or sudden changes in program performance, program evaluation efforts can be directed to the troubled areas to determine “Why are we doing poorly or well?”

Linking program performance to program budget is the final step in accountability. Called “activity-based budgeting” or “performance budgeting,” it requires an understanding of program components and the links between activities and intended outcomes. The early steps in the program evaluation approach (such as logic modeling) clarify these relationships, making the link between budget and performance easier and more apparent.

While the terms surveillance and evaluation are often used interchangeably, each makes a distinctive contribution to a program, and it is important to clarify their different purposes. Surveillance is the continuous monitoring or routine data collection on various factors (e.g., behaviors, attitudes, deaths) over a regular interval of time. Surveillance systems have existing resources and infrastructure. Data gathered by surveillance systems are invaluable for performance measurement and program evaluation, especially of longer term and population-based outcomes. In addition, these data serve an important function in program planning and “formative” evaluation by identifying key burden and risk factors—the descriptive and analytic epidemiology of the public health problem. There are limits, however, to how useful surveillance data can be for evaluators. For example, some surveillance systems such as the Behavioral Risk Factor Surveillance System (BRFSS), Youth Tobacco Survey (YTS), and Youth Risk Behavior Survey (YRBS) can measure changes in large populations, but have insufficient sample sizes to detect changes in outcomes for more targeted programs or interventions. Also, these surveillance systems may have limited flexibility to add questions for a particular program evaluation.
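To make the sample-size limitation concrete, the sketch below computes the approximate 95% margin of error for an estimated proportion. It assumes simple random sampling and a normal approximation, which real surveillance surveys with complex sampling designs do not strictly satisfy. The prevalence and sample sizes are hypothetical and are not figures from BRFSS, YTS, or YRBS.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated
    from n respondents, assuming simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical numbers, chosen only for illustration.
prevalence = 0.20     # assumed share of respondents reporting a behavior
statewide_n = 5000    # assumed statewide survey sample
program_area_n = 100  # assumed respondents living in one program's service area

statewide_moe = margin_of_error(prevalence, statewide_n)
program_moe = margin_of_error(prevalence, program_area_n)

print(f"Statewide estimate:    +/- {statewide_moe:.1%}")
print(f"Program-area estimate: +/- {program_moe:.1%}")
```

Under these assumptions the statewide estimate is precise to about one percentage point, while the program-area estimate is uncertain by nearly eight points. A change of a few percentage points among the people a targeted program serves would therefore be hard to distinguish from sampling noise.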

In the best of all worlds, surveillance and evaluation are companion processes that can be conducted simultaneously. Evaluation may supplement surveillance data by providing tailored information to answer specific questions about a program. Data from specific questions for an evaluation are more flexible than surveillance and may allow program areas to be assessed in greater depth. For example, a state may supplement surveillance information with detailed surveys to evaluate how well a program was implemented and the impact of a program on participants’ knowledge, attitudes, and behavior. Evaluators can also use qualitative methods (e.g., focus groups, semi-structured or open-ended interviews) to gain insight into the strengths and weaknesses of a particular program activity.

Both research and program evaluation make important contributions to the body of knowledge, but fundamental differences in the purpose of research and the purpose of evaluation mean that good program evaluation need not always follow an academic research model. Even though some of these differences have tended to break down as research tends toward increasingly participatory models [6] and some evaluations aspire to make statements about attribution, “pure” research and evaluation serve somewhat different purposes (see “Distinguishing Principles of Research and Evaluation” table, page 4), nicely summarized in the adage “Research seeks to prove; evaluation seeks to improve.” Academic research focuses primarily on testing hypotheses; a key purpose of program evaluation is to improve practice. Research is generally thought of as requiring a controlled environment or control groups. In field settings directed at prevention and control of a public health problem, this is seldom realistic. Of the ten concepts contrasted in the table, the last three are especially worth noting. Unlike pure academic research models, program evaluation acknowledges and incorporates differences in values and perspectives from the start, may address many questions besides attribution, and tends to produce results for varied audiences.

Distinguishing Principles of Research and Evaluation

Planning

Research principles: Scientific method

  • State hypothesis.
  • Collect data.
  • Analyze data.
  • Draw conclusions.

Program evaluation principles: Framework for program evaluation

  • Engage stakeholders.
  • Describe the program.
  • Focus the evaluation design.
  • Gather credible evidence.
  • Justify conclusions.
  • Ensure use and share lessons learned.

Decision Making

Research principles:

  • Authoritative.

Program evaluation principles:

  • Collaborative.

Standards

Research principles:

  • Internal (accuracy, precision).
  • External (generalizability).
  • Repeatability.

Program evaluation principles: Program evaluation standards

  • Utility.
  • Feasibility.
  • Propriety.
  • Accuracy.

Questions

Research principles:

  • Descriptions.
  • Associations.

Program evaluation principles:

  • Merit (i.e., quality).
  • Worth (i.e., value).
  • Significance (i.e., importance).

Design

Research principles: Isolate changes and control circumstances

  • Narrow experimental influences.
  • Ensure stability over time.
  • Minimize context dependence.
  • Treat contextual factors as confounding (e.g., randomization, adjustment, statistical control).
  • Understand that comparison groups are a necessity.

Program evaluation principles: Incorporate changes and account for circumstances

  • Expand to see all domains of influence.
  • Encourage flexibility and improvement.
  • Maximize context sensitivity.
  • Treat contextual factors as essential information (e.g., system diagrams, logic models, hierarchical or ecological modeling).
  • Understand that comparison groups are optional (and sometimes harmful).

Data Collection

Research principles:

  • Limited number (accuracy preferred).
  • Sampling strategies are critical.
  • Concern for protecting human subjects.
  • Quantitative.
  • Qualitative.

Program evaluation principles:

  • Multiple (triangulation preferred).
  • Concern for protecting human subjects, organizations, and communities.
  • Mixed methods (qualitative, quantitative, and integrated).

Analysis & Synthesis

Research principles:

  • One-time (at the end).
  • Focus on specific variables.

Program evaluation principles:

  • Ongoing (formative and summative).
  • Integrate all data.

Judgments

Research principles:

  • Attempt to remain value-free.

Program evaluation principles:

  • Examine agreement on values.
  • State precisely whose values are used.

Conclusions

Research principles:

  • Establish time sequence.
  • Demonstrate plausible mechanisms.
  • Control for confounding.
  • Replicate findings.

Program evaluation principles: Attribution and contribution

  • Account for alternative explanations.
  • Show similar effects in similar contexts.

Uses

Research principles: Disseminate to interested audiences

  • Content and format varies to maximize comprehension.

Program evaluation principles: Feedback to stakeholders

  • Focus on intended uses by intended users.
  • Build capacity.
  • Emphasis on full disclosure.
  • Requirement for balanced assessment.

Why Evaluate Public Health Programs?

  • To monitor progress toward the program’s goals
  • To determine whether program components are producing the desired progress on outcomes
  • To permit comparisons among groups, particularly among populations with disproportionately high risk factors and adverse health outcomes
  • To justify the need for further funding and support
  • To find opportunities for continuous quality improvement
  • To ensure that effective programs are maintained and resources are not wasted on ineffective programs

Program staff may be pushed to do evaluation by external mandates from funders, authorizers, or others, or they may be pulled to do evaluation by an internal need to determine how the program is performing and what can be improved. While push or pull can motivate a program to conduct good evaluations, program evaluation efforts are more likely to be sustained when staff see the results as useful information that can help them do their jobs better.

Data gathered during evaluation enable managers and staff to create the best possible programs, to learn from mistakes, to make modifications as needed, to monitor progress toward program goals, and to judge the success of the program in achieving its short-term, intermediate, and long-term outcomes. Most public health programs aim to change behavior in one or more target groups and to create an environment that reinforces sustained adoption of these changes, with the intention that changes in environments and behaviors will prevent and control diseases and injuries. Through evaluation, you can track these changes and, with careful evaluation designs, assess the effectiveness and impact of a particular program, intervention, or strategy in producing these changes.

Recognizing the importance of evaluation in public health practice and the need for appropriate methods, the World Health Organization (WHO) established the Working Group on Health Promotion Evaluation. The Working Group prepared a set of conclusions and related recommendations to guide policymakers and practitioners. [7] Recommendations immediately relevant to the evaluation of comprehensive public health programs include:

  • Encourage the adoption of participatory evaluation approaches that provide meaningful opportunities for involvement by all of those with a direct interest in initiatives (programs, policies, and other organized activities).
  • Require that a portion of total financial resources for a health promotion initiative be allocated to evaluation—they recommend 10%.
  • Ensure that a mixture of process and outcome information is used to evaluate all health promotion initiatives.
  • Support the use of multiple methods to evaluate health promotion initiatives.
  • Support further research into the development of appropriate approaches to evaluating health promotion initiatives.
  • Support the establishment of a training and education infrastructure to develop expertise in the evaluation of health promotion initiatives.
  • Create and support opportunities for sharing information on evaluation methods used in health promotion through conferences, workshops, networks, and other means.

Figure 1.1. The steps and standards of the CDC Evaluation Framework. The six steps are (1) engage stakeholders, (2) describe the program, (3) focus the evaluation and its design, (4) gather credible evidence, (5) justify conclusions, and (6) ensure use and share lessons learned.

Program evaluation is one of ten essential public health services [8] and a critical organizational practice in public health. [9] Until recently, however, there has been little agreement among public health officials on the principles and procedures for conducting such studies. In 1999, CDC published Framework for Program Evaluation in Public Health and some related recommendations. [10] The Framework, as depicted in Figure 1.1, defined six steps and four sets of standards for conducting good evaluations of public health programs.

The underlying logic of the Evaluation Framework is that good evaluation does not merely gather accurate evidence and draw valid conclusions, but produces results that are used to make a difference. To maximize the chances evaluation results will be used, you need to create a “market” before you create the “product”—the evaluation. You determine the market by focusing evaluations on questions that are most salient, relevant, and important. You ensure the best evaluation focus by understanding where the questions fit into the full landscape of your program description, and especially by ensuring that you have identified and engaged stakeholders who care about these questions and want to take action on the results.

The steps in the CDC Framework are informed by a set of standards for evaluation. [11] These standards do not constitute a way to do evaluation; rather, they serve to guide your choice from among the many options available at each step in the Framework. The 30 standards cluster into four groups:

Utility: Who needs the evaluation results? Will the evaluation provide relevant information in a timely manner for them?

Feasibility: Are the planned evaluation activities realistic given the time, resources, and expertise at hand?

Propriety: Does the evaluation protect the rights of individuals and protect the welfare of those involved? Does it engage those most directly affected by the program and changes in the program, such as participants or the surrounding community?

Accuracy: Will the evaluation produce findings that are valid and reliable, given the needs of those who will use the results?

Sometimes the standards broaden your exploration of choices. Often, they help reduce the options at each step to a manageable number. For example, in the step “Engaging Stakeholders,” the standards can help you think broadly about who constitutes a stakeholder for your program, but simultaneously can reduce the potential list to a manageable number by posing the following questions: (Utility) Who will use these results? (Feasibility) How much time and effort can be devoted to stakeholder engagement? (Propriety) To be ethical, which stakeholders need to be consulted: those served by the program or the community in which it operates? (Accuracy) How broadly do you need to engage stakeholders to paint an accurate picture of this program?

Similarly, there are unlimited ways to gather credible evidence (Step 4). Asking these same kinds of questions as you approach evidence gathering will help identify the ones that will be most useful, feasible, proper, and accurate for this evaluation at this time. Thus, the CDC Framework approach supports the fundamental insight that there is no such thing as the right program evaluation. Rather, over the life of a program, any number of evaluations may be appropriate, depending on the situation.

Characteristics of a Good Evaluator

  • Has experience in the type of evaluation needed
  • Is comfortable with quantitative data sources and analysis
  • Is able to work with a wide variety of stakeholders, including representatives of target populations
  • Can develop innovative approaches to evaluation while considering the realities affecting a program (e.g., a small budget)
  • Incorporates evaluation into all program activities
  • Understands both the potential benefits and risks of evaluation
  • Educates program personnel in designing and conducting the evaluation
  • Will give staff the full findings (i.e., will not gloss over or fail to report certain findings)

Good evaluation requires a combination of skills that are rarely found in one person. The preferred approach is to choose an evaluation team that includes internal program staff, external stakeholders, and possibly consultants or contractors with evaluation expertise.

An initial step in the formation of a team is to decide who will be responsible for planning and implementing evaluation activities. One program staff person should be selected as the lead evaluator to coordinate program efforts. This person should be responsible for evaluation activities, including planning and budgeting for evaluation, developing program objectives, addressing data collection needs, reporting findings, and working with consultants. The lead evaluator is ultimately responsible for engaging stakeholders, consultants, and other collaborators who bring the skills and interests needed to plan and conduct the evaluation.

Although this staff person should have the skills necessary to competently coordinate evaluation activities, he or she can choose to look elsewhere for technical expertise to design and implement specific tasks. However, developing in-house evaluation expertise and capacity is a beneficial goal for most public health organizations. Of the characteristics of a good evaluator listed in the text box, the evaluator’s ability to work with a diverse group of stakeholders warrants highlighting. The lead evaluator should be willing and able to draw out and reconcile differences in values and standards among stakeholders and to work with knowledgeable stakeholder representatives in designing and conducting the evaluation.

Seek additional evaluation expertise in programs within the health department, through external partners (e.g., universities, organizations, companies), from peer programs in other states and localities, and through technical assistance offered by CDC. [12]

You can also use outside consultants as volunteers, advisory panel members, or contractors. External consultants can provide high levels of evaluation expertise from an objective point of view. Important factors to consider when selecting consultants are their level of professional training, experience, and ability to meet your needs. Overall, it is important to find a consultant whose approach to evaluation, background, and training best fit your program’s evaluation needs and goals. Be sure to check all references carefully before you enter into a contract with any consultant.

To generate discussion around evaluation planning and implementation, several states have formed evaluation advisory panels. Advisory panels typically generate input from local, regional, or national experts otherwise difficult to access. Such an advisory panel will lend credibility to your efforts and prove useful in cultivating widespread support for evaluation activities.

Evaluation team members should clearly define their respective roles. For some teams, informal consensus may be enough; others prefer a written agreement that describes who will conduct the evaluation and assigns specific roles and responsibilities to individual team members. Either way, the team must clarify and reach consensus on the:

  • Purpose of the evaluation
  • Potential users of the evaluation findings and plans for dissemination
  • Evaluation approach
  • Resources available
  • Protection for human subjects.

The agreement should also include a timeline and a budget for the evaluation.

This manual is organized by the six steps of the CDC Framework. Each chapter will introduce the key questions to be answered in that step, approaches to answering those questions, and how the four evaluation standards might influence your approach. The main points are illustrated with one or more public health examples that are composites inspired by actual work being done by CDC and states and localities. [13] Some examples that will be referred to throughout this manual:

The first is a program that aims to provide affordable home ownership to low-income families by identifying and linking funders/sponsors, construction volunteers, and eligible families. Together, they build a house over a multi-week period. At the end of the construction period, the home is sold to the family using a no-interest loan.

Lead poisoning is the most widespread environmental hazard facing young children, especially in older inner-city areas. Even at low levels, elevated blood lead levels (EBLL) have been associated with reduced intelligence, medical problems, and developmental problems. The main sources of lead poisoning in children are paint and dust in older homes with lead-based paint. Public health programs address the problem through a combination of primary and secondary prevention efforts. A typical secondary prevention program at the local level does outreach and screening of high-risk children, identifying those with EBLL, assessing their environments for sources of lead, and case managing both their medical treatment and environmental corrections. However, these programs must rely on others to accomplish the actual medical treatment and the reduction of lead in the home environment.

A common initiative of state immunization programs is a comprehensive provider education program that trains and motivates private providers to provide more immunizations. A typical program includes:

  • A newsletter distributed three times per year to update private providers on new developments and changes in policy and to provide brief education on various immunization topics
  • Immunization trainings held around the state, conducted by teams of state program staff and physician educators, on general immunization topics and the immunization registry
  • A Provider Tool Kit on how to increase immunization rates in their practice
  • Training of nursing staff in local health departments, who then conduct immunization presentations in individual private provider clinics
  • Presentations on immunization topics by physician peer educators at physician grand rounds and state conferences

Each chapter also provides checklists and worksheets to help you apply the teaching points.

[4] Scriven M. Minimalist theory of evaluation: The least theory that practice requires. American Journal of Evaluation 1998;19:57-70.

[5] Patton MQ. Utilization-focused evaluation: The new century text. 3rd ed. Thousand Oaks, CA: Sage, 1997.

[6] Green LW, George MA, Daniel M, Frankish CJ, Herbert CP, Bowie WR, et al. Study of participatory research in health promotion: Review and recommendations for the development of participatory research in health promotion in Canada. Ottawa, Canada: Royal Society of Canada, 1995.

[7] WHO European Working Group on Health Promotion Evaluation. Health promotion evaluation: Recommendations to policy-makers: Report of the WHO European working group on health promotion evaluation. Copenhagen, Denmark: World Health Organization, Regional Office for Europe, 1998.

[8] Public Health Functions Steering Committee. Public health in America. Fall 1994. Available at <http://www.health.gov/phfunctions/public.htm>. January 1, 2000.

[9] Dyal WW. Ten organizational practices of public health: A historical perspective. American Journal of Preventive Medicine 1995;11(6)Suppl 2:6-8.

[10] Centers for Disease Control and Prevention. op cit.

[11] Joint Committee on Standards for Educational Evaluation. The program evaluation standards: How to assess evaluations of educational programs. 2nd ed. Thousand Oaks, CA: Sage Publications, 1994.

[12] CDC’s Prevention Research Centers (PRC) program is an additional resource. The PRC program is a national network of 24 academic research centers committed to prevention research and the ability to translate that research into programs and policies. The centers work with state health departments and members of their communities to develop and evaluate state and local interventions that address the leading causes of death and disability in the nation. Additional information on the PRCs is available at www.cdc.gov/prc/index.htm.

[13] These cases are composites of multiple CDC and state and local efforts that have been simplified and modified to better illustrate teaching points. While inspired by real CDC and community programs, they are not intended to reflect the current

Program assessment.

Program evaluation looks at the parameters, needs, components, and outcomes of program design with an eye towards improving student learning. It involves a complex approach, taking into consideration needs assessment, curriculum mapping, and various models of program review.


Program evaluation writing.

Below you will find samples of program evaluations that can help you in your own writing of these documents. There are also useful resources provided to expand your knowledge and understanding of this process. 

  • Program Evaluation Template from the CDC: This template provides a backdrop for writing an effective and detailed program evaluation.
  • IES Evaluation Plan Template: Put out by the Institute of Education Sciences, the Evaluation Plan Template identifies the key components of an evaluation plan and provides guidance about the information typically included in each section of a plan for evaluating both the effectiveness and implementation of an intervention. Evaluators can use this tool to help develop their plan for a rigorous evaluation, with a focus on meeting What Works Clearinghouse™ evidence standards. The template can be used in combination with the Contrast Tool, a tool for documenting each impact that the evaluation will estimate to test program effectiveness.
  • University Program Review Template [DOC]

Jump-Start Your School's Program Evaluation: Part 1

Do the words "program evaluation" strike fear in the hearts of your school's staff? Many schools have so many programs, strategies and practices underway that they are overwhelmed by the prospect of assessing their effectiveness. Yet evaluating impact is the best way to avoid wasting time and money.

To support schools in this task, EducationWorld is pleased to present professional development resources including article content and a planning worksheet shared by The Governor's Prevention Partnership, a Connecticut-based nonprofit organization.

Want to dig deeper? See Part 2 of this article for The Partnership's training exercise, which guides staff in shaping a school's comprehensive program evaluation plan.

The Governor's Prevention Partnership, a nonprofit with a mission to keep Connecticut’s youth safe, successful and drug-free, helps schools, communities, colleges and businesses create and sustain quality prevention programs. The organization also provides resources for .


Gone are the days when “feeling good” about your efforts is sufficient justification for continuing expensive and labor-intensive activities.

Following are five steps to help you ease into the process of program evaluation. Ideally, your school should have a plan for evaluating not only large, complex programs, but also smaller and simpler strategies and practices.

1. Define Terms

The first step is defining terms. In this article, "effort" is the generic term used to refer to any program, strategy or practice. Efforts are broken down into three types, as follows.

Programs are defined as high-effort undertakings based on a set of packaged resources that outline a series of prescribed activities. One example is the Olweus Bullying Prevention Program. Quality programs are evidence-based, meaning that prior research has indicated their effectiveness. In addition, programs often include evaluation tools such as surveys.

A strategy is less prescribed than a program and refers to any well-planned activity that aims to accomplish a goal or solve a problem. Differentiated instruction is an example of such a strategy. Strategies should also be evidence-based, although typically there is a looser set of best practices, rather than prescribed activities, guiding day-to-day efforts.


This article will use "program evaluation" as a generic term referring to evaluation of any of the above types of efforts.

2. Get the Lay of the Land

Gather your school Data Team--the group that is responsible for managing data collection and analysis. You may call this a Response to Intervention (RTI) Team, Positive Behavioral Interventions and Supports (PBIS) Team, Student Intervention Team, or even School Climate Team. What's important is that team members have the necessary expertise to address both academic and social-emotional programs, practices and strategies in your school. Members might include the principal; special education director; literacy coach; one or more classroom teachers (perhaps one representing each grade level); school counselor; school psychologist or social worker; speech pathologist, occupational therapist or other specialists; point person for bullying (if your state requires such a person); data analyst or database manager. Some teams even include parents and students.

Many resources are available to help guide the work of the Data Team. Examples include:

  • Top Five Tips for Effective Data Teams
  • Best Practices for Response to Intervention (RTI) Teams
  • RTI Data team Process
  • Student Intervention Team Manual

The next step is having the team turn a critical eye toward all programs, practices and strategies in your school that are aimed at addressing barriers to learning--in other words, efforts that attempt to enhance, improve, supplement, remediate or prevent something. Ultimately you will want to catalog these many efforts, making a concrete list. Schools generally do a better job tracking academic programs, practices and strategies than they do tracking social-emotional or school-climate efforts. It is important, however, to include social-emotional efforts in your list. Likewise, you should include school-wide efforts as well as those aimed at individual struggling students. Some questions to consider when reviewing existing programs, practices and strategies:

  • When did each effort start?
  • What level of time, manpower and dollars does each require?
  • What is the purpose/goal of each?
  • Where do efforts fall in terms of the three-tiered classification system (universal, secondary and tertiary) used in the Response to Intervention (RTI) and Positive Behavioral Interventions and Supports (PBIS) frameworks?
  • Are there combinations of larger and smaller efforts that can be grouped because they share a common goal?

3. Do a Little Pruning

Armed with an exhaustive list of every single effort, small or large, in your school, you'll be able to identify fragmented and poorly coordinated efforts and then eliminate efforts that sap too many resources, or that are not aligned with any concrete goal. Remember that if you consider a program, strategy or practice not important enough to spend time evaluating, the effort may not be worth continuing to implement. Cross items off the list as you go, eliminating time-wasters, duplicative activities and efforts that cannot be evaluated.

NOTE: In addition to academic support, key dimensions of school climate identified by the National School Climate Center include (1) Physical and Emotional Safety, (2) Teaching/Learning of Social, Emotional and Civic Skills, (3) Strengthening Interpersonal Relationships and (4) Physical Surroundings and School Connectedness. If any of your school's efforts relate neither to academic support nor to one of the four school climate dimensions, those efforts may not be worth continuing. The dimensions are included for reference in The Governor's Prevention Partnership's Evaluation Planning Worksheet, which you'll use in the next step.

Once you've done some "pruning" of your list, you'll be ready to evaluate the remaining efforts. The pruning process may also have uncovered some gaps that you intend to fill with future programs, practices and strategies.

4. Plan to Evaluate Everything on the List

In order to paint the most complete picture, a comprehensive school evaluation plan should cover every effort on your list (hopefully you've shortened this list with some thoughtful pruning). The Governor's Prevention Partnership's Evaluation Planning Worksheet can help you get the process started. The worksheet walks you through identifying the purpose of each effort and choosing short-term and long-term methods of measuring and tracking its impact on students. Consider asking each member of your Data Team to complete the worksheet for a different type or group of programs, strategies and practices.

The sophistication level of evaluation for each effort should reflect the sophistication level of the program, practice or strategy. Here is an example: Evaluating a formal effort such as the Olweus Bullying Prevention Program will involve administering long annual school-wide surveys to multiple informant groups and then analyzing the resulting data report. Another example: Evaluating a more informal practice of holding student advisory periods (to promote positive school climate and student social-emotional development) might require only a five-question student survey or quick student feedback session twice a year.

Efforts that you have grouped together in step 2 might share a common method of measurement. For instance, you might use the same school-wide survey to assess both the impact of staff bullying prevention training and the impact of a new set of school-wide rules that you implemented at about the same time.

Remember: It's important to take a baseline measure of a problem before implementing new efforts to address that problem. If this is not possible because an effort has been in place for some time without being evaluated, put the measurement instrument or mechanism in place as soon as possible and begin to plan ahead for the next data collection, which will provide a point of comparison.

It's a good idea to consult an evaluation expert at the point that you're ready to choose your methods of measurement. An expert can help you choose good measurement tools (e.g., not all surveys are created equal, and there is some skill involved in conducting a focus group). He or she also can plan optimal data collection times and determine how data from different time points will be compared (e.g., will statistically significant differences be the standard for judging change?).

5. Complete Your Evaluation Plan

Keep in mind the best practices below as you use the Evaluation Planning Worksheet to flesh out your evaluation plan. You likely will not apply every best practice to evaluation of every effort, but taken as a whole, your plan should do the following:

  • Schedule the evaluation in advance and measure at the right time to “catch” good effects. If possible, conduct one or more “baseline” (pretest) measurements as well as several posttest measurements spaced over time.  
  • Be feasible (someone has the time to collect necessary data on an ongoing basis; school data team is in place to help plan and analyze).  
  • Measure at multiple levels (individual, small-group, class, grade and population/school levels). Measure at the group level to evaluate efforts that reach smaller groups; measure at the population level to evaluate efforts that reach the entire school. (Measurements at the population level tend to be done less frequently [e.g., annually or every other year] compared to measurements at other levels.)  
  • Use multiple informants (students, parents and teachers).  
  • Use multiple formal and informal data collection tools (e.g., observation, record review, survey, interview).  
  • Track process indicators (reflect upon how implementation is going) so that corrections can be made early in the process.  
  • Track both short-term and long-term outcome indicators (assess what immediate and longer-term effects efforts are having on students).  
  • Collect subjective, self-reported and qualitative outcome data as well as objective, observable, quantitative outcome data, using instruments with established reliability and validity.  

Related resources

See Part 2 of this article for The Partnership's training exercise. The exercise, sure to be a conversation starter, asks participants to critique a fictional school's evaluation plan and suggest improvements.

Article by Celine Provini, EducationWorld Editor
Education World® Copyright © 2011 Education World

  Section 1. A Framework for Program Evaluation: A Gateway to Tools

Learn how program evaluation makes it easier for everyone involved in community health and development work to evaluate their efforts.
This section is adapted from the article "Recommended Framework for Program Evaluation in Public Health Practice," by Bobby Milstein, Scott Wetterhall, and the CDC Evaluation Working Group.

Around the world, there exist many programs and interventions developed to improve conditions in local communities. Communities come together to reduce the level of violence that exists, to work for safe, affordable housing for everyone, or to help more students do well in school, to give just a few examples.

But how do we know whether these programs are working? If they are not effective, and even if they are, how can we improve them to make them better for local communities? And finally, how can an organization make intelligent choices about which promising programs are likely to work best in their community?

In recent years, there has been a growing trend toward better use of evaluation to understand and improve practice. The systematic use of evaluation has solved many problems and helped countless community-based organizations do what they do better.

Despite an increased understanding of the need for, and the use of, evaluation, a basic agreed-upon framework for program evaluation has been lacking. In 1997, scientists at the United States Centers for Disease Control and Prevention (CDC) recognized the need to develop such a framework. As a result, the CDC assembled an Evaluation Working Group composed of experts in the fields of public health and evaluation. Members were asked to develop a framework that summarizes and organizes the basic elements of program evaluation. This Community Tool Box section describes the framework resulting from the Working Group's efforts.

Before we begin, however, we'd like to offer some definitions of terms that we will use throughout this section.

By evaluation, we mean the systematic investigation of the merit, worth, or significance of an object or effort. Evaluation practice has changed dramatically during the past three decades: new methods and approaches have been developed, and evaluation is now used for increasingly diverse projects and audiences.

Throughout this section, the term program is used to describe the object or effort that is being evaluated. It may apply to any action with the goal of improving outcomes for whole communities, for more specific sectors (e.g., schools, work places), or for sub-groups (e.g., youth, people experiencing violence or HIV/AIDS). This definition is meant to be very broad.

Examples of different types of programs include:

  • Direct service interventions (e.g., a program that offers free breakfast to improve nutrition for grade school children)
  • Community mobilization efforts (e.g., organizing a boycott of California grapes to improve the economic well-being of farm workers)
  • Research initiatives (e.g., an effort to find out whether inequities in health outcomes based on race can be reduced)
  • Surveillance systems (e.g., a system to find out whether early detection of school readiness improves educational outcomes)
  • Advocacy work (e.g., a campaign to influence the state legislature to pass legislation regarding tobacco control)
  • Social marketing campaigns (e.g., a campaign in the Third World encouraging mothers to breast-feed their babies to reduce infant mortality)
  • Infrastructure building projects (e.g., a program to build the capacity of state agencies to support community development initiatives)
  • Training programs (e.g., a job training program to reduce unemployment in urban neighborhoods)
  • Administrative systems (e.g., an incentive program to improve efficiency of health services)

Program evaluation - the type of evaluation discussed in this section - is an essential organizational practice for all types of community health and development work. It is a way to evaluate the specific projects and activities community groups may take part in, rather than to evaluate an entire organization or comprehensive community initiative.

Stakeholders are those who care about the program or effort. These may include those presumed to benefit (e.g., children and their parents or guardians), those with particular influence (e.g., elected or appointed officials), and those who might support the effort (i.e., potential allies) or oppose it (i.e., potential opponents). Key questions in thinking about stakeholders are: Who cares? What do they care about?

This section presents a framework that promotes a common understanding of program evaluation. The overall goal is to make it easier for everyone involved in community health and development work to evaluate their efforts.

Why evaluate community health and development programs?

The type of evaluation we talk about in this section can be closely tied to everyday program operations. Our emphasis is on practical, ongoing evaluation that involves program staff, community members, and other stakeholders, not just evaluation experts. This type of evaluation offers many advantages for community health and development professionals.

For example, it complements program management by:

  • Helping to clarify program plans
  • Improving communication among partners
  • Gathering the feedback needed to improve and be accountable for program effectiveness

It's important to remember, too, that evaluation is not a new activity for those of us working to improve our communities. In fact, we assess the merit of our work all the time when we ask questions, consult partners, make assessments based on feedback, and then use those judgments to improve our work. When the stakes are low, this type of informal evaluation might be enough. However, when the stakes are raised - when a good deal of time or money is involved, or when many people may be affected - then it may make sense for your organization to use evaluation procedures that are more formal, visible, and justifiable.

How do you evaluate a specific program?

Before your organization starts a program evaluation, your group should be very clear about the answers to the following questions:

  • What will be evaluated?
  • What criteria will be used to judge program performance?
  • What standards of performance on the criteria must be reached for the program to be considered successful?
  • What evidence will indicate performance on the criteria relative to the standards?
  • What conclusions about program performance are justified based on the available evidence?

To clarify the meaning of each, let's look at some of the answers for Drive Smart, a hypothetical program begun to stop drunk driving.

What will be evaluated?

  • Drive Smart, a program focused on reducing drunk driving through public education and intervention.

What criteria will be used to judge program performance?

  • The number of community residents who are familiar with the program and its goals
  • The number of people who use "Safe Rides" volunteer taxis to get home
  • The percentage of people who report drinking and driving
  • The reported number of single car night time crashes (this is a common way to try to determine if the number of people who drive drunk is changing)

What standards of performance on the criteria must be reached for the program to be considered successful?

  • 80% of community residents will know about the program and its goals after the first year of the program
  • The number of people who use the "Safe Rides" taxis will increase by 20% in the first year
  • The percentage of people who report drinking and driving will decrease by 20% in the first year
  • The reported number of single car night time crashes will decrease by 10% in the program's first two years

What evidence will indicate performance on the criteria relative to the standards?

  • A random telephone survey will demonstrate community residents' knowledge of the program and changes in reported behavior
  • Logs from "Safe Rides" will tell how many people use their services
  • Information on single car night time crashes will be gathered from police records

What conclusions about program performance are justified based on the available evidence?

  • Are the changes we have seen in the level of drunk driving due to our efforts, or to something else?
  • Or, if there has been no change or too little change in behavior or outcomes: should Drive Smart change what it is doing, or have we just not waited long enough to see results?
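
The Drive Smart criteria, standards, and evidence above can also be written down as a small check that compares observed values with each standard. The Python sketch below is only one possible way to do this: the class and function names are hypothetical, and every number in EVIDENCE is invented for illustration, not a real result. Only the targets (80%, +20%, -20%, -10%) come from the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Standard:
    criterion: str
    kind: str      # "level" = reach a target value; "change" = percent change from baseline
    target: float  # target level, or target percent change (negative means a decrease)


def percent_change(baseline: float, observed: float) -> float:
    """Percent change from the baseline measurement to the later measurement."""
    return (observed - baseline) / baseline * 100.0


def meets(standard: Standard, observed: float, baseline: Optional[float] = None) -> bool:
    """Return True if the observed evidence satisfies the standard."""
    if standard.kind == "level":
        return observed >= standard.target
    if baseline is None:
        raise ValueError("a baseline is required for change standards")
    change = percent_change(baseline, observed)
    # An increase target must be reached or exceeded; a decrease target must be reached or beaten.
    return change >= standard.target if standard.target >= 0 else change <= standard.target


# Standards taken from the Drive Smart example.
STANDARDS = [
    Standard("awareness_percent", "level", 80.0),
    Standard("safe_rides_users", "change", 20.0),
    Standard("self_reported_drunk_driving_percent", "change", -20.0),
    Standard("single_car_night_crashes", "change", -10.0),
]

# Hypothetical evidence as (observed, baseline). These values are invented.
EVIDENCE = {
    "awareness_percent": (72.0, None),                    # telephone survey
    "safe_rides_users": (640.0, 500.0),                   # "Safe Rides" logs
    "self_reported_drunk_driving_percent": (11.0, 15.0),  # telephone survey
    "single_car_night_crashes": (37.0, 40.0),             # police records
}

results = {s.criterion: meets(s, *EVIDENCE[s.criterion]) for s in STANDARDS}
for criterion, ok in results.items():
    print(f"{criterion}: {'standard met' if ok else 'standard not met'}")
```

A check like this only reports whether a standard was reached. It cannot answer the last question in the list, whether any change was due to the program or to something else; that still calls for judgment and an appropriate evaluation design.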

The following framework provides an organized approach to answer these questions.

A framework for program evaluation

Program evaluation offers a way to understand and improve community health and development practice using methods that are useful, feasible, proper, and accurate. The framework described below is a practical, non-prescriptive tool that summarizes the important elements of program evaluation in a logical order.

The framework contains two related dimensions:

  • Steps in evaluation practice, and
  • Standards for "good" evaluation.

The six connected steps of the framework are actions that should be a part of any evaluation. Although in practice the steps may be encountered out of order, it will usually make sense to follow them in the recommended sequence. That's because earlier steps provide the foundation for subsequent progress. Thus, decisions about how to carry out a given step should not be finalized until prior steps have been thoroughly addressed.

However, these steps are meant to be adaptable, not rigid. Sensitivity to each program's unique context (for example, the program's history and organizational climate) is essential for sound evaluation. The steps are intended to serve as starting points around which community organizations can tailor an evaluation to best meet their needs.

  • Engage stakeholders
  • Describe the program
  • Focus the evaluation design
  • Gather credible evidence
  • Justify conclusions
  • Ensure use and share lessons learned

Understanding and adhering to these basic steps will improve most evaluation efforts.

The second part of the framework is a basic set of standards to assess the quality of evaluation activities. There are 30 specific standards, organized into the following four groups:

  • Utility
  • Feasibility
  • Propriety
  • Accuracy

These standards help answer the question, "Will this evaluation be a 'good' evaluation?" They are recommended as the initial criteria by which to judge the quality of the program evaluation efforts.

Engage Stakeholders

Stakeholders are people or organizations that have something to gain or lose from what will be learned from an evaluation, and from what will be done with that knowledge. Evaluation cannot be done in isolation. Almost everything done in community health and development work involves partnerships: alliances among different organizations, board members, those affected by the problem, and others. Therefore, any serious effort to evaluate a program must consider the different values held by the partners. Stakeholders must be part of the evaluation to ensure that their unique perspectives are understood. When stakeholders are not appropriately involved, evaluation findings are likely to be ignored, criticized, or resisted.

However, if they are part of the process, people are likely to feel a good deal of ownership for the evaluation process and results. They will probably want to develop it, defend it, and make sure that the evaluation really works.

That's why this evaluation cycle begins by engaging stakeholders. Once involved, these people will help to carry out each of the steps that follows.

Three principal groups of stakeholders are important to involve:

  • People or organizations involved in program operations may include community members, sponsors, collaborators, coalition partners, funding officials, administrators, managers, and staff.
  • People or organizations served or affected by the program may include clients, family members, neighborhood organizations, academic institutions, elected and appointed officials, advocacy groups, and community residents. Individuals who are openly skeptical of or antagonistic toward the program may also be important to involve. Opening an evaluation to opposing perspectives and enlisting the help of potential program opponents can strengthen the evaluation's credibility.

Likewise, individuals or groups who could be adversely or inadvertently affected by changes arising from the evaluation have a right to be engaged. For example, it is important to include those who would be affected if program services were expanded, altered, limited, or ended as a result of the evaluation.

  • Primary intended users of the evaluation are the specific individuals who are in a position to decide and/or do something with the results. They shouldn't be confused with primary intended users of the program, although some of them should be involved in this group. In fact, primary intended users should be a subset of all of the stakeholders who have been identified. A successful evaluation will designate primary intended users, such as program staff and funders, early in its development and maintain frequent interaction with them to be sure that the evaluation specifically addresses their values and needs.

The amount and type of stakeholder involvement will be different for each program evaluation. For instance, stakeholders can be directly involved in designing and conducting the evaluation. Alternatively, they can be kept informed about the progress of the evaluation through periodic meetings, reports, and other means of communication.

It may be helpful, when working with a group such as this, to develop an explicit process to share power and resolve conflicts. This may help avoid overemphasis of values held by any specific stakeholder.

Describe the Program

A program description is a summary of the intervention being evaluated. It should explain what the program is trying to accomplish and how it tries to bring about those changes. The description will also illustrate the program's core components and elements, its ability to make changes, its stage of development, and how the program fits into the larger organizational and community environment.

How a program is described sets the frame of reference for all future decisions about its evaluation. For example, if a program is described as, "attempting to strengthen enforcement of existing laws that discourage underage drinking," the evaluation might be very different than if it is described as, "a program to reduce drunk driving by teens." Also, the description allows members of the group to compare the program to other similar efforts, and it makes it easier to figure out what parts of the program brought about what effects.

Moreover, different stakeholders may have different ideas about what the program is supposed to achieve and why. For example, a program to reduce teen pregnancy may have some members who believe this means only increasing access to contraceptives, and other members who believe it means only focusing on abstinence.

Evaluations done without agreement on the program definition aren't likely to be very useful. In many cases, the process of working with stakeholders to develop a clear and logical program description will bring benefits long before data are available to measure program effectiveness.

There are several specific aspects that should be included when describing a program.

Statement of need

A statement of need describes the problem, goal, or opportunity that the program addresses; it also begins to imply what the program will do in response. Important features to note regarding a program's need are: the nature of the problem or goal, who is affected, how big it is, and whether (and how) it is changing.


Expectations

Expectations are the program's intended results. They describe what the program has to accomplish to be considered successful. For most programs, the accomplishments exist on a continuum (first, we want to accomplish X... then, we want to do Y...). Therefore, they should be organized by time ranging from specific (and immediate) to broad (and longer-term) consequences. For example, a program's vision, mission, goals, and objectives all represent varying levels of specificity about a program's expectations.

Activities

Activities are everything the program does to bring about changes. Describing program components and elements permits specific strategies and actions to be listed in logical sequence. This also shows how different program activities, such as education and enforcement, relate to one another. Describing program activities also provides an opportunity to distinguish activities that are the direct responsibility of the program from those that are conducted by related programs or partner organizations. Things outside of the program that may affect its success, such as harsher laws punishing businesses that sell alcohol to minors, can also be noted.

Resources

Resources include the time, talent, equipment, information, money, and other assets available to conduct program activities. Reviewing the resources a program has tells a lot about the amount and intensity of its services. It may also point out situations where there is a mismatch between what the group wants to do and the resources available to carry out these activities. Understanding program costs is necessary to assess the cost-benefit ratio as part of the evaluation.

Stage of development

A program's stage of development reflects its maturity. All community health and development programs mature and change over time. People who conduct evaluations, as well as those who use their findings, need to consider the dynamic nature of programs. For example, a new program that just received its first grant may differ in many respects from one that has been running for over a decade.

At least three phases of development are commonly recognized: planning, implementation, and effects or outcomes. In the planning stage, program activities are untested and the goal of evaluation is to refine plans as much as possible. In the implementation phase, program activities are being field tested and modified; the goal of evaluation is to see what happens in the "real world" and to improve operations. In the effects stage, enough time has passed for the program's effects to emerge; the goal of evaluation is to identify and understand the program's results, including those that were unintentional.

Context

A description of the program's context considers the important features of the environment in which the program operates. This includes understanding the area's history, geography, politics, and social and economic conditions, and also what other organizations have done. A realistic and responsive evaluation is sensitive to a broad range of potential influences on the program. An understanding of the context lets users interpret findings accurately and assess their generalizability. For example, a program to improve housing in an inner-city neighborhood might have been a tremendous success, but would likely not work in a small town on the other side of the country without significant adaptation.

Logic model

A logic model synthesizes the main program elements into a picture of how the program is supposed to work. It makes explicit the sequence of events that are presumed to bring about change. Often this logic is displayed in a flow-chart, map, or table to portray the sequence of steps leading to program results.

Creating a logic model allows stakeholders to improve and focus program direction. It reveals assumptions about conditions for program effectiveness and provides a frame of reference for one or more evaluations of the program. A detailed logic model can also be a basis for estimating the program's effect on endpoints that are not directly measured. For example, it may be possible to estimate the rate of reduction in disease from a known number of persons experiencing the intervention if there is prior knowledge about its effectiveness.
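
As a rough illustration of estimating an endpoint that is not directly measured, the Python sketch below multiplies the number of persons reached by an assumed baseline risk and an assumed effectiveness. This simple multiplicative model is an assumption made here for illustration, not a method prescribed by the framework; the function name is hypothetical and all input values are invented.

```python
def estimated_cases_averted(persons_reached: int,
                            baseline_risk: float,
                            relative_risk_reduction: float) -> float:
    """Estimate cases averted under a simple multiplicative assumption.

    persons_reached: known number of persons experiencing the intervention
    baseline_risk: assumed probability of disease without the intervention (0 to 1)
    relative_risk_reduction: assumed effectiveness from prior knowledge (0 to 1)
    """
    if persons_reached < 0:
        raise ValueError("persons_reached must not be negative")
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be between 0 and 1")
    if not 0.0 <= relative_risk_reduction <= 1.0:
        raise ValueError("relative_risk_reduction must be between 0 and 1")
    return persons_reached * baseline_risk * relative_risk_reduction


# Invented inputs: 2,000 people reached, 5% baseline risk, 40% effectiveness.
averted = estimated_cases_averted(2000, 0.05, 0.40)
print(f"Estimated cases averted: {averted:.0f}")
```

An estimate like this is only as good as the prior knowledge behind the effectiveness figure, so it is best reported together with the assumptions used to produce it.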

The breadth and depth of a program description will vary for each program evaluation. And so, many different activities may be part of developing that description. For instance, multiple sources of information could be pulled together to construct a well-rounded description. The accuracy of an existing program description could be confirmed through discussion with stakeholders. Descriptions of what's going on could be checked against direct observation of activities in the field. A narrow program description could be fleshed out by addressing contextual factors (such as staff turnover, inadequate resources, political pressures, or strong community participation) that may affect program performance.

Focus the Evaluation Design

By focusing the evaluation design, we mean doing advance planning about where the evaluation is headed, and what steps it will take to get there. It isn't possible or useful for an evaluation to try to answer all questions for all stakeholders; there must be a focus. A well-focused plan is a safeguard against using time and resources inefficiently.

Depending on what you want to learn, some types of evaluation will be better suited than others. However, once data collection begins, it may be difficult or impossible to change what you are doing, even if it becomes obvious that other methods would work better. A thorough plan anticipates intended uses and creates an evaluation strategy with the greatest chance to be useful, feasible, proper, and accurate.

Among the issues to consider when focusing an evaluation are:

Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the design, methods, and use of the evaluation. Taking time to articulate an overall purpose will stop your organization from making uninformed decisions about how the evaluation should be conducted and used.

There are at least four general purposes for which a community group might conduct an evaluation:

  • To gain insight. This happens, for example, when deciding whether to use a new approach (e.g., would a neighborhood watch program work for our community?). Knowledge from such an evaluation will provide information about its practicality. For a developing program, information from evaluations of similar programs can provide the insight needed to clarify how its activities should be designed.
  • To improve how things get done. This is appropriate in the implementation stage when an established program tries to describe what it has done. This information can be used to describe program processes, to improve how the program operates, and to fine-tune the overall strategy. Evaluations done for this purpose include efforts to improve the quality, effectiveness, or efficiency of program activities.
  • To determine what the effects of the program are. Evaluations done for this purpose examine the relationship between program activities and observed consequences. For example, are more students finishing high school as a result of the program? Programs most appropriate for this type of evaluation are mature programs that are able to state clearly what happened and who it happened to. Such evaluations should provide evidence about what the program's contribution was to reaching longer-term goals such as a decrease in child abuse or crime in the area. This type of evaluation helps establish the accountability, and thus, the credibility, of a program to funders and to the community.
  • To affect participants. Taking part in an evaluation can itself influence the people involved. An evaluation done for this purpose may:
      • Empower program participants (for example, being part of an evaluation can increase community members' sense of control over the program);
      • Supplement the program (for example, using a follow-up questionnaire can reinforce the main messages of the program);
      • Promote staff development (for example, by teaching staff how to collect, analyze, and interpret evidence); or
      • Contribute to organizational growth (for example, the evaluation may clarify how the program relates to the organization's mission).

Users are the specific individuals who will receive evaluation findings. They will directly experience the consequences of inevitable trade-offs in the evaluation process. For example, a trade-off might be having a relatively modest evaluation to fit the budget with the outcome that the evaluation results will be less certain than they would be for a full-scale evaluation. Because they will be affected by these tradeoffs, intended users have a right to participate in choosing a focus for the evaluation. An evaluation designed without adequate user involvement in selecting the focus can become a misguided and irrelevant exercise. By contrast, when users are encouraged to clarify intended uses, priority questions, and preferred methods, the evaluation is more likely to focus on things that will inform (and influence) future actions.

Uses describe what will be done with what is learned from the evaluation. There is a wide range of potential uses for program evaluation. Generally speaking, the uses fall in the same four categories as the purposes listed above: to gain insight, improve how things get done, determine what the effects of the program are, and affect participants. The following list gives examples of uses in each category.

Some specific examples of evaluation uses

To gain insight:

  • Assess needs and wants of community members
  • Identify barriers to use of the program
  • Learn how to best describe and measure program activities

To improve how things get done:

  • Refine plans for introducing a new practice
  • Determine the extent to which plans were implemented
  • Improve educational materials
  • Enhance cultural competence
  • Verify that participants' rights are protected
  • Set priorities for staff training
  • Make mid-course adjustments
  • Clarify communication
  • Determine if client satisfaction can be improved
  • Compare costs to benefits
  • Find out which participants benefit most from the program
  • Mobilize community support for the program

To determine what the effects of the program are:

  • Assess skills development by program participants
  • Compare changes in behavior over time
  • Decide where to allocate new resources
  • Document the level of success in accomplishing objectives
  • Demonstrate that accountability requirements are fulfilled
  • Use information from multiple evaluations to predict the likely effects of similar programs

To affect participants:

  • Reinforce messages of the program
  • Stimulate dialogue and raise awareness about community issues
  • Broaden consensus among partners about program goals
  • Teach evaluation skills to staff and other stakeholders
  • Gather success stories
  • Support organizational change and improvement

The evaluation needs to answer specific questions. Drafting questions encourages stakeholders to reveal what they believe the evaluation should answer. That is, what questions are most important to stakeholders? The process of developing evaluation questions further refines the focus of the evaluation.

The methods available for an evaluation are drawn from behavioral science and social research and development. Three types of methods are commonly recognized: experimental, quasi-experimental, and observational or case study designs. Experimental designs use random assignment to compare the effect of an intervention between otherwise equivalent groups (for example, comparing a randomly assigned group of students who took part in an after-school reading program with those who didn't). Quasi-experimental methods make comparisons between groups that aren't equivalent (e.g., program participants vs. those on a waiting list) or use comparisons within a group over time, such as in an interrupted time series in which the intervention may be introduced sequentially across different individuals, groups, or contexts. Observational or case study methods use comparisons within a group to describe and explain what happens (e.g., comparative case studies with multiple communities).
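As a rough illustration of what "random assignment" means in an experimental design, the sketch below splits a roster into an intervention group and a comparison group purely by chance. The roster, the function name `randomly_assign`, and the fixed seed are hypothetical choices made for this example; they are not part of the framework.

```python
import random


def randomly_assign(participants, seed=None):
    """Split participants into two groups by chance alone.

    Returns (intervention_group, comparison_group). A seed is accepted
    only so that the assignment can be reproduced and audited later.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original roster is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical roster of 20 students eligible for an after-school reading program.
students = [f"student_{i}" for i in range(1, 21)]
intervention, comparison = randomly_assign(students, seed=42)

print(len(intervention), "students assigned to the program")
print(len(comparison), "students assigned to the comparison group")
```

Because chance alone decides who lands in each group, differences observed later are less likely to reflect pre-existing differences between the groups. A waiting-list comparison, by contrast, does not offer that protection, which is why it counts as quasi-experimental.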

No design is necessarily better than another. Evaluation methods should be selected because they provide the appropriate information to answer stakeholders' questions, not because they are familiar, easy, or popular. The choice of methods has implications for what will count as evidence, how that evidence will be gathered, and what kind of claims can be made. Because each method option has its own biases and limitations, evaluations that mix methods are generally more robust.

Over the course of an evaluation, methods may need to be revised or modified. Circumstances that make a particular approach useful can change. For example, the intended use of the evaluation could shift from discovering how to improve the program to helping decide about whether the program should continue or not. Thus, methods may need to be adapted or redesigned to keep the evaluation on track.

Agreements summarize the evaluation procedures and clarify everyone's roles and responsibilities. An agreement describes how the evaluation activities will be implemented. Elements of an agreement include statements about the intended purpose, users, uses, and methods, as well as a summary of the deliverables, those responsible, a timeline, and budget.

The formality of the agreement depends upon the relationships that exist between those involved. For example, it may take the form of a legal contract, a detailed protocol, or a simple memorandum of understanding. Regardless of its formality, creating an explicit agreement provides an opportunity to verify the mutual understanding needed for a successful evaluation. It also provides a basis for modifying procedures if that turns out to be necessary.

As you can see, focusing the evaluation design may involve many activities. For instance, both supporters and skeptics of the program could be consulted to ensure that the proposed evaluation questions are politically viable. A menu of potential evaluation uses appropriate for the program's stage of development could be circulated among stakeholders to determine which is most compelling. Interviews could be held with specific intended users to better understand their information needs and timeline for action. Resource requirements could be reduced when users are willing to employ more timely but less precise evaluation methods.

Gather Credible Evidence

Credible evidence is the raw material of a good evaluation. The information learned should be seen by stakeholders as believable, trustworthy, and relevant to answer their questions. This requires thinking broadly about what counts as "evidence." Such decisions are always situational; they depend on the question being posed and the motives for asking it. For some questions, a stakeholder's standard for credibility could demand having the results of a randomized experiment. For another question, a set of well-done, systematic observations, such as interactions between an outreach worker and community residents, will have high credibility. The difference depends on what kind of information the stakeholders want and the situation in which it is gathered.

Context matters! In some situations, it may be necessary to consult evaluation specialists. This may be particularly true if concern for data quality is high. In other circumstances, local people may offer the deepest insights. Regardless of their expertise, however, those involved in an evaluation should strive to collect information that will convey a credible, well-rounded picture of the program and its efforts.

Having credible evidence strengthens the evaluation results as well as the recommendations that follow from them. Although all types of data have limitations, it is possible to improve an evaluation's overall credibility. One way to do this is by using multiple procedures for gathering, analyzing, and interpreting data. Encouraging participation by stakeholders can also enhance perceived credibility. When stakeholders help define questions and gather data, they will be more likely to accept the evaluation's conclusions and to act on its recommendations.

The following features of evidence gathering typically affect how credible the evidence is perceived to be: indicators, sources, quality, quantity, and logistics.

Indicators translate general concepts about the program and its expected effects into specific, measurable parts.

Examples of indicators include:

  • The program's capacity to deliver services
  • The participation rate
  • The level of client satisfaction
  • The amount of intervention exposure (how many people were exposed to the program, and for how long they were exposed)
  • Changes in participant behavior
  • Changes in community conditions or norms
  • Changes in the environment (e.g., new programs, policies, or practices)
  • Longer-term changes in population health status (e.g., estimated teen pregnancy rate in the county)

Indicators should address the criteria that will be used to judge the program. That is, they reflect the aspects of the program that are most meaningful to monitor. Several indicators are usually needed to track the implementation and effects of a complex program or intervention.

One way to develop multiple indicators is to create a "balanced scorecard," which contains indicators that are carefully selected to complement one another. According to this strategy, program processes and effects are viewed from multiple perspectives using small groups of related indicators. For instance, a balanced scorecard for a single program might include indicators of how the program is being delivered; what participants think of the program; what effects are observed; what goals were attained; and what changes are occurring in the environment around the program.

Another approach to using multiple indicators is based on a program logic model, such as the one discussed earlier in this section. A logic model can be used as a template to define a full spectrum of indicators along the pathway that leads from program activities to expected effects. For each step in the model, qualitative and/or quantitative indicators could be developed.

Indicators can be broad-based and don't need to focus only on a program's long-term goals. They can also address intermediary factors that influence program effectiveness, including such intangible factors as service quality, community capacity, or inter-organizational relations. Indicators for these and similar concepts can be created by systematically identifying and then tracking markers of what is said or done when the concept is expressed.

In the course of an evaluation, indicators may need to be modified or new ones adopted. Also, measuring program performance by tracking indicators is only one part of evaluation, and shouldn't be confused as a basis for decision making in itself. There are definite perils to using performance indicators as a substitute for completing the evaluation process and reaching fully justified conclusions. For example, an indicator, such as a rising rate of unemployment, may be falsely assumed to reflect a failing program when it may actually be due to changing environmental conditions that are beyond the program's control.

Sources of evidence in an evaluation may be people, documents, or observations. More than one source may be used to gather evidence for each indicator. In fact, selecting multiple sources provides an opportunity to include different perspectives about the program and enhances the evaluation's credibility. For instance, an inside perspective may be reflected by internal documents and comments from staff or program managers; whereas clients and those who do not support the program may provide different, but equally relevant perspectives. Mixing these and other perspectives provides a more comprehensive view of the program or intervention.

The criteria used to select sources should be clearly stated so that users and other stakeholders can interpret the evidence accurately and assess if it may be biased. In addition, some sources provide information in narrative form (for example, a person's experience when taking part in the program) and others are numerical (for example, how many people were involved in the program). The integration of qualitative and quantitative information can yield evidence that is more complete and more useful, thus meeting the needs and expectations of a wider range of stakeholders.

Quality refers to the appropriateness and integrity of information gathered in an evaluation. High-quality data are reliable and informative. They are easier to collect if the indicators have been well defined. Other factors that affect quality may include instrument design, data collection procedures, training of those involved in data collection, source selection, coding, data management, and routine error checking. Obtaining quality data will entail tradeoffs (e.g., breadth vs. depth); stakeholders should decide together what is most important to them. Because all data have limitations, the intent of a practical evaluation is to strive for a level of quality that meets the stakeholders' threshold for credibility.

Quantity refers to the amount of evidence gathered in an evaluation. It is necessary to estimate in advance the amount of information that will be required and to establish criteria to decide when to stop collecting data - to know when enough is enough. Quantity affects the level of confidence or precision users can have - how sure we are that what we've learned is true. It also partly determines whether the evaluation will be able to detect effects. All evidence collected should have a clear, anticipated use.
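To make the link between quantity and precision concrete, here is a minimal sketch of one common way to estimate in advance how many survey responses are needed to report a percentage within a chosen margin of error. It assumes a simple random sample, the normal approximation for a proportion, and a 95% confidence level (z = 1.96). None of these assumptions, nor the function names, come from the framework itself; an evaluation with a different design would need a different calculation.

```python
import math

Z_95 = 1.96  # assumed: z-value for a 95% confidence level


def required_sample_size(margin_of_error, expected_proportion=0.5, z=Z_95):
    """Responses needed so a surveyed proportion is within +/- margin_of_error.

    Uses n = z^2 * p * (1 - p) / e^2, rounded up. A proportion of 0.5 is
    the most cautious guess because it gives the largest sample size.
    """
    p = expected_proportion
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


def margin_of_error(sample_size, expected_proportion=0.5, z=Z_95):
    """Approximate margin of error achieved with a given number of responses."""
    p = expected_proportion
    return z * math.sqrt(p * (1 - p) / sample_size)


print(required_sample_size(0.05))  # within 5 percentage points -> 385
print(required_sample_size(0.10))  # within 10 percentage points -> 97
```

The comparison shows the trade-off stakeholders face: halving the margin of error roughly quadruples the number of responses needed, so agreeing on "how sure is sure enough" before data collection begins helps decide when to stop.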

By logistics, we mean the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations also have cultural preferences that dictate acceptable ways of asking questions and collecting information, including who would be perceived as an appropriate person to ask the questions. For example, some participants may be unwilling to discuss their behavior with a stranger, whereas others are more at ease with someone they don't know. Therefore, the techniques for gathering evidence in an evaluation must be in keeping with the cultural norms of the community. Data collection procedures should also ensure that confidentiality is protected.

Justify Conclusions

The process of justifying conclusions recognizes that evidence in an evaluation does not necessarily speak for itself. Evidence must be carefully considered from a number of different stakeholders' perspectives to reach conclusions that are well-substantiated and justified. Conclusions become justified when they are linked to the evidence gathered and judged against agreed-upon values set by the stakeholders. Stakeholders must agree that conclusions are justified in order to use the evaluation results with confidence.

The principal elements involved in justifying conclusions based on evidence are standards, analysis and synthesis, interpretation, judgments, and recommendations.

Standards reflect the values held by stakeholders about the program. They provide the basis to make program judgments. The use of explicit standards for judgment is fundamental to sound evaluation. In practice, when stakeholders articulate and negotiate their values, these become the standards to judge whether a given program's performance will, for instance, be considered "successful," "adequate," or "unsuccessful."

Analysis and synthesis

Analysis and synthesis are methods to discover and summarize an evaluation's findings. They are designed to detect patterns in evidence, either by isolating important findings (analysis) or by combining different sources of information to reach a larger understanding (synthesis). Mixed method evaluations require the separate analysis of each evidence element, as well as a synthesis of all sources to examine patterns that emerge. Deciphering facts from a given body of evidence involves deciding how to organize, classify, compare, and display information. These decisions are guided by the questions being asked, the types of data available, and especially by input from stakeholders and primary intended users.


Interpretation is the effort to figure out what the findings mean. Uncovering facts about a program's performance isn't enough to draw conclusions. The facts must be interpreted to understand their practical significance. For example, the statement "15% of the people in our area witnessed a violent act last year" may be interpreted differently depending on the situation. If 50% of community members surveyed five years ago had witnessed a violent act in the previous year, the group can suggest that, while violence is still a problem, things are getting better in the community. However, if five years ago only 7% of those surveyed said the same thing, community organizations may see this as a sign that they might want to change what they are doing. In short, interpretations draw on information and perspectives that stakeholders bring to the evaluation. They can be strengthened through active participation or interaction with the data and preliminary explanations of what happened.
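The arithmetic behind this example can be sketched in a few lines. The figures (15%, 50%, and 7%) come from the text; the function names and the wording of the labels are illustrative assumptions, and the labels assume that a lower rate of witnessed violence is the desired direction.

```python
def change_in_points(current_pct, baseline_pct):
    """Return the change from baseline, in percentage points."""
    return current_pct - baseline_pct


def interpret(current_pct, baseline_pct):
    """Describe the trend, assuming a lower rate is better."""
    change = change_in_points(current_pct, baseline_pct)
    if change < 0:
        return f"down {abs(change)} points from baseline: improving"
    if change > 0:
        return f"up {change} points from baseline: worsening"
    return "unchanged from baseline"


current = 15  # percent who witnessed a violent act last year

print(interpret(current, 50))  # baseline of 50% five years ago
print(interpret(current, 7))   # baseline of 7% five years ago
```

The same finding produces opposite readings depending on the baseline, which is why a number on its own rarely settles what a program should do next.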

Judgments are statements about the merit, worth, or significance of the program. They are formed by comparing the findings and their interpretations against one or more selected standards. Because multiple standards can be applied to a given program, stakeholders may reach different or even conflicting judgments. For instance, a program that increases its outreach by 10% from the previous year may be judged positively by program managers, based on standards of improved performance over time. Community members, however, may feel that despite improvements, a minimum threshold of access to services has still not been reached. Their judgment, based on standards of social equity, would therefore be negative. Conflicting claims about a program's quality, value, or importance often indicate that stakeholders are using different standards or values in making judgments. This type of disagreement can be a catalyst to clarify values and to negotiate the appropriate basis (or bases) on which the program should be judged.
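One way to picture how a single set of findings can yield conflicting judgments is to write each standard down as an explicit rule and apply every rule to the same findings. In the sketch below, the 10% increase in outreach comes from the example above; the access figure of 44% and the 60% minimum threshold are hypothetical numbers invented for illustration, as are all of the names used.

```python
# Findings shared by all stakeholders. Only the 10% figure is from the text;
# the access coverage value is a hypothetical number for illustration.
findings = {
    "outreach_change_pct": 10.0,
    "access_coverage_pct": 44.0,
}

# Each standard is a named rule that returns True when the standard is met.
# The 60% threshold is an assumed value, not one given by the framework.
standards = {
    "improved performance over time": lambda f: f["outreach_change_pct"] > 0,
    "social equity (minimum access threshold)": lambda f: f["access_coverage_pct"] >= 60.0,
}


def judge(findings, standards):
    """Apply every standard to the same findings and label each result."""
    return {
        name: "positive" if rule(findings) else "negative"
        for name, rule in standards.items()
    }


judgments = judge(findings, standards)
for name, verdict in judgments.items():
    print(f"{name}: {verdict}")
```

Writing the standards out in this way does not resolve the disagreement, but it makes visible that the stakeholders differ on the standard being applied rather than on the facts.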


Recommendations are actions to consider as a result of the evaluation. Forming recommendations requires information beyond just what is necessary to form judgments. For example, knowing that a program is able to increase the services available to battered women doesn't necessarily translate into a recommendation to continue the effort, particularly when there are competing priorities or other effective alternatives. Thus, recommendations about what to do with a given intervention go beyond judgments about a specific program's effectiveness.

If recommendations aren't supported by enough evidence, or if they aren't in keeping with stakeholders' values, they can undermine an evaluation's credibility. By contrast, an evaluation can be strengthened by recommendations that anticipate and react to what users will want to know.

Three things might increase the chances that recommendations will be relevant and well-received:

  • Sharing draft recommendations
  • Soliciting reactions from multiple stakeholders
  • Presenting options instead of directive advice

Justifying conclusions in an evaluation is a process that involves different possible steps. For instance, conclusions could be strengthened by searching for explanations other than the ones you have chosen, and then showing why they are unsupported by the evidence. When there are different but equally well-supported conclusions, each could be presented with a summary of its strengths and weaknesses. Techniques to analyze, synthesize, and interpret findings might be agreed upon before data collection begins.

Ensure Use and Share Lessons Learned

It is naive to assume that lessons learned in an evaluation will necessarily be used in decision making and subsequent action. Deliberate effort on the part of evaluators is needed to ensure that the evaluation findings will be used appropriately. Preparing for their use involves strategic thinking and continued vigilance in looking for opportunities to communicate and influence. Both of these should begin in the earliest stages of the process and continue throughout the evaluation.

The key elements for ensuring that the recommendations from an evaluation are used are design, preparation, feedback, follow-up, and dissemination.

Design refers to how the evaluation's questions, methods, and overall processes are constructed. As discussed in the third step of this framework (focusing the evaluation design), the evaluation should be organized from the start to achieve specific agreed-upon uses. Having a clear purpose that is focused on the use of what is learned helps those who will carry out the evaluation to know who will do what with the findings. Furthermore, the process of creating a clear design will highlight ways that stakeholders, through their many contributions, can improve the evaluation and facilitate the use of the results.


Preparation refers to the steps taken to get ready for the future uses of the evaluation findings. The ability to translate new knowledge into appropriate action is a skill that can be strengthened through practice. In fact, building this skill can itself be a useful benefit of the evaluation. It is possible to prepare stakeholders for future use of the results by discussing how potential findings might affect decision making.

For example, primary intended users and other stakeholders could be given a set of hypothetical results and asked what decisions or actions they would make on the basis of this new knowledge. If they indicate that the evidence presented is incomplete or irrelevant and that no action would be taken, then this is an early warning sign that the planned evaluation should be modified. Preparing for use also gives stakeholders more time to explore both positive and negative implications of potential results and to identify different options for program improvement.

Feedback is the communication that occurs among everyone involved in the evaluation. Giving and receiving feedback creates an atmosphere of trust among stakeholders; it keeps an evaluation on track by keeping everyone informed about how the evaluation is proceeding. Primary intended users and other stakeholders have a right to comment on evaluation decisions. From a standpoint of ensuring use, stakeholder feedback is a necessary part of every step in the evaluation. Obtaining valuable feedback can be encouraged by holding discussions during each step of the evaluation and routinely sharing interim findings, provisional interpretations, and draft reports.

Follow-up refers to the support that many users need during the evaluation and after they receive evaluation findings. Because of the amount of effort required, reaching justified conclusions in an evaluation can seem like an end in itself. It is not. Active follow-up may be necessary to remind users of the intended uses of what has been learned. Follow-up may also be required to stop lessons learned from becoming lost or ignored in the process of making complex or political decisions. To guard against such oversight, it may be helpful to have someone involved in the evaluation serve as an advocate for the evaluation's findings during the decision-making phase.

Facilitating the use of evaluation findings also carries with it the responsibility to prevent misuse. Evaluation results are always bounded by the context in which the evaluation was conducted. Some stakeholders, however, may be tempted to take results out of context or to use them for purposes other than those for which they were developed. For instance, over-generalizing the results from a single case study to make decisions that affect all sites in a national program is a misuse of a case study evaluation.

Similarly, program opponents may misuse results by overemphasizing negative findings without giving proper credit for what has worked. Active follow-up can help to prevent these and other forms of misuse by ensuring that evidence is only applied to the questions that were the central focus of the evaluation.


Dissemination is the process of communicating the procedures or the lessons learned from an evaluation to relevant audiences in a timely, unbiased, and consistent fashion. Like other elements of the evaluation, the reporting strategy should be discussed in advance with intended users and other stakeholders. Planning effective communications also requires considering the timing, style, tone, message source, vehicle, and format of information products. Regardless of how communications are constructed, the goal for dissemination is to achieve full disclosure and impartial reporting.

Along with the uses for evaluation findings, there are also uses that flow from the very process of evaluating. These "process uses" should be encouraged. The people who take part in an evaluation can experience profound changes in beliefs and behavior. For instance, an evaluation challenges staff members to act differently in what they are doing, and to question assumptions that connect program activities with intended effects.

Evaluation also prompts staff to clarify their understanding of the goals of the program. This greater clarity, in turn, helps staff members to better function as a team focused on a common end. In short, immersion in the logic, reasoning, and values of evaluation can have very positive effects, such as basing decisions on systematic judgments instead of on unfounded assumptions.

Additional process uses for evaluation include:

  • Clarifying what really matters to stakeholders by defining indicators
  • Making outcomes matter by changing the reinforcements connected with achieving positive results. For example, a funder might offer "bonus grants" or "outcome dividends" to a program that has shown a significant amount of community change and improvement.

Standards for "good" evaluation

There are standards to assess whether all of the parts of an evaluation are well-designed and working to their greatest potential. The Joint Committee on Standards for Educational Evaluation developed "The Program Evaluation Standards" for this purpose. These standards, designed to assess evaluations of educational programs, are also relevant for programs and interventions related to community health and development.

The program evaluation standards make it practical to conduct sound and fair evaluations. They offer well-supported principles to follow when faced with having to make tradeoffs or compromises. Attending to the standards can guard against an imbalanced evaluation, such as one that is accurate and feasible, but isn't very useful or sensitive to the context. Another example of an imbalanced evaluation is one that would be genuinely useful, but is impossible to carry out.

The following standards can be applied while developing an evaluation design and throughout the course of its implementation. Remember, the standards are written as guiding principles, not as rigid rules to be followed in all situations.

The 30 specific standards are grouped into four categories: utility, feasibility, propriety, and accuracy.

The utility standards are:

  • Stakeholder Identification: People who are involved in (or will be affected by) the evaluation should be identified, so that their needs can be addressed.
  • Evaluator Credibility: The people conducting the evaluation should be both trustworthy and competent, so that the evaluation will be generally accepted as credible or believable.
  • Information Scope and Selection: Information collected should address pertinent questions about the program, and it should be responsive to the needs and interests of clients and other specified stakeholders.
  • Values Identification: The perspectives, procedures, and rationale used to interpret the findings should be carefully described, so that the bases for judgments about merit and value are clear.
  • Report Clarity: Evaluation reports should clearly describe the program being evaluated, including its context, and the purposes, procedures, and findings of the evaluation. This will help ensure that essential information is provided and easily understood.
  • Report Timeliness and Dissemination: Significant midcourse findings and evaluation reports should be shared with intended users so that they can be used in a timely fashion.
  • Evaluation Impact: Evaluations should be planned, conducted, and reported in ways that encourage follow-through by stakeholders, so that the evaluation will be used.

Feasibility Standards

The feasibility standards are to ensure that the evaluation makes sense - that the steps that are planned are both viable and pragmatic.

The feasibility standards are:

  • Practical Procedures: The evaluation procedures should be practical, to keep disruption of everyday activities to a minimum while needed information is obtained.
  • Political Viability: The evaluation should be planned and conducted with anticipation of the different positions or interests of various groups. This should help in obtaining their cooperation so that possible attempts by these groups to curtail evaluation operations or to misuse the results can be avoided or counteracted.
  • Cost Effectiveness: The evaluation should be efficient and produce enough valuable information that the resources used can be justified.

Propriety Standards

The propriety standards ensure that the evaluation is an ethical one, conducted with regard for the rights and interests of those involved. The eight propriety standards follow.

  • Service Orientation: Evaluations should be designed to help organizations effectively serve the needs of all of the targeted participants.
  • Formal Agreements: The responsibilities in an evaluation (what is to be done, how, by whom, when) should be agreed to in writing, so that those involved are obligated to follow all conditions of the agreement, or to formally renegotiate it.
  • Rights of Human Subjects: Evaluation should be designed and conducted to respect and protect the rights and welfare of human subjects, that is, all participants in the study.
  • Human Interactions: Evaluators should respect basic human dignity and worth when working with other people in an evaluation, so that participants don't feel threatened or harmed.
  • Complete and Fair Assessment: The evaluation should be complete and fair in its examination, recording both strengths and weaknesses of the program being evaluated. This allows strengths to be built upon and problem areas addressed.
  • Disclosure of Findings: The people working on the evaluation should ensure that all of the evaluation findings, along with the limitations of the evaluation, are accessible to everyone affected by the evaluation, and any others with expressed legal rights to receive the results.
  • Conflict of Interest: Conflict of interest should be dealt with openly and honestly, so that it does not compromise the evaluation processes and results.
  • Fiscal Responsibility: The evaluator's use of resources should reflect sound accountability procedures and otherwise be prudent and ethically responsible, so that expenditures are accounted for and appropriate.

Accuracy Standards

The accuracy standards ensure that the evaluation findings are considered correct.

There are 12 accuracy standards:

  • Program Documentation: The program should be described and documented clearly and accurately, so that what is being evaluated is clearly identified.
  • Context Analysis: The context in which the program exists should be thoroughly examined so that likely influences on the program can be identified.
  • Described Purposes and Procedures: The purposes and procedures of the evaluation should be monitored and described in enough detail that they can be identified and assessed.
  • Defensible Information Sources: The sources of information used in a program evaluation should be described in enough detail that the adequacy of the information can be assessed.
  • Valid Information: The information gathering procedures should be chosen or developed and then implemented in such a way that they will assure that the interpretation arrived at is valid.
  • Reliable Information: The information gathering procedures should be chosen or developed and then implemented so that they will assure that the information obtained is sufficiently reliable.
  • Systematic Information: The information from an evaluation should be systematically reviewed and any errors found should be corrected.
  • Analysis of Quantitative Information: Quantitative information - data from observations or surveys - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Analysis of Qualitative Information: Qualitative information - descriptive information from interviews and other sources - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Justified Conclusions: The conclusions reached in an evaluation should be explicitly justified, so that stakeholders can understand their worth.
  • Impartial Reporting: Reporting procedures should guard against the distortion caused by personal feelings and biases of people involved in the evaluation, so that evaluation reports fairly reflect the evaluation findings.
  • Metaevaluation: The evaluation itself should be evaluated against these and other pertinent standards, so that it is appropriately guided and, on completion, stakeholders can closely examine its strengths and weaknesses.

Applying the framework: Conducting optimal evaluations

There is ever-increasing agreement on the worth of evaluation; in fact, evaluation is often required by funders and other constituents. So, community health and development professionals can no longer question whether or not to evaluate their programs. Instead, the appropriate questions are:

  • What is the best way to evaluate?
  • What are we learning from the evaluation?
  • How will we use what we learn to become more effective?

The framework for program evaluation helps answer these questions by guiding users to select evaluation strategies that are useful, feasible, proper, and accurate.

Using this framework requires considerable skill in program evaluation. In most cases there are multiple stakeholders to consider, the political context may be divisive, steps don't always follow a logical order, and limited resources may make it difficult to take a preferred course of action. An evaluator's challenge is to devise an optimal strategy, given the conditions she is working under. An optimal strategy is one that accomplishes each step in the framework in a way that takes into account the program context and is able to meet or exceed the relevant standards.

This framework also makes it possible to respond to common concerns about program evaluation. For instance, many evaluations are not undertaken because they are seen as being too expensive. The cost of an evaluation, however, is relative; it depends upon the question being asked and the level of certainty desired for the answer. A simple, low-cost evaluation can deliver information valuable for understanding and improvement.
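To make the link between cost and certainty concrete, here is a minimal sketch that is not part of the framework itself. It assumes the evaluation question can be answered with a simple survey estimate of a proportion, and it uses the standard textbook sample-size formula for a proportion. The function name `survey_sample_size` and the example margins of error are illustrative assumptions.

```python
import math


def survey_sample_size(margin_of_error, z_score=1.96, expected_proportion=0.5):
    """Approximate number of survey responses needed to estimate a proportion.

    Uses the textbook formula n = z^2 * p * (1 - p) / e^2 for a large
    population. z_score=1.96 corresponds to roughly 95% confidence, and
    expected_proportion=0.5 is the most conservative assumption.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    p = expected_proportion
    n = (z_score ** 2) * p * (1 - p) / (margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical comparison: a rough answer versus a more certain one.
rough_estimate = survey_sample_size(margin_of_error=0.10)
tighter_estimate = survey_sample_size(margin_of_error=0.05)
print(rough_estimate, tighter_estimate)
```

Under these assumptions, halving the margin of error roughly quadruples the number of responses needed, which is one reason a question that tolerates less certainty can be answered at much lower cost.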

Rather than discounting evaluations as a time-consuming sideline, the framework encourages evaluations that are timed strategically to provide necessary feedback. This makes it possible to link evaluation closely with everyday practices.

Another concern centers on the perceived technical demands of designing and conducting an evaluation. However, the practical approach endorsed by this framework focuses on questions that can improve the program.

Finally, the prospect of evaluation troubles many staff members because they perceive evaluation methods as punishing ("They just want to show what we're doing wrong."), exclusionary ("Why aren't we part of it? We're the ones who know what's going on."), and adversarial ("It's us against them."). The framework instead encourages an evaluation approach that is designed to be helpful and engages all interested stakeholders in a process that welcomes their participation.

Evaluation is a powerful strategy for distinguishing programs and interventions that make a difference from those that don't. It is a driving force for developing and adapting sound strategies, improving existing programs, and demonstrating the results of investments in time and other resources. It also helps determine if what is being done is worth the cost.

This recommended framework for program evaluation is both a synthesis of existing best practices and a set of standards for further improvement. It supports a practical approach to evaluation based on steps and standards that can be applied in almost any setting. Because the framework is purposefully general, it provides a stable guide for designing and conducting a wide range of evaluation efforts in a variety of specific program areas. The framework can be used as a template to create useful evaluation plans that contribute to understanding and improvement.

The Magenta Book - Guidance for Evaluation provides additional information on requirements for good evaluation, and some straightforward steps to make a good evaluation of an intervention more feasible.

Are You Ready to Evaluate your Coalition? poses 15 questions to help your group decide whether your coalition is ready to evaluate itself and its work.

The  American Evaluation Association Guiding Principles for Evaluators  helps guide evaluators in their professional practice.

CDC Evaluation Resources  provides a list of resources for evaluation, as well as links to professional associations and journals.

Chapter 11: Community Interventions in the "Introduction to Community Psychology" explains professionally-led versus grassroots interventions, what it means for a community intervention to be effective, why a community needs to be ready for an intervention, and the steps to implementing community interventions.

The Comprehensive Cancer Control Branch Program Evaluation Toolkit is designed to help grantees plan and implement evaluations of their NCCCP-funded programs. This toolkit provides general guidance on evaluation principles and techniques, as well as practical templates and tools.

Developing an Effective Evaluation Plan  is a workbook provided by the CDC. In addition to information on designing an evaluation plan, this book also provides worksheets as a step-by-step guide.

EvaluACTION, from the CDC, is designed for people interested in learning about program evaluation and how to apply it to their work. Evaluation is a process, one dependent on what you're currently doing and on the direction in which you'd like to go. In addition to providing helpful information, the site also features an interactive Evaluation Plan & Logic Model Builder, so you can create customized tools for your organization to use.

Evaluating Your Community-Based Program  is a handbook designed by the American Academy of Pediatrics covering a variety of topics related to evaluation.

GAO Designing Evaluations  is a handbook provided by the U.S. Government Accountability Office with copious information regarding program evaluations.

The CDC's Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide is a "how-to" guide for planning and implementing evaluation activities. The manual, based on CDC's Framework for Program Evaluation in Public Health, is intended to assist with planning, designing, implementing and using comprehensive evaluations in a practical way.

McCormick Foundation Evaluation Guide  is a guide to planning an organization's evaluation, with several chapters dedicated to gathering information and using it to improve the organization.

A Participatory Model for Evaluating Social Programs from the James Irvine Foundation.

Practical Evaluation for Public Managers  is a guide to evaluation written by the U.S. Department of Health and Human Services.

Penn State Program Evaluation  offers information on collecting different forms of data and how to measure different community markers.

Program Evaluation information page from Implementation Matters.

The Program Manager's Guide to Evaluation  is a handbook provided by the Administration for Children and Families with detailed answers to nine big questions regarding program evaluation.

Program Planning and Evaluation  is a website created by the University of Arizona. It provides links to information on several topics including methods, funding, types of evaluation, and reporting impacts.

User-Friendly Handbook for Program Evaluation  is a guide to evaluations provided by the National Science Foundation.  This guide includes practical information on quantitative and qualitative methodologies in evaluations.

W.K. Kellogg Foundation Evaluation Handbook  provides a framework for thinking about evaluation as a relevant and useful program tool. It was originally written for program directors with direct responsibility for the ongoing evaluation of the W.K. Kellogg Foundation.

This Community Tool Box section is an edited version of:

CDC Evaluation Working Group. (1999). (Draft). Recommended framework for program evaluation in public health practice . Atlanta, GA: Author.

Program Evaluation: a Plain English Guide

This 11-step guide defines program evaluation, what it is used for, the different types and when they should be used. Also covered is how to plan a program evaluation, monitor performance, communicate findings, deliver bad news, and put improvements into practice.

This resource and the following information was contributed to BetterEvaluation by Dana Cross, Grosvenor Management Consulting.

Authors and their affiliation: Dana Cross, Grosvenor Management Consulting

Year of publication: 2015

Type of resource: Guide

Key features of the resource: An easy to read and understand guide on tried-and-tested program evaluation practices, including:

  • the what and why of program evaluation
  • how to articulate the workings of your program using program theory and program logic
  • tools available for planning your program evaluation
  • how to monitor program performance
  • ways to communicate your findings

This resource illustrates the practical, hands-on information that is particularly useful to program managers. It includes how-to-guides, diagrams and examples to understand and start implementing program evaluation in real life.

Who is this resource useful for?

  • Advocates for evaluation
  • Commissioners/managers of evaluation

How have you used or intend on using this resource? Program evaluation can be daunting for program managers approaching it for the first time. Program evaluators and managers have found this a particularly useful resource to share with peers and stakeholders who are new to evaluation; it provides a good introduction to what program evaluation might involve as part of the management and assessment of program performance.

Why would you recommend it to other people? This nuts-and-bolts guidance on the key components of program evaluation avoids jargon and provides a very practical way forward for implementation of the evaluation.

  • What is program evaluation?
  • Understanding which programs to evaluate
  • When is the best time for program evaluation?
  • Program theory and program logic: articulating how your program works
  • Types of program evaluation
  • Tools for planning program evaluation A: Evaluation framework
  • Tools for planning program evaluation B: Evaluation plan
  • How is your program going... really? Performance monitoring
  • Communicating your program evaluation findings effectively
  • Breaking bad news
  • Putting improvements into practice

Cross, D. (2015)  Program Evaluation: a Plain English Guide . Grosvenor Management Consulting

Program Evaluation: What is it, and what are key considerations?

What is Program Evaluation?

Program evaluation serves as a means to identify issues or evaluate changes within an educational program. Thus program evaluation allows for systematic improvement and serves as a key skill for educators seeking to improve learner outcomes. There are many considerations for a successful educational program evaluation.

Evaluation and Assessment

Evaluation and assessment are not one and the same, though the terms are often mistakenly used interchangeably. In medical education, evaluation refers to a program or curriculum, while assessment refers to learners and learner outcomes. Learner assessment is a separate topic. When performing a program evaluation, make sure that the focus is on an evaluation of the program or curriculum rather than an assessment of the learners.

When approaching program evaluation, what do you need to know?

When conducting a program evaluation, it is important that you know the stakeholders and what the stakeholders desire. In other words, who is the evaluation for? Is the goal accreditation via ACGME or ABEM? Are you interested in evaluating with respect to learners such as residents or medical students? In addition, you should ask yourself what is important to these stakeholders. Are they interested in meeting accreditation standards, improving learner outcomes, or improving learner happiness, to name a few? These are very different outcomes that require different evaluative methods.

You must know what tools you have available, and what you wish to do with the information you gather. This will allow you to choose a feasible method that will provide you with the necessary information for your evaluation goals. The most appropriate method may not provide you with the highest level of outcome. For instance, an interview or survey often may provide you with the most useful information but not the highest level of evidence.

Getting beyond Kirkpatrick’s first level [1]

Getting past the first level of Kirkpatrick’s education outcomes (satisfaction) can be difficult. Some suggestions on moving past the first-level to higher-level outcomes in your evaluation include:

  • Develop tools that you can use multiple times or use tools that have already been validated. This often leads to higher level and more consistent data. In addition, surrogates (others who may be equally qualified to deliver the specific tool: medical students, residents, fellows, other attendings, etc.) can administer these tools, saving you and other educators time without diminishing the quality of the data.
  • Use existing databases so that you are not the only one collecting data (e.g., databases on usage of blood products or antibiotics). This allows researchers to compare their local outcome data to the larger database, allowing for greater assessment of patient outcomes (higher Kirkpatrick levels) and, in part, an evaluation of their curriculum.
  • There are other perspectives on outcomes out there beside Kirkpatrick’s. These include examining return on investment for an outcome rather than just the outcome. These may be very valuable if you are looking at fiscal interventions or situations where resources are limited. [2]
  • Remember that patient-oriented outcomes are not always the same as your educational goals, and therefore may not be the most appropriate outcomes for your program. [3]
  • Do not dismiss Kirkpatrick’s first level of evaluation. Learner satisfaction is still very important! It may be important for your program and curriculum; however, scholarship based only on satisfaction data can prove hard to publish.

Tips for examining long-term outcomes

You must stay in touch with your learners. Educators struggle with long-term outcomes. Long-term data often reflect higher level outcomes and true programmatic success. In order to obtain long-term data, educators and their institution must stay connected with their learners in the long term. If you can stay connected, you can start looking at long-term variables like board certification rates, MOC, and adverse actions from medical boards. The holy grail in data for program evaluation would be complete career-length portfolio data for previous learners, but this remains elusive.

What big questions are looming on the horizon?

Attribution:  Those involved in program evaluation want to know that an outcome can be attributed to a specific educational intervention. To do this, the outcome data would need to be traced to the individual learners and behaviors, not just the institution. Think of it as attempting to establish a causal relationship.

Data sharing processes:  Collecting data about what and how learners learned requires protecting this data, which raises questions of responsibility, liability, and privacy. Much of the information collected regarding learners is sensitive, and learners have concerns about the sharing of this information with future employers, evaluators, or accreditation organizations. Rules regarding the sharing of this information need to be established as we collect this information for program evaluation purposes. Similarly, student information is protected by FERPA.

Off-cycle learners:  We must address what to do when, in the future, learners are entering and exiting at different times. In the future, it is likely that learners will not all begin on July 1 and end on June 30 because of competency-based advancement rather than time-based advancement. This provides a challenge for program evaluation as well.

Other considerations include program evaluation of CME education programs as well as quality and safety measures in educational programs.

Journal of Education & Teaching in Emergency Medicine , or JETem, is a new journal and repository for educators to access curricula that have been implemented or designed elsewhere as well as small group exercises, team-based learning activities, etc. Publication of a research project is not the goal of JETem; however, when submitting a piece of work to JETem, it is important to state what program evaluation was performed and the outcomes.

How to get started in program evaluation

Start with the problem/question that you want to solve, then ask what data you will need to answer the question. This can be helpful in determining when to retire a program or modify it for future needs. In addition, when you are creating an educational program or making changes, simultaneously plan how you will evaluate it in the future. As we discussed previously:

  • Identify the stakeholders
  • Find out what data they are interested in
  • Determine from whom you will collect the data

Remember, the stakeholders often have their own language. Make sure you are speaking their language to improve recognition of your outcomes and success (e.g., ACGME, LCME, residents, medical students, dean, or chair).

Final Thoughts

Michael Quinn Patton is the grandfather of program evaluation and his work is a helpful resource regarding program evaluation.

Useful Resources for Additional Reading

  • Goldie J. AMEE Education Guide no. 29: evaluating educational programmes. Med Teach. 2006 May;28(3):210-24. PubMed
  • Frye AW, Hemmer PA. Program evaluation models and related theories: AMEE guide no. 67. Med Teach. 2012;34(5):e288-99. PubMed
  • Shershneva MB, Larrison C, Robertson S, Speight M. Evaluation of a collaborative program on smoking cessation: translating outcomes framework into practice. J Contin Educ Health Prof. 2011 Fall;31 Suppl 1:S28-36. PubMed

Program Evaluation

Program evaluation is a cornerstone of effective organizational management, allowing for data-driven decisions and strategic adaptations. Whether you’re a nonprofit, business, or governmental agency, understanding program evaluation can enhance your impact and accountability.

Table of Contents

  • What is a program?
  • What is program evaluation?
  • Importance of program evaluation for the monitoring and evaluation practice
  • How program evaluation can help your organization
  • Types of program evaluation
  • Elements of program evaluation
  • Designing a program evaluation plan
  • The key steps in conducting a program evaluation
  • Tools you can use for program evaluation
  • What are examples of program evaluation?
  • Benefits of program evaluation
  • Using program evaluation results
  • Conclusion: Investing in program evaluation for better outcomes

What is a Program?

A program is a set of related projects, activities, and resources that are managed in a coordinated way to achieve a specific goal or set of goals. Programs are often used in organizations to implement strategic initiatives, and they may involve multiple projects, teams, and stakeholders.

Programs are characterized by their complexity and scope, which may span multiple departments, functions, or geographic regions. They typically involve a range of activities, such as planning, monitoring, and evaluation, and may require specialized skills and expertise to manage effectively.

The key difference between a program and a project is that a program is made up of multiple projects, whereas a project is a temporary endeavor undertaken to create a unique product, service, or result.

Program management is the process of planning, executing, and controlling a program to achieve its intended goals and objectives. It involves coordinating the activities of multiple projects, managing program-level risks and issues, and ensuring that program resources are used effectively and efficiently.

What is Program Evaluation?

Program evaluation is the systematic collection, analysis, and interpretation of data to understand the effectiveness and efficiency of a program. By assessing a program’s merit, worth, and value, stakeholders can ensure that resources are used optimally and that desired outcomes are achieved.

The purpose of program evaluation is to provide program managers and stakeholders with the information they need to make informed decisions about program design, implementation, and management. The findings of program evaluation can be used to identify program strengths and weaknesses, inform program improvements, and demonstrate accountability to stakeholders.

Program evaluation typically involves the following steps:

  • Defining program goals and objectives: The first step in program evaluation is to define the goals and objectives of the program. This provides a clear framework for evaluation and helps to ensure that evaluation activities are aligned with program priorities.
  • Identifying evaluation questions: The next step is to identify the evaluation questions that need to be answered. These questions should be focused on assessing program effectiveness, efficiency, and impact.
  • Developing an evaluation plan: Once the evaluation questions have been identified, an evaluation plan is developed. This plan outlines the data collection methods, analysis techniques, and reporting formats that will be used in the evaluation.
  • Collecting data: Data is collected using a variety of methods, such as surveys, interviews, focus groups, and document reviews. The data collected should be relevant, reliable, and valid.
  • Analyzing data: The data collected is analyzed to determine the extent to which the program has achieved its intended goals and objectives. This analysis may involve statistical techniques, such as regression analysis, or economic techniques, such as cost-benefit analysis.
  • Reporting findings: The findings of the evaluation are reported to program managers and stakeholders. The report should provide clear and concise information on program effectiveness, efficiency, and impact, and should include recommendations for program improvement.

Program evaluation is a critical component of program management. It provides program managers and stakeholders with the information they need to make informed decisions about program design, implementation, and management. By conducting program evaluations regularly and using the information generated, organizations can improve program performance.

Why is Program Evaluation Important?

Program evaluation is an essential part of monitoring and evaluation (M&E) practice, and it plays a vital role in ensuring the success of programs. The following are some of the key reasons why program evaluation is important for M&E:

  • Accountability: Program evaluation helps to ensure that program stakeholders are accountable for the use of resources and the achievement of program objectives. It provides a mechanism for measuring and reporting on program performance, which can help to build trust and credibility with stakeholders.
  • Learning: Program evaluation provides an opportunity for learning and improvement. It helps to identify what worked well, what did not work, and what could be done differently in future programs. This information can be used to improve program design, implementation, and management.
  • Decision-making: Program evaluation provides important information that can be used to inform decision-making. It helps to identify program strengths and weaknesses, and can provide insights into how best to allocate resources, adjust program strategies, and make decisions about program continuation or termination.
  • Communication: Program evaluation provides a mechanism for communicating program progress and performance to stakeholders. This helps to build trust and transparency and can help to mobilize support for the program.
  • Continuous improvement: Program evaluation is an ongoing process that allows for continuous improvement of program performance. By monitoring progress and making adjustments as needed, program managers can ensure that programs stay on track and are achieving their intended outcomes.

Program evaluation is essential for ensuring the success of programs. It provides valuable information for decision-making, accountability, learning, communication, and continuous improvement.

How Program Evaluation Can Help Your Organization

Program evaluation is a critical process that can help your organization to assess the effectiveness of your programs and determine whether you are achieving your intended outcomes. By measuring program impact, improving program effectiveness, making data-driven decisions, increasing stakeholder buy-in, and ensuring accountability, program evaluation can be a valuable tool for your organization.

Measuring program impact is essential for determining whether your programs are achieving their intended goals. By collecting data on program outcomes, you can assess the effectiveness of your programs and identify areas where improvements can be made. Improving program effectiveness requires analyzing data on program activities and outcomes to identify areas where changes can be made. This helps to ensure that your programs are delivering the desired results.

Making data-driven decisions is crucial for ensuring that your organization’s programs are based on evidence and are more likely to be successful. Program evaluation provides data that can be used to guide decision-making around program design, implementation, and improvement. Sharing program evaluation data with stakeholders can also help to build stakeholder buy-in and support for your programs.

Finally, program evaluation helps to ensure accountability for your organization’s programs and outcomes. By collecting data on program activities and outcomes, you can demonstrate that your organization is meeting its obligations and achieving its goals. This is particularly important for organizations that receive funding from external sources, as it demonstrates that the funding is being used effectively.

In summary, program evaluation is a valuable tool that can help your organization to achieve better outcomes and make a greater impact on your target population. By investing in program evaluation, your organization can ensure that its programs are effective, evidence-based, and accountable.

Types of Program Evaluation

There are several types of program evaluation that can be used to assess different aspects of program performance. The following are some of the most commonly used types of program evaluation:

  • Process evaluation: This type of evaluation focuses on how a program is implemented, and whether it is being delivered as intended. It assesses program activities, outputs, and inputs, and can help to identify areas for program improvement.
  • Outcome evaluation: This type of evaluation focuses on the extent to which a program is achieving its intended outcomes. It assesses the short-term and long-term effects of a program and can help to determine whether a program is making a difference.
  • Impact evaluation: This type of evaluation goes beyond outcome evaluation to assess the broader impact of a program on society or the environment. It examines both the intended and unintended effects of a program and can help to determine whether a program is having positive or negative consequences.
  • Cost-benefit analysis: This type of evaluation compares the costs of a program to its benefits, in monetary terms. It can help to determine whether a program is delivering value for money and can be used to make decisions about program continuation or termination.
  • Formative evaluation: This type of evaluation is conducted during program development to help improve program design and implementation. It can provide feedback on program activities, inputs, and outputs, and can help to ensure that a program is well-designed and effective.
  • Summative evaluation: This type of evaluation is conducted at the end of a program or program phase to assess program effectiveness and impact. It provides a summary of program performance and can be used to inform decisions about program continuation or termination.
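As an illustration of the cost-benefit comparison described in the list above, here is a minimal sketch. It assumes that costs and benefits have already been expressed in the same monetary units; the function name `cost_benefit_summary` and the dollar figures are hypothetical and are not drawn from any real program.

```python
def cost_benefit_summary(total_costs, total_benefits):
    """Compare monetized program benefits with program costs.

    Returns the net benefit (benefits minus costs) and the benefit-cost
    ratio (benefits divided by costs).
    """
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    return {
        "net_benefit": total_benefits - total_costs,
        "benefit_cost_ratio": total_benefits / total_costs,
    }


# Hypothetical figures for illustration only.
summary = cost_benefit_summary(total_costs=50_000, total_benefits=65_000)
print(summary["net_benefit"])         # 15000
print(summary["benefit_cost_ratio"])  # 1.3
```

A ratio above 1 (or a positive net benefit) suggests the monetized benefits exceed the costs. Benefits that are hard to express in monetary terms fall outside this simple calculation and would need to be weighed separately.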

Each type of evaluation has its own strengths and weaknesses and can be used to provide different types of information to program managers and stakeholders. By using a combination of different types of program evaluation, program managers can gain a comprehensive understanding of program performance and make informed decisions about program design, implementation, and management.

Elements of Program Evaluation

Program evaluation typically involves a structured and systematic process of assessing the effectiveness, efficiency, and relevance of a program. The following are some of the key elements of a program evaluation:

  • Program logic model: This is a visual representation of how a program is expected to work, including its inputs, activities, outputs, outcomes, and impact. The program logic model serves as the basis for developing evaluation questions and indicators.
  • Evaluation questions: These are the specific questions that the evaluation aims to answer, based on the program logic model. Evaluation questions should be clear, concise, and relevant to the program’s goals.
  • Evaluation design: This refers to the overall approach that the evaluation will take, including the type of evaluation (e.g., process evaluation, outcome evaluation, impact evaluation), data sources, data collection methods, and data analysis techniques.
  • Data collection methods: These are the methods used to collect data, such as surveys, interviews, focus groups, observation, and document review. The selection of data collection methods should be based on the evaluation questions and the availability and reliability of data.
  • Data analysis: This involves the process of organizing, interpreting, and summarizing the data collected through the evaluation. Data analysis should be based on the evaluation questions and use appropriate statistical and qualitative methods.
  • Findings and recommendations: These are the results of the evaluation, including findings about the effectiveness, efficiency, and relevance of the program, and recommendations for program improvement.
  • Dissemination: This refers to the process of sharing the evaluation findings and recommendations with stakeholders, such as program staff, funders, and beneficiaries.

The key elements of a program evaluation are designed to ensure that the evaluation is rigorous, credible, and useful in improving program performance and achieving intended outcomes. By following a structured and systematic approach to program evaluation, program managers can ensure that the evaluation provides meaningful insights and recommendations for program improvement.
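The relationship between a logic model and evaluation questions can be sketched in code. The example below is only an illustration under stated assumptions: the `LogicModel` class, the `draft_evaluation_questions` helper, the question wording, and the school breakfast program details are all hypothetical, and real evaluation questions would be refined with stakeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicModel:
    """Components of a program logic model, as listed above."""
    inputs: List[str]
    activities: List[str]
    outputs: List[str]
    outcomes: List[str]
    impact: str


def draft_evaluation_questions(model: LogicModel) -> List[str]:
    """Draft starter questions: one per activity and one per outcome."""
    questions = [
        f"Was the activity '{activity}' delivered as intended?"
        for activity in model.activities
    ]
    questions += [
        f"To what extent was the outcome '{outcome}' achieved?"
        for outcome in model.outcomes
    ]
    return questions


# Hypothetical logic model for a school breakfast program.
breakfast_program = LogicModel(
    inputs=["funding", "kitchen staff"],
    activities=["serve free breakfasts", "inform parents about the program"],
    outputs=["number of breakfasts served"],
    outcomes=["improved student nutrition"],
    impact="healthier school-age children",
)

for question in draft_evaluation_questions(breakfast_program):
    print(question)
```

Activity questions of this kind lend themselves to process evaluation, while outcome questions lend themselves to outcome evaluation.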

Designing a Program Evaluation Plan

Designing a program evaluation plan involves a structured and systematic approach to assessing the effectiveness, efficiency, and relevance of a program. Here are the key steps:

  • Define the program: Clearly define the program being evaluated, including its goals, objectives, activities, inputs, outputs, and outcomes. Develop a logic model that visually represents how the program is intended to work.
  • Identify evaluation questions: Identify the key questions that the evaluation will aim to answer. These questions should be relevant to the program goals and objectives and should guide the development of the evaluation plan.
  • Determine the evaluation type: Decide on the type of evaluation that will be conducted, such as a process evaluation, outcome evaluation, or impact evaluation. This decision should be based on the evaluation questions and the stage of program implementation.
  • Develop data collection methods: Determine the data collection methods that will be used to answer the evaluation questions. This could include surveys, interviews, focus groups, observation, and document review. Consider the strengths and weaknesses of each method and select the most appropriate methods for the evaluation.
  • Plan data analysis: Plan how the data will be analyzed to answer the evaluation questions. This may involve the use of statistical techniques or qualitative data analysis methods.
  • Identify evaluation team and roles: Identify the members of the evaluation team and their roles and responsibilities. This could include internal staff or external consultants.
  • Develop a timeline: Develop a timeline for the evaluation, including data collection, analysis, and reporting. Ensure that the timeline is realistic and achievable.
  • Develop a budget: Develop a budget for the evaluation, including costs for data collection, analysis, and reporting.
  • Develop a dissemination plan: Plan how the evaluation findings will be disseminated to stakeholders, including program staff, funders, and beneficiaries. Determine the most appropriate formats for dissemination, such as reports, presentations, or dashboards.

By following these steps, program managers can design a comprehensive and effective program evaluation plan that provides meaningful insights into program effectiveness, efficiency, and relevance. This information can be used to improve program performance, achieve intended outcomes, and ensure accountability to program stakeholders.

Conducting a program evaluation is a complex process that requires careful planning and execution. Here are the key steps involved in conducting a program evaluation:

  • Define the evaluation purpose and scope: The first step is to identify the purpose of the evaluation and what needs to be evaluated. This includes defining the evaluation questions, the stakeholders involved, and the timeline for completion.
  • Develop an evaluation plan: The evaluation plan should include the methods that will be used to collect and analyze data, the roles and responsibilities of everyone involved in the evaluation, a timeline for completion, and a budget.
  • Collect data: Data collection can involve a variety of methods, including surveys, interviews, focus groups, and observation. The data collected should be relevant to the evaluation questions and should be gathered in a way that is ethical and responsible.
  • Analyze data: Once the data has been collected, it should be analyzed to identify patterns, trends, and relationships. This analysis should be guided by the evaluation questions and should be conducted using appropriate statistical methods.
  • Draw conclusions and make recommendations: Based on the data analysis, conclusions should be drawn about the effectiveness of the program and its impact on the target population. Recommendations for program improvement should also be made based on the evaluation findings.
  • Share results: The results of the evaluation should be shared with stakeholders, including program staff, funders, and the target population. This can be done through written reports, presentations, or other means.
  • Use results for program improvement: Finally, the results of the evaluation should be used to improve the program. This may involve making changes to program design, implementation, or evaluation methods.

When program managers use the tools available to them, they can collect valuable data on program performance and use that data to make informed decisions about the program’s future.

Program evaluation can involve a variety of tools, depending on the evaluation questions, data collection methods, and analysis techniques. Here are some common tools used in program evaluation:

  • Surveys: Surveys are a common tool used to collect data from program participants and stakeholders. Surveys can be used to collect data on attitudes, behaviors, and program outcomes.
  • Interviews: Interviews are a tool used to collect in-depth information from program participants and stakeholders. Interviews can provide rich data on experiences, perceptions, and attitudes.
  • Focus groups: Focus groups are a tool used to collect data from a group of individuals who share similar characteristics or experiences. Focus groups can provide insights into group dynamics, shared experiences, and attitudes.
  • Observation: Observation is a tool used to collect data on program activities and outcomes. Observations can be structured or unstructured and can be used to collect data on behaviors, interactions, and program implementation.
  • Case studies: Case studies are a tool used to provide in-depth analysis of a specific program or project. Case studies can provide a detailed description of program implementation, outcomes, and impact.
  • Performance indicators: Performance indicators are tools used to measure program outcomes and impact. Performance indicators can be quantitative or qualitative and can be used to track progress over time.
  • Cost-benefit analysis: Cost-benefit analysis is a tool used to assess the economic impact of a program or project. Cost-benefit analysis can help organizations to determine whether a program is cost-effective and provides a positive return on investment.
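As a rough illustration of the cost-benefit tool described above, the sketch below computes a net benefit and a benefit-cost ratio. The function name `cost_benefit_summary` and the dollar figures are hypothetical, and the sketch assumes costs and benefits have already been expressed in the same monetary units (it does no discounting over time).

```python
def cost_benefit_summary(total_costs, total_benefits):
    """Return net benefit and benefit-cost ratio for a program.

    Both arguments are assumed to be in the same currency and time frame.
    """
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    return {
        "net_benefit": total_benefits - total_costs,
        "benefit_cost_ratio": total_benefits / total_costs,
    }


# Hypothetical figures for illustration only.
summary = cost_benefit_summary(total_costs=50_000, total_benefits=80_000)
print(f"Net benefit: {summary['net_benefit']}")
print(f"Benefit-cost ratio: {summary['benefit_cost_ratio']:.2f}")
```

A ratio above 1 suggests that monetized benefits exceed costs, but the result is only as reliable as the estimates that feed it.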

By using these tools and techniques, organizations can collect valuable data on program implementation, outcomes, and impact. This data can then be used to improve program effectiveness and make data-driven decisions about program design and implementation.

There are numerous examples of program evaluation across various sectors and domains. Here are some examples:

  • Health programs: Evaluating the effectiveness of a public health program designed to reduce the incidence of a specific disease, such as malaria, tuberculosis, or HIV/AIDS.
  • Education programs: Evaluating the impact of an educational program designed to improve student learning outcomes, such as a literacy or numeracy program.
  • Environmental programs: Evaluating the effectiveness of a conservation program designed to protect endangered species or restore a degraded ecosystem.
  • Social programs: Evaluating the impact of a social welfare program designed to reduce poverty or improve access to basic services, such as healthcare or education.
  • Technology programs: Evaluating the effectiveness of a technology program designed to improve productivity, innovation, or access to information.
  • Economic programs: Evaluating the impact of a development program designed to stimulate economic growth, create jobs, or reduce inequality.
  • Disaster relief programs: Evaluating the effectiveness of a humanitarian relief program designed to respond to natural disasters or humanitarian crises.

Consider the evaluation of a mental health education program for university students. The program aims to increase students’ awareness and knowledge of mental health issues, reduce stigma and discrimination, and promote help-seeking behaviors. The program consists of online modules, workshops, peer support groups, and counseling services.

The program evaluation plan includes the following components:

  • Program description: A brief overview of the program’s goals, objectives, activities, and target population.
  • Evaluation design: A description of the evaluation questions, indicators, data sources, data collection methods, data analysis methods, and ethical considerations.
  • Plan to measure key data: A detailed plan of how to collect and analyze data for each indicator and data source. For example, one indicator is the change in students’ attitudes toward mental health issues before and after completing the online modules. The data source is a pre-test and post-test survey that measures students’ attitudes using a Likert scale. The data collection method is an online survey platform that administers the survey to students who enroll in the online modules. The data analysis method is a paired t-test that compares the mean scores of the pre-test and post-test surveys.
  • Collecting and reporting key results: A description of how to organize, manage, and report the data collected from the evaluation. For example, one result is the percentage of students who completed the online modules. The data is organized in a spreadsheet that tracks the completion status of each student. The data is reported in a table that shows the number and percentage of students who completed the online modules by gender, age group, and faculty.
  • Communication plan of key results: A plan of how to disseminate and use the evaluation findings to inform stakeholders and improve the program. For example, one communication strategy is to create a summary report that highlights the main findings and recommendations from the evaluation. The report is shared with the program staff, funders, partners, and participants through email, website, social media, and presentations.
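The paired t-test mentioned in the measurement plan above can be sketched as follows. This is a minimal illustration using only the Python standard library: the function name `paired_t_statistic` and the Likert scores are hypothetical, and the sketch returns only the t statistic and degrees of freedom (a p-value would need a t-distribution table or a statistics package).

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for matched pre-test/post-test scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per student")
    # Each student's change in score (post minus pre).
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two matched pairs are required")
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical attitude scores (1-5 Likert scale) for eight students.
pre_scores = [2, 3, 3, 2, 4, 3, 2, 3]
post_scores = [3, 4, 3, 4, 4, 4, 3, 4]

t_stat, df = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The test is paired because each student contributes both a pre-test and a post-test score, so the analysis works on per-student differences rather than on two independent groups.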

Another example of program evaluation is the evaluation of a smoking cessation program for pregnant women. The program aims to reduce the prevalence of smoking among pregnant women and improve their health and the health of their babies. The program consists of educational sessions, counseling sessions, nicotine replacement therapy, and follow-up support. The program evaluation plan includes the following components:

  • Plan to measure key data: A detailed plan of how to collect and analyze data for each indicator and data source. For example, one indicator is the change in smoking status among pregnant women before and after participating in the program. The data source is a self-reported questionnaire that measures smoking behavior and attitudes using a Likert scale. The data collection method is a face-to-face interview that administers the questionnaire to pregnant women who enroll in the program. The data analysis method is a chi-square test that compares the frequency of smokers and non-smokers before and after the program.
  • Collecting and reporting key results: A description of how to organize, manage, and report the data collected from the evaluation. For example, one result is the percentage of pregnant women who quit smoking after completing the program. The data is organized in a spreadsheet that tracks the smoking status of each pregnant woman. The data is reported in a graph that shows the percentage of smokers and non-smokers before and after the program by age group and ethnicity.
  • Communication plan of key results: A plan of how to disseminate and use the evaluation findings to inform stakeholders and improve the program. For example, one communication strategy is to create a brochure that summarizes the main findings and recommendations from the evaluation. The brochure is distributed to the program staff, funders, partners, participants, health care providers, and community organizations through mail, email, website, social media, and meetings.
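The chi-square comparison named in the measurement plan above can be sketched for a 2x2 table of counts. The function name `chi_square_2x2` and the counts are hypothetical, and the sketch computes the Pearson statistic without a continuity correction. Because the same women are measured before and after the program, a paired test such as McNemar's test may be a better fit in practice; the sketch simply follows the test named in the plan.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if smoking status were unrelated to time point.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are before/after, columns are smokers/non-smokers.
counts = [
    [40, 60],  # before the program
    [25, 75],  # after the program
]

stat = chi_square_2x2(counts)
print(f"chi-square = {stat:.3f} (1 degree of freedom)")
```

With one degree of freedom, a statistic above about 3.84 corresponds to significance at the 0.05 level.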

Another example of program evaluation is the evaluation of a recycling program for a city. The program aims to increase the rate of recycling among residents and businesses and reduce the amount of waste sent to landfills. The program consists of providing recycling bins, collection services, education campaigns, and incentives for recycling. The program evaluation plan includes the following components:

  • Plan to measure key data: A detailed plan of how to collect and analyze data for each indicator and data source. For example, one indicator is the change in recycling rate among residents and businesses before and after participating in the program. The data source is a waste audit that measures the weight and composition of waste and recyclables generated by a sample of households and businesses. The data collection method is a random sampling technique that selects a representative sample of households and businesses from each district of the city. The data analysis method is a difference-in-differences approach that compares the change in recycling rate between the treatment group (those who received the program) and the control group (those who did not receive the program).
  • Collecting and reporting key results: A description of how to organize, manage, and report the data collected from the evaluation. For example, one result is the percentage of households and businesses that participated in the recycling program. The data is organized in a database that tracks the participation status of each household and business. The data is reported in a chart that shows the percentage of participants by district and by type (residential or commercial).
  • Communication plan of key results: A plan of how to disseminate and use the evaluation findings to inform stakeholders and improve the program. For example, one communication strategy is to create a dashboard that displays the main findings and recommendations from the evaluation. The dashboard is accessible online to the program staff, funders, partners, participants, media, and public through a secure website.
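The difference-in-differences comparison described in the measurement plan above reduces to simple arithmetic on four group averages. The sketch below is illustrative only: the function names and waste-audit weights are hypothetical, and it produces a point estimate without standard errors.

```python
def recycling_rate(recycled_kg, total_waste_kg):
    """Share of audited waste, by weight, that was recycled."""
    if total_waste_kg <= 0:
        raise ValueError("total_waste_kg must be positive")
    return recycled_kg / total_waste_kg


def difference_in_differences(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus change in the control group."""
    return (treat_post - treat_pre) - (control_post - control_pre)


# Hypothetical waste-audit weights in kilograms (recycled, total).
treat_pre = recycling_rate(300, 1000)
treat_post = recycling_rate(450, 1000)
control_pre = recycling_rate(280, 1000)
control_post = recycling_rate(330, 1000)

effect = difference_in_differences(treat_pre, treat_post, control_pre, control_post)
print(f"Estimated program effect: {effect * 100:.1f} percentage points")
```

The estimate is only credible if both groups would have followed similar trends without the program, which is the usual assumption behind this approach.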

Program evaluation can provide a number of benefits to organizations, including:

  • Assessing program effectiveness: Program evaluation allows organizations to determine whether their programs are effective and achieving their intended outcomes. This helps organizations to make data-driven decisions about program design and implementation.
  • Improving program outcomes: By identifying areas for improvement, program evaluation can help organizations to improve program outcomes and make a greater impact on their target population.
  • Increasing stakeholder buy-in: Program evaluation can help to build stakeholder buy-in and support for programs. By sharing evaluation results with stakeholders, organizations can demonstrate the impact and effectiveness of their programs.
  • Ensuring accountability: Program evaluation helps to ensure that organizations are accountable for their programs and outcomes. This is particularly important for organizations that receive funding from external sources, as it demonstrates that the funding is being used effectively.
  • Identifying best practices: Program evaluation can identify best practices for program design and implementation, allowing organizations to improve their programs and achieve better outcomes.
  • Demonstrating impact: Program evaluation can help organizations to demonstrate the impact of their programs and their contribution to the broader community.

In summary, program evaluation is a valuable tool that can help organizations to assess program effectiveness, improve program outcomes, increase stakeholder buy-in, ensure accountability, identify best practices, and demonstrate impact. By investing in program evaluation, organizations can achieve better outcomes and make a greater impact on their target population.

Using Program Evaluation Results

Program evaluation results can be used in a variety of ways to improve program performance and achieve intended outcomes. Here are some examples:

  • Program improvement: Evaluation results can be used to identify areas for program improvement and inform program redesign. For example, if an evaluation identifies that a program is not achieving its intended outcomes, program managers can use this information to revise program activities, outputs, or inputs to better align with intended outcomes.
  • Decision-making: Evaluation results can be used to inform decision-making about program continuation, termination, or expansion. For example, if an evaluation identifies that a program is not delivering value for money, program managers can use this information to decide whether to continue, terminate, or scale back the program.
  • Communication: Evaluation results can be used to communicate program performance to program stakeholders, such as funders, partners, or beneficiaries. For example, if an evaluation identifies that a program is making a positive impact on beneficiaries, program managers can use this information to communicate the program’s success to funders or partners.
  • Learning: Evaluation results can be used to promote organizational learning and knowledge sharing. For example, if an evaluation identifies a successful program component, program managers can use this information to share best practices with other programs or organizations.
  • Accountability: Evaluation results can be used to ensure accountability to program stakeholders. For example, if an evaluation identifies program weaknesses, program managers can use this information to be accountable to funders, partners, or beneficiaries by addressing the identified weaknesses and improving program performance.

In summary, program evaluation results can be used to improve program performance, inform decision-making, communicate program performance, promote organizational learning, and ensure accountability to program stakeholders. By using evaluation results in these ways, program managers can continuously improve program performance and achieve intended outcomes.

As M&E experts, we strongly recommend investing in program evaluation as an essential process for organizations looking to improve program outcomes and achieve better results. Program evaluation provides valuable insights into program effectiveness, outcomes, and impact, allowing organizations to make data-driven decisions, improve program design and implementation, and ensure accountability.

Investing in program evaluation can provide a range of benefits, including identifying best practices, increasing stakeholder buy-in, and demonstrating impact. By using various evaluation tools and techniques, organizations can collect and analyze data on program activities, outcomes, and impact. This data can be used to improve program effectiveness, identify areas for improvement, and make evidence-based decisions.

In today’s complex and dynamic world, organizations need to be able to demonstrate the impact of their programs, build stakeholder support, and ensure accountability. Program evaluation is an investment that can provide these benefits and more. By following the key steps involved in program evaluation and using appropriate tools and techniques, organizations can achieve better outcomes and make a positive difference in their communities.

In summary, investing in program evaluation is essential for organizations looking to improve program outcomes and achieve better results. It is a valuable tool that can provide a range of benefits, including improving program effectiveness, increasing stakeholder buy-in, and demonstrating impact. By making this investment, organizations can ensure that their programs are effective, evidence-based, and accountable, ultimately leading to better outcomes for program participants and a positive impact on the community.

Program Evaluation in Health Professions Education: An Innovative Approach Guided by Principles

Dorene F. Balmer

1 D.F. Balmer is professor, Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, and director of research on education, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania; ORCID: http://orcid.org/0000-0001-6805-4062 .

Hannah Anderson

2 H. Anderson is research associate, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania.

Daniel C. West

3 D.C. West is professor and associate chair for education, Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, and senior director of medical education, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania; ORCID: http://orcid.org/0000-0002-0909-4213 .

Program evaluation approaches that center the achievement of specific, measurable, achievable, realistic, and time-bound goals are common in health professions education (HPE) but can be challenging to articulate when evaluating emergent programs. Principles-focused evaluation is an alternative approach to program evaluation that centers on adherence to guiding principles, not achievement of goals. The authors describe their innovative application of principles-focused evaluation to an emergent HPE program.

The authors applied principles-focused evaluation to the Children’s Hospital of Philadelphia Medical Education Collaboratory, a works-in-progress program for HPE scholarship. In September 2019, the authors drafted 3 guiding principles. In May 2021, they used feedback from Collaboratory attendees to revise the guiding principles: Advance Excellence, Build Bridges, and Cultivate Learning.

In July 2021, the authors queried participants about the extent to which their experience with the Collaboratory adhered to the revised guiding principles. Twenty of the 38 Collaboratory participants (53%) responded to the survey. Regarding the guiding principle Advance Excellence, 9 respondents (45%) reported that the Collaboratory facilitated engagement in scholarly conversation only by a small extent, and 8 (40%) reported it facilitated professional growth only by a small extent. Although some respondents expressed positive regard for the high degree of rigor promoted by the Collaboratory, others felt discouraged because this degree of rigor seemed unachievable. Regarding the guiding principle Build Bridges, 19 (95%) reported the Collaboratory welcomed perspectives within the group. Regarding the guiding principle Cultivate Learning, 19 (95%) indicated the Collaboratory welcomed perspectives within the group and across disciplines, and garnered collaboration.

Next steps include improving adherence to the principle of Advancing Excellence, fostering a shared mental model of the Collaboratory’s guiding principles, and applying a principles-focused approach to the evaluation of multi-site HPE programs.

Achievement of specific, measurable, achievable, realistic, and time-bound (SMART) goals is commonly used as a criterion for judging the value or effectiveness of programs in health professions education (HPE) [1–3]. Although SMART goals are useful in program evaluation, articulating SMART goals can be challenging when evaluating emergent or novel programs. In these situations, program leaders may have a general sense of what matters and what they want to accomplish, but exactly how they will accomplish that is unclear and may shift as the program evolves.

Patton’s principles-focused evaluation [4] is an alternative to goal-oriented program evaluation. It uses adherence to guiding principles, not achievement of goals, as the criterion for judging the value or effectiveness of a program. Guiding principles may be defined as “statements that provide guidance about how to think or behave toward some desired result, based on personal values, beliefs, and experience” [4, p. 9]. In the context of program evaluation, Patton recommends that the guiding principles be (1) guiding (provide guidance and direction), (2) useful (inform decisions), (3) inspiring (articulate what could be), (4) developmental (adapt over time and contexts), and (5) evaluable (can be documented and judged) [4, 5]. Thus, notable differences exist between the use of guiding principles versus SMART goals as criteria for making judgments about a program. Guiding principles offer a values-informed sense of direction toward outcomes that are difficult to quantify and frame by time (Table 1). In addition, guiding principles are aspirational, whereas SMART goals explicate what is feasible to achieve.

Table 1. Contrasting SMART Goals and Principles-Focused Evaluation

We argue that adhering to guiding principles, while being flexible and open to how to realize those principles, may foster creativity in HPE programs. The use of guiding principles may also prevent premature closure when making judgments about the value or effectiveness of a program. As with false-negative results, program leaders could conclude that their program was not effective if SMART goals were not achieved (e.g., if only 10% of participants had at least 1 peer-reviewed publication within 12 months of completing the faculty development program and not the specific goal of > 20% of participants). However, with principles-focused evaluation, program leaders could conclude that the same program was effective because guiding principles were honored (e.g., participants routinely referred to the faculty development program as “my people” or “home base,” which indicates that the program adhered to the guiding principle of creating community).

To our knowledge, guiding principles have not been used for evaluating HPE programs, although they have been used for designing and implementing HPE programs [6, 7]. To address this gap, we describe an innovative approach to program evaluation, principles-focused evaluation, and our application of this approach to an emergent program in HPE.

Developing the program and crafting guiding principles

In 2019, we developed the Children’s Hospital of Philadelphia (CHOP) Medical Education Collaboratory (Collaboratory), a works-in-progress program for HPE scholarship. The Collaboratory was designed to be a forum for faculty, staff, and trainees to present scholarly projects and receive constructive feedback and a gathering place for them to learn from the scholarly projects of their peers.

To craft guiding principles, we reviewed existing documents from a 2017 visioning meeting attended by committed health professions educators at CHOP. Documents included statements about what educators valued, believed in, and knew from their own experience at CHOP. We met 3 times from September to November 2019 to review documents and inductively derive guiding principles. Our initial guiding principles were to Advance Excellence, Build Capacity, and Encourage Collaboration (Figure 1). We edited these initial principles based on Patton’s guidelines so that they fit the purpose of program evaluation when program value or effectiveness is judged based on adherence to principles [4]. In operationalizing the program, we routinely shared our guiding principles via email announcements about the Collaboratory and verbally at Collaboratory sessions at the start of each semester. We sought approval from CHOP’s Committee for the Protection of Human Subjects, which deemed our project exempt from review.

We remained cognizant of our guiding principles as we implemented the Collaboratory in January 2020 and made program improvements over time. For example, we iteratively adapted the schedule of Collaboratory sessions to best fit the needs of our attendees and presenters from across CHOP by shifting from 2 presenters to 1 presenter per 60-minute Collaboratory and adding an early evening timeslot. We also revised presenter guidelines to maximize time for discussion (see Supplemental Digital Appendix 1 at http://links.lww.com/ACADMED/B345 ). When pandemic restrictions prohibited face-to-face meetings, we shifted to video conferencing and took advantage of virtual meeting features (e.g., using the chat feature to share relevant articles).

This study was approved as exempt by the CHOP Committee for the Protection of Human Subjects.

We report outcomes of our innovation—application of principles-focused evaluation to program evaluation—in 2 respects. First, we consider our revision of guiding principles as outcomes. Second, we provide evidence of our adherence to those guiding principles.

Revised guiding principles

In May 2021, after 3 semesters of implementation and iterative improvements, we launched our principles-focused evaluation of the Collaboratory. Specifically, we asked, “Are we adhering to our guiding principles?” We started to address that question by sharing descriptive information (e.g., number of sessions, number of attendees) and initial guiding principles with attendees of an end-of-semester Collaboratory and eliciting their ideas for program improvement. On the basis of their feedback, and aware of a new venue to build community among physician educators, we scaled back on our intention to build capacity and instead focused on building collaboration. We were struck by perceptions that the forum had become a safe space for learning and wanted to incorporate that in our guiding principles. Thus, we revised our guiding principles to Advance Excellence, Build Bridges, and Cultivate Learning (Figure 1).

Figure 1. Initial and revised guiding principles of Children’s Hospital of Philadelphia Medical Education Collaboratory (Collaboratory), with an example of GUIDE (guiding, useful, inspiring, developmental, evaluable) criteria [4] for one guiding principle.

Then, we constructed a survey to query Collaboratory attendees and presenters about the extent to which the Collaboratory adhered to the revised guiding principles. The survey was composed of 7 items rated on a 4-point scale, with 1 indicating not at all and 4 indicating a great extent, and corresponding text boxes for optional open-ended comments (see Supplemental Digital Appendix 2 at http://links.lww.com/ACADMED/B345 ). Survey items were crafted from language of the guiding principles, which were informed by feedback from attendees of the end-of-semester Collaboratory, but the survey itself was not pilot tested.

To further address our evaluation question, we administered the survey via email to past presenters (n = 13) and attendees (n = 25) at the Collaboratory in July 2021. We received 20 unique responses, 9 from presenters and 11 from attendees, for a response rate of 53% (n = 20/38). We analyzed quantitative data descriptively, calculating percentage of responses for each item. We categorized qualitative data from open-ended comments by guiding principle and reviewed these data for evidence of alignment with principles. Results for each item can be found in Figure 2.
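As an illustration only (not the authors’ analysis code), the descriptive percentages reported in this article can be reproduced from the stated counts. The helper name `percent` and the variable names are hypothetical; the counts are those given in the text.

```python
def percent(count, total):
    """Whole-number percentage of count out of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


# Counts as reported in the text.
invited = 13 + 25   # past presenters plus attendees
responses = 9 + 11  # presenter plus attendee responses

response_rate = percent(responses, invited)
small_extent_conversation = percent(9, responses)
small_extent_growth = percent(8, responses)
welcomed_perspectives = percent(19, responses)

print(f"Response rate: {response_rate}% ({responses}/{invited})")
print(f"Scholarly conversation, small extent: {small_extent_conversation}%")
print(f"Professional growth, small extent: {small_extent_growth}%")
print(f"Welcomed perspectives: {welcomed_perspectives}%")
```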

Figure 2. Participant responses to guiding principle–specific survey items on the principles-focused evaluation of the Medical Education Collaboratory of Children’s Hospital of Philadelphia, 2021 (n = 20 respondents).

We summarize the survey findings below by guiding principle and highlight what we learned from open-ended comments. Given our small sample size and similar responses for attendees and presenters on quantitative survey items, we pooled the quantitative responses and report percentages for all survey respondents. We distinguish between attendee and presenter responses for the more nuanced open-ended survey items.

Guiding principles

Advance excellence.

Nine of the 20 survey respondents (45%) reported that the Collaboratory facilitated their engagement in scholarly conversation by only a small extent, and 8 (40%) reported that the Collaboratory facilitated their professional growth by only a small extent. The Collaboratory’s intention to advance excellence seemed to be a 2-edged sword. In open-ended comments, some attendees expressed positive regard for a high degree of rigor, but other attendees described feeling discouraged because the level of scholarship that was demonstrated seemed unachievable. For example, some described the Collaboratory as useful for “taking a project idea and putting it into scholarly practice” and “pointing me in the right direction” (in terms of scholarship). In contrast, others described the Collaboratory as a forum where the “high standards could feel discouraging to those just testing the waters of education scholarship.” To better achieve the principle of advancing excellence, both attendees and presenters suggested the Collaboratory could make scholarship more approachable by offering workshops, not just a forum for discussion.

Build bridges.

Of the 20 survey respondents, 19 (95%) indicated that the Collaboratory welcomed personal perspectives within the group and across other disciplines that inform HPE. Almost half of the presenters reported gaining at least 1 new collaborator after presenting at the Collaboratory. According to both attendees and presenters, the Collaboratory was useful “in creating an education community and connecting CHOP with the education community from other institutions” and supporting an “interdisciplinary approach to scholarship.”

Cultivate learning.

Of the 20 survey respondents, 19 (95%) perceived the Collaboratory as welcoming different perspectives on scholarship. Open-ended comments revealed that features of the Collaboratory contributed to the perception of the Collaboratory as a safe space for learning; it was “inclusive,” “honest,” and “friendly.” According to attendees, features of virtual meetings (e.g., closed captioning, meeting transcripts, chat box) and the small-group setting made it easy to engage and contribute to discussions. For presenters, the Collaboratory had “great accessibility and feedback … [and a] wonderful environment.”

As the field of HPE continues to expand, so too will the number of emergent or novel programs. We applied an innovative approach to program evaluation, principles-focused evaluation, to an emergent HPE program at CHOP. [4] Distinct contributions of principles-focused evaluation were its flexibility in allowing us to start with a set of guiding principles informed by existing documents, to tailor guiding principles based on emergent stakeholder feedback, and to hold ourselves accountable to revised guiding principles, not predetermined SMART goals. Our evaluation revealed that we were adhering to 2 guiding principles (Build Bridges and Cultivate Learning) but had room to improve adherence to a third principle (Advance Excellence).

We considered our work in light of standards for judging the soundness of program evaluation: feasibility, utility, integrity, and accuracy. [8, 9] Our principles-focused evaluation was feasible, helping us stay open to different ways to adhere to principles as the program matured and the COVID-19 pandemic unfolded. Our evaluation was useful for informing program improvement. For example, in response to evaluation findings, we implemented a series of skill-building sessions to cultivate learning through small-group, hands-on instruction in frequent areas of concern. Our principles-focused evaluation had integrity because we grounded our work in data derived from health professions educators at CHOP. Collecting and analyzing both qualitative and quantitative data enhanced the accuracy of our evaluation findings.

As we reflect on our application of principles-focused evaluation, we note some limitations and words of caution. We focused on advancing excellence, building bridges, and cultivating a safe space for learning; in so doing, we did not consider other important principles, such as equity. We did not track attendance by individual; those who attended infrequently may have had a different perspective than those who attended more often. We acknowledge our role as both program leaders and program evaluators. Going forward, we will involve an external evaluator on our program leadership team. Although principles-focused evaluation can be a useful and feasible alternative to SMART goals in program evaluation, guiding principles are necessarily abstract and could appear contradictory, or at least not mutually supportive. We encountered this limitation when revising our guiding principles to focus on connection rather than capacity building. Therefore, program leaders and program evaluators need to establish and work to maintain a shared mental model of guiding principles for their program.

A goal-oriented approach to program evaluation is appropriate in many situations but could limit creativity in emergent or novel HPE programs. A next step may be the application of principles-focused evaluation to HPE programs that are implemented at multiple sites. [4] Similar to leaders of emergent or novel programs, leaders of multisite programs may have a shared sense of what matters and what they want to accomplish but realize that exactly how to accomplish what matters will be subject to site-specific, contextual influences. [10]

In closing, principles-focused evaluation is an innovative approach to program evaluation in HPE. Principles-focused evaluation helped us substantiate the effectiveness of our local, emergent program and highlighted areas for improvement. More broadly, others might use a principles-focused approach to evaluate emergent or novel programs, where doing the right thing (adhering to principles) is more imperative than doing things right (achieving specific goals). [4]


The authors acknowledge all those who participate in and support the ongoing work of the Medical Education Collaboratory of Children’s Hospital of Philadelphia.

Supplementary Material

Supplemental digital content for this article is available at http://links.lww.com/ACADMED/B345 .

Funding/Support: None reported.

Other disclosures: None reported.

Ethical approval: This study was approved as exempt by the Children’s Hospital of Philadelphia Committees for the Protection of Human Subjects (#21-018967) on June 29, 2021.


STEP 1: Program Goal

STEP 2: Program Objectives

STEP 3: Program Description

STEP 4: Evaluation Questions

STEP 5: Sources of Evaluation Data

STEP 6: Methods of Data Collection


