My Year In Review 2018—Engineering the Future of Learning Evaluation

In 2018, I shattered my collarbone and was laid up for several months, but still, I think I had one of my best years in terms of the contributions I was able to make. This will certainly sound like hubris, and surely it is, but I can’t help but think that 2018 may go down as one of the most important years in learning evaluation’s long history. At the end of this post, I will get to my failures and regrets, but first I’d like to share just how consequential this year was in my thinking and work in learning evaluation.

It started in January when I published a decisive piece of investigative journalism showing that Donald Kirkpatrick was NOT the originator of the four-level model; that another man, Raymond Katzell, has deserved that honor all along. In February, I published a new evaluation model, LTEM (The Learning-Transfer Evaluation Model)—intended to replace the weak and harmful Kirkpatrick-Katzell Four-Level Model. Already, doctoral students are studying LTEM and organizations around the world are using LTEM to build more effective learning-evaluation strategies.

Publishing these two groundbreaking efforts would have made a great year, but because I still have so much to learn about evaluation, I was very active in exploring our practices—looking for their strengths and weaknesses. I led two research efforts (one with the eLearning Guild and one with my own organization, Work-Learning Research). The Guild research surveyed people like you and your learning-professional colleagues on their general evaluation practices. The Work-Learning Research effort focused specifically on our experiences as practitioners in surveying our learners for their feedback.

Also in 2018, I compiled and published a list of 54 common mistakes we make in learning evaluation. I wrote an article on how to think about our business stakeholders in learning evaluation. I wrote a post on one of the biggest lies in learning evaluation—how we fool ourselves into thinking that learner feedback gives us definitive data on learning transfer and organizational results. It does not! I created a replacement for the problematic Net Promoter Score. I shared my updated smile-sheet questions, improving those originally put forth in my award-winning book, Performance-Focused Smile Sheets. You can access all these publications below.

In my 2018 keynotes, conference sessions, and workshops, I recounted our decades-long frustrations in learning evaluation. We are clearly not happy with what we’ve been able to do in terms of learning evaluation. There are two reasons for this. First, learning evaluation is very complex and difficult to accomplish—doubly so given our severe resource constraints in terms of both budget and time. Second, our learning-evaluation tools are mostly substandard—enabling us to create vanity metrics but not enabling us to capture data in ways that help us, as learning professionals, make our most important decisions.

In 2019, I will continue my work in learning evaluation. I still have so much to unravel. If you see a bit of wisdom related to learning evaluation, please let me know.

Will’s Top Fifteen Publications for 2018

Let me provide a quick review of the top things I wrote this year:

  1. LTEM (The Learning-Transfer Evaluation Model)
    Although published by me in 2018, the model and its accompanying 34-page report grew out of work begun in 2016 and out of the generous and brilliant feedback I received from Julie Dirksen, Clark Quinn, Roy Pollock, Adam Neaman, Yvon Dalat, Emma Weber, Scott Weersing, Mark Jenkins, Ingrid Guerra-Lopez, Rob Brinkerhoff, Trudy Mandeville, and Mike Rustici—as well as from attendees at the 2017 ISPI Design-Thinking conference and the 2018 Learning Technologies conference in London. LTEM is designed to replace the Kirkpatrick-Katzell Four-Level Model originally formulated in the 1950s. You can learn about the new model by clicking here.
  2. Raymond Katzell NOT Donald Kirkpatrick
    Raymond Katzell, not Donald Kirkpatrick, originated the Four-Level Model. Although Kirkpatrick embraced accolades for the model, my exhaustive investigation shows that Katzell deserves the credit; I offered a balanced interpretation of the facts. You can read the original piece by clicking here. Interestingly, none of our trade associations have reported on this finding. Why is that? LOL
  3. When Training Pollutes. Our Responsibility to Lessen the Environmental Damage of Training
    I wrote an article and placed it on LinkedIn, and as far as I can tell, very few of us really want to think about this. But you can get started by reading the article (by clicking here).
  4. Fifty-Four Mistakes in Learning Evaluation
    Of course we as an industry make mistakes in learning evaluation, but who knew we made so many? I began compiling the list because I’d seen a good number of poor practices and false narratives about what is important in learning evaluation, but by the time I’d finished my full list I was a bit dumbstruck by the magnitude of the problem. I’ve come to believe that we are still in the dark ages of learning evaluation and we need a renaissance. This article will give you some targets for improvements. Click here to read it.
  5. New Research on Learning Evaluation — Conducted with The eLearning Guild
    The eLearning Guild and Dr. Jane Bozarth (the Guild’s Director of Research) asked me to lead a research effort to determine what practitioners in the learning/elearning field are thinking and doing in terms of learning evaluation. In a major report released about a month ago, we reveal findings on how people feel about the learning measurement they are able to do, the support they get from their organizations, and their own current level of evaluation competence. You can read a blog post I wrote highlighting one result from the report—that a full 40% of us are unhappy with what we are able to do in terms of learning evaluation. You can access the full report here (if you’re a Guild member), along with an executive summary. Also, stay tuned to my blog or sign up for my newsletter to see future posts about our findings.
  6. Current Practices in Gathering Learner Feedback
    We at Work-Learning Research, Inc. conducted a survey focused on gathering learner feedback (i.e., smile sheets, reaction forms, learner surveys) that spanned 2017 and 2018. Since the publication of my book, Performance-Focused Smile Sheets: A Radical Rethinking of a Dangerous Art Form, I’ve spent a ton of time helping organizations build more effective learner surveys and gauging common practices in the workplace learning field. This research survey continued that work. To read my exhaustive report, click here.
  7. One of the Biggest Lies in Learning Evaluation — Asking Learners about Level 3 and 4 (LTEM Tiers 7 and 8)
    This is big! One of the biggest lies in learning evaluation. It’s a lie we like to tell ourselves and a lie our learning-evaluation vendors like to tell us. If we ask our learners questions that relate to their job performance or the organizational impact of our learning programs, we are NOT measuring at Kirkpatrick-Katzell Level 3 or 4 (or at LTEM Tiers 7 and 8); we are measuring at Level 1 and LTEM Tier 3. You can read this refutation here.
  8. Who Will Rule Our Conferences? Truth or Bad-Faith Vendors?
    What do you want from the trade organizations in the learning field? Probably “accurate information” is high on your list. But what happens when the information you get is biased and untrustworthy? Could. Never. Happen. Right? Read this article to see how bias might creep in.
  9. Snake Oil. The Story of Clark Stanley as Preface to Clark Quinn’s Excellent Book
    This was one of my favorite pieces of writing in 2018. Did I ever mention that I love writing and would consider giving this all up for a career as a writer? You’ve all heard of “snake oil” but if you don’t know where the term originated, you really ought to read this piece.
  10. Dealing with the Emotional Readiness of Our Learners — My Ski Accident Reflections
    I had a bad accident on the ski slopes in February this year, and it got me thinking about how our learners might not always be emotionally ready to learn. I don’t have answers in this piece, just reflections, which you can read about here.
  11. The Backfire Effect. Not the Big Worry We Thought it was (for Those Who Would Debunk Learning Myths)
    This article is for those interested in debunking and persuasion. The Backfire Effect was the finding that trying to persuade someone to stop believing a falsehood might actually make them more inclined to believe the falsehood. The good news is that new research showed that this worry might be overblown. You can read more about this here (if you dare to be persuaded).
  12. Updated Smile-Sheet Questions for 2018
    I published a set of learner-survey questions in my 2016 book and have been working with clients to use these questions and variations on them for over two years. I’ve learned a thing or two, and so I published some improvements early in 2018. You can see those improvements here. And note, for 2019, I’ll be making additional improvements—so stay tuned! Remember, you can sign up to be notified of my news here.
  13. Replacement for NPS (The Net Promoter Score)
    NPS is all the rage. Still! Unfortunately, it’s a terrible question to include on a learner survey. The good news is that there is now an alternative, which you can see here.
  14. Neon Elephant Award for 2018 to Clark Quinn
    Every year, I give an award for a great research-to-practice contribution in the workplace learning field. This year’s winner is Clark Quinn. See why he won and check out his excellent resources here.
  15. New Debunker Club Website
    The Debunker Club is a group of people who have committed to debunking myths in the learning field and/or sharing research-based information. In 2018, working with a great team of volunteers, we revamped the Debunker Club website to help build a community of debunkers. We now have over 800 members from around the world. You can learn more about why The Debunker Club exists by clicking here. Also, feel free to join us!

 

My Final Reflections on 2018

I’m blessed to be supported by smart, passionate clients and by some of the smartest friends and colleagues in the learning field. My Work-Learning Research practice turned 20 years old in 2018. Being a consultant—especially one who focuses on research-to-practice in the workplace learning field—is still a challenging yet emotionally rewarding endeavor. In 2018, I turned my attention almost fully to learning evaluation. You can read about my two-path evaluation approach here.

One of my research surveys totally flopped this year. It was focused on the interface between us (as learning professionals) and our organizations’ senior leadership. I wanted to know whether what we thought senior leadership wanted was what they actually wanted. Unfortunately, neither I nor any of the respondents could entice a senior leader to comment. Not one! If you or your organization has access to senior managers, I’d love to partner with you on this! Let me know. Indeed, this doesn’t even have to be research. If your CEO would be willing to trade his/her time letting me ask a few questions in exchange for my time answering questions about learning, elearning, learning evaluation, etc., I’d be freakin’ delighted!

I failed this year in working out a deal with another evaluation-focused organization to merge our efforts. I was bummed about this failure, as the synergies would have been great. I also failed in 2018 to cure myself of the tendency to miss important emails. If you ever can’t get in touch with me, try, try again! Thanks and apologies!

I had a blast in 2018 speaking and keynoting at conferences—both big and small—from doing variations on the Learning-Research Quiz Show (a rollicking good time), to talking about innovations in learning evaluation, to presenting workshops on my learning-evaluation methods and the LTEM model. Good stuff, if a ton of work. Oh! I did fail again in 2018 at turning my workshops into online workshops. I hope to do better in 2019. I also failed in 2018 to finish a research review of the training-transfer research. I’m like 95% done, but still haven’t had a chance to finish.

2018 broke my body, made me unavailable for a couple of months, but overall, it turned out to be a pretty damn good year. 2019 looks promising too as I have plans to continue working on learning evaluation. It’s kind of interesting that we are still in the dark ages of learning evaluation. We as an industry, and me as a person, have a ton more to learn about learning evaluation. I plan to continue the journey. Please feel free to reach out and let me know what I can learn from you and your organization. And of course, because I need to pay the rent, let me say that I’d be delighted if you wanted me to help you or your organization. You can reach me through the Work-Learning Research contact form.

Thanks for reading and being interested in my work!!!

At a recent industry conference, a speaker, offering their expertise on learning evaluation, said this:

“As a discipline, we must look at the metrics that really matter… not to us but to the business we serve.”

Unfortunately, this is one of the most counterproductive memes in learning evaluation. It is counterproductive because it throws our profession under the bus. In this telling, we have no professional principles, no standards, no foundational ethics. We are servants, cleaning the floors the way we are instructed to clean them, even if we know a better way.

Year after year we hear from so-called industry thought leaders that our primary responsibility is to the organizations that pay us. This is a dangerous half-truth. Of course we owe our organizations some fealty and of course we want to keep our jobs, but we also have professional obligations that go beyond this simple “tell-me-what-to-do” calculus.

This monomaniacal focus on measuring learning in terms of business outcomes reminds me of the management meme from the 1980s and 90s that suggested that the goal of a business organization is to increase shareholder value. This single-bottom-line focus has come under blistering attack for its tendency to skew business operations toward short-term results while ignoring long-term performance, and for producing outcomes that harm employees, hurt customers, and destroy the environment.

If we give our business stakeholders the metrics they say matter to them, but fail to capture the metrics that matter to our success as learning professionals in creating effective learning, then we not only fail ourselves and our learners but we fail our organizations as well.

Evaluation What For?

To truly understand learning evaluation, we have to ask ourselves why we’re evaluating learning in the first place! We have to work backwards from the answer to this question.

Why does anyone evaluate? We evaluate to help us make better decisions and take better actions. It’s really that simple! So as learning professionals, we need information to help us make our most important decisions. We should evaluate to support these decisions!

What are our most important decisions? Here are a few:

  • Which part of the content taught, if any, is relevant and helpful to supporting employees in doing their work? Which parts should be modified or discarded?
  • Which aspects of our learning designs are helpful in supporting comprehension, remembering, and motivation to learn? Which aspects should be modified or discarded?
  • Which after-training supports are helpful in enabling learning to be transferred and utilized by employees in their work? Which supports should be kept? Which need to be modified or discarded?

What are our organizational stakeholders’ most important decisions about learning? Here are a few:

  • Are our learning and development efforts creating optimal learning results? What additional support and resources should the organization supply that might improve learning results? What savings can be found in terms of support and resources—and are these savings worth the lost benefits?
  • Is the leadership of the learning and development function producing a cycle of continuous improvement, generating improved learning outcomes, or at least learning outcomes optimized given their resource constraints? If not, can they be influenced to be better or should they be replaced?
  • Is the leadership of the learning and development function creating and utilizing evaluation metrics that enable the learning and development team to get valid feedback about the design factors that are most important in creating our learning results? If not, can they be influenced to use better metrics or should they be replaced?

Two Goals for Learning Evaluation

When we think of learning evaluation, we should have two goals. First, we should create learning-evaluation metrics that enable us to make our most important decisions regarding content, design components (i.e., focused at least on comprehension, remembering, motivation to apply learning), and after-training support. Second, we should do enough in our learning evaluations to gain sufficient credibility with our business stakeholders to continue our good work. Focusing only on the second of these is a recipe for disaster. 

Vanity Metrics

In the business start-up world there is a notion called “vanity metrics” (see, for example, warnings by Eric Ries, the originator of the lean startup movement). Vanity metrics are metrics that seem important but are not. They often make us look good even when the underlying data is not really meaningful.

Most calls to provide our business stakeholders with the metrics that matter to them result in beautiful visualizations and data dashboards that focus on vanity metrics. Ubiquitous vanity metrics in learning include the number of trainees trained, the cost per training, learners’ estimates of the value of the learning, complicated benefit/cost analyses that utilize phantom measures of benefits, etc. By focusing only or primarily on these metrics, we don’t have data to improve our learning designs, we don’t have data that enables us to create cycles of improvement, and we don’t have data that enables us to hold ourselves accountable.

Released Today: Research Report on Learning Evaluation Conducted with The eLearning Guild.

Report Title: Evaluating Learning: Insights from Learning Professionals.

I am delighted to announce that a research effort that I led in conjunction with Dr. Jane Bozarth and the eLearning Guild has been released today. I’ll be blogging about our findings over the next couple of months.

This is a major report — packed into 39 pages — and should be read by everyone in the workplace learning field interested in learning evaluation!

Just a teaser here:

We asked folks to consider the last three learning programs their units developed and to reflect on the learning-evaluation approaches they used.

While a majority were generally happy with their evaluation methods on these recent learning programs, about 40% were dissatisfied. Later, in a more general question about whether learning professionals are able to do the learning measurement they want to do, fully 52% said they were NOT able to do the kind of evaluation they thought was right to do.

In the full report, available only to Guild members, we dig down and explore the practices and perspectives that drive our learning-evaluation efforts. I encourage you to get the full report, as it touches on the methods we use, how we communicate with senior business leaders, what we’d like to do differently, and what we think we’re good at. Also, the report concludes with 12 powerful action strategies for getting the most out of our learning-evaluation efforts.

You can get the full report by clicking here.

 

 

I read a brilliantly clear article today by Karen Hao from the MIT Technology Review. It explains what machine learning is and provides a very clear diagram, which I really like.

Now, I am not a machine learning expert, but I have a hypothesis that has a ton of face validity when I look in the mirror. My hypothesis is this:

Machine learning will return meaningful results to the extent that the data it uses is representative of the domain of interest.

A simple thought experiment will demonstrate my point. If a learning machine is given data about professional baseball in the United States from 1890 to 2000, it would learn all kinds of things, including the benefits of pulling the ball as a batter. Pulling the ball occurs when a right-handed batter hits the ball to left field or a left-handed batter hits the ball to right field. In the long history of baseball, many hitters benefited by trying to pull the ball because it produces a more natural swing and one that generates more power. Starting in the 2000s, with the advent of advanced analytics that show where each player is likely to hit the ball, a maneuver called “the shift” has been used more and more, and pulling the ball consistently has become a disadvantage. In the shift, players in the field migrate to positions where the batter is most likely to hit the ball, thus negating the power benefits of pulling the ball. Our learning machine would not know about the decreased benefits of pulling the ball because it would never have seen that data (the data from 2000 to now).
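To make the point concrete, here is a minimal Python sketch of the thought experiment. Every number in it is made up for illustration—the hit rates, the 50/50 pulling behavior, and the crude “learner” that merely compares averages are assumptions, not real baseball analytics. The point is only that a model fit exclusively on pre-2000 data recommends pulling the ball, even though the post-2000 world it gets deployed into has changed.

```python
# Illustrative sketch only: synthetic data, not real baseball statistics.
# A crude "learner" fit on 1890-2000 at-bats concludes that pulling the ball
# pays off; it never sees the post-2000 era in which defensive shifts erase
# that advantage, so its learned rule generalizes badly.
import random

random.seed(42)

def simulate_at_bats(era, n=10_000):
    """Return (pulled, hit) pairs using made-up hit rates for each era."""
    rows = []
    for _ in range(n):
        pulled = random.random() < 0.5
        if era == "pre_2000":
            hit_rate = 0.30 if pulled else 0.26   # pulling helps (assumed)
        else:  # "post_2000", with the shift in common use
            hit_rate = 0.24 if pulled else 0.27   # pulling now hurts (assumed)
        rows.append((pulled, random.random() < hit_rate))
    return rows

def learn_rule(training_rows):
    """'Train' by comparing average success with and without pulling."""
    pulled = [hit for p, hit in training_rows if p]
    sprayed = [hit for p, hit in training_rows if not p]
    return "pull" if sum(pulled) / len(pulled) > sum(sprayed) / len(sprayed) else "don't pull"

train = simulate_at_bats("pre_2000")    # all the model ever sees
rule = learn_rule(train)                # -> "pull"

test = simulate_at_bats("post_2000")    # the world the model is deployed into
pull_rate = sum(h for p, h in test if p) / sum(1 for p, h in test if p)
spray_rate = sum(h for p, h in test if not p) / sum(1 for p, h in test if not p)
print(f"Learned rule: {rule}")
print(f"Post-2000 hit rate when pulling: {pull_rate:.3f} vs. not pulling: {spray_rate:.3f}")
```

A real machine-learning pipeline is far more sophisticated than this toy, but no amount of sophistication recovers information that was never in the training data.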

Machine Learning about Learning

I raise this point because of a creeping danger in the world of learning and education. My concern is relevant to all domains where it is difficult to collect data on the most meaningful factors and outcomes, but where it is easy to collect data on less meaningful factors and outcomes. In such cases, our learning machines will only have access to the data that is easy to collect and will not have access to the data that is difficult or impossible to collect. People using machine learning on inadequate data sets will certainly find some interesting relationships in the data, but they will have no way of knowing what they’re missing. The worst part is that they’ll report out some fanciful finding, we’ll all jump up and down in excitement, and then we’ll make bad decisions based on the bad learning caused by the incomplete data.

In the learning field—where trainers, instructional designers, elearning developers, and teachers reside—we have learned a great deal about research-based methods of improving learning results, but we don’t know everything. And many of the factors that we know work are not tracked in most big data sets. Do we track the spacing effect, the number of concepts repeated with attention-grabbing variation, or the alignment between the context cues present in learning materials and the cues that will be present in our learners’ future performance situations? Ha! Our large data sets certainly miss many of these causal factors.

Our large data sets also fail to capture the most important outcome metrics. Indeed, as I have been regularly recounting for years now, typical learning measurements are often biased by measuring immediately at the end of learning (before memories fade), by measuring in the learning context (where contextual cues offer inauthentic hints or subconscious triggering of recall targets), and by measuring with tests of low-level knowledge (rather than more relevant skill-focused decision-making or task performances). We also overwhelmingly rely on learner feedback surveys, both in workplace learning and in higher education. Learner surveys—at least traditional ones—have been found to be virtually uncorrelated with learning results. To use these meaningless metrics as a primary dependent variable (or just a variable) in a machine-learning data set is complete malpractice.

So if our machine learning data sets have a poor handle on both the inputs and outputs to learning, how can we see machine learning interpretations of learning data as anything but a shiny new alchemy?

 

Measurement Illuminates Some Things But Leaves Others Hidden

In my learning-evaluation workshops, I often show this image.

The theme expressed in the picture is relevant to all types of evaluation, but it is especially relevant for machine learning.

When we review our smile-sheet data, we should not fool ourselves into thinking that we have learned the truth about the success of our learning. When we see a beautiful data-visualized dashboard, we should not deceive ourselves and our organizations that what we see is all there is to see.

So it is with machine learning, especially in domains where the data is not all the data, where the data is flawed, and where the boundaries on the full population of domain data are not known.

 

With Apologies to Karen Hao

I don’t know Karen, but I do love her diagram. It’s clear and makes some very cogent points—as does her accompanying article.

Here is her diagram, which you can see in the original at this URL.

Like measurement itself, I think the diagram illuminates some aspects of machine learning but fails to illuminate the danger of incomplete or unrepresentative data sets. So, I made a modification in the flow chart.

And yes, that seven-letter provocation is a new machine-learning term that arises from the data as I see it.

Corrective Feedback Welcome

As I said at the start of this polemic, my hypothesis about machine learning and data is just that—a semi-educated hypothesis that deserves a review from people more knowledgeable than me about machine learning. So, what do you think, machine-learning gurus?

 

Karen Hao Responds

I’m so delighted! One day after I posted this, Karen Hao responded:

 

 

 

Dateline: This article will be updated periodically. As presented here, it is in its first iteration. I invite you to share your ideas and comments below.

– Will Thalheimer

Introduction

I’ve been in the workplace learning field for over 30 years, have made a lot of mistakes myself and have seen other mistakes get made over and over. In the last decade, as I’ve turned my attention more and more to learning evaluation, I see us making a number of critical mistakes. Because the biggest problem with these mistakes is that we continue to make them—often without realizing our errors—I aim to capture a list of common evaluation mistakes here, and update the list from time to time. I welcome your ideas. In the comment section below, please add your thoughts. Thanks!

Common Evaluation Mistakes

Listed in no particular order… and with common themes sometimes repeated across items…

When Measuring Learner Perceptions

  1. We rely on smile sheets that only tell us about learner satisfaction and course reputation—they don’t tell us enough about learning effectiveness.
  2. We rely on smile sheets as our only metric.
  3. We look at our smile sheet results and forget that what we’re seeing is not all that might be seen. That is, we may not realize that our results might be neglecting critical learning results such as learners’ comprehension, their motivation to apply what they’ve learned, their ability to remember, their success in applying what they’ve learned, etc.
  4. We ask learners about their learning, about their on-the-job performance, and about organizational results; and think we’ve actually measured learning, on-the-job performance, and organizational results—but we only have learners’ subjective opinions about these constructs.
  5. We ask learners questions they won’t be good at answering. (For example: “What percentage of your learning will you use in your job?” “Did the learning help you achieve the learning objectives?” “Did your instructor help you learn?”).
  6. We use Likert-like scales and numeric scales, both of which are too fuzzy to enable good respondent decision-making, to motivate attention to the questions, and to create results that are clear and actionable.
  7. We don’t often use after-training learner surveys to get insights into learning application.
  8. We use affirmations in our questions, biasing our results toward the positive.
  9. In using Likert-like scales, we put the positive choices first, biasing responses toward the positive.
  10. We don’t follow up with learners to let them know what we’ve learned and the design improvements we’ve been able to target based on their feedback.
  11. We don’t attempt to persuade our learners of the importance of the learner surveys we are asking them to complete.
  12. We don’t use our survey questions as opportunities to send stealth messages to our key stakeholders about important learning-design imperatives.

Biases in Measuring Learning

  1. We measure learning in the learning context where learners are artificially triggered by contextual stimuli that help them remember more than they’ll remember when they are in a different context—for example, at their worksite.
  2. We measure learning near the end of learning, when learners have a relatively easy time remembering—so we fail to measure our learning interventions’ ability to minimize forgetting and support remembering.
  3. We measure low-importance learning metrics (like knowledge questions) rather than learning as represented in realistic decision-making and task performance.

Failing to Measure Learning Factors

  1. When we focus on measuring on-the-job performance and/or business results WHILE NEGLECTING to measure learning factors, we create for ourselves an inability to figure out how to improve our learning designs.
  2. When we don’t measure learning factors on a routine basis, we leave ourselves in the dark, we make it impossible to create a cycle of continuous improvement, and we are essentially abdicating our responsibility as professionals.

Failing to Compare Learning Factors

  1. We rarely, if ever, compare one learning method with another, as marketers do, for example, in A/B testing.
  2. Even in elearning, where it wouldn’t be too difficult to randomize learners over different methods, we fail to take advantage of the opportunity (see the sketch below).
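For illustration, here is a rough Python sketch of what such a comparison could look like in an elearning platform. The variant names, the score scale, and the get_score() stub are hypothetical stand-ins; a real implementation would pull assessment results from your LMS and apply an appropriate statistical test rather than simply comparing means.

```python
# Hedged sketch, not production code: randomly assign learners to one of two
# lesson variants and compare a later assessment score. Variant names, the
# score scale, and get_score() are illustrative assumptions.
import random
import statistics

def assign_variant(learner_id: str) -> str:
    """Stable assignment: seeding with the learner ID means the same learner
    always gets the same variant, even across sessions."""
    rng = random.Random(learner_id)
    return rng.choice(["worked_examples_first", "practice_first"])

def get_score(learner_id: str, variant: str) -> float:
    """Stand-in for pulling a learner's delayed assessment score from an LMS."""
    score = random.uniform(55, 85)                      # fake data for the sketch
    return score + (5 if variant == "practice_first" else 0)

scores = {"worked_examples_first": [], "practice_first": []}
for i in range(200):                                    # 200 hypothetical learners
    learner_id = f"learner_{i:03d}"
    variant = assign_variant(learner_id)
    scores[variant].append(get_score(learner_id, variant))

for variant, values in scores.items():
    print(f"{variant}: n={len(values)}, "
          f"mean={statistics.mean(values):.1f}, sd={statistics.stdev(values):.1f}")
```

Seeding the random generator with the learner ID keeps assignment stable, so a returning learner always sees the same variant.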

Not Seeing Behind the Pretty Curtain

  1. We too often get sucked into gorgeous data visualizations without appreciating that the underlying data might be misleading, worthless, irrelevant, etc.
  2. Dashboardism is a version of this. If it looks sophisticated, we assume there is intelligence underneath.
  3. Big data and artificial intelligence may hold promise, but rarely in learning do we have big data. Certainly, in evaluating a single course, there is no big data. Even when we do have lots of data, the data has to be meaningful to be of use. Machine learning doesn’t work well if the most important factors aren’t collected as data. When we measure what is easy to measure rather than what is important to measure, we will discover mere trinkets of meaning.

Measuring On-the-Job Learning

  1. While it would be great to capture data on people’s efforts in learning on the job, so far it seems we are measuring what’s easy to measure, but perhaps not what is important to measure.
  2. We have failed to consider that measuring on-the-job learning could have as many negative consequences as positive consequences.
  3. Even with the promise of xAPI, the big obstacle is how to capture on-the-job learning data without such data-capture being onerous.
  4. Sometimes we forget that managers have been responsible for their teams’ learning ever since the modern organization was born. A mistake we make is creating another layer of learning infrastructure instead of leveraging managers.

The Biasing Effects of Pretests

  1. We forget that pretests—even if there is no feedback given—produce learning effects; perhaps activating interest, triggering future knowledge-seeking behavior, creating schemas that support knowledge formation, etc. This is problematic when we take pretested learning programs as representative of non-pretested learning results. For example, when we assume that a course piloted with a pretest-posttest design will produce similar results to the same course without the pretest.
  2. There is a similar problem with time-series evaluation designs as earlier assessments can affect learning for both good and ill. So, for example, if we see improvements in learning over time, it might be due to the assessment intervention itself rather than the actual learning program.

Not Focusing First on Evaluation Goals

  1. We too often measure just to measure. We don’t think about what decisions we want to be able to make based on our evaluation work.
  2. Too often we don’t start with the questions we want answers to and design our evaluations to answer those questions.
  3. We ask questions on learner surveys that give us information that we cannot act on or are unlikely to act on even if we can.

Not Using Evaluation as a Golden Opportunity to Educate or Nudge Our Key Stakeholders (Including Ourselves)

  1. We fail to use the rare opportunity that evaluation provides—the opportunity to have meaningful conversations with key stakeholders—to push specific goals we have for action.
  2. We fail to use evaluation to promote a brand-like idea of who we are as a learning organization. For example, by asking questions about the support we provide to help learners apply their learning to their work, we could burnish our brand image as a learning department that is also a performance-improvement department.

Not Integrating Evaluation into Our Design and Development Process from the Beginning

  1. Too often we begin thinking about evaluation after we’ve already designed a learning program.
  2. Ideally, we would start with a set of evaluation objectives (that is, clear descriptions of the metrics and evaluation methods we will use), so we know in advance—and can negotiate with stakeholders in advance—how we will measure our learning outcomes.

Designing Evaluation Items from Poorly Defined Objectives

  1. Too often we begin the evaluation process by specifying low-level learning objectives that utilize action verbs (e.g., list, explain, etc.) and then derive our evaluation items from those low-level constructs—causing our evaluations to be focused on less-than-meaningful metrics.
  2. Too often we utilize Bloom’s taxonomy to design our learning-evaluation assessments, distracting us from focusing on more powerful research-inspired considerations like contextually-realistic decisions and tasks.
  3. Ideally, instead of starting from poorly defined instructional objectives, we should be starting from more performance-focused evaluation objectives.

Measuring Only Obtrusively

  1. We focus mostly on obtrusive measures of learning (like knowledge checks pasted on at the end of modules) when we could also use unobtrusive measures of learning (challenging tasks incorporated as part of the learning).
  2. We fail to utilize subscription-learning opportunities (short learning sessions spread over time) to measure learning, where challenges feel like learning to learners but are also used by us to evaluate the strengths and weaknesses of our learning designs.

Failing to Distinguish between Validating Data and Non-Validating Data

  1. We too often fail to distinguish between data that can validly assess the success of a learning intervention and data that is a poor indicator of success. Some data may be useful to us, but not indicative of the success of learning. For example, the number of people who attend a training tells us nothing about whether the learning was well designed, but it can give us data to ensure that we have a large enough room the next time we run the class.
  2. We too often give ourselves credit for success when it is unwarranted; for example by capturing and reporting data on attendance, learner attention, learner interest, and learner participation—all of which are non-validating data. They can tell us things, but they cannot provide a valid indication of whether learning was successful.

Failing to Consider the Importance of Remembering

  1. We fail to see remembering as a critical node on the causal pathway from comprehension to remembering to work performance to results. By ignoring the critical importance of remembering, we leave a big blind spot in our evaluation systems.
  2. We too often measure learner comprehension and assume the result will be indicative of learners’ later ability to remember what they’ve learned. This is a huge blind spot because people can demonstrate understanding today but forget that understanding tomorrow or a week from now. By fooling ourselves with short-term tests of memory, we enable our learning interventions to continue with designs that fail to support long-term remembering (and fail to minimize forgetting).
  3. Our reliance on the Kirkpatrick-Katzell Four-Level Model exacerbates this tendency, as the Four Levels completely ignore the importance of remembering.

Failing to Evaluate Our Use of Prompting Mechanisms

  1. While we know we should measure learning, we almost always completely forget to measure the use and value of prompting mechanisms like job aids, performance support, signage, and other devices for directly prompting or guiding performance.
  2. Similarly, we fail to measure the synergy between training and prompting mechanisms. Certainly there are better ways to mix training and job aids, for example, yet we rarely test different ways to use job aids to support training results.
  3. We also fail to examine grassroots prompting mechanisms—those crafted not through some formal authority, but by people doing the work. By gathering grassroots job aids and evaluating them against the more formal ones we’ve developed, we can make better decisions about which ones to use.

Failing to Measure when Learning Technologies Give Us Obvious Opportunities

  1. As more and more learning interventions utilize some form of digital technology, we are failing to leverage the data-gathering capabilities of these technologies for learning evaluation. Even the simplest affordances go untapped. For example, we could easily keep track of how long it takes a learner to complete a task, we could diagnose knowledge through relatively simple mechanisms, and we could provide follow-up assessment items after delays (see the sketch after this list)—and yet too few of us are using these capabilities, and our authoring tools have not been redesigned to intuitively enable this functionality.
  2. We too often fail to use the power of technology to enable social evaluation methods. For example, we know from the research that peers often provide better feedback than experts to support learning—surely we can apply this capability to evaluation practice as well.
  3. We fail to use technology to enable random assignment of learners to treatments—to different learning methods—to give us insights into what works best for our particular learners, content, situation, etc.
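Here is the sketch referenced in the first item above: a minimal Python illustration of two of those simple affordances, logging time-on-task and scheduling a delayed follow-up assessment item. The event names, the 14-day delay, and the in-memory storage are assumptions for the sketch only; real tooling would log these events to an LMS or LRS (for example, via xAPI) rather than to a Python object.

```python
# Minimal sketch of two simple affordances: timestamped task events (to compute
# time-on-task) and a scheduled delayed follow-up assessment (to measure
# remembering, not just end-of-course recall). Event names and the 14-day delay
# are assumptions; real tooling would log to an LMS/LRS rather than in memory.
from datetime import datetime, timedelta

class LearnerRecord:
    def __init__(self, learner_id: str):
        self.learner_id = learner_id
        self.events = []                                  # (event_name, timestamp)

    def log(self, event_name: str) -> None:
        self.events.append((event_name, datetime.utcnow()))

    def time_on_task(self, start_event: str, end_event: str) -> timedelta:
        """Elapsed time between two logged events."""
        times = dict(self.events)                         # last occurrence wins
        return times[end_event] - times[start_event]

    def schedule_followup(self, days_delay: int = 14) -> datetime:
        """When to send a delayed assessment item to the learner."""
        return datetime.utcnow() + timedelta(days=days_delay)

record = LearnerRecord("learner_042")
record.log("scenario_task_started")
record.log("scenario_task_completed")
print("Time on task:", record.time_on_task("scenario_task_started", "scenario_task_completed"))
print("Send delayed follow-up item on:", record.schedule_followup().date())
```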

Failing to Push Against Poor Evaluation Practices

  1. Too often we report out evaluation data that are of dubious merit. For example, we highlight the number of learners who completed our programs, their general level of satisfaction, the number of words they utilized in a discussion forum, whether they were paying attention during a page-turning elearning program. By reporting these out, we venerate these measures as important when they are not—or when they are not as important as other evaluation metrics.
  2. Our trade organizations are guilty of this as well, honoring organizations with “best of” awards that highlight the number of people who were trained, etc.
  3. The Kirkpatrick-Katzell Four-Level model is silent on poor practices, except that it does rightly cast suspicion on learner reaction data by putting it only at Level 1.

Please Add Ideas or Comment

This is some of what I’ve seen, but I’m sure some of you have seen other mistakes in learning evaluation. Please add them below… Also, feel free to comment on the items in my list, improving them, adding contingencies, attempting to refute that they are mistakes, etc. Thanks for your insights!

– Will Thalheimer

At a recent online discussion held by the Minnesota Chapter of ISPI, where they were discussing the Serious eLearning Manifesto, Michael Allen offered a brilliant idea for learning professionals.

Michael’s company, Allen Interactions, talks regularly with prospective clients. It is in this capacity that Michael often asks this question (or one with this gist):

What is the last thing you want your learners to be doing in training before they go back to their work?

Michael knows the answer—he is using Socratic questioning here—and the answer should be obvious to those skilled in developing learning. We want people to be practicing what they’ve learned, and hopefully practicing in as realistic a way as possible. Of course!

Of course, but too often we don’t think like this. We have our instructional objectives and we plow through to cover content, hoping against hope that the knowledge seeds we plant will magically turn into performance on the job—as if knowledge can be harvested without any further nurturance.

We must remember the wisdom behind Michael’s question: it is our job as learning professionals to ensure that our learners are not only gaining knowledge but also getting practice in making decisions and performing realistic tasks.

One way to encourage yourself to engage in these good practices is to utilize the LTEM model, a learning-evaluation model designed to support us as learning professionals in measuring what’s truly important in learning. LTEM’s Tiers 5 and 6 encourage us to evaluate learners’ proficiency in making work-relevant decisions (Tier 5) and performing work-relevant tasks (Tier 6).

Whatever method you use to encourage your organization and team to engage in this research-based best practice, remember that we are harming our learners when we just teach content. Without practice, very little learning will transfer to the workplace.

Respondents

Over 200 learning professionals responded to Work-Learning Research’s 2017-2018 survey on current practices in gathering learner feedback, and today I will reveal the results. The survey ran from November 29th, 2017 to September 16th, 2018. The sample of respondents was drawn from Work-Learning Research’s mailing list and through extensive calls for participation in a variety of social media. Because of this sampling methodology, the survey results are likely skewed toward professionals who care about and/or pay attention to research-based practice recommendations more than the workplace learning field as a whole. Respondents are also likely more interested and experienced in learning evaluation than the field at large.

Feel free to share this link with others.

Goal of the Research

The goal of the research was to determine what people are doing in the way of evaluating their learning interventions through the practice of asking learners for their perspectives.

Questions the Research Hoped to Answer

  1. Are smile sheets (learner-feedback questions) still the most common method of doing learning evaluation?
  2. How does their use compare with other methods? Are other methods growing in prominence/use?
  3. How satisfied are learning professionals with their organizations’ learner-feedback methods?
  4. To what extent are organizations looking for alternatives to their current learner-feedback methods?
  5. What kinds of questions are used on smile sheets? Has Thalheimer’s new approach, performance-focused questioning, gained any traction?
  6. What do learning professionals think their current smile sheets are good at measuring (Satisfaction, Reputation, Effectiveness, Nothing)?
  7. What tools are organizations using to gather learner feedback?
  8. How useful are current learner-feedback questions in helping guide improvements in learning design and delivery?
  9. How widely are the target metrics of LTEM (The Learning-Transfer Evaluation Model) currently being measured?

A summary of the findings indexed to these questions can be found at the end of this post.

Situating the Practice of Gathering Learner Feedback

When we gather feedback from learners, we are using a Tier 3 methodology on the LTEM (Learning-Transfer Evaluation Model) or Level 1 on the Kirkpatrick-Katzell Four-Level Model of Training Evaluation.

Demographic Background of Respondents

Respondents came from a wide range of organizations, including small, midsize, and large organizations.

Respondents play a wide range of roles in the learning field.

Most respondents live in the United States and Canada, but there was some significant representation from many predominantly English-speaking countries.

Learner-Feedback Findings

About 67% of respondents report that learners are asked about their perceptions on more than half of their organization’s learning programs, including elearning. Only about 22% report that they survey learners on less than half of their learning programs. This finding is consistent with past findings—surveying learners is the most common form of learning evaluation and is widely practiced.

The two most common question types in use are Likert-like questions and numeric-scale questions. I have argued against their use,* and I am pleased that Performance-Focused Smile Sheet questions have been utilized by so many so quickly. Of course, this sample of respondents is composed of folks on my mailing list, so this result surely doesn’t represent current practice in the field as a whole. Not yet! LOL.

*Likert-like questions and numeric-scale questions are problematic for several reasons. First, because they offer fuzzy response choices, learners have a difficult time deciding between them, and this likely makes their responses less precise. Second, such fuzziness may inflate bias because there are no concrete anchors to minimize the biasing effects of the question stems. Third, Likert-like options and numeric scales likely deflate learner responding because learners are habituated to such scales and because they may be skeptical that data from such scales will actually be useful. Finally, Likert-like options and numeric scales produce indistinct results—averages all in the same range. Such results are difficult to assess, failing to support decision-making—the whole purpose of evaluation in the first place. To learn more, check out Performance-Focused Smile Sheets: A Radical Rethinking of a Dangerous Art Form (book website here).

The most common tools used to gather feedback from learners were paper surveys and SurveyMonkey. Questions delivered from within an LMS were the next most common. High-end evaluation systems like Metrics that Matter were not highly represented among our respondents.

Our respondents did not rate their learner-feedback efforts as very effective. Their learner surveys were seen as most effective in gauging learner satisfaction. Only about 33% of respondents thought their learner surveys gave them insights on the effectiveness of the learning.

Only about 15% of respondents found their data very useful in providing them feedback about how to improve their learning interventions.

Respondents report that their organizations are somewhat open to alternatives to their current learner-feedback approaches, but overall they are not actively looking for alternatives.

Most respondents report that their organizations are at least “modestly happy” with their learner-feedback assessments. Yet only 22% reported being “generally happy” with them. Combining this finding with the one above showing that lots of organizations are open to alternatives, it seems that organizational satisfaction with current learner-feedback approaches is soft.

We asked respondents about their organizations’ attempts to measure the following:

  • Learner Attendance
  • Whether Learner is Paying Attention
  • Learner Perceptions of the Learning (e.g., Smile Sheets, Learner Feedback)
  • Amount or Quality of Learner Participation
  • Learner Knowledge of the Content
  • Learner Ability to Make Realistic Decisions
  • Learner Ability to Complete Realistic Tasks
  • Learner Performance on the Job (or in another future performance situation)
  • Impact of Learning on the Learner
  • Impact of Learning on the Organization
  • Impact of Learning on Coworkers, Family, Friends of the Learner
  • Impact of Learning on the Community or Society
  • Impact of Learning on the Environment

These evaluation targets are encouraged in LTEM (The Learning-Transfer Evaluation Model).

Results are difficult to show—because our question was very complicated (admittedly too complicated)—but I will summarize the findings below.

As you can see, learner attendance and learner perceptions (smile sheets) were the most commonly measured factors, with learner knowledge a distant third. The least common measures involved the impact of the learning on the environment, community/society, and the learner’s coworkers/family/friends.

The flip side—methods rarely utilized in respondents’ organizations—shows pretty much the same thing.

Note that the question above, because it was too complicated, probably produced some spurious results, even if the trends at the extremes are likely indicative of the whole range. In other words, it’s likely that attendance and smile sheets are the most utilized, and that measures of impact on the environment, community/society, and learners’ coworkers/family/friends are the least utilized.

Questions Answered Based on Our Sample

  1. Are smile sheets (learner-feedback questions) still the most common method of doing learning evaluation?

    Yes! Smile sheets are clearly the most popular evaluation method, along with measuring attendance (if we include that as a metric).

  2. How does their use compare with other methods? Are other methods growing in prominence/use?

    Except for Attendance, nothing else comes close. The next most common method is measuring knowledge. Remarkably, given the known importance of decision-making (Tier 5 in LTEM) and task competence (Tier 6 in LTEM), these are used in evaluation at a relatively low level. Similar low levels are found in measuring work performance (Tier 7 in LTEM) and organizational results (part of Tier 8 in LTEM). We’ve known about these relatively low levels from many previous research surveys.

    Hardly any measurement is being done on the impact of learning on the learner or his/her coworkers/family/friends, the impact of the learning on the community/society/environment, or on learner participation/attention.

  3. How satisfied are learning professionals with their organizations’ learner-feedback methods?

    Learning professionals are moderately satisfied.

  4. To what extent are organizations looking for alternatives to their current learner-feedback methods?

    Organizations are open to alternatives, with some actively seeking alternatives and some not looking.

  5. What kinds of questions are used on smile sheets? Has Thalheimer’s new approach, performance-focused questioning, gained any traction?

    Likert-like options and numeric scales are the most commonly used. Thalheimer’s performance-focused smile-sheet method has gained traction in this sample of respondents—people likely more in the know about Thalheimer’s approach than the industry at large.

  6. What do learning professionals think their current smile sheets are good at measuring (Satisfaction, Reputation, Effectiveness, Nothing)?

    Learning professionals think their current smile sheets are fairly good at measuring the satisfaction of learners. A full one-third of respondents feel that their current approaches are not valid enough to provide them with meaningful insights about the learning interventions.

  7. What tools are organizations using to gather learner feedback?

    The two most common methods for collecting learner feedback are paper surveys and SurveyMonkey. Questions from LMSs are the next most widely used. Sophisticated evaluation tools are not much in use in our respondent sample.

  8. How useful are current learner-feedback questions in helping guide improvements in learning design and delivery?

    This may be the most important question we can ask, given that evaluation is supposed to aid us in maintaining our successes and improving on our deficiencies. Only 15% of respondents found learner feedback “very helpful” in helping them improve their learning. Many found the feedback “somewhat helpful,” but a full one-third found the feedback “not very useful” in enabling them to improve learning.

  9. How widely are the target metrics of LTEM (The Learning-Transfer Evaluation Model) currently being measured?

    As described in Question 2 above, many of the targets of LTEM are not being adequately measured at this point in time (November 2017 to September 2018, during the time immediately before and after LTEM was introduced). This indicates that LTEM is poised to help organizations uncover evaluation targets that can be helpful in setting goals for learning improvements.

Lessons to be Drawn

The results of this survey reinforce what we’ve known for years. In the workplace learning industry, we default to learner-feedback questions (smile sheets) as our most common learning-evaluation method. This is a big freakin’ problem for two reasons. First, our learner-feedback methods are inadequate. We often use poor survey methodologies and ones particularly unsuited to learner feedback, including the use of fuzzy Likert-like options and numeric scales. Second, even if we used the most advanced learner-feedback methods, we still would not be doing enough to gain insights into the strengths and weaknesses of our learning interventions.

Evaluation is meant to provide us with data we can use to make our most critical decisions. We need to know, for example, whether our learning designs are supporting learner comprehension, learner motivation to apply what they’ve learned, learner ability to remember what they’ve learned, and whether supports are available to help learners transfer their learning to their work. We typically don’t know these things. As a result, we don’t make the design decisions we ought to make. We don’t make improvements in the learning methods we use or the way we deploy learning. The research captured here should be seen as a wake-up call.

The good news from this research is that learning professionals are often aware of and sensitized to the deficiencies of their learning-evaluation methods. This seems like a good omen. When improved methods are introduced, they will seek to encourage their use.

LTEM, the new learning-evaluation model (which I developed with the help of some of the smartest folks in the workplace learning field) is targeting some of the most critical learning metrics—metrics that have too often been ignored. It is too new to be certain of its impact, but it seems like a promising tool.

Why I have turned my Attention to Evaluation (and why you should too!)

For 20 years, I’ve focused on compiling scientific research on learning in the belief that research-based information—when combined with a deep knowledge of practice—can drastically improve learning results. I still believe that wholeheartedly! What I’ve also come to understand is that we as learning professionals must get valid feedback on our everyday efforts. It’s simply our responsibility to do so.

We have to create learning interventions based on the best blend of practical wisdom and research-based guidance. We have to measure key indices that tell us how our learning interventions are doing. We have to find out what their strengths are and what their weaknesses are. Then we have to analyze and assess and make decisions about what to keep and what to improve. Then we have to make improvements and again measure our results and continue the cycle—working always toward continuous improvement.

Here’s a quick-and-dirty outline of the recommended cycle for using learning to improve work performance. “Quick-and-dirty” means I might be missing something!

  1. Learn about and/or work to uncover performance-improvement needs.
  2. If you determine that learning can help, continue. Otherwise, build or suggest alternative methods to get to improved work performance.
  3. Deeply understand the work-performance context.
  4. Sketch out a very rough draft for your learning intervention.
  5. Specify your evaluation goals—the metrics you will use to measure your intervention’s strengths and weaknesses.
  6. Sketch out a rough draft for your learning intervention.
  7. Specify your learning objectives (notice that evaluation goals come first!).
  8. Review the learning research and consider your practical constraints (two separate efforts subsequently brought together).
  9. Sketch out a reasonably good draft for your learning intervention.
  10. Build your learning intervention and your learning-evaluation instruments (iteratively testing and improving).
  11. Deploy your “ready-to-go” learning intervention.
  12. Measure your results using the previously determined evaluation instruments, which were based on your previously determined evaluation objectives.
  13. Analyze your results.
  14. Determine what to keep and what to improve.
  15. Make improvements.
  16. Repeat (maybe not every step, but at least from Step 6 onward).
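
For readers who think in code, here is a minimal sketch, in Python, of how a team might track a project through this cycle as structured data. It is purely illustrative, not part of any tool I offer, and every name in it (EvaluationGoal, LearningProject, the sample metric and target) is hypothetical.

```python
# Illustrative sketch only: all class names and sample values are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EvaluationGoal:
    metric: str       # what we will measure, e.g., remembered skill two weeks out
    instrument: str   # how we will measure it, e.g., a scenario-based assessment
    target: str       # what "good" looks like, e.g., 80% choose the best action

@dataclass
class LearningProject:
    performance_need: str
    evaluation_goals: List[EvaluationGoal] = field(default_factory=list)
    learning_objectives: List[str] = field(default_factory=list)

    def ready_for_objectives(self) -> bool:
        # Mirrors Steps 5 and 7 above: evaluation goals come before learning objectives.
        return bool(self.evaluation_goals)

project = LearningProject(performance_need="Reduce errors in the client-intake process")
project.evaluation_goals.append(EvaluationGoal(
    metric="intake-task competence two weeks after training",
    instrument="realistic work-sample assessment",
    target="fewer than two errors per simulated intake",
))
assert project.ready_for_objectives()
project.learning_objectives.append("Complete a client intake without job aids or prompts")
```

The only point the sketch makes is the ordering: evaluation goals are specified, and checked, before any learning objectives are written.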

And here is a shorter version of the cycle:

  1. Know the learning research.
  2. Understand your project needs.
  3. Outline your evaluation objectives—the metrics you will use.
  4. Design your learning.
  5. Deploy your learning and your measurement.
  6. Analyze your results.
  7. Make improvements.
  8. Repeat.

More Later Maybe

The results shared here come from all respondents combined. If I get the time, I’d like to look at subsets of respondents. For example, I’d like to look at how learning executives and managers might differ from learning practitioners. Let me know how interested you would be in these results.
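
If I do get to that subgroup analysis, it would look something like the sketch below, written in Python with pandas. The file name and column names are invented for illustration; they are not the actual survey export.

```python
# Illustrative only: the CSV file and column names below are hypothetical.
import pandas as pd

responses = pd.read_csv("evaluation_survey_responses.csv")  # hypothetical export

# Suppose "role" holds values like "executive/manager" or "practitioner", and
# "relies_on_learner_feedback" is a yes/no answer to one of the survey items.
by_role = (
    responses
    .groupby("role")["relies_on_learner_feedback"]
    .value_counts(normalize=True)   # proportion of each answer within each role
    .rename("proportion")
    .reset_index()
)
print(by_role)
```

A comparison like this would show, for example, whether executives and managers report relying on learner feedback at a different rate than practitioners do.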

Also, I will be conducting other surveys on learning-evaluation practices, so stay tuned. We have been frustrated with our evaluation practices for too long, and more work needs to be done to understand the forces that keep us from doing what we want to do. We could also use more and better learning-evaluation tools, because the truth is that learning evaluation is still a nascent field.

Finally, because I learn a ton by working with clients who challenge themselves to do more effective interventions, please get in touch with me if you’d like a partner in thinking things through and trying new methods to build more effective evaluation practices. Also, please let me know how you’ve used LTEM (The Learning-Transfer Evaluation Model).

Some links to make this happen:

Appreciations

As always, I am grateful to all the people I learn from, including clients, researchers, thought leaders, conference attendees, and more… Thanks also to all who acknowledge and share my work! It means a lot!

Updated July 3rd, 2018—a week after the original post. See end of post for the update, featuring Rob Brinkerhoff’s response.

Rob Brinkerhoff’s “Success Case Method” needs a subtle name change. I think a more accurate name would be the “Brinkerhoff Case Method.”

I’m one of Rob’s biggest fans, having selected him in 2008 as the Neon Elephant Award Winner for his evaluation work.

Thirty-five years ago, in 1983, Rob published an article in which he introduced the “Success Case Method.” Here is a picture of the first page of that article:

In that article, the Success Case Method was introduced as a way to find the value of training when it works. Rob wrote, “The success-case method does not purport to produce a balanced assessment of the total results of training. It does, however, attempt to answer the question: When training works, how well does it work?” (page 58, which is visible above).

The Success Case Method didn’t stand still. It evolved and improved as Rob refined it based on his research and his work with clients. In his landmark 2006 book detailing the methodology, Telling Training’s Story: Evaluation Made Simple, Credible, and Effective, Rob describes how to first survey learners and then sample some of them for interviews, selecting them based on their level of success in applying the training. “Once the sorting is complete, the next step is to select the interviewees from among the high and low success candidates, and perhaps from the middle categories.” (page 102).
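
To make that sampling step concrete, here is a rough sketch in Python. It is my own illustration, not code from Rob’s book; the function name, group sizes, and fabricated success scores are all hypothetical.

```python
# Illustrative sketch of extreme-group sampling; not from Brinkerhoff's materials.
import random

def select_interviewees(survey, n_per_group=5, seed=7):
    """survey: list of (respondent_id, success_score) pairs from the initial survey."""
    rng = random.Random(seed)
    ranked = sorted(survey, key=lambda row: row[1])      # lowest to highest success
    low = ranked[:n_per_group]                           # least successful in applying the training
    high = ranked[-n_per_group:]                         # most successful
    middle_pool = ranked[n_per_group:-n_per_group]
    middle = rng.sample(middle_pool, min(n_per_group, len(middle_pool)))
    return {"low_success": low, "middle": middle, "high_success": high}

# Example with fabricated data: 40 respondents with made-up success scores.
fabricated = [(f"R{i:02d}", random.Random(i).random()) for i in range(40)]
groups = select_interviewees(fabricated)
print({name: [rid for rid, _ in members] for name, members in groups.items()})
```

The point is simply that interviewees are drawn deliberately from both ends of the success distribution, plus a slice of the middle, rather than from success stories alone.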

To call this the Success Case Method seems more aligned with the original naming than with the actual recommended practice. For that reason, I recommend that we simply call it the Brinkerhoff Case Method. This gives Rob the credit he deserves, and it more accurately reflects the rigor and balance of the method itself.

As soon as I posted the original post, I reached out to Rob Brinkerhoff to let him know. After some reflection, Rob wrote this and asked me to post it:

“Thank you for raising the issue of the currency of the name Success Case Method (SCM). It is kind of you to also think about identifying it more closely with my name. Your thoughts are not unlike others and on occasion even myself. 

It is true the SCM collects data from extreme portions of the respondent distribution including likely successes, non-successes, and ‘middling’ users of training. Digging into these different groups yields rich and useful information. 

Interestingly the original name I gave to the method some 40 years ago when I first started forging it was the “Pioneer” method since when we studied the impact of a new technology or innovation we felt we learned the most from the early adopters – those out ahead of the pack that tried out new things and blazed a trail for others to follow. I refined that name to a more familiar term but the concept and goal remained identical: accelerate the pace of change and learning by studying and documenting the work of those who are using it the most and the best. Their experience is where the gold is buried. 

Given that, I choose to stick with the “success” name. It expresses our overall intent: to nurture and learn from and drive more success. In a nutshell, this name expresses best not how we do it, but why we do it. 

Thanks again for your thoughtful reflections. We’re on the same page.”

Rob’s response is thoughtful, as usual. Yet my feelings on this remain steady. As I’ve written in my report on the new Learning-Transfer Evaluation Model (LTEM), our models should nudge appropriate actions. The same is true for the names we give things. Mining for success stories is good, but it has to be balanced. After all, if evaluation doesn’t look for the full truth—without putting a thumb on the scale—then we are not evaluating; we are doing something else.

I know Rob’s work. I know that he is not advocating for, nor does he engage in, unbalanced evaluations. I do fear that the name Success Case Method may give lesser practitioners permission, or an unconscious nudge, to find more success and less failure than the facts warrant.

Of course, the term “Success Case Method” has one brilliant advantage. Where people are hesitant to evaluate for fear of uncovering unpleasant results, the name “Success Case Method” may lessen the worry of moving forward and engaging in evaluation—and so it may actually enable the balanced evaluation that is necessary to uncover the truth of learning’s level of success.

Whatever we call it, the Success Case Method or the Brinkerhoff Case Method—and this is the most important point—it is one of the best learning-evaluation innovations of the past half-century.

I also agree that, since Rob is the creator, his voice should carry the most weight in deciding what to call his invention.

I will end with one of my all-time favorite quotations from the workplace learning field, from Tim Mooney and Robert Brinkerhoff’s excellent book, Courageous Training:

“The goal of training evaluation is not to prove the value of training; the goal of evaluation is to improve the value of training.” (pp. 94-95)

On this we should all agree!

NOTICE OF UPDATE (17 May 2018): The Learning-Transfer Evaluation Model (LTEM) and accompanying Report were updated today with two major changes:

  • The model has been inverted to put the better evaluation methods at the top instead of at the bottom.
  • The model now uses the word “Tier” to refer to the different levels within the model—to distinguish these from the levels of the Kirkpatrick-Katzell model.

This will be the last update to LTEM for the foreseeable future.

You can find the latest version of LTEM and the accompanying report by clicking here.


This blog post introduces a new learning-evaluation model, the Learning-Transfer Evaluation Model (LTEM).

 

Why We Need a New Evaluation Model

It is well past time for a new learning-evaluation model for the workplace learning field. The Kirkpatrick-Katzell Model is over 60 years old. It was born in a time before computers, before cognitive psychology revolutionized the learning field, before the training field was transformed from one that focused on the classroom learning experience to one focused on work performance.

The Kirkpatrick-Katzell model—created by Raymond Katzell and popularized by Donald Kirkpatrick—is the dominant standard in our field. It has also done a tremendous amount of harm, pushing us to rely on inadequate evaluation practices and poor learning designs.

I am not the only critic of the Kirkpatrick-Katzell model. There are legions of us. If you type “Criticisms of the Ki” into a Google search, Google suggests “Criticisms of the Kirkpatrick Model” as one of the most popular searches.

Here’s what a seminal research review said about the Kirkpatrick-Katzell model (before the model’s name change):

The Kirkpatrick framework has a number of theoretical and practical shortcomings. [It] is antithetical to nearly 40 years of research on human learning, leads to a checklist approach to evaluation (e.g., ‘we are measuring Levels 1 and 2, so we need to measure Level 3’), and, by ignoring the actual purpose for evaluation, risks providing no information of value to stakeholders…

The New Model

For the past year or so, I’ve been working to develop a new learning-evaluation model. The current version is the eleventh iteration, improved after reflection, after asking some of the smartest people in our industry to provide feedback, and after sharing earlier versions with conference attendees at the 2017 ISPI innovation and design-thinking conference and the 2018 Learning Technologies conference in London.

Special thanks to the following people who provided significant feedback that improved the model and/or the accompanying article:

Julie Dirksen, Clark Quinn, Roy Pollock, Adam Neaman, Yvon Dalat, Emma Weber, Scott Weersing, Mark Jenkins, Ingrid Guerra-Lopez, Rob Brinkerhoff, Trudy Mandeville, Mike Rustici

The model, which I’ve named the Learning-Transfer Evaluation Model (LTEM, pronounced L-tem), is a one-page, eight-level model, augmented with color coding and descriptive explanations. In addition to the model itself, I’ve prepared a 34-page report describing the need for the model, the rationale for its design, and recommendations on how to use it.

You can access the model and the report by clicking on the following links:

 

 

Release Notes

The LTEM model and report were researched, conceived, and written by Dr. Will Thalheimer of Work-Learning Research, Inc., with significant and indispensable input from others. No one sponsored or funded this work. It was a labor of love and is provided as a valentine for the workplace learning field on February 14th, 2018 (Version 11). Version 12 was released on May 17th, 2018 based on feedback from its use. The model and report are copyrighted by Will Thalheimer, but you are free to share them as is, as long as you don’t sell them.

If you would like to contact me (Will Thalheimer), you can do that at this link: https://www.worklearning.com/contact/

If you would like to sign up for my list, you can do that here: https://www.worklearning.com/sign-up/