One of the Biggest Lies in Learning Evaluation — Asking Learners about Level 3 and 4.


The Kirkpatrick four-level model of evaluation includes Level 1 learner reactions, Level 2 learning, Level 3 behavior, and Level 4 results. Because of the model’s ubiquity and popularity, many learning professionals and organizations are influenced or compelled by the model to measure the two higher levels, Behavior and Results, even when it doesn’t make sense to do so and even if poor methods are used to do the measurement. This pressure has led many of us astray. It has also enabled vendors to lie to us.

Let me get right to the point. When we ask learners whether a learning intervention will improve their job performance, we are getting their Level 1 reactions. We are NOT getting Level 3 data. More specifically, we are not getting information we can trust to tell us whether a person’s on-the-job behavior has improved due to the learning intervention.

Similarly, when we ask learners about the organizational results that might come from a training or elearning program, we are getting learners’ Level 1 reactions. We are NOT getting Level 4 data. More specifically, we are not getting information we can trust to tell us whether organizational results improved due to the learning intervention.

One key question is, “Are we getting information we can trust?” Another is, “Are we sure the learning intervention caused the outcome we’re targeting, or at least contributed significantly to it?”

Whenever we gather learner answers, we have to remember that people’s subjective opinions are not always accurate. First, there are general problems with human subjectivity, including people’s tendencies to want to be nice, to see themselves and their organizations in a positive light, and to believe they are more productive, intelligent, and capable than they actually are. In addition, learners don’t always know how different learning methods affect learning outcomes, so asking them to assess learning designs has to be done with great care to avoid bias.

The Foolishness of Measuring Level 3 and 4 with Learner-Input Alone

There are also specific difficulties in having learners rate Level 3 and 4 results.

  • Having learners assess Level 3 is fraught with peril because of all the biases that are entailed. Learners may want to look good to others or to themselves. They may suffer from the Dunning-Kruger effect and rate their performance at a higher level than what is deserved.
  • Assessing Level 4 organizational results is particularly problematic. First, it is very difficult to track all the things that influence organizational performance. Second, most employees cannot observe, or may not fully understand, the many influences on organizational outcomes, which makes asking learners for Level 4 results a dubious enterprise.

Many questions we ask learners in measuring Level 3 and 4 are biased in and of themselves. These four questions are highly biasing, and yet sadly they were taken directly from two of our industry’s best-known learning-evaluation vendors:

  • “Estimate the degree to which you improved your performance related to this course?” (Rated on a scale of percentages to 100)
  • “The training has improved my job performance.” (Rated on a numeric scale)
  • “I will be able to apply on the job what I learned during this session.” (Rated with a Likert-like scale)
  • “I anticipate that I will eventually see positive results as a result of my efforts.” (Rated with a Likert-like scale)

At least two of our top evaluation vendors make the case explicitly that smile sheets can gather Level 3 and 4 data. This is one of the great lies in the learning industry. A smile sheet garners Level 1 results! It does not capture data at any other level.

What about delayed smile sheets, that is, questions delivered to learners weeks or months after a learning experience? Can these get Level 2, 3, and 4 data? No! Asking learners for their perspectives, regardless of when their answers are collected, still gives us only Level 1 outcomes! Yes, learners’ answers can provide hints, but the data can only be a proxy for outcomes beyond Level 1.

On top of that, the problems cited above regarding learner perspectives on their job performance and on organizational results still apply even when questions are asked well after a learning event. Remember, the key to measurement is always whether we can trust the data we are collecting! To reiterate, asking learners for their perspectives on behavior and results suffers from the following:

  • Learners’ biases skew the data
  • Learners’ blind spots make their answers suspect
  • Biased questioning spoils the data
  • The complexity in determining the network of causal influences makes assessments of learning impact difficult or impossible

In situations where learner perspectives are so in doubt, asking learners questions may generate some reasonable hypotheses, but then these hypotheses must be tested with other means.

The Ethics of the Practice

It is unfair to call Level 1 data Level 3 data or Level 4 data.

In truth, it is not only unfair, it is deceptive, disingenuous, and harmful to our learning efforts.

How Widespread is this Misconception?

If two of our top vendors are spreading this misconception, we can be pretty sure that our friend-and-neighbor foot soldiers are marching to the beat.

Last week, I posted a Twitter poll asking the following question:

If you ask your learners how the training will impact their job performance, what #Kirkpatrick level is it?

Twitter polls allow only four choices, so I gave people the choice of Level 1 (Reaction), Level 2 (Learning), Level 3 (Behavior), or Level 4 (Results).

In all, 253 people responded. Here are the results:

  • Level 1 — Reaction (garnered 31% of the votes)
  • Level 2 — Learning (garnered 15% of the votes)
  • Level 3 — Behavior (garnered 38% of the votes)
  • Level 4 — Results (garnered 16% of the votes)

Level 1 is the correct answer! Level 3 is the most common misconception!
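For a rough sense of the head counts behind those percentages, here is a minimal Python sketch. It assumes the reported percentages are rounded to whole numbers, so the counts it produces are approximations and not figures from the poll itself. The names `poll_percent` and `approximate_counts` are made up for this illustration.

```python
# Percentages and the total of 253 responses come from the poll reported above.
# Assumption: the percentages were rounded, so the counts below are approximate.
TOTAL_VOTES = 253

poll_percent = {
    "Level 1 (Reaction)": 31,
    "Level 2 (Learning)": 15,
    "Level 3 (Behavior)": 38,
    "Level 4 (Results)": 16,
}


def approximate_counts(percentages, total):
    """Convert whole-number percentages into approximate vote counts."""
    return {label: round(total * pct / 100) for label, pct in percentages.items()}


approx_counts = approximate_counts(poll_percent, TOTAL_VOTES)

# Share of respondents who picked something other than Level 1.
share_not_level_1 = 100 - poll_percent["Level 1 (Reaction)"]

for label, count in approx_counts.items():
    print(f"{label}: about {count} votes")
print(f"Chose something other than Level 1: {share_not_level_1}%")
```

Because of the rounding, the approximate counts need not add up to exactly 253. What holds regardless is that more than two thirds of respondents chose a level other than Level 1.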

And note, given that Twitter is built on a social-media follower-model—and many people who follow me have read my book on Performance-Focused Smile Sheets, where I specifically debunk this misconception—I’m sure this result is NOT representative of the workplace learning field in general. I’m certain that in the field, more people believe that the question represents a Level 3 measure.

Yes, it is true what they say! People like you who read my work are more informed and less subject to the vagaries of vendor promotions. Also better looking, more bold, and more likely to be humble humanitarians!

My tweet offered one randomly-chosen winner a copy of my award-winning book. And the winner is:

Sheri Kendall-DuPont.

Thanks to everyone who participated in the poll…