The Case for Delaying Level 2 Assessments

Why Level 2 Assessments Given Immediately After Learning Are Generally Dangerous and Misleading

Note from Will Thalheimer: This is an updated version of a newsletter blurb written a couple of years ago, where I made too strong a point about the dangers of Level 2 Assessments. Specifically, I claimed that Level 2 Assessments should never be used immediately after learning, which may have been pushing the point too far. Upon rethinking the issue a bit, I’ve concluded that Level 2 Assessments immediately after learning are still dangerous, but there may be some benefits to using them if the overall assessment process is designed correctly. You can be the judge by reading below.

Introduction to Levels 1 and 2 Assessments

Before we get to Level 2 evaluations, let’s talk about Level 1 of Kirkpatrick’s 4-level model of assessment. Level 1 is represented by the "smile sheets" that we hand out after training or include at the end of an e-learning course. They typically ask learners to rate the course and to judge how likely they are to use the information they learned. These evaluations are valuable to get learner reactions and opinions, but they provide a very poor gauge of learning and performance. The fact that we rely on these almost exclusively to assess the value of our instruction is unconscionable.

Level 1 Assessments are not always good predictors of learning. Learners may give a course a high score but not remember what they learned. Learners are also famously optimistic about what they will remember. Just because they tell us they’ll remember information and use it in their work doesn’t mean they will. Learners also fill in smile sheets based on whether they like the course or the instructor. Courses that challenge learners may be rated poorly, even though a challenge might be exactly what is needed to push a significant behavior change.

Level 2 Assessments are intended to measure learning and retention. We want to know whether the information learned is retrievable from memory. Ideally, we want to know whether the information is retrieved and used on the job. If we measure actual on-the-job performance, we’re really utilizing a Level 3 Assessment. In comparison, Level 2 Assessments measure the retrievability of information, not its actual use. This is where the problems start.

What is Meant Here by the Word Assessment?

First, let me clarify that I am using the word "assessment" to mean a test given for the purposes of evaluating a learning intervention. Assessments can also be used to bolster learning, as when they are used to promote retrieval practice or provide feedback to the learners. It is the first use of assessments that I am concerned with in this article. Specifically, I will argue that Level 2 Assessments given for the purpose of evaluating the success or failure of a learning intervention are dangerous if given immediately after the learning. However, this does not mean that assessments used for the purpose of aiding retrieval or providing corrective feedback are not valuable at the end of learning. In fact, they are excellent for that purpose.

The analysis in this article also assumes that Level 2 Assessments are well designed. Specifically, it assumes that the assessments prompt learners to retrieve from memory the same information that they will have to retrieve in their on-the-job situations. It also assumes that the cues that trigger retrieval will be similar to those that will trigger their thinking and performance on the job. It is true that most current Level 2 Assessments don’t meet these criteria, but they should.

The Problems With Immediate Assessments

When we learn a concept, we think about it. When we think about something, it becomes highly retrievable from memory, at least for a short time. Thus, during learning and immediately afterward, our newly learned information is highly retrievable. If we test learners then, they are likely to remember all kinds of stuff they’ll forget in a day.

This problem is compounded because learning is contextualized. If we learn in Room A, we’ll be better able to retrieve the information we learned if we have to retrieve it in Room A as opposed to Room B (by up to 55% or so). Thus, if we test learners in the training room or while they’re still at their desks using an e-learning program, we’re priming them for a level of success they won’t attain when they’re out of that learning situation and back on the job.

Giving someone a test immediately after they learn something is cheating. It provides an inflated measure of their learning. More importantly, it tells us very little—if anything—about how well learners will be able to retrieve information when they get back to their jobs.

On-the-job retrieval depends on both the amount of learning and the amount of forgetting.

Retrieval = Learning – Forgetting

Our instructional designs need to maximize learning and minimize forgetting. If we measure learners immediately after they learn, we’ve accounted for the learning part of the retrieval equation, but we’ve ignored forgetting altogether. Not only are immediately-given Level 2 Assessments poor tools to use in measuring an individual’s performance, but they also give us poor feedback about whether our instructional designs are any good. In short, they’re double trouble. First, they don’t measure what we want them to measure, and then they don’t hold us accountable for our work.
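
To make the retrieval equation concrete, here is a minimal sketch in Python. The function name and every number in it are hypothetical, chosen only to illustrate the arithmetic; they are not measured data. It shows two imagined course designs that look identical on an immediate test yet leave learners with very different amounts to retrieve later.

```python
# Illustrative sketch of "Retrieval = Learning - Forgetting".
# All names and numbers are hypothetical, chosen only to show the arithmetic.

def on_the_job_retrieval(learning: float, forgetting: float) -> float:
    """Return what is still retrievable after forgetting, floored at zero."""
    return max(learning - forgetting, 0.0)

# Assumed score on an immediate test; identical for both designs.
immediate_score = 90.0

# Assumed amounts forgotten by the time learners are back on the job.
forgetting_design_a = 50.0
forgetting_design_b = 15.0

retrieval_a = on_the_job_retrieval(immediate_score, forgetting_design_a)
retrieval_b = on_the_job_retrieval(immediate_score, forgetting_design_b)

print(f"Design A: immediate {immediate_score}, later {retrieval_a}")
print(f"Design B: immediate {immediate_score}, later {retrieval_b}")
```

Under these assumed numbers, an immediate assessment would rate both designs the same, because it only sees the learning term of the equation.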

But What Happens On The Job?

All this is true in most cases, but there are complications when we consider what happens after the learning event. The analysis above is accurate in those situations when learners forget much of what they learn as they move from learning events back to their jobs. Look at the following graph and imagine doing a Level 2 Assessment at the end of the learning—before the forgetting begins. It would show strong results even though later performance would be poor.


But what happens in those all-too-rare situations when learners take the learning and begin to apply what they’ve learned as soon as they get back to the workplace? When they do this, they’re much less likely to forget—and they may even take their competence and learning to a higher level than they achieved in the actual learning event. Check out the graph below as an example.


In this case, if we did a Level 2 Assessment at the end of the initial learning, before the workplace learning begins, the assessment again wouldn’t be accurate. This time it might fail to capture the ability of the learning intervention to facilitate the workplace learning.

Real learning interventions often generate both types of results. Learners use some of what they’ve learned back on the job, which strengthens their memory of it, but the rest of what they learned goes unused and so is forgotten. The following graph depicts this dichotomous effect.


So Why Use Level 2 Assessments At All?

It should be clear that Level 2 Assessments delivered immediately after the learning are virtually impossible to interpret. However, it may be useful to use them in conjunction with a later Level 2 Assessment to determine what is happening to the learning after the learners get back to the job.

If the learners’ level of retrieval improves, then we can be fairly certain that the learners have made positive use of the learning event. Of course, a better way to draw this conclusion is to use a comparison-group design, but such an investment is normally not feasible.

If the learners’ level of retrieval remains steady, then we can be fairly certain that the course did some good in preventing forgetting. Again, a comparison-group design will be more definitive.

If the learners’ level of retrieval deteriorates, then we can be fairly certain that the learning event did not prevent forgetting and/or was probably not targeted at real work skills. Deteriorating retrieval is the one result we want to make sure our learning designs don’t produce. Because forgetting is central to human learning, preventing it means we’re doing something important. Finally, because forgetting is normal, a comparison-group design is not as critical for ruling out alternative explanations. In other words, if we find forgetting after the learner returns to the job, we can conclude that the learning event wasn’t good enough.
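
The three cases above can be sketched as a simple comparison between an immediate and a delayed score. This is only an illustration: the function name, the return labels, and especially the `tolerance` threshold (how large a change counts as real rather than noise) are assumptions of this sketch, not values from any study.

```python
# Hypothetical sketch: classify the change between an immediate and a
# delayed Level 2 Assessment. The tolerance value is an assumption.

def interpret_retrieval_change(immediate: float, delayed: float,
                               tolerance: float = 5.0) -> str:
    """Label the change in retrieval between the two assessments."""
    change = delayed - immediate
    if change > tolerance:
        # Learners likely made positive use of the learning event.
        return "improved"
    if change < -tolerance:
        # The learning event did not prevent forgetting.
        return "deteriorated"
    # The course likely did some good in preventing forgetting.
    return "steady"

print(interpret_retrieval_change(70.0, 85.0))
print(interpret_retrieval_change(70.0, 72.0))
print(interpret_retrieval_change(70.0, 40.0))
```

As noted above, the "improved" and "steady" labels are more trustworthy when backed by a comparison-group design, while "deteriorated" can usually be taken at face value.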

To summarize this section, it appears that though Level 2 Assessments given immediately after learning have dubious merit because they’re impossible to interpret, there may be value in using immediate Level 2 Assessments in combination with delayed Level 2 Assessments.

How to Do Level 2 Assessments

Level 2 Assessments should:

1. Be administered at least a week or two after the original learning ends, or be administered twice—immediately after learning and a week or two later.

2. Be composed of authentic questions that require learners to retrieve information from memory in a way that is similar to how they’ll have to retrieve it on the job. Simulation-like questions that provide realistic decisions set in real-world contexts are ideal.

3. Cover a significant portion of the most important performance objectives.