Let me start out by saying that I don’t know everything about learning measurement. It’s a topic that is too deep and complicated for easy answers.

My value-add is that I look at measurement from the standpoint of how learning works. As far as I can tell, this is a perspective that is new to most of our field’s discussions of measurement. This is ironic of course, because it’s learning measurement we’re talking about. So for example, when we know that learning begets forgetting, why is it that we measure learning before forgetting might even have an effect—thereby biasing the results ridiculously in our favor?

The second unique perspective I’m adding to the conversation is the importance of predicting future retrieval. I argue that we must validly predict future retrieval to give us good feedback about how well our learning programs are working. We do an absolutely terrible job of this now.

Finally, I’d like to think that I am pushing us beyond the conceptual prison engendered by our old models and methods. It’s not that these models and methods are bad. It’s that most of us—me included—have had a seriously difficult time thinking outside the boundaries of the models’ constraints. Let me use the Donald Kirkpatrick model as an example. Others may beat up on it, expand it, or expound on it for pleasure or treasure, but it’s a great model. It helps us make sense of the world by simplifying the confusion. But the model, by itself, doesn’t tell us everything we need to know about learning measurement. It certainly shouldn’t be seen as a prescription for how we measure. It is simply too crude for that.

Three Biases in Measuring Learning at Level 2

  1. Measuring Immediately After Learning. A very high percentage of learning measurement is done immediately at the end of our learning events (in the eLearning Guild research, we found that about 90% of e-learning measurement was done right at the end of learning). Immediate tests of learning only make sense if you don’t care about the learner’s future ability to retrieve information. When we measure “learning,” what we really want to do is measure “retrieval.” Moreover, what we really care about is future retrieval. Isn’t that the goal of learning after all? We want our learners to learn something so they can retrieve it and use the information later. I detail this point in much greater depth in the Guild research report and in even more depth in my report, Measuring Learning Results… By the way, the Guild report is free to Guild members and to those who complete the survey.
  2. Measuring in the Same Context Where Learning Took Place. A high percentage of training is measured in the learning context (about 92% in the same or similar context in the Guild research). Unfortunately, context influences retrieval, so when we measure in the learning context we bias our measurement results. Oh, and we bias them in our favor, again. For example, in a classic research study, Smith, Glenberg, and Bjork (1978) found that when learners were tested in the same room in which they had learned, they recalled over 50% more than when they were tested in a different room. This amount of bias is well within the bounds of the Barry Bonds level of cheating!! Would you vote yourself into the Hall of Fame?
  3. Measuring Using Inauthentic Assessment Items (like Memorization). Most assessment items purporting to measure learning use memorization questions. Asking learners to simply recite what they have memorized is bad assessment design because memorization is generally unrelated to future retrieval. So, if we test memorization, we know nothing (or very little) about whether our learners will be able to retrieve what is truly important. Sharon Shrock and Bill Coscarelli, two of my co-authors in the eLearning Guild Research Report, highlight the problems of memorization in the third edition of their excellent book, Criterion-Referenced Testing… One of the goals of criterion-referenced testing is to determine whether a learner can be “certified” as competent or knowledgeable about a particular topic area. Shrock and Coscarelli argue that only assessments done on (a) real-world tasks, (b) simulations, and (c) scenarios can be validly used for certification decisions, whereas memorization cannot be used. This is a change from the second edition of their book and represents a paradigm shift in our field. In future posts in this series, I will highlight my taxonomy of authenticity for assessment questions that follows up on Shrock and Coscarelli’s thoughtful certification ideas.
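
To get a feel for how large the context bias in point 2 could be, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the roughly 50% recall advantage from the room study carries over directly to a training test score. That is my simplification, not a finding of the research, and the function names and the 75% example score are hypothetical.

```python
def different_context_estimate(same_context_score: float,
                               inflation: float = 0.5) -> float:
    """Estimate the score learners might get in a different context.

    same_context_score: fraction correct (0.0 to 1.0) measured in the
        learning context.
    inflation: assumed same-context advantage. 0.5 means "50% more recall";
        this default is an illustrative assumption, not a validated
        correction factor.
    """
    if not 0.0 <= same_context_score <= 1.0:
        raise ValueError("same_context_score must be between 0.0 and 1.0")
    if inflation < 0.0:
        raise ValueError("inflation must be non-negative")
    # If same-context recall is (1 + inflation) times different-context
    # recall, dividing undoes the assumed inflation.
    return same_context_score / (1.0 + inflation)


def overstatement(same_context_score: float, inflation: float = 0.5) -> float:
    """How many score points the same-context test may be overstating."""
    return same_context_score - different_context_estimate(
        same_context_score, inflation
    )


if __name__ == "__main__":
    score = 0.75  # hypothetical end-of-course test result
    print(f"Measured in the learning context: {score:.0%}")
    print(f"Rough different-context estimate: "
          f"{different_context_estimate(score):.0%}")
    print(f"Possible overstatement: {overstatement(score):.0%}")
```

Under these assumptions, a score that looks comfortable in the classroom could shrink considerably once learners are back on the job. Treat the numbers as a prompt for thinking about bias, not as a formula for adjusting real results.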

The Series Continues Tomorrow…

References:

Shrock, S. A., & Coscarelli, W. C. (2007). Criterion-Referenced Test Development: Technical and Legal Guidelines for Corporate Training (Third Edition). San Francisco: Pfeiffer.

Smith, S. M., Glenberg, A., & Bjork, R. A. (1978). Environmental context and human memory. Memory & Cognition, 6, 342-353.