Sharon Shrock and Bill Coscarelli recently completed the third edition of their important book, Criterion-Referenced Test Development: Technical and Legal Guidelines for Corporate Training. If this book isn’t in your collection already, I’ll give you a link below to buy from Amazon.com.

In this third edition, Sharon and Bill have updated the book from the second edition (published in 2000) in some critical ways. One of those ways is truly transformational for the workplace learning and performance field. I’ll get to that in a minute.

Also updated is the excellent chapter at the end of the book by Patricia S. Eyres, a lawyer with employment-law credentials. Her chapter covers the legal ramifications of and guidelines for employee testing, especially as that testing affects employee selection, advancement, and retention. She has updated the chapter with case law and legal precedents that have emerged since the second edition. Most people in the training field have very little knowledge of the legal ramifications of testing, and I’d recommend the book for this chapter alone; it’s a great wake-up call that will spur a new appreciation of the legal aspects of testing.

In the second edition, Shrock and Coscarelli put forward what they call the “Certification Suite.” In criterion-referenced testing, the goal is to decide whether a test taker has met a criterion or not. Those who have met the criterion are said to be “certified” as competent in the area on which they were tested. The Certification Suite has six levels, labeled A through F; some offer full Certification and some offer only Quasi-Certification:

Certification

  A. Real World
  B. High-Fidelity Simulation
  C. Scenarios

Quasi-Certification

  D. Memorization
  E. Attendance
  F. Affiliation

As the authors say in the book (p. 111), “Level C represents the last level of certification that can be considered to assess an ability to perform on the job.”

The truly transformational thing offered by Shrock and Coscarelli is this: Level D, Memorization, which the second edition treated as offering Certification, no longer does. NO MORE! That’s right: two of our leading thinkers on testing say that memorization questions are no longer good enough! Disclosure: In speaking with Bill Coscarelli in 2006, I gently encouraged this change. This is mentioned in the book, so it’s not like I’m bragging. SMILE.

I love this, of course, because it follows what we know about human learning. For tests to be predictive of real-world performance, they have to offer cues similar to those that learners will face in the real world. If they offer different cues—as almost all memorization questions do—they are just not relevant. And, from a learning standpoint (as opposed to a certification standpoint), memorization questions won’t spur spontaneous remembering through the triggering mechanism of real-world cues.

This literal and figurative raising of the bar—to move it beyond memorization—should shake us to our core (especially since this is one of the few books on assessment that covers legal stuff—so it may have some evidentiary heft in court). If the compliance tests at the end of your e-learning programs are based on memorization questions, you are so in trouble. If your credentialing is based on completion (and 85% of our respondents in the eLearning Guild research report said they utilized completion as a learning measure), you are in even worse trouble. And, of course, if you ever thought your memorization-level questions supported learning, well, sorry. They don’t! At least not as strongly as they might.

Have you bought the book yet? You should. You ought to at least have it around to show management (or your clients) why it’s important (absolutely freakin’ critical) to use high-value assessment items.

I’ve got some quibbles with the book as well. They list 6 reasons for testing; I’ve recently come up with 18, so it appears they’re missing some, or I’m drinking too much. I also don’t like the use of Bloom’s Taxonomy to index some of the recommendations; in short, Bloom’s has issues. And I don’t like the way they talk about learning objectives: they rely on a single objective to guide both instructional design and evaluation. I now advocate freeing instructional-design objectives from the crazy constraint of being super-glued to evaluation objectives. The two need to be linked, of course, but not hog-tied. Finally, I wish they had emphasized more strongly the distinction between testing to assess and testing to support learning. They are different animals, and most of us are confused about this.

Overall, it’s a great and thoughtful book. I bought it. You should too.

Here’s a link to buy the book.

The Learning Measurement Series will continue in January…

(But watch to see who wins this year’s Neon Elephant Award, which I’ll announce on Saturday, December 22nd, 2007. The winner or winners are all about learning measurement.)

In the eLearning Guild report I worked on with several other brilliant authors (SMILE), we asked e-learning professionals whether they were happy with the learning measurement they were able to do. Here’s what they said. (All the data reported in this blog post is for respondents who create e-learning for workers in their own organizations; the Guild’s powerful database technology makes it possible to split the data in different ways.)

In general, are people able to do the learning measurement they want to? See the graph below.

[Graph: Guild_q9_dec2007]

Only about 17 percent were happy with their current measurement practices. About 73 percent wanted to be able to do MORE or BETTER measurement. Clearly there is a lot of frustration.

In fact, one of the top reasons people give for not doing the measurement they want is that they don’t have the knowledge or expertise to do it. That reason is in a virtual dead heat for third place among the reasons given. See the graph below.

[Graph: Guild_q10_dec2007]

The question then becomes: if people don’t have the expertise to do measurement the way they want to, do they hire expertise from outside their organizations? A full 88.8% said they did all their measurement themselves. Wow! The graph below could not be more striking.

[Graph: Guild_q13_dec2007]

When we asked people what kind of expertise they utilize, whether in-house or contracted, they told us the following (I added a color-coded legend at the top):

[Graph: Guild_q12_dec2007]

Most of the folks doing learning measurement are instructional designers and developers with no particular expertise in measuring learning. A full 84% of respondents indicated that non-expert instructional designers are doing measurement at their organizations. Only 51.7% said their organizations use instructional developers with some advanced education. Only 20% utilize people with master’s-level degrees in measurement, and only 6.7% utilize doctoral-level experts.

Note that when we compare organizations that claim to be getting “high value” from their learning measurement with all the others, the results are intriguing.

Respondents Reporting the Level of Value They Got from Their Measurement Efforts

| Percentage of respondents saying they utilize people with… | Less Than High Value | High Value |
| --- | --- | --- |
| Master’s degrees on staff | 16.4% | 31.4% |
| Doctorates on staff | 5.6% | 10.1% |
| Master’s degrees hired from outside | 3.7% | 11.8% |
| Doctorates hired from outside | 3.4% | 6.1% |

Wow! Those reporting high value from their measurement efforts utilize people with master’s degrees on staff more than 91% more often than those reporting less than high value. The high-value group also utilizes 80% more doctorates on staff, 219% more master’s degrees hired from outside, and 79% more doctorates hired from outside. While these are correlational, self-report data, they suggest a real relationship between the measurement expertise employed and the level of value an organization gets from its learning measurement.
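
For the arithmetic-minded, here’s a minimal sketch in Python (my choice of language; there’s no code in the Guild report) showing how those relative differences fall out of the table above:

```python
# Relative differences between the high-value group and everyone else,
# computed from the table above (values are percentages of respondents).
pairs = {
    "Master's degrees on staff":           (16.4, 31.4),
    "Doctorates on staff":                 (5.6, 10.1),
    "Master's degrees hired from outside": (3.7, 11.8),
    "Doctorates hired from outside":       (3.4, 6.1),
}

for label, (less_than_high, high) in pairs.items():
    increase = (high - less_than_high) / less_than_high * 100
    print(f"{label}: {increase:.0f}% more")

# Prints roughly 91%, 80%, 219%, and 79% -- the figures cited above.
```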

Summary

To recap: in a survey of over 900 e-learning professionals, many reported frustration because they want to do more and better measurement. A significant portion of that frustration results from not having the expertise to do measurement correctly. They have very few high-level measurement experts on staff, and they hire almost nobody from outside to help.

What the hell is wrong with this picture?

It confirms for me that learning measurement is just not given the importance it deserves.

More to come tomorrow in the Learning Measurement series…

There are basically four types of software tools that can be used for developing instruments to measure learning. There are dedicated measurement-development tools, for example, Questionmark’s Perception. There are e-learning authoring tools that offer an assessment-development capability, for example, Adobe’s Captivate. There are learning content management systems, for example, Blackboard’s Academic Suite. And finally, there are general-purpose software-development tools, for example, Adobe Flash Professional. To put this in list form:

  1. Dedicated Assessment-Development Tools
  2. E-Learning Authoring Tools
  3. Learning Content Management Systems
  4. General-Purpose Software-Development Tools

When we asked e-learning professionals from the eLearning Guild membership about their use of tools in developing learning-measurement instruments, they told us an interesting story.

Specifically, we asked them, "What PRIMARY tool do you use to develop your measurement instruments?"

The two most popular answers were (1) we didn’t use a tool, and (2) we used a tool developed in-house. See the graph below.

[Graph: Guild_q14_dec2007]

When we broke this down by corporate and education audiences and looked at product market share, other interesting findings emerged.

Take a look at the corporate results, excluding all education and government respondents:

[Graph: Guild_toolsmktsharecorp_dec2007]

Adobe’s Captivate dominates with over 50% of the market share—that is, over 50% of respondents said they used Captivate to develop their measurement instruments (they may have used other tools). Even more telling is that six of the top seven items are authoring tools or part of authoring tool suites. You read that right. Authoring tools are by far, by far, by far the way people develop assessment items in the corporate e-learning space. Only Questionmark’s Perception and Adobe’s Flash Professional sneak into the top nine responses before "Other" takes the tenth spot.

This makes sense if we assume, like a dismal economist, that people do what is easy to do. Our authoring tools remind us to add questions, so we add questions. It also tells me that maybe our field puts very little value on measuring learning if our behavior is so controlled by our surroundings that we don’t look further than our authoring tools. Or, could it be that our authoring tools provide us with all we need?

Let’s take a look at the education (with some government) results. (Note to those using the eLearning Guild’s Direct Data Access capability: I filtered only for students, interns, academics, and practitioners).

[Graph: Guild_toolsmktshareeduc_dec2007]

The education results are interesting as well, especially compared with the corporate results. Note how many dedicated assessment tools appear in the top ten: three (Respondus, StudyMate, and Questionmark’s Perception). So perhaps educators care a little bit more about testing. Okay, that makes sense too. Still, there are a lot of e-learning authoring tools at the top, with Captivate dominating again.

The Leverage Point

The clearest conclusion I will draw from this data is that to improve our e-learning assessment practices, we need to act at the one clear leverage point, the place where we seem to think about measurement the most: our authoring tools. How might this work?

  1. Okay, we could just train people to create better measurement instruments with the idea that they’ll use that information the next time they boot up their authoring tool.
  2. Better would be to train them to create better measurement instruments while they are using their authoring tool. And give them practice as well, with feedback, etc. You learning researchers will be chanting "encoding specificity" and "transfer-appropriate processing" and those of you who have ever had one of my workshops on the learning research will be thinking of "aligning the learning and performance contexts" to "create spontaneous remembering."
  3. Better would be to develop job aids indexed to different screen shots of the authoring tool.
  4. Better would be for the authoring tools to be seeded with performance-support tools that encouraged people to utilize better measurement practices.

Oh crap. The best way to do this is to get the authoring-tool developers to take responsibility for better measurement and better product design. Entrepreneurial-minded readers will be thinking about all kinds of business opportunities. Hey Silke, how about giving me a call? SMILE.

Not much of this is going to happen anytime soon, is my guess. So, besides engaging someone like me to train your folks in how to create more authentic assessments, you’re pretty much on your own.

And we know that’s not going to happen either. At least that’s what the data shows. Hardly anybody brings in outside experts to help with learning measurement.

I guess somebody thinks it’s just not that important.

More on this as the series continues…

The data above was generated by a group of folks working through the eLearning Guild. The report we created is available by clicking here.

Here’s some more detail:

The eLearning Guild Report

The eLearning Guild report, "Measuring Success," is FREE to Guild members and to those who complete the research survey, even if not a member. 

Disclaimer: I led the surveying and content efforts on the research report and was paid a small stipend for contributing my time; however, I will receive nothing from sales of the report. I recommend the report because it offers unique and valuable information, including wisdom from such stars as Allison Rossett (the Allison Rossett), Sharon Shrock and Bill Coscarelli (both of Criterion-Referenced Testing fame), James Ong (of Stottler Henke, where he leads efforts to measure learning results through comprehensive simulations), Roy Pollock (Chief Learning Officer at Fort Hill Company, which provides innovative software and industry-leading ideas to support training transfer), Maggie Martinez (CEO of The Training Place, specializing in learning assessment and design), Brent Schlenker (a learning-technology guru at the eLearning Guild), and the incomparable Steve Wexler (The eLearning Guild’s Research Director, research-database wizard, publishing magnate, and tireless calico cat herder).

How to Get the Reports

  1. eLearning Guild Measuring Success (free to most Guild members)
     • If you are a Member (Member+ or Premium): just click here.
     • If you are an Associate Member: take the measurement survey, then access the report.
     • If you are a non-member: become an Associate Member, take the measurement survey, then access the report.
  2. My report, Measuring Learning Results: click through to My Catalog.

More tomorrow in the Learning Measurement Series…

What can e-learning add to measurement?

Does e-learning have unique capabilities that enable it to improve learning measurement? I think it does. Here’s a short list:

  1. E-learning can capture more data more easily than classroom training.
  2. E-learning can capture data during the learning program—not just at the end of the learning event—in a manner that the learners feel is a seamless and natural part of the event.
  3. E-learning can track incoming proficiency through the use of pretests to determine whether the learning program actually meets a need, or determine who it meets a need for.
  4. E-learning can collect data in a manner that can give learners comparison data while they complete an assessment.
  5. E-learning can collect data on learner behaviors during the learning (for example, the click journey, time per screen, etc.)
  6. E-learning can track pretest to posttest changes.
  7. E-learning can randomly assign learners to program versions, making methodological comparisons possible. For example, a program version that uses immediate feedback can be compared to a version using delayed feedback to determine which is more effective. (I sketch what random assignment might look like in code after this list.)
  8. E-learning can capture on-the-job performance data, including learners’ self-ratings, manager ratings, direct-report ratings, etc. This capability puts the focus on on-the-job performance, where benefits can accrue from management oversight and coaching, self-initiated development, and peer learning.
  9. E-learning, because it can access and track learners at more than one point in time, can measure how well the learning intervention has performed in creating long-term remembering.
  10. E-learning can capture data even when learners don’t know the learning program is being assessed. For example, the learning program can capture data when the learners think they are simply getting practice on the learning material.
  11. E-learning can track learners as they move from the training event to the workplace. For example, e-learning programs can track learners’ goals to implement what they have learned to see how successful they have been in transferring the learning to the job.
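
Here’s a minimal sketch of the random-assignment idea from item 7, in Python. The function name, version labels, and learner IDs are all hypothetical (mine, not anything from the Guild report); the key design choice is hashing the learner ID so a returning learner always lands in the same version:

```python
import hashlib

def assign_version(learner_id: str,
                   versions=("immediate_feedback", "delayed_feedback")) -> str:
    """Deterministically assign a learner to one program version.

    Hashing the learner ID (rather than flipping a coin at runtime)
    means a learner who returns later always sees the same version,
    which keeps the between-groups comparison clean.
    """
    digest = int(hashlib.sha256(learner_id.encode("utf-8")).hexdigest(), 16)
    return versions[digest % len(versions)]

# Usage: route each learner at login, then log the version alongside
# their assessment results so the two groups can be compared later.
for learner in ("ann@example.com", "raj@example.com", "lee@example.com"):
    print(learner, "->", assign_version(learner))
```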

With this power comes responsibility, and a damn fun challenge. You can read my call to action later in this series and in the eLearning Guild research report as well.

The Measurement Series Continues Tomorrow…

To lead into the weekend, let me hot-wax poetic:

Measurement is like a magnetically alluring supermodel we might see across the room at a party. We want to stare and absorb every curve of muscle, every glowing inch of skin. Yet, our primordial core forces our eyes away, perhaps ashamed of our own imperfections, perhaps following some failed inner calculus of future possibilities. The apparition seems impossible to grasp, so we turn away. With another opportunity lost, learning measurement keeps its mystery, its danger, and its transcendent ability to lift our practices to their highest potential.

Add your comments to analyze the paragraph above. What do you think I’m trying to say about the state of learning measurement in our industry? About our reasons for failing in this regard? What am I missing? What am I saying about supermodels? Add your own purple prose, poetry, etc., using the comments function below.

Note: Both men and women can be supermodels.

This quote comes directly out of the eLearning Guild Report.

Measurement Series Continues on Monday…

Learning is a Many Splendid Thing

In the eLearning Guild Report, I outlined 18 separate reasons we might want to measure learning:

To Support the Learners in Learning and Performance

1. To encourage learners to study.
2. To give learners feedback on their learning progress.
3. To help learners better understand the concepts being taught by giving them tests of understanding and follow-up feedback.
4. To provide learners with additional retrieval practice (to support long-term retrieval).
5. To give successful assessment-takers a sense of accomplishment, a sense of being special, and/or a feeling of being in a privileged group.
6. To increase the likelihood that the learning is implemented later.

To Support Certification, Credentialing, or Compliance

7. To assign learners grades or passing scores.
8. To enable learners to earn credentials.
9. To document legal or regulatory compliance.

To Provide Learning Professionals (i.e., instructors/developers) with Information

10. To provide instructors with feedback on learning.
11. To provide instructional designers/developers with feedback.
12. To diagnose future learning needs.

To Provide Additional Information

13. To provide learners’ managers with feedback and information.
14. To provide other organizational stakeholders with information.
15. To examine the organizational impacts of learning.
16. To compare one learning intervention to an alternative one.
17. To calculate return-on-investment of the learning program.
18. To collect data to sell or market the learning program.

Problems

The report details many problems this can cause, but briefly, the basic problem is that we often (a) confuse one goal with another, (b) use an assessment design that is ill-suited to the goal or goals we want to assess, or (c) limit our outcomes because we don’t realize what is possible.


The Series Continues Tomorrow…

Let me start out by saying that I don’t know everything about learning measurement. It’s a topic that is too deep and complicated for easy answers.

My value-add is that I look at measurement from the standpoint of how learning works. As far as I can tell, this is a perspective that is new to most of our field’s discussions of measurement. This is ironic of course, because it’s learning measurement we’re talking about. So for example, when we know that learning begets forgetting, why is it that we measure learning before forgetting might even have an effect—thereby biasing the results ridiculously in our favor?

The second unique perspective I’m adding to the conversation is the importance of predicting future retrieval. I argue that we must validly predict future retrieval to give us good feedback about how well our learning programs are working. We do an absolutely terrible job of this now.

Finally, I’d like to think that I am pushing us beyond the conceptual prison engendered by our old models and methods. It’s not that these models and methods are bad. It’s that most of us—me included—have had a seriously difficult time thinking outside the boundaries of the models’ constraints. Let me use the Donald Kirkpatrick model as an example. Others may beat up on it, expand it, or expound on it for pleasure or treasure, but it’s a great model. It helps us make sense of the world by simplifying the confusion. But the model, by itself, doesn’t tell us everything we need to know about learning measurement. It certainly shouldn’t be seen as a prescription for how we measure. It is simply too crude for that.

Three Biases in Measuring Learning at Level 2

  1. Measuring Immediately After Learning. A very high percentage of learning measurement is done immediately at the end of our learning events (in the eLearning Guild research, we found that about 90% of e-learning measurement was done right at the end of learning). Immediate tests of learning make sense only if you don’t care about the learner’s future ability to retrieve information. When we measure “learning,” what we really want to measure is “retrieval.” Moreover, what we really care about is future retrieval. Isn’t that the goal of learning, after all? We want our learners to learn something so they can retrieve and use it later. (I sketch the arithmetic of this bias after this list.) I detail this point in much greater depth in the Guild research report, and in even more depth in my report, Measuring Learning Results… By the way, the Guild report is free to Guild members and to those who complete the survey.
  2. Measuring in the Same Context Where Learning Took Place. A high percentage of training is measured in the learning context (about 92% in the same or similar context in the Guild research). Unfortunately, context influences retrieval, so when we measure in the learning context we bias our measurement results. Oh, and we bias them in our favor, again. For example, in a classic research study, Smith, Glenberg, and Bjork (1978) found that learners tested in the same room in which they learned recalled more than 50% more than learners tested in a different room. This amount of bias is well within the bounds of the Barry Bonds level of cheating! Would you vote yourself into the Hall of Fame?
  3. Measuring Using Inauthentic Assessment Items (Like Memorization). Most assessment items purporting to measure learning use memorization questions. This is bad assessment design because performance on memorization items is generally unrelated to future retrieval. So, if we test memorization, we know nothing (or very little) about whether our learners will be able to retrieve what is truly important. Sharon Shrock and Bill Coscarelli, two of my co-authors on the eLearning Guild research report, highlight the problems of memorization in the third edition of their excellent book, Criterion-Referenced Test Development… One of the goals of criterion-referenced testing is to determine whether a learner can be “certified” as competent or knowledgeable in a particular topic area. Shrock and Coscarelli argue that only assessments based on (a) real-world tasks, (b) simulations, and (c) scenarios can validly support certification decisions; memorization cannot. This is a change from the second edition of their book and represents a paradigm shift for our field. In future posts in this series, I will present my taxonomy of authenticity for assessment questions, which follows up on Shrock and Coscarelli’s thoughtful certification ideas.
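
To make bias #1 concrete, here’s a minimal sketch with invented numbers (nothing below comes from the Guild data) showing how an immediate end-of-course test inflates our estimate of what learners can actually retrieve later:

```python
# Hypothetical scores for the same five learners, tested twice.
immediate_scores = [88, 92, 75, 81, 95]   # right at the end of the course
delayed_scores   = [61, 70, 52, 58, 74]   # same learners, two weeks later

def mean(xs):
    return sum(xs) / len(xs)

inflation = (mean(immediate_scores) - mean(delayed_scores)) / mean(delayed_scores) * 100
print(f"Immediate testing overstates retrievable learning by about {inflation:.0f}%")
```

The numbers are made up; the point is that the immediate score is the one we almost always report, and it is the less meaningful of the two.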

The Series Continues Tomorrow…

References:

Shrock, S. A., & Coscarelli, W. C. (2007). Criterion-Referenced Test Development: Technical and Legal Guidelines for Corporate Training (Third Edition). San Francisco: Pfeiffer.

Smith, S. M., Glenberg, A., & Bjork, R. A. (1978). Environmental context and human memory. Memory & Cognition, 6, 342-353.

First of a Series

This is the first of a long series of blog entries devoted to the topic of learning measurement that I will offer over the next two weeks.

This series draws from my recent thinking on learning measurement and from my 2007 publication, Measuring Learning Results… It also introduces the findings from a remarkable research study that I participated in with the eLearning Guild and several other illustrious authors.

For the last year, I have spent many weeks devoted to rethinking the topic of learning measurement from the standpoint of the learning research. My research-to-practice report, Measuring learning results: Creating fair and valid assessments by considering findings from fundamental learning research, highlights the flaws in the current methods we use to measure learning results—and offers recommendations for how to improve our measurement practices. This report is available on my catalog. See below.

Why Does Will Thalheimer Care About Measurement?

Why do I—a person who has spent the last 10 years attempting to bring fundamental learning research into focus—want to spend my “research time” on learning measurement?

Here’s why:

  1. The performance of the learning-and-performance field is severely deficient—often creating learning that is not remembered and/or not utilized on the job.
  2. Of the forces that control and influence our industry and the practices we use, measurement is one of the most critical.
  3. Currently, our measurement practices provide us with poor and biased feedback about our performance as learning-and-performance professionals.
  4. Because we do poor measurement, we don’t get good feedback (nor do our stakeholders), and so we have very little motivation to critically examine our practices—and improve them as valid feedback would suggest.

To put it simply, if we don’t measure better, we will continue to underperform—and we’ll continue to underserve our learners and organizations.

The eLearning Guild Report

The eLearning Guild report, “Measuring Success,” is FREE to Guild members and to those who complete the research survey, even if not a member.

Also available, at $1,895 ($1,950 if you are not a member), is Direct Data Access (DDA) to the database of research results, including the ability to filter the results by a variety of factors, such as the survey respondents’ experience, industry, country, job title, etc. These Direct Data Access reports will be invaluable for vendors who want to know how well their products are rated on a number of dimensions (more on this later in the series), and valuable for those who want to benchmark their efforts against similar organizations. If you want to make a case for improving your measurement practices, you absolutely have to buy Direct Data Access.


How to Get the Reports

  1. eLearning Guild Measuring Success (free to most Guild members)
     • If you are a Member (Member+ or Premium): just click here.
     • If you are an Associate Member: take the measurement survey, then access the report.
     • If you are a non-member: become an Associate Member, take the measurement survey, then access the report.
  2. My report, Measuring Learning Results: click through to My Catalog.

The Series Continues Tomorrow…