Dani Johnson at RedThread Research has just released a wonderful synopsis of Learning Evaluation Models. Comprehensive, Thoughtful, Well-Researched! It also has suggestions of articles to read!!!

This work is part of an ongoing effort to research the learning-evaluation space. With research sponsored by the folks at the adroit learning-evaluation company forMetris, RedThread is looking to uncover new insights about the way we do workplace learning evaluation.

Here’s what Dani says in her summary:

“What we hoped to see in the literature were new ideas – different ways of defining impact for the different conditions we find ourselves in. And while we did see some, the majority of what we read can be described as same. Same trends and themes based on the same models with little variation.”

 

“While we do not disparage any of the great work that has been done in the area of learning measurement and evaluation, many of the models and constructs are over 50 years old, and many of the ideas are equally as old.

On the whole, the literature on learning measurement and evaluation failed to take into account that the world has shifted – from the attitudes of our employees to the tools available to develop them to the opportunities we have to measure. Many articles focused on shoe-horning many of the new challenges L&D functions face into old constructs and models.”

 

“Of the literature we reviewed, several pieces stood out to us. Each of the following authors [detailed in the summary] and their work contained information that we found useful and mind-changing. We learned from their perspectives and encourage you to do the same.”

 

I also encourage you to look at this great review! You can see the summary here.

 

 

As I preach in my workshops on how to create better learner-survey questions (for example my Gold-Certification workshop on Performance-Focused Smile Sheets), open-ended comment questions are very powerful questions. Indeed, they are critical in our attempts to truly understand our learners’ perspectives.

Unfortunately, to get the most benefit from comment questions, we have to take time to read every response and reflect on the meaning of all the comments taken together. Someday AI may be able to help us parse comment-question data, but currently the technology is not ready to give us a full understanding. Nor are word clouds or other basic text-processing algorithms useful enough to provide valid insights into our data.

It’s good to take the time to analyze our comment-question data, but if there were a way to quickly get a sense of comment data, wouldn’t we consider using it? Of course!

As most of you know, I’ve been focusing a lot of my attention on learning evaluation over the last few years. While I’ve learned a lot, have been lauded by others as an evaluation thought leader, and have even created some useful innovations like LTEM, I’m still learning. Today, by filling out a survey after going to a CVS MinuteClinic to get a vaccine shot, I learned something pretty cool. Take a look.

This is a question on their survey, delivered to me right after I’d answered a comment question. This gives the survey analyzers a way to quickly categorize the comments. It DOES NOT REPLACE, or should not replace, a deeper look at the comments (for example, my comment was very specific and useful, I hope), but it does enable us to ascribe some overall meaning to the results.

Note that this is similar to what I’ve been calling a hybrid question, where we first give people a forced-choice question and then give them a comment question. The forced-choice question drives clarity, whereas the follow-up comment question enables more specificity and richness.

One warning! Adding a forced-choice question after a comment question should be seen as a tool in our toolbox. Let’s not overuse it. More pointedly, let’s use it when it is particularly appropriate.

If we’ve asked two open-ended comment questions (one asking for positive feedback and one asking for constructive criticism), we might not need a follow-up forced-choice question, because we’ve already prompted respondents to give us the good and the bad.

The bottom line is that we now have two types of hybrid questions to add to our toolbox:

  1. Forced-choice question followed by clarifying comment question.
  2. Comment question followed by categorizing forced-choice question.
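
For the second type, the forced-choice answer works as a label that respondents attach to their own comments, so the comments can be tallied and grouped before anyone reads them in depth. Here is a minimal sketch of that idea. The `Response` class, the category labels, and the sample comments are all hypothetical illustrations, not the wording or data from the CVS survey.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Response:
    comment: str   # open-ended comment, asked first
    category: str  # respondent's own forced-choice label, asked second


def summarize(responses):
    """Tally the self-assigned categories and group comments under each."""
    counts = Counter(r.category for r in responses)
    grouped = defaultdict(list)
    for r in responses:
        grouped[r.category].append(r.comment)
    return counts, dict(grouped)


# Hypothetical sample data, for illustration only.
responses = [
    Response("The scenarios matched my daily work.", "positive"),
    Response("Too much time on definitions.", "negative"),
    Response("Loved the practice exercises.", "positive"),
    Response("Room was cold, content was fine.", "mixed"),
]

counts, grouped = summarize(responses)
for category, n in counts.most_common():
    print(f"{category}: {n}")
    for comment in grouped[category]:
        print(f"  - {comment}")
```

The tally gives the quick overview; the grouped comments are still there to be read one by one, which the quick overview should not replace.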

Freakin’ Awesome!

 

Donald Taylor, learning-industry visionary, has just come out with his annual Global Sentiment Survey asking practitioners in the field what topics are the most important right now. The thing that struck me is that the results show that data is becoming more and more important to people, especially as represented in adaptive learning through personalization, artificial intelligence, and learning analytics.

Learning analytics was the most important category for the opinion leaders represented in social media. This seems right to me as someone who will be focused mostly on learning evaluation in 2019.

As Don said in the GoodPractice podcast with Ross Dickie and Owen Ferguson, “We don’t have to prove. We have to improve through learning analytics.”

What I love about Don Taylor’s work here is that he’s clear as sunshine about the strengths and limitations of this survey and, most importantly, that he takes the time to explain what things mean without over-hyping and sleight-of-hand. It’s a really simple survey, but the results are fascinating, not necessarily about what we should be doing, but about what people in our field think we should be paying attention to. This kind of information is critical to all of us who might need to persuade our teams and stakeholders on how we can be most effective in our learning interventions.

Other findings:

  • Businessy-stuff fell in rated importance, for example, “consulting more deeply in the business,” “showing value,” and “developing the L&D function.”
  • Neuroscience/Cognitive Science fell in importance (most likely, I think, because some folks have been debunking the neuroscience-and-learning connections). And note: These should not really be one category, especially given that people in the know know that cognitive science, or more generally learning research, has been shown to have proven value. Neuroscience, not so much.
  • Mobile delivery and artificial intelligence were the two biggest gainers in terms of popularity.
  • Very intriguing that people active on social media (perhaps thought leaders, perhaps the opinionated mob) have different views than a more general population of workplace learning professionals. There is an interesting analysis in the book and a nice discussion in the podcast mentioned above.

For those interested in Don Taylor’s work, check out his website.

 

I’d like to announce that the first certification workshop for my new Work-Learning Academy is almost ready to launch. The first course? Naturally, it’s a course on how to create effective learner surveys—on Performance-Focused Smile Sheets.

I’m thrilled (ecstatic, really) because I’ve wanted to do something like this for years and years, but the elements weren’t quite available. I’ve always wanted to provide an online workshop, but the tools tended to push toward just making presentations. As a learning expert, I knew mere presentations, even if they include discussions and some minimal interactions like polling questions, just weren’t good enough to create real learning benefits. I’ve also always wanted a way to provide a meaningful credential, one that was actually worth something, one that went beyond giving people credit for attendance and completion. Finally, I figured out how to bring this all together.

And note that LTEM (the Learning-Transfer Evaluation Model) helped me clarify my credentialing strategy. You can read about using LTEM for credentialing here, but, in short, our entry-level certification (our Gold Certification) requires learners to pass a rigorous LTEM Tier-5 assessment, demonstrating competence through realistic decision-making. Those interested in the next-level credential (our Master Certification) will have to prove their competence at an LTEM Tier-6 designation. Further certification levels (our Artisan Certification and Research Certification) will require competence demonstrated at Tier-7 and/or Tier-8.

 

For over 20 years, I’ve been plying my research-to-practice craft through Work-Learning Research, Inc. I’m thrilled to announce that I’ll be certifying our first set of Gold Credential professionals within a few months. If you’d like to sign up to be notified when the credential workshop is available—or just learn more—follow this link:

Click here to go to our
Work-Learning Academy information page

LTEM, the Learning-Transfer Evaluation Model, was designed as an alternative to the Kirkpatrick-Katzell Four-Level Model of learning evaluation. It was designed specifically to better align learning evaluation with the science of human learning. One way in which LTEM is superior to the Four-Level Model is in the way it highlights gradations of learning outcomes. Where the Four-Level Model crammed all “Learning” outcomes into one box (that is, “Level 2”), LTEM separates learning outcomes into Tier-4 Knowledge, Tier-5 Decision-Making Competence, and Tier-6 Task Competence. This simple yet incredibly powerful categorization changes everything in terms of learning evaluation. First and foremost, it pushes us to go beyond inconsequential knowledge checks in our learning evaluations (and in our learning designs as well). To learn more about how LTEM creates additional benefits, you can click on this link, where you can access the model and a 34-page report for free, compliments of me, Will Thalheimer, and Work-Learning Research, Inc.

Using LTEM in Credentialing

LTEM can also be used in credentialing or, less formally, in specifying the rigor of our learning experiences. So, for example, if our training course only asks questions about terminology or facts in its assessments, then we can say that the course provides a Tier-4 credential. If our course asks learners to successfully complete a series of scenario-based decisions, we can say that the course provides a Tier-5 credential.

Wow! Think of the power of naming the credential level of our learning experiences. Not only will it give us—and our business stakeholders—a clear sense of the strength of our learning initiatives, but it will drive our instructional designs to meet high standards of effectiveness. It will also begin to set the bar higher. Let’s admit a dirty truth. Too many of our training programs are just warmed-over presentations that do very little to help our learners make critical decisions or improve their actual skills. By focusing on credentialing, we focus on effectiveness!

 

Using LTEM Credentialing at Work-Learning Research

For the last several months, I’ve been developing an online course to teach learning professionals how to transform their learner surveys into Performance-Focused Smile Sheets. As part of this development process, I realized that I needed more than one learning experience—at least one to introduce the topic and one to give people extensive practice. I also wanted to provide people with a credential each time they successfully completed a learning experience. Finally, I wanted to make the credential meaningful. As the LTEM model suggests, attendance is NOT a meaningful benchmark. Neither is learner satisfaction. Nor is knowledge regurgitation.

Suddenly, it struck me. LTEM already provided a perfect delineation for meaningful credentialing. Tier-5 Decision-Making Competence would provide credentialing for the first learning experience. For people to earn their credential they would have to perform successfully in responding to realistic decision-making scenarios. Tier-6 Task Competence would provide credentialing for the second, application-focused learning experience. Additional credentials would only be earned if people could show results at Tier-7 and/or Tier-8 (Transfer to Work Performance and associated Transfer Effects).

 

 

The Gold-Certification Workshop is now ready for enrollment. The Master-Certification Workshop is coming soon! You can keep up to date or enroll now by going to the Work-Learning Academy page.

 

How You Can Use LTEM Credentialing to Assess Learning Experiences that Don’t Use LTEM

LTEM is practically brand new, having only been released to the public a year ago. So, while many organizations are gaining a competitive advantage by exploring its use, most of our learning infrastructure has yet to be transformed. In this transitional period, each of us has to use our wisdom to assess what’s already out there. How about you give it a try?

Two-Day Classroom Workshop — What Tier Credential?

What about a two-day workshop that gives people credit for completing the experience? Where would that be on the LTEM framework?

Here’s a graphic to help. Or you can access the full model by clicking here.

The two-day workshop would be credentialed at a Tier-1 level, signifying that the experience credentials learners by measuring their attendance or completion.

Two-Day Classroom Workshop with Posttest — What Tier Credential?

What if the same two-day workshop also added a test focused on whether the learners understood the content, and provided the test a week after the program? Note that in the LTEM model, credentialing at Tiers 4, 5, and 6 is encouraged to include assessments that show learners are able to remember, not just comprehend in the short term.

If the workshop added this posttest, we’d credential it at Tier-4, Knowledge Retention.

Half-Day Online Program with Performance-Focused Smile Sheet — What Tier Credential?

What if there were a half-day workshop that used one of my Performance-Focused Smile Sheets to evaluate success? At what Tier would this be credentialed?

It would be credentialed at Tier-3, or Tier-3A if we wanted to delineate between learner surveys that assess learning effectiveness and those that don’t.

Three-Session Online Program with Traditional Smile Sheet — What Tier Credential?

This format—using three 90-minute sessions with a traditional smile sheet—is the most common form of credentialing in the workplace learning industry right now. Go look around at those that are providing credentials. They are providing credentials using relatively short presentations and a smile sheet at the end. If this is what they provide, what credentialing Tier do they deserve? Tier-3 or Tier-3B! That’s right! That’s it. They only tell us that learners are satisfied with the learning experience. They don’t tell us whether they can make important decisions or whether they can utilize new skills.

What is this credential really worth?

You can decide for yourself, but I think it could be worth more, if only those making the money provided credentialing at Tier-5, Tier-6, and beyond.

With LTEM we can begin to demand more!
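
The examples above all follow the same rule of thumb: a learning experience is credentialed at the highest tier for which it actually collects evidence. Here is a minimal sketch of that rule. The evidence names and the `credential_tier` function are hypothetical labels of my own, only the tiers discussed in this post are included (Tier 2 is left out because it isn't covered here), and ranking Tier-3A above Tier-3B is an assumption.

```python
# Evidence types ordered from lowest to highest tier.
# Names are hypothetical; tiers are the ones discussed in this post.
# Assumption: Tier-3A (effectiveness-focused survey) ranks above Tier-3B.
ORDERED_EVIDENCE = [
    ("attendance_or_completion", "Tier-1"),
    ("traditional_smile_sheet", "Tier-3B"),
    ("performance_focused_smile_sheet", "Tier-3A"),
    ("delayed_knowledge_test", "Tier-4"),
    ("scenario_based_decisions", "Tier-5"),
    ("realistic_task_performance", "Tier-6"),
    ("transfer_to_work_performance", "Tier-7"),
    ("effects_of_transfer", "Tier-8"),
]


def credential_tier(evidence):
    """Return the highest tier supported by the evidence a program collects."""
    known = {name for name, _ in ORDERED_EVIDENCE}
    unknown = set(evidence) - known
    if unknown:
        raise ValueError(f"Unrecognized evidence: {sorted(unknown)}")
    best = None  # stays None when no evidence is collected
    for name, tier in ORDERED_EVIDENCE:
        if name in evidence:
            best = tier  # later entries are higher tiers, so keep overwriting
    return best


# The four examples discussed above.
print(credential_tier({"attendance_or_completion"}))
print(credential_tier({"attendance_or_completion", "delayed_knowledge_test"}))
print(credential_tier({"performance_focused_smile_sheet"}))
print(credential_tier({"traditional_smile_sheet"}))
```

Run against the four examples, this sketch returns Tier-1, Tier-4, Tier-3A, and Tier-3B, matching the answers given above.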

 

Work-Learning Research and Will Thalheimer can Help!

People tell me I need to stop giving stuff away for free, or at least I ought to be more proactive in seeking customers. So, this is a reminder that I am available to help you improve your learning and learning evaluation strategies and tactics. Please reach out to me at my nifty contact form by clicking here.

For years, we have used the Kirkpatrick-Katzell Four-Level Model to evaluate workplace learning. With this taxonomy as our guide, we have concluded that the most common form of learning evaluation is learner surveys, that the next most common evaluation is learning, then on-the-job behavior, then organizational results.

The truth is more complicated.

In some recent research I led with the eLearning Guild and Jane Bozarth, we used the LTEM model to look for further differentiation. We found it.

Here are some of the insights from the graphic above:

  • Learner surveys are NOT the most common form of learning evaluation. Program completion and attendance are more common, being done on most training programs in about 83% of organizations.
  • Learner surveys are still very popular, with 72% of respondents saying that they are used in more than one-third of their learning programs.
  • When we measure learning, we go beyond simple quizzes and knowledge checks.
    • Tier 5 assessments, measuring the ability to make realistic decisions, were reported by 24% of respondents to be used in more than one-third of their learning programs.
    • Tier 6 assessments, measuring realistic task performance (during learning), were reported by about 32% of respondents to be used in more than one-third of their learning programs.
    • Unfortunately, we messed up and forgot to include an option on Tier 4 Knowledge questions. However, previous eLearning Guild research in 2007, 2008, and 2010 found that the percentage of respondents who reported that they measured memory recall of critical information was 60%, 60%, and 63% respectively.
  • Only about 20% of respondents said their organizations are measuring work performance.
  • Only about 16% of respondents said their organizations are measuring the organizational results from learning.
  • Interestingly, where the Four-Level Model puts all types of Results into one bucket, the LTEM framework encourages us to look at other results besides business results.
    • About 12% said their organizations were looking at the effect of the learning on the learner’s success and well-being.
    • Only about 3% said they were measuring the effects of learning on coworkers/family/friends.
    • Only about 3% said they were measuring the effects of learning on the community or society (as has been recommended by Roger Kaufman for years).
    • Only about 1% reported measuring the effects of learning on the environs.

 

Opportunities

The biggest opportunity—or the juiciest low-hanging fruit—is that we can stop just using Tier-1 attendance and Tier-3 learner-perception measures.

We can also begin to go beyond our 60% rate in measuring Tier-4 knowledge and do more Tier-5 and Tier-6 assessments. As I’ve advocated for years, Tier-5 assessments using well-constructed scenario-based questions are the perfect balance of power and cost. They are aligned with the research on learning, they have moderate costs in terms of resources, and learners see them as challenging and interesting rather than punitive and unhelpful, as they often see knowledge checks.

We can also begin to emphasize more Tier-7 evaluations. Shouldn’t we know whether our learning interventions are actually transferring to the workplace? The same is true for Tier-8 measures. We should look for strategic opportunities here, being mindful of the incredible costs of doing good Tier-8 evaluations. We should also consider looking beyond business results, as these are not the only effects our learning interventions are having.

Finally, we can use LTEM to help guide our learning-development efforts and our learning evaluations. By using LTEM, we are prompted to see things that have been hidden from us for decades.

 

The Original eLearning Guild Report

To get the original eLearning Guild report, click here.

 

The LTEM Model

To get the LTEM Model and the 34-page report that goes with it, click here.

My Year In Review 2018—Engineering the Future of Learning Evaluation

In 2018, I shattered my collarbone and lay wasting for several months, but still, I think I had one of my best years in terms of the contributions I was able to make. This will certainly sound like hubris, and surely it is, but I can’t help but think that 2018 may go down as one of the most important years in learning evaluation’s long history. At the end of this post, I will get to my failures and regrets, but first I’d like to share just how consequential this year was in my thinking and work in learning evaluation.

It started in January when I published a decisive piece of investigative journalism showing that Donald Kirkpatrick was NOT the originator of the four-level model; that another man, Raymond Katzell, has deserved that honor all along. In February, I published a new evaluation model, LTEM (The Learning-Transfer Evaluation Model)—intended to replace the weak and harmful Kirkpatrick-Katzell Four-Level Model. Already, doctoral students are studying LTEM and organizations around the world are using LTEM to build more effective learning-evaluation strategies.

Publishing these two groundbreaking efforts would have made a great year, but because I still have so much to learn about evaluation, I was very active in exploring our practices—looking for their strengths and weaknesses. I led two research efforts (one with the eLearning Guild and one with my own organization, Work-Learning Research). The Guild research surveyed people like you and your learning-professional colleagues on their general evaluation practices. The Work-Learning Research effort focused specifically on our experiences as practitioners in surveying our learners for their feedback.

Also in 2018, I compiled and published a list of 54 common mistakes that get made in learning evaluation. I wrote an article on how to think about our business stakeholders in learning evaluation. I wrote a post on one of the biggest lies in learning evaluation: how we fool ourselves into thinking that learner feedback gives us definitive data on learning transfer and organizational results. It does not! I created a replacement for the problematic Net Promoter Score. I shared my updated smile-sheet questions, improving those originally put forth in my award-winning book, Performance-Focused Smile Sheets. You can access all these publications below.

In my 2018 keynotes, conference sessions, and workshops, I recounted our decades-long frustrations in learning evaluation. We are clearly not happy with what we’ve been able to do in terms of learning evaluation. There are two reasons for this. First, learning evaluation is very complex and difficult to accomplish—doubly so given our severe resource constraints in terms of both budget and time. Second, our learning-evaluation tools are mostly substandard—enabling us to create vanity metrics but not enabling us to capture data in ways that help us, as learning professionals, make our most important decisions.

In 2019, I will continue my work in learning evaluation. I still have so much to unravel. If you see a bit of wisdom related to learning evaluation, please let me know.

Will’s Top Fifteen Publications for 2018

Let me provide a quick review of the top things I wrote this year:

  1. LTEM (The Learning-Transfer Evaluation Model)
    Although published by me in 2018, the model and accompanying 34-page report originated in work begun in 2016 and through the generous and brilliant feedback I received from Julie Dirksen, Clark Quinn, Roy Pollock, Adam Neaman, Yvon Dalat, Emma Weber, Scott Weersing, Mark Jenkins, Ingrid Guerra-Lopez, Rob Brinkerhoff, Trudy Mandeville, and Mike Rustici—as well as from attendees in the 2017 ISPI Design-Thinking conference and the 2018 Learning Technologies conference in London. LTEM is designed to replace the Kirkpatrick-Katzell Four-Level Model originally formulated in the 1950s. You can learn about the new model by clicking here.
  2. Raymond Katzell NOT Donald Kirkpatrick
    Raymond Katzell originated the Four-Level Model. Although Donald Kirkpatrick embraced accolades for the Four-Level Model, it turns out that Raymond Katzell was the true originator. I did an exhaustive investigation and offered a balanced interpretation of the facts. You can read the original piece by clicking here. Interestingly, none of our trade associations have reported on this finding. Why is that? LOL
  3. When Training Pollutes. Our Responsibility to Lessen the Environmental Damage of Training
    I wrote an article and placed it on LinkedIn and as far as I can tell, very few of us really want to think about this. But you can get started by reading the article (by clicking here).
  4. Fifty-Four Mistakes in Learning Evaluation
    Of course we as an industry make mistakes in learning evaluation, but who knew we made so many? I began compiling the list because I’d seen a good number of poor practices and false narratives about what is important in learning evaluation, but by the time I’d gotten my full list I was a bit dumbstruck by the magnitude of the problem. I’ve come to believe that we are still in the dark ages of learning evaluation and we need a renaissance. This article will give you some targets for improvements. Click here to read it.
  5. New Research on Learning Evaluation — Conducted with The eLearning Guild
    The eLearning Guild and Dr. Jane Bozarth (the Guild’s Director of Research) asked me to lead a research effort to determine what practitioners in the learning/elearning field are thinking and doing in terms of learning evaluation. In a major report released about a month ago, we reveal findings on how people feel about the learning measurement they are able to do, the support they get from their organizations, and their feelings about their current level of evaluation competence. You can read a blog post I wrote highlighting one result from the report: that a full 40% of us are unhappy with what we are able to do in terms of learning evaluation. You can access the full report here (if you’re a Guild member) and an executive summary. Also, stay tuned to my blog or sign up for my newsletter to see future posts about our findings.
  6. Current Practices in Gathering Learner Feedback
    We at Work-Learning Research, Inc. conducted a survey focused on gathering learner feedback (i.e., smile sheets, reaction forms, learner surveys) that spanned 2017 and 2018. Since the publication of my book, Performance-Focused Smile Sheets: A Radical Rethinking of a Dangerous Art Form, I’ve spent a ton of time helping organizations build more effective learner surveys and gauging common practices in the workplace learning field. This research survey continued that work. To read my exhaustive report, click here.
  7. One of the Biggest Lies in Learning Evaluation — Asking Learners about Level 3 and 4 (LTEM Tiers 7 and 8)
    This is big! One of the biggest lies in learning evaluation. It’s a lie we like to tell ourselves and a lie our learning-evaluation vendors like to tell us. If we ask our learners questions that relate to their job performance or the organizational impact of our learning programs, we are NOT measuring at Kirkpatrick-Katzell Level 3 or 4 (or at LTEM Tiers 7 and 8); we are measuring at Level 1 and LTEM Tier 3. You can read this refutation here.
  8. Who Will Rule Our Conferences? Truth or Bad-Faith Vendors?
    What do you want from the trade organizations in the learning field? Probably “accurate information” is high on your list. But what happens when the information you get is biased and untrustworthy? Could. Never. Happen. Right? Read this article to see how bias might creep in.
  9. Snake Oil. The Story of Clark Stanley as Preface to Clark Quinn’s Excellent Book
    This was one of my favorite pieces of writing in 2018. Did I ever mention that I love writing and would consider giving this all up for a career as a writer? You’ve all heard of “snake oil” but if you don’t know where the term originated, you really ought to read this piece.
  10. Dealing with the Emotional Readiness of Our Learners — My Ski Accident Reflections
    I had a bad accident on the ski slopes in February this year and I got thinking about how our learners might not always be emotionally ready to learn. I don’t have answers in this piece, just reflections, which you can read about here.
  11. The Backfire Effect. Not the Big Worry We Thought it was (for Those Who Would Debunk Learning Myths)
    This article is for those interested in debunking and persuasion. The Backfire Effect was the finding that trying to persuade someone to stop believing a falsehood might actually make them more inclined to believe the falsehood. The good news is that new research shows that this worry might be overblown. You can read more about this here (if you dare to be persuaded).
  12. Updated Smile-Sheet Questions for 2018
    I published a set of learner-survey questions in my 2016 book, and have been working with clients to use these questions and variations on these questions for over two years since then. I’ve learned a thing or two and so I published some improvements early last year. You can see those improvements here. And note, for 2019, I’ll be making additional improvements—so stay tuned! Remember, you can sign up to be notified of my news here.
  13. Replacement for NPS (The Net Promoter Score)
    NPS is all the rage. Still! Unfortunately, it’s a terribly bad question to include on a learner survey. The good news is that now there is an alternative, which you can see here.
  14. Neon Elephant Award for 2018 to Clark Quinn
    Every year, I give an award for a great research-to-practice contribution in the workplace learning field. This year’s winner is Clark Quinn. See why he won and check out his excellent resources here.
  15. New Debunker Club Website
    The Debunker Club is a group of people who have committed to debunking myths in the learning field and/or sharing research-based information. In 2018, working with a great team of volunteers, we revamped the Debunker Club website to help build a community of debunkers. We now have over 800 members from around the world. You can learn more about why The Debunker Club exists by clicking here. Also, feel free to join us!

 

My Final Reflections on 2018

  • I’m blessed to be supported by smart passionate clients and by some of the smartest friends and colleagues in the learning field.
  • My Work-Learning Research practice turned 20 years old in 2018.
  • Being a consultant, especially one who focuses on research-to-practice in the workplace learning field, is still a challenging yet emotionally rewarding endeavor.
  • In 2018, I turned my attention almost fully to learning evaluation. You can read about my two-path evaluation approach here.
  • One of my research surveys totally flopped this year. It was focused on the interface between us (as learning professionals) and our organizations’ senior leadership. I wanted to know if what we thought senior leadership wanted was what they actually wanted. Unfortunately, neither I nor any of the respondents could entice a senior leader to comment. Not one! If you or your organization has access to senior managers, I’d love to partner with you on this! Let me know. Indeed, this doesn’t even have to be research. If your CEO would be willing to trade his/her time letting me ask a few questions in exchange for my time answering questions about learning, elearning, learning evaluation, etc., I’d be freakin’ delighted!
  • I failed this year in working out a deal with another evaluation-focused organization to merge our efforts. I was bummed about this failure as the synergies would have been great.
  • I also failed in 2018 to cure myself of the tendency to miss important emails. If you ever can’t get in touch with me, try, try again! Thanks and apologies!
  • I had a blast in 2018 speaking and keynoting at conferences, both big and small conferences. From doing variations on the Learning-Research Quiz Show (a rollicking good time) to talking about innovations in learning evaluation to presenting workshops on my learning-evaluation methods and the LTEM model. Good stuff, if a ton of work.
  • Oh! I did fail again in 2018 turning my workshops into online workshops. I hope to do better in 2019.
  • I also failed in 2018 in finishing up a research review of the training transfer research. I’m like 95% done, but still haven’t had a chance to finish.

2018 broke my body, made me unavailable for a couple of months, but overall, it turned out to be a pretty damn good year. 2019 looks promising too as I have plans to continue working on learning evaluation. It’s kind of interesting that we are still in the dark ages of learning evaluation. We as an industry, and I as a person, have a ton more to learn about learning evaluation. I plan to continue the journey. Please feel free to reach out and let me know what I can learn from you and your organization. And of course, because I need to pay the rent, let me say that I’d be delighted if you wanted me to help you or your organization. You can reach me through the Work-Learning Research contact form.

Thanks for reading and being interested in my work!!!

At a recent industry conference, a speaker, offering their expertise on learning evaluation, said this:

“As a discipline, we must look at the metrics that really matter… not to us but to the business we serve.”

Unfortunately, this is one of the most counterproductive memes in learning evaluation. It is counterproductive because it throws our profession under the bus. In this telling, we have no professional principles, no standards, no foundational ethics. We are servants, cleaning the floors the way we are instructed to clean them, even if we know a better way.

Year after year we hear from so-called industry thought leaders that our primary responsibility is to the organizations that pay us. This is a dangerous half truth. Of course we owe our organizations some fealty and of course we want to keep our jobs, but we also have professional obligations that go beyond this simple “tell-me-what-to-do” calculus.

This monomaniacal focus on measuring learning in terms of business outcomes reminds me of the management meme from the 1980s and 90s that suggested that the goal of a business organization is to increase shareholder value. This single-bottom-line focus has come under blistering attack for its tendency to skew business operations toward short-term results while ignoring long-term business results and for producing outcomes that harm employees, hurt customers, and destroy the environment.

If we give our business stakeholders the metrics they say matter to them, but fail to capture the metrics that matter to our success as learning professionals in creating effective learning, then we not only fail ourselves and our learners but we fail our organization as well.

Evaluation What For?

To truly understand learning evaluation, we have to ask ourselves why we’re evaluating learning in the first place! We have to work backwards from the answer to this question.

Why does anyone evaluate? We evaluate to help us make better decisions and take better actions. It’s really that simple! So as learning professionals, we need information to help us make our most important decisions. We should evaluate to support these decisions!

What are our most important decisions? Here are a few:

  • Which part of the content taught, if any, is relevant and helpful to supporting employees in doing their work? Which parts should be modified or discarded?
  • Which aspects of our learning designs are helpful in supporting comprehension, remembering, and motivation to learn? Which aspects should be modified or discarded?
  • Which after-training supports are helpful in enabling learning to be transferred and utilized by employees in their work? Which supports should be kept? Which need to be modified or discarded?

What are our organizational stakeholders’ most important decisions about learning? Here are a few:

  • Are our learning and development efforts creating optimal learning results? What additional support and resources should the organization supply that might improve learning results? What savings can be found in terms of support and resources—and are these savings worth the lost benefits?
  • Is the leadership of the learning and development function producing a cycle of continuous improvement, generating improved learning outcomes or generating learning outcomes optimized given their resource constraints? If not, can they be influenced to be better or should they be replaced?
  • Is the leadership of the learning and development function creating and utilizing evaluation metrics that enable the learning and development team to get valid feedback about the design factors that are most important in creating our learning results? If not, can they be influenced to use better metrics or should they be replaced?

Two Goals for Learning Evaluation

When we think of learning evaluation, we should have two goals. First, we should create learning-evaluation metrics that enable us to make our most important decisions regarding content, design components (i.e., focused at least on comprehension, remembering, motivation to apply learning), and after-training support. Second, we should do enough in our learning evaluations to gain sufficient credibility with our business stakeholders to continue our good work. Focusing only on the second of these is a recipe for disaster. 

Vanity Metrics

In the business start-up world there is a notion called “vanity metrics”; see, for example, the warnings by Eric Ries, the originator of the lean startup movement. Vanity metrics are metrics that seem to be important, but that are not important. They are metrics that often make us look good even if the underlying data is not really meaningful.

Most calls to provide our business stakeholders with the metrics that matter to them result in beautiful visualizations and data dashboards that focus on vanity metrics. Ubiquitous vanity metrics in learning include the number of trainees trained, the cost per training, learners’ estimates of the value of the learning, complicated benefit/cost analyses that utilize phantom measures of benefits, etc. By focusing only or primarily on these metrics we don’t have data to improve our learning designs, we don’t have data that enables us to create cycles of improvement, we don’t have data that enables us to hold ourselves accountable.

Released Today: Research Report on Learning Evaluation Conducted with The eLearning Guild.

Report Title: Evaluating Learning: Insights from Learning Professionals.

I am delighted to announce that a research effort that I led in conjunction with Dr. Jane Bozarth and the eLearning Guild has been released today. I’ll be blogging about our findings over the next couple of months.

This is a major report — packed into 39 pages — and should be read by everyone in the workplace learning field interested in learning evaluation!

Just a teaser here:

We asked folks to consider the last three learning programs their units developed and to reflect on the learning-evaluation approaches they used.

While a majority were generally happy with their evaluation methods on these recent learning programs, about 40% were dissatisfied. Later, in a more general question about whether learning professionals are able to do the learning measurement they want to do, fully 52% said they were NOT able to do the kind of evaluation they thought was right to do.

In the full report, available only to Guild members, we dig down and explore the practices and perspectives that drive our learning-evaluation efforts. I encourage you to get the full report, as it touches on the methods we use, how we communicate with senior business leaders, what we’d like to do differently, and what we think we’re good at. Also, the report concludes with 12 powerful action strategies for getting the most out of our learning-evaluation efforts.

You can get the full report by clicking here.

 

 

I read a brilliantly clear article today by Karen Hao from the MIT Technology Review. It explains what machine learning is and provides a very clear diagram, which I really like.

Now, I am not a machine learning expert, but I have a hypothesis that has a ton of face validity when I look in the mirror. My hypothesis is this:

Machine learning will return meaningful results to the extent that the data it uses is representative of the domain of interest.

A simple thought experiment will demonstrate my point. If a learning machine is given data about professional baseball in the United States from 1890 to 2000, it would learn all kinds of things, including the benefits of pulling the ball as a batter. Pulling the ball occurs when a right-handed batter hits the ball to left field or a left-handed batter hits the ball to right field. In the long history of baseball, many hitters benefited by trying to pull the ball because it produces a more natural swing and one that generates more power. Starting in the 2000s, with the advent of advanced analytics that show where each player is likely to hit the ball, a maneuver called “the shift” has been used more and more, and pulling the ball consistently has become a disadvantage. In the shift, players in the field migrate to positions where the batter is most likely to hit the ball, thus negating the power benefits of pulling the ball. Our learning machine would not know about the decreased benefits of pulling the ball because it would never have seen that data (the data from 2000 to now).
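
To make the thought experiment concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the batting averages are invented (they are not real baseball statistics), the "learning machine" is just an average over the seasons it is shown, and the names `synthetic_season` and `learn_pull_advantage` are hypothetical. It is not how a real machine-learning system works; it only shows how a conclusion drawn from unrepresentative data can point the wrong way.

```python
from statistics import mean


def synthetic_season(year):
    """Return invented (pull_average, spray_average) batting averages.

    Assumption for illustration only: pulling the ball helps through
    2000 and hurts afterward, once "the shift" is in common use.
    """
    if year <= 2000:
        return 0.290, 0.260
    return 0.240, 0.265


def learn_pull_advantage(years):
    """A toy 'learning machine': average the pull-minus-spray gap."""
    gaps = []
    for year in years:
        pull_avg, spray_avg = synthetic_season(year)
        gaps.append(pull_avg - spray_avg)
    return mean(gaps)


# The only data the machine ever sees.
training_years = range(1890, 2001)
# The data it never sees.
unseen_years = range(2001, 2019)

learned_gap = learn_pull_advantage(training_years)
actual_recent_gap = learn_pull_advantage(unseen_years)

# What the machine would advise versus what is true in the unseen data.
recommend_pulling = learned_gap > 0
pulling_helps_now = actual_recent_gap > 0

print("Machine recommends pulling the ball:", recommend_pulling)
print("Pulling the ball helps in the unseen seasons:", pulling_helps_now)
```

The machine confidently recommends pulling the ball because nothing in its training data hints that the game changed. The flaw is not in the averaging; it is in what the data leaves out.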

Machine Learning about Learning

I raise this point because of the creeping danger in the world of learning and education. My concern is relevant to all domains where it is difficult to collect data on the most meaningful factors and outcomes, but where it is easy to collect data on less meaningful factors and outcomes. In such cases, our learning machines will only have access to the data that is easy to collect and will not have access to the data that is difficult or impossible to collect. People using machine learning on inadequate data sets will certainly find some interesting relationships in the data, but they will have no way of knowing what they’re missing. The worst part is that they’ll report out some fanciful finding, we’ll all jump up and down in excitement and then make bad decisions based on the bad learning caused by the incomplete data.

In the learning field—where trainers, instructional designers, elearning developers, and teachers reside—we have learned a great deal about research-based methods of improving learning results, but we don’t know everything. And, many of the factors which we know work are not tracked in most big data sets. Do we track the spacing effect, the number of concepts repeated with attention-grabbing variation, the alignment between context cues present in learning materials compared with the cues that will be present in our learners’ future performance situations? Ha! Our large data sets certainly miss many of these causal factors.

Our large data sets also fail to capture the most important outcomes metrics. Indeed, as I have been regularly recounting for years now, typical learning measurements are often biased by measuring immediately at the end of learning (before memories fade), by measuring in the learning context (where contextual cues offer inauthentic hints or subconscious triggering of recall targets), and by measuring with tests of low-level knowledge (compared to more relevant skill-focused decision-making or task performances). We also overwhelmingly rely on learner feedback surveys, both in workplace learning and in higher education. Learner surveys—at least traditional ones—have been found virtually uncorrelated with learning results. To use these meaningless metrics as a primary dependent variable (or just a variable) in a machine-learning data set is complete malpractice.

So if our machine learning data sets have a poor handle on both the inputs and outputs to learning, how can we see machine learning interpretations of learning data as anything but a shiny new alchemy?

 

Measurement Illuminates Some Things But Leaves Others Hidden

In my learning-evaluation workshops, I often show this image.

The theme expressed in the picture is relevant to all types of evaluation, but it is especially relevant for machine learning.

When we review our smile-sheet data, we should not fool ourselves into thinking that we have learned the truth about the success of our learning. When we see a beautiful data-visualized dashboard, we should not deceive ourselves and our organizations that what we see is all there is to see.

So it is with machine learning, especially in domains where the data is not all the data, where the data is flawed, and where the boundaries on the full population of domain data are not known.

 

With Apologies to Karen Hao

I don’t know Karen, but I do love her diagram. It’s clear and makes some very cogent points—as does her accompanying article.

Here is her diagram, which you can see in the original at this URL.

Like measurement itself, I think the diagram illuminates some aspects of machine learning but fails to illuminate the danger of incomplete or unrepresentative data sets. So, I made a modification in the flow chart.

And yes, that seven-letter provocation is a new machine-learning term that arises from the data as I see it.

Corrective Feedback Welcome

As I said to start this invective, my hypothesis about machine learning and data is just that—a semi-educated hypothesis that deserves a review from people more knowledgeable than me about machine learning. So, what do you think, machine-learning gurus?

 

Karen Hao Responds

I’m so delighted! One day after I posted this, Karen Hao responded: