Posts

The original post appeared in 2011; I have updated it here.

Updated Article

When companies think of evaluation, they often first think of benchmarking their performance against other companies. There are important reasons to be skeptical of this type of approach, especially as a sole source of direction.

I often add this warning to my workshops on how to create more effective smile sheets: Watch out! There are vendors in the learning field who will attempt to convince you that you need to benchmark your smile sheets against your industry. You will spend (waste) a lot of money with these extra benchmarking efforts!

Two forms of benchmarking are common: (1) idea-generation and (2) comparison. Idea-generation involves looking at other companies' methodologies and then assessing whether particular methods would work well at our company. This is a reasonable procedure only to the extent that we can tell whether the other companies face situations similar to ours and whether the methodologies have really been successful at those other companies.

Comparison benchmarking for training and development looks further at a multitude of learning methods and results and specifically attempts to find a wide range of other companies to benchmark against. This approach requires stringent attempts to create valid comparisons. This type of benchmarking is valuable only to the extent that we can determine whether we are comparing our results to good companies or bad and whether the comparison metrics are important in the first place.

Both types of benchmarking require exhaustive efforts and suffer from validity problems. It is just too easy to latch on to other companies' phantom results (i.e., results that seem impressive but evaporate upon close examination). Picking the right metrics is difficult (i.e., a business can be judged on its stock price, its revenues, profits, market share, etc.). Comparing companies between industries presents the proverbial apples-to-oranges problem. It's not always clear why one business is better than another (e.g., it is hard to know what really drives Apple Computer's current success: its brand image, its products, its positioning versus its competitors, its leaders, its financial savvy, its customer service, its manufacturing, its project management, its sourcing, its hiring, or something else). Finally, and most pertinent here, it is extremely difficult to determine which companies are really using best practices (e.g., see Phil Rosenzweig's highly regarded book The Halo Effect) because companies' overall results usually cloud and obscure the on-the-job realities of what's happening.

The difficulty of assessing best practices in general pales in comparison to the difficulty of assessing a company's training-and-development practices. The problem is that there just aren't universally accepted and comparable metrics to utilize for training and development. Where baseball teams have wins and losses, runs scored, and such; and businesses have revenues and profits and the like; training and development efforts produce fuzzier numbers—certainly ones that aren't comparable from company to company. Reviews of the research literature on training evaluation have found very low levels of correlation (usually below .20) between different types of learning assessments (e.g., Alliger, Tannenbaum, Bennett, Traver, & Shotland, 1997; Sitzmann, Brown, Casper, Ely, & Zimmerman, 2008).

Of course, we shouldn’t dismiss all benchmarking efforts. Rigorous benchmarking efforts that are understood with a clear perspective can have value. Idea-generation brainstorming is probably more viable than a focus on comparison. By looking to other companies’ practices, we can gain insights and consider new ideas. Of course, we will want to be careful not to move toward the mediocre average instead of looking to excel.

The bottom line on benchmarking from other companies is: be careful, be willing to spend lots of time and money, and don’t rely on cross-company comparisons as your only indicator.

Finally, any results generated by brainstorming with other companies should be carefully considered and pilot-tested before too much investment is made.

 

Smile Sheet Issues

Both of the meta-analyses cited above found that smile sheets correlated with learning results at only r = 0.09, which is virtually no correlation at all. I have described smile-sheet design problems in detail in my book, Performance-Focused Smile Sheets: A Radical Rethinking of a Dangerous Art Form. In short, most smile sheets focus on learner satisfaction and fail to focus on factors related to actual learning effectiveness. Most smile sheets utilize Likert-like scales or numeric scales that offer learners very little granularity between answer choices, opening up responding to bias, fatigue, and disinterest. Finally, most learners have fundamental misunderstandings about their own learning (Brown, Roediger & McDaniel, 2014; Kirschner & van Merriënboer, 2013), so asking them general questions about their perceptions is too often a dubious undertaking.

The bottom line is that traditional smile sheets are providing almost everyone with meaningless data in terms of learning effectiveness. When we benchmark our smile sheets against other companies’ smile sheets we compound our problems.

 

Wisdom from Earlier Comments

Ryan Watkins, researcher and industry guru, wrote:

I would add to this argument that other companies are no more static than our own — thus if we implement in September 2011 what they are doing in March 2011 from our benchmarking study, then we are still behind the competition. They are continually changing and benchmarking will rarely help you get ahead. Just think of all the companies that tried to benchmark the iPod, only to later learn that Apple had moved on to the iPhone while the others were trying to “benchmark” what they were doing with the iPod. The competition may have made some money, but Apple continues to win the major market share.

Mike Kunkle, sales training and performance expert, wrote:

Having used benchmarking (carefully and prudently) with good success, I can’t agree with avoiding it, as your title suggests, but do agree with the majority of your cautions and your perspectives later in the post.

Nuance and context matter greatly, as do picking the right metrics to compare, and culture, which is harder to assess. 70/20/10 performance management somehow worked at GE under Welch’s leadership. I’ve seen it fail miserably at other companies and wouldn’t recommend it as a general approach to good people or performance management.

In the sales performance arena, at least, benchmarking against similar companies or competitors does provide real benefit, especially in decision-making about which solutions might yield the best improvement. Comparing your metrics to world-class competitors and calculating what it would mean to you to move in that direction allows for focus and prioritization in a sea of choices.

It becomes even more interesting when you can benchmark internally, though. I’ve always loved this series of examples by Sales Benchmark Index:
http://www.salesbenchmarkindex.com/Portals/23541/docs/why-should-a-sales-professional-care-about-sales-benchmarking.pdf

 

Citations

Alliger, G. M., Tannenbaum, S. I., Bennett, W., Jr., Traver, H., & Shotland, A. (1997). A meta-analysis of the relations among training criteria. Personnel Psychology, 50, 341-357.

Brown, P. C., Roediger, H. L., III, & McDaniel, M. A. (2014). Make It Stick: The Science of Successful Learning. Cambridge, MA: Belknap Press of Harvard University Press.

Kirschner, P. A., & van Merriënboer, J. J. G. (2013). Do learners really know best? Urban legends in education. Educational Psychologist, 48(3), 169–183.

Sitzmann, T., Brown, K. G., Casper, W. J., Ely, K., & Zimmerman, R. D. (2008). A review and meta-analysis of the nomological network of trainee reactions. Journal of Applied Psychology, 93, 280-295.

OMG! The best deal ever for a full-day workshop on how to radically improve your smile-sheet designs! Sponsored by the Hampton Roads Chapter of ISPI. Free book and subscription-learning thread too!

 

Friday, June 10, 2016

Reed Integration

7007 Harbour View Blvd #117

Suffolk, VA

 

Click here to register now…

 

Performance Objectives:

By completing this workshop and the after-course subscription-learning thread, you will know how to:

  1. Avoid the three most troublesome biases in measuring learning.

  2. Persuade your stakeholders to improve your organization’s smile sheets.

  3. Create more effective smile sheet questions.

  4. Create evaluation standards for each question to avoid bias.

  5. Envision learning measurement as a bulwark for improved learning design.

 

Recommended Audience:

The content of this workshop will be suitable for those who have at least some background and experience in the training field. It will be especially valuable to those who are responsible for learning evaluation or who manage the learning function.

 

Format:

This is a full-day workshop. Participants are encouraged to bring laptops if they prefer to use a computer to write their questions.  

 

Bonus Take-Away:

Each participant will receive a copy of Dr. Thalheimer's book, Performance-Focused Smile Sheets: A Radical Rethinking of a Dangerous Art Form.

Wow! What a week! I published my first book on Tuesday and have been hearing from well wishers ever since.

Here are some related links you might find interesting:

And here are some random, possibly related, visuals.

  • In the rush of purchases, Amazon briefly ran out of my book, telling folks they'd have to wait 1-3 weeks — their signal that they have no stock. Later, they found some copies…but the genie was out of the bottle.

[Image: Amazon listing showing the book sold out]

 

 

  • The animal kingdom seems to be behind the book:

[Image: Bozarth's Corgi]

[Image: Olah's Cat]

 

 

  • Even one of the United States' presidential candidates has spoken up:

[Image: Bernie2]

 

What a week!

 

Thank you all!!

 

- Will Thalheimer

 

Wow!!

I almost can't believe it. After 17 years of research and writing, I'm finally a published author.

Today is the day!

It's kind of funny really.

When I began this journey back in 1997 I had a well-paying job running a leadership-development product line, building multimedia simulations, and managing and working with a bunch of great folks.

As I looked around the training-and-development field — that's what we called it back then — I saw that we jumped from one fad to another and held on sanctimoniously to learning methods that didn't work that well. I concluded that what was needed was someone to play a role in bridging the gap between the research side and the practice side.

I had a very naive idea about how I might help. I thought the field needed a book that would specify the fundamental learning factors that should be baked into every learning design. I thought I could write such a book in two or three years, that I'd get it published, that consulting gigs would roll in, that I'd make good money, that I'd make a difference.

Hah! The blind optimism of youth and entrepreneurship!

I've now written over 700 pages on THAT book…without an end in sight.

 

How The Smile-Sheet Book Got its Start

Back in 2007, as I was mucking around in the learning research, I began to see biases in how we were measuring learning. I noticed, for instance, that we always measured at the top of the learning curve, before the forgetting curve had even begun. We measured with trivial multiple-choice questions on definitions and terminology — when these clearly had very little relevance for on-the-job performance. I wrote a research-to-practice report on these learning measurement biases and suddenly I was getting invited to give keynotes…

In my BIG book, I wrote hundreds of paragraphs on learning measurement. I talked about our learning-measurement blind spots to clients, at conferences, and on my blog.

Although feedback is the lifeblood of improvement, we as learning professionals were getting very little good feedback. We were practicing in the dark.

I'd also come to ruminate on the meta-analytic research findings that showed that traditional smile sheets were virtually uncorrelated with learning results. If smile sheets were feeding us bad information, maybe we should just stop using them.

It was about three or four years ago that I saw a big client get terrible advice about their smile sheets from a well-known learning-measurement vendor. And, of course, because the vendor had an industry-wide reputation, the client almost couldn't help buying into their poor smile-sheet designs.

I concluded that smile sheets were NOT going away. They were too entrenched, and there were some good reasons to use them.

I also concluded that smile sheets could be made more effective, more aligned with the research on learning, and better at supporting learners in making smile-sheet decisions.

I decided to write a shorter book than the aforementioned BIG book. That was about 2.5 years ago.

I wrote a draft of the book and I knew I had something. I got feedback from learning-measurement luminaries like Rob Brinkerhoff, Jack Phillips, and Bill Coscarelli. I got feedback from learning gurus Julie Dirksen, Clark Quinn, and Adam Neaman. I made major improvements based on the feedback from these wonderful folks. The book then went through several rounds of top-tier editing, making it a much better read.

As the publication process unfolded, I realized that I didn't have enough money on hand to fund the printing of the book. I turned to Kickstarter, and 227 people raised their hands to help, reserving over 300 books in return for their generous contributions. I will be forever indebted to them.

Others reached out to help as well, from people on my newsletter list, to my beloved clients, to folks in trade organizations and publications, to people I've met through the years, to people I haven't met, to followers on Twitter, to the industry luminaries who agreed to write testimonials after getting advanced drafts of the book, to family members, to friends.

Today, all the hard work, all the research, all the client work, all the love and support comes together for me in gratitude.

Thank you!

 

- Will Thalheimer

 

P.S. To learn more about the book, or buy it:  SmileSheets.com

I created a video to help organizations fully understand the meaning of their smile sheets.

 

You can also view this directly on YouTube: https://youtu.be/QucqCxM2qW4

More and more training departments are considering the use of the Net Promoter Score as a question–or the central question–on their smile sheets.

This is one of the stupidest ideas yet for smile sheets, but I understand the impetus–traditional smile sheets provide poor information. In this blog post I am going to try to put a finely-honed dagger through the heart of this idea.

Note that I have written a replacement question for the Net Promoter Score (for getting learner responses).

What is the Net Promoter Score?

Here’s what the folks who wrote the book on the Net Promoter Score say it is:

The Net Promoter Score, or NPS®, is based on the fundamental perspective that every company’s customers can be divided into three categories: Promoters, Passives, and Detractors.

By asking one simple question — How likely is it that you would recommend [your company] to a friend or colleague? — you can track these groups and get a clear measure of your company’s performance through your customers’ eyes. Customers respond on a 0-to-10 point rating scale and are categorized as follows:

  • Promoters (score 9-10) are loyal enthusiasts who will keep buying and refer others, fueling growth.
  • Passives (score 7-8) are satisfied but unenthusiastic customers who are vulnerable to competitive offerings.
  • Detractors (score 0-6) are unhappy customers who can damage your brand and impede growth through negative word-of-mouth.

To calculate your company’s NPS, take the percentage of customers who are Promoters and subtract the percentage who are Detractors.
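To make that arithmetic concrete, here is a minimal sketch of the calculation in Python. The function name and the sample ratings are my own illustrations, not part of the NPS materials quoted above.

    # A minimal sketch of the NPS arithmetic, assuming 0-10 ratings.
    # The function name and sample ratings are hypothetical illustrations.
    def net_promoter_score(ratings):
        """Return NPS: percentage of promoters minus percentage of detractors."""
        if not ratings:
            raise ValueError("no ratings supplied")
        promoters = sum(1 for r in ratings if r >= 9)   # scores 9-10
        detractors = sum(1 for r in ratings if r <= 6)  # scores 0-6
        return 100.0 * (promoters - detractors) / len(ratings)

    # Example: 4 promoters, 3 passives, 3 detractors out of 10 respondents -> NPS of +10
    print(net_promoter_score([10, 9, 9, 9, 8, 7, 7, 6, 5, 3]))  # 10.0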

So, the NPS is about Customer Perceptions, Right?

Yes, its intended purpose is to measure customer loyalty. It was designed as a marketing tool. It was specifically NOT designed to measure training outcomes. Therefore, we might want to be skeptical before using it.

It kind of makes sense for marketing, right? Marketing is all about customer perceptions of a given product, brand, or company. Also, there is evidence–yes, actual evidence–that customers are influenced by others in their purchasing decisions. So again, asking whether someone might recommend a company or product to another person seems like a reasonable thing to ask.

Of course, just because something seems reasonable doesn't mean it is. Even for its intended purpose, the Net Promoter Score has a substantial number of critics. See Wikipedia for details.

But Why Not for Training?

To measure training with a Net-Promoter approach, we would ask a question like, “How likely is it that you would recommend this training course to a friend or colleague?” 

Some reasonable arguments for why the NPS is stupid as a training metric:

  1. First we should ask, what is the causal pathway that would explain how the Net Promoter Score is a good measure of training effectiveness? We shouldn't willy-nilly take a construct from another field and apply it to our field without having some "theory-of-causality" that supports its likely effectiveness. Specifically, we should ask whether it is reasonable to assume that a learner's recommendation about a training program tells us SOMETHING important about the effectiveness of that training program. And, for those using the NPS as the central measure of training effectiveness–which sends shivers down my spine–the query then becomes: is it reasonable to assume that a learner's recommendation about a training program tells us EVERYTHING important about the effectiveness of that training program? Those who would use the Net Promoter Score for training must hold one of the following beliefs:
    • Learners know whether or not training has been effective.
    • Learners know whether their friends/colleagues are likely to have the same beliefs about the effectiveness of training as they themselves have.

    The second belief is not worth much, but it is probably what really happens. It is the first belief that is critical, so we should examine that belief in more depth. Are learners likely to be good judges of training effectiveness?

  2. Scientific evidence demonstrates that learners are not very good at judging their own learning. They have been shown to have many difficulties adequately judging how much they know and how much they’ll be able to remember. For example, learners fail to utilize retrieval practice to support long-term remembering, even though we know this is one of the most powerful learning methods (e.g., Karpicke, Butler, & Roediger, 2009). Learners don’t always overcome their incorrect prior knowledge when reading (Kendeou & van den Broek, 2005). Learners often fail to utilize examples in ways that would foster deeper learning (Renkl, 1997). These are just a few examples of many.
  3. Similarly, two meta-analyses on the potency of traditional smile sheets, which tend to measure the same kind of beliefs as NPS measures, have shown almost no correlation between learner responses and actual learning results (Alliger, Tannenbaum, Bennett, Traver, & Shotland, 1997; Sitzmann, Brown, Casper, Ely, & Zimmerman, 2008).
  4. Similarly, when we assess learning in the training context at the end of learning, several cognitive biases creep in to make learners perform much better than they would perform if they were in a more realistic situation back on the job at a later time (Thalheimer, 2007).
  5. Even if we did somehow prove that NPS was a good measure for training, is there evidence that it is the best measure? Obviously not!
  6. Should it be used as the most important measure? No! As stated in the Science of Training review article from last year: "The researchers [in talking about learning measurement] noted that researchers, authors, and practitioners are increasingly cognizant of the need to adopt a multidimensional perspective on learning [when designing learning measurement approaches]" (Salas, Tannenbaum, Kraiger, & Smith-Jentsch, 2012).
  7. Finally, we might ask: are there better types of questions to ask on our smile sheets? The answer to that is an emphatic YES! Performance-Focused Smile Sheets provide a whole new approach to smile-sheet questions. You can learn more by attending my workshop on how to create and deploy these more powerful questions.

The Bottom Line

The Net Promoter Score was designed to measure customer loyalty and is not relevant for training. Indeed, it is likely to give us dangerously misguided information.

When we design courses solely so that learners like the courses, we create learning that doesn't stick, that fails to create long-term remembering, that fails to push for on-the-job application, etc.

Seriously, this is one of the stupidest ideas to come along for learning measurement in a long time. Buyers beware!! Please!

Enrollments are open for my first Workout-Workplace Workshop on the World Wide Web (w6).

Fittingly, it will cover the most significant improvement in smile-sheet design in a generation–The Performance-Focused Smile Sheet.

Click for details…

I know I'm going completely against most training-industry practice in saying this, but it's the truth. Likert-like scales create poor data on smile sheets.

If you're using questions on your smile sheets with answer choices such as:

  • Strongly Agree
  • Agree
  • Neither Agree Nor Disagree
  • Disagree
  • Strongly Disagree

You're getting data that isn't that useful. Such questions will create data that your stakeholders–and you too–won't be able to decipher very well. What does it mean if we average a 4.2 rating? It may sound good, but it doesn't give your learners, your stakeholders, or your team much information to decide what to do.
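As a quick illustration of the point (my own example, not from the original post), two very different response patterns on a 1-to-5 scale can produce exactly the same 4.2 average:

    # Hypothetical response sets on a 1-5 Likert-like scale.
    from statistics import mean

    mildly_positive = [4, 4, 4, 4, 4, 4, 4, 4, 5, 5]  # everyone at least satisfied
    polarized       = [5, 5, 5, 5, 5, 5, 5, 5, 1, 1]  # most delighted, two very unhappy

    print(mean(mildly_positive))  # 4.2
    print(mean(polarized))        # 4.2 -- same average, very different stories

The average alone can't tell you whether a few learners had a seriously bad experience, which is exactly the kind of signal a smile sheet should surface.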

Moreover, let's remember that our learners are making decisions with every smile-sheet question they answer. It's a lot tougher to decide between "Strongly Agree" and "Agree" than between two more-concrete answer choices. 

Sharon Shrock and Bill Coscarelli, authors of the classic text Criterion-Referenced Test Development, now in its third edition, offer the following wisdom on Likert-type descriptive scales (of the kind that use response words such as "Agree," "Strongly Agree," etc.): "…the resulting scale is deficient in that the [response words] are open to many interpretations" (p. 188).

So why do so many surveys use Likert-like scales? Answer: it's easy, it's tradition, and surveys often gain psychometric advantages because they repeat the same concepts across multiple items and are designed to compare one category of response to another.

Smile sheets are different. On our smile sheets, we want the learners to be able to make good decisions, and we want to send clear messages about what they have decided. Anything that fuzzes that up hurts the validity of the smile-sheet data.

Last year I wrote at length about my efforts to improve my own smile sheets. It turns out that this is an evolving effort as I continue to learn from my learners and my experience.

Check out my new 2009 version.

You may remember that one of the major improvements in my smile sheet was to ask learners about the value and newness of EACH CONCEPT TAUGHT (or at least each MAJOR concept). This is beneficial because people respond more accurately to specifics than to generalities; they respond better to concrete learning points than to the vague semblance of a full learning experience.

What I forgot in my previous version was the importance of getting specific feedback on how well I taught each concept. Doh!

My latest version adds a column for how well each concept is taught. There is absolutely no more room to add any columns (I didn't think I could fit this latest one in), so there may be diminishing returns on any further improvements.

Check it out and let me know what you think.

Update May 2014

Sarah Boehle wrote an article that included Neil Rackham's famous story on the dangers of measuring training only with smile sheets. The story used to be available from Training Magazine directly, but after some earlier disruptions and recoveries at Training Magazine, their digital archive was reconstituted and currently only goes back to 2007.

Fortunately, you can read the article here.