The original post appeared in 2011. I have updated it here.
When companies think of evaluation, they often first think of benchmarking their performance against other companies. There are important reasons to be skeptical of this type of approach, especially as a sole source of direction.
I often add this warning to my workshops on how to create more effective smile sheets: Watch out! There are vendors in the learning field who will attempt to convince you that you need to benchmark your smile sheets against your industry. You will spend (waste) a lot of money with these extra benchmarking efforts!
Two forms of benchmarking are common: (1) idea generation and (2) comparison. Idea generation involves looking at other companies’ methodologies and then assessing whether particular methods would work well at our company. This is a reasonable procedure only to the extent that we can tell whether the other companies have similar situations to ours and whether the methodologies have really been successful at those other companies.
Comparison benchmarking for training and development looks further at a multitude of learning methods and results and specifically attempts to find a wide range of other companies to benchmark against. This approach requires stringent attempts to create valid comparisons. This type of benchmarking is valuable only to the extent that we can determine whether we are comparing our results to good companies or bad and whether the comparison metrics are important in the first place.
Both types of benchmarking require exhaustive efforts and suffer from validity problems. It is just too easy to latch on to other companies’ phantom results (i.e., results that seem impressive but evaporate upon close examination). Picking the right metrics is difficult (e.g., a business can be judged on its stock price, revenues, profits, market share, etc.). Comparing companies across industries presents the proverbial apples-to-oranges problem. It’s not always clear why one business is better than another (e.g., it is hard to know what really drives Apple Computer’s current success: its brand image, its products, its positioning versus its competitors, its leaders, its financial savvy, its customer service, its manufacturing, its project management, its sourcing, its hiring, or something else). Finally, and most pertinent here, it is extremely difficult to determine which companies are really using best practices (e.g., see Phil Rosenzweig’s highly regarded book, The Halo Effect) because companies’ overall results usually obscure the on-the-job realities of what’s happening.
The difficulty of assessing best practices in general pales in comparison to the difficulty of assessing a company’s training-and-development practices. The problem is that there just aren’t universally accepted and comparable metrics for training and development. Where baseball teams have wins and losses, runs scored, and such, and businesses have revenues and profits and the like, training-and-development efforts produce fuzzier numbers, and certainly ones that aren’t comparable from company to company. Reviews of the research literature on training evaluation have found very low levels of correlation (usually below .20) between different types of learning assessments (e.g., Alliger, Tannenbaum, Bennett, Traver, & Shotland, 1997; Sitzmann, Brown, Casper, Ely, & Zimmerman, 2008).
Of course, we shouldn’t dismiss all benchmarking efforts. Rigorous benchmarking efforts that are interpreted with a clear perspective can have value. Benchmarking for idea generation is probably more viable than benchmarking for comparison. By looking to other companies’ practices, we can gain insights and consider new ideas. We will want to be careful, though, not to drift toward the mediocre average instead of looking to excel.
The bottom line on benchmarking against other companies is: be careful, be willing to spend lots of time and money, and don’t rely on cross-company comparisons as your only indicator.
Finally, any results generated by brainstorming with other companies should be carefully considered and pilot-tested before too much investment is made.
Smile Sheet Issues
Both of the meta-analyses cited above found that smile sheets were correlated with learning results at only r = 0.09, which is virtually no correlation at all. I have described smile-sheet design problems in detail in my book, Performance-Focused Smile Sheets: A Radical Rethinking of a Dangerous Art Form. In short, most smile sheets focus on learner satisfaction and fail to focus on factors related to actual learning effectiveness. Most smile sheets use Likert-like scales or numeric scales that offer learners very little granularity between answer choices, leaving responses open to bias, fatigue, and disinterest. Finally, most learners have fundamental misunderstandings about their own learning (Brown, Roediger & McDaniel, 2014; Kirschner & van Merriënboer, 2013), so asking general questions about their perceptions is too often a dubious undertaking.
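To get a feel for how small a correlation of 0.09 is, it can help to square it. The square of a Pearson correlation is commonly read as the proportion of variance two measures share. The short Python sketch below is only an illustration of that arithmetic, not something taken from the cited meta-analyses, and the function name shared_variance is a made-up helper.

```python
def shared_variance(r: float) -> float:
    """Return r squared, the share of variance two measures have in common.

    Hypothetical helper for illustration; r is a Pearson correlation.
    """
    return r * r


# 0.09 is the smile-sheet correlation quoted above;
# 0.20 is the "usually below" ceiling mentioned earlier in the post.
for r in (0.09, 0.20):
    print(f"r = {r:.2f} -> r squared = {shared_variance(r):.4f} "
          f"({shared_variance(r):.1%} of variance shared)")
```

Read this way, r = 0.09 corresponds to less than 1 percent of shared variance, and even r = 0.20 corresponds to only about 4 percent.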
The bottom line is that traditional smile sheets are providing almost everyone with meaningless data in terms of learning effectiveness. When we benchmark our smile sheets against other companies’ smile sheets we compound our problems.
Wisdom from Earlier Comments
Ryan Watkins, researcher and industry guru, wrote:
I would add to this argument that other companies are no more static than our own — thus if we implement in September 2011 what they are doing in March 2011 from our benchmarking study, then we are still behind the competition. They are continually changing and benchmarking will rarely help you get ahead. Just think of all the companies that tried to benchmark the iPod, only to later learn that Apple had moved on to the iPhone while the others were trying to “benchmark” what they were doing with the iPod. The competition may have made some money, but Apple continues to win the major market share.
Mike Kunkle, sales training and performance expert, wrote:
Having used benchmarking (carefully and prudently) with good success, I can’t agree with avoiding it, as your title suggests, but do agree with the majority of your cautions and your perspectives later in the post.
Nuance and context matter greatly, as do picking the right metrics to compare, and culture, which is harder to assess. 70/20/10 performance management somehow worked at GE under Welch’s leadership. I’ve seen it fail miserably at other companies and wouldn’t recommend it as a general approach to good people or performance management.
In the sales performance arena, at least, benchmarking against similar companies or competitors does provide real benefit, especially in decision-making about which solutions might yield the best improvement. Comparing your metrics to world-class competitors and calculating what it would mean to you to move in that direction, allows for focus and prioritization, in a sea of choices.
It becomes even more interesting when you can benchmark internally, though. I’ve always loved this series of examples by Sales Benchmark Index:
References
Alliger, G. M., Tannenbaum, S. I., Bennett, W., Jr., Traver, H., & Shotland, A. (1997). A meta-analysis of the relations among training criteria. Personnel Psychology, 50(2), 341–358.
Brown, P. C., Roediger, H. L., III, & McDaniel, M. A. (2014). Make It Stick: The Science of Successful Learning. Cambridge, MA: Belknap Press of Harvard University Press.
Kirschner, P. A., & van Merriënboer, J. J. G. (2013). Do learners really know best? Urban legends in education. Educational Psychologist, 48(3), 169–183.
Sitzmann, T., Brown, K. G., Casper, W. J., Ely, K., & Zimmerman, R. D. (2008). A review and meta-analysis of the nomological network of trainee reactions. Journal of Applied Psychology, 93(2), 280–295.