The learning-and-performance industry is deluged with instruments purporting to help people (1) work better in teams, (2) manage more effectively, (3) hire the right people, (4) promote the best people, and so on. Unfortunately, many of these instruments have validity, reliability, and magnitude-of-effect issues, despite being well received by respondents and by learning-and-performance professionals. For example, I will note problems with the Myers-Briggs Type Indicator (MBTI) below.

Such instruments include multi-rater 360-degree instruments, job-skills tests, knowledge tests, and personality inventories. This blog post is related specifically to personality inventories.

Personality instruments include the wildly popular Myers-Briggs Type Indicator (MBTI) and the DISC, plus all sorts of other tests indexed with colors, shapes, and other personality dimensions.

The thinking is that people’s personalities influence their actions, and their actions determine their workplace effectiveness. This makes sense intuitively, but in practice it has not always been easy to show that personality affects behavior. Early excitement about this possibility in the mid-1900s (roughly 1930 to 1960) gave way to skepticism, only rebounding into favor in the 1990s as new research found evidence that personality tests could predict job performance. For a good historical overview see John and Srivastava (1999, link in reference section below).

Recent research has generally found that personality inventories are related to job performance, though the relationships may be modest and not always consistent. Barrick and Mount (1991) did a meta-analysis looking at many aspects of job performance and found personality to be a factor. Zhao and Seibert (2006) found that the Five-Factor personality dimensions were related to entrepreneurial status. Clarke and Robertson (2005) found that personality was related to workplace and non-workplace accidents. Barrick, Mount, and Judge (2001) examined 15 different meta-analyses and concluded that personality and performance were linked.

But this research needs to be understood with some perspective. As Hurtz and Donovan (2000) and others have pointed out, the relationship between the five-factor personality inventories and job performance can be somewhat limited. In other words, just because a person scores a certain way doesn’t necessarily mean that they will act a certain way; while there is a slight tendency in the predicted direction, it often is only a slight tendency. Hurtz and Donovan worry further that when other indicators are used (e.g., previous job experience, interviews, etc.), personality measures may provide very little additional information. Moreover, they cite the worry that respondents can fake their responses on personality inventories (see also, Birkeland, Manson, Kisamore, Brannick, & Smith, 2006).
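To put “slight tendency” in concrete terms: squaring a validity correlation gives the proportion of job-performance variance a personality score accounts for. The sketch below uses illustrative round numbers in the general range these meta-analyses report, not figures quoted from any single study:

```python
# The proportion of job-performance variance "explained" by a personality
# score is the square of the validity correlation (r-squared).
# These validity values are illustrative, not quotes from any study.
validities = {
    "conscientiousness": 0.22,
    "emotional_stability": 0.15,
    "extraversion": 0.10,
}

for trait, r in validities.items():
    # r = .22 squares to roughly .05, i.e., about 5% of variance
    print(f"{trait}: r = {r:.2f}, variance explained = {r ** 2:.1%}")
```

Even a respectable validity around .22 accounts for less than 5% of the variance in job performance, which is why Hurtz and Donovan caution against relying on personality scores when stronger predictors are already in hand.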

It is particularly important to note that personality research is now almost all tied to the “Big-Five” or “Five-Factor” personality taxonomy. This taxonomy measures personality along five distinct scales: Openness, Conscientiousness, Extraversion, Agreeableness, and Emotional Stability. The Five-Factor taxonomy has been validated in many scientific studies (Digman, 1990; Hogan, Hogan, & Roberts, 1996) and is the most widely regarded of the many personality models, especially as it relates to workplace behaviors (see, for example, the Barrick, Mount, and Judge meta-analysis review cited above).

Other personality taxonomies have not fared as well. For example, the MBTI (Myers-Briggs) has been widely discredited by researchers. It is considered neither reliable nor valid. For example, see Pittenger’s (2005) caution about using the MBTI. The DISC has not been studied enough to be scientifically validated.

Years ago, I used the MBTI in leadership training to make the point that people are different and may bring different skills and needs to the table. While using such a diagnostic seemed helpful in making that point, today I would use other ways to get that message across or use instruments that are scientifically validated.

To Learn More about the Five-Factor Model of Personality

To Purchase/Use Instruments based on the Five-Factor Model


Research Citations

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.

Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and performance at the beginning of the new millennium: What do we know and where do we go next? International Journal of Selection and Assessment, 9, 9-30.

Birkeland, S. A., Manson, T. M., Kisamore, J. L., Brannick, M. T., & Smith, M. A. (2006). A meta-analytic investigation of job applicant faking on personality measures. International Journal of Selection and Assessment, 14, 317-335.

Clarke, S., & Robertson, I. T. (2005). A meta-analytic review of the Big Five personality factors and accident involvement in occupational and non-occupational settings. Journal of Occupational and Organizational Psychology, 78(3), 355-376.

Costa, P., & McCrae, R. (1992). NEO-PI-R and NEO-FFI professional manual. Odessa, FL: Psychological Assessment Resources.

Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41, 417-440.

Hogan, R., Hogan, J., & Roberts, B. W. (1996). Personality measurement and employment decisions: Questions and answers. American Psychologist, 51, 469-477.

John, O. P., & Srivastava, S. (1999). The Big Five trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin & O. P. John (Eds.), Handbook of personality: Theory and research (2nd ed., pp. 102-138). New York: Guilford Press.

Pittenger, D. J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57, 210-221.

Zhao, H., & Seibert, S. E. (2006). The Big Five personality dimensions and entrepreneurial status: A meta-analytical review. Journal of Applied Psychology, 91, 259-271.


Some Interesting Articles on Personality and the Workplace (and their abstracts)


Personality and Team Performance: A Meta-Analysis.

By Peeters, Miranda A. G.; Van Tuijl, Harrie F. J. M.; Rutte, Christel G.; Reymen, Isabelle M. M. J.
European Journal of Personality. Vol 20(5), Aug 2006, 377-396.

Using a meta-analytical procedure, the relationship between team composition in terms of the Big-Five personality traits (trait elevation and variability) and team performance was researched. The number of teams upon which analyses were performed ranged from 106 to 527. For the total sample, significant effects were found for elevation in agreeableness (p = 0.24) and conscientiousness (p = 0.20), and for variability in agreeableness (p = -0.12) and conscientiousness (p = -0.24). Moderation by type of team was tested for professional teams versus student teams. Moderation results for agreeableness and conscientiousness were in line with the total sample results. However, student and professional teams differed in effects for emotional stability and openness to experience. Based on these results, suggestions for future team composition research are presented.


An examination of the role of personality in work accidents using meta-analysis.

By Clarke, Sharon; Robertson, Ivan
Applied Psychology: An International Review. Vol 57(1), Jan 2008, 94-108.

Personality has been studied as a predictor variable in a range of occupational settings. The study reported is based on a systematic search and meta-analysis of the literature, using the “Big Five” personality framework. The results indicated that there was substantial variability in the effect of personality on workplace accidents, with evidence of situational moderators operating in most cases. However, one aspect of personality, low agreeableness, was found to be a valid and generalisable predictor of involvement in work accidents. The implications of the findings for future research are discussed. Although meta-analysis can be used to provide definite estimates of effect sizes, the limitations of such an approach are also considered.


Personality and Transformational and Transactional Leadership: A Meta-Analysis.

By Bono, Joyce E.; Judge, Timothy A.
Journal of Applied Psychology. Vol 89(5), Oct 2004, 901-910.

This study was a meta-analysis of the relationship between personality and ratings of transformational and transactional leadership behaviors. Using the 5-factor model of personality as an organizing framework, the authors accumulated 384 correlations from 26 independent studies. Personality traits were related to 3 dimensions of transformational leadership–idealized influence-inspirational motivation (charisma), intellectual stimulation, and individualized consideration–and 3 dimensions of transactional leadership–contingent reward, management by exception-active, and passive leadership. Extraversion was the strongest and most consistent correlate of transformational leadership. Although results provided some support for the dispositional basis of transformational leadership–especially with respect to the charisma dimension–generally, weak associations suggested the importance of future research to focus on both narrower personality traits and nondispositional determinants of transformational and transactional leadership.

The Big Five personality dimensions and entrepreneurial status: A meta-analytical review.

By Zhao, Hao; Seibert, Scott E.
Journal of Applied Psychology. Vol 91(2), Mar 2006, 259-271.

In this study, the authors used meta-analytical techniques to examine the relationship between personality and entrepreneurial status. Personality variables used in previous studies were categorized according to the five-factor model of personality. Results indicate significant differences between entrepreneurs and managers on 4 personality dimensions such that entrepreneurs scored higher on Conscientiousness and Openness to Experience and lower on Neuroticism and Agreeableness. No difference was found for Extraversion. Effect sizes for each personality dimension were small, although the multivariate relationship for the full set of personality variables was moderate (R = .37). Considerable heterogeneity existed for all of the personality variables except Agreeableness, suggesting that future research should explore possible moderators of the personality-entrepreneurial status relationship.


Predicting job performance using FFM and non-FFM personality measures.

By Salgado, Jesús F.
Journal of Occupational and Organizational Psychology. Vol 76(3), Sep 2003, 323-346.

This study compares the criterion validity of the Big Five personality dimensions when assessed using Five-Factor Model (FFM)-based inventories and non-FFM-based inventories. A large database consisting of American as well as European validity studies was meta-analysed. The results showed that for conscientiousness and emotional stability, the FFM-based inventories had greater criterion validity than the non-FFM-based inventories. Conscientiousness showed an operational validity of .28 (N=19,460, 90% CV=.07) for FFM-based inventories and .18 (N=5,874, 90% CV=-.04) for non-FFM inventories. Emotional stability showed an operational validity of .16 (N=10,786, 90% CV=.04) versus .05 (N=4,541, 90% CV=-.05) for FFM and non-FFM-based inventories, respectively. No relevant differences emerged for extraversion, openness, and agreeableness. From a practical point of view, these findings suggest that practitioners should use inventories based on the FFM in order to make personnel selection decisions.

A Meta-Analytic Investigation of Job Applicant Faking on Personality Measures.

By Birkeland, Scott A.; Manson, Todd M.; Kisamore, Jennifer L.; Brannick, Michael T.; Smith, Mark A.
International Journal of Selection and Assessment. Vol 14(4), Dec 2006, 317-335.

This study investigates the extent to which job applicants fake their responses on personality tests. Thirty-three studies that compared job applicant and non-applicant personality scale scores were meta-analyzed. Across all job types, applicants scored significantly higher than non-applicants on extraversion (d = .11), emotional stability (d = .44), conscientiousness (d = .45), and openness (d = .13). For certain jobs (e.g., sales), however, the rank ordering of mean differences changed substantially, suggesting that job applicants distort responses on personality dimensions that are viewed as particularly job relevant. Smaller mean differences were found in this study than those reported by Viswesvaran and Ones (Educational and Psychological Measurement, 59(2), 197-210), who compared scores for induced ‘fake-good’ vs. honest response conditions. Also, direct Big Five measures produced substantially larger differences than did indirect Big Five measures.


A meta-analytic review of the Big Five personality factors and accident involvement in occupational and non-occupational settings.

By Clarke, Sharon; Robertson, Ivan T.
Journal of Occupational and Organizational Psychology. Vol 78(3), Sep 2005, 355-376.

Although a number of studies have examined individual personality traits and their influence on accident involvement, consistent evidence of a predictive relationship is lacking due to contradictory findings. The current study reports a meta-analysis of the relationship between accident involvement and the Big Five personality dimensions (extraversion, neuroticism, conscientiousness, agreeableness, and openness). Low conscientiousness and low agreeableness were found to be valid and generalizable predictors of accident involvement, with corrected mean validities of .27 and .26, respectively. The context of the accident acts as a moderator in the personality-accident relationship, with different personality dimensions associated with occupational and non-occupational accidents. Extraversion was found to be a valid and generalizable predictor of traffic accidents, but not occupational accidents. Avenues for further research are highlighted and discussed.

Big Five personality predictors of post-secondary academic performance.

By O’Connor, Melissa C.; Paunonen, Sampo V.
Personality and Individual Differences. Vol 43(5), Oct 2007, 971-990.

We reviewed the recent empirical literature on the relations between the Big Five personality dimensions and post-secondary academic achievement, and found some consistent results. A meta-analysis showed Conscientiousness, in particular, to be most strongly and consistently associated with academic success. In addition, Openness to Experience was sometimes positively associated with scholastic achievement, whereas Extraversion was sometimes negatively related to the same criterion, although the empirical evidence regarding these latter two dimensions was somewhat mixed. Importantly, the literature indicates that the narrow personality traits or facets presumed to underlie the broad Big Five personality factors are generally stronger predictors of academic performance than are the Big Five personality factors themselves. Furthermore, personality predictors can account for variance in academic performance beyond that accounted for by measures of cognitive ability. A template for future research on this topic is proposed, which aims to improve the prediction of scholastic achievement by overcoming identifiable and easily correctable limitations of past studies.


Gender differences in personality traits across cultures: Robust and surprising findings.

By Costa Jr., Paul; Terracciano, Antonio; McCrae, Robert R.
Journal of Personality and Social Psychology. Vol 81(2), Aug 2001, 322-331.

Secondary analyses of Revised NEO Personality Inventory data from 26 cultures (N = 23,031) suggest that gender differences are small relative to individual variation within genders; differences are replicated across cultures for both college-age and adult samples, and differences are broadly consistent with gender stereotypes: Women reported themselves to be higher in Neuroticism, Agreeableness, Warmth, and Openness to Feelings, whereas men were higher in Assertiveness and Openness to Ideas. Contrary to predictions from evolutionary theory, the magnitude of gender differences varied across cultures. Contrary to predictions from the social role model, gender differences were most pronounced in European and American cultures in which traditional sex roles are minimized. Possible explanations for this surprising finding are discussed, including the attribution of masculine and feminine behaviors to roles rather than traits in traditional cultures.


Five-factor model of personality and job satisfaction: A meta-analysis.

By Judge, Timothy A.; Heller, Daniel; Mount, Michael K.
Journal of Applied Psychology. Vol 87(3), Jun 2002, 530-541.

This study reports results of a meta-analysis linking traits from the 5-factor model of personality to overall job satisfaction. Using the model as an organizing framework, 334 correlations from 163 independent samples were classified according to the model. The estimated true score correlations with job satisfaction were -.29 for Neuroticism, .25 for Extraversion, .02 for Openness to Experience, .17 for Agreeableness, and .26 for Conscientiousness. Results further indicated that only the relations of Neuroticism and Extraversion with job satisfaction generalized across studies. As a set, the Big Five traits had a multiple correlation of .41 with job satisfaction, indicating support for the validity of the dispositional source of job satisfaction when traits are organized according to the 5-factor model.


Relationship of personality to performance motivation: A meta-analytic review.

By Judge, Timothy A.; Ilies, Remus
Journal of Applied Psychology. Vol 87(4), Aug 2002, 797-807.

This article provides a meta-analysis of the relationship between the 5-factor model of personality and 3 central theories of performance motivation (goal-setting, expectancy, and self-efficacy motivation). The quantitative review includes 150 correlations from 65 studies. Traits were organized according to the 5-factor model of personality. Results indicated that Neuroticism (average validity=-.31) and Conscientiousness (average validity=.24) were the strongest and most consistent correlates of performance motivation across the 3 theoretical perspectives. Results further indicated that the validity of 3 of the Big Five traits–Neuroticism, Extraversion, and Conscientiousness–generalized across studies. As a set, the Big 5 traits had an average multiple correlation of .49 with the motivational criteria, suggesting that the Big 5 traits are an important source of performance motivation.


Temperament and personality in dogs (Canis familiaris): A review and evaluation of past research.

By Jones, Amanda C.; Gosling, Samuel D.
Applied Animal Behaviour Science. Vol 95(1-2), Nov 2005, 1-53.

Spurred by theoretical and applied goals, the study of dog temperament has begun to garner considerable research attention. The researchers studying temperament in dogs come from varied backgrounds, bringing with them diverse perspectives, and publishing in a broad range of journals. This paper reviews and evaluates the disparate work on canine temperament. We begin by summarizing general trends in research on canine temperament. To identify specific patterns, we propose several frameworks for organizing the literature based on the methods of assessment, the breeds examined, the purpose of the studies, the age at which the dogs were tested, the breeding and rearing environment, and the sexual status of the dogs. Next, an expert-sorting study shows that the enormous number of temperament traits examined can be usefully classified into seven broad dimensions. Meta-analyses of the findings pertaining to inter-rater agreement, test-retest reliability, internal consistency, and convergent validity generally support the reliability and validity of canine temperament tests but more studies are needed to support these preliminary findings. Studies examining discriminant validity are needed, as preliminary findings on discriminant validity are mixed. We close by drawing 18 conclusions about the field, identifying the major theoretical and empirical questions that remain to be addressed.


Will’s Note: I included this last one because it amused me that searching for “personality” one might find a research review on dog personality—and to keep all this research stuff in perspective.

Latest Update (October 2019): Here is the newest version, which I use in my presentations.

Update: Check out The Learning Landscape Video by clicking here.

Original Post Is Below:

This is the first draft of a chapter from my forthcoming book (don’t ask when, it’s a labor of love), modified somewhat to avoid references to other parts of the book (to make it whole here). Your comments, criticisms, and good ideas will be gratefully explored.

The Genesis of the Model

It’s helpful to have an overall understanding of what we’re trying to do in the learning-and-performance profession. I offer the following model as a way to frame our discussion, as a way to provide a deep conceptual map of our world—at least the world to which we should be aspiring.

I’ve been in the learning-and-performance field for almost a quarter century. I’ve been an instructional designer, a trainer, a university adjunct, a simulation architect, a project manager, a business leader, a researcher, and a consultant. And yet, even recently, I have found myself wanting to build a better model of what we do. In the book, I’m going to offer several models that provide a good starting place for deep understanding. The first of these models I call The Learning Landscape. I’ve been gradually building this model for years, and I recently added some additional complexity that completes the picture of what we do—or what we should aim to do.

Of course, all models are simplifications in the interest of understanding and usability. Early versions of The Learning Landscape have resonated with my clients, and I think this latest version provides additional value.

I’m going to unveil the model a piece at a time, adding complexity as this blog post progresses.

The Model’s Phases

Look at the bottom of the following diagram. You’ll notice three labels there. The learning landscape I’m describing is one in which we build a learning intervention to help our learners perform in their future performance situations (for example, on the job) in an attempt to create certain beneficial learning outcomes. So, for example, if we build a course to teach creative-thinking skills, we do it to help learners be more creative in their jobs and produce more innovations for their organizations.


From Learning Intervention to Learning Outcomes 

Look at the diagram below. It shows how a learning intervention creates performance that leads to results. In the learning intervention (box A), the learners learn—they build an understanding of the learning content. Later, in the performance situation, the learner retrieves from memory the information that they learned (box C). They also apply what they learned (box E). This successful retrieval and application enable the learner to get from the learning what they hoped to get (box F), and the organization gets the learning results it wanted (box G) from the investment it made.

The diagram above shows the minimum requirements for a successful learning intervention. The learners have to learn (box A), retrieve (box C), and apply (box E) what they’ve learned in order to create beneficial learning outcomes.

It would be pollyannaish for us to believe that this process always works as diagrammed above. When our learners fail to learn (box A), the whole process breaks down. You can’t retrieve what you never learned and you can’t apply what you can’t retrieve. Even when our learners fully learn a topic, at a later time they may fail to retrieve what they learned (box C). People forget information. They may forget something permanently or they may suffer temporary or contextually-induced forgetting. Learners can also learn and retrieve successfully, but not apply what they’ve learned (box E). There are many reasons that learners fail to apply what they are able to retrieve. The learning intervention might not have been sufficiently motivating to prompt the learners to apply what they learned. The incentives in the performance situation may discourage application. The learners may not have time, resources, or other competencies that enable them to be successful.

The diagram above is very helpful in understanding how our learning interventions create learning results. It highlights the obvious importance of the quality of the learning intervention itself. It also highlights the criticality of retrieval. Later we will explore in-depth how we can build learning interventions to specifically support retrieval. Finally, the diagram above highlights the importance of the performance context in supporting learners in applying what they learned. Later we will talk about how we can gain more influence in the performance situation. We’ll also discuss how to enroll learners’ managers to improve the likelihood of successful application.

Adding Other Working-Memory Processing (besides retrieval of what was learned)

The above diagram is missing a few key elements. While it nicely highlights the importance of creating retrieval, it doesn’t account for other working-memory triggers. In the diagram below, I’ve added another box (box D “Learner Responds”) to represent working-memory processing not directly related to what was previously learned in our original learning intervention.

While working-memory “retrieval” and “responding” are overlapping and often interdependent processes, I wanted to distinguish between the retrieval of information from the learning intervention and responding generated by other means, for example job aids, performance support, management prompting, and other guidance mechanisms.

While often box C and box D work in concert (as when a job aid provides guidance and also supports retrieval of what was previously learned), we need to remember that we can get our learners thinking productive thoughts without necessarily relying on course-like learning interventions.

To reiterate, there are two ways to trigger working-memory processing related to our learning efforts. First, retrieval cues can trigger our learners to remember what they’ve learned in the original learning intervention. Second, other triggers (e.g., performance-support tools) can also stimulate working-memory responding. As you probably know, sometimes it is more effective to rely on retrieval, while other times it is more effective to rely on performance support interventions. Sometimes it is better to utilize both in concert.

The Full Model 

We also should recognize that learners can learn in their performance situations. The full model below adds this performance-situation learning, what some people call informal learning.

Performance-situation learning (box B) belongs in the model because it is a powerful force in real-world performance. Our learners do a large part of their learning on the job. Whether they receive formal training or not, learners learn through their experiences at work, at play, just living their lives.

Performance-situation learning (box B) can support the learning-intervention learning (box A) by reinforcing what was originally learned, taking the learning deeper, determining real-world contingencies, and creating fluency in retrieval, among other things. One thing we need to realize as learning professionals is that we don’t necessarily have to get learners up to speed completely in our formal learning interventions—in fact it is difficult to do so. Instead of trying to cram every bit of information into our training programs, we would be far better off to design our programs with an eye to what can be learned on the job and a plan for how to support that later on-the-job learning.

From the opposite direction, learning-intervention learning (box A) can be designed specifically to help learners learn in their performance situations. We’ll talk more about this later, but briefly, we can help our learners learn by helping them notice cues they might not readily notice, by providing them with relevant mental models of how the world works, and by influencing the performance situation itself—for example, by providing reminding mechanisms and getting learners’ managers involved. To highlight this point again, box A learning can influence and support box B learning, and box B learning can reinforce and extend box A learning.

The Learning Landscape Summary

All models and all metaphors have limitations, even if they are brilliantly clear in simplifying complex realities into workable conceptual maps (think E = mc²). I’ve been working on this learning landscape model for years and though it seems complete and potent to me now, I imagine that in the years to come I and others will find chinks in its armor or improvements that can be made.

I admit the possibility of the model’s limitations partly because it is true (the model is likely to have limitations) and partly to model useful thought processes in learning design. Too often, our instructional-design programs have taught models as gospel, unfortunately creating instructional designers who are only able to follow recipes—and who particularly (a) are unable to be creative when unusual situations confront them, (b) are unable to create new models of increasing or context-specific usefulness, and (c) are unable or unwilling to listen and learn from the wisdom of other people and other disciplines.

The learning landscape model is intended to make the following points:

  1. Our ultimate goal should be to create beneficial learning outcomes.
  2. There are two aspects of learning outcomes—the fulfillment the learner gets from undertaking a learning effort and the learning results the organization gets from investing in learning.
  3. For formal learning interventions to produce their benefits, they must ultimately produce appropriate behaviors in some future performance situation (often a workplace on-the-job situation).
  4. For formal learning interventions to produce their benefits, they must support the learners in being able to retrieve what they’ve learned. This means specifically, that our learning interventions have to be designed to minimize forgetting and elicit spontaneous remembering.
  5. There are two ways to generate appropriate behaviors, retrieval of previously learned information and triggering of appropriate responding. These working-memory processes can work in concert. Triggering appropriate working-memory responding is an underutilized tool. We need to more aggressively look to utilize performance support tools, reminding mechanisms, and management oversight.
  6. Formal learning interventions are not the only means to produce appropriate behaviors.
  7. Learners do a lot of their learning in their performance situations. We ought to leverage that learning to reinforce and extend any formal learning that was utilized. We ought to design our formal learning interventions to improve our learners’ informal-learning opportunities.

The most important question that instructional designers can ask (what I call the Magic Question) is:

“What do learners need to be able to do, and in what situations do they need to do those things?”

While we might discount such a simple question as insignificant, the question brilliantly forces us to focus on our ultimate goals and helps us to align our learning interventions with the human learning system.

Too many of us design with a focus on topics, content, knowledge. This tendency pushes us, almost unconsciously, to create learning that is too boring, filled with too much information, and bereft of practice in realistic situations.

The Magic Question requires us to be relevant. For workplace learning, it focuses our thinking on learners’ future job situations. For education, it focuses our thinking on the real-world relevance of our academic topics.

The Magic Question in Practice

In practice, the Magic Question forces us to begin our instructional-design efforts not only by creating a list of instructional objectives, but also by creating a list of performance situations. For example, if we’re creating leadership training, we not only need to compile objectives like, “For most decisions, it can be helpful to bring your direct reports into decision-making, so as to increase the likelihood that they will bring energy and passion to implementing decisions.” We also need to compile a list of situations where this objective is relevant—for example, weekly staff meetings, project meetings, one-on-one face-to-face conversations, and phone conversations. And we need to note the boundaries: the objective applies to general decision making, but not to situations where time is urgent, where safety is an issue, or where legal ramifications are evident.

By framing our instructional-design projects in this way, we get to think about our learning designs in ways that are much more action-oriented, relevant, and practical. The framing makes it more likely that we will align our learning and performance contexts, making it more likely that our learners, in their future situations, will spontaneously remember what we’ve taught them. The framing makes it more likely that we will focus on practice instead of overloading our learners with information. The framing also makes it more likely that we will utilize relevant scenarios that more fully engage our learners. Finally, using the Magic Question forces our SMEs (subject-matter experts) to reformulate their expertise into potent, practical packages of relevant material. It’s not always easy to bend SMEs to this discipline, but after the pain, they’ll thank you profusely as together you push their content to a much higher level.

Obviously, there is more to be said about how the Magic Question can be integrated into learning-design efforts. On the other hand, as my clients have reported, the Magic Question has within it a simple power to (1) change the way we think about instructional design, and (2) transform the learning interventions we build.

I’ve just completed a new research-to-practice white paper. As far as I can tell, it is the first work on learning measurement (assessment and evaluation) that actually takes human learning into consideration. I’d like to thank Questionmark for agreeing to support this work.


Words from the paper’s introduction:

In writing this report on using fundamental learning research to inform assessment design, I am combining two of my passions—learning and the measurement of learning. As an experienced learner and learning designer, I have come to the belief that those of us responsible for designing, developing, and delivering learning interventions are often left in the dark about our own successes and failures. The measurement techniques we use simply do not provide us with valid feedback about our own performances.

The traditional model of assessment utilizes end-of-learning assessments provided to learners in the context in which they learned. This model is seriously flawed, especially in failing to give us an idea of how well our learning interventions are doing in preparing our learners to retrieve information in future situations—the ultimate goal of training and education. By failing to measure our performance in this regard, we are missing opportunities to provide ourselves with valid feedback. We are also likely failing our institutions and our learners because we are not able to create a practice of continuous improvement to maximize our learning outcomes.

This report is designed to help you improve your assessments in this regard. I certainly won’t claim to have all the answers, nor do I think it is easy to create the perfect assessment, but I do believe very strongly that all of us can improve our assessments substantially, and by so doing improve the practice of education and training.

Click here to access the report.


I will give $1000 (US dollars) to the first person or group who can prove that taking learning styles into account in designing instruction can produce meaningful learning benefits.

I’ve been suspicious about the learning-styles bandwagon for many years. The learning-style argument has gone something like this: If instructional designers know the learning style of their learners, they can develop material specifically to help those learners, and such extra efforts are worth the trouble.

I have my doubts, but am open to being proven wrong.

Here are the criteria for my Learning-Styles Instructional-Design Challenge:

  1. The learning program must diagnose learners’ learning styles. It must then provide different learning materials/experiences to those who have different styles.
  2. The learning program must be compared against a similar program that does not differentiate the material based on learning styles.
  3. The programs must be of similar quality and provide similar information. The only thing that should vary is the learning-styles manipulation.
  4. The comparison between the two versions (the learning-style version and the non-learning-style version) must be fair, valid, and reliable. At least 70 learners must be randomly assigned to the two groups (with at least 35 in each group completing the experience). The two programs must have approximately the same running time. For example, the time required by the learning-style program to diagnose learning styles can be used by the non-learning-styles program to deliver learning. The median learning time for the programs must be no shorter than 25 minutes.
  5. Learners must be adults involved in a formal workplace training program delivered through a computer program (e-learning or CBT) without a live instructor. This requirement is to ensure the reproducibility of the effects, as instructor-led training cannot be precisely reproduced.
  6. The learning-style program must be created in an instructional-development shop that is dedicated to creating learning programs for real-world use. Programs developed only for research purposes are excluded. My claim is that real-world instructional design is unlikely to be able to utilize learning styles to create learning gains.
  7. The results must be assessed in a manner that is relatively authentic. At a minimum, learners should be asked to make scenario-based decisions or perform activities that simulate the real-world performance the program teaches them to accomplish. Assessments that only ask for information at the knowledge level (e.g., definitions, terminology, labels) are NOT acceptable. The final assessment must be delayed at least one week after the end of the training. The same final assessment must be used for both groups, and it must fairly assess the whole learning experience.
  8. The magnitude of the difference in results between the learning-style program and the non-learning-style program must be at least 10%. (In other words, the non-learning-styles average subtracted from the learning-styles average must be at least 10% of the non-learning-styles average.) So, for example, if the non-learning-styles average is 50, the learning-styles average must be 55 or more. This magnitude requirement ensures that the learning-styles program produces meaningful benefits. 10% is not too much to ask.
  9. The results must be statistically significant at the p<.05 level. Appropriate statistical procedures must be used to gauge the reliability of the results. Cohen’s d effect size should be equal to .4 or more (a small to medium effect size according to Cohen, 1992).
  10. The learning-style program cannot cost more than twice as much as the non-learning-style program to develop, nor can it take more than twice as long to develop. I want to be generous here.
  11. The results must be documented by unbiased parties.
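The statistical criteria above (items 8 and 9) can be checked mechanically once you have each group’s raw assessment scores. Here is a minimal sketch in Python; the function name and structure are my own illustration, not part of the challenge. It computes the percentage gain over the non-learning-styles group, Cohen’s d from the pooled standard deviation, and a two-sample (pooled-variance) t statistic compared against roughly 1.995, the two-tailed .05 critical value for about 68 degrees of freedom, which corresponds to the minimum group sizes above. A real submission would of course use a full statistical package and report exact p-values.

```python
import math

def evaluate_challenge(control, treatment):
    """Check results against criteria 8-9 of the challenge:
    at least a 10% mean improvement over the control (non-learning-styles)
    group, Cohen's d >= 0.4, and a pooled two-sample t statistic beyond
    ~1.995 (approximate two-tailed .05 critical value for df ~ 68)."""
    n1, n2 = len(control), len(treatment)
    m1 = sum(control) / n1
    m2 = sum(treatment) / n2
    # Sample variances (Bessel-corrected)
    v1 = sum((x - m1) ** 2 for x in control) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in treatment) / (n2 - 1)
    # Pooled standard deviation, used for both Cohen's d and the t statistic
    sd_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    pct_gain = (m2 - m1) / m1 * 100          # criterion 8
    d = (m2 - m1) / sd_pooled                # criterion 9, effect size
    t = (m2 - m1) / (sd_pooled * math.sqrt(1 / n1 + 1 / n2))
    return {
        "pct_gain": pct_gain,
        "cohens_d": d,
        "t": t,
        "meets_criteria": pct_gain >= 10 and d >= 0.4 and abs(t) > 1.995,
    }

# Illustrative (fabricated) scores: 35 learners per group, control mean 50,
# treatment mean 57 -- a 14% gain.
control = [40, 45, 50, 55, 60] * 7
treatment = [47, 52, 57, 62, 67] * 7
result = evaluate_challenge(control, treatment)
```

Note that passing these two numeric screens is necessary but not sufficient: the design criteria (random assignment, matched running time, delayed authentic assessment, independent documentation) still have to hold.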

To reiterate, the challenge is this:

Can an e-learning program that utilizes learning-style information outperform an e-learning program that doesn’t utilize such information by 10% or more on a realistic test of learning, even if it is allowed to cost up to twice as much to build?

$1,000 says it just doesn’t happen in the real world of instructional design. $1,000 says we ought to stop wasting millions trying to cater to this phantom curse.

Publication Note

This article was originally published on the Work-Learning Research website in 2002. It may have had some minor changes since then. It was moved to my WillAtWorkLearning Blog in 2006, and has now been moved here in late 2017.

Updated Research

Even after more than a decade, this blog post still provides valuable information explaining the issues — and the ramifications for learning. However, further research has uncovered additional information and has been published in a scientific journal in 2014. You can read a review of that research here.


People do NOT remember 10% of what they read, 20% of what they see, 30% of what they hear, etc. That information, and similar pronouncements, are fraudulent. Moreover, general statements on the effectiveness of learning methods are not credible—learning results depend on too many variables to enable such precision. Unfortunately, this bogus information has been floating around our field for decades, crafted by many different authors and presented in many different configurations, including bastardizations of Dale’s Cone. The rest of this article offers more detail.

My Search For Knowledge

My investigation of this issue began when I came across the following graph:

The Graph is a Fraud!

After reading the cited article several times and not seeing the graph—nor the numbers on the graph—I got suspicious and got in touch with the first author of the cited study, Dr. Michelene Chi of the University of Pittsburgh (who is, by the way, one of the world’s leading authorities on expertise). She said this about the graph:

“I don’t recognize this graph at all. So the citation is definitely wrong; since it’s not my graph.”

What makes this particularly disturbing is that this graph has popped up all over our industry, and many instructional-design decisions have been based on the information contained in the graph.

Bogus Information is Widespread

I often begin my workshops on instructional design and e-learning and my conference presentations with this graph as a warning and wake-up call. Typically, over 90% of the audience raises their hands when I ask whether anyone has seen the numbers depicted in the graph. Later I often hear audible gasps and nervous giggles as the information is debunked. Clearly, lots of experienced professionals in our field know this graph and have used it to guide their decision making.

The graph is representative of a larger problem. The numbers presented on the graph have been circulating in our industry since the late 1960’s, and they have no research backing whatsoever. Dr. JC Kinnamon (2002) of Midi, Inc., searched the web and found dozens of references to those dubious numbers in college courses, research reports, and in vendor and consultant promotional materials.

Where the Numbers Came From

The bogus percentages were first published by an employee of Mobil Oil Company in 1967, writing in the magazine Film and Audio-Visual Communications. D. G. Treichler didn’t cite any research, but our field has unfortunately accepted his/her percentages ever since. NTL Institute still claims that they did the research that derived the numbers. See my response to NTL.

Michael Molenda, a professor at Indiana University, is currently working to track down the origination of the bogus numbers. His efforts have uncovered some evidence that the numbers may have been developed as early as the 1940’s by Paul John Phillips who worked at University of Texas at Austin and who developed training classes for the petroleum industry. During World War Two Phillips taught Visual Aids at the U. S. Army’s Ordnance School at the Aberdeen (Maryland) Proving Grounds, where the numbers have also appeared and where they may have been developed.

Strange coincidence: I was born on these very same Aberdeen Proving Grounds.

Ernie Rothkopf, professor emeritus of Columbia University, one of the world’s leading applied research psychologists on learning, reported to me that the bogus percentages have been widely discredited, yet they keep rearing their ugly head in one form or another every few years.

Many people now associate the bogus percentages with Dale’s “Cone of Experience,” developed in 1946 by Edgar Dale. It provided an intuitive model of the concreteness of various audio-visual media. Dale included no numbers in his model and there was no research used to generate it. In fact, Dale warned his readers not to take the model too literally. Dale’s Cone, copied without changes from the 3rd and final edition of his book, is presented below:

Dale’s Cone of Experience (Dale, 1969, p. 107)

You can see that Dale used no numbers with his cone. Somewhere along the way, someone unnaturally fused Dale’s Cone and Treichler’s dubious percentages. One common example is represented below.

The source cited in the diagram above by Wiman and Meierhenry (1969) is a book of edited chapters. Though two of the chapters (Harrison, 1969; Stewart, 1969) mention Dale’s Cone of Experience, neither of them includes the percentages. In other words, the diagram above is citing a book that does not include the diagram and does not include the percentages indicated in the diagram.

Here are some more examples:



The “Evidence” Changes to Meet the Need of the Deceiver

The percentages, and the graph in particular, have been passed around in our field from reputable person to reputable person. The people who originally created the fabrications are to blame for getting this started, but there are clearly many people willing to bend the information to their own purposes. Kinnamon’s (2002) investigation found that Treichler’s percentages have been modified in many ways, depending on the message the shyster wants to send. Some people have changed the relative percentages. Some have improved Treichler’s grammar. Some have added categories to make their point. For example, one version of these numbers says that people remember 95% of the information they teach to others.

People have not only cited Treichler, Chi, Wiman and Meierhenry for the percentages, but have also incorrectly cited William Glasser, and correctly cited a number of other people who have utilized Treichler’s numbers.

It seems clear from some of the fraudulent citations that deception was intended. On the graph that prompted our investigation, the title of the article had been modified from the original to get rid of the word “students.” The creator of the graph must have known that the term “students” would make people in the training / development / performance field suspicious that the research was done on children. The creator of the Wiman and Meierhenry diagram did four things that make it difficult to track down the original source: (1) the book they cited is fairly obscure, (2) one of the authors’ names is spelled wrong, (3) the year of publication is incorrect, and (4) the name Charles Merrill, which was actually a publishing house, was ambiguously presented so that it might have referred to an author or editor.

But Don’t The Numbers Speak The Truth?

The numbers are not credible, and even if they made sense, they’d still be dangerous.

If we look at the numbers a little more closely, they are highly unconvincing. How did someone compare “reading” and “seeing?” Don’t you have to “see” to “read?” What does “collaboration” mean anyway? Were two people talking about the information they were learning? If so, weren’t they “hearing” what the other person had to say? What does “doing” mean? How much were they “doing” it? Were they “doing” it correctly, or did they get feedback? If they were getting feedback, how do we know the learning didn’t come from the feedback—not the “doing?” Do we really believe that people learn more “hearing” a lecture, than “reading” the same material? Don’t people who “read” have an advantage in being able to pace themselves and revisit material they don’t understand? And how did the research produce numbers that are all multiples of ten? Doesn’t this suggest some sort of review of the literature? If so, shouldn’t we know how the research review was conducted? Shouldn’t we get a clear and traceable citation for such a review?

Even the idea that you can compare these types of learning methods is ridiculous. As any good research psychologist knows, the measurement situation affects the learning outcome. If we have a person learn foreign-language vocabulary by listening to an audiotape and vocalizing their responses, it doesn’t make sense to test them by having them write down their answers. We’d have a poor measure of their ability to verbalize vocabulary. The opposite is also nonsensical. People who learn vocabulary by seeing it on the written page cannot be fairly evaluated by asking them to say the words aloud. It’s not fair to compare these different methods by using the same test, because the choice of test will bias the outcome toward the learning situation that is most like the test situation.

But why not compare one type of test to another—for example, if we want to compare vocabulary learning through hearing and seeing, why don’t we use an oral test and written one? This doesn’t help either. It’s really impossible to compare two things on different indices. Can you imagine comparing the best boxer with the best golfer by having the boxer punch a heavy bag and having the golfer hit for distance? Would Muhammad Ali punching with 600 pounds of pressure beat Tiger Woods hitting his drives 320 yards off the tee?

The Importance of Listing Citations

Even if the numbers presented on the graph had been published in a refereed journal—research we were reasonably sure we could trust—it would still be dangerous not to know where they came from. Research conclusions have a way of morphing over time. Wasn’t it true ten years ago that all fat was bad? Newer research has revealed that monounsaturated oils like olive oil might actually be good for us. If a person doesn’t cite their sources, we might not realize that their conclusions are outdated or simply based on poor research. Conversely, we may also lose access to good sources of information. Suppose Treichler had really discovered a valid source of information? Because he/she did not use citations, that research would remain forever hidden in obscurity.

The context of research makes a great deal of difference. If we don’t know a source, we don’t really know whether the research is relevant to our situation. For example, an article by Kulik and Kulik (1988) concluded that immediate feedback was better than delayed feedback. Most people in the field now accept their conclusions. Efforts by Work-Learning Research to examine Kulik and Kulik’s sources indicated that most of the articles they reviewed tested the learners within a few minutes after the learning event, a very unrealistic analog for most training situations. Their sources enabled us to examine their evidence and find it faulty.

Who Should We Blame?

The original shysters are not the only ones to blame. The fact that many people who have disseminated the graph used the same incorrect citation makes it clear that they never accessed the original study. Everyone who uses a citation to make a point (or draw a conclusion) ought to check the citation. That, of course, includes all of us who are consumers of this information.

What Does This Tell Us About Our Field?

It tells us that we may not be able to trust the information that floats around our industry. It tells us that even our most reputable people and organizations may require the Wizard-of-Oz treatment—we may need to look behind the curtain to verify their claims.

The Danger To Our Field

At Work-Learning Research, our goal is to provide research-based information that practitioners can trust. We began our research efforts several years ago when we noticed that the field jumps from one fad to another while at the same time holding religiously to ideas that would be better cast aside.

The fact that our field is so easily swayed by the mildest whiffs of evidence suggests that we don’t have sufficient mechanisms in place to improve what we do. Because we’re not able or willing to provide due diligence on evidence-based claims, we’re unable to create feedback loops to push the field more forcefully toward continuing improvement.

Isn’t it ironic? We’re supposed to be the learning experts, but because we too easily take things for granted, we find ourselves skipping down all manner of yellow-brick roads.

How to Improve the Situation

It will seem obvious, but each and every one of us must take responsibility for the information we transmit to ensure its integrity. More importantly, we must be actively skeptical of the information we receive. We ought to check the facts, investigate the evidence, and evaluate the research. Finally, we must continue our personal search for knowledge—for it is only with knowledge that we can validly evaluate the claims that we encounter.


Our Citations

Chi, M. T. H., Bassok, M., Lewis, M. W., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13, 145-182.

Dale, E. (1946, 1954, 1969). Audio-visual methods in teaching. New York: Dryden.

Harrison, R. (1969). Communication theory. In R. V. Wiman and W. C. Meierhenry (Eds.) Educational media: Theory into practice. Columbus, OH: Merrill.

Kinnamon, J. C. (2002). Personal communication, October 25.

Kulik, J. A., & Kulik, C-L. C. (1988). Timing of feedback and verbal learning. Review of Educational Research, 58, 79-97.

Molenda, M. H. (2003). Personal communications, February and March.

Rothkopf, E. Z. (2002). Personal communication, September 26.

Stewart, D. K. (1969). A learning-systems concept as applied to courses in education and training. In R. V. Wiman and W. C. Meierhenry (Eds.) Educational media: Theory into practice. Columbus, OH: Merrill.

Treichler, D. G. (1967). Are you missing the boat in training aids? Film and Audio-Visual Communication, 1, 14-16, 28-30, 48.

Wiman, R. V. & Meierhenry, W. C. (Eds.). (1969). Educational media: Theory into practice. Columbus, OH: Merrill.