Which is better, (1) computer-based animations with audio narration or (2) paper-based diagrams with text narratives?

Suppose further that the animations and diagrams were matched as closely as possible in the amount of visual information presented, and that the words in the narration and text were identical. In other words, the comparison was fair.

Suppose also that the content areas in the learning materials dealt with dynamic spatial causation, utilizing topics seemingly appropriate for dynamic graphical displays. Specifically, the topic areas included:

  • How lightning forms.
  • How a toilet works.
  • How ocean waves work.
  • How a car’s braking system works.

Suppose also that the tests queried learners on retention and transfer. In other words, to gauge their memory, learners were asked questions such as, “Please write down an explanation of how lightning works,” and to assess transfer, they were asked, “What could you do to decrease the intensity of a lightning storm?”

Under those conditions, which would produce the best learning on the questions asked?

A. Animations with audio narration.
B. Paper-based diagrams with text narratives.
C. Both would produce equal learning benefits.

Richard Mayer, Mary Hegarty, Sarah Mayer, and Julie Campbell (all of the University of California at Santa Barbara) created four experiments that attempted to answer this question. Given that each experiment had two comparisons (retention and transfer), they ended up with eight comparisons.

The results were clear. In not one case did the computer-based animations outperform the static paper-based depictions!! In four of the eight cases, the static diagrams outperformed the animations, and in the other four cases, the differences were not statistically significant. The animation conditions never outperformed the paper-based conditions.

The average percentage difference (for the paper-based depictions compared with the animation depictions) was 27%, with an average Cohen’s d effect size of 0.68, a moderately large difference.
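For readers who want to see how figures like these are typically derived, here is a minimal sketch in Python. It assumes the standard pooled-standard-deviation form of Cohen’s d and a simple percentage difference relative to the animation group’s mean; the paper may have computed its figures differently. The function names and the scores are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def percent_difference(group_a, group_b):
    """How much higher group_a's mean is than group_b's, as a percentage of group_b's mean."""
    return 100 * (mean(group_a) - mean(group_b)) / mean(group_b)


# Hypothetical test scores for illustration only (NOT data from the study).
paper_scores = [6, 7, 5, 8, 6, 7]
animation_scores = [5, 6, 4, 6, 5, 5]

print(f"Cohen's d: {cohens_d(paper_scores, animation_scores):.2f}")
print(f"Percentage difference: {percent_difference(paper_scores, animation_scores):.1f}%")
```

By Cohen’s conventional benchmarks, d values near 0.2, 0.5, and 0.8 are often described as small, medium, and large, which is why 0.68 reads as a moderately large difference.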

The Authors’ Explanations of these Remarkable Findings

For many of us, this result is non-intuitive. Why would paper-based diagrams outperform animations? Although the authors of the research paper make some conjectures, their experiments don’t really shed light on this question. The experiments simply compare animations to paper-based depictions.

The authors suggest that paper-based depictions may have outperformed the animation-based depictions because (described in detail on page 264):

  1. The paper-based depictions involve simultaneous presentation of the graphical illustrations, whereas the animation-based depictions presented the graphical content in a chronological flow with no simultaneity.
  2. The paper-based materials enable learner control through pacing and eye movements, whereas the animations do not.
  3. The paper-based materials are purposely segmented into meaningful units showing crucial states of the system, whereas the animation presents the diagrams in one continuous flow.
  4. The paper-based materials utilize printed words, whereas the animation condition uses audio narration.
  5. The paper-based materials are presented on paper, whereas the animation materials are presented on a computer screen.

In future experiments, these things will need to be varied to determine the actual cause of the differences. Specifically, it would be helpful for e-learning designers to know the relative effectiveness of animations that also show crucial states of the system and enable more learner control.

Other Caveats and Shortcomings

Skeptical instructional designers may wonder about the target audience. Could it be, for example, that these results aren’t relevant for young adults (those who have great experience in using computers)? This worry seems misplaced. The learners in these experiments were all young college students, with an average age of about 19. On the other hand, 82% of the learners were women, suggesting that the results may not apply to men.

I worry about the short retention interval. As in most of Mayer’s experiments, immediate tests of retention and transfer are used. In other words, the students encounter the learning material and then are immediately tested on it. This should make us wonder whether the differences between animation and static images would survive the vagaries of cognitive forgetting processes. It might be true, for example, that static images help for short retention intervals and animations help for longer (more realistic) retention intervals.

The experiments also use very short learning events: seven minutes or less in length, with some learning sessions lasting only a minute or two. This tends to limit the generalizability of the results. Real-world instructional designers are apt to question these results by noting that animations may energize learners to pay attention to e-learning courses that take, say, 30 minutes or more, whereas static graphics are less likely to produce this energizing effect. So while static graphics may work for five-minute snippets of learning, more authentic learning events may benefit from animations.

Despite these major limitations, the findings are compelling. They show, at the very least, that in micro-learning situations, animations may not be as obvious a choice as we might have believed.

The experimental results are also partly consistent with a recent review of the research literature, which found no difference in learning results between animations and paper-based depictions (Tversky, Morrison, & Betrancourt, 2002). Neither the current study nor the review of the literature found any advantage for animations.

Again, it could be that well-designed animations have a facilitative effect. On the other hand, it appears that more research is needed to uncover principles that outline effective animation design.

Will’s Recommendations for Instructional Designers/Developers:

  1. If possible, utilize evidence-based instructional-design practices to experiment with different animation designs (to see which work for your content, your learners, and your delivery methods). Specifically, compare static graphics to animations and compare different animation designs.
  2. As a first cut in designing animations, enable learners to control the movement from one crucial system state to the next.
  3. As a first cut in designing animations, utilize audio narration, but also provide a text version that can be read separately (not simultaneously).
  4. Consider utilizing the spacing effect by presenting both a dynamic animation and a later static depiction with simultaneous text presentation. The second depiction, because it enables studying, could be utilized with some augmenting questions or exercises to get the learners to think deeply about the dynamic flow of events. Also consider alternating between dynamic and static depictions or presenting the static one before the dynamic.

Citations:

Mayer, R. E.; Hegarty, M.; Mayer, S.; Campbell, J. (2005). When Static Media Promote Active Learning: Annotated Illustrations Versus Narrated Animations in Multimedia Instruction. Journal of Experimental Psychology: Applied, 11, 256-265.

Tversky, B.; Morrison, J. B.; Betrancourt, M. (2002). Animation: Can it facilitate? International Journal of Human-Computer Studies, 57, 247-262.

CPP, Inc., known formerly as Consulting Psychologists Press, announces that it is offering research grants for research on the Myers-Briggs Type Indicator.

This may seem commendable, but their research-grant program is biased. Here are the facts:

  1. CPP makes money by selling MBTI implementations, consulting, and paraphernalia.
  2. The MBTI (Myers-Briggs) is widely discredited by researchers. It is considered neither reliable nor valid. For example, see Pittenger, D. J. (2005). Cautionary Comments Regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57, 210-221.
  3. The research grant program is biased toward research findings that support the MBTI. Here are some details:
    • CPP, a biased party, selects the grantees.
    • One of the criteria for selection is “advancement of the MBTI assessment.”
    • Money is distributed only for research reports selected by CPP for the “Best Paper Awards.”
  4. Instead of these regrettable procedures, CPP should form a body of unbiased reviewers, have criteria that don’t push toward a confirmatory bias, distribute money for good proposals, not “favorable” results, and form an unbiased committee to select the best papers.

This Research Grant Program (as outlined in the publicly available materials produced by CPP) is clearly designed to produce results that support CPP’s financial interests and resurrect the flagging image of the MBTI. Statements in the proposal requiring researchers to “conform to the American Psychological Association’s Ethical Principles of Psychologists” do little to overcome the biases built into the program. As the materials make clear, the intention is to provide comfort to CPP’s clients. How else are we to interpret the following statement in CPP’s research-grant announcement?

“Abstracts from the papers will be used by CPP to communicate results with its customers.”

This type of biased research program is completely unacceptable. Not only does it have the potential to create biased information and lead to suboptimal or dangerous recommendations, but it also casts a shadow on fair-and-balanced research that might be used to guide learning-and-performance agendas.

If you’d like to share your thoughts with CPP, it appears that the person to write is available through this email address.

Publication Note

This article was originally published on the Work-Learning Research website (www.work-learning.com) in 2002. It may have had some minor changes since then. It was moved to my WillAtWorkLearning Blog in 2006, and has now been moved here in late 2017.

Updated Research

Even after more than a decade, this blog post still provides valuable information explaining the issues and their ramifications for learning. However, further research has uncovered additional information, which was published in a scientific journal in 2014. You can read a review of that research here.

Introduction

People do NOT remember 10% of what they read, 20% of what they see, 30% of what they hear, etc. That information, and similar pronouncements, are fraudulent. Moreover, general statements on the effectiveness of learning methods are not credible; learning results depend on too many variables to enable such precision. Unfortunately, this bogus information has been floating around our field for decades, crafted by many different authors and presented in many different configurations, including bastardizations of Dale’s Cone. The rest of this article offers more detail.

My Search For Knowledge

My investigation of this issue began when I came across the following graph:

The Graph is a Fraud!

After reading the cited article several times and not seeing the graph—nor the numbers on the graph—I got suspicious and got in touch with the first author of the cited study, Dr. Michelene Chi of the University of Pittsburgh (who is, by the way, one of the world’s leading authorities on expertise). She said this about the graph:

“I don’t recognize this graph at all. So the citation is definitely wrong; since it’s not my graph.”

What makes this particularly disturbing is that this graph has popped up all over our industry, and many instructional-design decisions have been based on the information contained in the graph.

Bogus Information is Widespread

I often begin my workshops on instructional design and e-learning and my conference presentations with this graph as a warning and wake-up call. Typically, over 90% of the audience raises their hands when I ask whether anyone has seen the numbers depicted in the graph. Later I often hear audible gasps and nervous giggles as the information is debunked. Clearly, lots of experienced professionals in our field know this graph and have used it to guide their decision making.

The graph is representative of a larger problem. The numbers presented on the graph have been circulating in our industry since the late 1960’s, and they have no research backing whatsoever. Dr. JC Kinnamon (2002) of Midi, Inc., searched the web and found dozens of references to those dubious numbers in college courses, research reports, and in vendor and consultant promotional materials.

Where the Numbers Came From

The bogus percentages were first published by an employee of Mobil Oil Company in 1967, writing in the magazine Film and Audio-Visual Communications. D. G. Treichler didn’t cite any research, but our field has unfortunately accepted his/her percentages ever since. NTL Institute still claims that they did the research that derived the numbers. See my response to NTL.

Michael Molenda, a professor at Indiana University, is currently working to track down the origin of the bogus numbers. His efforts have uncovered some evidence that the numbers may have been developed as early as the 1940’s by Paul John Phillips, who worked at the University of Texas at Austin and who developed training classes for the petroleum industry. During World War Two, Phillips taught Visual Aids at the U. S. Army’s Ordnance School at the Aberdeen (Maryland) Proving Grounds, where the numbers have also appeared and where they may have been developed.

Strange coincidence: I was born on these very same Aberdeen Proving Grounds.

Ernie Rothkopf, professor emeritus of Columbia University, one of the world’s leading applied research psychologists on learning, reported to me that the bogus percentages have been widely discredited, yet they keep rearing their ugly head in one form or another every few years.

Many people now associate the bogus percentages with Dale’s “Cone of Experience,” developed in 1946 by Edgar Dale. It provided an intuitive model of the concreteness of various audio-visual media. Dale included no numbers in his model and there was no research used to generate it. In fact, Dale warned his readers not to take the model too literally. Dale’s Cone, copied without changes from the 3rd and final edition of his book, is presented below:

Dale’s Cone of Experience (Dale, 1969, p. 107)

You can see that Dale used no numbers with his cone. Somewhere along the way, someone unnaturally fused Dale’s Cone and Treichler’s dubious percentages. One common example is represented below.

The source cited in the diagram above by Wiman and Meierhenry (1969) is a book of edited chapters. Though two of the chapters (Harrison, 1969; Stewart, 1969) mention Dale’s Cone of Experience, neither of them includes the percentages. In other words, the diagram above is citing a book that does not include the diagram and does not include the percentages indicated in the diagram.

Here are some more examples:

The “Evidence” Changes to Meet the Need of the Deceiver

The percentages, and the graph in particular, have been passed around in our field from reputable person to reputable person. The people who originally created the fabrications are to blame for getting this started, but there are clearly many people willing to bend the information to their own devices. Kinnamon’s (2002) investigation found that Treichler’s percentages have been modified in many ways, depending on the message the shyster wants to send. Some people have changed the relative percentages. Some have improved Treichler’s grammar. Some have added categories to make their point. For example, one version of these numbers says that people remember 95% of the information they teach to others.

People have not only cited Treichler, Chi, Wiman and Meierhenry for the percentages, but have also incorrectly cited William Glasser, and correctly cited a number of other people who have utilized Treichler’s numbers.

It seems clear from some of the fraudulent citations that deception was intended. On the graph that prompted our investigation, the title of the article had been modified from the original to get rid of the word “students.” The creator of the graph must have known that the term “students” would make people in the training / development / performance field suspicious that the research was done on children. The creator of the Wiman and Meierhenry diagram did four things that make it difficult to track down the original source: (1) the book they cited is fairly obscure, (2) one of the authors’ names is spelled wrong, (3) the year of publication is incorrect, and (4) the name Charles Merrill, which was actually a publishing house, was ambiguously presented so that it might have referred to an author or editor.

But Don’t The Numbers Speak The Truth?

The numbers are not credible, and even if they made sense, they’d still be dangerous.

If we look at the numbers a little more closely, they are highly unconvincing. How did someone compare “reading” and “seeing?” Don’t you have to “see” to “read?” What does “collaboration” mean anyway? Were two people talking about the information they were learning? If so, weren’t they “hearing” what the other person had to say? What does “doing” mean? How much were they “doing” it? Were they “doing” it correctly, or did they get feedback? If they were getting feedback, how do we know the learning didn’t come from the feedback rather than the “doing?” Do we really believe that people learn more “hearing” a lecture than “reading” the same material? Don’t people who “read” have an advantage in being able to pace themselves and revisit material they don’t understand? And how did the research produce numbers that are all multiples of ten? Doesn’t this suggest some sort of review of the literature? If so, shouldn’t we know how the research review was conducted? Shouldn’t we get a clear and traceable citation for such a review?

Even the idea that you can compare these types of learning methods is ridiculous. As any good research psychologist knows, the measurement situation affects the learning outcome. If we have a person learn foreign-language vocabulary by listening to an audiotape and vocalizing their responses, it doesn’t make sense to test them by having them write down their answers. We’d have a poor measure of their ability to verbalize vocabulary. The opposite is also nonsensical. People who learn vocabulary by seeing it on the written page cannot be fairly evaluated by asking them to say the words aloud. It’s not fair to compare these different methods by using the same test, because the choice of test will bias the outcome toward the learning situation that is most like the test situation.

But why not compare one type of test to another? For example, if we want to compare vocabulary learning through hearing and seeing, why don’t we use an oral test and a written one? This doesn’t help either. It’s really impossible to compare two things on different indices. Can you imagine comparing the best boxer with the best golfer by having the boxer punch a heavy bag and having the golfer hit for distance? Would Muhammad Ali punching with 600 pounds of pressure beat Tiger Woods hitting his drives 320 yards off the tee?

The Importance of Listing Citations

Even if the numbers presented on the graph had been published in a refereed journal (research we were reasonably sure we could trust), it would still be dangerous not to know where they came from. Research conclusions have a way of morphing over time. Wasn’t it true ten years ago that all fat was bad? Newer research has revealed that monounsaturated oils like olive oil might actually be good for us. If a person doesn’t cite their sources, we might not realize that their conclusions are outdated or simply based on poor research. Conversely, we may also lose access to good sources of information. Suppose Treichler had really discovered a valid source of information. Because he/she did not use citations, that research would remain forever hidden in obscurity.

The context of research makes a great deal of difference. If we don’t know a source, we don’t really know whether the research is relevant to our situation. For example, an article by Kulik and Kulik (1988) concluded that immediate feedback was better than delayed feedback. Most people in the field now accept their conclusions. Efforts by Work-Learning Research to examine Kulik and Kulik’s sources indicated that most of the articles they reviewed tested the learners within a few minutes after the learning event, a very unrealistic analog for most training situations. Their sources enabled us to examine their evidence and find it faulty.

Who Should We Blame?

The original shysters are not the only ones to blame. The fact that many people who have disseminated the graph used the same incorrect citation makes it clear that they never accessed the original study. Everyone who uses a citation to make a point (or draw a conclusion) ought to check the citation. That, of course, includes all of us who are consumers of this information.

What Does This Tell Us About Our Field?

It tells us that we may not be able to trust the information that floats around our industry. It tells us that even our most reputable people and organizations may require the Wizard-of-Oz treatment—we may need to look behind the curtain to verify their claims.

The Danger To Our Field

At Work-Learning Research, our goal is to provide research-based information that practitioners can trust. We began our research efforts several years ago when we noticed that the field jumps from one fad to another while at the same time holding religiously to ideas that would be better cast aside.

The fact that our field is so easily swayed by the mildest whiffs of evidence suggests that we don’t have sufficient mechanisms in place to improve what we do. Because we’re not able or willing to provide due diligence on evidence-based claims, we’re unable to create feedback loops to push the field more forcefully toward continuing improvement.

Isn’t it ironic? We’re supposed to be the learning experts, but because we too easily take things for granted, we find ourselves skipping down all manner of yellow-brick roads.

How to Improve the Situation

It will seem obvious, but each and every one of us must take responsibility for the information we transmit to ensure its integrity. More importantly, we must be actively skeptical of the information we receive. We ought to check the facts, investigate the evidence, and evaluate the research. Finally, we must continue our personal search for knowledge—for it is only with knowledge that we can validly evaluate the claims that we encounter.

Our Citations

Chi, M. T. H., Bassok, M., Lewis, M. W., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13, 145-182.

Dale, E. (1946, 1954, 1969). Audio-visual methods in teaching. New York: Dryden.

Harrison, R. (1969). Communication theory. In R. V. Wiman and W. C. Meierhenry (Eds.) Educational media: Theory into practice. Columbus, OH: Merrill.

Kinnamon, J. C. (2002). Personal communication, October 25.

Kulik, J. A., & Kulik, C-L. C. (1988). Timing of feedback and verbal learning. Review of Educational Research, 58, 79-97.

Molenda, M. H. (2003). Personal communications, February and March.

Rothkopf, E. Z. (2002). Personal communication, September 26.

Stewart, D. K. (1969). A learning-systems concept as applied to courses in education and training. In R. V. Wiman and W. C. Meierhenry (Eds.) Educational media: Theory into practice. Columbus, OH: Merrill.

Treichler, D. G. (1967). Are you missing the boat in training aids? Film and Audio-Visual Communications, 1, 14-16, 28-30, 48.

Wiman, R. V. & Meierhenry, W. C. (Eds.). (1969). Educational media: Theory into practice. Columbus, OH: Merrill.