Testing Effect as an Overlooked but Effective Learning Tool

What it is, how it works, and what can be done to increase its use.

Mar 10, 2022

A small update: Starting today, I’ll be out on a meditation retreat for the next 10 days. I planned on having a post about personality psychology ready before I leave, but yeah… I went down the rabbit hole and it’s going to take longer than expected.

Anyway, here’s an essay about the testing effect I put together a few months back. I suspect it might be helpful to many of you. Warning, it’s more on the sciency side of things, with a bunch of references and all that.

The beneficial effect of testing[1] (practicing retrieval of target information) on subsequent retention and learning, as measured in a final test, is a well-known, much-tested, and robust phenomenon in cognitive psychology (cf. Adesope et al., 2017)[2]. To get an intuitive sense of what the testing effect is (I'll use “practice testing” and “retrieval practice” interchangeably with it throughout this article), consider the study of Roediger and Karpicke (2006) as an example. In it, students were tasked with learning a prose text under one of two conditions: restudy or retest. After using their respective learning strategy, the students then proceeded to recall the information in the text on three separate occasions - 5 minutes, 2 days, and 1 week after the initial learning session.

Figure 1, copied from Roediger & Karpicke (2006); TE - testing effect

As Figure 1 shows, the more temporally distant the testing session (2 days or 1 week), the greater the benefit of practice testing over restudying. In five words: testing facilitates retention over time.

Cognitive Underpinnings of the Testing Effect

The question that cognitive psychology asks itself is: what are the cognitive mechanisms underlying the testing effect? There are several accounts (for a review, see van den Broek et al., 2016), which we can broadly clump into 3 categories.

The first account looks at alterations in semantic memory representations of the target information through

(1) elaboration of associative links in memory (or "many roads lead to Rome"), and/or
(2) suppression of irrelevant associations (or "there's a highway to Rome").

The literature provides evidence for both (which is interesting because they seem to be mutually exclusive). For (1), studies show that practicing retrieval facilitates not only the retention of target information, but also of related semantic information (e.g., Carpenter, 2009). This suggests that retention of target information is strengthened by forming additional, alternative retrieval routes, making the entire network more likely to be activated during retrieval practice. For example, if I ask you to recall fish, you are also likely to recall related concepts (tuna, salmon, but also maybe sea, river, etc.).

For (2), or the suppression account, studies on retrieval-induced forgetting show that the link between the target information and the cue gets strengthened over time, with the result that related information is less likely to be activated (e.g., Thomas & McDaniel, 2012). For instance, repeated retrieval of "pineapple" in response to the cue "fruit" strengthens that link, but also inhibits the alternative response "pear".

The second explanatory category considers mental effort, or how hard one must try to retrieve the to-be-recalled information. The more mental effort expended, the bigger the testing effect, and the more likely the information is to be retained (e.g., Roediger & Butler, 2011).

The final category is known as test-potentiated encoding (TPE), and it assumes that testing improves the efficiency of subsequent encoding (e.g., Grimaldi & Karpicke, 2012). In other words, the more you know (i.e., the broader your knowledge base), the better your capacity to learn.

In sum, cognitive psychology mostly investigates the explanatory mechanisms of the testing effect. It has identified several plausible accounts, all of which are - to an extent - empirically supported.

Neural Correlates of the Testing Effect

Beyond the cognitive mechanisms, research has also implicated several brain regions in the testing effect (for a review, see van den Broek et al., 2016). The level of analysis is rather coarse, but hopefully still informative. The following text follows the same 3 categories mentioned above.

Recall that the testing effect is assumed to alter semantic memory representations (either through elaboration or suppression). Thus, we would assume that the activation patterns in areas related to semantic retrieval - temporal and parietal lobes - would become less similar through elaboration (Wirebring et al., 2015), or more similar through suppression (Xue et al., 2010). As the reader can observe, activation patterns for both accounts have been found.

Selective memory retrieval is effortful: it requires control, attention, and executive capacity. Thus, the areas most likely to be associated with mental effort, the second explanatory category, can be found within the prefrontal cortex. One of the more studied brain areas is the ventrolateral prefrontal cortex (VLPFC). The findings suggest that during the initial stages of practice testing (vs., e.g., restudying) the activity in the VLPFC is higher, but the final performance is marked by lower demands on executive control, that is, lower activity (e.g., Badre & Wagner, 2007). In other words, the effects of testing are marked by a characteristic pattern of activity in the related brain areas: more activity at first, followed by a drop-off later.

Lastly, the results suggest that the insula, the prefrontal and parietal cortices, and the hippocampus are all implicated in TPE. Specifically, when a testing occasion is unsuccessful (i.e., the target information is not recalled), those items receive more attention, which - next time around - leads to successful TPE (e.g., van den Broek et al., 2013).

Now that we know about the assumed mechanisms and associated brain areas, we can turn our attention to the practical use of the testing effect.

Testing Effect in Educational Psychology

Practice testing is, of course, eminently applicable in educational settings. To illustrate, consider the study by McDaniel et al. (2012) (cited in Dunlosky et al., 2013). In it, undergraduate students attended an online psychology course, where they could earn course points each week by completing an online practice activity. This activity took one of three forms: practice testing with feedback, restudying, or no information at all. The final unit exam consisted of both the questions used in practice testing and new questions (testing transfer, more on that later). As shown in Figure 2, practice tests led to the highest course exam grades, irrespective of whether they were applied to repeated (known) questions or new questions.

Figure 2, copied from Dunlosky et al. (2013); practice tests help both the retention of known information (Repeated Questions) and facilitate transfer (New Questions)

Generalizing further, the testing effect has been found regardless of:

the material (e.g., geography or statistics; Lyle & Crawford, 2011; Kromann et al., 2009);
the age group - it works for both children (Carpenter et al., 2009) and older adults (Meyer & Logan, 2013);
and the setting - e.g., online (Wiklund-Hörnqvist et al., 2014) or in class (McDaniel et al., 2007).

Practice testing also surpasses other pedagogical learning strategies, such as group discussions (e.g., Stenlund et al., 2017) and concept mapping (e.g., Karpicke & Blunt, 2011). As a self-study technique, retrieval practice outperforms other common ways of studying[3], including re-reading and highlighting text (Dunlosky et al., 2013). Finally, testing has also been found to facilitate transfer (using the acquired information in novel situations), the holy grail of educational efforts (Rohrer et al., 2010). All this has led Dunlosky et al. (2013) to conclude, in their massive review of effective learning techniques, that practice testing is of high utility.

Putting it Together

Despite how much we know about the testing effect, it is often overlooked as a learning technique, both by self-learners and by educators[4] (McDermott, 2021). When we study, most of us tend to highlight, re-read, or summarize information. If we are feeling especially fancy, we annotate. And while these techniques have been found to be somewhat effective (cf. Dunlosky et al., 2013), they pale in comparison to the possible retention effects of retrieval practice.

The question is then: why is such a simple technique, backed by solid empirical evidence, not utilized as much as it could be? In the following, I will a) discuss the possible reasons why; b) link them to (some of) the ideas discussed above; and c) derive recommendations for practice.

You might recall that one of the accounts that explains the testing effect pertains to mental effort. Thus, as a self-study technique, retrieval practice competes with other popular - and less effortful - learning techniques such as highlighting, re-reading, or summarizing. If the decision to study with a certain learning technique is conceptualized as a self-control conflict within the value-based choice model (Berkman et al., 2017), this would mean that many self-learners opt for the less effortful options - even though these might not be the most effective - because their positive value inputs are higher than those of retrieval practice.
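To make the value-based framing a bit more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each study technique's subjective value is a weighted sum of a few inputs. The input names, weights, and numbers are all hypothetical; they are not taken from Berkman et al. (2017) or from any data.

```python
# Illustrative only: every number and input name below is a made-up
# assumption, not an estimate from Berkman et al. (2017).

def option_value(inputs, weights):
    """Subjective value of an option as a weighted sum of its value inputs."""
    return sum(weights[name] * value for name, value in inputs.items())

# Hypothetical weights a learner places on each input.
weights = {"expected_retention": 1.0, "effort": 1.0, "feels_good": 1.0}

# Hypothetical inputs: effort enters as a cost (negative value).
options = {
    "re-reading": {"expected_retention": 0.4, "effort": -0.2, "feels_good": 0.3},
    "retrieval practice": {"expected_retention": 0.8, "effort": -0.7, "feels_good": -0.2},
}

values = {name: option_value(inputs, weights) for name, inputs in options.items()}
chosen = max(values, key=values.get)
print(chosen)
```

With these made-up numbers, re-reading ends up with the higher overall value even though retrieval practice has the higher expected retention, which is the conflict described above.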

As discussed in the section about TPE, an unsuccessful retest can lead to allocating more attention to the problematic cue. Consequently, when the student sees the cue the next time around, they are more likely to recall the answer. Failure, in other words, is cognitively beneficial. What I didn’t mention, however, is that such an unsuccessful attempt is likely to be accompanied by a negative affective reaction. Thus, another reason why learners don't utilize retrieval practice as often as they could (and should) might be that testing feels bad: it shows how inadequate the current knowledge is. In contrast, restudy - another learning technique - breeds familiarity and encoding fluency through repeated exposure. It feels good and is, as a result, more likely to be chosen by the learner.

So, what can be done to make retrieval practice more likely to be picked up as a learning technique?

First, practice testing clearly needs a boost to make it more attractive compared to less effortful (and less efficient) options such as re-reading and highlighting. A possible intervention could thus focus on a) lowering the mental effort required to use retrieval practice, and b) improving students' meta-knowledge about the effectiveness of practice testing, despite the negative emotion it can evoke. How might this look?

To lower the initial friction caused by high mental effort, a possible intervention could target the formation of habits. Such an intervention could be initiated through specific and achievable goal setting and reinforced by habit piggybacking (deciding which specific existing behaviors a bout of practice testing should follow). It could also include helping the student build a simple system that lets them practice daily. Indeed, many learners (including myself) have built such habits through, for example, a flashcard application (e.g., Anki; for a good primer, see here). The added benefit of these apps is that one employs not only practice testing, but also distributed practice (spacing out the to-be-learned information over days, weeks, and months) and interleaving (mingling multiple topics together), both of which are also effective learning tools (cf. Dunlosky et al., 2013).
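For readers who like to see the mechanics, below is a minimal sketch of how a flashcard system can combine practice testing with distributed practice. It uses a simple Leitner-style box scheme, which is an assumption chosen for illustration; it is not Anki's actual scheduling algorithm, and the `Card` class, the `INTERVALS` values, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals (in days) for each box. Real apps such as
# Anki use a different, more elaborate scheduling algorithm.
INTERVALS = [1, 3, 7, 14, 30]

@dataclass
class Card:
    cue: str
    answer: str
    box: int = 0          # higher box = better known, reviewed less often
    due: date = date.min  # new cards are due immediately

def review(card, recalled, today):
    """Update a card after one retrieval attempt (one bout of practice testing)."""
    if recalled:
        # Successful recall: promote the card, so the next test is further away.
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        # Failed recall: back to the first box, so the card is seen again soon.
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card

def due_cards(deck, today):
    """Cards to quiz yourself on today."""
    return [card for card in deck if card.due <= today]

today = date(2022, 3, 10)
deck = [
    Card("TPE", "test-potentiated encoding"),
    Card("VLPFC", "ventrolateral prefrontal cortex"),
]
review(deck[0], recalled=True, today=today)   # promoted, due again in 3 days
review(deck[1], recalled=False, today=today)  # stays in box 0, due tomorrow
```

The design choice worth noting is that failed cards come back sooner while known cards are spaced further apart, so each daily session stays short and focused on what you cannot yet retrieve.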

Figure 3, My Anki stats in the past year. Bar color represents the maturity of a card. Orange and red mean learning and relearning, respectively (i.e., new or problematic cards); the different shades of green mean "mature" (i.e., known or learned) cards.

Besides building habits to reduce friction, the intervention should also target the possible negative emotion arising from practice testing. How? Firstly, the student learns that practice testing is an effective study technique, despite the negative affect (Emmerdinger & Kuhbandner, 2019). Secondly, the student gives the technique an honest try for a prolonged period[5]. Repeated exposure should, over time, lead to less or no negative emotion while practicing retrieval.

In sum, the combination of goal setting, information provision, habit piggybacking, and the employment of a digital tool such as Anki should make it easier for the learner to pick up and integrate practice testing into their study routine.

Summary

Most of us don’t question that learning to play a musical instrument or acquiring any skill requires practice and endless repetition. Yet we somehow don’t apply the same logic to the retention of information we want to learn. We assume that it enters our skulls through re-reading, summarizing, or highlighting. I believe this is misguided. What I tried to show is that retrieval practice is an effective, albeit often overlooked, learning tool; it can be applied in a broad range of domains, facilitates transfer of learned information, and can be employed regardless of material, age, and settings. Further, I suggested that the effects of testing come about in complex ways and are mostly localized to brain areas related to semantic retrieval and executive control. Finally, to make testing more palatable as a self-study technique, I recommended two approaches: building a habit out of it and addressing the possible negative emotion through information provision and repeated exposure.

References

Adesope, O. O., Trevisan, D. A., & Sundararajan, N. (2017). Rethinking the Use of Tests: A Meta-Analysis of Practice Testing. Review of Educational Research, 87(3), 659–701. https://doi.org/10.3102/0034654316689306
Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45(13), 2883–2901. https://doi.org/10.1016/j.neuropsychologia.2007.06.015
Berkman, E. T., Hutcherson, C. A., Livingston, J. L., Kahn, L. E., & Inzlicht, M. (2017). Self-Control as Value-Based Choice. Current Directions in Psychological Science, 26(5), 422–428. https://doi.org/10.1177/0963721417704394
Carpenter, S. K. (2009). Cue strength as a moderator of the testing effect: The benefits of elaborative retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(6), 1563.
Carpenter, S. K., Pashler, H., & Cepeda, N. J. (2009). Using tests to enhance 8th grade students’ retention of U.S. history facts. Applied Cognitive Psychology, 23(6), 760–771. https://doi.org/10.1002/acp.1507
Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving Students’ Learning With Effective Learning Techniques: Promising Directions From Cognitive and Educational Psychology. Psychological Science in the Public Interest, 14(1), 4–58. https://doi.org/10.1177/1529100612453266
Emmerdinger, K. J., & Kuhbandner, C. (2019). Tests improve memory – no matter if you feel good or bad while taking them. Memory, 27(8), 1043–1053. https://doi.org/10.1080/09658211.2019.1618339
Grimaldi, P. J., & Karpicke, J. D. (2012). When and why do retrieval attempts enhance subsequent encoding? Memory & Cognition, 40(4), 505–513. https://doi.org/10.3758/s13421-011-0174-0
Karpicke, J. D., & Blunt, J. R. (2011). Retrieval Practice Produces More Learning than Elaborative Studying with Concept Mapping. Science. https://doi.org/10.1126/science.1199327
Lally, P., van Jaarsveld, C. H. M., Potts, H. W. W., & Wardle, J. (2010). How are habits formed: Modelling habit formation in the real world. European Journal of Social Psychology, 40(6), 998–1009. https://doi.org/10.1002/ejsp.674
Lyle, K. B., & Crawford, N. A. (2011). Retrieving Essential Material at the End of Lectures Improves Performance on Statistics Exams. Teaching of Psychology, 38(2), 94–97. https://doi.org/10.1177/0098628311401587
McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19(4–5), 494–513. https://doi.org/10.1080/09541440701326154
McDermott, K. B. (2021). Practicing Retrieval Facilitates Learning. Annual Review of Psychology, 72(1), 609–633. https://doi.org/10.1146/annurev-psych-010419-051019
Meyer, A. N. D., & Logan, J. M. (2013). Taking the testing effect beyond the college freshman: Benefits for lifelong learning. Psychology and Aging, 28(1), 142. https://doi.org/10.1037/a0030890
Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20–27. https://doi.org/10.1016/j.tics.2010.09.003
Roediger, H. L., & Karpicke, J. D. (2006). Test-Enhanced Learning: Taking Memory Tests Improves Long-Term Retention. Psychological Science, 17(3), 249–255. https://doi.org/10.1111/j.1467-9280.2006.01693.x
Rohrer, D., Taylor, K., & Sholar, B. (2010). Tests enhance the transfer of learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(1), 233–239. https://doi.org/10.1037/a0017678
Stenlund, T., Jönsson, F. U., & Jonsson, B. (2017). Group discussions and test-enhanced learning: Individual learning outcomes and personality characteristics. Educational Psychology, 37(2), 145–156. https://doi.org/10.1080/01443410.2016.1143087
Thomas, R. C., & McDaniel, M. A. (2012). Testing and feedback effects on front-end control over later retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(2), 437. https://doi.org/10.1037/a0028886
van den Broek, G. S. E., Takashima, A., Segers, E., Fernández, G., & Verhoeven, L. (2013). Neural correlates of testing effects in vocabulary learning. NeuroImage, 78, 94–102. https://doi.org/10.1016/j.neuroimage.2013.03.071
van den Broek, G., Takashima, A., Wiklund-Hörnqvist, C., Karlsson Wirebring, L., Segers, E., Verhoeven, L., & Nyberg, L. (2016). Neurocognitive mechanisms of the “testing effect”: A review. Trends in Neuroscience and Education, 5(2), 52–66. https://doi.org/10.1016/j.tine.2016.05.001
Wiklund-Hörnqvist, C., Jonsson, B., & Nyberg, L. (2014). Strengthening concept learning by repeated testing. Scandinavian Journal of Psychology, 55(1), 10–16. https://doi.org/10.1111/sjop.12093
Wirebring, L. K., Wiklund-Hörnqvist, C., Eriksson, J., Andersson, M., Jonsson, B., & Nyberg, L. (2015). Lesser Neural Pattern Similarity across Repeated Tests Is Associated with Better Long-Term Memory Retention. Journal of Neuroscience, 35(26), 9595–9602. https://doi.org/10.1523/JNEUROSCI.3550-14.2015
Xue, G., Dong, Q., Chen, C., Lu, Z., Mumford, J. A., & Poldrack, R. A. (2010). Greater Neural Pattern Similarity Across Repetitions Is Associated with Better Memory. Science. https://doi.org/10.1126/science.1193125

[1] Testing, as understood within the confines of this article, does not mean taking a graded test in the scholastic sense, but rather refers to the activity of quizzing oneself on the to-be-learned content.

[2] There is considerable overlap between cognitive psychology and educational psychology as the “home discipline”. I chose Adesope’s categorization, but I can safely say that the decision is mostly arbitrary.

[3] Whether these should be considered learning techniques, knowing what we know about how humans learn, is debatable but irrelevant for the current discussion.

[4] Space precludes discussing what educators could do to make the adoption of retrieval practice in classrooms more widespread than it is now. I refer you to a practical resource here, should you be interested.

[5] Habit formation varies with the complexity of the behavior, the stability of the environment, and a dozen other variables, but the minimal time before a habit is formed is approximately 18 days (Lally et al., 2010).

TheQuaintPickle