Introduction

The research on embodied cognition has shown that cognition involves relationships between the brain, the body, and the environment (see Foglia & Wilson, 2013; Glenberg, 1997; Glenberg et al., 2013; Stolz, 2015; Wilson, 2002). The finding that cognitive processes may be affected by body movements and posture, and interactions with the environment, has important implications for learning and instruction. When the brain, body, and environment act together, such as when making or observing human movements in an instructional setting, learning and problem solving can be boosted (see Abrahamson et al., 2020).

The instructional research about making or observing human movements typically involves the hand motions of gesturing and object manipulation (e.g., Castro-Alonso et al., 2015; de Koning et al., 2019; Novack & Goldin-Meadow, 2015; Post et al., 2013; Pouw et al., 2020; Zhang et al., 2023), which have been shown to be effective aids for learning about a diverse range of topics, such as writing foreign characters (Lajevardi et al., 2017), solving math problems (Goldin-Meadow et al., 2001; Wang et al., 2022), understanding graphs (Duijzer et al., 2019), and playing the piano (Mierowsky et al., 2020). As reviewed by de Koning and Tabbers (2011; see also Skulmowski & Rey, 2018), there are also embodied cognition effects with human movements that involve not only the hands but also, for example, the arms (cf. Gálvez-García et al., 2020), eyes (e.g., Beege et al., 2017), and the whole body (e.g., Mavilidi et al., 2020; Scheiter et al., 2020). While this article includes some research on whole-body movements, our primary focus is on hand movements, specifically gesturing and object manipulation, due to their widespread presence in instructional studies.

The aims of this narrative review are (1) to propose six distinct yet interconnected research pathways (avenues) for categorizing various studies, most of which are associated with embodied learning and instruction; (2) to explain through these six avenues why human movements (e.g., gesturing and manipulations) are effective for learning and instruction; and (3) to include the influencing features that enhance learning and instruction in these avenues. With the first aim (categorizing different studies), we contribute to the literature by including studies that have not traditionally been associated with embodied cognition research. With the second and third aims (explaining the avenues and the influencing features), we contribute by suggesting future research. This includes potential follow-up studies that incorporate the influencing features identified in the studies we have reviewed.

The first three research avenues (physical activity, generative learning, and offloaded cognition) apply when students make human movements. The fourth and fifth avenues (specialized processor and signaling) apply when students observe human movements being made by others (e.g., teachers, instructors, peers). The sixth avenue (social cognition) concerns mechanisms that are triggered when the students either make or observe others’ movements.

These six avenues are listed in Table 1 along with brief explanations, examples, and key features. As shown in the fourth column of Table 1, different variables can influence the extent to which embodiment supports learning. In the next sections, we address each of the six avenues in more detail following the same structure: introduction and explanation, evidence, influencing feature(s), and conclusion. When describing each study, we included (where possible) its number of participants, indicative of the inferential power of the study, and its female percentage, as some studies (e.g., Castro-Alonso et al., 2019c) show that the gender variable may affect learning.

Table 1 Six research avenues that help explain beneficial embodied cognition effects on learning and instruction

Making Human Movements

When students make movements, such as gestures and object manipulations, they are involved in a rich type of personal processing that triggers motoric and perceptual activity to deal with the learning tasks (see Wilson, 2002). As a result of this rich processing, making gestures (e.g., Mierowsky et al., 2020; Pouw et al., 2020; Zhang et al., 2022) and object manipulations (e.g., Forbes-Lorman et al., 2016; Höst et al., 2013; Kontra et al., 2015) are effective strategies to enhance learning in many domains. The three research avenues described next can help explain why making these human movements is beneficial for cognition and learning.

Physical Activity

Introduction and Explanation

Engaging in physical activity is associated with enhanced cognitive processing and learning (see Erickson et al., 2015; Ludyga et al., 2020; Nazlieva et al., 2019). As reviewed by Stillman et al. (2016), these beneficial effects range from cellular mechanisms to brain effects and whole-body behavioral and socioemotional consequences. Usually, these effects are related to demanding physical exercise (e.g., aerobic and resistance training; see Pothier & Bherer, 2016), but there is also compelling evidence that less strenuous physical activity is helpful for cognitive processing, as presented next.

Evidence

Empirical evidence reveals that involvement in physical activity not only indirectly bolsters learning through the enhancement of underlying cognitive processes, but also directly contributes to the improvement of learning outcomes. Indirect effects of physical activity on learning have been reported in studies showing that engaging in physical activity improves visuospatial processing (e.g., Kao et al., 2020; Wang et al., 2019), a key cognitive variable for learning about medicine, anatomy, biology, chemistry, and other disciplines (see Castro-Alonso, 2019).

For example, in an experiment with 30 male adult participants, Wang et al. (2019) compared two conditions following training sessions in a visuospatial working memory n-back task. For one condition (exercise), all sessions started with a 30-min running activity on a treadmill, before practicing the working memory task. For the other condition (control), the sessions began with academic readings for 30 min. Results showed that the aerobic exercise group achieved larger transfer to another visuospatial task than the reading (control) group. In another example, Kao et al. (2020) used a within-subjects design to compare visuospatial performance of 23 undergraduates (52% females) following a 20-min session of either walking on a treadmill or sitting on a chair. Results also revealed positive effects of this mild exercise on a visuospatial n-back task attempted later.

In addition to these indirect effects on visuospatial processing, there is also evidence of direct effects on learning associated with engaging in physical activity (e.g., Mavilidi et al., 2018; see also Bjorklund, 2022). For example, Mavilidi et al. (2020) summarized five studies where preschool children were exposed to various domains of learning (e.g., language, geography, and science), with the comparative conditions being their engagement, or lack thereof, in simultaneous physical activity. Results showed that participants in the simultaneous physical activity conditions presented higher learning outcomes than participants in the control conditions who did not make movements. Critically, Mavilidi et al. (2020) also observed that not all movements were equally effective, suggesting that the type and timing of the movement can influence the effects of physical activity on learning, as described next.

Influencing Features

Both the relevance of a particular movement to the learning task and its temporal integration within the task play a significant role in determining how physical activity influences learning outcomes (see Mavilidi et al., 2018; see also Skulmowski & Rey, 2018). This means that making movements that are not relevant to the learning task (e.g., moving the hands randomly) may not engage as many embodied cognitive mechanisms as movements that are directly applicable to the task (e.g., moving the hands up when learning about height). Similarly, when considering temporal integration, embodied mechanisms may be stimulated more when movements are executed concurrently with or shortly before the learning process, compared to those conducted at more extended temporal intervals from the learning event. Hence, human movements related in a meaningful and timely way to the learning task are more effective than irrelevant or non-proximate motions.

The influencing feature of relevance was investigated in the studies described by Mavilidi et al. (2020). They compared physical activity with relevant movements for the learning tasks (e.g., walking between models of the Sun and Mercury to learn the distances between them) versus physical activity with irrelevant movements (e.g., walking around the astronomy models). Larger learning benefits were observed with relevant movements.

In addition to whole-body actions, the relevance feature can also affect hand motions, such as gesturing. For example, in two experiments with a total of 190 undergraduates, Zhang et al. (2021) compared three groups learning about statistics (e.g., probability distributions) through videos: (a) making relevant gesturing movements, (b) making irrelevant gesturing movements, and (c) not making gestures (control). Results showed significantly higher performance when students made relevant gestures (e.g., moving the hands horizontally to learn about the distribution), compared to both irrelevant gesturing (e.g., moving the hands vertically) and no gesturing. These findings echo previous results with 115 university students (54% females) explaining solutions to math problems, in which the participants making meaningful movements (gestures) were more effective than those making either meaningless hand movements or no movements (Cook et al., 2012).

The timing feature was reported by Statton et al. (2015) in an experiment with 24 adults attempting a motor skill learning task after conditions of running without rest, running plus 1-h rest, or walking (control). Results showed that the beneficial effects of the aerobic exercise on motor learning were diminished when there was 1 h of rest between the exercise and the learning task. Similarly, Kashihara and Nakahara (2005) conducted within-subjects experiments with six adult males completing a choice reaction task after either exercise (cycling) or resting (control). Results showed faster choice reactions immediately after exercising than immediately after resting, but this difference between conditions disappeared after 8 min of the task. These studies suggest that cognition and learning improve more when physical activity closely precedes the task than when a longer interval separates the two.

Conclusion

The research avenue of physical activity can be used as a basis for some of the beneficial learning effects of making human movements, including whole body movements, gesturing, and object manipulation. When students make these movements, effects of exercising and of less strenuous physical activity may operate at different levels, including cell level (e.g., neurons) and brain level (e.g., hippocampus). Importantly, the type and timing of movement can influence these effects. When the produced movements link what the mind learns to the actual body motion, and these movements are made closer in time to the learning task, the greatest embodied effects that connect the brain and body can be obtained.

We next describe generative learning, another avenue that investigates the making of human movements, which mostly involves finer motor skills than those used in the physical activity avenue.

Generative Learning

Introduction and Explanation

Wittrock (1989) described generative processing as the cognitive activities making connections between the learning contents and personal beliefs, knowledge, and experience. Examples of generative learning activities (see Fiorella & Mayer, 2016b; Wittrock, 1989) include composing questions, writing summaries, drawing pictures, and enacting (e.g., gesturing and object manipulation); the latter being the focus of the present article. Generative actions are effective learning strategies because they allow students to make personal connections between the learning materials and their existing knowledge or experience (see Fiorella & Mayer, 2016b; see also Castro-Alonso et al., 2021a; Fiorella, 2023).

Evidence

The study by Macken and Ginns (2014) can be considered an example of effective generative learning in the form of making hand actions. In the study, which also has links to the enacting research, the authors investigated 42 adults (74% females) learning about the human heart through visualizations and texts. In one experimental condition, participants could freely use their fingers to point and trace connections between the visualizations and texts. Results on retention and comprehension tests showed that participants in the gesturing condition outperformed those in the control condition without gesturing (see also a replication by Ginns & Kydd, 2020).

Also, an example of enacting via object touching is provided by Novak and Schwan (2021), who investigated adult participants learning about animal husbandry tools in one of four experimental groups: touching and observation of the objects (haptics and vision), touching without observation (haptics only), no manipulation but only observation (vision only), and control without touching or observation (no objects). Three weeks afterwards, recall of the tools was higher in the group that had touched and observed, compared to the group who only observed.

Analogously, examples of manipulations include scenarios where students manipulate interactive multimedia, which is sometimes known as engagement (see Castro-Alonso & Fiorella, 2019). Chi and Wylie (2014) described a cognitive engagement framework that distinguishes four modes of engagement: interactive, constructive, active, and passive (ICAP). Aligned with the generative learning and enacting literature, the ICAP framework predicts that learning activities fostering interactive or constructive engagement will be more effective than activities fostering active or passive engagement.

The study by Bokosmaty et al. (2017) can be framed under the ICAP framework of multimedia engagement. In this study, 60 primary school students (50% females) were randomly assigned to learn about geometrical angles under different conditions. One condition (constructive) allowed participants to use the computer mouse to drag the angles to see different degrees. In another condition (passive), participants observed the researcher using the mouse to drag the angles. Results for the retention test showed that making the dragging manipulations was more effective than observing them (see also Schwartz & Plass, 2014).

Influencing Features

The research avenue of generative learning focuses on the learning that students get from instructional materials when they interact with them using human movements such as gesturing and manipulation. An influencing feature of this avenue is the degree of personalization or creativity allowed by the learning material or task. Making personal or creative human movements will allow more effective generative learning than copying movements. The personal aspect of these movements could also trigger positive emotions (e.g., motivation, pride) about these creations (e.g., Norton et al., 2012), so emotion (affect) can be considered another influencing feature of the generative learning avenue (see about emotion, cognition, and learning in Plass & Kalyuga, 2019; see also Fraser et al., 2015).

The comparison between making personal and non-personal human movements was made by Mason et al. (2013) with the task of drawing. In this study, 199 seventh grade students (47% females) were randomly assigned to one of three conditions: generative drawing (free drawing), copying drawing (tracing over and joining the dots), and no drawing. The task was to learn the behavior of pendulums in a five-ball Newton’s Cradle shown through an animation. Comprehension tests revealed that the students in the generative drawing condition outperformed both the copying and the no drawing conditions, which did not differ from each other (see analogous findings for highlighting instructional texts in Fowler & Barker, 1974).

Regarding emotional aspects, recent studies (e.g., Ginns & King, 2021; Wang et al., 2022) have considered the variable of intrinsic motivation when students make gestures for learning. Ginns and King (2021) investigated 44 university students (70% females) making pointing and tracing gestures to learn astronomy, while Wang et al. (2022) studied 93 primary school students (47% females) and 90 university students (49% females) making tracing gestures to learn math. Both studies compared groups of students making gestures against groups of students not making these hand motions. Consistently, the studies reported that making these gestures supported learning and was connected to higher intrinsic motivation.

Moreover, the personal feature and its impact on emotional aspects can be related to the IKEA effect (Norton et al., 2012) of manipulative tasks, which is named after the IKEA furniture brand because its items generally require some assembly by the buyer. For example, in four experiments with 315 university participants (50% females), Norton et al. (2012) required volunteers to build simple paper shapes, Lego models, and IKEA boxes. The affect that these volunteers showed toward their self-produced items was compared to the affect that non-builders showed toward the same objects. As predicted, builders rated their self-produced objects more favorably than non-builders rated the same objects. This indicates an emotional preference for manipulations that lead to personal objects, and it relates to generative activities whose outcomes carry a personal (creative) touch.

Conclusion

The research avenue of generative learning can be used to explain some of the beneficial learning effects of making human movements, including the enacting movements of gesturing and object manipulation, and other forms of generative human movements (e.g., engagement with interactive multimedia and drawing). Compared to the physical activity avenue, this form of generative learning research usually investigates human movement with finer motor skills (e.g., hand and finger motion). However, the most important difference is the personal input (and produced emotions), which is considered in the generative learning research literature. As such, allowing personal gestures and object manipulations should increase the generative learning effects, compared to only allowing copying or imitating hand movements. The next research avenue also involves finer and more precise movements than the physical activity avenue, but it does not include the personal and emotional aspects of generative learning.

Offloaded Cognition

Introduction and Explanation

A review by Risko and Gilbert (2016) reported evidence that some cognitive processes, such as working memory processing, can be somewhat offloaded onto-the-body or into-the-world. In other words, parts of the body beyond the brain, and also parts of the environment beyond the body, can act as distributors that help support cognitive processing in the brain (see Foglia & Wilson, 2013). For example, gesturing can help partially offload cognition from the brain onto the hands, such as using finger counting to solve arithmetic tasks (see Neveu et al., 2023). Also, manipulations can help partially offload cognition from the brain into the manipulative object in the environment, such as using stones to help add numbers mentally.

As the avenues we present here are not mutually exclusive, offloading cognition is also a phenomenon involved in some of the generative learning activities. For example, when making a personal drawing onto a piece of paper, that generative learning task can also entail offloading a mental visualization into the paper. The key difference between offloading and generative learning avenues is that offloaded cognition researchers are mainly interested in how working memory processing is aided by making human movements, rather than the personal and emotional processes associated with generative learning. Next, we present evidence of this avenue in learning contexts.

Evidence

The offloaded cognition avenue has provided several examples where making gestures can lower the cognitive burden on working memory, enhancing performance and learning (e.g., Chu & Kita, 2011; Goldin-Meadow et al., 2001; Mierowsky et al., 2020; Pyers et al., 2021). For example, Marstaller and Burianová (2013) assessed the working memory capacity of 58 undergraduate psychology students (83% females) and found that only low-working memory capacity students benefitted from making pointing gestures when explaining number equations. A likely interpretation of these results is that the cognitive resources of students with low-working memory capacity get overburdened more easily by demanding tasks and need the scaffolding resources provided by gesturing. In contrast, students with higher working memory capacity do not need the offloading support of gesturing, as it is redundant (see redundancy in Kalyuga & Sweller, 2022).

Similar results were reported in the study by Pouw et al. (2016), which involved 20 adult participants (75% females) solving a virtual Tower of Hanoi manual puzzle. It was observed that participants who made pointing gestures reduced the number of eye movements to solve the task. This reduction in eye movements associated with gesturing, which suggests a shift from visual-based to gestural-based processing, was more evident in participants with lower visual working memory capacity.

Offloaded cognition that uses the environment has been investigated with manipulative objects, in which these manipulatives help to solve arithmetic (e.g., Carlson et al., 2007), science (e.g., Stull et al., 2012), or mental rotation tasks (e.g., Weis & Wiese, 2019). For example, Vallée-Tourangeau et al. (2016) investigated 52 psychology students (87% females) performing mental arithmetic under two conditions. In the manipulative condition, students were given number tokens that could be manipulated during the mental calculations. In the control condition, students could not use their hands for any purpose. Also, all students engaged in articulatory suppression (repetition of a short word), in order to reduce their available working memory capacity. Results revealed that the manipulative condition outperformed the control condition. This means that the vocal repetition that interfered with working memory was not as problematic when the participants could physically manipulate the number tokens. Arguably, by offloading the arithmetic task onto the manipulative tokens, the students could better manage the calculations with the few working memory resources left due to articulatory suppression.

Influencing Feature

The offloaded cognition perspective assumes a distribution of cognitive processing from the brain onto the hands or the manipulative objects when it is needed the most. In other words, the influencing feature of this avenue is the degree of load on working memory when processing the given task. This influencing feature has two contributing factors: the complexity of the task and the available working memory capacity of the participant. Regarding task complexity, easy tasks that do not demand much working memory processing would be in less need of offloading cognition than complex tasks that can overload working memory. Note that task complexity depends on the task itself and also on the expertise of the learner attempting the task (see Chen et al., 2023). Regarding participants’ cognitive capacity, those with high working memory capacity (or higher expertise) would need fewer offloading scaffolds than those with lower available working memory capacity.

The two contributing factors of this influencing feature were investigated in a gesturing study by Eielts et al. (2020). In this experiment, 73 university students (59% females) completed virtual Tower of Hanoi tasks at two levels of difficulty. Participants made more gestures to solve the more complex level, compared to the easy level. Also, participants with lower visual working memory capacity who made gestures solved problems at both levels more rapidly than their counterparts who did not use gestures. Similarly related to participants’ available cognitive capacity, a recent review by Neveu et al. (2023) revealed that children with math learning disabilities made more finger gestures, compared to learners without these disabilities, when solving arithmetic problems.

Employing a novel computer task that involved manipulations, Gilbert and colleagues (Ball et al., 2022; Gilbert, 2015) have also investigated both factors of this influencing feature. Concerning complexity of the task, Gilbert (2015) reported a study where 100 participants (57% females) engaged in the interactive multimedia task with two levels of complexity. On each trial, participants had to drag ten on-screen circles from the center to specified screen margins (e.g., Circle 1 to the left limit of the screen, Circle 2 to the top limit, etc.). The circles disappeared when released in these limits. Participants were free to adopt the offloaded manipulation strategy of releasing the circles just before reaching the disappearance limits. Hence, the strategy reduced the need to memorize the positions of the circles. Results showed that this offloading manipulation was used more frequently in the more complex level. Concerning participants’ working memory capacity, in a study with 268 undergraduates (69% females), Ball et al. (2022) compared high- versus low-working memory capacity participants attempting these computer manipulative tasks with the option of offloading cognition. As expected, when using this offloading option, individuals with low working memory capacity increased their performance to a greater extent than those with high capacity.

Conclusion

The research avenue of offloaded cognition can also be used to explain some of the beneficial educational effects of making human movements, notably gesturing and object manipulations. The influencing feature of these effects is the demand on working memory capacity. When the cognitive resources are being exhausted by a complex learning task (and/or by a limited working memory capacity to process this complexity), students can use the cognitive scaffolds provided by the hands or the manipulatives as aids for better performance. This avenue also predicts that making human movements is less effective when the cognitive resources are not being challenged (e.g., attempting an easy learning task).

Observing Human Movements

In addition to making human movements, observing movements made by other humans can also trigger embodied cognition mechanisms (see Duijzer et al., 2019; Skulmowski & Rey, 2018). Not only is there accumulated evidence showing that making gestures and object manipulations is effective for learning and more general cognition, as presented above, but there is also a body of research showing that observing gestures (e.g., Bentley et al., 2023; Brucker et al., 2015; Pi et al., 2019) and manipulations (e.g., Cui et al., 2017; de Koning et al., 2019) is beneficial.

Evidence suggests that these mechanisms are triggered by the mirror neuron system and other imitation systems that match the production of human actions to the observation of these actions made by other humans (see Rizzolatti & Craighero, 2004; see also Cracco et al., 2018; van Gog et al., 2009). In other words, due to these systems, observing embodied actions (e.g., gesturing and object manipulation) can automatically activate neural processes similar to those involved in making these actions (e.g., Fadiga et al., 1995). Consequently, both making and observing these actions may have beneficial effects on learning and cognition (e.g., Feyereisen, 2009). The two research avenues described next can help explain the reasons for the beneficial learning effects of observing these human movements.

Specialized Processor

Introduction and Explanation

The multicomponent model of working memory by Baddeley and Hitch (1974; see also Baddeley, 1992) defines working memory as a system with two limited and relatively independent subsystems, one processing visuospatial information (e.g., written text, visualizations) and the other one managing auditory information (e.g., narrations). The identification of these two limited subprocessors in working memory has led to a number of instructional phenomena, such as the modality effect within cognitive load theory research (e.g., Mousavi et al., 1995; see Castro-Alonso et al., 2019a; Castro-Alonso & Sweller, 2022). The modality effect states that learning from mutually referring pictures and text is superior when the text is presented in spoken format rather than written format, due to using the visuospatial and auditory subsystems simultaneously, compared to using only one subsystem (e.g., visuospatial) and risking overloading it (see Ginns, 2005; Reinwein, 2012).

The updated multicomponent model by Baddeley (2012) and its subsequent extension by Sepp et al. (2019) propose the potential for distinct processing of instructional visuospatial elements (e.g., written text, illustrations, pictures) and visuospatial information conveyed through human movements. In other words, as well as visuospatial and auditory information providing two separate processing streams in working memory, observing human movements could potentially add a third processing stream by a specialized processor (see also Wong et al., 2009). Hence, observing gestures, manipulations, and other human movements could recruit additional working memory resources to deal with this information, and free more working memory capacity to deal with the visuospatial and auditory information conveyed in the instructional topics.

Additionally, the specialized processor for human movement could lead to richer representations of the instructional topics in working memory. As such, if the information processed by this specialized processor is integrated with the information from the visuospatial or auditory processor, learning could be boosted. This is like the referential connections (see Clark & Paivio, 1991) or the mental conversions (see Mayer, 2022) that boost learning by allowing integration of information between the visuospatial and auditory processors. The integration allowed by the specialized processor should be greater when the information conveyed in the different streams is complementary, not redundant (see Kalyuga & Sweller, 2022). In all, the specialized processor avenue would not only involve research about a reduction in potential overload in one processing stream, but also about nonredundant interconnections between two or three different processing streams.

Evidence

The cognitive optimization due to a specialized processor for human movement can be used to explain some of the beneficial learning effects of observing gestures (e.g., Austin et al., 2018; Bentley et al., 2023; Brucker et al., 2015, 2022) and manipulations (e.g., Feyereisen, 2009; Springer, 2014). For example, Feyereisen (2009) investigated 44 adults (59% females) attempting two memory tasks (cued recall and recognition) with diverse gestures of hand actions (e.g., “peel a potato,” “sharpen a pencil,” “push a balloon”). Following a within-subjects experimental design, all participants completed the memory tasks after: (a) only reading the description of each action (control), (b) reading and observing the gesture of each action being made by the experimenter, and (c) reading and making the gesture of each action. The results showed that either observing or making the gestures was a more effective strategy than only reading about the actions.

Regarding the observation of manipulations, Springer (2014) reported an investigation with 78 undergraduates (51% females) learning about molecular representations in organic chemistry. In the experimental condition, groups of students were shown manipulations of 3D computer chemical models, made by the instructor. In the control condition, these manipulations and the models were not shown. While the overall results from the molecular structures learning test indicated superior performance by the experimental group over the control group, this study does not conclusively establish whether the observed manipulations or the models themselves were the primary influencing variables.

In all, these examples tend to support the beneficial effects of observing gestures and manipulations on learning, which could be explained by the specialized processor avenue. Note that both the specialized processor and the offloaded cognition avenues involve recruiting additional processing power from human movements (see Table 1). The difference between the two avenues is that the specialized processor avenue entails research about students observing others (e.g., teachers, instructors) making the movements, whereas the offloaded cognition avenue concerns students making the movements themselves. Because the specialized processor and offloaded cognition avenues both recruit additional processing for working memory, they share one influencing feature, as described next.

Influencing Features

The specialized processor avenue (like the offloaded cognition avenue) predicts larger effects when working memory is more taxed. That is, observing human movements is predicted to produce larger effects under two contributing factors: (a) when tasks are more demanding, or (b) when learners have lower working memory capacity or less expertise.

An experiment with 62 university students learning English words in video lectures (Pi et al., 2022) investigated the first factor where more demanding tasks show larger gesturing effects. An experimental condition showing many words on the board (more visually demanding) was compared to a condition with few words (less demanding). As predicted, pointing gestures made by the instructor were only beneficial in the more demanding format and were ineffective in the less demanding format.

The other contributing factor, namely, participants’ cognitive capacity, was investigated by Brucker et al. (2015) in a study with 45 university students (69% females) learning about fish movements. The participants studied through dynamic visualizations complemented with instructors making gestures. Visuospatial working memory capacity was indirectly measured with a test of spatial ability, the Paper Folding Test (see applications in Castro-Alonso & Atit, 2019). Results showed that when students watched gestures that matched the fish motions, only low visuospatial learners benefited. Observing the same gestures did not help high visuospatial students.

In addition to the demands on working memory, another influencing feature of the specialized processor avenue is the degree of duplicated or redundant information conveyed by this processor. When using the visuospatial and/or auditory processing streams to learn, the observation of gestures or object manipulations will be more effective when these human movements provide nonredundant information (i.e., original information that is not already conveyed in the visuospatial or auditory streams) to the specialized processor (cf. Kalyuga & Sweller, 2022). This prediction for complementary information between the processors was tested by Austin et al. (2018) in an experiment with 125 adults (50% females) recalling a route given with auditory descriptions and gestures. Crucially, there were gesture conditions providing redundant information (e.g., the description indicated “turn right” while the gesture made a “turn right” movement) and gesture conditions providing nonredundant information (e.g., the description indicated “turn” but only the gesture showed that it was “turn right”). As expected, observing the gestures was more effective when these hand movements provided complementary rather than redundant information, as this complementarity allows referential connections (Clark & Paivio, 1991) or mental conversions (Mayer, 2022) in working memory.

Conclusion

Different working memory processors can be activated by observing visuospatial learning information and the information of human movements (e.g., gestures and object manipulations). This relatively separate processing in working memory allows an overall larger capacity for the learning task if gestures or manipulations are shown. Also, a richer representation of an instructional topic can be formed in working memory by integrating information from the visuospatial, auditory, and human movement processors. An influential feature of this avenue, analogously to offloaded cognition, is the degree of demands on working memory resources. Consequently, observing gesturing or manipulations is predicted to be more effective when attempting tasks that demand more working memory processing, or when students have less working memory capacity. Another influencing feature is the redundancy of the information conveyed via gestures and manipulations, in relation to the information in the visuospatial and auditory processors. Hence, redundant or duplicated information between the streams will be less effective than complementary information that allows referential connections between the processors.

Signaling

Introduction and Explanation

The signaling principle in multimedia learning promotes adding visual signals or cues to highlight the most important information to aid understanding (see Castro-Alonso et al., 2019a; Castro-Alonso et al., 2021a; de Koning & Jarodzka, 2017; de Koning et al., 2009; van Gog, 2022). These signals (e.g., arrows, underlining, colors, frames), when added to the learning information, indicate to students when and where to focus their attention, so they do not attend to less relevant information (Castro-Alonso et al., 2019a; van Gog, 2022).

As signaling helps learning by not wasting attentional and cognitive resources in processing less relevant information, it allows more working memory resources to be devoted to learning. This means that there is some overlap between the specialized processor and the signaling avenues in that both increase effective working memory capacity for learning. An important difference is that research on these avenues focuses on different influencing features (see below).

When the signals are provided with cues from the human body, such as pointing fingers (e.g., Pi et al., 2017), gesturing hands (e.g., Cook et al., 2017; Pi et al., 2019), or observing eyes (e.g., Chacón-Candia et al., 2023), there is the extra factor of embodied signaling. There is evidence (e.g., de Koning & Jarodzka, 2017; de Koning & Tabbers, 2013; Pi et al., 2017) suggesting that embodied signaling (e.g., with human limbs) could be more effective than general signaling with non-human cues (e.g., arrows). For example, a gesturing finger or hand could be a better signaling device than a manipulative object (e.g., wooden pointer, plastic frame; cf. Davoli & Brockmole, 2012). This has connections to the research that has supported placing hands near visual stimuli to increase attention and memory for those stimuli (see Brockmole et al., 2013). Research on both general signaling and embodied signaling constitutes the signaling research avenue, which can help explain some of the beneficial effects of observing gestures and human movements on learning.

Evidence

Meta-analyses (e.g., Alpizar et al., 2020; Schneider et al., 2018) have reported that signaling is effective for increasing retention and transfer test scores. Much of this research has been conducted using multimedia pedagogical agents, because under these conditions controlling gesturing versus non-gesturing agents tends to be easier than with human agents (cf. Cook et al., 2017). Multimedia pedagogical agents are on-screen characters designed to facilitate multimedia learning (e.g., Castro-Alonso et al., 2021b; Moreno et al., 2001), especially when they can signal and gesture (see Fiorella & Mayer, 2022). For example, Wang et al. (2018) studied 109 undergraduates (88% women) learning about synaptic transmission through a multimedia module that included either a pointing or a non-pointing pedagogical agent. Retention and transfer test results showed that the pointing agent was more effective than the non-pointing agent (see an extension in Li et al., 2023).

Similarly, Cook et al. (2017) investigated children learning math equivalence through computer instructions aided by multimedia pedagogical agents. In this controlled study, the effects of pedagogical agents who pointed and made other gestures were compared to those of non-gesturing agents. All other variables, such as eye gaze and body movements (except for gesturing), were equal in both types of agents. Results showed that the group of students learning from the gesturing agents presented faster learning and higher transfer of learning than the students with the non-gesturing agents. Analogously, Fiorella and Mayer (2016a) reported that students who observed the hands drawing physics instructional diagrams (hands plus diagrams being made) achieved higher understanding scores than students who only observed the diagrams being drawn without the hand depictions (only diagrams being made).

Influencing Feature

The influencing feature of signaling is its focus area. Signaling is more effective when the signaling devices (e.g., arrows, human limbs) are specific rather than nonspecific. In other words, pointing to or framing a specific location in the learning material is more effective than signaling broader visual areas. For example, in a study with 123 undergraduates (85% females) learning about neuron synapsis, Li et al. (2019) observed that multimedia pedagogical agents doing specific pointing to a key area of the depiction were more effective than those doing nonspecific pointing to the whole depiction, which in turn were more effective than those showing non-pointing general gestures.

Craig et al. (2015) studied 77 adult participants learning about the formation of lightning from a multimedia module assisted with one of three types of speaking pedagogical agents: a specific signaling agent who pointed to the relevant areas when she mentioned them, a nonspecific signaling agent who pointed to a broader area, and a non-gesturing agent (control). Results on the retention and concept-based questions showed that both signaling agents were more effective than the non-gesturing agent. Although both gesturing agents were similarly effective, the effect sizes were larger for the agent showing specific signaling.

Relatedly, there is research showing detrimental effects of adding static or moving images of hands near the learning depictions (e.g., Castro-Alonso et al., 2015, 2018; Schroeder & Traxler, 2017). As discussed by the authors of these studies, the negative effect of adding these hands could be attributed to those depictions being redundant or distracting (cf. Kalyuga & Sweller, 2022) instead of providing specific signaling to the essential learning areas.

Conclusion

Signaling indicates to students the most important or relevant learning parts to focus their attention on. This means that it can increase the working memory resources available for learning. There is accumulating evidence that observing signaling that is done with the hands and other human limbs may produce better signaling effects than other customary instructional signals, such as arrows and frames. Hence, signaling (both general and embodied) is another research avenue that provides evidence to explain the beneficial effects on learning by observing human movements, gesturing, and manipulations. An influencing feature of this avenue is the focus area of the signaling, because specific signaling, which focuses on the key parts of the learning elements, can be more effective than nonspecific signaling.

Making or Observing Human Movements

The last research avenue of this article is unique in that it can contribute to the beneficial effects on learning from either making or observing human movements. We commented above that observation and imitation mechanisms produce similar brain responses to making and observing human movements (see Cracco et al., 2018; Rizzolatti & Craighero, 2004). However, the evidence tends to show that it is more beneficial for learning and other cognitive processes to make rather than observe human movements (e.g., Bokosmaty et al., 2017; Duijzer et al., 2019; Jang et al., 2017; Kontra et al., 2015; Schwartz & Plass, 2014; Stull et al., 2018; see Dargue et al., 2019). Despite this difference, either making or observing human movements provides beneficial effects on learning that can be explained, at least partially, by this last research avenue.

Social Cognition

Introduction and Explanation

Making or observing human movements can be effective learning strategies because they are examples of primary abilities, which have evolved in order for the human species to survive and thrive (see Geary, 2002, 2008, 2012). Primary abilities (e.g., making and observing hand movements) have evolved over a long period of time and are fairly straightforward to learn. In contrast, secondary abilities (e.g., producing and reading texts) have a much shorter human history and require more effort to learn. As such, Paas and Sweller (2012) promote using primary knowledge to support the learning of secondary knowledge, as the former requires far less conscious effort (see also Bjorklund, 2022; Geary, 2002, 2008, 2012).

Social cognition (see Kampis & Southgate, 2020) comprises the primary abilities of folk psychology (e.g., Geary, 2002), which originally developed from nonverbal communication with other members of the tribe by understanding their social cues (e.g., gestures, facial expression, eye gaze). Social cognition is a fundamental primary ability that is believed to have prepared the way for humans to communicate, manipulate tools, and evolve their full contemporary processing potential (cf. Jolly, 1966). These evolutionary trends affect our present life as human primates, where social interactions and nonverbal communication are key to learning and developing our intellect (see Bjorklund, 2022; Dunbar, 2009; Laland & Seed, 2021). Hence, social cognition research can help explain the beneficial effects on learning of making or observing gesturing and manipulations.

Evidence

As social cognition entails nonverbal communication, it affects both the sender and the receiver of this communication. In other words, makers of human movements (senders) and observers of human movements (receivers) have been investigated in the social cognition avenue. Here, we consider the evidence of making and observing gestures and object manipulations, and how communicating by these human movements benefits learning and instruction.

As reviewed by Alibali (2005) for the case of spatial information, making gestures while talking about spatial features can be helpful for both senders and receivers of this type of communication. In a more general review, which also included spatial and math tasks, Goldin-Meadow and Wagner (2005) concluded that a clearer message could be conveyed by learners who made gestures while talking. Thus, learners who gesture can enhance communication with their learning peers and teachers, which benefits learning (see also Dargue et al., 2019; Ping & Goldin-Meadow, 2010). At the receiving end, observing gestures can also benefit learning or memory performance (e.g., Sánchez-Borges & Álvarez, 2023; see also Dargue et al., 2019).

These results support both the broader social cognition research and also the more specific social agency theory (see Moreno et al., 2001; see also Castro-Alonso et al., 2021b; Fiorella & Mayer, 2022). Social agency theorists predict that gesturing and other human movements can predispose learners to engage in social interchanges with the instructional agents (e.g., teachers) and boost their learning. Various studies (e.g., Li et al., 2019; Mayer & DaPra, 2012; Pi et al., 2019; Wang et al., 2018) have supported this prediction by revealing that agents showing more gesturing and facial expression are more effective than agents showing less of this nonverbal communication.

Analogous effects to gesturing can be expected with object manipulations, as making or observing these manipulations could also be effective aids for communication and learning. For example, as described by the material-engagement theory (see Malafouris, 2020), the use of objects between individuals (making and observing object manipulations) can affect their communication and create new meanings for them. These effects could be directed toward enhancing learning.

Influencing Feature

The effects investigated in this avenue are influenced by the degree of communication that can be conveyed via the human movements (e.g., gesturing and object manipulation). When the hand movements that are made or observed convey a meaningful message, they are more effective than human motion that is rather meaningless (see Dargue et al., 2019). Note that making a hand movement that sends a meaningful message (social cognition avenue) does not always imply making a relevant movement for learning (physical activity avenue). For example, making a thumbs-up gesture conveys a clear message, but it is less relevant for learning about the sky above than pointing upward with the index finger.

An example supporting the importance of meaningful hand movements is provided in two experiments (N = 229, 80% females) by Beege et al. (2020). They compared the effectiveness of observing a male lecturer making either meaningful or rather meaningless gestures in instructional videos. Meaningful movements were conveyed as pointing gestures that signaled areas of the learning depictions, while these depictions were explained verbally. Less meaningful gesturing involved making beat or rhythmic movements that were more arbitrary and not visually associated with the presentations. Both experiments showed that retention test scores were higher in the groups of students learning from meaningful pointing gesturing, compared to beat gesturing. Similarly, in a study with 51 adult participants (69% females) learning about cell division through instructional videos, Kang et al. (2013) observed that the instructor was more effective when making meaningful (representational) gestures than beat gestures.

Conclusion

Social cognition serves as another significant research avenue contributing to the understanding of how embodiment, particularly the creation or observation of human movements, enhances learning and instructional effectiveness. Social cognition is a type of primary ability that has equipped human beings to effortlessly communicate by making and observing hand movements (e.g., gestures, manipulations). An influencing feature of this avenue is the degree of communication that can be conveyed via making or observing these movements, which allows the prediction that gestures and manipulations will be more instructionally effective when they are more meaningful.

Discussion

The research on embodied cognition has provided an understanding that cognitive processes can be helped by the whole body and the environment (Glenberg, 1997; Glenberg et al., 2013). An important implication for education is that when the brain, the rest of the body, and the environment act together, such as when making or observing human hand movements, learning and instruction can be enhanced (e.g., see Fiorella, 2022). We have delineated six research trajectories (avenues) that elucidate why making and observing human movements, primarily gestures and object manipulations executed by hands, constitute effective strategies for both learning and instruction (see also Castro-Alonso et al., 2019b), and have identified influential features that should be considered within this realm of research.

Three research avenues (physical activity, generative learning, and offloaded cognition) investigate the learning benefits associated with making human movements (e.g., gesturing and object manipulation). Two research avenues (specialized processor and signaling) investigate cases of students observing human movements being made by others (e.g., instructors, teachers, and fellow students). The last research avenue, social cognition (rooted in folk psychology), can explain the benefits of making or observing human movements. We note again that these avenues can be complementary and not mutually exclusive.

A further contribution of this article is that it includes research that has not been traditionally linked with embodied cognition. For example, our generative learning avenue considers the interactive, constructive, active, and passive framework (ICAP; Chi & Wylie, 2014), which is more commonly employed by multimedia rather than embodiment researchers. Also in generative learning, embodied cognition is not typically connected with emotions (Plass & Kalyuga, 2019) or the IKEA effect (Norton et al., 2012). Likewise, the signaling avenue includes examples from general signaling, which is not related to embodied cognition (de Koning et al., 2009). By including these research areas in our avenues, we hope to broaden the scope of embodied cognition.

Nine Instructional Implications

Making Human Movements

The first set of four instructional implications from this article focuses on encouraging learners to make movements during learning. Studies in different learning domains have shown the effectiveness of making gestures (e.g., Macken & Ginns, 2014; Mierowsky et al., 2020; Zhang et al., 2023) and object manipulations (e.g., Forbes-Lorman et al., 2016; Kontra et al., 2015). Therefore, the first instructional implication is that instructors and teachers should encourage students to gesture and manipulate objects, to boost their learning and problem-solving achievements.

However, not all human movements are equally effective. A second instructional implication (physical activity) is that the movements that students are instructed to make should be relevant for and integrated in time with the learning tasks (Mavilidi et al., 2018; Skulmowski & Rey, 2018). For example, Mavilidi et al. (2020) observed that relevant movements (e.g., dancing when learning a foreign word for “dance”) were more effective than irrelevant movements (e.g., moving without dancing).

A third instructional implication (generative learning) is that the movements that students are guided to make should be related to students’ experience, personal input, and emotions, rather than nonpersonal movements. As reported by Mason et al. (2013), allowing personal decisions when making human movements (e.g., drawing in a personal style) can be more effective than making nonpersonal movements (e.g., drawing by copying).

A fourth implication for instructors and teachers (offloaded cognition) is that they should allow students to make gestures and manipulations particularly when the tasks are cognitively challenging, such as tasks involving heavy visuospatial processing (e.g., Eielts et al., 2020; Marstaller & Burianová, 2013). When the tasks are easier, or when learners have larger availability of working memory capacity (e.g., more expert learners), making gestures or object manipulations could be redundant (cf. Kalyuga & Sweller, 2022) and of little value.

Observing Human Movements

Another set of four instructional implications (fifth to eighth) focuses on the observation of movements. The fifth instructional implication is that instructors and teachers should show human movements in their instructional activities and tasks. This could allow students to benefit from observing movements such as gesturing (e.g., Pi et al., 2019) and manipulations (e.g., de Koning et al., 2019).

The observation of these movements is more effective under specific circumstances, leading to the sixth instructional implication. This implication (specialized processor), linked to the fourth (offloaded cognition), is that observing gestures and manipulations made by others should be particularly important when the learning tasks are too demanding on working memory or visuospatial processing (e.g., Brucker et al., 2015), or when the available working memory capacity of the learners is limited (e.g., less expert learners). In contrast, non-demanding tasks could be less aided by observing human movements made by teachers, instructors, or peers.

The seventh implication (specialized processor) is that observing gestures and object manipulations should be more effective when these human movements convey information that is not already given in visuospatial or auditory forms, as this could be redundant and counterproductive for learning (Kalyuga & Sweller, 2022). Consequently, teachers and instructors should provide new information with their gestures and manipulations; information that is complementary to what they are already conveying with visualizations and auditory descriptions (e.g., Austin et al., 2018).

The eighth implication (signaling) is that the human movements shown by instructors, pedagogical agents, or peers should aim to signal specific areas of learning information. In that sense, observing movements that do not signal the most important learning elements or areas could be less effective (e.g., Craig et al., 2015; Li et al., 2019, 2023).

Making or Observing Human Movements

The ninth and last implication (social cognition) can be applied to both making and observing human movements. Teachers and instructors should allow students to make and observe human movements that foster communication between students and between teachers and students (e.g., Alibali, 2005; Beege et al., 2020; Goldin-Meadow & Wagner, 2005; Kang et al., 2013; Mayer & DaPra, 2012). This implication, which can be related to the second implication (physical activity), means that instructors should promote meaningful and communicative use of gesturing and manipulation in learning activities.

Future Research Directions

Although much of the research presented here supports the effectiveness of human body movement, there could also be cases of effective static human body parts. For example, static images of fingers could be used for counting (offloaded cognition) and still images of hands could be used by instructors for pointing to relevant learning information (signaling; e.g., Castro-Alonso et al., 2018; de Koning & Tabbers, 2013). Although static body parts may be less effective than moving body parts (e.g., Castro-Alonso et al., 2015), future research could investigate how to make static human depictions more effective for learning and instruction.

Related to making human movements and learning, future research should consider the inclusion of moderating variables. For example, producing more vigorous human movements (physical activity) could be compared to making less energetic movements (generative learning and offloaded cognition). Analogously (cf. Mason et al., 2013), making more personally relevant movements (generative learning) could be compared to producing movements that are less personal (offloaded cognition).

Similarly, future investigations could tackle the most influential variables when observing human movements, for example, from the specialized processor and signaling research literature. As such, conditions that tax the student’s cognitive resources (e.g., Pi et al., 2022) should be compared to less demanding conditions in which the specialized processor avenue is expected to be less effective. Also, the learning benefits of fostering interconnections between (a) pictorial (visuospatial processor) or narrated (auditory processor), and (b) hand information (human movement processor) could be investigated (cf. Austin et al., 2018). In addition, signaling research should investigate the most promising signaling movements made with the body under different learning conditions, as well as under different working memory demands.

Also, both the avenues of offloaded cognition and specialized processor entail recruiting extra processing capacity from gestures and manipulations. The difference is that offloaded cognition concerns research about making these movements, whereas specialized processor concerns research about observing these movements. Future research could tackle both avenues by investigating, for example, how the available working memory capacity is affected differently by either making or observing these human movements.

Lastly, the social cognition research avenue could test variables that affect the degree of meaning conveyed in gesturing and object manipulation. For example, cultural differences (see Wang, 2021) that affect understanding of gesturing communication could be investigated. Also, the degree of communication that can be achieved by making and observing the manipulation of different objects could be studied.

Conclusion

The research on embodied cognition can provide invaluable insights into more effective learning and instruction. We described six research avenues, which are not mutually exclusive, that can help explain and predict the beneficial educational effects of one aspect of embodiment, namely, making or observing human movements (e.g., gesturing and object manipulation). By reviewing these research avenues and emphasizing their influential features, we hope to inspire future research on embodied cognition for learning, problem solving, and instruction.