Introduction

The papers included in this Special Issue present a collection of research aiming to untangle the visual perception processes that occur in educational contexts, and specifically, in naturalistic learning and teaching settings. The studies combine various research methods in innovative ways to study student engagement in a classroom setting during instruction (Goldberg et al., this issue; Rosengrant et al., this issue), teacher expertise and “professional vision” in instructional contexts (Seidel et al., this issue; Wyss et al., this issue), and the characteristics of student-teacher interaction (Haataja et al., this issue; Pouta et al., this issue). A common denominator of the studies is that they utilize eye tracking, which offers highly time-sensitive information about visual perception and attention. Some studies have used specialized equipment to track the precise eye movements (i.e., fixations and saccades) of a participant, whereas others have relied on sophisticated automated image processing to extract the general direction of eye gaze of individual students from a video recording of a whole classroom.

The basic idea behind eye tracking is that eye gaze is assumed to reflect the contents of thought. For example, during reading, it is typically assumed that the reader is actively processing the piece of text their eyes are fixated on, and longer and more frequent gazes are taken as indicators of the more complex or difficult processing required to comprehend the information (Just and Carpenter 1980). Tracking the direction and duration of eye gaze thus reveals how cognitive processes unfold over time. The studies reported in this Special Issue utilized eye tracking to examine different aspects of learning and teaching in classroom contexts. Rosengrant et al. (this issue) used eye tracking to study fluctuation in student engagement and vigilance during instruction by identifying time periods of lectures when students were gazing at their cell phones or peers instead of following the lecturer or looking at their notes. Goldberg et al. (this issue) utilized machine vision and machine learning algorithms to analyze a set of different variables (eye gaze, head pose, and facial expression) extracted from a classroom video to predict student engagement. Seidel et al. (this issue) tracked the eye movements of expert and novice teachers while they evaluated student profiles, and Wyss et al. (this issue) examined whether teachers notice “critical incidents” in a classroom video. Pouta et al. (this issue) studied teacher-student interaction during classroom math instruction by tracking the eye movements of teachers. Haataja et al. (this issue) tracked the eye movements of both students and teachers during instruction to study the relationship between teachers’ interpersonal behavior and direct eye contact.

The papers included in this Special Issue demonstrate that the eye tracking methodology that has long been used mainly in laboratory settings (for reviews, see Rayner 1998, 2009) can also be utilized to study behavior in complex naturalistic environments. However, as evidenced by the present studies, there are many challenges. The first issue is related to the operationalization of the theoretical constructs. The papers in this Special Issue use eye tracking to study student engagement, teacher expertise, and student-teacher interaction. However, it is not always clear how the observed eye movement patterns reflect these theoretical concepts and the underlying psychological processes. The second issue is related to analyzing eye movement data. The main advantage of the methodology is that it can provide detailed information about the time-course of processing. However, it seems that this potential of the methodology has not always been utilized to the fullest. Moreover, many of the papers included in this issue report t tests, ANOVAs, or simple correlations, which do not take into account the typical features of eye movement data. In the following, I will discuss and reflect on these issues in more detail and make connections to research in other domains. Finally, I will briefly comment on future directions for eye tracking research in education that seem to stem from the current set of papers.

Eye Gaze as a Measure of Student Engagement

Two of the papers in this Special Issue used eye tracking to track student engagement during classroom teaching (Goldberg et al., this issue; Rosengrant et al., this issue). The term engagement is used with various meanings, but in the present studies, it refers to the degree to which the attentional resources of an individual are concentrated on the learning task, as indicated by eye movements (see Miller 2015). The study by Rosengrant et al. (this issue) used eye movements to categorize students’ classroom behavior during a lecture: if a student’s eye gaze was fixated on the lecturer or the class notes, the student was coded as being engaged with teaching. If, however, the gaze traveled somewhere else (e.g., the student was looking at their cell phone), the student was categorized as being off-task. By looking at the distribution of on-task and off-task instances during a lecture, Rosengrant et al. gained insights into how engagement fluctuated across time. These results resonate well with recent eye tracking studies that have examined momentary and sustained changes in cognitive engagement during reading (Ballenghein et al. 2020; Kaakinen et al. 2018). These studies suggest that in addition to momentary changes in the attentional resources directed to text information, there are also sustained changes that evolve during the task and that have been mostly ignored in previous studies. For example, these studies showed that attention was directed toward task-relevant text information whenever it was encountered in the text, indicating task-induced momentary changes in attentional engagement (see also Kaakinen and Hyönä 2014). Moreover, the results indicated that engagement with relevant information remained constant across the whole text, whereas engagement with irrelevant text information gradually decreased across the reading task, implying that there are also more sustained changes in reader engagement. As indicated by the results of Rosengrant and colleagues, examining the fluctuation of engagement as it evolves across time, instead of averaging on-task and off-task instances across the whole duration of the task, provides novel insights into students’ behavior and learning.
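To make this concrete, below is a minimal sketch (in Python) of such a time-binned analysis. It is not the pipeline used by Rosengrant and colleagues; the gaze-target labels, sample format, and bin size are assumptions chosen purely for illustration.

```python
# Minimal sketch of a time-binned engagement analysis (illustrative only).
# Assumes gaze samples as (timestamp_in_seconds, gaze_target) tuples with
# hypothetical target labels; "lecturer" and "notes" count as on-task.
from collections import defaultdict

def on_task_proportion(samples, bin_s=60, on_task=("lecturer", "notes")):
    """Return {bin_index: proportion of gaze samples on on-task targets}."""
    counts = defaultdict(lambda: [0, 0])          # bin -> [on-task count, total count]
    for t, target in samples:
        b = int(t // bin_s)
        counts[b][0] += target in on_task
        counts[b][1] += 1
    return {b: on / total for b, (on, total) in sorted(counts.items())}

samples = [(3.2, "lecturer"), (35.0, "notes"), (65.4, "phone"), (70.1, "notes"), (130.0, "peer")]
print(on_task_proportion(samples))   # e.g. {0: 1.0, 1: 0.5, 2: 0.0}
```

Plotting these proportions across bins gives the kind of fluctuation profile discussed above, rather than a single on-task percentage averaged over the whole lecture.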

Even though eye gaze data can provide information about the engagement (and disengagement) of attentional resources in learning-related contexts (see e.g., Faber et al. 2018), combining eye movements with other measures might provide useful practical and theoretical information. Goldberg et al. used machine vision and a machine learning approach to analyze students’ engagement from classroom videos. In addition to eye gaze, they also included measures of head pose, facial expressions, and synchronization with the surrounding peers as features in the models. The results were promising: the automated analysis correlated with manual coding of the videos and predicted self-reported cognitive engagement and involvement. This multimethod approach to studying engagement is in line with recent studies combining eye movements with simultaneous recordings of head and effector movements (Ballenghein et al. 2020; Kaakinen et al. 2018) to study reader engagement. The results of these recent studies suggest that eye and head (and effector) movements reveal different dimensions of engagement: whereas eye movements are mostly sensitive to momentary changes, head and effector movements also reflect sustained changes in engagement. As these studies demonstrate, combining eye tracking data with other process measures can provide extremely useful information about student engagement.

Eye Gaze as an Indicator of Teacher Expertise

Several papers in this Special Issue used eye tracking to study teacher expertise, or the “professional vision” of teachers, by presenting videos depicting classroom events and student behavior (Seidel et al., this issue; Wyss et al., this issue) or by examining teachers’ eye gaze behavior in an actual instructional context (Pouta et al., this issue). Eye tracking has long been utilized to study expert behavior in different domains, such as chess and radiology (Gegenfurtner et al. 2011; Reingold and Sheridan 2011), and recently also in teaching (Beach and McConnel 2019) and education (Jarodzka et al. 2017). Reingold and Sheridan (2011) suggested that eye movements are sensitive to two particular aspects of expert behavior: superior domain-related perceptual skills and tacit domain-relevant knowledge. First, superior domain-related perceptual skills mean that experts are able to perceive larger patterns and form more holistic impressions than novices. Second, the tacit knowledge that experts have is reflected in their eye movement behavior but is not necessarily available for conscious thought. For example, whereas novices may neither fixate nor detect an abnormality in a radiological image, expert radiologists may fixate it but fail to report detecting it.

The studies included in this Special Issue provide some evidence about the nature of the domain-specific perceptual skills of expert teachers. Seidel et al. (this issue) report that expert teachers show a different pattern of eye movements than novices while evaluating students’ profiles in a video: they tend to make more fixations on uninterested, underestimated, and struggling students. Unfortunately, the statistical evidence is rather weak. The results by Wyss et al. (this issue) also demonstrate expertise effects in eye movements. When asked to freely view a classroom video scene in which a “critical incident” happens, expert teachers made more fixations than novices on the student who is disrupted during the critical incident. A more detailed analysis revealed that the six expert teachers who reported noticing the critical incident made more fixations and spent a longer time looking at the student during the incident than participants who did not report noticing it. These results suggest that (at least some) expert teachers show domain-specific perceptual skills and are able to attend to relevant information. Moreover, these experts are consciously aware of what they are looking at.

In contrast, Pouta and colleagues (this issue) did not find expertise effects in eye gaze behavior. They utilized mobile eye tracking to study how expert and novice teachers interact with students in a math instruction setting. Both groups seemed to engage in “checking in” on the student—as indicated by gaze shifts from the materials to the student and then back to the materials—during instruction episodes. However, experts used more supporting instructions, whereas novices used mainly non-supporting instructions; as a result, experts made more of these “checking in” gaze shifts during supporting instructions and novices during non-supporting instructions.

The studies included in this Special Issue have looked at the perceptual skills of experienced teachers in very different contexts, and thus, it is not surprising that the results do not form a unified view of what “expert teacher vision” is. When the task is to evaluate student profiles (Seidel et al., this issue), different types of information are likely to be relevant than when the task is to freely observe the classroom situation (Wyss et al., this issue) or when the teacher is interacting with the student (Pouta et al., this issue). It has been known since the seminal work of Yarbus (1967) that eye gaze patterns are heavily influenced by the task, and thus, the manifestations of expert teachers’ improved perceptual skills in a classroom setting are likely to be task and context specific (Gegenfurtner et al. 2011). In some contexts, expertise might mean having a greater perceptual span and being able to perceive larger patterns (e.g., Reingold and Sheridan 2011), whereas in other contexts, experts might focus on smaller or fewer relevant areas in the visual environment (e.g., Wolff et al. 2016). Thus, it is important to consider the specific task and the context in which it is performed when examining expertise effects in eye movements (see Jarodzka et al. 2017).

The issue of context specificity is addressed in the theoretical paper by Wolff et al. (this issue), which introduces “classroom management scripts” to conceptualize the differences between expert and novice teachers. Their model provides a view of how the knowledge structures and cognitive processes of an expert teacher might differ from those of a novice, resulting in different situational awareness and different decisions about how to act in a given classroom situation. Experts are assumed to engage in automatic scanning of the visual environment for relevant cues and to be driven by top-down control of eye movements guided by their knowledge. These assumptions are in line with the idea that experts sometimes demonstrate a greater perceptual span, whereas in other situations they focus only on relevant areas in the visual environment. According to the model, which of these occurs depends on the classroom event representation the teacher has formed in the situation.

Observing eye movement patterns in different naturalistic task contexts provides information about the typical eye gaze patterns of experts and novices. In contrast, experimental studies would allow testing hypotheses about the role of specific cognitive processes in expert behavior. An interesting approach would be to follow the suggestion by Reingold and Sheridan (2011) and examine the effective perceptual span with a gaze-contingent display paradigm, in which participants can only see a restricted part of the screen around the point of gaze in high resolution. By manipulating the size of the window, one could test how much information expert and novice teachers pick up in one glance. If expert teachers have a wider perceptual span and can perceive larger patterns, they should be more disturbed by the restricted viewing window than novices. On the other hand, if in some contexts experts actually focus on smaller or fewer relevant areas (Wolff et al. 2016), they should not be disturbed by the restricted viewing window, whereas novices might even benefit from it.
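As a rough illustration of the moving-window logic, the sketch below shows only the masking step; an actual experiment updates the display in real time with dedicated stimulus software and live gaze coordinates, and the frame, gaze position, and window radius used here are arbitrary placeholders.

```python
# Illustrative masking step of a gaze-contingent moving-window display (not a full experiment).
import numpy as np

def apply_gaze_window(frame, gaze_xy, radius_px):
    """Blank everything outside a circular window centred on the current gaze position."""
    h, w = frame.shape[:2]
    ys, xs = np.ogrid[:h, :w]
    outside = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2 > radius_px ** 2
    masked = frame.copy()
    masked[outside] = 128            # uniform grey outside the visible window
    return masked

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)   # stand-in classroom frame
visible = apply_gaze_window(frame, gaze_xy=(320, 240), radius_px=100)
```

Varying radius_px across conditions corresponds to manipulating the size of the visible window described above.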

Eye Movements as an Indicator of Teacher-Student Interaction

An interesting application of the eye tracking method presented in this Special Issue is to use it to study teacher-student interaction (Haataja et al., this issue; Pouta et al., this issue). In the study by Pouta et al. (this issue), eye tracking was used to examine how experienced teachers and teacher students interact with students during actual teaching situations. By coding the instruction episodes on the basis of verbal reports given during cued retrospective reporting, they found that novices were more likely to use non-supporting instructions. They then analyzed teachers’ eye movements during different types of instruction episodes and found that experienced teachers made more gaze shifts from the teaching materials to the student’s face and then back to the materials during supporting instruction episodes. Novice teachers also made these gaze shifts, but because they mostly gave non-supporting instruction, their gaze shifts occurred during those episodes.

Haataja et al. (this issue) recorded both the teachers’ and the students’ eye gaze to study teacher-student interaction. They rated the teacher’s interpersonal behavior (i.e., communion and agency) from the video recordings of the classroom and then analyzed the level of communion and agency demonstrated by the teacher as a function of the type of gaze contact: teacher gazing at the student, student gazing at the teacher, teacher-initiated eye contact, and student-initiated eye contact. The results suggested that the eye gaze behavior of both the teacher and the students is associated with the teacher’s interpersonal behavior.

As demonstrated by these studies, eye tracking can potentially reveal important aspects of teacher-student interaction. However, something that neither of the papers directly considered is the teacher’s sensitivity to student emotion. Emotions evolving in the learning situation shape the kinds of processing and knowledge exploration strategies the student is likely to utilize (e.g., Loderer et al. 2018; Muis et al. 2015; Vogl et al. 2020), and one could argue that an expert teacher should be sensitive to changes in students’ emotions. Indeed, the results by Seidel et al. (this issue) suggest that expert teachers are sensitive to students’ emotional and motivational status and pay more attention to students who might need adaptive pedagogical action. Facial expressions are informative about student emotions and engagement (e.g., Harley et al. 2015; see also Goldberg et al., this issue), and glances at the student’s face and eye area could be indicators of the teacher’s attempts to check whether the student is expressing, for example, anxiety, boredom, concentration, confusion, or interest.

Analyzing Eye Tracking Data

One clear benefit of the eye tracking methodology is that it provides temporally very detailed information about the time-course of processing. There are many ways to analyze eye movement data, and the computation of eye movement measures and the subsequent data analysis should naturally be guided by the research question. The researcher has to make several decisions about how the raw data are preprocessed before the actual statistical analyses can be conducted, and one should be aware of how these decisions might influence the outcome of the study (Orquin and Holmqvist 2018).

Global analysis of eye movement characteristics (e.g., fixation duration and saccade amplitude) is useful for describing the general effects of a task or differences between participant groups, but often, a more detailed analysis based on areas of interest (AOIs) is needed. In AOI-based analysis of eye movement data, AOIs appropriate to the research question and the task are first identified and defined. The decision of what constitutes an appropriate target to be defined as an AOI should be guided by theory and support hypothesis testing. For example, in reading research, AOIs often represent single words, as the theories tested make predictions about the processing of single words embedded in a sentence context. However, sometimes it is more useful to examine the processing of phrases or full sentences embedded in a textual context and to define AOIs that represent target sentences (see Hyönä et al. 2003). The next step is then to calculate different eye movement measures, such as the number and duration of fixations or gazes made on the AOIs. In reading studies, different measures are usually reported to describe the processing of words or sentences during the initial encounter with the information (i.e., first-pass) and later reprocessing (see, e.g., Kaakinen 2017). In many of the studies included in this Special Issue, only total fixation time or gaze duration (and/or number of fixations) is reported. However, total fixation time or gaze duration sums up all fixations or gazes made on a particular area of interest, and it provides very limited information about the time-course of processing (see also Orquin and Holmqvist 2018). Measures such as the time to fixate an area of interest for the first time, time spent during the first pass, the number and duration of reinspections of an AOI, and the number and duration of look-backs initiated from an AOI would provide a much richer view of the processes that occur during the task (see also Hyönä et al. 2003).
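To illustrate the distinction between first-pass and later measures, a minimal sketch of an AOI-based computation is given below; the fixation-list format and AOI labels are hypothetical, and dedicated eye tracking toolboxes provide far more complete implementations.

```python
# Hedged sketch: first-pass vs. total AOI measures from a chronologically ordered
# list of (aoi_label, fixation_duration_ms) fixations (hypothetical data format).
def aoi_measures(fixations):
    measures, prev_aoi = {}, None
    for aoi, dur in fixations:
        m = measures.setdefault(aoi, {"first_pass_ms": 0, "total_ms": 0,
                                      "n_fixations": 0, "n_visits": 0})
        if aoi != prev_aoi:              # gaze entered this AOI: a new visit begins
            m["n_visits"] += 1
        if m["n_visits"] == 1:           # still within the first pass through this AOI
            m["first_pass_ms"] += dur
        m["total_ms"] += dur
        m["n_fixations"] += 1
        prev_aoi = aoi
    return measures                      # reinspections per AOI = n_visits - 1

fixations = [("student_face", 220), ("student_face", 180), ("worksheet", 250), ("student_face", 300)]
print(aoi_measures(fixations))
```

In this toy example, total fixation time on the student's face is 700 ms, but only 400 ms of it is first-pass time; reporting only the total would hide the reinspection.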

In addition to computing different eye movement measures that are based on averaging or summing up fixations made on AOIs (e.g., first-pass vs. later reinspections), another approach would be to examine the probability of fixating AOIs as a function of time—an analysis strategy typically applied in the visual world paradigm (see e.g., Huettig et al. 2011). The idea is to examine how eye gaze is directed to and shifted between different AOIs during the task, providing information not only on which AOIs attract the most gazes overall but also on the changes in “preferred” AOIs across time and the sequence in which the AOIs are gazed at (e.g., Mudrick et al. 2019).
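A sketch of this kind of time-binned fixation-probability analysis, under assumed sample and AOI labels, might look like the following.

```python
# Hedged sketch: proportion of gaze samples on each AOI within successive time bins,
# pooled over participants (sample format and AOI labels are hypothetical).
import numpy as np

def fixation_probability(samples, aois, bin_s=1.0, duration_s=10.0):
    """samples: list of (participant_id, timestamp_s, aoi_label) gaze samples.
    Returns an (n_bins, n_aois) array of fixation proportions per time bin."""
    n_bins = int(duration_s / bin_s)
    counts, totals = np.zeros((n_bins, len(aois))), np.zeros(n_bins)
    idx = {a: i for i, a in enumerate(aois)}
    for _, t, aoi in samples:
        b = min(int(t / bin_s), n_bins - 1)
        totals[b] += 1
        if aoi in idx:
            counts[b, idx[aoi]] += 1
    return counts / np.maximum(totals, 1)[:, None]

probs = fixation_probability(
    [(1, 0.2, "teacher"), (1, 1.3, "slides"), (2, 0.4, "teacher"), (2, 1.1, "teacher")],
    aois=["teacher", "slides", "peer"], duration_s=2.0)
print(probs)   # rows are time bins, columns are AOIs
```

Plotting the columns of this matrix over time yields the familiar fixation-probability curves used in visual world studies.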

Sometimes answering the research question might require analyzing the (dis)similarities in the sequences of eye movements within or between individuals (see Anderson et al. 2015; Le Meur and Baccino 2013). The idea of scanpath analysis is to take into account both the spatial and temporal aspects of eye movement behavior when evaluating how similar or dissimilar the scanning patterns are, for example, within or between different participant groups. Different methods for performing scanpath comparisons provide slightly different types of information, and the method should be chosen on the basis of the research question and hypotheses (Anderson et al. 2015; Le Meur and Baccino 2013). For example, in a recent study by McIntyre and Foulsham (2018), scanpath analyses were conducted to examine differences between expert and novice teachers while they were engaged in classroom instruction. The results showed that teachers’ scanpaths were more similar within than between expertise groups and that expert teachers tended to refer back to the students with focused gaze during both talking and questioning. This kind of analysis strategy could provide valuable information about expert vision in different educational contexts.
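One simple member of this family of methods reduces each scanpath to a string of AOI labels and compares the strings with an edit distance (see Anderson et al. 2015 for this and more sophisticated alternatives); a minimal sketch with hypothetical AOI labels is given below.

```python
# Hedged sketch: scanpath dissimilarity as a normalized string-edit (Levenshtein) distance
# between AOI label sequences (a simplification of the methods reviewed by Anderson et al. 2015).
def edit_distance(a, b):
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr_row = [i]
        for j, cb in enumerate(b, 1):
            curr_row.append(min(prev_row[j] + 1,                 # deletion
                                curr_row[j - 1] + 1,             # insertion
                                prev_row[j - 1] + (ca != cb)))   # substitution
        prev_row = curr_row
    return prev_row[-1]

def scanpath_dissimilarity(path_a, path_b):
    """0 means identical AOI sequences; 1 means maximally different."""
    return edit_distance(path_a, path_b) / max(len(path_a), len(path_b))

expert = ["student", "worksheet", "student", "worksheet"]
novice = ["worksheet", "worksheet", "student", "board"]
print(scanpath_dissimilarity(expert, novice))   # 0.5 for these toy sequences
```

Averaging such dissimilarity scores within and between expertise groups is one way to quantify whether scanpaths are more similar within than between groups, as reported by McIntyre and Foulsham (2018).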

The statistical analyses should naturally be chosen so that they answer the research questions and address the hypotheses. They should also be able to handle the typical characteristics of eye movement data. Often the “traditional” ANOVA or t test approach, which is prevalent in the papers of this Special Issue, is not optimal. Depending on the exact nature of the dataset, the appropriate methods should be able to deal with issues such as the correlational structure of the data, missing observations, and unbalanced designs. For example, when observing the eye movements of teachers viewing a classroom video, there probably are individual differences in scanning behavior. The eye fixation or gaze times observed for an individual are not independent, and the statistical analysis method applied should take into account the correlational structure of the data. Moreover, information is lost when averages are computed across observations (as is done with ANOVAs), and missing data might introduce bias into the means.

In reading research, linear mixed models have become the gold standard for eye movement data analyses, as they allow modeling the random variance between participants and items (e.g., Baayen et al. 2008; Judd et al. 2017). They can handle occasional missing observations, which typically occur in eye tracking data due to blinks or other unexpected participant behavior. Generalized linear mixed models can be used for non-continuous measures: fixation count data can be modeled with methods based on the Poisson distribution, and categorical outcomes such as the probability of a fixation or refixation on an AOI can be analyzed with mixed logit models (Jaeger 2008), which also allow modeling changes in fixation probabilities across time (see Barr 2008). However, the use of these statistical methods requires sufficient statistical power, which is influenced by both the number of participants and the number of “items” or observations per participant (Brysbaert and Stevens 2018). In many of the studies included in this Special Issue, the power seems to be relatively low, which is likely to impact the generalizability and replicability of the results. It is understandable that recruiting participants (schools, teachers, students) for classroom studies is not easy, but a sufficient sample size is necessary for obtaining generalizable results (see also Orquin and Holmqvist 2018). As power is influenced by the number of observations per participant, long test sessions may help in gaining enough data, but the sufficient sample size also depends on the research design and effect size (Brysbaert and Stevens 2018).
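As a minimal Python illustration (the column names and data are invented, and crossed random effects for participants and items are more commonly fit with lme4 in R), a random-intercept model for gaze durations could be specified roughly as follows.

```python
# Hedged sketch: linear mixed model with a random intercept per participant,
# fit on toy data (column names hypothetical; real analyses need far more data).
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "gaze_ms":     [310, 450, 280, 520, 390, 260, 610, 330, 480, 570, 420, 350],
    "expertise":   ["expert"] * 6 + ["novice"] * 6,
    "participant": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

model = smf.mixedlm("gaze_ms ~ expertise", data, groups=data["participant"])
result = model.fit()
print(result.summary())
```

The fixed effect of expertise is then estimated while the repeated observations from each participant are modeled through the participant-level random intercept, rather than averaged away as in an ANOVA.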

Machine learning and data mining approaches are gaining popularity in educational research (see Koedinger et al. 2015), and as demonstrated by Goldberg et al. (this issue), machine learning algorithms can be used to extract useful information from datasets that might be hard to get a grip on with traditional data analytic methods. As stated by D’Mello et al. (2020), “Such models are particularly useful when theoretical understanding is insufficient, when the data are rife with nonlinearities and interactivity, and when researchers aspire to take advantage of ‘big data’.” For example, Lou et al. (2017) used support vector machines to identify literacy skills from readers’ eye movement data, which consisted of several measures reflecting the time-course of processing of different segments of text. The models could predict students’ literacy skills with high accuracy. Similarly, D’Mello et al. (2020) used Random Forest models to predict text comprehension scores on the basis of six global eye movement features. When applied in the correct way, this “big data” approach to eye movement data can inform theory construction in addition to producing potential practical educational applications.
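A schematic sketch of this kind of predictive modeling (not the pipelines of Lou et al. 2017 or D’Mello et al. 2020; the feature set and data are synthetic) could look as follows.

```python
# Hedged sketch: predicting a comprehension score from global eye movement features
# with a Random Forest and cross-validation (synthetic data, invented feature set).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_readers = 80
# Hypothetical per-reader features: mean fixation duration, fixation count,
# mean saccade amplitude, proportion of regressive saccades (standardized).
X = rng.normal(size=(n_readers, 4))
y = 0.5 * X[:, 0] - 0.3 * X[:, 3] + rng.normal(scale=0.5, size=n_readers)   # synthetic scores

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Mean cross-validated R^2: {scores.mean():.2f}")
```

Cross-validated performance on held-out readers, rather than fit to the training data, is the relevant criterion when the goal is to predict skills or outcomes for new students.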

An important thing to keep in mind is that even though different eye movement measures provide detailed information about the time-course of processing, gaze duration or fixation duration is just another reaction time measure. As expected on the basis of the eye-mind hypothesis (Just and Carpenter 1980), longer eye fixation time on some piece of information is thought to reflect a more complex cognitive process. However, longer eye fixation time might reflect either processing difficulty or extra processing effort needed to successfully comprehend the information. Whether longer eye fixation time is a “good” or a “bad” thing can only be defined by using measures that reflect the quality of the processing or its outcome (see Ferreira and Yang 2019). In the studies included in this Special Issue, this limitation has been addressed by using different measures to validate the eye movement data: self-reported engagement and learning outcomes (Goldberg et al., this issue), correct recognition of student profiles (Seidel et al., this issue), quality of instruction (Pouta et al., this issue), and retrospective think-alouds (Pouta et al., this issue; Wyss et al., this issue).

Conclusions and Future Directions

The studies in this Special Issue raised some critical questions related to (1) the operationalization of the theoretical constructs of interest and (2) the most efficient and appropriate ways to analyze eye movement data. These two issues are closely intertwined. Even though eye movement recordings provide rich information about the moment-to-moment processes occurring in the different contexts covered in this Special Issue, the critical question is what increased fixation times or numbers of fixations tell us about the underlying processes of students’ engagement, teachers’ expertise, or the quality of student-teacher interaction. Most of the current studies offer insights into this question by utilizing a multi-method approach, in which eye movement recordings are combined with various outcome measures. As valuable as these studies are in setting the future directions of research in this area, there are some shortcomings with respect to taking advantage of the full potential of eye movement recordings in revealing the time-course of processing, the statistical analysis methods applied to the data, and the statistical power of the studies. Future research would benefit from careful consideration of these issues.

The current studies lay out some general trends and point to potential future directions for eye movement research on classroom behavior. First, it seems that the focus is widening from analyzing the eye movement behavior of an individual to examining the eye movements of interacting pairs and groups. As demonstrated by the papers of this Special Issue, teacher-student interaction is a fundamental aspect of classroom teaching (Haataja et al., this issue; Pouta et al., this issue), and individual students in a classroom are influenced by their surrounding peers (Goldberg et al., this issue; Rosengrant et al., this issue). Analyzing the simultaneous eye movement behavior of pairs or groups of students will pose novel challenges for data analysis, but once these problems are solved, there is huge potential for understanding the role of social interactions in learning and education.

Another important future direction is to advance the understanding of the role of emotions in learning and teaching. The study by Goldberg et al. (this issue) included facial expression analysis as part of their measurement battery for student engagement, and the studies by Haataja et al. (this issue) and Pouta et al. (this issue) implied that emotions might be underlying factors in student-teacher interaction patterns. Seidel et al. (this issue) showed that expert teachers are sensitive to the motivational and emotional status of the student. Emotions emerging in the learning situation influence knowledge exploration and processing strategies (e.g., Loderer et al. 2018; Muis et al. 2015; Vogl et al. 2020), and understanding how different emotions evolve and change in learning contexts is key to understanding the processes underpinning learning. As recently stated by Art Graesser (2019): “Emotions are the experiential glue of learning environments in the 21st century”. To date, very little is known about the relationship between different emotions and eye movements specifically in learning contexts, and there is a great need for empirical work that explores this further. As eye movement recordings provide detailed information about the time-course of processing, they have great potential for advancing our understanding of the dynamics of the emotional and cognitive processes underpinning learning.

The third future trend that emerges is the combination of eye movement recordings with other process measures. For example, Goldberg et al. (this issue) combined eye gaze tracking with analyses of facial expressions and head pose to study students’ engagement. Combinations of eye movements with simultaneous head, body, and effector movement recordings, psychophysiological measures, and measurements of brain activity can provide new perspectives on the emotional and cognitive processes underlying learning. For example, some recent studies have used simultaneous recordings of eye movements and head and effector movements to study readers’ engagement (Ballenghein et al. 2020; Kaakinen et al. 2018), and in a study by Mason et al. (2020), eye movements were combined with a psychophysiological measure of skin conductance level to examine the role of arousal in comprehending controversial information. However, while combinations of different measures potentially provide more information about the underlying processes than eye movement data alone, one should remember that there are challenges in collecting, analyzing, and interpreting multichannel data (see Azevedo and Gašević 2019).

In summary, the studies included in this Special Issue used eye gaze tracking to examine different aspects of learning and teaching in a classroom: students’ engagement, teachers’ expertise, and teacher-student interaction. The studies demonstrate the utility of eye tracking to study student and teacher behavior in naturalistic classroom contexts. Even though there are some critical issues related to the methodology that should be addressed in the future, the present studies form an important starting point for empirical work using eye movements to study visual perception in the classroom.