Combining gaze and AI planning for online human intention recognition
Introduction
Autonomous software agents and robots are becoming part of modern society [1], and therefore, the development of autonomous agents that can function as productive members of human-agent teams becomes ever more important. To achieve effective interactions with humans, artificial agents must reason about the goals, intentions, and beliefs of other agents [2]. By observing the history of other agents' actions, such as physical actions or verbal utterances, agents can build a basis for recognising intentions and predicting future actions, which in turn shapes their interactions with these agents.
Recent work has successfully used automated planning [3] for model-based intention recognition of intelligent agents. These approaches rely on sequences of already performed physical (ontic) actions [4], [5] to reconstruct plans and project possible future actions in order to predict intentions. Ontic actions are actions that modify the state of the world. In contrast to the popularity of ontic actions for building intention recognition models, nonverbal signals such as gaze have been relatively under-explored. Gaze is a crucial signal in human nonverbal communication and, as such, offers promising directions for enhancing interactions in human-agent teams [6] and improving intention recognition [7], [8], [9]. With decreasing cost and increasing robustness, eye trackers are entering the consumer market. Eye movements play an important role in planning and executing actions and intentions [10], both in the short and long term. Eye tracking has been used to understand decision making [11], to improve learning and training systems [12], [13], [14], [15], and in human-robot interaction [6], [16] and games [17]. Agents can monitor humans' eye movements to anticipate future actions [7], [9] and to determine their level of engagement [18], [19]. Researchers now have a better understanding of how to link eye tracking and attention [11], [20], [13], [21], [22], [23]. These investigations suggest that intelligent agents have huge potential to use gaze implicitly to derive people's intentions and adapt the interaction accordingly.
However, existing approaches that use gaze for intention recognition have limitations. First, machine learning dominates existing approaches [24], [25], [26]. Though successful, these models require sufficient training data, which is unavailable in many of our applications. Second, the opaqueness of models learnt by machine learning algorithms limits the prospects of explaining the reasons behind their inferences. Finally, most models have been evaluated in contexts requiring prediction of a single intention, e.g. [7], [24], and these intentions are usually short-term or proximal intentions.
In recent work [9], we proposed a novel intention recognition approach that incorporated visual behaviour into model-based intention recognition and demonstrated how it substantially improved recognition performance. In the current paper, we improve this model in three ways. First, we extend the model with a more robust and realistic model of visual attention that accounts for different forms of foveal vision. Second, we extend the model to handle distal intentions rather than just proximal intentions. Third, we extend the evaluation to show that our model is robust in that it overcomes semi-rational gaze behaviours arising from non-task-related gaze data, including deceptive data.
Fig. 1 shows a simple example of our model. In this example, based on the board game Ticket to Ride described in Section 4, the player is trying to navigate a path (e.g. direct a vehicle) between Santa Fe and one of the other cities in the graph. The intention recognition problem is to determine the destination city. On the left, we see that the route from Santa Fe to Denver has already been traversed. We argue that this implies that the probability of the final destination being Oklahoma is smaller than that of any of the other nodes: a rational navigator would more likely have traversed the path from Santa Fe to Oklahoma directly. Existing model-based approaches would rate it as such. However, from this single traversed route, we are unable to distinguish the probabilities of the outer nodes (Seattle, Calgary, Winnipeg, etc.), as they are all the same distance from Denver. Now consider the example on the right, in which we know that the person has been looking at the route from Helena to Seattle. We argue that this represents a potential future action and that Seattle is now a more likely final destination than Calgary or Winnipeg. We argue further that Calgary and Winnipeg are still more likely than Oklahoma, which fits neither our observed navigation actions nor our observed gaze actions.
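The plan-based half of this intuition can be made concrete. The sketch below is illustrative only, not the paper's implementation; it follows the cost-difference formulation of probabilistic plan recognition with classical planners (see the references), under which a destination is more probable the less the observed moves deviate from an optimal plan for reaching it. The cost tables are hypothetical numbers for the Fig. 1 map.

```python
# Illustrative sketch (not the paper's exact formulation): cost-based goal
# recognition. A destination G is more likely when the observed moves keep
# the traveller on a cheap path to G.
import math

def goal_posterior(cost_via_obs, cost_ignoring_obs, beta=1.0):
    """P(G | O) ~ exp(-beta * (cost complying with O - optimal cost to G)).

    cost_via_obs / cost_ignoring_obs: dicts mapping each candidate
    destination to the cheapest plan cost that does / does not have to
    pass through the observed moves (hypothetical inputs).
    """
    scores = {g: math.exp(-beta * (cost_via_obs[g] - cost_ignoring_obs[g]))
              for g in cost_via_obs}
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

# Toy numbers for the Fig. 1 example: after traversing Santa Fe -> Denver,
# Oklahoma is only reachable by backtracking, so complying with the
# observation costs extra; the outer cities remain equally cheap.
via = {"Seattle": 3, "Calgary": 3, "Winnipeg": 3, "Oklahoma": 4}
opt = {"Seattle": 3, "Calgary": 3, "Winnipeg": 3, "Oklahoma": 2}
print(goal_posterior(via, opt))  # Oklahoma gets the lowest probability
```

Note that, as the example narrative demands, the observed traversal lowers Oklahoma's posterior but leaves the outer cities indistinguishable; only gaze can break that tie.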
We evaluated our model through a human-behavioural study in which 40 players played a digital multi-agent board game while their gaze data was recorded. We compared a model that used gaze data only (GazeOnly), a model-based approach using only in-game ontic actions (OnticActionOnly), and a model enhanced with gaze-based priors (Gaze+OnticAction), and we evaluated the three models in the presence of uncertain (deceptive) gaze. We found that the proposed Gaze+OnticAction model outperforms the other two at recognising both proximal and distal intentions and maintains this performance irrespective of whether gaze behaviour is natural or deceptive. We also demonstrate the potential of the enhanced gaze model to help understand the different forms of foveal vision [27]. The results provide evidence that our combined approach to intention recognition using gaze and ontic actions is robust enough to improve interaction design between agents and humans even with uncertain gaze data, extending to cases in which a person is honest (stable gaze data), deceitful (highly uncertain gaze behaviour), or somewhere in between (semi-rational gaze behaviour).
The key contributions of this work are as follows:
1. A computational model of gaze for intention recognition inspired by existing research on eye-tracking and visual attention.
2. An empirical evaluation of the model demonstrating its success at recognising both distal and proximal intentions.
3. An empirical evaluation of the model in the presence of uncertain gaze behaviour.
4. A model appropriate for situations in which designers lack sufficient data to train intention recognition models and require transparent inferences that offer better chances of explaining intention recognition decisions.
Broadly, our experimental results show that gaze actions are intentional and therefore an indicator of humans' task-related intentions. Harnessing this signal can help improve human-agent interactions by helping agents reason about their human counterparts more quickly and accurately, thereby improving the agents' proactiveness. We provide a theoretical model for combining gaze with an AI-planning-based intention recognition approach, and a computational model of gaze that can be used to investigate links between eye-tracking and visual attention.
Section snippets
Background
In this section, we start with the basics of eye-tracking and eye movements and their relationship to visual attention. We highlight existing work in human-computer interaction and related fields that uses eye-tracking. Finally, we describe related contributions in model-based intention recognition.
Model
In this section, we detail our system, which consists of two independent components that constitute the input to our intention recognition algorithm: (1) the gaze model proposed in this paper, which processes the gaze information and uses the concepts of fixation count and fixation duration on areas-of-interest (AOIs) to determine the probabilities of different intentions; and (2) the plan-based model, which takes an action model and an observed sequence of actions, and determines the probability of
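Since this snippet names the two components without their formulas, the following minimal sketch shows one plausible reading: fixation counts and durations on AOIs are normalised into a gaze-based prior, which is then combined with the plan-based posterior by Bayes' rule. The weights, data layout, and function names are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of the two inputs (assumed interfaces, not the authors'
# code). The gaze component turns fixation statistics on each
# area-of-interest (AOI) into a prior; the plan-based component supplies a
# probability from the observed ontic actions; Bayes' rule combines them.
def gaze_prior(fixations, w_count=0.5, w_duration=0.5):
    """fixations: {aoi: (fixation_count, total_duration_ms)} (hypothetical)."""
    total_c = sum(c for c, _ in fixations.values()) or 1
    total_d = sum(d for _, d in fixations.values()) or 1
    return {aoi: w_count * c / total_c + w_duration * d / total_d
            for aoi, (c, d) in fixations.items()}

def combine(gaze_p, plan_p):
    """Posterior over intentions: gaze-based prior times plan-based probability.
    In practice the prior would be smoothed so unseen AOIs keep nonzero mass."""
    post = {g: gaze_p.get(g, 0.0) * plan_p.get(g, 0.0) for g in plan_p}
    z = sum(post.values()) or 1.0
    return {g: p / z for g, p in post.items()}

fix = {"Seattle": (6, 2400), "Calgary": (2, 500), "Winnipeg": (1, 300)}
plan = {"Seattle": 0.32, "Calgary": 0.32, "Winnipeg": 0.32, "Oklahoma": 0.04}
print(combine(gaze_prior(fix), plan))  # Seattle now dominates
```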
Study
In this section, we describe the study used to evaluate the proposed models. We used a multi-player game called Ticket to Ride (TTR) to elicit human intentions in a controlled environment, as in our previous work [47], [52]. Fig. 2 shows a screenshot of the game and its corresponding target areas, including two circles around the city of Chicago. These circles represent the vision outside central foveal vision implemented by
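The circles described above suggest a graded treatment of gaze samples by visual acuity region. The sketch below shows one way such grading could work; the radii and weights are assumed for illustration (roughly 1 degree of visual angle for foveal and 5 degrees for parafoveal vision), as the paper's exact thresholds are not visible in this snippet.

```python
# Sketch of how a gaze sample might be graded by visual acuity region
# (assumed radii and credit weights; not the paper's exact parameters).
import math

def acuity_weight(gaze_xy, aoi_xy, px_per_degree,
                  foveal_deg=1.0, parafoveal_deg=5.0):
    """Return a weight for crediting a gaze sample to an AOI, decreasing
    from central foveal vision outwards."""
    dist_deg = math.dist(gaze_xy, aoi_xy) / px_per_degree
    if dist_deg <= foveal_deg:
        return 1.0          # inner circle: full credit
    if dist_deg <= parafoveal_deg:
        return 0.5          # outer circle: partial credit
    return 0.0              # peripheral vision: ignored

print(acuity_weight((410, 300), (400, 300), px_per_degree=35))  # foveal -> 1.0
```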
Results
In this section, we first discuss the performance of our combined system (Gaze+OnticAction) compared with the single-input systems in the natural condition. We then evaluate how this performance differs when the systems are subjected to deceptive gaze behaviours elicited in the deceptive condition. We tested the effects of the recognition approach on the dependent variables with Welch's t-test. We further tested the data for normality using Shapiro-Wilk tests and did not find any
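For readers wanting to reproduce this style of analysis, the tests named here are standard and available in SciPy; the sketch below uses hypothetical accuracy values, not the study's data.

```python
# Sketch of the reported analysis pipeline (assumed data layout): check
# normality with Shapiro-Wilk, then compare two recognition approaches
# with Welch's t-test (unequal variances).
from scipy import stats

acc_combined = [0.91, 0.88, 0.93, 0.90, 0.87]   # hypothetical accuracies
acc_ontic    = [0.74, 0.70, 0.78, 0.72, 0.69]

for name, sample in [("combined", acc_combined), ("ontic", acc_ontic)]:
    w, p = stats.shapiro(sample)                 # normality check
    print(f"Shapiro-Wilk {name}: W={w:.3f}, p={p:.3f}")

t, p = stats.ttest_ind(acc_combined, acc_ontic, equal_var=False)  # Welch
print(f"Welch's t-test: t={t:.3f}, p={p:.4f}")
```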
Discussion
Knowing the intentions of a partner (or competitor) can be advantageous in social settings, and gaze is a natural source of information for this purpose. In this paper, we have proposed an intention recognition approach that combines gaze with model-based intention recognition. Specifically, we have:
1. Proposed a computational model of gaze for intention recognition inspired by existing research on eye-tracking and visual attention.
2. Empirically validated the models demonstrating the success of the
Conclusion
In this paper, we extended our existing model [9], which combines gaze and model-based online intention recognition to infer the intentions of humans, by proposing an enhanced gaze model. Going beyond previous work, we empirically validated the model's ability to predict distal intentions and its robustness in the presence of semi-rational gaze behaviours.
Human-behavioural experiments demonstrated that gaze-based priors significantly improved the accuracy and quickness (horizon) of the inferences when compared with
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by two Australian Government Research Training Program Scholarships, the Microsoft Research Centre for Social NUI, and Defence Science and Technology Group CERA Grant 02.
References (72)
- et al., Autonomous agents modelling other agents: a comprehensive survey and open problems, Artif. Intell. (2018)
- et al., Anticipatory robot control for efficient human-robot collaboration
- et al., Attention and choice: a review on eye movements in decision making, Acta Psychol. (2013)
- et al., A review of using eye-tracking technology in exploring learning from 2000 to 2012, Educ. Res. Rev. (2013)
- et al., Eye tracking for skills assessment and training: a systematic review, J. Surg. Res. (2014)
- et al., A systematic review of eye tracking research on multimedia learning, Comput. Educ. (2018)
- Gaze-based interaction: a 30 year retrospective, Comput. Graph. (2018)
- et al., Modeling human plan recognition using Bayesian theory of mind
- The phenomenology of action: a conceptual framework, Cognition (2008)
- et al., Human–agent teaming for multirobot control: a review of human factors issues, IEEE Trans. Human-Mach. Syst. (2014)
- A Concise Introduction to Models and Methods for Automated Planning, Synthesis Lectures on Artificial Intelligence and Machine Learning
- Probabilistic plan recognition using off-the-shelf classical planners
- Landmark-based heuristics for goal recognition
- Social eye gaze in human-robot interaction: a review, J. Hum. Robot Interact.
- Multi-modal intention prediction with probabilistic movement primitives
- Combining planning with gaze for online human intention recognition
- Eye movements and their functions in everyday tasks, Eye
- A review of eye-tracking applications as tools for training, Cogn. Technol. Work
- Supporting human–robot interaction based on the level of visual focus of attention, IEEE Trans. Human-Mach. Syst.
- The emergence of eyeplay: a survey of eye interaction in games
- Nonverbal robot-group interaction using an imitated gaze cue
- Are you looking at me?: perception of robot attention is mediated by gaze type and group size
- Does effective gaze behavior lead to enhanced performance in a complex error-detection cockpit task?, PLoS ONE
- Differentiating different types of cognitive load: a comparison of different measures, Educ. Psychol. Rev.
- The index of pupillary activity: measuring cognitive load vis-à-vis task difficulty with pupil oscillation
- The promise of eye-tracking methodology in organizational research: a taxonomy, review, and future avenues, Organ. Res. Methods
- Using gaze patterns to predict task intent in collaboration, Front. Psychol.
- A computational approach for prediction of problem-solving behavior using support vector machines and eye-tracking data
- Effectiveness of gaze-based engagement estimation in conversational agents
- The oculomotor control system: a review, Proc. IEEE
- Eye Tracking Methodology
- Identifying fixations and saccades in eye-tracking protocols
- Control of goal-directed and stimulus-driven attention in the brain, Nat. Rev. Neurosci.
- Cognitive heat: exploring the usage of thermal imaging to unobtrusively estimate cognitive load, IMWUT
- Classifying attention types with thermal imaging and eye tracking, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.
- Orbits: gaze interaction for smart watches using smooth pursuit eye movements