Combining gaze and AI planning for online human intention recognition

https://doi.org/10.1016/j.artint.2020.103275

Abstract

Intention recognition is the process of using behavioural cues, such as deliberative actions, eye gaze, and gestures, to infer an agent's goals or future behaviour. In artificial intelligence, one approach to intention recognition is to use a model of possible behaviour and rate intentions as more likely if they are a better ‘fit’ to the actions observed so far. In this paper, we draw on literature linking gaze and visual attention, and we propose a novel model of online human intention recognition that combines gaze and model-based AI planning to build probability distributions over a set of possible intentions. In human-behavioural experiments (n=40) involving a multi-player board game, we demonstrate that adding gaze-based priors to model-based intention recognition improved the accuracy of intention recognition by 22% (p<0.05) and determined those intentions ≈90 seconds earlier (p<0.05), at no additional computational cost. We also demonstrate that, when evaluated in the presence of semi-rational or deceptive gaze behaviours, the proposed model is significantly more accurate (a 9% improvement, p<0.05) than model-based or gaze-only approaches. Our results indicate that the proposed model could be used to design novel human-agent interactions in cases where we are unsure whether a person is honest, deceitful, or semi-rational.

Introduction

Autonomous software agents and robots are becoming part of modern society [1], and therefore the development of autonomous agents that can function as productive members of human-agent teams becomes ever more important. To achieve effective interactions with humans, artificial agents must reason about the goals, intentions, and beliefs of other agents [2]. By observing the history of other agents' actions—such as physical actions or verbal utterances—agents can build a basis for recognising intentions and predicting future actions, which in turn shapes their interactions with these agents.

Recent work has successfully used automated planning [3] for model-based intention recognition of intelligent agents. These approaches rely on sequences of already performed physical (ontic) actions [4], [5] to construct plans and project possible future actions, from which they predict intentions. Ontic actions are actions that modify the state of the world. In contrast to the popularity of ontic actions for building intention recognition models, nonverbal signals such as gaze have been relatively under-explored. Gaze is a crucial signal in human nonverbal communication and, as such, offers promising directions for enhancing interactions in human-agent teams [6] and improving intention recognition [7], [8], [9]. With decreasing cost and increasing robustness, eye trackers are entering the consumer market. Eye movements play an important role in planning and executing actions and intentions [10]—both in the short and long term. Eye tracking is used to understand decision making [11], to improve learning and training systems [12], [13], [14], [15], and in human-robot interaction [6], [16] and games [17]. Agents can monitor humans' eye movements to anticipate future actions [7], [9] and to determine their level of engagement [18], [19]. Researchers now have a better understanding of how to link eye tracking and attention [11], [20], [13], [21], [22], [23]. These investigations suggest that there is huge potential for intelligent agents to use gaze implicitly to derive people's intentions and adapt the interaction accordingly.
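To make the model-based component concrete, one common formulation (following the probabilistic plan recognition of Ramírez and Geffner cited above) scores a candidate goal by how much extra cost the observed actions incur relative to an optimal plan for that goal. The specific likelihood used in this paper may differ; the following is only a sketch of the standard form:

    P(G \mid O) \propto P(O \mid G)\, P(G), \qquad
    P(O \mid G) \propto \exp\{-\beta\, \Delta(G, O)\}, \qquad
    \Delta(G, O) = c(G, O) - c(G, \overline{O})

Here c(G, O) is the cost of the cheapest plan for goal G that complies with the observations O, c(G, \overline{O}) is the cost of the cheapest plan for G that avoids them, and \beta captures how rational the observed agent is assumed to be. The prior P(G) is exactly where gaze-based evidence can be injected.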

However, existing approaches that use gaze for intention recognition have limitations. First, machine learning dominates existing approaches [24], [25], [26]. Though successful, such models require sufficient training data — something we do not have in many of our applications. Second, models learnt by machine learning algorithms are often opaque, which limits the prospects of explaining the reasons behind their inferences. Finally, most models have been evaluated in contexts requiring prediction of a single intention, for example [7], [24], and these intentions are usually short-term or proximal intentions.

In recent work [9], we proposed a novel intention recognition approach that incorporated visual behaviour into model-based intention recognition and demonstrated that it substantially improved recognition performance. In the current paper, we improve this model in three ways. First, we incorporate a more robust and realistic model of visual attention that accounts for different forms of foveal vision. Second, we extend the model to handle distal intentions, rather than just proximal intentions. Third, we extend the evaluation to show that our model is robust to semi-rational gaze behaviours arising from non-task-related gaze data, including deceptive gaze.

Fig. 1 shows a simple example of our model. In this example, based on the board game Ticket to Ride described in Section 4, the player is trying to navigate a path (e.g. direct a vehicle) between Santa Fe and one of the other cities in the graph. The intention recognition problem is to determine the destination city. On the left, we see that the route from Santa Fe to Denver has already been traversed. We argue that this implies that the probability of the final destination being Oklahoma is smaller than that of any of the other nodes: a rational navigator heading to Oklahoma would more likely traverse the path from Santa Fe to Oklahoma directly. Existing model-based approaches capture exactly this reasoning. However, from this single traversed route, we are unable to distinguish the probabilities of the outer nodes (Seattle, Calgary, Winnipeg, etc.)—they are all the same distance from Denver. Now consider the example on the right, in which we know that the person has been looking at the route from Helena to Seattle. We argue that this represents a potential future action, and that Seattle is therefore a more likely final destination than Calgary or Winnipeg. We argue further that Calgary and Winnipeg are still more likely than Oklahoma, which fits neither our observed navigation actions nor our observed gaze actions.
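To make this concrete, below is a minimal, self-contained sketch (in Python) of the reasoning behind Fig. 1. All of the numbers (route costs, fixation counts), the Boltzmann-style weighting with its beta parameter, and the additive smoothing are invented for illustration; this is not the paper's implementation.

    # Toy version of the Santa Fe example: combine a plan-cost-based likelihood
    # with a gaze-based prior over candidate destination cities.
    # All costs and fixation counts below are invented for illustration.
    import math

    # Remaining shortest-path cost from the current position (Denver) to each
    # candidate destination, and the cheapest cost directly from the start (Santa Fe).
    cost_from_current = {"Seattle": 4, "Calgary": 4, "Winnipeg": 4, "Oklahoma": 3}
    cost_from_start = {"Seattle": 6, "Calgary": 6, "Winnipeg": 6, "Oklahoma": 2}
    cost_observed_so_far = 2  # the Santa Fe -> Denver route has been traversed

    def plan_likelihood(goal, beta=1.0):
        """Boltzmann weighting of the extra cost the observed prefix incurs for this goal."""
        cost_via_observations = cost_observed_so_far + cost_from_current[goal]
        extra_cost = cost_via_observations - cost_from_start[goal]
        return math.exp(-beta * extra_cost)

    # Gaze-based prior: share of recent fixations on routes leading towards each goal
    # (here, the player has mostly looked at the Helena -> Seattle route).
    fixation_counts = {"Seattle": 8, "Calgary": 1, "Winnipeg": 1, "Oklahoma": 0}
    smoothed = {g: c + 1.0 for g, c in fixation_counts.items()}  # avoid zero priors
    total_fixations = sum(smoothed.values())
    gaze_prior = {g: v / total_fixations for g, v in smoothed.items()}

    # Posterior over destinations: gaze prior combined with the plan-based likelihood.
    unnormalised = {g: gaze_prior[g] * plan_likelihood(g) for g in gaze_prior}
    total = sum(unnormalised.values())
    posterior = {g: v / total for g, v in unnormalised.items()}

    for goal, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"{goal:9s} {p:.2f}")

Running this sketch ranks Seattle highest while keeping Calgary and Winnipeg well above Oklahoma, mirroring the argument above.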

We evaluated our model through a human-behavioural study. The study involved 40 players playing a digital multi-agent board game while their gaze data was being recorded. We compared a model that used gaze data only (GazeOnly), a model-based approach using only in-game ontic actions (OnticActionOnly), and a model enhanced with gaze-based priors (Gaze+OnticAction), and we evaluated the three models in the presence of uncertain (deceptive) gaze. We find that the proposed Gaze+OnticAction model outperforms the other two at recognising both proximal and distal intentions and maintains this performance irrespective of whether gaze behaviour is natural or deceptive. We also demonstrate the potential of the enhanced gaze model to help understand the different forms of foveal vision [27]. The results provide evidence that our combined approach using gaze and ontic actions for intention recognition of human behaviour is robust enough that it could be used to improve interaction design between agents and humans in cases with uncertain gaze data, extending to cases in which a person is honest (stable gaze data), deceitful (highly uncertain gaze behaviour), or somewhere in between (semi-rational gaze behaviour).

The key contributions of this work are as follows:

  • 1.

    A computational model of gaze for intention recognition inspired by existing research on eye-tracking and visual attention.

  • 2.

    An empirical evaluation of the model demonstrating the success of the model at recognising both distal and proximal intentions.

  • 3.

    An empirical evaluation of the model in the presence of uncertain gaze behaviour.

  • 4.

    A model appropriate for situations in which designers do not have sufficient data to train intention recognition models and require transparent inferences that offer better prospects of explaining intention-recognition decisions.

Broadly, our experimental results show that gaze actions are intentional and, therefore, an indicator of humans' task-related intentions. Harnessing this signal can help improve human-agent interactions by assisting agents to reason about their human counterparts more quickly and accurately, giving agents the ability to improve their proactiveness. We provide a theoretical model for combining gaze with an AI-planning-based intention recognition approach, and a computational model of gaze that can be used to investigate links between eye tracking and visual attention.

Section snippets

Background

In this section, we start with the basics of eye-tracking and eye movements and their relationship to visual attention. We highlight existing works in human-computer interaction and related fields that use eye-tracking. Finally, we describe related contributions in model-based intention recognition.

Model

In this section, we detail our system, which consists of two independent components constituting the input of our intention recognition algorithm: (1) the gaze model proposed in this paper, which processes the gaze information and uses the concepts of fixation count and fixation duration on areas of interest (AOIs) to determine the probabilities of different intentions; and (2) the plan-based model, which takes an action model and an observed sequence of actions and determines the probability of each candidate intention given those observations.
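As a complement to this description, the following hedged sketch shows one way the gaze component could turn fixations on areas of interest into a distribution over intentions. The equal weighting of fixation count and fixation duration, the circular AOIs, and the one-to-one mapping from AOIs to intentions are our assumptions for illustration, not the paper's exact formulation.

    # Sketch of a gaze model: map fixations to AOIs, then score each candidate
    # intention by the fixation count and total fixation duration it attracted.
    from dataclasses import dataclass

    @dataclass
    class Fixation:
        x: float
        y: float
        duration_ms: float

    @dataclass
    class AOI:
        name: str
        x: float        # centre of the area of interest
        y: float
        radius: float
        intention: str  # the candidate intention this AOI is evidence for (assumed mapping)

    def normalise(d):
        total = sum(d.values())
        return {k: (v / total if total else 1.0 / len(d)) for k, v in d.items()}

    def gaze_distribution(fixations, aois, w_count=0.5, w_duration=0.5):
        """Probability distribution over intentions from fixations on AOIs."""
        counts = {a.intention: 0.0 for a in aois}
        durations = {a.intention: 0.0 for a in aois}
        for f in fixations:
            for a in aois:
                if (f.x - a.x) ** 2 + (f.y - a.y) ** 2 <= a.radius ** 2:
                    counts[a.intention] += 1.0
                    durations[a.intention] += f.duration_ms
        counts, durations = normalise(counts), normalise(durations)
        return normalise({k: w_count * counts[k] + w_duration * durations[k] for k in counts})

The resulting distribution can then serve as the prior that is combined with the plan-based probabilities, as in the Fig. 1 sketch above.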

Study

In this section, we describe the study used to evaluate the proposed models. We used a multi-player game called Ticket to Ride2 (TTR) to elicit human intentions in a controlled environment, as in our previous work [47], [52]. Fig. 2 shows a screenshot of the game and its corresponding target areas, including two circles around the city of Chicago. These circles represent the vision outside central foveal vision implemented by fv(di).
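The paper's calibrated weighting fv(di) is not reproduced in this snippet, so the following is only an assumed illustration of what a foveal weighting of that kind might look like: a fixation contributes fully to an AOI inside central foveal vision and with reduced weight in the surrounding region suggested by the circles in Fig. 2. The radii and weights are placeholders.

    # Hypothetical foveal weighting: full contribution inside central foveal vision,
    # partial contribution in the surrounding region, none in the periphery.
    def foveal_weight(distance_px, fovea_radius_px=50.0, parafovea_radius_px=120.0):
        if distance_px <= fovea_radius_px:
            return 1.0   # central foveal vision
        if distance_px <= parafovea_radius_px:
            return 0.5   # outside the central fovea (the circles in Fig. 2)
        return 0.0       # periphery: ignored in this sketch

In the gaze-model sketch above, a weight like this could replace the hard inside-the-AOI test, so that fixations near an AOI still contribute, just with less influence.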

Results

In this section, we first discuss the performance of our combined system (Gaze+OnticAction) compared with the single-input systems in the natural condition. We then evaluate how the performance differs when subjected to the deceptive gaze behaviours elicited in the deceptive condition. We tested the effects of the recognition approach on the dependent variables with Welch's t-test. We also tested the data for normality using Shapiro-Wilk tests and did not find any significant deviations from normality.
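For readers who want to reproduce this style of analysis, the snippet below shows the two named tests using SciPy; the arrays are placeholder values, not the study's measurements.

    # Welch's t-test and Shapiro-Wilk normality test on hypothetical accuracy scores.
    import numpy as np
    from scipy import stats

    accuracy_gaze_ontic = np.array([0.82, 0.88, 0.79, 0.91, 0.85])  # placeholder data
    accuracy_ontic_only = np.array([0.61, 0.70, 0.66, 0.72, 0.64])  # placeholder data

    print(stats.shapiro(accuracy_gaze_ontic))   # normality check, group 1
    print(stats.shapiro(accuracy_ontic_only))   # normality check, group 2
    print(stats.ttest_ind(accuracy_gaze_ontic,
                          accuracy_ontic_only,
                          equal_var=False))     # Welch's t-test (unequal variances)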

Discussion

Knowing the intentions of a partner (or competitor) can be advantageous in social settings, and gaze is a natural source of information for this purpose. In this paper, we have proposed an intention recognition approach that combined gaze with model-based intention recognition. Specifically, we have:

  • 1.

    Proposed a computational model of gaze for intention recognition inspired by existing research on eye-tracking and visual attention.

  • 2.

    Empirically validated the model, demonstrating its success at recognising both distal and proximal intentions.

Conclusion

In this paper, we extended our existing model [9], which combines gaze and model-based online intention recognition to infer the intentions of humans, by proposing an enhanced gaze model. Going beyond previous work, we empirically validated the model for predicting distal intentions and in the presence of semi-rational gaze behaviours.

Human-behavioural experiments demonstrated that gaze-based priors significantly improved the accuracy and quickness (horizon) of the inferences when compared with the ontic-action-only approach.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by two Australian Government Research Training Program Scholarships, the Microsoft Research Centre for Social NUI, and Defence Science and Technology Group CERA Grant 02.

References (72)

  • H. Geffner et al., A Concise Introduction to Models and Methods for Automated Planning: Synthesis Lectures on Artificial Intelligence and Machine Learning (2013)
  • M. Ramírez et al., Probabilistic plan recognition using off-the-shelf classical planners
  • R. Pereira et al., Landmark-Based Heuristics for Goal Recognition (2017)
  • H. Admoni et al., Social eye gaze in human-robot interaction: a review, J. Hum. Robot Interact. (2017)
  • O. Dermy et al., Multi-modal intention prediction with probabilistic movement primitives
  • R. Singh et al., Combining planning with gaze for online human intention recognition
  • T. Foulsham, Eye movements and their functions in everyday tasks, Eye (2015)
  • J.L. Rosch et al., A review of eye-tracking applications as tools for training, Cogn. Technol. Work (2013)
  • D. Das et al., Supporting human–robot interaction based on the level of visual focus of attention, IEEE Trans. Human-Mach. Syst. (2015)
  • E. Velloso et al., The emergence of eyeplay: a survey of eye interaction in games
  • N. Kirchner et al., Nonverbal robot-group interaction using an imitated gaze cue
  • H. Admoni et al., Are you looking at me?: perception of robot attention is mediated by gaze type and group size
  • S. Brams et al., Does effective gaze behavior lead to enhanced performance in a complex error-detection cockpit task?, PLoS ONE (2018)
  • A. Korbach et al., Differentiating different types of cognitive load: a comparison of different measures, Educ. Psychol. Rev. (2018)
  • A.T. Duchowski et al., The index of pupillary activity: measuring cognitive load vis-à-vis task difficulty with pupil oscillation
  • M. Meißner et al., The promise of eye-tracking methodology in organizational research: a taxonomy, review, and future avenues, Organ. Res. Methods (2018)
  • C.-M. Huang et al., Using gaze patterns to predict task intent in collaboration, Front. Psychol. (2015)
  • R. Bednarik et al., A Computational Approach for Prediction of Problem-Solving Behavior Using Support Vector Machines and Eye-Tracking Data (2013)
  • R. Ishii et al., Effectiveness of Gaze-Based Engagement Estimation in Conversational Agents (2013)
  • D.A. Robinson, The oculomotor control system: a review, Proc. IEEE (1968)
  • A.T. Duchowski, Eye Tracking Methodology (2017)
  • D.D. Salvucci et al., Identifying fixations and saccades in eye-tracking protocols
  • M. Corbetta et al., Control of goal-directed and stimulus-driven attention in the brain, Nat. Rev. Neurosci. (2002)
  • Y. Abdelrahman et al., Cognitive heat: exploring the usage of thermal imaging to unobtrusively estimate cognitive load, IMWUT (2017)
  • Y. Abdelrahman et al., Classifying attention types with thermal imaging and eye tracking, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2019)
  • A. Esteves et al., Orbits: gaze interaction for smart watches using smooth pursuit eye movements