Combining gaze and AI planning for online human intention recognition
Introduction
Autonomous software agents and robots are becoming part of modern society [1], and therefore, the development of autonomous agents that can function as productive members of human-agent teams becomes ever more important. To achieve effective interactions with humans, artificial agents must reason about the goals, intentions, and beliefs of other agents [2]. By observing the history of other agents' actions, such as physical actions or verbal utterances, agents can build a basis for recognising intentions and predicting future actions, which in turn shapes their interactions with these agents.
Recent work has successfully used automated planning [3] for model-based intention recognition of intelligent agents. These approaches rely on sequences of already performed physical (ontic) actions [4], [5] to reconstruct plans and project possible future actions in order to predict intentions. Ontic actions are actions that modify the state of the world. In contrast to the popularity of ontic actions for building intention recognition models, nonverbal signals such as gaze have been relatively under-explored. Gaze is a crucial signal in human nonverbal communication and, as such, offers promising directions for enhancing interactions in human-agent teams [6] and improving intention recognition [7], [8], [9]. With decreasing cost and increasing robustness, eye trackers are entering the consumer market. Eye movements play an important role in planning and executing actions and intentions [10], both in the short and long term. Eye tracking has been used to understand decision making [11], to improve learning and training systems [12], [13], [14], [15], and in human-robot interaction [6], [16] and games [17]. Agents can monitor humans' eye movements to anticipate future actions [7], [9] and to determine their level of engagement [18], [19]. Researchers now have a better understanding of how to link eye tracking and attention [11], [20], [13], [21], [22], [23]. These investigations suggest that intelligent agents have huge potential to use gaze implicitly to derive people's intentions and adapt the interaction accordingly.
However, existing approaches that use gaze for intention recognition have limitations. First, machine learning dominates existing approaches [24], [25], [26]. Though successful, these models require sufficient training data, which is unavailable in many of our applications. Second, the opaqueness of models learnt by machine learning algorithms limits the prospects of explaining the reasons behind their inferences. Finally, most models have been evaluated in contexts requiring prediction of a single intention, e.g. [7], [24], and these intentions are usually short-term or proximal intentions.
In recent work [9], we proposed a novel intention recognition approach that incorporated visual behaviour into model-based intention recognition and demonstrated how it substantially improved recognition performance. In the current paper, we improve this model in three ways. First, we extend the model with a more robust and realistic model of visual attention that accounts for different forms of foveal vision. Second, we extend the model to handle distal intentions rather than just proximal intentions. Third, we extend the evaluation to show that our model is robust in that it overcomes semi-rational gaze behaviours arising from non-task-related gaze data, including deceptive data.
Fig. 1 shows a simple example of our model. In this example, based on the board game Ticket to Ride described in Section 4, the player is trying to navigate a path (e.g. direct a vehicle) between Santa Fe and one of the other cities in the graph. The intention recognition problem is to determine the destination city. On the left, we see that the route from Santa Fe to Denver has already been traversed. We argue that this implies that the probability of the final destination being Oklahoma is smaller than that of any of the other nodes: a rational navigator would more likely have traversed the path from Santa Fe to Oklahoma directly. Existing model-based approaches would rate it as such. However, from this single traversed route, we are unable to distinguish the probabilities of the outer nodes (Seattle, Calgary, Winnipeg, etc.), as they are all the same distance from Denver. Now consider the example on the right, in which we know that the person has been looking at the route from Helena to Seattle. We argue that this represents a potential future action and that Seattle is now a more likely final destination than Calgary or Winnipeg. We argue further that Calgary and Winnipeg are still more likely than Oklahoma, which fits neither our observed navigation actions nor our observed gaze actions.
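The plan-based half of this intuition can be made concrete. The sketch below is illustrative only, not the paper's implementation; it follows the cost-difference formulation of probabilistic plan recognition with classical planners (see the references), under which a destination is more probable the less the observed moves deviate from an optimal plan for reaching it. The cost tables are hypothetical numbers for the Fig. 1 map.

```python
# Illustrative sketch (not the paper's exact formulation): cost-based goal
# recognition. A destination G is more likely when the observed moves keep
# the traveller on a cheap path to G.
import math

def goal_posterior(cost_via_obs, cost_ignoring_obs, beta=1.0):
    """P(G | O) ~ exp(-beta * (cost complying with O - optimal cost to G)).

    cost_via_obs / cost_ignoring_obs: dicts mapping each candidate
    destination to the cheapest plan cost that does / does not have to
    pass through the observed moves (hypothetical inputs).
    """
    scores = {g: math.exp(-beta * (cost_via_obs[g] - cost_ignoring_obs[g]))
              for g in cost_via_obs}
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

# Toy numbers for the Fig. 1 example: after traversing Santa Fe -> Denver,
# Oklahoma is only reachable by backtracking, so complying with the
# observation costs extra; the outer cities remain equally cheap.
via = {"Seattle": 3, "Calgary": 3, "Winnipeg": 3, "Oklahoma": 4}
opt = {"Seattle": 3, "Calgary": 3, "Winnipeg": 3, "Oklahoma": 2}
print(goal_posterior(via, opt))  # Oklahoma gets the lowest probability
```

Note that, as the example narrative demands, the observed traversal lowers Oklahoma's posterior but leaves the outer cities indistinguishable; only gaze can break that tie.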
We evaluated our model through a human-behavioural study in which 40 players played a digital multi-agent board game while their gaze data was recorded. We compared a model that used gaze data only (GazeOnly), a model-based approach using only in-game ontic actions (OnticActionOnly), and a model enhanced with gaze-based priors (Gaze+OnticAction), and we evaluated the three models in the presence of uncertain (deceptive) gaze. We found that the proposed Gaze+OnticAction model outperforms the other two at recognising both proximal and distal intentions and maintains this performance irrespective of whether gaze behaviour is natural or deceptive. We also demonstrate the potential of the enhanced gaze model to help understand the different forms of foveal vision [27]. The results provide evidence that our combined approach to intention recognition using gaze and ontic actions is robust enough to improve interaction design between agents and humans even with uncertain gaze data, extending to cases in which a person is honest (stable gaze data), deceitful (highly uncertain gaze behaviour), or somewhere in between (semi-rational gaze behaviour).
The key contributions of this work are as follows:
1. A computational model of gaze for intention recognition inspired by existing research on eye-tracking and visual attention.
2. An empirical evaluation of the model demonstrating its success at recognising both distal and proximal intentions.
3. An empirical evaluation of the model in the presence of uncertain gaze behaviour.
4. A model appropriate for situations in which designers lack sufficient data to train intention recognition models and require transparent inferences that offer better chances of explaining intention recognition decisions.
Broadly, our experimental results show that gaze actions are intentional and therefore an indicator of humans' task-related intentions. Harnessing this signal can help improve human-agent interactions by helping agents reason about their human counterparts more quickly and accurately, thereby improving the agents' proactiveness. We provide a theoretical model for combining gaze with an AI-planning-based intention recognition approach, and a computational model of gaze that can be used to investigate links between eye-tracking and visual attention.
Section snippets
Background
In this section, we start with the basics of eye-tracking and eye movements and their relationship to visual attention. We highlight existing work in human-computer interaction and related fields that uses eye-tracking. Finally, we describe related contributions in model-based intention recognition.
Model
In this section, we detail our system, which consists of two independent components that constitute the input to our intention recognition algorithm: (1) the gaze model proposed in this paper, which processes the gaze information and uses the concepts of fixation count and fixation duration on areas-of-interest (AOIs) to determine the probabilities of different intentions; and (2) the plan-based model, which takes an action model and an observed sequence of actions, and determines the probability of
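Since this snippet names the two components without their formulas, the following minimal sketch shows one plausible reading: fixation counts and durations on AOIs are normalised into a gaze-based prior, which is then combined with the plan-based posterior by Bayes' rule. The weights, data layout, and function names are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of the two inputs (assumed interfaces, not the authors'
# code). The gaze component turns fixation statistics on each
# area-of-interest (AOI) into a prior; the plan-based component supplies a
# probability from the observed ontic actions; Bayes' rule combines them.
def gaze_prior(fixations, w_count=0.5, w_duration=0.5):
    """fixations: {aoi: (fixation_count, total_duration_ms)} (hypothetical)."""
    total_c = sum(c for c, _ in fixations.values()) or 1
    total_d = sum(d for _, d in fixations.values()) or 1
    return {aoi: w_count * c / total_c + w_duration * d / total_d
            for aoi, (c, d) in fixations.items()}

def combine(gaze_p, plan_p):
    """Posterior over intentions: gaze-based prior times plan-based probability.
    In practice the prior would be smoothed so unseen AOIs keep nonzero mass."""
    post = {g: gaze_p.get(g, 0.0) * plan_p.get(g, 0.0) for g in plan_p}
    z = sum(post.values()) or 1.0
    return {g: p / z for g, p in post.items()}

fix = {"Seattle": (6, 2400), "Calgary": (2, 500), "Winnipeg": (1, 300)}
plan = {"Seattle": 0.32, "Calgary": 0.32, "Winnipeg": 0.32, "Oklahoma": 0.04}
print(combine(gaze_prior(fix), plan))  # Seattle now dominates
```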
Study
In this section, we describe the study used to evaluate the proposed models. We used a multi-player game called Ticket to Ride (TTR) to elicit human intentions in a controlled environment, as in our previous work [47], [52]. Fig. 2 shows a screenshot of the game and its corresponding target areas, including two circles around the city of Chicago. These circles represent the vision outside central foveal vision implemented by
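The circles described above suggest a graded treatment of gaze samples by visual acuity region. The sketch below shows one way such grading could work; the radii and weights are assumed for illustration (roughly 1 degree of visual angle for foveal and 5 degrees for parafoveal vision), as the paper's exact thresholds are not visible in this snippet.

```python
# Sketch of how a gaze sample might be graded by visual acuity region
# (assumed radii and credit weights; not the paper's exact parameters).
import math

def acuity_weight(gaze_xy, aoi_xy, px_per_degree,
                  foveal_deg=1.0, parafoveal_deg=5.0):
    """Return a weight for crediting a gaze sample to an AOI, decreasing
    from central foveal vision outwards."""
    dist_deg = math.dist(gaze_xy, aoi_xy) / px_per_degree
    if dist_deg <= foveal_deg:
        return 1.0          # inner circle: full credit
    if dist_deg <= parafoveal_deg:
        return 0.5          # outer circle: partial credit
    return 0.0              # peripheral vision: ignored

print(acuity_weight((410, 300), (400, 300), px_per_degree=35))  # foveal -> 1.0
```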
Results
In this section, we first discuss the performance of our combined system (Gaze+OnticAction) compared with the single-input systems in the natural condition. We then evaluate how this performance differs when the systems are subjected to deceptive gaze behaviours elicited in the deceptive condition. We tested the effects of the recognition approach on the dependent variables with Welch's t-test. We further tested the data for normality using Shapiro-Wilk tests and did not find any
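For readers wanting to reproduce this style of analysis, the tests named here are standard and available in SciPy; the sketch below uses hypothetical accuracy values, not the study's data.

```python
# Sketch of the reported analysis pipeline (assumed data layout): check
# normality with Shapiro-Wilk, then compare two recognition approaches
# with Welch's t-test (unequal variances).
from scipy import stats

acc_combined = [0.91, 0.88, 0.93, 0.90, 0.87]   # hypothetical accuracies
acc_ontic    = [0.74, 0.70, 0.78, 0.72, 0.69]

for name, sample in [("combined", acc_combined), ("ontic", acc_ontic)]:
    w, p = stats.shapiro(sample)                 # normality check
    print(f"Shapiro-Wilk {name}: W={w:.3f}, p={p:.3f}")

t, p = stats.ttest_ind(acc_combined, acc_ontic, equal_var=False)  # Welch
print(f"Welch's t-test: t={t:.3f}, p={p:.4f}")
```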
Discussion
Knowing the intentions of a partner (or competitor) can be advantageous in social settings, and gaze is a natural source of information for this purpose. In this paper, we have proposed an intention recognition approach that combines gaze with model-based intention recognition. Specifically, we have:
1. Proposed a computational model of gaze for intention recognition inspired by existing research on eye-tracking and visual attention.
2. Empirically validated the models demonstrating the success of the
Conclusion
In this paper, we extended our existing model [9], which combines gaze and model-based online intention recognition to infer the intentions of humans, by proposing an enhanced gaze model. Going beyond previous work, we empirically validated the model's ability to predict distal intentions and its robustness in the presence of semi-rational gaze behaviours.
Human-behavioural experiments demonstrated that gaze-based priors significantly improved the accuracy and quickness (horizon) of the inferences when compared with
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by two Australian Government Research Training Program Scholarships, the Microsoft Research Centre for Social NUI, and Defence Science and Technology Group CERA Grant 02.
References (72)
- et al., Autonomous agents modelling other agents: a comprehensive survey and open problems, Artif. Intell. (2018)
- et al., Anticipatory robot control for efficient human-robot collaboration
- et al., Attention and choice: a review on eye movements in decision making, Acta Psychol. (2013)
- et al., A review of using eye-tracking technology in exploring learning from 2000 to 2012, Educ. Res. Rev. (2013)
- et al., Eye tracking for skills assessment and training: a systematic review, J. Surg. Res. (2014)
- et al., A systematic review of eye tracking research on multimedia learning, Comput. Educ. (2018)
- Gaze-based interaction: a 30 year retrospective, Comput. Graph. (2018)
- et al., Modeling human plan recognition using Bayesian theory of mind
- The phenomenology of action: a conceptual framework, Cognition (2008)
- et al., Human–agent teaming for multirobot control: a review of human factors issues, IEEE Trans. Human-Mach. Syst. (2014)
- A Concise Introduction to Models and Methods for Automated Planning, Synthesis Lectures on Artificial Intelligence and Machine Learning
- Probabilistic plan recognition using off-the-shelf classical planners
- Landmark-based heuristics for goal recognition
- Social eye gaze in human-robot interaction: a review, J. Hum. Robot Interact.
- Multi-modal intention prediction with probabilistic movement primitives
- Combining planning with gaze for online human intention recognition
- Eye movements and their functions in everyday tasks, Eye
- A review of eye-tracking applications as tools for training, Cogn. Technol. Work
- Supporting human–robot interaction based on the level of visual focus of attention, IEEE Trans. Human-Mach. Syst.
- The emergence of eyeplay: a survey of eye interaction in games
- Nonverbal robot-group interaction using an imitated gaze cue
- Are you looking at me?: perception of robot attention is mediated by gaze type and group size
- Does effective gaze behavior lead to enhanced performance in a complex error-detection cockpit task?, PLoS ONE
- Differentiating different types of cognitive load: a comparison of different measures, Educ. Psychol. Rev.
- The index of pupillary activity: measuring cognitive load vis-à-vis task difficulty with pupil oscillation
- The promise of eye-tracking methodology in organizational research: a taxonomy, review, and future avenues, Organ. Res. Methods
- Using gaze patterns to predict task intent in collaboration, Front. Psychol.
- A computational approach for prediction of problem-solving behavior using support vector machines and eye-tracking data
- Effectiveness of gaze-based engagement estimation in conversational agents
- The oculomotor control system: a review, Proc. IEEE
- Eye Tracking Methodology
- Identifying fixations and saccades in eye-tracking protocols
- Control of goal-directed and stimulus-driven attention in the brain, Nat. Rev. Neurosci.
- Cognitive heat: exploring the usage of thermal imaging to unobtrusively estimate cognitive load, IMWUT
- Classifying attention types with thermal imaging and eye tracking, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.
- Orbits: gaze interaction for smart watches using smooth pursuit eye movements