1 Introduction

Autonomous vehicles (AVs) are claimed by many organisations to be close to commercial reality, but their lack of human behaviour understanding is raising concerns. While robotic localisation and navigation in static environments [76] and pedestrian detection [9] are well understood, AVs do not yet have the social abilities of human drivers—who can read the intentions of other road users, predict their future behaviour and then interact with them [10]. Pedestrians, unlike other road users such as cyclists, do not usually follow specific traffic rules, in particular when crossing the road at unsigned crossing points, making them especially difficult to model, predict, and interact with. Pedestrians and human drivers communicate and interact with one another via nonverbal signals including their positions and speeds, which are used to transmit intent information as well as to make progress on the road [66]. For example, a vehicle which drives deliberately close to a pedestrian to scare them is telling them to yield, while a vehicle which maintains a larger distance from them is inviting them to cross.

Recent trials of autonomous minibuses in La Rochelle (France) and Trikala (Greece) [52] highlighted the major drawback of perfectly safe self-driving cars: pedestrians were found to step intentionally in front of the AV several times a day, delaying its progress in the knowledge that it would always yield to them. This abuse of perfect safety systems is known as the ‘big problem with self-driving cars’ [8], and in the limiting case of optimal pedestrian behaviour and large crowd size it becomes the ‘freezing robot problem’, in which vehicles make no progress at all because they are forced to yield in every interaction [78].

Fig. 1 Road-crossing scenario

To make progress towards such understanding, we recently proposed and solved a game-theoretical mathematical model of the road-crossing scenario represented in Fig. 1, based on the famous game of ‘chicken’ and called ‘sequential chicken’ [27]. In this model, the pedestrian and vehicle compete for space in the road as they move towards one another and threaten to collide with one another, by making a temporal series of game theoretic decisions to advance or yield. The model’s utility parameters for collisions and value of time were fit to human behaviours in a series of laboratory experiments [11,12,13, 16]. We also analysed real-world pedestrian–vehicle interactions through sequence analysis [15] to learn the most important features and how their ordering could be predictive of the outcome of an interaction [14]. The simplest mathematical solution of this game theoretic model was found to require the AV to deliberately hit the pedestrian with a small probability, in order to create a credible threat which discourages other pedestrians from taking advantage of it in the rest of the interactions [27]. This is not an ethical or legal arrangement for programming AVs in practice [79]. But the model then also suggested the possibility of an alternative solution: if the rare, large penalty of collisions could be replaced with more frequent but smaller negative utilities inflicted on pedestrians, then the same average penalty could be created and progress made by AVs without having to hit any pedestrians.

This motivates a new search for ways in which an AV could inflict small negative utilities onto pedestrians. Humans have evolved a sense of comfort and discomfort around one another as part of their social interaction mechanisms, which could provide a convenient and legal source of small negative utilities. For example, two pedestrians who actually collide with one another while trying to reach their destinations will obviously experience a real, physical negative utility, but it is found empirically that they also experience discomfort—a purely internally generated, psychological negative utility—when they are close but not actually touching. The study of this relationship was named proxemics by Hall [30]. Hall classified four discrete distance zones between people—intimate, personal, social and public—corresponding to distances where most people feel distinct levels of comfort or discomfort during interactions. If humans have evolved to feel real psychological negative utilities in the presence of only a possibility of collision, without it actually having to take place, then simply invading their personal space could be sufficient to penalise them enough to satisfy the game theory requirements.

It is not necessary for the reader of the present study to understand the game theory model, which provides only the motivation for the present study rather than any methods. The key motivation, taken only from its conclusions, is that it requires a utility function to directly assign numerical utilities to agents as a function of their positions. Positions are in general continuous values so a continuous proxemic utility function is required. Section 2 reviews the proxemic literature and finds that this is not yet available, which motivates the present study to develop new methods to infer it in the required form.

The method in Sect. 3 then forms a first step towards inferring pedestrian proxemic preferences for autonomous vehicle interaction control. It consists in directly inferring the continuous proxemic utility function of pedestrians from offline data from human driver–pedestrian interactions. This is the function that could then be programmed into autonomous vehicles using the sequential chicken game theory model to provide small negative utilities.

To link continuous proxemic utility functions to the more conventional views and models of proxemics from this literature, which are mostly based on Hall’s discrete zones, Sect. 4 then introduces a new concept: ‘physical trust requirement’. We show that this concept partitions the set of possible states of the world during interactions into three subspaces, for each agent. In the first, a negative utility such as a collision will happen and there is nothing either agent can do to prevent it. In the second, the negative utility may happen but only the other agent can choose to act to prevent it—this is the ‘trust zone’. In the third, the negative utility may happen but the pedestrian themself can act to prevent it, without needing to trust the other agent. This definition of physical trust requirement may be general to many human–robot interactions in physical or abstract state spaces, but in the case of autonomous vehicle interactions with pedestrians, we provide results showing that it maps cleanly and numerically to Hall’s physical proxemic zones, offering an explanation for why they emerge as discrete zones even when the proxemic utility function itself is continuous.

Section 5 finally applies both the proxemic utility function inference and physical trust requirement concept to existing public datasets, to report a real world continuous proxemic utility and physical trust requirements for the first time.

2 Related Work

This section gives a survey of related work, to search for any existing reports of numerical proxemic functions, or for any related results which might be used to infer such functions without the need for a new experiment. In particular, Hall’s influential work has encouraged most studies to measure and report results in terms of discrete zones, discarding the continuous distance information which we now require. This motivates the present study to infer continuous proxemic utility functions for the first time.

2.1 Proxemics in Social Sciences

Measuring interpersonal distances during social interactions has been a well-studied topic in the social sciences since the introduction of the concept by Hall [30]. For example, it was found for human–human interactions that the intimate space is up to 0.45 m, the personal space is up to 1.2 m, the social space is up to 3.6 m, and the public space is beyond this [45]. Thompson et al. [75] measured individuals’ interaction preferences via the rating of videotapes. This study showed that people have a distance at which they feel comfortable during their interactions, and that they feel more discomfort when the distance is smaller or greater than that. Hayduk [31] showed via a study with university students that personal space is a two-dimensional noncircular and flexible space that can vary in shape and size. Hecht et al. [32] performed two laboratory experiments (including one in a virtual environment) with subjects and found that personal space has a circular shape with about a 1-m radius. However, we believe that personal space can be modelled using only one dimension in the present road-crossing scenario. Stamps [72, 73], whose work is based on the theory of permeability, i.e. how people perceive (e.g. their safety) and make preferences within an environment, studied the effects of distance on participants’ perception of threat. These results showed that the perceived threat decreases with larger distances.

2.2 Proxemics in Human–Robot Interactions

Proxemics is also an active research area in human–robot interaction (HRI), as shown in the review proposed by Rios-Martinez et al. [65] which focuses on social cues, signals and proxemics for robot navigation. A recent review on nonverbal communication for human–robot interaction was proposed in [69].

Walters et al. [81] proposed a framework that shows how to measure proxemic features in HRI. Their study involved participants interacting with different robots while their preferences were measured. They explain that factors which change human proxemics by as little as 20 to 150 mm can be significant. In [3], a mobile robot was developed with an autonomous proxemic system that could approach and avoid people using the distances from [81]. Koay et al. [42] measured participants’ proxemic preferences using a comfort level device during an HRI task.

Mead et al. [55] proposed an automatic method for annotating spatial features from 3D data of indoor human–robot interactions. In [56], the same data was used to train a Hidden Markov Model (HMM) to classify the interactions as either initiating or terminating, based on the extracted physical Mehrabian’s metric [59] or psychophysical Hall’s metric [30]. In [57], the same authors studied the interaction between a robot and more participants, one by one. The interactions consisted in moving the robot towards the participants and backwards several times. The results showed that individuals’ pre-interaction proxemic preference (mean = 1.14 m, std = 0.49 m) was consistent with previous studies. With a uniform performance in the robot behaviour, the proxemic preference reached a mean = 1.39 m and a std = 0.63 m, as the participants adapted their proxemic preferences to improve the robot performance. Mead et al. [58] also investigated the influence of proxemics on human speech and gestures and measured how that impacts the robot’s speech and gesture production. Their study recruited 20 participants who did not know each other, interacting in pairs (10 pairs in total), and each participant also had to interact with the robot (PR2). Their result for human–human interactions (HHI), with a mean = 1.44 m and a std = 0.34 m, was consistent with previous studies but the HRI result (mean = 0.94 m, std = 0.61 m) was much larger than in previous studies, which could be explained by the presence of robot gestures.

Heenan et al. [33] used proxemics and Kendon’s greeting observations [40] for a Nao robot during encounters with humans. They applied Takayama and Pantofaru’s [74] empirical results for proxemics, which are 0.4–0.6 m (average interpersonal distances) with a 1.35 m robot’s height. They observed a larger distance between women participants and the robot, while men kept the same distance in HHI and HRI. In these experiments, the researchers also found an improvement of the robot’s social skills thanks to the proxemic behaviour and its greeting manner. Warta et al. [83] measured levels of social presence in HRI in a hallway. Participants were given a questionnaire to complete after interacting with a robot for a navigation task. In [39], Joosse et al. used a coding system to detect a set of attitudinal (likeability, human-likeness, trust) and behavioural attributes including non-verbal behaviour (eye-gaze, proxemics, emotion etc.) from participants interacting with a robot. The study showed some strong human reactions to a robot invading their personal space.

Kostavelis et al. [44] proposed a dynamic Bayesian network on top of an interaction unit to model human behaviour for a robot. Their method takes proxemic distances into account, allowing the robot to approach people at different distances depending on their current activity. Torta et al. [77] performed two psychometric experiments with subjects interacting with a small humanoid robot and proposed a parametric model of the personal space based on the results of these experiments. The model takes into account the distance and the direction of approach, and was evaluated with a user study where subjects are sitting and approached by the robot.

Henkel et al. [35] evaluated two predefined proxemic scaling functions (linear and logarithmic) for human–robot interactions. Their approach is different from ours in that the robot computes a gain value based on the proxemic distance with the human and then moves accordingly. Their experiments with participants in a search and rescue scenario and followed by a questionnaire showed a preference for a logarithmic proxemic scaling function. Patompak et al. [63] developed an inference method to learn human proxemic preferences. Their method is based on the social force model and reinforcement learning. They argued that proxemic spaces can be limited to two zones, the first being the quality interaction area where a robot could go without creating discomfort, and the private area which is the personal space. In addition, we believe that one more area is needed to model the trust relationship between humans and robots.

2.3 Proxemics in Pedestrian–AV Interactions

A comprehensive review of pedestrian models for autonomous driving is given in [9, 10], ranging from low-level sensing, detection and tracking models [9] to high-level interaction and game theoretic models [10]. In the context of autonomous vehicles, more work has focused on pedestrian crossing behaviour [53], trajectory prediction [84] and eHMI (external Human–Machine Interface) [20, 29, 50, 54]. Very few studies have investigated interpersonal distances for pedestrian–vehicle interactions.

Risto et al. [66] studied the use of drivers’ movement to signal intent and how these signals were understood by other road users. They video recorded pedestrian–vehicle interactions at different intersections and observed that pedestrian discomfort can be created by the vehicle approaching very close to the crosswalk boundary, which leads the pedestrian to slightly change their trajectory towards the other edge of the crosswalk. It was also noted that drivers tend to stop short, i.e. those who intended to stop tended to do so much earlier than required by the law (i.e. at the white line for stop or crosswalk). Interview responses and observations showed that pedestrians tend to understand ‘some forms of movement from the vehicle as communicating a message’. Similarly, [15] and [47] showed evidence that such implicit signalling through speed and positioning is the main form of signalling used in road-crossing interactions, as explicit forms of signalling such as hand gestures and facial expressions are not often used.

Domeyer et al. [24] investigated the quantitative parameters (i.e. time) of pedestrian–vehicle interactions at four pedestrian crossings, using annotated videos. In particular, the authors were interested in the effects of vehicle stopping short time (i.e. their proximity with the pedestrians). Their results showed that the median short stop time was around 1 s. They also found that vehicles with higher short stop times created larger safety margins and were thus more delayed. However, it was found that the stopping short time did not increase the overall time that the vehicle and the pedestrian would spend at an intersection.

2.4 Trust in Human–Robot Interactions

Various definitions of trust have been used for human–robot interactions. This section introduces some of these definitions and reports findings from several studies.

For instance, Lee and See [46] reviewed the concept of trust in automation. They defined trust as an ‘attitude that an agent will help achieve an individual’s goals in a situation characterised by uncertainty and vulnerability’. In [71], Smithson described trust as ‘a psychological state that entails the willingness to take risks by placing oneself in a vulnerable position with respect to the trustee’. He described uncertainty as being prevalent in a trust relationship: there is no trust without risk. Henschke [36] described trust as a ‘key value’ in the development of autonomous systems. This paper discussed the ethical issues with autonomous systems but also referred to trust in these systems as a complex concept which could be defined as either reliability, predictability, goodwill, affect or public trust.

Floyd et al. [26] introduced the idea of inverse trust. They proposed a mathematical decision model for an autonomous system to measure the level of trust of a human team-mate and then adapt its own behaviour accordingly. Devitt et al. [23] described that with complex and intelligent autonomous systems, humans could become ‘overly trusting or overly skeptical’, especially when robots become intelligent enough to manipulate their trust. Agrigoroaie and Tapus [2] focused their work on human informal behaviour and proxemics. The study showed that autonomous systems that are capable of understanding the processes behind human decision-making can have better interactions with humans and are more likely to be trustworthy.

In [80], van den Brule et al. argued that not only the robot performance is important but also its behavioural style can have some influence on people’s level of trust. Their experiments in video and in VR showed that task performance is key for trustworthiness but that the robot behavioural style was also significant in the videos. Lewis et al. [48] explained that trust is dynamic (i.e. changing over time), and in [62] trust towards automation is directly related to reliance.

2.5 Trust in Human–AV Interactions

The study of pedestrian trust in AVs is a recent research topic. Previous work has mainly investigated the concept of trust for passengers of autonomous vehicles during shared-driving mode [4, 19]. More often, pedestrian trust in AVs has been investigated via the design and testing of external Human–Machine Interfaces [20, 54]. Rothenbuecher et al. [67] found that pedestrians lacked trust when interacting with a vehicle ‘disguised’ as an AV because they could not see a human driver inside, but at the same time they expected to trust the AV more because of its algorithmic capabilities. Deb et al. [22] performed a study using questionnaires to evaluate pedestrian receptivity towards autonomous vehicles, showing that males trust AVs more than females. The authors also warn that pedestrians could take advantage of perfectly safe autonomous vehicles.

Saleh et al. [68] proposed a framework that relies on social cues, e.g. intent understanding, to model trust between vulnerable road users and autonomous vehicles. Reig et al. [64] studied pedestrian trust in autonomous vehicles via interviews, showing for example that participants who were favourable to AVs were more likely to trust them and that the lack of knowledge about AV technology leads to mistrust. Using the definition of trust in [46] introduced above, Jayaraman et al. [38] studied pedestrians’ trust in autonomous vehicles in a VR experiment followed by a questionnaire using a Likert scale. It is argued that human trust increases with the increase of available information, and found that the AV’s driving behaviour and the presence of light can influence the trust of pedestrians. This study also showed correlations between pedestrian behaviour (distance to collision, gaze and jaywalking time) and their trust towards the AV.

2.6 Research Aims

Despite the numerous reviewed studies on proxemics and trust from the social science and human–robot interaction research communities, many works rely on qualitative or discretized findings from human experiments using questionnaires, interviews and video analyses. Pedestrian proxemics and trust are very recent topics in the context of autonomous vehicles research. No studies were found that infer continuous-valued human proxemic utilities, as now required by AV controllers, or that link these to trust concepts. There is little agreement on the definition of trust and new trust concepts are regularly proposed, which are mostly informal rather than directly implementable as mathematics and software for autonomous vehicles. The rest of the paper contributes towards filling these gaps.

Summary of contributions This paper proposes:

  • A novel Bayesian approach to infer proxemic utility functions;

  • A new concept and mathematisation of ‘physical trust requirement’ for pedestrian–AV interactions, which can numerically generate and explain Hall’s proxemic zones and is also applicable to more general human–robot interactions;

  • Empirical results of our method on two public datasets to infer pedestrian proxemic utility functions and trust zones.

3 Proxemic Utility Modelling

Our method consists in inferring the proxemic utility function of pedestrians from existing public datasets of interactions between human drivers and pedestrians. No new empirical experiments are performed in this study. Bayesian theory is used to fit parameters and compare competing models. The approach is first validated on simulated data for which the ground truth is available, before running on empirical data from two public datasets in Sect. 5.

3.1 Proxemic Utility Definition

It is possible to measure the utilities (i.e. perceived costs and/or benefits) which humans assign to states of the world, by asking for or otherwise observing their preferences between states. Such preference orderings for rational agents can be shown to be mathematically equivalent to the assignment of a single number to each state, which is defined as the utility. This mapping from states to numbers is called the utility function [6].

We consider utility functions U as models M with parameters \(\theta = \{a_0,\ldots ,a_n\}\),

$$\begin{aligned} U = M(X, a_0,\ldots ,a_n), \end{aligned}$$
(1)

that assigns a real value U to the state X.

We assume that human proxemic utility can be described by such a parametric model with the state X being the physical distance between the two agents. Based on our prior knowledge from Hall’s theory, we expect the size of the negative utilities to roughly reduce with distance, so we choose several candidate parametric models, M, with a variable number of parameters, \(\theta \), including a hyperbolic function (2), a Gaussian function (3) and different degrees of polynomials (4),

$$\begin{aligned}&M_{hyperbolic}(X, \theta ) = a_0X^{-1} , \end{aligned}$$
(2)
$$\begin{aligned}&M_{Gaussian}(X, \theta )= {\mathcal {N}}(X, a_1=\mu ,a_0=\sigma ^{2}), \end{aligned}$$
(3)
$$\begin{aligned}&M_{polynomial(n)}(X, \theta )= a_nX^{n} + a_{n-1}X^{n-1}+\cdots + a_1X + a_0. \end{aligned}$$
(4)

We chose these candidate functions via three considerations. First, if we assume very little about the form of the function (just that it is reasonably smooth) then we need to have at least one highly flexible generic model which is able to fit to any smooth function. This is delivered by the polynomial candidate. Second, we have a prior scientific intuition, which is a hypothesis to test, that the function will be roughly hyperbolic shaped, starting high and falling off with distance. We include a hyperbolic model for this reason. Finally, the Gaussian is included because it is a common function which often emerges in solutions of physical processes and is easy to include. If additional candidates are proposed in the future, they can also be tested against the ones included here.
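The three candidate families of Eqs. (2)–(4) can be written down compactly. The following is a minimal sketch rather than the code used in the study: the function names, the `CANDIDATES` registry and the coefficient ordering for the polynomial are our own illustrative choices.

```python
import numpy as np

def hyperbolic(x, a0):
    """Eq. (2): U = a0 / X."""
    return a0 / np.asarray(x, dtype=float)

def gaussian(x, mu, var):
    """Eq. (3): normal density in X with mean mu (a1) and variance var (a0)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def polynomial(x, *coeffs):
    """Eq. (4): coefficients ordered a_n, ..., a_1, a_0 (highest degree first)."""
    return np.polyval(coeffs, np.asarray(x, dtype=float))

# Hypothetical registry: model name -> (function, number of parameters K).
CANDIDATES = {"hyperbolic": (hyperbolic, 1), "gaussian": (gaussian, 2)}
for degree in range(1, 8):  # linear up to septic
    CANDIDATES[f"polynomial({degree})"] = (polynomial, degree + 1)
CANDIDATES["quadratic"] = CANDIDATES["polynomial(2)"]

# Evaluate two candidates at a few distances (in metres).
distances = np.array([0.5, 1.0, 2.0, 4.0])
print(hyperbolic(distances, 1.0))
print(polynomial(distances, -1.0, 5.0, 25.0))
```

Keeping the parameter count K alongside each function is convenient because the model comparison step needs K as well as the fitted likelihood.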

Throughout this paper, we assume that all agents are rational and that utility can be measured in units of seconds (roughly equivalent to ‘time is money’). Human pedestrians and drivers assign a value of travel time in their journeys [1, 5, 17, 37, 82], and using this as the unit will simplify the analysis. We do not model the negative utility of a crash as an additional explicit term because the proxemic model is already able to include it as the utility of a zero distance contact.

3.2 Proxemic Utility Inference Method

A Bayesian inference method is used to infer the proxemic utility functions from observed data. It consists in fitting different parametric models to the data in order to obtain the best parameters for each model. The observations are the distances between the two agents, X, their speeds, v and \(v_{ped}\), and the outcomes of the interactions (pedestrian crossing or stopping). We used nonlinear least squares optimisation (implemented via the Python SciPy package scipy.optimize) for the model fitting. At each optimisation iteration, we used the candidate model parameters proposed by the optimiser to compute optimal actions for the pedestrian for every possible distance X. These optimal actions are compared against the actual actions seen in the data, for the particular distances in the data, and this comparison is used to compute the probability that, given the model, the proposed parameters are the true ones.

This is done using Bayes’ theorem as follows: under a given model, M, with parameters \(\theta \) and data D, we have,

$$\begin{aligned} P(\theta | M,D) = \frac{ P(D| \theta , M)P(\theta | M) }{ \sum _{\theta '} P(D| \theta ', M)P(\theta '| M) }. \end{aligned}$$
(5)

We assume a flat prior over \(\theta \) so that,

$$\begin{aligned} P(\theta | M,D) \propto P(D| \theta , M), \end{aligned}$$
(6)

which is the data likelihood, given by,

$$\begin{aligned} \displaystyle {P(D| \theta , M) = \prod _{i} P(A_{i} | x_i, {x_{ped}}_{i}, v, v_{ped}, \theta ,M' )}, \end{aligned}$$
(7)

where \(A_{i}\) is the pedestrian’s observed action choice, e.g. crossing or stopping, \(x_i\) and \({x_{ped}}_{i}\) are the observed car and pedestrian locations at the start of an interaction, and v and \(v_{ped}\) are the observed car and pedestrian speeds. \(M'\) is a noisy version of the optimal model M, which plays actions from M with probability \((1-s)\) and maximum entropy random actions (0.5 probability of each speed) with probability s. This is a standard noise modification, used for example in psychological Bayesian data analysis [11, 16, 49], which allows the model to fit data where agents have deviated from perfectly optimal strategies. Without this noise term, the model would assign probability zero to any deviation from perfect behaviour. But humans, like most other objects modelled using statistics, rarely behave exactly according to any mathematical model, so the noise term enables the models to fit approximate behaviours.
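To make the noise model concrete, the sketch below evaluates the logarithm of Eq. (7) for binary actions. It assumes that the optimal action under M has already been computed for each interaction; the function name and the 0/1 encoding of actions are illustrative choices, not taken from the study.

```python
import numpy as np

def noisy_log_likelihood(observed, optimal, s):
    """Log of Eq. (7) for two possible actions per interaction.

    observed: actions seen in the data (assumed encoding: 1 = cross, 0 = stop)
    optimal:  actions predicted by the noiseless model M for the same interactions
    s:        noise level; with probability s the agent picks either action
              with probability 0.5, otherwise it plays the optimal action
    """
    observed = np.asarray(observed)
    optimal = np.asarray(optimal)
    match = observed == optimal
    # A matching action arises from the optimal branch or from a lucky random
    # draw; a mismatch can only arise from the random branch.
    p = np.where(match, (1.0 - s) + 0.5 * s, 0.5 * s)
    return float(np.sum(np.log(p)))

# Four interactions, one of which deviates from the optimal action.
print(noisy_log_likelihood([1, 1, 0, 0], [1, 0, 0, 0], s=0.2))
```

With s = 0 a single mismatch drives the log-likelihood to minus infinity, which is the zero-probability behaviour that the noise term is designed to avoid.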

3.3 Model Comparison

To select the best fitting proxemic utility function from the set of candidate models \(M_i\), we would like to compute and take the maximum of \(P(M_i|D)\). This is computationally hard due to a required integral over the parameters of the models,

$$\begin{aligned} P(M_i|D) \propto P(M_i)\int _{\theta _i}P(D|M_i, \theta _i)P(\theta _i|M_i) \,d\theta _i. \end{aligned}$$
(8)

We instead compute and use the Bayesian Information Criterion (BIC) [70], which is a standard approximation to minus twice the logarithm of this integral, so that lower values indicate more probable models,

$$\begin{aligned} BIC = \log (n)K - 2\log (L) \approx -2\log P(D|M_i). \end{aligned}$$
(9)

The integral, and the BIC approximation to it, are able to correctly compare competing models \({M_i}\) whose parameter spaces differ in size K, by combining the likelihood \(L=P(D|M_i, {\hat{\theta }}_i)\) of n observations in data D under the model \(M_i\) with the Occam factor arising from the prior over the model’s parameter space, \(P(\theta _i | M_i)\), assuming a flat prior on the models themselves, \(P(M_i)=P(M_j)\). This automatically and correctly penalises models with many parameters for potentially overfitting to data [70].
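As a short illustration of Eq. (9), the sketch below computes the BIC from a log-likelihood and selects the model with the lowest value. The two log-likelihoods are those reported for the hyperbolic simulation in Sect. 3.4.2, and n = 1000 is the number of simulated interactions. The parameter counts (K = 1 for the hyperbolic model, K = 3 for the quadratic) and the use of the natural logarithm are our assumptions; under them the reported BIC values are reproduced to within rounding.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Eq. (9): BIC = log(n) * K - 2 * log(L), using the natural logarithm."""
    return math.log(n_obs) * n_params - 2.0 * log_likelihood

# (log-likelihood, assumed number of parameters K) per candidate model.
fits = {
    "hyperbolic": (-105.36, 1),
    "quadratic": (-107.55, 3),
}
n_obs = 1000

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)  # lower BIC is better

for name, score in scores.items():
    print(f"{name}: BIC = {score:.3f}")
print("selected model:", best)
```

The quadratic model is penalised by two extra multiples of log(n), which is why a small gap in likelihood becomes a larger gap in BIC.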

3.4 Validation via a Simulation Study

To validate our proxemic utility inference method, we developed a simulation with a simple crossing scenario with a pedestrian and a car on a road of width \(w\), as shown in Fig. 2. We simulate the internal reasoning of a pedestrian based on a known (ground-truth) proxemic utility function and the vehicle time utility for a crossing decision. Simulated pedestrian behaviour data is generated, and used to infer back the proxemic function. Validation occurs if the inferred proxemic function matches the input proxemic function used to generate the behaviour.

Fig. 2 Pedestrian–vehicle interaction simulation

3.4.1 Assumptions

The purpose of the simulation is only to validate that the system is able to recover the ground truth (i.e. infer the ground truth values used as inputs to the simulation back from the output of the simulation). It does not matter which particular ground truth is used for validation. So to create the simulated data, we choose the following arbitrary settings: the car moves at a constant speed (2 m/s) and the pedestrian is standing at the edge of a crosswalk, ready to cross. The pedestrian also moves at a constant speed, 1 m/s. The pedestrian is assumed to reason internally about the utility of crossing and of avoiding a potential crash with the car. They compare the negative utility (effects) caused by the proximity with the car with the time delay that would occur if they wait for the car. If the proximity cost (measured in seconds, assuming time is a currency) is less than the time delay, i.e. if they are able to cross before the car reaches the intersection, then they are incentivised to do so.
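As a worked illustration of this decision rule, suppose (our simplifying assumption, not stated in the text) that the delay from waiting equals the time the car needs to cover the distance X at speed v, i.e. X/v, and that the proximity cost is the hyperbolic function of Eq. (2). The pedestrian then crosses when the proximity cost is smaller than the delay,

$$\begin{aligned} \frac{a_0}{X}< \frac{X}{v} \quad \Longleftrightarrow \quad X > \sqrt{a_0 v}. \end{aligned}$$

With \(a_0 = 1\) and \(v = 2\) m/s, this gives a crossing threshold of \(X > \sqrt{2} \approx 1.41\) m under these assumptions: the pedestrian crosses when the car starts further away than this and waits otherwise.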

3.4.2 Data Generation and Inference Results

We generated data from a pedestrian–vehicle interaction simulation, using a predefined proxemic utility function. We defined random starts for the vehicle, to create 1000 different pedestrian–vehicle interactions. We then used the data collected to implement and test our inference method to recover the original proxemic utility function. Examples of functions that we tested are shown below.
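The generate-then-recover loop can be sketched end to end as follows. This is an illustration under several assumptions of ours, not the study's implementation: the decision rule compares a hyperbolic proximity cost with a waiting delay of X/v, the start distances are drawn uniformly from 0.1 to 10 m, the noise level s is fixed at 0.1, and the parameter is recovered by a simple grid search over the likelihood rather than by nonlinear least squares.

```python
import numpy as np

V_CAR = 2.0    # car speed in m/s (from the text)
N = 1000       # number of simulated interactions (from the text)
S_NOISE = 0.1  # hypothetical noise level s
A0_TRUE = 1.0  # ground-truth hyperbolic parameter

def optimal_action(x, a0, v=V_CAR):
    """Assumed rule: cross (1) if proximity cost a0/x < waiting delay x/v, else wait (0)."""
    x = np.asarray(x, dtype=float)
    return (a0 / x < x / v).astype(int)

def simulate(rng, n=N, a0=A0_TRUE, s=S_NOISE):
    """Draw random car start distances and noisy pedestrian decisions."""
    x = rng.uniform(0.1, 10.0, size=n)           # assumed range of start distances
    actions = optimal_action(x, a0)
    use_random = rng.random(n) < s               # with probability s, act at random
    random_actions = rng.integers(0, 2, size=n)  # 0.5 probability of each action
    return x, np.where(use_random, random_actions, actions)

def log_likelihood(a0, x, actions, s=S_NOISE):
    """Noisy-model log-likelihood of the observed actions for a candidate a0."""
    match = optimal_action(x, a0) == actions
    p = np.where(match, 1.0 - 0.5 * s, 0.5 * s)
    return float(np.sum(np.log(p)))

def recover_a0(x, actions, grid=None):
    """Grid-search maximum likelihood estimate of a0 (first maximiser on the grid)."""
    if grid is None:
        grid = np.linspace(0.05, 5.0, 100)
    scores = [log_likelihood(a0, x, actions) for a0 in grid]
    return float(grid[int(np.argmax(scores))])

rng = np.random.default_rng(0)
x, actions = simulate(rng)
a0_hat = recover_a0(x, actions)
print(f"true a0 = {A0_TRUE}, recovered a0 = {a0_hat:.2f}")
```

Because the decisions depend on the parameter only through a distance threshold, the likelihood is piecewise constant in the parameter. A grid search is used here to avoid relying on gradients, and the estimate is only as precise as the spacing of the sampled distances near the threshold.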

Hyperbolic Function Firstly, we evaluated our inference method with a ground truth hyperbolic proxemic function,

$$\begin{aligned} M_{hyperbolic}(X, a_0) = a_0X^{-1}, \end{aligned}$$
(10)

with \(a_0=1\), as shown in Fig. 3a along with the time utility function and the crossing decision for the interactions. As the model fitting results in Fig. 3b show, the best model is the hyperbolic function, which has the maximum log-likelihood (loglik = \(-105.36\)) and the lowest BIC value (BIC = 217.629). All other models have a lower likelihood and a higher BIC value. For example, the second best model is the quadratic function, with a log-likelihood of \(-107.55\) and a BIC equal to 235.839.

Fig. 3 Simulation with a hyperbolic proxemic function

Quadratic Function Secondly, we used an arbitrary quadratic function,

$$\begin{aligned} M_{quadratic}(X, a_2, a_1, a_0) = -X^2 + 5X + 25, \end{aligned}$$
(11)

as the ground truth. Figure 4a shows the ground truth quadratic proxemic and time utility functions with the pedestrian crossing decisions. As shown in Fig. 4b, the best model is the quadratic function, which has the maximum log-likelihood (loglik = \(-1089.72\)) and the lowest BIC value (BIC = 2200.158). All other models have a lower likelihood and a higher BIC value. For example, the second best model is the cubic function, with a log-likelihood of \(-1109.49\) and a BIC equal to 2246.615.

Fig. 4
figure 4

Simulation with a quadratic proxemic function

Quartic Function Thirdly and lastly, we evaluated our method with an arbitrary quartic function (i.e. a polynomial function of degree 4),

$$\begin{aligned} M_{polynomial(4)}(X, a_4, a_3, a_2, a_1, a_0) = -0.08X^4 - X^3 + 3X + 0.5, \end{aligned}$$
(12)

as the ground truth, as shown in Fig. 5a along with the time utility function and the crossing decisions for the interactions. The results of the model fitting are shown in Fig. 5b. The quartic and septic functions have the maximum likelihood (loglik = \(-122.93\)), but the quartic function is ranked only second by BIC, with a value of 280.415. Instead, the Gaussian model (loglik = \(-129.52\), BIC = 272.875) is selected as the best model because it has fewer parameters. However, the shape of the ground truth function shown in Fig. 5a looks very similar to a Gaussian, so the selection of the Gaussian model in this case is understandable.

The above results show that our proposed method for inferring proxemic utility functions works on simulated data and with different ground truth utility functions.
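The model ranking in the quartic case can be checked with the standard BIC definition, \(k\ln n - 2\ln L\). The sketch below is illustrative only: the parameter counts (5 for the quartic, 2 for the Gaussian) are assumptions inferred from the model names, and \(n = 1000\) is the number of simulated interactions. With these assumptions the computed values agree with the reported BICs to within rounding of the log-likelihoods.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# (reported log-likelihood, ASSUMED number of free parameters)
candidates = {
    "quartic": (-122.93, 5),
    "gaussian": (-129.52, 2),
}
n = 1000  # simulated interactions

scores = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lowest BIC wins
```

The quartic model fits better, but its three extra parameters cost more under the \(k\ln n\) penalty than the gain in likelihood, which is why the Gaussian is preferred.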

Fig. 5
figure 5

Simulation with a quartic proxemic function

4 Physical Trust Requirement

4.1 Trust Definition

Refining Lee and See’s concept of trust [46] reviewed above, where trust is defined as an attitude in ‘a situation characterised by uncertainty and vulnerability’, we define a new related concept: physical trust requirement (PTR), a Boolean property of the physical state of the world (not of the psychology of the agents) with respect to one agent during an interaction, true if and only if the agent’s future utility is affected by an immediate decision made by another agent.

Fig. 6
figure 6

Proxemics–trust relation in pedestrian–vehicle interaction

We thus measure the need for trust from pedestrian behaviour in uncertain situations. The PTR divides the proxemic function into three zones as shown in Fig. 6, as the PTR is true in the trust zone and false in the crash and escape zones. We made some assumptions and used numerical values to obtain specific equations and numbers for the three zones in our road crossing case:

  1.

    Crash zone This is the region very close to the human agent, where they will be affected by negative consequences and no-one can prevent them from occurring, so no trust is involved. In the road-crossing case, this occurs when the pedestrian is in the road and the car is very close, with neither able to run or brake to prevent the collision.

    The crash zone, \(\{ d : 0< d < d_{crash} \}\), is the region delimited by the reaction and braking distances of the vehicle, given by the standard stopping distance equation [51],

    $$\begin{aligned} d_{crash} = v t_{driver} + \frac{v^2}{2\mu g} , \end{aligned}$$
    (13)

    where v is the vehicle speed, \(\mu \) the coefficient of friction and g the gravitational acceleration at the Earth's surface. The first term depends on the human driver’s reaction time, \(t_{driver}\). The second term is the physical braking distance, which depends on the friction between tyres and tarmac and equals the length of any skid marks left by the vehicle after the driver begins to apply the brakes.

  2.

    Escape zone This defines the area where the human agent is able to choose their own action to avoid the negative utility, rather than relying on the other agent. As such, they do not need to trust the other agent. In our road-crossing case, this occurs when the vehicle is further away from the pedestrian, so that the pedestrian has time to act and save themself without trusting the vehicle to yield.

    The escape zone, \(\{ d: d_{escape} < d \}\), is the set of distances beyond which pedestrians do not fear any potential danger from the vehicle. In this zone, pedestrians can complete their crossing before the vehicle arrives. The escape distance \(d_{escape}\) is the minimum distance at which this is the case. Consider the time \(t_{cross}=w/v_{ped}\) it takes for the pedestrian to cross, where \(v_{ped}\) is the pedestrian speed and w is the width of the road. During this time, the vehicle moves by a distance \(wv/v_{ped}\). When we also add the distance moved by the vehicle during \(t_{ped}\), the pedestrian’s reaction time for deciding whether or not to cross before starting to walk, we obtain the escape distance,

    $$\begin{aligned} d_{escape}= & {} vt_{ped} + vt_{cross} \nonumber \\= & {} vt_{ped} + w\frac{v}{v_{ped}}. \end{aligned}$$
    (14)

    This escape distance then defines the start of the escape zone.

  3.

    Trust zone We define the trust zone as the region of the proxemic function where the PTR is true. The other agent (e.g. the car) can choose (e.g. by slowing down) to prevent the human from receiving negative effects (e.g. a collision), but the human is incapable of taking any action to affect the utility outcome themself. In the road-crossing case, this occurs when the pedestrian cannot get out of the car’s way in time to avoid a collision, but the car is able to brake and yield to prevent the collision if it chooses to do so. This excludes the crash zone, in which neither agent has any available choice to avert a collision, and also excludes the escape zone. So the trust zone is \(\{ d: d_{crash}< d < d_{escape}\}\), the intermediate space between the crash and escape zones.

When the pedestrian is in the crash zone, the vehicle has no possibility to avoid an accident, whereas in the escape zone the pedestrian can always cross safely. When the pedestrian is in the trust zone, the vehicle has the sole power to decide if a collision will occur. It is thus in the trust zone that it would be important to study whether and how people do or should trust autonomous vehicles or not.
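The three zones and the Boolean PTR can be sketched in code as follows. This is a minimal illustration of Eqs. 13 and 14, not an implementation from the study: the function names and the default parameter values are illustrative assumptions, and distances falling exactly on a boundary are assigned to the trust zone here, a choice the strict inequalities above leave open.

```python
G = 9.8  # gravitational acceleration, m/s^2


def d_crash(v, t_driver=1.0, mu=1.0):
    """Eq. 13: driver reaction distance plus braking distance, in metres."""
    return v * t_driver + v**2 / (2.0 * mu * G)


def d_escape(v, t_ped=1.5, w=2.0, v_ped=1.1):
    """Eq. 14: distance the car covers while the pedestrian reacts and crosses."""
    return v * t_ped + w * v / v_ped


def zone(d, v):
    """Classify the car-pedestrian distance d (m) for a car speed v (m/s)."""
    if d < d_crash(v):
        return "crash"
    if d > d_escape(v):
        return "escape"
    return "trust"


def ptr(d, v):
    """Physical trust requirement: True only inside the trust zone."""
    return zone(d, v) == "trust"


# Illustrative car speed of 5 m/s: three distances, one in each zone.
labels = [zone(d, 5.0) for d in (3.0, 10.0, 20.0)]
```

Because the crash test is applied first, the function still returns a sensible label at speeds where the crash distance exceeds the escape distance and the trust zone is empty.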

4.2 Zones Analysis: Comparison with Hall’s Zones

We here derive some mathematical results from our zone definitions and link them to previous results on Hall’s proxemic zones. Figure 7 shows the distances \(d_{crash}\) and \(d_{escape}\) and the zones defined by equations 13 and 14, for variable vehicle speeds v. We here assume: \(w=2\) m for the road width, \(t_{driver}=1\) s as the driver reaction time [21, 28], \({v_{ped} = 1.1}\) m/s as the average walking speed of the pedestrian [25, 41, 60], \(t_{ped}=1.5\) s as the pedestrian reaction time (chosen to be similar to the driver reaction time but a little larger because drivers may be more focussed on their task than pedestrians) [18], \(\mu =1\) for the coefficient of friction [34, 61] and \(g=9.8{\text { m/s}}^2\) for the gravitational acceleration.

Fig. 7
figure 7

Distances and zones predicted by the PTR model for different car speeds v (7a is a close-up of 7b)

By comparison, the related work review found that Hall zones for human–human interactions are usually reported to be around: intimate up to 0.45 m, personal up to 1.2 m, social up to 3.6 m, and public beyond this [45].

The vertical line in Fig. 7 shows the case \(v=1.1\) m/s, in which the vehicle has the same speed as the pedestrian, i.e. the vehicle behaves as if it were a second pedestrian interacting with the first. In this case, the size of the Hall personal zone, 1.2 m, closely matches that of our crash zone in Fig. 7a, \(d_{crash}=1.16\) m, when all other parameters are retained (including, quite unrealistically, the friction model and coefficient, which describe wheels rather than walking). The size of the Hall social zone, 3.6 m, also closely matches our \(d_{escape}=3.65\) m from the graph.

We also note that Fig. 7a predicts that social human–robot interactions in which the robot is slower than a human, as is the case for most humanoids, will have smaller crash and trust zones, which matches the related work reviewed in which personal and social zones were found to reduce compared to human–human proxemics. Also, the trust region in Fig. 7b gets smaller with speed, reaching zero width when linear and quadratic curves meet at around \(45{\text { m/s}}=162{\text { km/h}}\). This is quite close to official and unofficial speed limits on most countries’ motorways/freeways.
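The speed at which the trust zone closes can be checked by setting Eq. 13 equal to Eq. 14 and dividing by \(v > 0\). As a worked sketch with the parameter values assumed above (the symbol \(v^{*}\) is introduced here only for illustration):

$$\begin{aligned} v^{*} t_{driver} + \frac{(v^{*})^2}{2\mu g} = v^{*} t_{ped} + w\frac{v^{*}}{v_{ped}} \;\Rightarrow \; v^{*} = 2\mu g\left( t_{ped} + \frac{w}{v_{ped}} - t_{driver}\right) \approx 19.6 \times 2.32 \approx 45.4{\text { m/s}}. \end{aligned}$$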

Fig. 8
figure 8

The ratio of escape zone size to crash zone size, R, decreases as the car speed v increases, showing that the crash zone dominates at high speeds

We further define R, the zone ratio of the escape distance to the crash distance,

$$\begin{aligned} R= \frac{D_{escape}}{D_{crash}} = \frac{v t_{ped} +v( w/ v_{ped} )}{ v t_{driver} + v^2 /2\mu g} = \frac{t_{ped} +( w/ v_{ped} )}{ t_{driver} +v/2\mu g}. \end{aligned}$$
(15)

Then we see that as vehicle speed increases, the effect of \(t_\mathrm{driver}\) becomes negligible, and the zone ratio tends to zero, meaning that the crash zone’s size comes to dominate the others:

$$\begin{aligned} v \rightarrow \infty \Rightarrow R \rightarrow \frac{2 \mu g (t_{ped} +( w/ v_{ped} ))}{v} \rightarrow 0, \end{aligned}$$
(16)

and as vehicle speed decreases, the zone size ratio converges to a constant:

$$\begin{aligned} v \rightarrow 0 \Rightarrow R \rightarrow \frac{ t_{ped} +( w/ v_{ped} )}{t_{driver}}, \end{aligned}$$
(17)

which shows that if the ratio of zone sizes is considered rather than their absolute size, then all dependency on friction and gravity has vanished in the high and low speed limits. Thus, all road and car specific concepts have vanished to leave a more general proxemic relationship which may be of interest in general human interaction cases rather than only road-crossings. Figure 8 shows the variation of R relative to the speed of the car, and that the value of R in Eq. (17) tends to the constant 3.5.

5 Empirical Data Study

To demonstrate the inference of empirical pedestrian proxemic utility functions, we then apply the method to data from real-world pedestrian interactions with manually driven vehicles. We used two public datasets containing tracking data from multiple road users. We only considered the interactions where the pedestrian crosses or stops for utility, i.e. when the gap is greater than the safety distance, so that we can learn how the pedestrian adjusts their comfort zone. We then compute the PTR zones for these datasets.

Fig. 9
figure 9

Pedestrian intention with a vehicle, from its dashcam, in the Daimler dataset [43]

Fig. 10
figure 10

Histograms of vehicle and pedestrian speeds in Daimler dataset, showing that average speeds \(v \approx 5.25\) m/s and \(v_{ped} \approx 1.60\) m/s are good approximations

Fig. 11
figure 11

Crosswalk in inD dataset [7]

Fig. 12
figure 12

Histograms of vehicle and pedestrian speeds in inD dataset, showing that average speeds \(v \approx 4.79\) m/s and \(v_{ped} \approx 0.99\)  m/s are good approximations

5.1 Datasets

5.1.1 Daimler Pedestrian Benchmark

The Daimler dataset [43] contains 58 pedestrian–vehicle trajectories with annotations such as pedestrian crossing decisions. The dataset was not collected from real-world interactions: the pedestrians and drivers were actors. The authors created these interaction scenarios for their work; 44 of them were pedestrian crossing scenarios and the other 14 were stopping scenarios. Figure 9 shows a dashcam image of one interaction scenario. The distribution of vehicle and pedestrian speeds in the dataset is shown in Fig. 10.

5.1.2 inD (Intersection Drone Dataset)

The inD dataset [7] is a newly released dataset which provides tracking data for road users (cars, trucks, cyclists and pedestrians). The dataset comprises 32 videos recorded at 4 different intersections and contains thousands of real-world interactions. However, as the videos were not released with the trajectory data, we decided to focus on one intersection with a clearly marked pedestrian crosswalk, so that pedestrians crossing the road would necessarily interact with oncoming vehicles. Twelve recordings (n\(^\circ \)18 to n\(^\circ \)29) contain data from the crosswalk shown in Fig. 11. The distribution of vehicle and pedestrian speeds in the dataset is shown in Fig. 12.

5.1.3 Criteria for interactions’ selection

As the inD dataset contains multiple classes of road users but we were interested in pedestrian–vehicle interactions only, we extracted these from the rest of the data in a semi-automatic manner and annotated them. For each given pedestrian, we found the car that appeared a few frames earlier and then selected the frames where they both appear together. We only kept interactions where the vehicle and the pedestrian met near the coordinates (x = 62, \(y = -27\)), to make sure that the pedestrians crossed at the crosswalk and not at any other location, where they would be jaywalking and we would have no way of knowing the hidden factors behind that decision. We selected trajectories where cars and pedestrians followed a straight path until their encounter, in order to match our simulation model. We kept pedestrians walking from the bottom right. We did not consider pedestrians coming from the top right because most of them were not crossing, as there was a car park.

In total, we used the 58 interactions from the Daimler dataset and collected 48 more interactions from the inD dataset: 24 where the car came from the top right of the image and 24 where the car came from the bottom left. Figure 13 shows some examples of pedestrian–vehicle trajectories from both datasets.

Fig. 13
figure 13

Examples of interactions from the datasets

5.2 Proxemic Utility Model Selection

5.2.1 Proxemic Utility Implementation

First, we applied our proxemic utility inference method on the two datasets, similar to the simulation study in Sect. 3.4, except that here we would not know the ground truth function for final comparison. The goal here is thus to infer the unknown proxemic utility function from the data and select the best model with the lowest BIC value.

5.2.2 Proxemic Utility Results

Results of the proxemic utility inference method on the Daimler and inD datasets are shown in Figs. 14 and 15, respectively. They show that a hyperbolic function best describes pedestrian proxemic behaviour in both cases, with the lowest BIC values (Daimler BIC = 174.482, inD BIC = 62.325). The proxemic utility cost increases at shorter proxemic distances, with a steep growth near the collision point. These results are consistent with the human experiments in [72, 73], where participants’ perception of threat (negative utilities) increases at shorter distances and decreases at longer distances.

Fig. 14
figure 14

Model fitting results for Daimler dataset

Fig. 15
figure 15

Model fitting results for inD dataset

5.3 Zones Computation

5.3.1 Zones Implementation

Second, we computed two different estimates of the zone distances, called ‘theoretical’ and ‘empirical’ zones. Both estimates make use of the data. The theoretical estimate makes use only of average speeds from the data, and the empirical estimate makes use of extreme individual behaviours from the data.

We define theoretical zones as the solutions of the equations in Sect. 4.1 given by assuming that all vehicles move at the average speed of the vehicles in the dataset, and all pedestrians move at the average speed of pedestrians in the dataset. This assumption is justified approximately by the histograms of these speeds in the datasets, as shown in Figs. 10 and 12, which show that vehicles are all moving at similar urban speeds of 0–30 km/h and pedestrians are all moving at similar walking speeds. The average speed of vehicles in Daimler was \(v \simeq 5.25 {\text { m/s}}\); and in inD: \(v \simeq 4.79 {\text { m/s}}\). The average speed of pedestrians was in Daimler: \(v_{ped} \simeq 1.60 {\text { m/s}}\); and in inD: \(v_{ped} \simeq 0.99 {\text { m/s}}\). We here use the same constants as in Sect. 4.2, with \(w=2\) m for the road width, \(t_{driver}=1\) s as the driver reaction time, \(t_{ped}=1.5s\), \(\mu =1\) for the coefficient of friction and \(g=9.8{\text { m/s}}^2\) for the gravity of Earth.

We define empirical zones by finding in the datasets the maximum distance below which pedestrians always stop and the minimum distance above which they always cross. This is intended to provide only an exploratory measure. It is not a true statistical estimator, because its error increases rather than decreases with sample size due to its dependency on only the most extreme individuals.
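The empirical zone definition can be sketched as below. This is one possible reading of the definition, not the authors' code: the function name, the representation of each interaction as a (distance, crossed) pair, and the example values are all hypothetical and are not taken from either dataset.

```python
def empirical_zones(interactions):
    """Return (crash boundary, escape boundary) from (distance_m, crossed) pairs.

    The crash boundary is the largest stop distance lying below every crossing;
    the escape boundary is the smallest crossing distance lying above every stop.
    Either value is None if no observation satisfies the condition.
    """
    cross = [d for d, crossed in interactions if crossed]
    stop = [d for d, crossed in interactions if not crossed]
    if not cross or not stop:
        raise ValueError("need at least one crossing and one stop")

    min_cross, max_stop = min(cross), max(stop)
    always_stop = [d for d in stop if d < min_cross]
    always_cross = [d for d in cross if d > max_stop]

    crash_boundary = max(always_stop) if always_stop else None
    escape_boundary = min(always_cross) if always_cross else None
    return crash_boundary, escape_boundary


# Invented toy data: mixed outcomes between 9 m and 18 m form the trust zone.
toy = [(4.0, False), (9.0, False), (11.0, True),
       (13.0, False), (18.0, True), (26.0, True)]
boundaries = empirical_zones(toy)
```

Both boundaries are set by single extreme observations, which illustrates why the measure is exploratory rather than a statistical estimator.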

5.3.2 Zones Results

Results of the theoretical zone experiments are shown in dark blue in Fig. 14 for the Daimler dataset and in Fig. 15 for the inD dataset. The empirical zones are shown in dark red in Fig. 14 for the Daimler dataset and in Fig. 15 for the inD dataset.

For the Daimler dataset, the theoretical trust zone is between 7–15 m and the empirical trust zone is between 14–45 m. For the inD dataset, the theoretical trust zone is between 6–17 m and the empirical trust zone is between 10–31 m.
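As a worked check, substituting the dataset average speeds into Eqs. 13 and 14, with the constants listed in Sect. 5.3.1, gives boundaries that are roughly consistent with the theoretical ranges quoted above once rounded to the nearest metre or so:

$$\begin{aligned} \text {Daimler: } d_{crash}&= 5.25 \times 1 + \frac{5.25^2}{2 \times 1 \times 9.8} \approx 6.7{\text { m}},&d_{escape}&= 5.25 \times 1.5 + 2 \times \frac{5.25}{1.60} \approx 14.4{\text { m}}, \\ \text {inD: } d_{crash}&= 4.79 \times 1 + \frac{4.79^2}{2 \times 1 \times 9.8} \approx 6.0{\text { m}},&d_{escape}&= 4.79 \times 1.5 + 2 \times \frac{4.79}{0.99} \approx 16.9{\text { m}}. \end{aligned}$$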

The theoretical and empirical zones for the two datasets are roughly in agreement, which suggests that the use of actors in the Daimler dataset does not have an important effect. The boundaries of these zones, both theoretical and empirical, would change if the vehicle drove at a higher or lower speed.

The widths of all of our theoretical (crash, trust and escape) zones are smaller than those of the empirical zones. We found that our theoretical zones were underestimated relative to the empirical zones, by a factor of about three in the Daimler dataset and about two in the inD dataset. We computed these coefficients by iteratively updating the theoretical crash and trust zone boundaries in increments. This underestimation is expected because we computed the theoretical zones under many simplifying assumptions, including using average speeds across the datasets and guessing other parameters such as the driver reaction time (\(t_{driver}\)), the pedestrian reaction time (\(t_{ped}\)) and the coefficient of friction (\(\mu \)). If all the interactions were performed with these average speeds and parameters (\(t_{driver}\), \(t_{ped}\) and \(\mu \)), then the theoretical zones might match the empirical zones. In fact, Figs. 16 and 17 show the time utilities and outcomes (pedestrian crossing decisions) for each interaction in the Daimler and inD datasets, respectively. In particular, the time utility graphs show the variations of vehicle speeds across the interactions. This may explain why our theoretical trust zones do not match the empirical trust zones. Moreover, if we had computed the theoretical zones for each interaction (with their corresponding speeds), it would not be possible to analyse and discuss these zones in general with respect to the proxemic utility function, which was drawn from all the interactions in each dataset.

For this reason, we will base the rest of our analysis of trust on the empirical zones. We can see that the trust zone is the area of the proxemic utility function where the gradient changes more. This reflects the high uncertainty that lies in the trust zone. The decision of a pedestrian to cross is uncertain here because the pedestrian has to rely on the vehicle to make a decision. In contrast, in the crash and escape zones, the gradient of the proxemic utility function changes less, because the outcome in these areas is more deterministic. In the crash zone, the distance and the speed of the vehicle give the pedestrian enough information not to cross. In the escape zone, the vehicle behaviour does not interfere with their crossing decision because the danger cannot be perceived by the pedestrian, as found in [72, 73]; therefore they will cross.

Finally, for the actual average car speeds in the two datasets, equation 15, computed from the ratios of the theoretical zones \(D_{escape}/D_{crash}\) in Figs. 14 and 15, gives for Daimler \(R=15/7=2.1\), and for inD \(R=17/6=2.8\). Using the empirical zone boundaries from the same figures, we obtain Daimler empirical \(R=45/14=3.2\); and inD empirical \(R=31/10=3.1\). These results closely match the ratio found for Hall’s zones in Sect. 4.2.

Fig. 16
figure 16

Time utility and ground truth interaction outcomes for Daimler dataset

Fig. 17
figure 17

Time utility and ground truth interaction outcomes for inD dataset

6 Discussion

Although the proxemic utility inference method has proven successful on simulated and real-world interactions, several simplifying assumptions were made in order to present and test the basic principles of the method. Future work should try to move away from these assumptions in order to obtain more reliable results. In particular, we assumed that all vehicles move at an average vehicle speed rather than their individual speeds, which is a likely cause of the observed discrepancy between the theoretical and empirical trust zones. This discrepancy is a useful self-test of the model’s assumptions, so if future work brings them closer together, that would give some confidence in the proxemic utility results.

The basic premise of this study, as taken from the game theory model conclusions, was that a proxemic function captures the feeling of discomfort from space invasion by vehicles. However, speed considerations might be extended into the utility function itself: a pedestrian might feel comfortable standing 10 m from a car if it is moving towards them at 1 m/s, but not at 10 m/s. Including the speed of the vehicle as an additional parameter in the pedestrian’s utility function would formally move future models from being proxemic functions to include a kinesic component (i.e. involving speed as well as proximity) as suggested in [24] and this may further improve interaction control.

We assumed that the pedestrian and the vehicle were solely interacting with each other, ignoring simultaneous interactions with other individuals. We also assumed that the agents always moved along straight, orthogonal paths as in the game theory model, and thus did not include the interactions where pedestrians were not crossing straight away. We used only parametric models to infer the proxemic utility function. Future work could explore non-parametric models such as Gaussian Processes and compare their performance against the present models, which is possible via the BIC. Previous work on proxemics has shown that demographic, social, cultural and environmental factors can influence proxemic distances [55], so it would be important to incorporate some of this additional information to build a more precise inference model. Previous work on human–robot interactions has shown that the physical size of the other agent, such as its height, also affects proxemic zone sizes, which suggests a similar role for physical car sizes in modifying proxemic utilities. In particular, it provides a further explanation for how buying expensive sports utility vehicles (SUVs) can be rational via their infliction of stronger proxemic penalties onto other road users, thus allowing the driver to win more interactions and reduce their own journey times [27].

Additional future work could test our method on human–AV interactions in virtual reality experiments and demonstrate its effectiveness on a real autonomous system for better interactions with people. In these settings, it would be possible to collect causal data rather than the passive data used in the present study, as the vehicle can be actively controlled as an independent variable in order to measure the dependent behaviour of the pedestrian, more clearly separating the causal logic between the two agents during their interaction.

We have mainly focused on pedestrian–vehicle interactions, but the concepts and methods here could be applied to other human–robot interaction tasks. For example, human factory workers collaborating with a robot arm could be modelled by a trust zone in which the arm is able to hit them without time or space to escape.

We have merged Hall’s intimate and personal zones to map jointly into our collision zone, and did not attempt to explain any theory of intimacy within this zone. In general proxemics, our collision zone would be the distance at which a physical attack such as a punch or grab (analogous to the vehicle collision) may (a) have already happened or (b) be in unstoppable progress. Possibly this would subdivide with (a) as Hall’s intimate zone and (b) as Hall’s personal zone, with the width of the intimate zone being the collision area width w.

Using space invasion to inflict small negative utilities via discomfort on members of the public may still be considered unethical or illegal in some cases. In many jurisdictions, such as in the UK, this is an ongoing dilemma under active debate by authorities [79]. We hope the present study will contribute to this debate, by showing how this option trades off against other possible negative utilities, including those inflicted on passengers of such vehicles whose journeys would be delayed by overly assertive pedestrians pushing in front of them. Human drivers already use many such credible threats to encourage pedestrians to get out of the way. In many cases, these threats result in actual collisions. Replacing these threats by automated systems which only invade space rather than potentially collide would improve safety.

7 Conclusion

A previous game theoretic model has suggested that autonomous vehicles must either risk making no progress at all by yielding to all road-crossing pedestrians to stay safe, or maintain a credible threat of actually colliding with them to encourage them to yield. Neither of these is a desirable outcome. The new method developed in the present study now enables the inference of continuous pedestrian proxemic utility functions from pedestrian–driver interaction data. The game theory model shows that this can be used to make their interactions both safe and efficient. This can be done by de-escalating the severe threat of collision to the much milder and legally permissible threat of merely invading their personal space, which creates discomfort as a weaker but still effective penalty for non-collaboration in interactions.

We also defined and mathematically formalised a new concept of trust based on the proxemic function for human–autonomous vehicle interactions. These new, quantitatively defined, zones for the physical trust requirement may assist autonomous vehicle designers in understanding what is meant and required by the concept of trust. The mathematical and empirical results of Sect. 4.2 are evidence that our concept can explain the existence of the classic Hall intimate-personal, social and public zones, quite precisely generating their sizes and ratios, which emerge as a special case for two low speed agents interacting.

Our concept generalises these Hall zones beyond their usual use in human–human interactions to allow for larger zones as the speed of the other agent increases from human to vehicle speed, and shows how trust zones become relatively smaller at higher (e.g. freeway/motorway) speeds. It also generalises to interactions with agents moving slower than humans and predicts smaller zones in these cases, which is consistent with the human–robot proxemics studies previously reviewed.