1 Introduction

The idea of interacting with animated (humanoid) machines that free mankind from labor has long been fascinating, starting with Leonardo da Vinci’s visions of mechanical machines [1] and becoming concrete (and frightening) in Karel Capek’s play “Rossum’s Universal Robots” [2]. Science fiction has embraced such machines and introduced world-famous (fictional) robots, such as the Terminator, R2-D2 and C-3PO (Star Wars), or Lieutenant Commander Data in the TV series Star Trek: The Next Generation. However, robots have not remained fictional. In 1962, the first real industrial robots were introduced in the Ford factory in Canton [3]. Since then, the introduction of robots into various work domains has become an increasing technology-driven trend [4,5,6,7]. The first generations of robots, however, had little in common with their fictional counterparts: they looked technical rather than humanoid and worked behind fences for safety and efficiency reasons. Compared to these earlier generations, today’s collaborative robots interact directly with humans in time and space. This is facilitated by technological advances, especially regarding sensor technology, enabling more complex and adaptive robotic behavior while interacting with humans. It also implies new forms of interaction and has resulted in a variety of robotic applications, such as military search and rescue (S&R) missions, healthcare support, service assistance, therapeutic use as a substitute for animals, or human–robot collaboration in manufacturing.

This widespread use of robots has come along with a plethora of robotic appearances and interaction concepts, which is also mirrored in human–robot interaction (HRI) research. Such variety, however, poses a challenge for producing replicable and generalizable results. Many experimental HRI studies have revealed notable findings, but whether these results remain valid when the context, the robot, or any other aspect changes is questionable. To illustrate this lack of comparability and generalizability, different studies on an extensively investigated phenomenon in HRI, anthropomorphism, are presented below.

The term anthropomorphism is often used loosely to describe the tendency to attribute human characteristics to inanimate objects or animals in order to rationalize their actions. More precisely, it goes beyond attributing life to nonliving objects, as it includes “attributing capacities that people tend to think of as distinctly human to nonhuman agents, in particular human-like mental capacities (e.g. intentionality, emotion, cognition)” [8, p 220]. Anthropomorphism is applied mainly (but not exclusively) in social robotics to support meaningful social interactions and human acceptance of the robot [9]. As the following examples show, this phenomenon can be induced by characteristics of a robot’s appearance, behavior and communication, as well as by framing the robot anthropomorphically via names or stories [10].

To answer the question of whether lifelike movements and an anthropomorphic framing might affect people’s empathy for a robot, Darling, Nandy and Breazeal [11] examined how humans respond to a simple robotic object when asked to strike it. As dependent variables, they measured people’s hesitation to strike the robot and related this hesitation to their trait empathy. Results revealed that an anthropomorphic framing of robots (e.g. giving names and presenting character descriptions) affected people’s empathy and commitment towards the robot.

Most recently, Nijssen, Müller, Baaren and Paulus [15] conducted an online experiment in which participants had to decide whether they would sacrifice an agent in moral dilemmas. The to-be-sacrificed agent was either a human, a human-like robot, or a machine-like robot. Results showed that machine-like robots were sacrificed more often than human-like robots and humans. These results are in line with those of Darling et al. [11] regarding the impact of anthropomorphism on the willingness to sacrifice robots. However, both experiments used setups that differed substantially from real HRI applications. First, neither study included an actual interaction between human and robot, as participants either merely observed a real robot [11] or saw pictures of robots [15]. Second, the decision required in both experiments to destroy or sacrifice the robot represents a situation that is rarely appropriate in HRI, as a fruitful relationship is most often the overall aim of social HRI. Third, the robots in both experiments represented morphological extremes. While Darling and colleagues used Hexbug Nanos [11], bug-like crawling toys, Nijssen et al. [15] showed pictures of the robotic counterpart of Hiroshi Ishiguro, which represents an extremely anthropomorphic design.

In an attempt to bring the experimental setup closer to real applications, Onnasch and Roesler [16] conducted a laboratory study including a collaborative interaction with an embodied robot. Participants received either an anthropomorphic or a technical description of a humanoid NAO robot prior to a collaborative task. Afterwards, the willingness to “save” the robot from malfunctioning was assessed via donation behavior (for a pretended robot repair). In contrast to the aforementioned studies, results revealed a negative effect of anthropomorphism on the willingness to “save” the robot, as long as the robot’s functional value for task fulfillment was not additionally mentioned. These studies already illustrate that experiments that seem comparable at first glance might not lead to compatible results because of differences in robots, contexts and interactions.

A more real-world and application-oriented implementation of anthropomorphism was examined by Kuz, Mayer, Müller and Schlick [12] and Mayer, Kuz and Schlick [13]. They investigated the impact of robot movement on HRI and revealed that anthropomorphic trajectories and speed profiles ease the prospective identification of robot actions. In contrast, Riek and colleagues [14] reported that action identification is faster with machine-like trajectories than with smooth, human-like movement patterns. These studies, again with contrasting results, differed with respect to the robots used in the experiments. While Kuz and colleagues [12, 13] used a (simulated) industrial assembly robot for their experimental setting, Riek et al. [14] applied a humanoid robot torso.

As the studies above show, a distinct overall interpretation of experimental findings regarding the impact of anthropomorphism is not easy to draw. Although all of the discussed studies deal with anthropomorphism, the concept is operationalized on different dimensions (appearance, movement and framing), and the robots are hardly comparable (see Table 1). Moreover, when attempting to compare the studies, it becomes obvious that the actual interaction between human and robot, as well as the definition of the human’s role, are often insufficiently described or receive too little attention in the experimental setups. These variations and the lack of definitions complicate the comparison of results and the identification of relevant impact factors (e.g. Which operationalization of anthropomorphism is most effective?). Consequently, insights remain on the level of single use cases. The great variability of robots and their application scenarios is therefore a main challenge for structured research, which is mandatory for knowledge accumulation and scientific progress in the HRI domain. To advance our understanding of HRI and allow experiments to be replicated, there is a need for more basic theories, tools and methodologies. Already in 2007, Dautenhahn called for an increased effort to make research in HRI more comparable, stating that “without a scientific culture of being able to replicate and confirm or refute other researchers’ findings, results will remain on the level of case studies” [17, p 700]. To ease knowledge accumulation, a good blueprint is needed, i.e. a taxonomy with a certain level of abstraction that allows the classification of different HRI scenarios and enables a meta-interpretation of single-study results.

Table 1 Overview and comparison of different experimental HRI studies on the impact of anthropomorphism (robot illustrations are taken from the original studies or printed with according permission)

2 Existing HRI-Frameworks

To date, there are several frameworks for the classification of HRI. Granda, Kirkpatrick, Julien and Peterson proposed a standard for stages of HRI as early as 1990 [18]. They propose five stages to categorize a robot’s functional capabilities in interaction with humans. In the first stage, bounded autonomy (1), robots are able to function with little human intervention. Teleoperation (2) describes a human–robot system in which the human is an integral and full-time component of the control loop. When operating with supervised autonomy (3), the robot can handle certain (sub-)tasks autonomously, allowing the human to perform other tasks or even control other robots. Based on Sheridan’s supervisory control concept [19], in this third stage the human is responsible for task planning, teaching the robot, monitoring robot activities, intervening in robotic tasks when necessary, and learning to improve all of these activities. With adaptive autonomy (4), the robot performs most of the tasks independently, whereas the human acts as an information manager, organizing the structure of expert systems and databases. In the final stage, virtual symbiosis (5), the human and the robot exchange information based on shared knowledge. However, even with this symbiotic interaction, the ultimate responsibility still rests with the human. A strength of this approach is that the stages not only describe different levels of robot autonomy in a human–robot team, but have also anticipated future directions in the development of HRI. The application of this model to actual scenarios, however, is not very straightforward, as the stages are not mutually exclusive and often overlap, with characteristics of one stage blurring into the next [18]. In particular, supervised and adaptive autonomy merge smoothly into each other, as supervised control is already characterized by a variable span of autonomous control. It remains unclear which degree of autonomy represents the cut-off differentiating the two categories. Additionally, the human role is described as an operator at all stages, even though the machine is characterized as a “co-worker” in the virtual symbiosis.

Twelve years later, Scholtz proposed a framework differentiating human roles in interaction with robots, drawing on models of human–computer interaction [20]. According to Scholtz, the human may act as the supervisor of the robot, a role originally described for interaction with automated systems by Sheridan and Verplank [21]. In this passive role, the human is taken out of the loop, i.e. monitors task performance from a superordinate position and only intervenes if malfunctions or deviations occur. Further roles describe the human as an operator, directly giving control inputs to the robot, or as a mechanic or team partner, the latter implying equivalence between human–human and human–robot interactions. Scholtz [20] uses this team concept as a metaphor to underline the importance of a reciprocal adjustment between the actions of the human and of the robot for task accomplishment. Moreover, she defines a non-interacting role, namely the human as a bystander. In this case, human and robot merely share the same space. Thus, a mental representation of the robot and its actions is necessary to avoid non-intended interactions (i.e. collisions). This role is especially important for the implementation of autonomous robots in close proximity to humans (without safety fences). The framework represents a sound basis for a detailed differentiation of varying human roles in interaction with robots. However, while the author states the importance of generalizing the roles across different domains, the framework does not consider that different roles are predominantly represented in specific domains (e.g. operator in space exploration, peer in edutainment). Furthermore, the human roles provide no particular information about the interaction itself.

This issue is addressed in the classification of Schmidtler and colleagues [22], who define human–robot interaction types based on working time, workspace, aim and contact. From these criteria, they deduce three forms of human–robot interaction with increasing proximity and dependency. Coexistence is characterized by overlapping working time and workspace of the human and the robot. Cooperation is additionally characterized by a shared aim, whereas collaboration describes the most dependent form of interaction, with actual contact between the human and the robot while they work together. Even though these interaction types address the interaction of human and robot comprehensively, a drawback is that they are restricted to HRI in industrial environments and that three interaction types might not be sufficient to describe HRI in detail (e.g. what about the robot’s autonomy?).

Other HRI frameworks broaden the perspective as they focus not only on the human and the interaction, but also on characteristics of the robot (e.g. information regarding the communication between human and robot). Yanco and Drury [23, 24], for instance, adopt Scholtz’ interaction roles [20] and further describe ten categories related to interaction and robot characteristics. However, the categorical specifications vary in detail. While some categories are well described with predefined levels and easy to apply (robot morphology), other categories, like task type, are not further detailed, are hard to apply to actual HRI scenarios (autonomy level as a percentage), or do not apply to different kinds of robots or interaction scenarios. The latter is especially true for the category decision support for operators (without predefined levels). Decision support as a cognitive task is not a primary aim of HRI, as robots traditionally take over manual tasks, i.e. action implementation, whereas the cognitive parts remain with the human (in contrast to human-automation interaction [25]).

Further classification models often account only for certain aspects or domains of HRI [17, 18, 20, 22,23,24, 26,27,28,29,30,31,32,33] (for an overview see Table 2). Beer, Fisk and Rogers [27], for example, provide a very detailed framework regarding robot autonomy. However, the model is restricted to the essential aspect of autonomy and only applies to service robots. Therefore, the question of whether the model also applies to other HRI scenarios remains open. Dautenhahn [28] differentiates a robot’s task as a persuasive machine (therapeutic playmate), a social mediator or a model social agent. The emphasis of this role definition is the robot’s function as a stimulator of (social) interactions, as the model was developed for HRI in the context of autism therapy. Moreover, Dautenhahn proposes four evaluation criteria to identify requirements for social skills of robots: the contact with humans, the robot’s functionalities, the role of the robot, and the requirements of social skills. For every criterion, the extremes are defined (e.g. contact with humans: none/remote vs. repeated long-term physical), but other scale characteristics are missing (e.g. no further details on how many distinct levels lie between the extremes and what they are [17]). Kahn and colleagues [31] propose to characterize HRI with nine evaluation benchmarks of success in building human-like robots (and to define essential features of being human), but remain on a conceptual level without further specification or guidance for an operationalization of those benchmarks.

Table 2 Overview of HRI framework models and focuses that form the basis of the proposed HRI taxonomy

Other approaches focus on the establishment of common metrics for HRI [34,35,36,37,38,39]. Steinfeld et al. [40], for instance, analyzed the interaction between humans and robots with regard to three aspects: the human, the robot and the system. The metrics are defined for five task categories: navigation, perception, management, manipulation and social tasks like tour guiding in a museum. A drawback of the proposed framework is its exclusive focus on task-oriented mobile robots.

The outlined approaches represent a valuable effort to define a standardized set of dependent variables that should be evaluated in HRI experiments. What they do not address, however, is the actual interaction between human and robot.

To summarize, existing frameworks provide useful approaches to characterize and investigate HRI. While some frameworks are very detailed regarding the human role [20], other approaches provide sound guidance in defining the interaction between human and robot [e.g. 18, 22], or focus on the robot’s functions [17, 27]. The application of these models to various HRI scenarios, however, is not always easy or feasible. Sometimes the models only account for a specific domain [e.g. autism therapy, 28], or they remain on a conceptual level without further specification or guidance for an operationalization [e.g. 23, 24, 31]. Therefore, a framework is needed that builds on these models, but also

  1. takes into account the human, the robot, the interaction and the context of the HRI,

  2. is applicable to various HRI scenarios, and

  3. provides predefined categories to enable structured comparisons of different HRI scenarios.

3 A New Interaction Taxonomy for HRI

The proposed HRI taxonomy is divided into three clusters that specify the HRI with different foci. In contrast to existing frameworks that lack generalizability, the first cluster of the taxonomy allows a classification of the prevailing interaction context across all domains, considering the field of application and the type of exposure to the robot. After defining this macroscopic level, the characteristics of the robot can be classified regarding task, morphology and autonomy. The third cluster, the team classification, is subdivided into human role, team composition, communication channel, and proximity. The hierarchical structure of the taxonomy supports top-down analyses of existing interactions, from general circumstances to specific team characteristics, and can also guide the bottom-up adaptation and optimization of HRI design. Moreover, easy applicability is ensured by a graphical representation highlighting the three clusters and the related categories used to specify an HRI scenario (see Fig. 1).

Fig. 1 Graphical overview of the proposed taxonomy including the three category clusters: Interaction context (dark grey), robot (medium grey) and team classification (light grey)

The descriptive taxonomy aims to serve as a structural basis for classifying various aspects of HRI. Not all categories are mutually exclusive, and when applying the taxonomy to existing research, relevant information for particular categories is often missing. Nonetheless, the comparison of two studies based on the taxonomy can reveal differences in structural and functional characteristics [41], which serve as a basis for interpreting varying outcomes.

The three taxonomy clusters are described in detail in the following subsections. The broad applicability of the taxonomy is then illustrated, first with fictional robots and then with a specific research question in HRI.

3.1 The Interaction Context Classification

The interaction context classification describes the first layer of the hierarchical structure of the HRI and aims to explicate the specific domain context. To enable a differentiated and complete domain classification, the category field of application extends the differentiation of industrial and service robots in ISO 8373:2012 [42]. Examples of robots as service providers are professional cleaning robots for solar collectors or vacuuming and lawn-mowing robots for personal use. Another field of application is military and police robotics, with unmanned aerial and ground vehicles already widely used for tasks like wildfire control, bomb disposal or search and rescue. Moreover, space exploration is added, as space robots need to fulfill special requirements, like surviving the rigors of the extraordinary environment and performing multiple (and unanticipated) tasks.

In addition, further specifications are incorporated to address the emerging domain of social robotics [43]. Accordingly, a robot can be applied as an assistant in a therapeutic setting. The robotic seal Paro is a well-known example of such an application. Paro is an advanced interactive robot that serves as an alternative to pet therapy in environments such as hospitals and extended care facilities, where pets are not allowed. Studies indicate the potential benefit of this robotic system for older adults and dementia patients [e.g. 44, 45].

Robots are also increasingly used in educational settings and research. For instance, Saerbeck et al. [46] introduced iCat, a robotic research platform, to support school children (aged 10–11 years) in remembering vocabulary when learning a language. Another example is provided by Hashimoto and colleagues [47], who implemented an android robot that collaboratively solved exercises with students in a Tokyo elementary school.

Moreover, there is the growing field of entertainment robots, such as robotic dogs or other robotic toys like Cozmo, which are also often used in research [e.g. 48, 49]. Regarding the use of robots in research, it is important to note that the categorization of the field of application depends on the actual use of the robot, not on what the robot was originally designed for. Cozmo, for example, was originally conceived as an entertaining robotic toy for children. When this toy is used in research as an interaction partner for children with autism spectrum disorder, Cozmo has to be categorized as a therapeutic companion [48].

Additionally, many studies do not include such context information, e.g. purely perceptual studies. To cover those studies as well, the last category level is none.

The particular field of application often sets general conditions regarding specific characteristics of the human users (e.g. the homogeneity of manufacturing workers interacting with industrial robots), as well as properties of the interaction scenario (e.g. the unstructured interaction environment of a socially assistive robot).

Besides the field of application, the taxonomy further defines the kind of exposure to the robot. There is a growing body of research indicating that physically embodied robots are perceived differently from virtual two-dimensional agents and have different effects [e.g. 50,51,52]. Therefore, the taxonomy differentiates between exposure to embodied versus depicted robots. Examples of the latter are virtual two-dimensional agents (on a computer screen), but also real robots that are presented to participants only via video clips or images. The core aspect addressed by these category levels is the embodiment of the robot, i.e. whether the exposure enables an experience of corporeality and a direct tactile interaction [51]. Moreover, this category also differentiates the setting, i.e. whether the exposure to a robot is realized in the wild (field) or in a controlled (and more artificial) laboratory setting, which could impact the perception of and behavior towards a robot. A laboratory setting also includes online studies. Different settings can lead to different outcomes. For example, Salter et al. [53] found that children enjoyed playing and were actively engaged with a robot under laboratory conditions, whereas a field experiment [54] with the same robot revealed that most children became increasingly bored by interacting with the robot. This surprising result was partially explained by a disappearing novelty effect and the repeated exposure. Moreover, the unstructured environment led to previously unknown robotic problems, like the robot getting “stuck” without being able to free itself.

3.2 The Robot Classification

Besides the interaction itself, the previously mentioned contradictory results of Kuz et al. [12, 13] and Riek and colleagues [14] illustrate that the robot’s design and function have a strong impact on the interaction between human and robot. Therefore, the next part of the taxonomy focuses on the robot’s work context and design with the three variables of robot task specification, degree of robot autonomy, and robot morphology.

The robot task specification describes eight abstract task types to allow a classification and standardized comparison of diverse tasks in various application domains. Based on the predominant human–robot interactions, we define the tasks as follows.

  • Information exchange: the robot acquires and analyzes information from the environment and transfers this information to the human. This task is often implemented when operating in hostile environments, as in Mars missions or S&R missions.

  • Precision: the robot performs tasks that require particularly delicate, precise capabilities and are hard for humans to perform (e.g. robots for minimally invasive surgery like the da Vinci system, which suppresses the surgeon’s tremor).

  • Physical load reduction: the robot takes over tasks to reduce the human’s physical workload (e.g. lifting, carrying or fixing actions). Representative of this kind of task allocation between robot and human are powered exoskeletons that support the human in carrying heavy loads during long missions or allow paraplegic wearers to walk upright with little physical exertion.

  • Transport: the robot is implemented to transport objects from one place to another (e.g. robots that carry parcels to different shelves in a warehouse, or robots carrying linen in hospitals from the patient rooms to the laundry).

  • Manipulation: the robot physically modifies its environment (e.g. robots that perform welding actions on an object or pick-and-place robots).

  • Cognitive stimulation: the robot’s aim is to engage the human on a cognitive level in the interaction through verbal or nonverbal communication. This task is often found in social HRI implemented in educational settings like schools or kindergartens.

  • Emotional stimulation: the robot aims at stimulating emotional expressions and reactions in an interaction. Examples of this kind of robot are the robot seal Paro or other pet-like robots.

  • Physical stimulation: robots for physical stimulation are often used in a rehabilitation context. The hirob from KUKA Medical Robotics, for instance, automates conventional hippotherapy by imitating the exact movements of a horse’s back while the horse is walking. The robot thereby enables intensive therapeutic exercise for patients with neurological deficits to regain torso control and stability.
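To illustrate how this task specification could be handled when coding studies for comparison, the following minimal Python sketch encodes the eight task types as an enumeration; the identifier names and the example assignment are our own illustration, not part of the taxonomy itself.

```python
from enum import Enum


class RobotTask(Enum):
    """The eight abstract task types of the robot task specification."""
    INFORMATION_EXCHANGE = "information exchange"
    PRECISION = "precision"
    PHYSICAL_LOAD_REDUCTION = "physical load reduction"
    TRANSPORT = "transport"
    MANIPULATION = "manipulation"
    COGNITIVE_STIMULATION = "cognitive stimulation"
    EMOTIONAL_STIMULATION = "emotional stimulation"
    PHYSICAL_STIMULATION = "physical stimulation"


# A robot can serve more than one task type, so a classification is a set,
# e.g. a surgical robot that both manipulates tissue and requires high precision:
surgical_robot_tasks = {RobotTask.PRECISION, RobotTask.MANIPULATION}
```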

The robot morphology serves as another classification parameter for HRI, as also proposed by Fong and colleagues [30] and Yanco and Drury [23]. From a psychological perspective, the morphology is of special interest as it shapes a user’s expectations of the robot’s functioning, as well as of its communication style and modalities. Therefore, to create an intuitive interaction with a robot, its design can be used to activate associations with already known objects [55]. Humanoid robots are a typical example of this. The more human-like a robot appears, the more a user will expect intuitive communication via natural speech. Moreover, human likeness could raise expectations regarding the robot’s competence, knowledge and autonomy [56].

Therefore, the differentiation of robot morphology (anthropomorphic, zoomorphic, technical) represents a fundamental determinant for HRI, especially for first-contact interactions with robots. In line with Yanco and Drury [23] and in contrast to Fong and colleagues [30], this taxonomy omits a cartoon-like or caricatured category, as it is not distinctly applicable and is always based on one of the other categories. Taking a closer look at the robots in Table 1, this problem becomes obvious: the Hexbug Nano used by Darling and colleagues [11], for instance, is a caricatured cockroach and therefore zoomorphic. The NAO robot used by Onnasch and Roesler [16] is a small caricatured human and therefore anthropomorphic.

However, this categorization of morphology goes beyond appearance, and therefore deviates from existing classifications [23, 30], as it further subdivides robot morphology into four dimensions: appearance (What does the robot look like?), communication (e.g. communication by speech, written input/output, signals, non-verbal communication, gestures; implicit or explicit communication), movement (e.g. joint/smooth, functional movement) and context information (e.g. framing). On each dimension, robot morphology can be classified as anthropomorphic, zoomorphic or technical. To perceive a robot as humanoid (anthropomorphic design), not every detail has to be human-like; it might already be sufficient if the robot has a human-like body and a head, even though legs and feet are missing, or if the communication style and context are anthropomorphic while the appearance is more task-driven and therefore technical. The same holds for a zoomorphic design, where a typical pet name like “Kitty” or “Buster” and certain sounds might lead to a zoomorphic perception of the robot even though its appearance resembles a “box”. With a technical robotic appearance, framing, movement and communication can thus fundamentally change the way the robot is perceived. As mentioned before, the classifications on the four dimensions can be assigned without every detail of anthropomorphic or zoomorphic characteristics being present, or, as stated above, even in a caricatured manner. As the manifestations of the technical, zoomorphic and anthropomorphic categories are gradual, we advise a differentiated classification of each aspect (appearance, communication style, movement, and context).

However, currently popular questionnaires like the Godspeed [57] and the revised Godspeed [58] questionnaires are not appropriate for this purpose, as they mainly focus on the development of uncanny valley indices. Furthermore, the semantic differentials distinguish between inanimate technical aspects and natural human aspects, like consciousness or being alive. Even though robotic design utilizes the human tendency to anthropomorphize robots [59], it cannot be assumed that humans perceive clearly non-living objects as possessing uniquely human characteristics and as being actually alive. In addition, animal-likeness (zoomorphism), which is especially important for social HRI, cannot be measured with these instruments. For these reasons, we cannot suggest an existing questionnaire to rate a robot’s morphology. Future research should develop a questionnaire that can reliably measure the degree to which a robot’s morphology is technical, zoomorphic and anthropomorphic.
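As a sketch of how this multidimensional view could be recorded in practice (the class and field names below are our own shorthand, not a fixed notation of the taxonomy), each of the four dimensions receives its own classification:

```python
from dataclasses import dataclass
from enum import Enum


class MorphologyClass(Enum):
    ANTHROPOMORPHIC = "anthropomorphic"
    ZOOMORPHIC = "zoomorphic"
    TECHNICAL = "technical"


@dataclass
class MorphologyProfile:
    """Robot morphology classified separately on the four dimensions."""
    appearance: MorphologyClass
    communication: MorphologyClass
    movement: MorphologyClass
    context: MorphologyClass


# Example profile of a hypothetical box-shaped pet robot: technical appearance and
# movement, but zoomorphic communication (animal sounds) and framing (pet name).
pet_box_robot = MorphologyProfile(
    appearance=MorphologyClass.TECHNICAL,
    communication=MorphologyClass.ZOOMORPHIC,
    movement=MorphologyClass.TECHNICAL,
    context=MorphologyClass.ZOOMORPHIC,
)
```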

The degree of robot autonomy represents another classification variable, defining the need for human intervention during interaction. In this context, Beer and colleagues [27] define the level of autonomy with regard to perception, action planning and action implementation. However, they do not consider information analysis and aggregation as a separate task that could be performed by either the human or the robot. Thus, we propose to classify robot autonomy according to models applied in human-automation interaction, referring to Wickens, Hollands, Banbury and Parasuraman [60]. Accordingly, the degree of robot autonomy is subdivided into four stages: information acquisition, information analysis, action selection and action implementation. For each stage, the level of robot autonomy can vary from low/none to high/complete, thereby indicating the required level of human intervention and which parts of a task remain with the human. This way of classifying autonomy is based on a well-established framework by Parasuraman, Sheridan and Wickens [25] for stages and levels of automation (equivalent to autonomy in this case, see Fig. 2). Based on this original work [25], the categorization is made on a 10-point scale, with higher scores representing higher autonomy of the technology over human action. For practical reasons, i.e. to keep the taxonomy synoptic and intuitively applicable, we suggest a threefold division into low, medium and high autonomy.

Fig. 2 Stages and levels of automation [25]. Two automated systems are depicted, with system A representing higher levels of automation and system B lower automation
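A minimal sketch of this classification logic is given below. Note that the numeric cut-offs between low, medium and high are an assumption made for illustration only; the taxonomy itself does not prescribe them.

```python
# Illustrative sketch: mapping a 10-point level of automation (cf. Parasuraman
# et al. [25]) to the coarse low/medium/high categories, separately per stage.
# The thresholds below are assumed for demonstration and are not part of the taxonomy.

STAGES = ("information acquisition", "information analysis",
          "action selection", "action implementation")


def autonomy_category(level: int) -> str:
    """Map a 1-10 level of automation to a coarse autonomy category."""
    if not 1 <= level <= 10:
        raise ValueError("level must lie between 1 and 10")
    if level <= 3:
        return "low"
    if level <= 7:
        return "medium"
    return "high"


def classify_autonomy(levels_per_stage: dict) -> dict:
    """Classify a robot's autonomy separately for each of the four stages."""
    return {stage: autonomy_category(levels_per_stage[stage]) for stage in STAGES}


# Hypothetical example: a robot that acquires and analyzes information largely on
# its own but leaves action selection mostly to the human.
print(classify_autonomy({"information acquisition": 9, "information analysis": 8,
                         "action selection": 2, "action implementation": 6}))
```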

In sum, the three variables of robot task specification, robot morphology and degree of robot autonomy enable the systematic comparison of different robots and their effectiveness. These variables form a basis for design guidelines for beneficial HRI.

3.3 The Team Classification

The team classification characterizes the structure of the interaction as well as aspects of team composition and teamwork. It is addressed with four variables: human role, team composition, communication channel and proximity (physical and temporal).

The human role is based on Scholtz’ role descriptions [20], except for the role of a mechanic. This role does not represent an actual interaction of human and robot, but rather an action of the human on the robot in terms of repair and maintenance work on a functional level (hardware and software). Instead, based on Schmidtler et al. [22], the peer role is further differentiated into cooperator and collaborator to depict different levels of interaction on the same hierarchical level as the robot. Accordingly, five distinct human roles are defined. The supervisor monitors the robot and gives instructions on how to perform the task. The operator controls the robot. This role is an extension of Scholtz’ definition, as the operator not only “is called upon to modify internal software or models when the robot behavior is not acceptable” [20, p 10–11], but is explicitly on a higher hierarchical level, as the robot can be directly controlled by the human. Operating a bomb disposal robot, for example, can be described by this role: human and robot explicitly work together, and both parties pursue the same overall goal as well as the same sub-goals (“approach the bomb, fire a high-pressure jet of water at the wires, …”). This operation is characterized by synergy effects. Neither the human nor the robot could do the task alone, as it is too dangerous for the human and (up to now) too complex for the robot. Other examples of a human operator are the interaction with the surgical da Vinci robot, which is directly operated by the surgeon but at the same time compensates for the surgeon’s tremor, or exoskeletons worn by humans. In these scenarios, the human role is always hierarchically higher than the robot (which is controlled).

The collaborator has the same sub-goals and overall goal as the robot, but depends on the robot’s actions; they work together towards joint task completion. As a collaborator, the human has no managerial responsibility. This role can be thought of as a teammate on the same hierarchical level as the robot. An example is an industrial robot holding and rotating heavy workpieces to support the simultaneous and continuous work of the human. In this team composition there is no hierarchical difference between the team partners.

The cooperator also works with the robot to fulfil a shared overall goal. However, the partners do not directly depend on each other because of a strict task allocation between human and robot; still, task completion by both parties is needed to fulfil the shared overall goal. The application of pick-and-place robots in manufacturing often represents an example of this human role. Robot and human are co-workers: they work together (but on their own) at an assembly line pursuing a shared goal, the production of a specific product.

In the role of a bystander, the human does not interact with the robot but shares the same space. Therefore, even this human role requires a mental representation of the robot and its actions to avoid collisions. The aim of this human role is avoidance. An example of a human as a bystander is the encounter of visitors and a transport robot in a hospital setting. The two parties do not pursue the same goal: the robot’s goal might be to transport laundry from one place to another, whereas the visitors’ goal is to visit someone in the hospital. Still, they have to coordinate their actions in the short term to avoid a possible collision.

Another variable, which has received little attention in previous classification approaches, is team composition. Essentially, there are three possible options: an equal number of humans and robots (NH = NR), more humans than robots (NH > NR) and more robots than humans (NH < NR). In order to ensure comparability between different HRI scenarios, it might be reasonable to further specify the concrete number of humans and robots when applying the taxonomy. For instance, an interaction scenario consisting of one human and two robots is very different from a scenario with one human and 100 robots, as is the case when controlling robot swarms.

The communication channel is further subdivided into input and output to highlight the interactional component of human–robot teamwork. Input describes how information from the human is “perceived” by the robot, i.e. how humans can provide information to the robot. This can be done using an electronic (e.g. remote control via a control device), a mechanical (e.g. kinematic movement of the robot arm), an acoustical (e.g. verbal commands) or an optical channel (e.g. gesture control). The robot’s output can be perceived by the human senses through a tactile (e.g. haptic feedback using vibrations), an acoustical (e.g. animal sounds) or a visual channel (e.g. artificial eye movements).

This communication takes place within the dimensions of spatial and temporal proximity. The physical proximity describes the distance between the human’s and the robot’s work areas. Huttenrauch and Eklundh [61] propose five categories, which Yanco and Drury [23] extend with the category “none”; this extended set is also applied in this taxonomy. The category “none” is especially important for remotely controlled HRI (e.g. the Mars rover Curiosity), which is represented in our taxonomy through the combination of human role = supervisor and physical proximity = none. The categories are defined as follows.

  • Following: human and robot have stable physical contact over a prolonged time. This is realized via specific interfaces on the robot (e.g. joystick, force/torque sensors) or by using/manipulating the objects held by the robot.

  • Touching: human and robot share the same workspace. They directly interact in close proximity. Human and robot have physical contact.

  • Approaching: human and robot work in the same space. There is no physical contact, but both parties work closely together.

  • Passing: the human’s and robot’s workspaces partly or completely overlap. However, contact is prevented.

  • Avoiding: human and robot do not work within striking distance and avoid direct contact. The human tries to stay out of the robot’s operating range.

  • None: human and robot are not in the same working environment.

In addition to the physical proximity, we further describe the temporal proximity between human and robot, which is either synchronous or asynchronous. Synchronous implies that human and robot are working at the same time. In an asynchronous interaction, human and robot work at different times, as in the case of shift work.

3.4 Graphical Support for Classifying HRI Scenarios in Practice

To ease the application of the HRI taxonomy to actual scenarios, we offer a compact canvas representation including all variables with their corresponding categories (see Fig. 1). The idea of this kind of graphical representation stems from the business and management domain and was first introduced as the Business Model Canvas by Osterwalder [62]. In its original use, the canvas representation is a template for developing new or documenting existing business models. The canvas lists and combines all elements describing a company’s or a product’s value proposition, infrastructure, customers and finances, and therefore provides a very condensed and scalable overview of complex circumstances.

When applied in the context of HRI, at least one category should be selected per variable. Whereas it might be appropriate to select more than one category for some variables, other variables have mutually exclusive categories. These are exposure to the robot (interaction context classification), morphology, autonomy (robot classification), and human role (team classification). The graphical support depicts the hierarchical structure of the taxonomy. In the upper right, the two variables defining the interaction context classification are depicted in dark grey. The variables concerning the robot are organized in medium grey, and the team classification variables are grouped in light grey in the lower part of the canvas template. The left part of the canvas is a placeholder for a description of the specific HRI categorized using the canvas. Information should be provided about the robot, the robot’s task, as well as the specific HRI. To further ease the application of the taxonomy, we provide an interactive canvas PDF that can be downloaded at https://hu.berlin/ingpsy
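For researchers who want to store canvas classifications in machine-readable form, e.g. to prepare structured comparisons or meta-analyses, a classification could be organized along the three clusters as sketched below; the structure and field names are our own suggestion and not part of the published canvas.

```python
from dataclasses import dataclass


@dataclass
class InteractionContext:
    field_of_application: str   # e.g. "industry", "service", "therapy", ..., "none"
    exposure: str               # "embodied" or "depicted"
    setting: str                # "field" or "laboratory" (incl. online studies)


@dataclass
class RobotClassification:
    tasks: set                  # subset of the eight task types
    morphology: dict            # anthropomorphic/zoomorphic/technical per dimension
    autonomy: dict              # low/medium/high per stage


@dataclass
class TeamClassification:
    human_role: str             # supervisor, operator, collaborator, cooperator, bystander
    team_composition: str       # "NH = NR", "NH > NR" or "NH < NR"
    input_channels: set         # electronic, mechanical, acoustical, optical
    output_channels: set        # tactile, acoustical, visual
    physical_proximity: set     # following, touching, approaching, passing, avoiding, none
    temporal_proximity: set     # synchronous, asynchronous


@dataclass
class HRICanvas:
    description: str            # free-text description of robot, task and interaction
    context: InteractionContext
    robot: RobotClassification
    team: TeamClassification
```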

4 Applying the Taxonomy

4.1 Is Interacting with R2-D2 Comparable to Interacting with C-3PO?

To exemplify the use and the benefits of the proposed taxonomy, we first apply it to two very popular robots: R2-D2 and C-3PO. Even though both robots are fictional, we chose them to present a comprehensible example that most people can relate to and that is easy to understand. Kahn [63] conducted interviews regarding attitudes towards intelligent service robots and revealed that most people would prefer interacting with R2-D2 because this robot seems to be more likeable than C-3PO. To answer the question of why people prefer R2-D2 over C-3PO, we categorize both robots using the canvas template (see Fig. 3; chosen specifications per variable are written in italic bold).

Fig. 3 Application of the HRI taxonomy for R2-D2 and C-3PO using the canvas representation. Chosen specifications per variable are written in italic bold

R2-D2 is an astromech droid developed for navigational and maintenance tasks. The R2 class of robots is common in the Star Wars universe, and these robots are part of many starships. R2-D2 is able to take over the astronavigation of a starship. Moreover, it performs diverse tasks as a mechanic. The field of application is therefore service, as these robots support their pilots in a professional environment in the aforementioned tasks. The exposure of the human to the robot is embodied and takes place in the field.

The robot’s tasks are defined as precision and manipulation. Navigating the spaceship requires precise inputs and interventions for the most efficient and safe route. Even nowadays, the navigation and flying of commercial aircraft is mainly supported or taken over by automated systems, as these outperform human pilots. Moreover, as a mechanic, R2-D2 directly manipulates its environment. Regarding R2-D2’s morphology, a differentiated perspective is needed. R2-D2’s appearance and movement are clearly technical. The communication was designed in a technical style in the first place, because R2-D2 communicates with other robots. In communication with humans, however, it evokes zoomorphic associations, as it not only conveys information but also expresses, through changing pitch, what might be interpreted as mood or feelings. Based on context information on how people talk about R2-D2 and what information is shared about it (e.g. what adventures it has been part of), the robot’s context is classified as anthropomorphic. It is seen as a real teammate: it attends mission briefings and is regarded as a valuable partner because of its experience and character. As the earlier description of R2-D2’s tasks shows, its autonomy is high at every stage: information acquisition and analysis, action selection and action implementation.

The human role is that of a supervisor who monitors the robot and gives instructions, and the team typically consists of a pilot and R2-D2, therefore NH = NR. R2-D2 receives information from the human mainly via the acoustic channel (e.g. speech input), and its output can be perceived by the human through the acoustic (e.g. beeps) and visual channels (e.g. projections). The physical proximity is defined as passing or none. R2-D2’s and the human’s workspaces partly or completely overlap, e.g. during maintenance work. When navigating, R2-D2 is physically separated from the pilot; they do not share the same working environment. Temporal proximity can be either synchronous or asynchronous.

C-3PO is a protocol droid. The 3PO units are specialized in language, translation and diplomacy and can be found on nearly every planet and every starship in the galaxy. They work for kings and queens, senators or business professionals. They not only translate between different languages but also act as robot–human interpreters. The field of application is service, and the interaction represents an embodied exposure in the field. Acting as an interpreter, C-3PO’s task is categorized as information exchange. Like R2-D2, C-3PO is highly autonomous on all levels of autonomy but differs significantly regarding its morphology, which is anthropomorphic in appearance, communication, movement and context.

The human role is categorized as a collaborator because the human depends on C-3PO’s actions; they work together towards joint task completion: the human holds a conversation and creates content, while the robot translates what is said between the two parties. The team composition is NH < NR or NH > NR, as the HRI scenario typically includes either two humans and C-3PO, or one human, one robot and C-3PO. Regarding the hierarchical level, one could argue that the human is on a higher level (i.e. an operator). However, the relation between the two agents can be seen as that between a human interpreter and his or her customer. C-3PO can do the translation or refuse to do so. Moreover, the robot decides what and how to translate. For example, bad language is not translated by C-3PO, which then tries to rephrase it in acceptable terms or simply refuses to translate. Information can be passed from human to C-3PO using an acoustic (e.g. speech) or an optical channel (e.g. written text), and output is given by the robot using acoustic and visual communication (gestures). The physical proximity can be described as approaching, as C-3PO and the human for whom it is interpreting share the same space, working closely together but without physical contact. Most of the time, the temporal proximity is synchronous, but as it is possible to send C-3PO to deliver messages, the proximity might be asynchronous as well, which again illustrates that the categorization depends on the specific task.

By applying the taxonomy to both robots, we can now try to find variables of HRI that might cause the perception that R2-D2 is more likeable than C-3PO (although this is a post hoc explanation that has to be handled with caution). The comparison of the two canvas representations reveals the similarities and differences of both HRI scenarios. While the field of application, exposure, robot autonomy, communication channel and temporal proximity are comparable, there are also remarkable differences. First of all, the robots differ regarding the task for which they were developed. Moreover, the human role is different. In interaction with C-3PO, the human is a collaborator and dependent on the robot’s actions. For the human, this might also imply a (perceived) shared responsibility entailing higher expectations of the robot’s ability. Working with R2-D2, the human is not directly involved in task fulfillment but has to supervise the robot, i.e. the action implementation is completely taken over by the robot, whereas responsibility remains with the human. Together with R2-D2’s morphology combining functional, zoomorphic and humanoid aspects, this might foster the caregiver effect [64, 65]. When we interact with a machine that presents itself as dependent, e.g. one that has to be supervised and evokes associations with a pet, we tend to nurture this machine, which in turn creates significant social attachment. This might explain why most people would be more willing to interact with R2-D2. As R2-D2 is highly autonomous and often meets bystanders when navigating to its workplace as a mechanic, the chosen morphology might help to enhance its acceptance: it does not look dangerous or scary, it is smaller than a human adult, and with its technical appearance it does not look very complex. In contrast, C-3PO was designed as a humanoid robot on all levels: appearance, communication, movement and context. This design supports notions of seriousness, expertise and trustworthiness, especially as C-3PO looks like an adult, is tall and shows no childlike characteristics. The chosen morphology therefore fits the human role, C-3PO’s ability and its task (interpreting in a business or political context).

In summary, applying the taxonomy makes explicit that interacting with R2-D2 is fundamentally different from interacting with C-3PO. Based on the detailed categorization and a post hoc analysis of both robots, a hypothesis can be proposed as to why people find R2-D2 more likeable: the likeability of a robot depends on its morphology and the accompanying expectations, as well as on the human role in the interaction with the robot.
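To indicate how such a structured comparison could be automated, the following sketch encodes the two classifications as flat dictionaries (simplified from the categorization above; keys and value strings are our own shorthand) and lists the variables on which the two scenarios differ.

```python
def differing_variables(canvas_a: dict, canvas_b: dict) -> dict:
    """Return the taxonomy variables on which two classifications differ."""
    return {key: (canvas_a[key], canvas_b[key])
            for key in canvas_a
            if canvas_a[key] != canvas_b[key]}


r2d2 = {
    "field of application": "service",
    "exposure": "embodied, field",
    "task": {"precision", "manipulation"},
    "morphology (appearance)": "technical",
    "autonomy": "high",
    "human role": "supervisor",
    "team composition": "NH = NR",
    "physical proximity": "passing or none",
}

c3po = {
    "field of application": "service",
    "exposure": "embodied, field",
    "task": {"information exchange"},
    "morphology (appearance)": "anthropomorphic",
    "autonomy": "high",
    "human role": "collaborator",
    "team composition": "NH > NR or NH < NR",
    "physical proximity": "approaching",
}

for variable, (a, b) in differing_variables(r2d2, c3po).items():
    print(f"{variable}: R2-D2 = {a} vs. C-3PO = {b}")
```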

4.2 Anthropomorphism in HRI: Different Results or Different Scenarios?

The application to fictional HRI aimed to demonstrate how to apply the taxonomy by using examples most people are familiar with. In a second application, we refer to the set of studies on anthropomorphic movement discussed in the Introduction. While Kuz et al. [12] have demonstrated that anthropomorphic trajectories and speed profiles ease the prospective identification of robot actions, Riek’s research group [14] reports that action identification is faster with machine-like trajectories compared to smooth, human-like movement patterns. To disentangle possible reasons for these different results, the taxonomy is applied to both experimental setups (see Fig. 4).

Fig. 4 Application of the HRI taxonomy for the robots used by Kuz et al. [12] and Riek et al. [14] using the canvas representation. Chosen specifications per variable are written in italic bold

Kuz and colleagues [12, 13] used a virtual simulation environment with a single-arm assembly robot and its workplace. The robot’s task was to place an object on certain fields of a black-and-white grid. The experimental setup and the simulated robot represent a scenario of an industrial application, and participants were exposed to a depicted robot in a laboratory setting. The robot’s task of placing objects is classified as manipulation. The morphology regarding appearance is described as technical, and there was no communication. The movement, however, was the independent variable of the experiment and implemented as either technical or anthropomorphic. No additional information was given regarding the context. The overall degree of robot autonomy for the experimental setup can be described as high because, from the participants’ perspective, the robot was performing completely autonomously without any external influence. During experimental trials, participants had to observe the robot and predict the robot arm’s end position, i.e. where the robot was going to place the object. Consequently, the human role was that of a bystander with no control or supervisory task. The team (if this term is appropriate at all for the experimental scenario) consisted of one human and one robot, therefore NH = NR. The communication channel for input might be electronic (not specified in the experimental description), and the output was visual. The temporal proximity was synchronous, but there was no physical proximity, as participants were just passively observing a simulated robot at its workplace.

The application of the taxonomy to the HRI study of Riek et al. [14] reveals the very artificial character of the chosen interaction scenario. Because of the abstraction level, the field of application cannot be specified and is therefore categorized as none. Participants were exposed to a depicted robot (videos) within a laboratory setting. The robot’s only task in the experiment is described as gesturing, i.e. explicit non-verbal communication, and can thus be categorized as information exchange. The robot’s morphology regarding its appearance and communication style (explicit non-verbal communication) was anthropomorphic. No context information was provided to participants. Again, the robot’s movement was the independent variable and implemented as either technical or anthropomorphic. Comparable to Kuz et al. [12], the overall degree of robot autonomy can be described as high because the robot seemed to perform completely autonomously from a participant’s perspective. The human role was that of a bystander, as human and robot were neither working together nor had a shared goal, although the scenario implied some basic communication affordances. With an interaction of one robot and one participant at a time, the team composition is defined as NH = NR. Based on the information given in the experimental description, the communication input channel might be electronic. However, as participants had to interpret the robot’s gestures, it might also be possible that they assumed a reciprocity in this kind of communication, i.e. an optical input. This is not discussed in the original paper. The robot communicated with participants using gestures, which represents a visual communication output. The proximity is classified as synchronous in time and none regarding the physical proximity (because video material was used).

The application of the taxonomy reveals similarities and differences between the two studies. The interaction context and exposure setting are categorized similarly, and there are no differences within the team classification part of the taxonomy. The categorization of the degree of robot autonomy is also comparable in both studies. However, there are at least three aspects in which the studies differ essentially: the field of application, the robot task specification and the morphology. These differences are important when interpreting the results. Both studies compare anthropomorphic with technical robot movement. However, while in Kuz et al.’s [12] experiment the robot’s movement is a means to an end, i.e. the movement is needed in order to fulfill the task of manipulation, the robot movement in the study of Riek et al. [14] is the central variable, as this actually is the robot’s task: information exchange via gestures. Moreover, the robots differ greatly regarding their appearance (technical vs. anthropomorphic design). These differences might explain the contradictory results, as they show that even though both studies focused on a robot’s movement, they in fact investigated completely different HRI scenarios, which cannot be assessed as equal since movement served dissimilar functions.

So, to answer the question “Different results or different HRI scenarios?”: the taxonomy illustrates that a comparison of the two studies’ results would resemble a comparison of apples and oranges because these are, in fact, completely different scenarios.

5 Outlook

The application of the taxonomy to both fictional and real examples shows its strength in enabling a structured comparison of very different HRI scenarios, revealing similarities and differences. Hence, the taxonomy has broad practical applicability for top-down classifications of existing HRI scenarios, which can form the basis for targeted interventions to optimize HRI. Moreover, the taxonomy serves as a basis for deciding whether studies are comparable and whether an overall interpretation is valid. Additionally, the taxonomy enables the identification of dimensions on which HRI studies differ. This is the foundation for more generalized insights, as such structured comparisons and categorizations allow meta-analyses. A rich and growing body of research already exists. However, the interaction context, the robot and the team characteristics of the individual studies differ considerably. Are implications of results obtained in studies with a therapeutic zoomorphic robot still valid for studies utilizing a single-arm industrial robot in a manufacturing domain? The generalizability of insights from single studies is always problematic. The proposed descriptive HRI taxonomy facilitates HRI research, as it illustrates similarities and highlights differences of actual scenarios. Furthermore, the taxonomy offers a framework for experimental study setups, as the predefined categories propose a structured gradation on each HRI dimension. By following such a structured research approach, it will become possible to tell which specific aspects of HRI have a beneficial impact, enabling safe and efficient teamwork between humans and robots.