1 Introduction

Animals are kept for a variety of reasons, including as companions for children and adults and for Animal Assisted Interventions (‘AAI’). AAI is a term given to planned, goal-directed activities that involve the use of animals as therapeutic adjuncts for the benefit of the human recipient; it includes both Animal Assisted Activities (AAA) and Animal Assisted Therapies (AAT) [1]. Studies have indicated that AAI can increase feelings of psychological wellbeing, decrease stress and anxiety, and increase motivation [2]. Dogs are the most commonly used species in AAI due to their training potential and generally social nature [3], but there is a growing awareness of potentially negative impacts on the welfare of dogs in AAI [4]. Indicators of stress and anxiety in dogs are often missed, both by pet dog owners [5] and in AAI settings [6,7,8], which could lead to chronic welfare issues. Further concerns exist about cultural diversity in attitudes towards animals, the transmission of zoonotic diseases, the triggering of human allergies, and liability issues [9].

One way to ameliorate the welfare concerns associated with pet dogs and therapy dogs is the use of social robots in Robot Assisted Interventions (RAI). Like AAI, RAI aims to improve human health and psychological wellbeing [10, 11]. These social robots are often used with clinical populations: “Paro” the robotic seal has been used as a companion for the elderly [12], whilst other social robots have been used for rehabilitation after a stroke or injury [13], to teach words to young children [14], and to facilitate play, social communication and learning in autistic children [15]. Social robots may be advantageous in that they can be thoroughly cleaned, can work for longer periods of time, and may be more cost-effective [16]. Although early tests have found social robots to be as effective as therapy dogs, their use requires further investigation [17]. People’s attitudes have been shown to be generally more negative towards robots than towards dogs [18], and it is currently unclear whether our social cognition is sufficiently flexible to fully accept social robots as pets or therapeutic aids [19].

Despite the increasing popularity of dogs in AAI and robots in RAI, the mechanisms underpinning their effects remain unclear. A major theory proposed to explain the outcomes of AAI is the Biophilia Hypothesis [20], which states that humans have an innate attraction to life and lifelike processes and therefore derive benefits from interacting with other animals and the natural world. Another proposed mechanism of action in AAI is social interaction: this account suggests that the highly interactive and responsive nature of non-human animals is key to the positive outcomes of AAI [21,22,23,24].

Social robots are becoming increasingly biomimetic: complex, interactive, responsive and life-like [25]. This facilitates research into both of these proposed mechanisms, because the living animal can be compared with social robots and other, less realistic, controls [26]. Social robots can be used as “interactive probes” (p. 2295) to dismantle and assess the subtle underlying sensory and motor mechanisms in human–human and human–animal interactions [27]. However, participants’ responses may be mediated by individual differences, such as familiarity with dogs and robots [28, 29] or animistic beliefs. The extent to which people attribute worth, mental capacities, biological properties, emotions, social rapport, and moral standing to animals and robots (i.e. hold animistic beliefs) is likely to influence their enjoyment and evaluation of the experience and, thereby, any observed effects of AAI and RAI [30, 31].

Therefore, we aimed to explore the potential effects of social interaction levels and biophilic beliefs on participants’ evaluations of two potential therapy adjuncts, an animal and a robot: specifically, a dog and the biomimetic robot MiRo-E. These are referred to in this paper as TD and TR (Therapy Dog and Therapy Robot) and collectively as TAaR (Therapy Animal and Robot). To assess underpinning mechanisms, the MiRo-E was used as a comparison condition for the therapy dog condition, owing to its novelty for participants, its biomimetic quality and its proposed suitability for use in RAI [32]. Additionally, the repeated-measures design reduced baseline variability, and the controlled setting addressed methodological confounds in previous AAI studies [33]. In this study, children aged 11–12 years engaged in free-play sessions with both the dog and MiRo-E. We compared participants’ evaluations of the two experiences and, based on the biophilia hypothesis, predicted that participants would evaluate the TD condition more favourably than the TR condition. We also predicted that this would be mediated by animistic beliefs, such that participants who attributed higher levels of animism, emotion and mental capacity to the TD and TR would evaluate both conditions more positively. Finally, we compared participants’ behavioural interactions with the TD and TR, predicting that, within each condition, sessions with greater social interaction would be evaluated more favourably than sessions with low social interaction, as suggested by the social interaction theory [22].

2 Methods

2.1 Participants

Thirty-four individuals participated in the study: 18 males and 16 females, aged 11–12 years (M = 11.64, SD = 0.49). The participants were a volunteer sample from a Year 7 cohort in a mainstream secondary school in West Sussex, UK. Written consent to test in the school was obtained from the school’s headteacher, and information letters were sent to parents. Pupils who agreed to participate and whose parents had given signed consent were recruited. As participants were volunteers, no additional rewards or incentives were offered.

2.2 The ‘Therapy Robot’ (TR)

The robot used was MiRo-E (Fig. 1), a biomimetic robot designed by Consequential Robotics for use in education and human–robot interaction research (https://consequentialrobotics.com/). MiRo-E’s mammal-like appearance and behaviour were designed to appeal to users [34], and its control system is modelled on animal brains, with three layers of increasingly sophisticated processing, from simple reflex-like behaviours to high-level loops that can implement cognitive competencies [29, 30]. These layers exert competing influences on the robot’s displayed behaviour, so its actions are not fully contingent on the human’s inputs, akin to interacting with a real animal. The robot contains auditory, tactile and visual sensors; it moves freely, can move its head, open and shut its eyelids and turn its ears. MiRo-E reacts to noise and movement from people and objects (such as squeaky toys) by following them with its eye cameras, orienting to them and moving towards them [35]. To indicate its ‘mood’, MiRo-E can wag its tail and change the colour of the lights on its lateral body panels (green for happy, white for neutral, red for angry, and orange when asleep) [37]. Positive mood can be induced by positive social touch, and negative mood by rough handling. A speaker produces chirping noises whose pitch and frequency indicate mood (faster and higher-pitched: a good mood; lower-pitched: angry; slow ‘snoring’: asleep). After a cycle of approximately five minutes, sleep mode is activated: the robot closes its eyes, its lights change to orange and it becomes unresponsive to interaction. MiRo-E wakes up after three minutes. Although this sleep function could be turned off, it was left enabled because it was deemed more comparable with the living dogs, who may choose to lie down and rest rather than interact all the time.

Fig. 1

Therapy dogs and robot used in this study. From left to right: Jack Russell × Poodle (TD), MiRo-E (TR), Labrador (TD)

2.3 The Therapy Dogs (TD)

A small three-year-old Jack Russell × Poodle (tri-coloured) and a medium-sized 12-year-old Labrador Retriever (Fox Red) were included in the study (Fig. 1). Both dogs were registered as qualified therapy dogs with Pets as Therapy (a UK charity that provides therapeutic visits to educational and healthcare institutions [38]) and were experienced in visiting a range of settings and meeting unfamiliar people. Two dogs were used to make the results more generalisable across dog breeds and sizes.

2.4 Ethics

Ethical approval was obtained from the University of Southampton’s Faculty of Social Sciences’ Human and Animal Ethics Committee (ERGO 49069), and the study adhered to British Psychological Society guidelines and the Society for Companion Animal Studies’ Animal Assisted Interventions’ Code of Practice [39]. All potential participants and their parents/guardians were given an information sheet describing the study and stating that participants could not be allergic to or fearful of dogs. Consent was obtained from both parents/guardians and the recruited participants. The dogs worked for a maximum of two hours per day, with a half-hour break after the first hour and 5-min breaks between participants, and had access to water throughout the sessions. The researcher, OB, who was familiar with the UK Government Department for Environment, Food and Rural Affairs (DEFRA) guidance on signs of stress in dogs, monitored the behaviour of both dog and child and could terminate a session if signs of stress were observed. No sessions were terminated early.

2.5 Procedure

The study comprised four separate sessions: an introductory session, Test sessions 1 and 2, and a preference session. A summary of the procedure is given in Table 1. An empty classroom in the school, unfamiliar to the participants, was used for the introductory and test sessions. For the test sessions, a 5 m × 4 m rectangle divided into 1 m squares was marked on the floor with masking tape (Fig. 2) so that the proximity of the participant to the TD/TR could be measured. Two GoPro wide-angle cameras were set up one metre from the edge of the grid, perpendicular to each other, to capture interactions from the front and the side. Two adults were present for the test sessions: a ‘handler’ and an observer. Each session began with the participant, handler and TD/TR positioned as indicated in Fig. 2. Within arm’s reach of the participant were three dog toys (all Kong brand), which participants were told they could use to facilitate interaction with the TAaRs: a 10 cm-tall furry bear that squeaked when squeezed, a 10 cm-tall squid toy that also squeaked when squeezed or bounced, and a pink fluffy pig that rattled when shaken. As noted above, MiRo-E can react to noise and movement from objects, such as squeaky dog toys, by following them with its eye cameras, orienting to, and moving towards them [35]. Because the toys were present in both conditions and condition order was counterbalanced, the novelty of the toys to participants was balanced across conditions.

Table 1 Timeline of experimental procedure
Fig. 2

Layout of test room

2.5.1 Introductory Session

This familiarisation session took place a week before the test sessions and involved the MiRo-E robot (TR) and one therapy dog. The session lasted 30 min and involved small groups of 10 or fewer seated participants. The researcher, OB, introduced the TR and the Jack Russell × Poodle TD to the group, explaining that participants might meet either of two TDs. Participants were also told about behaviours the TAaRs might exhibit and given ideas for how they could interact with them, such as playing with toys, performing tricks and petting. These behaviours were discussed to familiarise the participants with the TAaRs and to facilitate interaction during the free-play session; this was important because participants were unlikely to know the capabilities of the TR. OB then introduced the TAaRs to each seated participant individually, taking them round to each child and asking a neutral question about whether they would like to greet the TAaR. Next, the participants were told the meanings of the emotion words to be used later in the study in the Emotion Word Checklist [40] (see Measures). Finally, participants were informed when the test sessions would occur.

2.5.2 Pre-interaction Measures, Free-Play Interaction, and Post-interaction Measures

Participants were randomly divided into two groups to counterbalance the order of conditions (TD first or TR first) and thus control for order effects. Test sessions were conducted with each participant individually, interacting with one of the TDs and with the TR on consecutive days (days 2 and 3). Each session comprised three sections: pre-interaction measures, free-play interaction and post-interaction measures.

(i) Pre-interaction Measures

The participant completed an online Google Form questionnaire on either an iPad Mini or a laptop PC in the space outside the experimental room. The questionnaire provided demographic and baseline data. Demographic questions asked about the participant’s age, gender, and whether they had dogs, other pets or robots at home; the latter were included to test for existing familiarity and were entered as potential predictors in the GLMMs. These were followed by the Emotion Word Checklist and the relevant TD or TR versions of the Belief in Animal Mind (BAM) scale and the Trait Animacy Scale (TASc). The participant then entered the experimental room for the free-play interaction.

(ii) Free-Play Interactions

Once the questionnaires were complete, the participant was invited into the room and to the centre of the grid, where the TD/TR and handler had been positioned. The researcher told the participant that they would have five minutes to spend with the TD/TR, but that they could end the session early if they wished. The researcher started the cameras and the stopwatch, informing the participant at the start and end of the time. The handler knelt or sat on the ground to the side of and slightly behind the TD/TR. The handler was present in both TAaR sessions to monitor the safety of the TAaR and the wellbeing of the TD, and to replicate the presence of a TD handler in real-world AAI. The handler could give short replies to any questions from participants, so that the participant was not confused by a completely unresponsive adult, but neither started conversations nor reacted to the interaction. Neither the TD nor the TR was restrained with a lead, and both could move about freely.

(iii) Post-session Test Measures

Following the 5-min free-play session, the participant was taken back out of the room and given a second questionnaire in the lobby to evaluate their experience. This comprised the Perceived Attributes of Non-Humans (PAN-H) scale, the Emotion Word Checklist and the Enjoyment ratings, in that order.

2.5.3 Preference Session

The day after the participants had completed both conditions, OB met them as a group in their normal tutor classroom. Each participant was given a printed single forced-choice question, “I preferred spending time with the: Dog/Robot/I don’t know”, and was asked to circle their answer and give a short explanation, without discussion with peers.

2.6 Measures and Analysis Rationale

The measures used in this study were questionnaires and observations.

2.6.1 Questionnaires

The following standardised and customised questionnaires were used. They are reported in the order in which they were initially presented to the participants (see Table 1). Customised questionnaires are provided in Online Resource 1.

Mann–Whitney U tests were used to assess the comparability of the two TDs on the questionnaire results (Enjoyment Qs 1–3, BAM, TASc, PAN-H) and behavioural measures.
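The paper does not state which software ran the between-dog comparisons. As a hedged illustration only, the sketch below computes the Mann–Whitney U statistic in plain Python, giving tied values average ranks and deriving a two-sided p-value from a tie-corrected normal approximation; packages that use exact p-values for small samples would give somewhat different results. The function names and example values are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) via a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    # Variance of U, reduced by sum(t^3 - t) over each group of t tied values.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, u_y, 1.0  # every value identical: no evidence of a difference
    z = (min(u_x, u_y) - n1 * n2 / 2) / math.sqrt(var_u)
    p = min(1.0, math.erfc(abs(z) / math.sqrt(2)))
    return u_x, u_y, p
```

For example, `mann_whitney_u(tasc_dog_a, tasc_dog_b)` would compare (hypothetical) TASc totals from the children who met each dog.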

(i) Demographics

This comprised four questions: age; gender (male, female, other, do not want to say); “Do you have a pet at home?” (“if yes, please say what types”); and “Do you have a robot at home?” (“if yes, please say what types”). The influence of the demographic information supplied by participants on enjoyment ratings was tested in the GLMMs.

(ii) Emotion Word Checklist [40]

This was used to indicate the participant’s emotional state. The Emotion Word Checklist consisted of 26 emotion words potentially relevant to the experience: 14 positive words (including “calm”, “confident”, “loved” and “interested”) and 12 negative words (including “bored”, “disappointed”, “embarrassed” and “lonely”). Participants were told that they could choose as many emotion words as they wanted to describe how they were feeling at the time. Because the numbers of positive and negative emotion words selected were not normally distributed, the Friedman test was used to compare pre- versus post-interaction counts and post-interaction TD versus post-interaction TR counts. Post hoc Wilcoxon signed-rank tests were conducted with a Bonferroni correction, giving a significance level of p < 0.013.
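A minimal sketch of the Friedman statistic and the Bonferroni threshold, in plain Python. It assumes one row per participant and one column per measurement point; four columns (pre- and post-interaction for TD and TR) would give the k − 1 = 3 degrees of freedom reported in the Results. The tie correction divides by 1 − Σ(t³ − t)/(n k (k² − 1)), one common form. The count of four post hoc comparisons is our inference from the stated threshold (0.05/4 = 0.0125, which rounds to 0.013), not something the text states.

```python
import math
from collections import Counter


def within_row_ranks(row):
    """Average ranks (1..k) of one participant's scores across measurement points."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # tied scores share the mean rank
        i = j + 1
    return ranks


def friedman_chi2(data):
    """Tie-corrected Friedman statistic; data[i][j] = participant i, measurement point j."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in data:
        for j, r in enumerate(within_row_ranks(row)):
            rank_sums[j] += r
        tie_term += sum(t**3 - t for t in Counter(row).values())
    chi2 = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))  # equals 1 when there are no ties
    if correction <= 0:
        return 0.0  # every row is constant, so there is nothing to rank
    return chi2 / correction


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    term, total = 1.0, 0.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        for i in range(df // 2):
            if i > 0:
                term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum over i < (df-1)/2
    # of x^i / (1 * 3 * ... * (2i + 1))
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons
```

As a check on the arithmetic, `chi2_sf(12.96, 3)` is about 0.0047, in line with the reported p = 0.005 for χ² (3) = 12.96.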

(iii) Belief in Animal Mind (BAM) scale [20] and the Trait Animacy Scale (TASc) [21]

These were used to measure the participants’ attributions of mental states, abilities and other properties to dogs and to robots. BAM is a 4-item scale and TASc has 25 items, with each item scored between 1 and 5 on both scales. Total scores could range between 4 and 20 for BAM and between 24 and 120 for TASc. Higher scores on either scale represented greater perceived animacy of the TAaR. The word ‘animals’ in both scales was replaced with either “dog” or “robot” as appropriate for the test session. The TASc was reliable for TR (α = 0.91) and TD (α = 0.85) [41]. The BAM scale, however, showed poor internal reliability (TR α = 0.12; TD α = 0.26) that could not be improved by removing items, although it has been reported as reliable in past studies [42, 43]. BAM and TASc scores were compared using repeated-measures t-tests. To assess their effect on participant enjoyment, perceived TD/TR enjoyment and friendship formation, a series of generalised linear mixed models (GLMMs) with a normal distribution and an identity link function was created in SPSS [34]. Candidate models were built by adding factors hierarchically, based on previous theory [35]. These comprised an intercept-only model and additive models in which experimental condition (TD/TR), duration of social interaction, pets at home (Y/N), robot at home (Y/N), TASc score and BAM score were added as fixed effects, with subject included as a random factor. Best-fit models were chosen using Akaike’s Information Criterion (AIC); models within 2.0 AIC units of the best-fit model were considered to also have explanatory value [36].
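The AIC selection rule can be sketched as follows. AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximised likelihood; each candidate’s ΔAIC is its distance from the lowest AIC, and models with ΔAIC ≤ 2.0 are retained. Statistical packages usually report AIC directly, so this only illustrates the arithmetic of the ≤ 2.0 rule. The model names, log-likelihoods and parameter counts below are invented and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def supported_models(fits, threshold=2.0):
    """Return (name, AIC, delta AIC) for every model within `threshold` of the best.

    fits maps a model name to (maximised log-likelihood, number of estimated
    parameters). Rows are ordered from lowest (best) AIC upwards.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [(name, score, score - best) for name, score in ranked if score - best <= threshold]


# Invented fits for illustration only; these are not results from this study.
fits = {
    "intercept only": (-100.0, 3),
    "condition": (-97.0, 4),
    "condition + TASc": (-95.5, 5),
    "condition + TASc + BAM + pets + robot": (-95.0, 8),
}
for name, score, delta in supported_models(fits):
    print(f"{name}: AIC = {score:.1f}, delta = {delta:.1f}")
```

When candidate models differ in their fixed effects, the likelihoods should come from maximum likelihood rather than REML estimation for the AIC comparison to be meaningful.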

(iv) Perceived Attributes of Non-Humans scale (PAN-H) [44]

Originally called the GODSPEED series I to V, this scale comprises 24 semantic differential 5-point ratings, divided into five sub-scales measuring human perceptions of five concepts relevant to human–robot interaction: anthropomorphism, animacy, likeability, perceived intelligence, and the human’s perceived safety/emotional state when interacting with the robot [44]. We considered that these concepts could also be applied to other non-human entities, including non-robotic toys, cars, plants and animals. The scale was therefore modified to extend its applicability to other non-human entities and to make some of the descriptors clearer for older children (e.g. “apathetic” in the original scale was changed to “unresponsive”). Consequently, the scale is renamed in this paper as the Perceived Attributes of Non-Humans scale (PAN-H), to reflect its broader scope and to make the title informative for researchers working on robot–human and animal–human interaction. Respondents rated the applicability of pairs of opposing descriptors, such as “machinelike” to “humanlike” and “unresponsive” to “responsive”, to the non-human, in this study the TD or TR. The more lifelike, likeable or intelligent descriptor of each pair was scored 5 and its opposite 1. Possible total scores therefore ranged from 24 to 120, with higher scores indicating greater attribution of human-like qualities. Internal reliability was good for TD (α = 0.84) and TR (α = 0.91). Repeated-measures t-tests were used to compare scores for TD and TR.
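Cronbach’s α, used for the reliability figures above, compares the summed item variances with the variance of the total score: α = k/(k − 1) · (1 − Σ σ²ᵢ / σ²_total), where k is the number of items. The plain-Python sketch below assumes a table with one row per respondent and one column per item, already scored 1–5 in the direction described above; the example data are made up.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; responses[i][j] is respondent i's score on item j."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Items that move together push α towards 1; items that are unrelated push it towards 0, which is one way a scale such as BAM can show α values as low as those reported.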

(v) Enjoyment Rating

This questionnaire used three 5-point Likert items (from ‘strongly disagree’, scored 1, to ‘strongly agree’, scored 5) to measure (i) the participant’s own enjoyment, (ii) their perception of the TD/TR’s enjoyment, and (iii) whether they felt they had made friends with the TD/TR. Participants were also asked to explain each answer in free text. As responses on the Likert items were not normally distributed, Wilcoxon signed-rank tests were used to compare conditions on each rating.
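A sketch of the Wilcoxon signed-rank test for paired Likert ratings (for example, the same child’s enjoyment after the TD and after the TR session). It follows one common convention, assumed here rather than taken from the paper: zero differences are dropped, absolute differences receive average ranks, and the two-sided p-value comes from a normal approximation with a correction for tied differences. The sign of z is a convention; the two-sided p depends only on |z|. Example ratings are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Paired two-sided test of x against y; returns (W_plus, z, p)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0  # no non-zero differences to rank
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    # Variance under H0, reduced by sum(t^3 - t)/48 for groups of tied |differences|.
    tie_term = sum(t**3 - t for t in Counter(abs_diffs).values())
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = min(1.0, math.erfc(abs(z) / math.sqrt(2)))
    return w_plus, z, p
```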

(vi) Preference

Participants were given a single forced-choice question to indicate whether they preferred interacting with the TD or the TR or were undecided, and were asked to explain their answer in an open text box. After ‘I don’t know’ responses were removed, a binomial probability test was used to compare preference for the TD and TR.
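The preference test can be sketched as an exact two-sided binomial test against a 50:50 split; the null proportion of 0.5 and the “sum of outcomes no more likely than the observed one” definition of two-sided p are our assumptions, as the paper does not state them. With the counts given in the Results (28 of 30 decided participants preferring the TD), the two-sided probability is 2 × (1 + 30 + 435) / 2³⁰ ≈ 8.7 × 10⁻⁷, consistent with the reported p < 0.001.

```python
import math


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test of k successes in n trials.

    Sums the probabilities of every outcome that is no more likely than
    the observed one.
    """
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    tolerance = observed * 1e-7  # absorb floating-point noise when comparing equal probabilities
    return min(1.0, sum(q for q in pmf if q <= observed + tolerance))


print(binomial_two_sided_p(28, 30))
```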

(vii) Qualitative response to preference and enjoyment ratings

Themes were identified in participants’ answers using thematic analysis [45] and grouped on the basis of emerging similarities across all question responses, using the process of clustering [46].

2.7 Behavioural Data

The frequency and duration of specific behaviours performed by the child participant, the TR and the TD were coded from all free-play videos by continuous sampling in Mangold Interact [47] (see Online Resource 2 for a detailed description of the coding scheme). Specific behaviours were distinct behavioural events directed by the child or TD/TR towards the other individual in the interaction (for example, the child engaging in positive social touch with the TD/TR, the dog sniffing the child, or the robot approaching the child). The reaction of the interaction partner was also recorded and categorised as ‘positive’ if the recipient moved towards the initiator or showed positive emotional displays or vocalisations directed at the initiator; ‘neutral’ if the recipient failed to respond within one second; or ‘negative’ if the recipient rejected the interaction by moving away from the initiator or showed negative emotional displays or vocalisations. The duration of ‘social interaction’ was calculated from periods in which the child or TD/TR initiated an interaction and the social partner’s response was either positive or neutral. The child’s vocalisations towards the handler and negative responses by the child or TAaR were excluded from the ‘social interaction’ measure. To control for interest in the toys for their own sake, interaction with the toys was coded only if the participant used them in relation to the TAaR. To assess coding reliability, a random 25% of the videos (9 from the TR and 8 from the TD condition) were coded for all behavioural events and responses by a second researcher naive to the hypotheses. Inter-rater agreement, measured by the intraclass correlation coefficient (ICC) as calculated by Mangold Interact, was good for both conditions (TD ICC = 0.79, F (21, 42) = 3.14, p < 0.001; TR ICC = 0.93, F (17, 34) = 3.83, p < 0.001 [48]).
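The ‘social interaction’ measure can be illustrated with a small sketch. The event record below (`CodedEvent` and its fields) is a hypothetical simplification, not the real Mangold Interact export format. The sketch applies the one-second rule for neutral responses, excludes vocalisations to the handler and negative responses, and merges overlapping events so that simultaneous behaviours are not counted twice; the merging step, and the treatment of a reaction arriving after one second as neutral, are our assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodedEvent:
    start: float                       # seconds from the start of free play
    end: float
    directed_at_handler: bool          # e.g. the child vocalising to the handler
    response_latency: Optional[float]  # seconds until the partner reacted; None = no reaction
    response_valence: Optional[str]    # "positive" or "negative" when the partner reacted


def classify_response(event: CodedEvent) -> Optional[str]:
    """One-second rule: no reaction within 1 s counts as a neutral response."""
    if event.response_latency is None or event.response_latency > 1.0:
        return "neutral"
    return event.response_valence


def social_interaction_seconds(events: List[CodedEvent]) -> float:
    """Total time covered by initiations that met a positive or neutral response."""
    intervals = sorted(
        (e.start, e.end)
        for e in events
        if not e.directed_at_handler and classify_response(e) in ("positive", "neutral")
    )
    total = 0.0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # A gap before this interval: close off the previous merged block.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping interval: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```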

Durations (in s) of specific behaviours and of responses (positive, negative, neutral) by the TD, TR and children were compared between the TD and TR conditions. As none of the behavioural measures were normally distributed, Wilcoxon signed-rank tests were used to compare the TR and TD conditions, and Spearman’s rho correlations were used to test associations between enjoyment ratings (5-point Likert scale) and behavioural measures.
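Spearman’s rho is the Pearson correlation computed on ranks, with tied values (common in 5-point Likert ratings) given the average of the ranks they span. The sketch below is plain Python with hypothetical inputs; it returns NaN when one variable is constant, which can happen if every child in a condition gives the maximum enjoyment rating.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average (tie-aware) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)
```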

3 Results

Participant questionnaire responses and behavioural measures were compared between the two TDs to assess comparability. There was no difference between the two TDs on any of the measures (see Online Resource 3), so results from the two TDs were pooled for all further analyses.

3.1 Behavioural Interactions

There were several significant differences in the way the participants behaved towards the TD versus the TR (Table 2, Section A). Most notably, participants vocalised more towards the TD than the TR, and more towards the handler in TD sessions. Participants also made noises and threw toys to attract the TR’s attention more often than they did with the TD (Table 2, Section B).

Table 2 Behaviour of the child and pet during the TD and TR play sessions

The TR initiated interactions by approaching the participant more frequently than the TD. Comparisons of the participants’ responses to attempted initiations revealed that they spent more time responding positively to the TR’s initiations than to those of the TD. This may reflect the greater time the TR spent “initiating” interactions. Compared with the TD, the TR responded positively to the participants’ attempts at interaction for a significantly longer duration. In addition, there were trends towards the TR spending less time in neutral and negative responses than the TD. The mean duration of social interaction was significantly greater with the TR than with the TD, equating to 71.84% and 17.2% of free-play time, respectively.

Participants’ enjoyment ratings were not associated with the amount of social interaction in either condition, suggesting that participants enjoyed the sessions with both the TD and TR regardless of how much social interaction occurred (Table 2, Section C). There was also no association between most of the specific actions and enjoyment in either condition, apart from weak negative correlations between enjoyment and offering the toy in hand in the TD condition, and between enjoyment and vocalising to the TR in the TR condition.

3.2 Evaluation of Experience

Participants reported a significant preference for their free-play session with the live TD over the TR (N = 30, K = 28, p < 0.001), with two participants not expressing a preference. In both conditions, most participants strongly agreed with the statement “I enjoyed spending time with the TD/TR” (TD N = 32, TR N = 21), indicating high levels of participant enjoyment in both TAaR conditions. However, enjoyment ratings after the TD condition (M = 4.85 ± 0.61) were significantly higher than after the TR condition (M = 4.32 ± 1.03; Z = −2.42, p = 0.02). Participants also gave high ratings for both the TD and TR for perceived TD/TR enjoyment (TD: M = 4.29 ± 0.94; TR: M = 4.03 ± 1.08) and friendship (TD: M = 4.02 ± 1.14; TR: M = 3.91 ± 1.06), and these did not differ across conditions (perceived enjoyment Z = −1.35, p = 0.18; friendship Z = −0.43, p = 0.67).

Overall, there was a significant difference in the number of positive emotion words describing how participants felt at the different measurement points, χ² (3) = 12.96, p = 0.005. Post hoc tests indicated that the number of positive emotion words selected was comparable across TAaR conditions at baseline (TD: M = 8.00 ± 4.02; TR: M = 8.08 ± 5.72; Z = 0.00, p > 0.99), remained similar after spending time with the TR (post-session: M = 8.84 ± 4.02; Z = −1.11, p = 0.27), but decreased after spending time with the TD (post-session: M = 5.69 ± 3.11; Z = −2.86, p = 0.004; Fig. 3a). Very few negative emotion words were selected by participants at any time point (pre-session, TD: Σ = 3, TR: Σ = 7; post-session, TD: Σ = 2, TR: Σ = 3; Fig. 3b), so statistical analysis could not be performed [49].

Fig. 3

Frequency of (a) positive and (b) negative emotion words selected before interaction, and (c) positive and (d) negative emotion words selected after interaction, with TD and TR

Thematic analysis identified five themes and ten subthemes in participants’ responses to the four open questions on preference, enjoyment, perceived TD/TR enjoyment and friendship formation (Table 3). The main themes are considered in the Discussion; for a more detailed discussion of the subthemes, see Online Resource 4.

Table 3 Themes and subthemes identified from the open text answers to questions regarding preference, enjoyment, TD/TRs’ enjoyment, and perceived friendship

3.3 Effects of Participant Beliefs

Total PAN-H scores were significantly higher for the TD (M = 70.71 ± 5.08) than for the TR (M = 55.97 ± 11.22; t (33) = 6.81, p < 0.001). The TD also scored significantly higher than the TR on every descriptor pair (Fig. 4), indicating that participants perceived the TD as more lifelike, likeable and intelligent than the TR.

Fig. 4

Mean (± 1 SD) ratings of TD and TR on PAN-H scale

Belief in Animal Mind and Trait Animacy scores were higher for the TD (BAM: M = 15.38 ± 2.56; TASc: M = 109.32 ± 9.33) than for the TR (BAM: M = 12.29 ± 2.77; TASc: M = 70.24 ± 17.27), indicating significantly greater attributions of mental capacity and animacy to the TDs than to the TR (BAM: t (34) = 6.07, p < 0.01; TASc: t (34) = 11.46, p < 0.01). In the GLMMs, TASc and BAM were both positive predictors of participant enjoyment (F (1, 68) = 31.28, p < 0.001) and perceived TD/TR enjoyment (F (1, 68) = 14.75, p < 0.001). The TAaR condition was also a significant factor in one of the top models for participant enjoyment, with enjoyment higher in the TD condition than in the TR condition (F (1, 68) = 6.79, p = 0.01). BAM was the only factor retained in the best-fit model predicting perceived friendship (F (1, 68) = 18.55, p < 0.001). No other predictors were retained in the best-fit models (see Online Resource 5 for details of model selection).

4 Discussion

In this study, we aimed to investigate the effect of animistic beliefs and levels of social interaction on evaluations of two potential therapy adjuncts: a living dog (TD) and the MiRo-E robot (TR). Participants reported a preference for, and somewhat greater enjoyment of, the living therapy dog over the robot, supporting previous research and a strict interpretation of the Biophilia Hypothesis as referring principally to living organisms [20, 50]. Likewise, on the PAN-H scale, participants viewed the TDs as more lifelike, likeable and intelligent than the TR. Despite this, participants spent more time interacting with the TR, selected more positive emotion words after the TR session than after the TD session, and reported high enjoyment of the TR sessions. This suggests that other factors may contribute to the overall preference for the TD. These may include participant bias and familiarity effects, as participants were far more familiar with dogs than with social robots. Familiarity (even mere exposure) can influence attitudes [51]. Thus, this difference may diminish in future as biomimetic robots become more widely available and are depicted in domestic contexts in the media.

There were many similarities in the way the participants interacted with the TAaRs, but also some interesting differences. Despite knowing the categorical difference between the TD and TR, the children initiated more interactions with the TR than with the TD, and spent more time responding positively to approaches from the TR. The TR’s behaviour was also more contingent on the participant’s positive interactions than was the TD’s. The high number of initiations by the TR reflects the robot’s programming and is therefore a realistic, ecologically valid measure of how it would operate during RAI. The behavioural differences between the TR and the TD also reflect a benefit of using TRs: they can be used in RAI for longer without compromising the welfare of the TAaR, and, because they are more contingent on the child’s actions, they may reduce frustration and sustain the child’s attention for longer [52, 53]. Thus, positive participant behaviours were likely reinforced more strongly in the TR condition. Previous research has indicated that human users automatically interpret the movements of robots as social interaction cues [54], and even brief interactions with a robot can foster feelings of affinity and connectedness [55]. However, the greater time spent in social interaction with the TR did not translate into higher enjoyment ratings, suggesting that social interaction may not be a strong predictor of the perceived positive outcomes of AAI/RAI [56]. Because this investigation involved free interaction without specific therapeutic goals, it was not possible to assess the success of AAI/RAI therapy per se. Future research could explore the association between therapeutic outcomes, enjoyment and social interaction.

Interestingly, participants spent more time interacting with the handler during the TD sessions than during the TR sessions. The content of these conversations was not analysed, so the precise reason for this difference is unclear. However, the children may have identified the handler as the TD’s owner, who lived with it and had a relationship with it, and who therefore had more to say about it when questioned than about the TR. Studies of the benefits of dog ownership have noted the “social catalyst effect”, whereby the presence of a dog often increases the amount of social engagement an individual receives from other people, resulting in a sense of increased wellbeing [57, 58]. Increased communication has similarly been evidenced in TR research [59]. Future work could profitably explore the role of social interaction within the AAI/RAI framework, ensuring that the children consider both TAaRs to be owned by the handler.

Positive social touch was the activity of longest duration in both conditions. This may be because positive social touch was soothing to the participant, or because it produced tangible feedback from both the TD and the TR. Physical contact has been identified as a significant factor in human users’ ratings of the quality of their interactions with robots [60]. For the most part, people are comfortable sharing their physical space with robots and will often approach them at a closer physical distance than they would living beings [61]. Positive social touch may also have functioned as a “fall-back” option when participants did not know how to interact with the TR. Previous studies have suggested that the stress-reducing effect of physical contact with an animal is one way in which positive outcomes for humans [40,41,42] and dogs [42] are derived, but further investigation into the specifics of robotic haptics is required [61]. The emphasis on relaxation in the sessions was mirrored in participants’ selection of positive emotion words and in their responses. In the open-ended questions, several participants reported feeling relaxed after spending time with the TD. The calming effect of interaction with therapy dogs has been identified across a range of related studies (e.g. [2, 25, 39, 62]). Interestingly, only one participant described the TR sessions as relaxing. However, analysis of the Emotion Word Checklist showed that more participants reported feeling calm after the TR session than after the TD session. This may reflect differences in how participants understood the terms “relaxing” and “calm”. Regardless, future research could assess the physiological (stress-reducing) effects of spending time with a biomimetic TR versus a living TD.

Participants also described both sessions as fun and enjoyable. A number of participants expressed an existing familiarity with, and love of, dogs, so prior experience of dogs may have contributed to the positive experience during the live TD session [29, 42, 63]. However, the results of the GLMMs did not suggest an effect of dog or robot ownership on enjoyment. This may be partly because ownership is not the only way that participants could become familiar with dogs (for example, extended family members or friends may own dogs). In addition, the questionnaire neither quantified overall exposure to dogs nor directly asked whether participants liked dogs or robots. Because recruitment was voluntary, participants likely self-selected on the basis of an existing familiarity with dogs, and the results are unlikely to apply to individuals who do not like dogs. For these individuals, a therapy robot may be more beneficial. Future studies should therefore directly ask participants whether they like dogs, and should give participants access to the TR that is not contingent on their interacting with the TD. Conversely, participants were unfamiliar with the TR (“very unique TR and I’ve never seen one before”, Participant 16), suggesting that the novelty of the interaction may have contributed to their enjoyment of the TR sessions. It has been suggested that the positive outcomes reported in animal assisted therapies using less familiar species or activities may also be attributed to “the novelty effect” rather than to the animal per se [64]. The behaviour of the TAaRs was frequently reported as a positive feature of the experience, including both general statements about interactivity and specific actions, such as playing or doing tricks. Participants often referred to the dyadic interaction between themselves and the TD/TR, describing the initiations they performed and the TD/TR’s reactions.

With regard to animistic and biophilic beliefs, participants made a total of 33 emotional attributions regarding the TD compared with 24 for the TR. This is consistent with the significant differences between the TD and TR in PAN-H scale scores. The Belief in Animal Mind (BAM) and Trait Animacy (TASc) scale scores also revealed differences in the participants’ animistic beliefs towards the TDs and the TR. The TDs were attributed greater mental capacities than the TR, such as intellect, the ability to reason and the capacity to experience emotion. This reflects the readiness of humans to attribute mental states and cognition to some animal species [65, 66]. As predicted, participants also viewed the TD as more alive and animate than the TR. However, BAM and TASc scores for the TR were arguably higher than might be expected for a machine. In addition, a similar number of participants afforded social status (i.e. the ability to form a friendship with them) to the TR as to the TD. By invoking the appearance of animacy, the TR appears to have led participants to believe that it was capable of mental processes, and to attribute biological properties, mental states, and social and moral standing to it [65, 67,68,69,70]. Higher BAM and TASc scores were predictive of higher participant ratings of enjoyment, perceived TD/TR enjoyment, and friendship. This is a notable finding, as it indicates that how participants thought about the TAaRs was more important than how they interacted with them. Participants who believed the TD or TR to have greater biological traits of animacy and mental capacity were more likely to evaluate the TD/TR favourably [24, 71, 72].

The physical appearance of the TDs, but not of the TR, seemed an important and positive factor in the experience for many participants. They commented on the “fluffiness” or “cuteness” of the live TDs even though neither TD had a particularly fluffy coat. In contrast, two participants did not like the appearance of the MiRo (e.g. the MiRo was “weird because there was no fur”, Participant 17). The smooth surface of social TRs is often marketed as an advantage over living TDs for use in AAI, as TRs can be cleaned between users [16]. As one participant perceptively suggested, “people wouldn’t be allergic to it” (Participant 7; preferred TR). However, the tactile sense is important to human perceptions of comfort and relaxation [73, 74]. Future research should therefore investigate the effect of hard versus textured TRs on physiological measures of relaxation and on perceived enjoyment and relaxation. Despite some negative comments regarding the appearance of the TR, for most participants the experience was overwhelmingly positive. This enjoyment of MiRo may support the Biophilia Hypothesis in its broadest sense. The inclusion of “life and life-like processes” in the Biophilia Hypothesis’ definition broadened the scope of the affiliation beyond living organisms to artificial designs that have “life-like features” [11, 75, 76]. MiRo is a good example of this principle: the MiRo TR was fundamentally designed to be as life-like as possible and is arguably as life-like as current technology allows without being alive. Naturalistic appearance and behaviour are important factors in users’ liking of social robots, and their absence may determine whether people are able to form an emotional bond with them [77, 78]. Future studies could compare the MiRo TR with social TRs that are less or more life-like in appearance and behaviour, to explore the extent to which the concept of “life-like” can be stretched.

The study design specifically addressed previous concerns about AAI research in three ways. It provided a suitable comparison condition in the MiRo TR. It used a repeated measures design to remove problems with baseline variability. It also used varied measurement methods, analysed with statistical tests that report effect sizes. It must be noted, however, that this study represents only a small sample of self-selecting volunteers, and the results, while generally positive for both the live TD and TR conditions, are mixed. For example, a substantial number of participants mentioned the relaxing effects of the TD in the open-ended questions but not those of the TR. By contrast, the word “calm” in the emotion checklist was chosen considerably more often in relation to the TR than the TD. This study therefore highlights the effect of question format on participants’ responses. The forced-choice preference question revealed an overwhelming preference for the TD, yet the majority of participants also rated their enjoyment of the TR sessions very highly. In addition, participants reported more positive words following the TR session than the TD session, but positive emotion word frequency did not significantly increase from baseline. This was likely because a large number of positive words were already selected at baseline. A simple presence/absence measure of emotion was therefore not sensitive enough to detect subtler changes in the degree to which positive emotions were felt. Additionally, in the TD condition positive emotion word frequency dropped after free play. This may reflect differences in the TD’s behaviour across sessions, which affected participants differently. Although this was not analysed, the TD may have engaged less in later sessions within each 2-h working period because of tiredness or lowered motivation, factors that would not affect the TR.
More broadly, demand characteristics, the reliability of self-report measures and subtle changes in the animal’s behaviour must also be considered in future research. It should also be noted that the internal reliability of the BAM was low. This could be due to the complexity of the concepts and the wording of the scale items, so future studies should ensure that the scale is adapted to be age-appropriate.

Despite these limitations, positive participant evaluations were evident after only a brief 5-min exposure to the TD/TR. Some differences in participants’ responses to the TD and TR were observed, but these did not directly translate into differences in enjoyment ratings. The existing attitudes and beliefs the children held about TDs and TRs appeared to mediate positive outcomes. Subtle differences in self-report measures point to potentially different mechanisms by which these positive outcomes were achieved. Taken together, these results suggest that the MiRo biomimetic TR can be used as a robust control condition in research into the use of TDs in AAI. It may also be a viable alternative to dogs as pets and in robot assisted interventions.