
Empathetic Speech Synthesis and Testing for Healthcare Robots

International Journal of Social Robotics

Abstract

One of the major factors affecting the acceptance of robots in human-robot interaction applications is the voice with which a robot speaks to its users. The robot’s voice can be used to express empathy, an affective response of the robot to the human user. This study investigates whether social robots with an empathetic voice are acceptable to users in healthcare applications. A pilot study was first conducted using an empathetic voice spoken by a voice actor; empathy was expressed through speech prosody alone, without any visual cues. The pilot study also identified the emotions an empathetic voice needs: not only the stronger primary emotions, but also the more nuanced secondary emotions. These emotions were then synthesised using prosody modelling, and a second study replicating the pilot test was conducted with the synthesised voices, to investigate whether empathy is perceived from the synthetic voice as well. This paper reports the modelling and synthesis of an empathetic voice, and shows experimentally that people prefer an empathetic voice for healthcare robots. These results can be used to develop empathetic social robots that improve people’s acceptance of social robots.
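The prosody modelling itself is described in the body of the paper and is not reproduced here. As a minimal illustrative sketch of what prosody-based emotion synthesis involves, the following Python fragment uses the parselmouth interface to Praat to rescale the F0 contour and speaking rate of a neutral recording; the file names and scaling factors are hypothetical placeholders, not the paper’s fitted parameters.

```python
# Minimal prosody-manipulation sketch (not the paper's pipeline).
# Requires: pip install praat-parselmouth. "neutral.wav" is a placeholder.
import parselmouth
from parselmouth.praat import call

F0_FACTOR = 1.15  # hypothetical: raise the F0 contour by 15%
RATE = 0.9        # hypothetical: relative duration < 1 speeds speech up

snd = parselmouth.Sound("neutral.wav")

# Build a Praat Manipulation object (time step, pitch floor/ceiling in Hz).
manipulation = call(snd, "To Manipulation", 0.01, 75, 600)

# Scale the whole F0 contour by a constant factor.
pitch_tier = call(manipulation, "Extract pitch tier")
call(pitch_tier, "Multiply frequencies", snd.xmin, snd.xmax, F0_FACTOR)
call([pitch_tier, manipulation], "Replace pitch tier")

# Apply a uniform speaking-rate change via the duration tier.
duration_tier = call(manipulation, "Extract duration tier")
call(duration_tier, "Add point", snd.xmin, RATE)
call([duration_tier, manipulation], "Replace duration tier")

# Resynthesise the utterance with the modified prosody.
modified = call(manipulation, "Get resynthesis (overlap-add)")
modified.save("modified.wav", "WAV")
```

An empathetic voice of the kind the paper describes would drive such parameters from emotion-specific prosody models rather than fixed constants.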




Notes

  1. Anthropomorphism refers to the tendency of humans to see human-like characteristics, emotions, and motivations in non-human entities such as animals, gods, and objects.

  2. The focus is on the elderly because Healthbots, the application on which this study is based, was developed for aged-care facilities.

  3. Affect is a concept used in psychology to describe the experience of feeling or emotion; here it refers to the addition of emotions/feelings to the robot’s speech. This has led to a field called affective computing, which develops systems that can recognise, interpret, respond to and also produce emotions.

  4. In this study, a neutral voice is defined as a voice spoken naturally (i.e. without stress). For the robot with the expressive voice, stress was included to express urgency.

  5. Empathy is the ability to understand and share the feelings of another.

  6. The study was approved by the University of Auckland Human Participants Ethics Committee (UAHPEC) on 20/10/2017 for 3 years. Ref. No. 019845.

  7. https://auckland.au1.qualtrics.com/jfe/form/SV_2hn68L1Df9lMXIh

  8. The distinction between first-language (L1) and second-language (L2) speakers is based on New Zealand English. Participants were classified as L1 if they had lived in New Zealand since at least age seven.

  9. Version XM of Qualtrics. Copyright 2019 Qualtrics. Qualtrics and all other Qualtrics product or service names are registered trademarks or trademarks of Qualtrics, Provo, UT, USA. https://www.qualtrics.com.

  10. Primary emotions are innate emotions that support reactive response behaviour (e.g. anger, happiness, sadness, fear); the basic/primary emotions used here are based on the studies by Ekman [37]. Secondary emotions arise from higher cognitive processes, based on an ability to evaluate preferences over outcomes and expectations (e.g. relief, hope) [38, 39]. Various theories define primary and secondary emotions [40]; here we consider the emotions that have been studied in human-robot interaction and speech synthesis research.

  11. The JLCorpus contains five primary and five secondary emotions. “Assertive” was one of the secondary emotions; the actors were instructed to speak seriously and confidently while recording it. In a previous paper, reviewers strongly criticised the use of “assertive” as an emotion and asked us to reconsider it. From the emotions in Russell’s circumplex model (Fig. 6), “confident” was the best match (a toy sketch of this nearest-match lookup follows these notes). This journey clearly illustrates how difficult secondary emotions are to analyse and classify, as they can be hard to define. The corpus is available at: github.com/tli725/JL-Corpus.

  12. Approved by the University of Auckland Human Participants Ethics Committee (UAHPEC) on 20/10/2017 for 3 years. Ref. No. 019845.

  13. https://auckland.au1.qualtrics.com/jfe/form/SV_9tvDP800i4oLmXX

  14. r indicates the effect size; for these nonparametric comparisons it is conventionally computed as \(r = Z/\sqrt{N}\), where Z is the test statistic and N the total number of observations [36].

  15. Thematic coding was done using Taguette [62]; the mind map plot was created using Miro [63].
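Note 11’s substitution of “confident” for “assertive” amounts to a nearest-neighbour lookup among labelled emotions in the valence-arousal plane of Russell’s circumplex model. The Python sketch below illustrates that kind of lookup; the coordinates are illustrative placeholders, not values from Fig. 6 or from the paper.

```python
# Nearest-neighbour lookup in an illustrative valence-arousal space.
# All coordinates below are hypothetical placeholders in [-1, 1].
from math import dist

CIRCUMPLEX = {
    # primary emotions (after Ekman [37])
    "happy":     (0.8, 0.5),
    "angry":     (-0.6, 0.8),
    "sad":       (-0.7, -0.5),
    "afraid":    (-0.6, 0.6),
    # secondary emotions
    "confident": (0.5, 0.3),
    "relieved":  (0.6, -0.3),
    "hopeful":   (0.5, 0.2),
}

def closest_emotion(valence: float, arousal: float) -> str:
    """Return the labelled emotion nearest to (valence, arousal)."""
    return min(CIRCUMPLEX, key=lambda e: dist(CIRCUMPLEX[e], (valence, arousal)))

# A serious, confident "assertive" rendition might sit near here:
print(closest_emotion(0.4, 0.35))  # -> "confident" (with these placeholder values)
```

With coordinates read off an actual circumplex plot, the same lookup reproduces the choice reported in note 11.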

References

  1. Tapus A (2009) Assistive robotics for healthcare and rehabilitation. In: Int. conf. on control systems and computer science, Romania, pp 1–7

  2. Toh LPE, Causo A, Tzuo P, Chen I, Yeo SH (2016) A review on the use of robots in education and young children. Educ Technol Soc 19:148–163


  3. Triebel R, Arras K, Alami R, Beyer L, Breuers S, Chatila R, Chetouani M, Cremers D, Evers V, Fiore M (2016) Spencer: a socially aware service robot for passenger guidance and help in busy airports. In: Field and service robotics, pp 607–622

  4. Pineau J, Montemerlo M, Pollack M, Roy N, Thrun S (2002) Towards robotic assistants in nursing homes: challenges and results. Robot Auton Syst 42(3–4):271–281

  5. Chu M, Khosla R, Khaksar SMS, Nguyen K (2017) Service innovation through social robot engagement to improve dementia care quality. Assist Technol 29(1):8–18


  6. Centre for automation and robotic engineering science-Healthbots. https://cares.blogs.auckland.ac.nz/research/healthcare-assistive-technologies/healthbots/. Accessed 29 Oct 2019

  7. Broadbent E, Stafford R, MacDonald B (2009) Acceptance of healthcare robots for the older population: review and future directions. Int J Social Robot 1(4):319


  8. Igic A, Watson CI, Stafford RQ, Broadbent E, Jayawardena C, MacDonald BA (2010) Perception of synthetic speech with emotion modelling delivered through a robot platform: an initial investigation with older listeners. In: Australasian int. conf. on speech science and technology, Australia, pp 189–192

  9. Igic A (2010) Synthetic speech for a healthcare robot: investigation, issues and implementation. Master’s thesis, The University of Auckland, New Zealand

  10. Fussell SR, Kiesler S, Setlock LD, Yew V (2008) How people anthropomorphize robots. In: ACM/IEEE int. conf. on human-robot interaction, Netherlands, pp 145–152

  11. Heerink M, Kröse B, Evers V, Wielinga B (2010) Assessing acceptance of assistive social agent technology by older adults: the Almere model. Int J Social Robot 2(4):361–375

  12. Heerink M (2011) Exploring the influence of age, gender, education and computer experience on robot acceptance by older adults. In: Int. conf. on Human-robot interaction, Switzerland, pp 147–148

  13. Duffy BR (2003) Anthropomorphism and the social robot. Robot Auton Syst 42(3–4):177–190

  14. Heerink M, Kröse B, Evers V, Wielinga B (2006) The influence of a robot’s social abilities on acceptance by elderly users. In: IEEE int. symposium on robot and human interactive communication, UK, pp 521–526

  15. Markowitz J (2017) Speech and language for acceptance of social robots: an overview. Voice Interact Design 2:1–11


  16. Breazeal C, Scassellati B (1999) A context-dependent attention system for a social robot. In: Int. joint conf. on artificial intelligence, USA, pp 1146–1151

  17. Chella A, Barone RE, Pilato G, Sorbello R (2008) An emotional storyteller robot. In: AAAI spring symposium on emotion, personality, and social behavior, USA, pp 17–22

  18. Mavridis N (2015) A review of verbal and non-verbal human-robot interactive communication. Robot Auton Syst 63:22–35

  19. Nass CI, Brave S (2005) Wired for speech: how voice activates and advances the human–computer relationship. MIT Press, Cambridge

  20. Goetz J, Kiesler S, Powers A (2003) Matching robot appearance and behavior to tasks to improve human-robot cooperation. In: IEEE int. workshop on robot and human interactive communication, USA, pp 55–60

  21. Scheutz M, Schermerhorn P, Kramer J, Middendorff C (2006) The utility of affect expression in natural language interactions in joint human–robot tasks. In: ACM conf. on human–robot interaction, USA, pp 226–233

  22. Eyssel F, Ruiter L, Kuchenbrandt D, Bobinger S, Hegel F (2012) If you sound like me, you must be more human: on the interplay of robot and user features on human-robot acceptance and anthropomorphism. In: ACM/IEEE int. conf. on human–robot interaction, USA, pp 125–126

  23. Fung P, Bertero D, Wan Y, Dey A, Chan RHY, Siddique F, Yang Y, Wu C, Lin R (2016) Towards empathetic human-robot interactions. In: Int. conf. on intelligent text processing & computational linguistics, Turkey, pp 173–193

  24. James J, Watson CI, MacDonald B (2018) Artificial empathy in social robots: an analysis of emotions in speech. In: IEEE int. symposium on robot and human interactive communication, China, pp 632–637

  25. Cuff BMP, Brown SJ, Taylor L, Howat DJ (2016) Empathy: a review of the concept. Emot Rev 8(2):144–153

  26. Asada M (2015) Towards artificial empathy. Int J Social Robot 7(1):19–33

  27. Taylor P (2009) Text-to-speech synthesis. Cambridge University Press, Cambridge

  28. Crumpton J, Bethel CL (2015) Validation of vocal prosody modifications to communicate emotion in robot speech. In: Int. conf. on collaboration technologies and systems, USA, pp 39–46

  29. Alam F, Danieli M, Riccardi G (2018) Annotating and modeling empathy in spoken conversations. Comput Speech Lang 50:40–61

  30. Li X, Watson CI, Igic A, MacDonald B (2009) Expressive speech for a virtual talking head. In: Australasian conf. on robotics and automation, Australia, pp 5009–5014

  31. Moore LA (2006) Empathy: a clinician’s perspective. ASHA Leader 11(10):16–35

  32. Niculescu A, van Dijk B, Nijholt A, Li H, See SL (2013) Making social robots more attractive: the effects of voice pitch, humor and empathy. Int J Social Robot 5(2):171–191

  33. Watson C, Liu W, MacDonald B (2013) The effect of age and native speaker status on synthetic speech intelligibility. In: ISCA workshop on speech synthesis, Spain, pp 195–200

  34. Broadbent E, Tamagawa R, Kerse N, Knock B, Patience A, MacDonald B (2009) Retirement home staff and residents’ preferences for healthcare robots. In: IEEE int. symposium on robot and human interactive communication, Japan, pp 645–650

  35. Moyers TB, Martin T, Manuel JK, Miller WR, Ernst D (2003) The motivational interviewing treatment integrity (miti) code: Version 2.0. http://casaa.unm.edu/download/miti.pdf. Accessed 29 Oct 2019

  36. Field A, Miles J, Field Z (2012) Discovering statistics using R. Sage, Thousand Oaks, pp 666–673


  37. Ekman P (1992) An argument for basic emotions. Cogn Emot 6(3–4):169–200

  38. Damasio A (1994) Descartes’ error: emotion, reason and the human brain. Avon Books, New York

  39. Becker-Asano C, Wachsmuth I (2010) Affective computing with primary and secondary emotions in a virtual human. Auton Agent Multi-Agent Syst 20(1):32

  40. Kemper TD (1987) How many emotions are there? Wedding the social and the autonomic components. Am J Sociol 93(2):263–289

  41. Ochs M, Sadek D, Pelachaud C (2012) A formal model of emotions for an empathic rational dialog agent. Auton Agent Multi-Agent Syst 24(3):410–440

  42. Boukricha H, Wachsmuth I, Carminati MN, Knoeferle P (2013) A computational model of empathy: empirical evaluation. In: Humaine association conf. on affective computing and intelligent interaction, USA, pp 1–6

  43. Schröder M (2001) Emotional speech synthesis: a review. In: Eurospeech, Scandinavia, pp 561–564

  44. Breazeal C (2001) Emotive qualities in robot speech. In: IEEE/RSJ int. conf. on intelligent robots and systems (IROS), USA, pp 1389–1394

  45. Crumpton J, Bethel CL (2016) A survey of using vocal prosody to convey emotion in robot speech. Int J Social Robot 8(2):271–285

  46. Paltoglou G, Thelwall M (2012) Seeing stars of valence and arousal in blog posts. IEEE Trans Affect Comput 4(1):116–123

  47. James J, Tian L, Watson CI (2018) An open source emotional speech corpus for human robot interaction applications. In: Interspeech, India, pp 2768–2772

  48. James J, Watson CI, Stoakes H (2019) Influence of prosodic features and semantics on secondary emotion production and perception. In: Int. congress of phonetic sciences, Australia, pp 1779–1782

  49. Kisler T, Schiel F, Sloetjes H (2012) Signal processing via web services: the use case WebMAUS. In: Digital humanities conf., Germany, pp 30–34

  50. James J, Mixdorff H, Watson CI (2019) Quantitative model-based analysis of \(f_0\) contours of emotional speech. In: Int. congress of phonetic sciences, Australia, pp 72–76

  51. Mixdorff H, Cossio-Mercado C, Hönemann A, Gurlekian J, Evin D, Torres H (2015) Acoustic correlates of perceived syllable prominence in German. In: Annual conf. of the int. speech communication association, Germany, pp 51–55

  52. Mixdorff H (2000) A novel approach to the fully automatic extraction of Fujisaki model parameters. In: IEEE int. conf. on acoustics, speech, and signal processing, Turkey, pp 1281–1284

  53. Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6(4):365–377

  54. Watson CI, Marchi A (2014) Resources created for building New Zealand English voices. In: Australasian int. conf. of speech science and technology, New Zealand, pp 92–95

  55. Jain S (2015) Towards the creation of customised synthetic voices using Hidden Markov Models on a Healthcare Robot. Master’s thesis, The University of Auckland, New Zealand

  56. Boersma P (1993) Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proc Inst Phonetic Sci 17:97–110

  57. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22

  58. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Int. conf. on machine learning, Italy, pp 148–156

  59. Eide E, Aaron A, Bakis R, Hamza W, Picheny M, Pitrelli J (2004) A corpus-based approach to expressive speech synthesis. In: ISCA ITRW on speech synthesis, USA, pp 79–84

  60. Ming H, Huang D, Dong M, Li H, Xie L, Zhang S (2015) Fundamental frequency modeling using Wavelets for emotional voice conversion. In: Int. conf. on affective computing and intelligent interaction, China, pp 804–809

  61. Robinson C, Obin N, Roebel A (2019) Sequence-to-sequence modelling of \(F_0\) for speech emotion conversion. In: Int. conf. on acoustics, speech, and signal processing, UK, pp 6830–6834

  62. Taguette, version 0.9. https://www.taguette.org (published via Zenodo)

  63. Miro. https://miro.com/app/

  64. Powers A, Kiesler S, Fussell S, Torrey C (2007) Comparing a computer agent with a humanoid robot. In: Proceedings of the ACM/IEEE int. conf. on human-robot interaction, pp 145–152

  65. McGinn C, Torre I (2019) Can you tell the robot by the voice? An exploratory study on the role of voice in the perception of robots. In: ACM/IEEE int. conf. on human–robot interaction (HRI), pp 211–221

  66. Anzalone SM, Boucenna S, Ivaldi S, Chetouani M (2015) Evaluating the engagement with social robots. Int J Social Robot 7(4):465–478

  67. Leite I, Castellano G, Pereira A, Martinho C, Paiva A (2014) Empathic robots for long-term interaction. Int J Social Robot 6(3):329–341

  68. Tamagawa R, Watson CI, Kuo IH, MacDonald BA, Broadbent E (2011) The effects of synthesized voice accents on user perceptions of robots. Int J Social Robot 3(3):253–262


Acknowledgements

This research was supported by the Centre for Automation and Robotic Engineering Science, University of Auckland, through seed funding. The authors thank the professional actors who recorded their voices for the JLCorpus and the perception test participants for their time and effort. They also thank the journal’s reviewers for their very detailed reviews, which helped improve this paper.

Author information


Corresponding author

Correspondence to Jesin James.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Ethical Standard

The studies reported here were approved by the University of Auckland Human Participants Ethics Committee (UAHPEC) on 20/10/2017 for 3 years, Ref. No. 019845 (see notes 6 and 12).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Scale of Empathy Questionnaire

Table 5 Empathy scale in MITI and its extension to HRI


Cite this article

James, J., Balamurali, B.T., Watson, C.I. et al. Empathetic Speech Synthesis and Testing for Healthcare Robots. Int J of Soc Robotics 13, 2119–2137 (2021). https://doi.org/10.1007/s12369-020-00691-4
