The Role of Speech Technology in User Perception and Context Acquisition in HRI, International Journal of Social Robotics


The Role of Speech Technology in User Perception and Context Acquisition in HRI
International Journal of Social Robotics (IF 3.8), Pub Date: 2020-08-04, DOI: 10.1007/s12369-020-00682-5
Jorge Wuth, Pedro Correa, Tomás Núñez, Matías Saavedra, Néstor Becerra Yoma

This paper addresses the role and relevance of speech synthesis and speech recognition in social robotics. To increase the generality of the study, the interaction of a human being with one and with two robots executing tasks was considered. Using these scenarios, a state-of-the-art speech synthesizer was compared with non-linguistic utterances in terms of (1) user preference and (2) perception of the robots’ capabilities; (3) speech recognition was compared with typed text as a means of entering commands, in terms of user preference; and (4) the importance of knowing the robots’ context and (5) the role of synthetic voice in acquiring this context were evaluated. Speech synthesis and recognition are different technologies, but generating and understanding speech should be understood as different dimensions of the same spoken language phenomenon. Robot context denotes all the information about the operating conditions and completion status of the task being executed by the robot. Two robotic setups for online experiments were built. With the first setup, where only one robot was employed, our findings indicate that highly natural synthetic speech is preferred over beep-like audio; that users prefer to enter commands by voice rather than by typing text; and that the robot’s voice has a greater effect on the robot’s perceived capability than the possibility of entering commands by voice. The analysis presented here suggests that when users interacted with a single robot, its voice as a social cue and cause of anthropomorphization lost relevance as the interaction proceeded and the users could better evaluate the robot’s capability with respect to its task. In the experiment with the second setup, a two-robot collaborative testbed was employed. When the robots communicated with each other to sort out problems while trying to accomplish a mission, the user observed the situation from a more distanced position and the “reflective” perspective dominated. Our results indicate that acquiring the robots’ context was perceived as essential for a successful human–robot collaboration to accomplish a given objective. For this purpose, synthesized speech was preferred over text on a screen for context acquisition.


Updated: 2020-08-04
