Turn-taking in Conversational Systems and Human-Robot Interaction: A Review, Computer Speech & Language


Computer Speech & Language (IF 3.1), Pub Date: 2020-12-16, DOI: 10.1016/j.csl.2020.101178
Gabriel Skantze

The taking of turns is a fundamental aspect of dialogue. Since it is difficult to speak and listen at the same time, the participants need to coordinate who is currently speaking and when the next person can start to speak. Humans are very good at this coordination, and typically achieve fluent turn-taking with very small gaps and little overlap. Conversational systems (including voice assistants and social robots), on the other hand, typically have problems with frequent interruptions and long response delays, which has called for a substantial body of research on how to improve turn-taking in conversational systems. In this review article, we provide an overview of this research and give directions for future research. First, we provide a theoretical background of the linguistic research tradition on turn-taking and some of the fundamental concepts in theories of turn-taking. We also provide an extensive review of multi-modal cues (including verbal cues, prosody, breathing, gaze and gestures) that have been found to facilitate the coordination of turn-taking in human-human interaction, and which can be utilised for turn-taking in conversational systems. After this, we review work that has been done on modelling turn-taking, including end-of-turn detection, handling of user interruptions, generation of turn-taking cues, and multi-party human-robot interaction. Finally, we identify key areas where more research is needed to achieve fluent turn-taking in spoken interaction between man and machine.


Updated: 2020-12-21
