Web-based environment for user generation of spoken dialog for virtual assistants
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2018-11-16, DOI: 10.1186/s13636-018-0142-8
Ryota Nishimura, Daisuke Yamamoto, Takahiro Uchiya, Ichi Takumi

In this paper, a web-based spoken dialog generation environment is developed that enables users to edit dialogs with a video virtual assistant and to select the 3D motions and tone of voice for the assistant. In our proposed system, “anyone” can “easily” post and edit dialog content for the dialog system. The system supports only question-and-answer dialogs, in order to avoid editing conflicts caused by multiple users editing the same content. The spoken dialog sharing service and FST generator generate spoken dialog content for the MMDAgent spoken dialog system toolkit, which includes a speech recognizer, a dialog control unit, a speech synthesizer, and a virtual agent. Dialog content is created from question-and-answer dialogs posted by users and from FST templates. The proposed system was operated for more than a year in a student lounge at the Nagoya Institute of Technology, where users added more than 500 dialogs during the experiment. Images were attached to 65% of the postings. The most frequently posted category was “animation, video games, manga.” The system was also subjected to open examination by tourist information staff who had no prior experience with spoken dialog systems. Based on their impressions of how tourists used the dialog system, they shortened some of the system’s responses and added pauses to the longer responses to make them easier to understand.

