Admitting the addressee detection faultiness of voice assistants to improve the activation performance using a continuous learning framework
Cognitive Systems Research (IF 2.1), Pub Date: 2021-08-02, DOI: 10.1016/j.cogsys.2021.07.005
Ingo Siegert¹, Norman Weißkirchen², Julia Krüger³, Oleg Akhtiamov⁴,⁵, Andreas Wendemuth²

The main promise of voice assistants is their ability to correctly interpret and learn from user input and to use this knowledge to achieve specific goals and tasks. These systems need predetermined activation actions to start a conversation. Unfortunately, the typical solution, wake-words, forces an unnatural interaction. Furthermore, this method causes confusion when the wake-word, or a phonetically similar phrase, is said but the user does not intend to interact with the system. The system thus not only lacks the adequacy of interpersonal interaction but also suffers from addressee detection faultiness. Although various aspects of acoustic addressee detection have already been investigated, we demonstrated that the test data used so far rely on ideal conditions: the dialog complexity of human–human and human–device interactions is essentially different, whereas in reality the behavior of individuals addressing either another human or a device varies widely. The problem of addressee detection is thus oversimplified. Our approach works with a specifically designed dataset comprising human–human and human–computer interactions of similar dialog complexity. Our proposed addressee detection faultiness framework actively communicates any uncertainty the system may have. Combined with a continuous learning framework, this enables a voice assistant to adapt itself to each user's individual addressee behavior. This approach achieves a significantly improved classification rate of 85.77%, an absolute improvement of 32.22% over similar experiments employing human annotations as ground truth.
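The idea of communicating uncertainty and learning from the user's answer can be illustrated with a minimal sketch. This is not the authors' implementation: the online logistic-regression model, the uncertainty band of 0.3 to 0.7, the learning rate, and the names `AddresseeDetector` and `handle_utterance` are all assumptions made for illustration, and the feature vectors stand in for whatever acoustic features a real system would extract.

```python
import math


class AddresseeDetector:
    """Toy online logistic-regression addressee detector (hypothetical, illustrative only)."""

    def __init__(self, n_features, lr=0.5, band=(0.3, 0.7)):
        self.w = [0.0] * n_features  # feature weights
        self.b = 0.0                 # bias term
        self.lr = lr                 # learning rate (assumed value)
        self.band = band             # probabilities inside this band count as "uncertain"

    def prob_device(self, x):
        """Estimated probability that the utterance is addressed to the device."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def decide(self, x):
        """Return 'device', 'human', or 'ask' when the model is uncertain."""
        p = self.prob_device(x)
        lo, hi = self.band
        if p >= hi:
            return "device"
        if p <= lo:
            return "human"
        return "ask"

    def update(self, x, addressed_device):
        """One stochastic-gradient step on the log loss, using the user's answer as label."""
        y = 1.0 if addressed_device else 0.0
        err = self.prob_device(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def handle_utterance(detector, x, ask_user):
    """Decide the addressee; when uncertain, ask the user and learn from the reply."""
    decision = detector.decide(x)
    if decision == "ask":
        answer = ask_user()  # True if the user confirms they addressed the device
        detector.update(x, answer)
        decision = "device" if answer else "human"
    return decision
```

In this sketch the system admits its uncertainty by returning `"ask"` instead of guessing, and every clarification it receives becomes a training label, so the number of questions for a given user should shrink as the model adapts to that user's addressing behavior.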

Updated: 2021-08-07
