Room-localized speech activity detection in multi-microphone smart homes, EURASIP Journal on Audio, Speech, and Music Processing



Room-localized speech activity detection in multi-microphone smart homes
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2019-08-27, DOI: 10.1186/s13636-019-0158-8
Panagiotis Giannoulis, Gerasimos Potamianos, Petros Maragos

Voice-enabled interaction systems in domestic environments have attracted significant interest recently, being the focus of smart home research projects and commercial voice assistant home devices. Within the multi-module pipelines of such systems, speech activity detection (SAD) constitutes a crucial component, providing input to their activation and speech recognition subsystems. In typical multi-room domestic environments, SAD may also convey spatial intelligence to the interaction, in addition to its traditional temporal segmentation output, by assigning speech activity at the room level. Such room-localized SAD can, for example, disambiguate user command referents, allow localized system feedback, and enable parallel voice interaction sessions by multiple subjects in different rooms. In this paper, we investigate a room-localized SAD system for smart homes equipped with multiple microphones distributed in multiple rooms, significantly extending our earlier work. The system employs a two-stage algorithm, incorporating a set of hand-crafted features specially designed to discriminate room-inside vs. room-outside speech at its second stage, refining SAD hypotheses obtained at its first stage by traditional statistical modeling and acoustic front-end processing. Both algorithmic stages exploit multi-microphone information, combining it at the signal, feature, or decision level. The proposed approach is extensively evaluated on both simulated and real data recorded in a multi-room, multi-microphone smart home, significantly outperforming alternative baselines. Further, it remains robust to reduced microphone setups, while also comparing favorably to deep learning-based alternatives.
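To make the two-stage idea concrete, the following is a minimal sketch and not the authors' system. It assumes that per-microphone binary speech decisions and a per-room frame energy are already available. Stage 1 fuses microphone decisions within each room by majority vote, one simple form of decision-level combination. Stage 2 replaces the paper's hand-crafted room-inside vs. room-outside features with a plain energy comparison across rooms, which is purely an illustrative stand-in. The names `majority_vote`, `room_localized_sad`, and the `ratio` threshold are hypothetical.

```python
from typing import Dict, List


def majority_vote(mic_decisions: List[List[int]]) -> List[int]:
    """Fuse per-microphone 0/1 decisions into one 0/1 decision per frame."""
    n_mics = len(mic_decisions)
    n_frames = len(mic_decisions[0])
    # A frame is speech when more than half of the microphones say so.
    return [
        int(2 * sum(mic[t] for mic in mic_decisions) > n_mics)
        for t in range(n_frames)
    ]


def room_localized_sad(
    decisions: Dict[str, List[List[int]]],
    energy: Dict[str, List[float]],
    ratio: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> Dict[str, List[int]]:
    """Return a 0/1 speech decision per room and frame."""
    # Stage 1: room-level SAD hypotheses from decision-level fusion.
    stage1 = {room: majority_vote(mics) for room, mics in decisions.items()}
    n_frames = len(next(iter(stage1.values())))
    output = {room: [0] * n_frames for room in stage1}

    # Stage 2 (stand-in): keep a room only if its energy is close to the
    # loudest active room, treating quieter rooms as leaked outside speech.
    for t in range(n_frames):
        active = [room for room in stage1 if stage1[room][t] == 1]
        if not active:
            continue
        loudest = max(energy[room][t] for room in active)
        for room in active:
            if energy[room][t] >= ratio * loudest:
                output[room][t] = 1
    return output


# Toy data: three microphones per room, three frames.
decisions = {
    "kitchen": [[1, 1, 0], [1, 1, 0], [0, 1, 0]],
    "bedroom": [[1, 0, 0], [1, 0, 1], [1, 0, 0]],
}
energy = {
    "kitchen": [0.9, 0.8, 0.1],
    "bedroom": [0.2, 0.1, 0.1],
}
result = room_localized_sad(decisions, energy)
```

In this toy example both rooms detect speech in the first frame, but the bedroom's much lower energy leads the second stage to attribute that speech to the kitchen. Because the threshold is relative, two rooms with comparable energy would both stay active, which loosely mirrors the parallel-session use case described in the abstract.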


Updated: 2019-08-27
