Automatic segmentation of infant cry signals using hidden Markov models
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2018-01-26, DOI: 10.1186/s13636-018-0124-x
Gaurav Naithani, Jaana Kivinummi, Tuomas Virtanen, Outi Tammela, Mikko J. Peltola, Jukka M. Leppänen

Automatic extraction of acoustic regions of interest from recordings captured in realistic clinical environments is a necessary preprocessing step in any cry analysis system. In this study, we propose a hidden Markov model (HMM) based audio segmentation method to identify the relevant acoustic parts of the cry signal (i.e., expiratory and inspiratory phases) from recordings made in natural environments with various interfering acoustic sources. We examine and optimize the performance of the system by using different audio features and HMM topologies. In particular, we propose using fundamental frequency and aperiodicity features. We also propose a method for adapting the segmentation system trained on acoustic material captured in a particular acoustic environment to a different acoustic environment by using feature normalization and semi-supervised learning (SSL). The performance of the system was evaluated by analyzing a total of 3 h and 10 min of audio material from 109 infants, captured in a variety of recording conditions in hospital wards and clinics. The proposed system yields frame-based accuracy up to 89.2%. We conclude that the proposed system offers a solution for automated segmentation of cry signals in cry analysis applications.
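
To make the idea of HMM-based frame labelling concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy one-dimensional feature, a single Gaussian per state, and three hypothetical states (0 = residual, 1 = expiration, 2 = inspiration); the residual class, the means, the standard deviations, and the transition probabilities are all illustrative assumptions, whereas the paper works with real audio features such as fundamental frequency and aperiodicity. Viterbi decoding assigns one state to each frame, and frame-based accuracy is the fraction of frames whose label matches the reference.

```python
import numpy as np


def gaussian_log_likelihood(x, means, stds):
    """Per-frame, per-state log-likelihood of a 1-D feature. Returns shape (T, S)."""
    z = (x[:, None] - means) / stds
    return -0.5 * z ** 2 - np.log(stds) - 0.5 * np.log(2.0 * np.pi)


def viterbi(log_emis, log_trans, log_init):
    """Most likely state sequence for an HMM, computed in the log domain."""
    n_frames, n_states = log_emis.shape
    delta = log_init + log_emis[0]                 # best score ending in each state
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans        # scores[i, j]: from state i to state j
        back[t] = scores.argmax(axis=0)            # best predecessor of each state j
        delta = scores.max(axis=0) + log_emis[t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_frames - 1, 0, -1):           # backtrack through stored predecessors
        path[t - 1] = back[t, path[t]]
    return path


def frame_accuracy(predicted, reference):
    """Fraction of frames whose predicted label matches the reference label."""
    return float(np.mean(predicted == reference))


# Hypothetical toy model: all values below are assumptions chosen for illustration.
means = np.array([0.0, 3.0, -3.0])                 # residual, expiration, inspiration
stds = np.array([1.0, 1.0, 1.0])
log_trans = np.log(np.full((3, 3), 0.05) + 0.85 * np.eye(3))  # "sticky" transitions
log_init = np.log(np.full(3, 1.0 / 3.0))

features = np.array([0.0, 0.0, 3.0, 3.0, 3.0, 0.0, -3.0, -3.0, 0.0, 0.0])
reference = np.array([0, 0, 1, 1, 1, 0, 2, 2, 0, 0])

path = viterbi(gaussian_log_likelihood(features, means, stds), log_trans, log_init)
print("decoded states:", path.tolist())
print("frame accuracy:", frame_accuracy(path, reference))
```

The high self-transition probability is a design choice in this sketch: it discourages the decoder from switching labels on isolated frames, which is one reason a sequence model can be preferable to classifying each frame independently.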

Updated: 2018-01-26