Silent Speech Recognition as an Alternative Communication Device for Persons with Laryngectomy.
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2018-03-20, DOI: 10.1109/taslp.2017.2740000
Geoffrey S Meltzner 1, James T Heaton 2, Yunbin Deng 3, Gianluca De Luca 4, Serge H Roy 4, Joshua C Kline 4

Each year thousands of individuals require surgical removal of their larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to verbally communicate. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of speech musculature can be recorded from the neck and face, and used for automatic speech recognition to provide speech-to-text or synthesized speech as an alternative means of communication. This is true even when speech is mouthed or spoken in a silent (subvocal) manner, making it an appropriate communication platform after laryngectomy. In this study, 8 individuals at least 6 months after total laryngectomy were recorded using 8 sEMG sensors on their face (4) and neck (4) while reading phrases constructed from a 2,500-word vocabulary. A unique set of phrases was used for training phoneme-based recognition models for each of the 39 commonly used phonemes in English, and the remaining phrases were used for testing word recognition of the models based on phoneme identification from running speech. Word error rates were on average 10.3% for the full 8-sensor set (averaging 9.5% for the top 4 participants), and 13.6% when reducing the sensor set to 4 locations per individual (n=7). This study provides a compelling proof-of-concept for sEMG-based alaryngeal speech recognition, with the strong potential to further improve recognition performance.
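
The word error rates quoted above are not defined in the abstract. As a rough illustration, the sketch below computes WER under the conventional definition, which is assumed here: the minimum number of word substitutions, deletions, and insertions needed to turn the recognized phrase into the reference phrase, divided by the number of reference words. The function name `word_error_rate` and the example phrases are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes the conventional WER definition; the abstract does not
    describe the scoring procedure the authors actually used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical phrases, for illustration only.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 10.3% corresponds to roughly one word error per ten reference words, averaged over the test phrases.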

Updated: 2019-11-01