Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis Tool for Singers
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2021-05-01, DOI: arxiv-2105.00173
Daniel Szelogowski

Current computational-emotion research has focused on applying acoustic properties to analyze how emotions are perceived mathematically or used in natural language processing machine learning models. With most recent interest being in analyzing emotions from the spoken voice, little experimentation has been performed to discover how emotions are recognized in the singing voice -- both in noiseless and noisy data (i.e., data that is inaccurate, is difficult to interpret, contains corrupted, distorted, or nonsense information such as actual noise sounds in this case, or has a low ratio of usable to unusable information). Not only does this ignore the challenges of training machine learning models on more subjective data and testing them with much noisier data, but there is also a clear disconnect in progress between advancing the development of convolutional neural networks and the goal of emotionally cognizant artificial intelligence. By training a new model to include this type of information with a rich comprehension of psycho-acoustic properties, not only can models be trained to recognize information within extremely noisy data, but advancement can be made toward more complex biofeedback applications -- including creating a model which could recognize emotions given any human information (language, breath, voice, body, posture) and be used in any performance medium (music, speech, acting) or in psychological assistance for patients with disorders such as BPD, alexithymia, and autism, among others. This paper seeks to reflect and expand upon the findings of related research and present a stepping-stone toward this end goal.

Updated: 2021-05-04