Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis Tool for Singers
arXiv - CS - Sound | Pub Date: 2021-05-01 | DOI: arxiv-2105.00173 | Daniel Szelogowski
Current computational-emotion research has focused on applying acoustic
properties to analyze how emotions are perceived mathematically or used in
natural language processing machine learning models. With most recent interest
being in analyzing emotions from the spoken voice, little experimentation has
been performed to discover how emotions are recognized in the singing voice --
both in noiseless and noisy data (i.e., data that is inaccurate, difficult to
interpret, contains corrupted, distorted, or nonsense information such as
actual noise sounds in this case, or has a low ratio of usable to unusable
information). Not only does this ignore the challenges of training machine
learning models on more subjective data and testing them with much noisier
data, but there is also a clear disconnect in progress between advancing the
development of convolutional neural networks and the goal of emotionally
cognizant artificial intelligence. By training a new model to include this type
of information with a rich comprehension of psycho-acoustic properties, not
only can models be trained to recognize information within extremely noisy
data, but advancement can be made toward more complex biofeedback applications
-- including creating a model that could recognize emotions given any human
information (language, breath, voice, body, posture) and be used in any
performance medium (music, speech, acting) or in psychological assistance for
patients with disorders such as BPD, alexithymia, and autism. This
paper seeks to reflect and expand upon the findings of related research and
present a stepping-stone toward this end goal.
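The paper's actual model and dataset are not reproduced here, but the clean-versus-noisy training condition it describes can be illustrated with a minimal numpy sketch: frame a signal into a log-power spectrogram (the kind of psycho-acoustic front end such models consume) and construct a controlled-SNR noisy variant of the same signal. All names, the 220 Hz sine stand-in for a sung tone, the frame/hop sizes, and the white-noise mixing are illustrative assumptions, not the author's setup.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def log_power_spectrogram(x, frame_len=512, hop=256):
    """Hann-windowed log power spectrogram: rows = frames, cols = freq bins."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spec + 1e-10)

def add_noise(x, snr_db, rng):
    """Mix white noise into x so the result has the requested SNR in dB."""
    noise = rng.standard_normal(len(x))
    sig_p = np.mean(x ** 2)
    noise_p = np.mean(noise ** 2)
    scale = np.sqrt(sig_p / (noise_p * 10 ** (snr_db / 10)))
    return x + scale * noise

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 220 * t)          # stand-in for one sung tone
noisy = add_noise(clean, snr_db=5, rng=rng)  # the "noisy data" condition

clean_feats = log_power_spectrogram(clean)
noisy_feats = log_power_spectrogram(noisy)
print(clean_feats.shape)  # -> (61, 257): 61 frames x 257 frequency bins
```

A model trained only on `clean_feats`-style inputs and then tested on `noisy_feats`-style inputs exhibits exactly the train/test mismatch the abstract argues current emotion-recognition work overlooks; augmenting training data with such SNR-controlled mixtures is one standard way to close that gap.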
Updated: 2021-05-04