Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies
IEEE Journal of Selected Topics in Signal Processing (IF 7.5), Pub Date: 2010-12-01, DOI: 10.1109/jstsp.2010.2076013
Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein

Many studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, it has to be estimated from the acoustic signal, a process usually termed “speech inversion.” This study proposes and compares several machine learning strategies for speech inversion: trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural networks (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.

Updated: 2010-12-01