Performance analysis of ASR system in hybrid DNN-HMM framework using a PWL euclidean activation function, Frontiers of Computer Science

Frontiers of Computer Science (IF 4.2), Pub Date: 2021-06-05, DOI: 10.1007/s11704-020-9419-z
Anirban Dutta, Gudmalwar Ashishkumar, Ch V. Rama Rao

Automatic Speech Recognition (ASR) is the process of mapping an acoustic speech signal to human-readable text. Traditional systems model the acoustic component of ASR with the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) approach. Deep Neural Networks (DNNs) open up new possibilities for overcoming the shortcomings of these conventional statistical methods, and recent studies have modeled the acoustic component of ASR systems with a DNN in the so-called hybrid DNN-HMM approach. Among the activation functions used to model the non-linearity in DNNs, Rectified Linear Units (ReLU) and maxout units are the most commonly used in ASR systems. This paper concentrates on the acoustic component of a hybrid DNN-HMM system and proposes an efficient activation function for the DNN. Inspired by previous work, a Euclidean norm activation function is proposed to model the non-linearity of the DNN. This non-linearity is shown to belong to the family of Piecewise Linear (PWL) functions, which have distinct properties and can capture deep hierarchical features of the pattern. The relevance of the proposal is examined in depth, both theoretically and experimentally. The performance of the resulting ASR system is evaluated in terms of Phone Error Rate (PER) on the TIMIT database, and the experimental results show a relative improvement in performance with the proposed function over conventional activation functions.
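In a hybrid DNN-HMM system the DNN takes over the role the GMM plays as the HMM's emission model. A common way to do this, sketched below under the assumption of a softmax output layer over HMM states, is to divide the DNN's state posteriors by the state priors so that the decoder receives scaled likelihoods. The function names, the three-state frame, and the prior values are hypothetical; the abstract does not describe how the authors' system performs this step.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax: the DNN output layer's log p(s | x)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def scaled_log_likelihoods(log_posteriors: List[float],
                           log_priors: List[float]) -> List[float]:
    """Turn state log-posteriors into scaled log-likelihoods for an HMM decoder.

    Bayes' rule: p(x | s) = p(s | x) * p(x) / p(s). The evidence p(x) is the
    same for every state s in a given frame, so it can be dropped, leaving
    log p(x | s) = log p(s | x) - log p(s) up to a per-frame constant.
    """
    if len(log_posteriors) != len(log_priors):
        raise ValueError("need one prior per HMM state")
    return [lp - lq for lp, lq in zip(log_posteriors, log_priors)]


# Illustrative values for three hypothetical HMM states in one frame.
frame_logits = [2.0, 1.0, 0.1]
state_priors = [0.5, 0.3, 0.2]  # e.g. relative state frequencies in aligned training data
scores = scaled_log_likelihoods(log_softmax(frame_logits),
                                [math.log(p) for p in state_priors])
print(scores)
```

Dividing out the priors keeps states that are merely frequent in the training data (silence, for example) from being favored for that reason alone; the decoder then combines these frame scores with the HMM transition probabilities and the language model.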
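To make the contrast between the activation families named in the abstract concrete, the sketch below implements ReLU and maxout, both of which are piecewise linear, together with a group-wise Euclidean (L2) norm unit. The abstract does not give the formula of the proposed Euclidean norm activation, and a plain group L2 norm is smooth away from the origin rather than piecewise linear, so `group_l2_norm` is a hypothetical illustration of a "Euclidean norm" non-linearity, not the authors' function. Plain Python lists stand in for tensors, and the group size is an assumed parameter.

```python
import math
from typing import List


def relu(x: List[float]) -> List[float]:
    """Element-wise ReLU, max(0, x): piecewise linear with two pieces."""
    return [max(0.0, v) for v in x]


def _split_groups(x: List[float], group_size: int) -> List[List[float]]:
    """Split pre-activations into consecutive, non-overlapping groups."""
    if group_size <= 0 or len(x) % group_size != 0:
        raise ValueError("len(x) must be a positive multiple of group_size")
    return [x[i:i + group_size] for i in range(0, len(x), group_size)]


def maxout(x: List[float], group_size: int) -> List[float]:
    """Maxout: one output per group, the maximum of its linear pieces."""
    return [max(g) for g in _split_groups(x, group_size)]


def group_l2_norm(x: List[float], group_size: int) -> List[float]:
    """HYPOTHETICAL 'Euclidean norm' unit: L2 norm of each group (p-norm, p = 2).

    Illustrative stand-in only; not the activation defined in the paper.
    """
    return [math.sqrt(sum(v * v for v in g)) for g in _split_groups(x, group_size)]


pre_activations = [3.0, -4.0, -6.0, 8.0]
print(relu(pre_activations))              # [3.0, 0.0, 0.0, 8.0]
print(maxout(pre_activations, 2))         # [3.0, 8.0]
print(group_l2_norm(pre_activations, 2))  # [5.0, 10.0]
```

For the first group [3, -4], maxout keeps only the larger pre-activation (3), whereas the L2 norm responds to the magnitude of both (5). In a real network each grouped input would be a separate affine projection of the layer input.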
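PER is conventionally computed like word error rate, but over phone sequences: the reference and hypothesis are aligned by minimum edit (Levenshtein) distance, and the substitutions S, deletions D and insertions I are divided by the number of reference phones N, i.e. PER = (S + D + I) / N. The sketch below assumes that definition and also shows one common way to express a relative improvement between two systems' PERs. The function names and the toy phone strings are hypothetical and are not taken from the paper or from TIMIT's scoring tools.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference phone r
                         cur[j - 1] + 1,          # insert hypothesis phone h
                         prev[j - 1] + (r != h))  # substitute, or match at cost 0
        prev = cur
    return prev[-1]


def phone_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level PER = total edit errors / total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("need one hypothesis per reference utterance")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def relative_improvement(baseline_per: float, proposed_per: float) -> float:
    """Fractional reduction in PER relative to the baseline system."""
    return (baseline_per - proposed_per) / baseline_per


ref = "sil dh ax k ae t sil".split()  # made-up reference, 7 phones
hyp = "sil d ax k ae sil".split()     # one substitution (dh -> d), one deletion (t)
print(edit_distance(ref, hyp))                     # 2
print(round(phone_error_rate([ref], [hyp]), 4))    # 0.2857
```

Corpus-level PER pools errors and reference lengths over all utterances rather than averaging per-utterance rates, so longer utterances carry proportionally more weight.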

Updated: 2021-06-05
