A novel privacy-preserving speech recognition framework using bidirectional LSTM, Journal of Cloud Computing


Journal of Cloud Computing (IF 3.418), Pub Date: 2020-07-08, DOI: 10.1186/s13677-020-00186-7
Qingren Wang, Chuankai Feng, Yan Xu, Hong Zhong, Victor S. Sheng

Using speech as the transmission medium in the Internet of Things (IoT) is an effective way to reduce latency while improving the efficiency of human-machine interaction. In speech recognition, recurrent neural networks (RNNs) offer significant accuracy improvements. However, some RNN-based intelligent speech recognition applications do not adequately protect the privacy of speech data, while those that do are time-consuming, especially in model training and speech recognition. In this paper we therefore propose a novel Privacy-preserving Speech Recognition framework using a Bidirectional Long short-term memory neural network, named PSRBL. On the one hand, PSRBL designs new functions and combines them with an additive secret sharing protocol to construct secure activation functions, namely a secure piecewise-linear Sigmoid and a secure piecewise-linear Tanh, so that speech data remains private while speech recognition runs on edge servers. On the other hand, to reduce the time spent on both training and recognition while keeping recognition accuracy high, PSRBL first uses the secure activation functions to refit the original activation functions in the bidirectional Long Short-Term Memory (LSTM) neural network, and then exploits both the left and the right context of the speech data by employing the bidirectional LSTM. Experiments on the TIMIT speech dataset show that PSRBL performs well. Specifically, compared with state-of-the-art approaches, PSRBL significantly reduces the time spent on both training and recognition, under the premise that PSRBL and the compared methods offer consistent privacy protection for speech data.
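To illustrate why piecewise-linear activations pair naturally with additive secret sharing, the following is a minimal sketch and not the paper's protocol. It assumes two parties, a share ring of size 2^64, a fixed-point scale of 2^16, and the common "hard" Sigmoid and Tanh approximations; the abstract does not give PSRBL's actual segments, so all names and constants here are hypothetical. Each segment is affine, so it can be evaluated locally on the shares. Choosing the segment would need a secure comparison in a real protocol; here it is replaced by a clearly marked insecure stand-in.

```python
import math
import secrets

MOD = 2 ** 64      # share ring Z_{2^64} (assumed for this sketch)
SCALE = 2 ** 16    # fixed-point scale (assumed for this sketch)

def encode(x, scale=SCALE):
    """Map a real number to a fixed-point ring element."""
    return round(x * scale) % MOD

def decode(v, scale=SCALE):
    """Map a ring element back to a real number (upper half = negatives)."""
    v %= MOD
    if v >= MOD // 2:
        v -= MOD
    return v / scale

def share(x):
    """Split x into two additive shares with s0 + s1 = encode(x) mod MOD."""
    s0 = secrets.randbelow(MOD)
    return s0, (encode(x) - s0) % MOD

def reconstruct(shares, scale=SCALE):
    """Add the shares and decode the result."""
    return decode(sum(shares) % MOD, scale)

def affine_on_shares(shares, a, b):
    """Compute shares of a*x + b for public a, b without any communication.

    Each party multiplies its own share by the fixed-point slope; only
    party 0 adds the intercept. The result carries scale SCALE**2.
    """
    a_fx = round(a * SCALE)
    b_fx = encode(b, SCALE * SCALE)
    s0, s1 = shares
    return (a_fx * s0 + b_fx) % MOD, (a_fx * s1) % MOD

# Segments as (exclusive upper bound, slope, intercept); assumed shapes.
HARD_SIGMOID = [(-2.0, 0.0, 0.0), (2.0, 0.25, 0.5), (math.inf, 0.0, 1.0)]
HARD_TANH = [(-1.0, 0.0, -1.0), (1.0, 1.0, 0.0), (math.inf, 0.0, 1.0)]

def insecure_segment_index(shares, segments):
    """INSECURE placeholder: reconstructs x in the clear to pick a segment.

    A real protocol would use a secure comparison on the shares instead.
    """
    x = reconstruct(shares)
    for i, (upper, _, _) in enumerate(segments):
        if x < upper:
            return i
    return len(segments) - 1

def secure_pl_activation(shares, segments):
    """Evaluate a piecewise-linear activation on additive shares."""
    _, a, b = segments[insecure_segment_index(shares, segments)]
    return affine_on_shares(shares, a, b)

if __name__ == "__main__":
    for x in (-3.0, 0.5, 1.0, 5.0):
        sig = reconstruct(secure_pl_activation(share(x), HARD_SIGMOID), SCALE * SCALE)
        tanh = reconstruct(secure_pl_activation(share(x), HARD_TANH), SCALE * SCALE)
        print(x, sig, tanh)
```

The output shares are decoded with `SCALE * SCALE` because multiplying a fixed-point share by a fixed-point slope doubles the scale; a full protocol would add a truncation step to bring it back down, which this sketch omits.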


Updated: 2020-07-08
