
-
Supervised Speech Separation Based on Deep Learning: An Overview. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2019-06-22 DeLiang Wang, Jitong Chen
Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation …
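As a rough illustration of the supervised formulation the overview surveys, the sketch below computes one common training target, the ideal ratio mask (IRM), from parallel clean and noise recordings. The STFT parameters are arbitrary choices, and the overview covers many other targets and learning machines.

```python
import numpy as np
from scipy.signal import stft

def ideal_ratio_mask(clean, noise, fs=16000, nfft=512):
    """IRM: a common training target in supervised separation.
    Frame/FFT sizes are illustrative, not prescribed by the overview."""
    _, _, S = stft(clean, fs=fs, nperseg=nfft)   # clean-speech STFT
    _, _, N = stft(noise, fs=fs, nperseg=nfft)   # noise STFT
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return np.sqrt(ps / (ps + pn + 1e-12))       # values in [0, 1]
```

A DNN is then trained to predict this mask from noisy features; at test time the predicted mask scales the noisy spectrogram before resynthesis.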
-
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2019-05-06 Yi Luo, Nima Mesgarani
Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude …
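The sketch below mirrors the time-domain encoder/mask/decoder structure that Conv-TasNet popularized, with toy layer sizes; it is not the published architecture (which uses a temporal convolutional separation module), only the shape of the idea.

```python
import torch
import torch.nn as nn

class TinyTasNet(nn.Module):
    """Learned encoder -> mask estimator -> learned decoder on raw waveforms.
    A toy sketch of the time-domain idea, not the published Conv-TasNet."""
    def __init__(self, n_filters=128, kernel=16, n_src=2):
        super().__init__()
        self.n_src = n_src
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=kernel // 2)
        self.mask_net = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, 3, padding=1), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters * n_src, 1), nn.Sigmoid())
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=kernel // 2)

    def forward(self, mix):                          # mix: (batch, 1, samples)
        w = torch.relu(self.encoder(mix))            # learned basis coefficients
        masks = self.mask_net(w).chunk(self.n_src, dim=1)
        return [self.decoder(w * m) for m in masks]  # one waveform per source
```

Because separation happens on learned filterbank coefficients rather than the STFT, magnitude and phase are never decoupled in the first place.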
-
Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2018-10-15 Ke Tan, Jitong Chen, DeLiang Wang
For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture …
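A minimal sketch of the two ingredients named in the title, gating and dilated convolutions: each residual block below widens its temporal context through dilation and modulates its output with a learned gate. Channel counts and depth are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GatedDilatedBlock(nn.Module):
    """Residual block with a gated linear unit; dilation enlarges the
    temporal receptive field without adding parameters."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.filt = nn.Conv1d(ch, ch, 3, padding=dilation, dilation=dilation)
        self.gate = nn.Conv1d(ch, ch, 3, padding=dilation, dilation=dilation)

    def forward(self, x):                            # x: (batch, ch, frames)
        return x + torch.tanh(self.filt(x)) * torch.sigmoid(self.gate(x))

# Stacking dilations 1, 2, 4, 8 grows the context exponentially with depth.
net = nn.Sequential(*[GatedDilatedBlock(64, 2 ** i) for i in range(4)])
```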
-
Acoustic Denoising using Dictionary Learning with Spectral and Temporal Regularization. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2018-10-03 Colin Vaz, Vikram Ramanarayanan, Shrikanth Narayanan
We present a method for speech enhancement of data collected in extremely noisy environments, such as those obtained during magnetic resonance imaging (MRI) scans. We propose an algorithm based on dictionary learning to perform this enhancement. We use complex nonnegative matrix factorization with intra-source additivity (CMF-WISA) to learn dictionaries of the noise and speech+noise portions of the …
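CMF-WISA factorizes the complex spectrogram with spectral and temporal regularizers; as a much simpler stand-in that shows the dictionary-learning workflow, the sketch below learns a magnitude-NMF noise dictionary from noise-only data and suppresses whatever those atoms can explain in the mixture.

```python
import numpy as np
from sklearn.decomposition import NMF
from scipy.optimize import nnls

def dictionary_denoise(noise_mag, mix_mag, k=20):
    """Plain magnitude-NMF sketch standing in for CMF-WISA.
    noise_mag, mix_mag: (freq_bins, frames) magnitude spectrograms."""
    # Noise dictionary: columns are spectral atoms learned from noise-only data.
    W = NMF(n_components=k, max_iter=500).fit(noise_mag.T).components_.T
    # Per-frame nonnegative projection of the mixture onto the noise atoms.
    noise_est = np.column_stack([W @ nnls(W, m)[0] for m in mix_mag.T])
    # Wiener-style gain keeps the part of the mixture the atoms cannot explain.
    gain = np.clip(1.0 - noise_est / (mix_mag + 1e-12), 0.0, 1.0)
    return gain * mix_mag
```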
-
Speaker-Independent Silent Speech Recognition from Flesh-Point Articulatory Movements Using an LSTM Neural Network. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2018-10-03 Myungjong Kim, Beiming Cao, Ted Mau, Jun Wang
Silent speech recognition (SSR) converts non-audio information such as articulatory movements into text. SSR has the potential to enable persons with laryngectomy to communicate through natural spoken expression. Current SSR systems have largely relied on speaker-dependent recognition models. The high degree of variability in articulatory patterns across different speakers has been a barrier for developing …
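A structural sketch of the model class the paper builds on: a bidirectional LSTM mapping flesh-point trajectories to per-frame phoneme posteriors. The input dimensionality (e.g., four sensors times three coordinates) and layer sizes are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

class ArticulatoryLSTM(nn.Module):
    """BLSTM from articulatory trajectories to per-frame phoneme logits."""
    def __init__(self, n_feats=12, hidden=128, n_phones=40):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_phones)

    def forward(self, x):                  # x: (batch, frames, n_feats)
        h, _ = self.lstm(x)
        return self.out(h)                 # phoneme logits per frame
```

Speaker independence is then a matter of training across many speakers (plus normalization or adaptation), rather than of the architecture itself.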
-
Two-stage Deep Learning for Noisy-reverberant Speech Enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2018-09-17 Yan Zhao, Zhong-Qiu Wang, DeLiang Wang
In real-world situations, speech reaching our ears is commonly corrupted by both room reverberation and background noise. These distortions are detrimental to speech intelligibility and quality, and also pose a serious problem to many speech-related applications, including automatic speech and speaker recognition. In order to deal with the combined effects of noise and reverberation, we propose a two-stage …
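A minimal sketch of the two-stage idea: one network first suppresses additive noise, then a second network dereverberates the denoised output. The MLP mask estimators and 257-bin spectral features below are placeholders for the paper's actual models.

```python
import torch
import torch.nn as nn

def mask_mlp(n_bins):
    """Toy mask estimator: spectral frame in, [0, 1] mask out."""
    return nn.Sequential(nn.Linear(n_bins, 256), nn.ReLU(),
                         nn.Linear(256, n_bins), nn.Sigmoid())

class TwoStageEnhancer(nn.Module):
    """Stage 1 handles background noise; stage 2 dereverberates its output."""
    def __init__(self, n_bins=257):
        super().__init__()
        self.denoise = mask_mlp(n_bins)
        self.dereverb = mask_mlp(n_bins)

    def forward(self, noisy_mag):          # (batch, frames, n_bins)
        stage1 = self.denoise(noisy_mag) * noisy_mag
        return self.dereverb(stage1) * stage1
```

Decomposing the problem lets each stage specialize in one distortion instead of forcing a single network to undo both at once.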
-
Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2018-08-17 Donald S Williamson, DeLiang Wang
In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing …
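Once a network has estimated the real and imaginary mask components, enhancement reduces to a complex multiply in the STFT domain followed by resynthesis, as sketched below; the STFT settings are arbitrary.

```python
import numpy as np
from scipy.signal import stft, istft

def apply_complex_mask(noisy, mask_real, mask_imag, fs=16000, nfft=512):
    """Enhance magnitude *and* phase via complex-domain masking.
    mask_real/mask_imag would come from a trained DNN (shapes match Y)."""
    _, _, Y = stft(noisy, fs=fs, nperseg=nfft)
    S_hat = (mask_real + 1j * mask_imag) * Y   # complex multiply adjusts phase too
    _, enhanced = istft(S_hat, fs=fs, nperseg=nfft)
    return enhanced
```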
-
Silent Speech Recognition as an Alternative Communication Device for Persons with Laryngectomy. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2018-03-20 Geoffrey S Meltzner, James T Heaton, Yunbin Deng, Gianluca De Luca, Serge H Roy, Joshua C Kline
Each year thousands of individuals require surgical removal of their larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to verbally communicate. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of speech musculature can be recorded from …
-
Deep Learning Based Binaural Speech Separation in Reverberant Environments. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2017-10-24 Xueliang Zhang, DeLiang Wang
Speech signals are usually degraded by room reverberation and additive noise in real environments. This paper focuses on separating the target speech signal from binaural inputs under reverberant conditions. Binaural separation is formulated as a supervised learning problem, and we employ deep learning to map from both spatial and spectral features to a training target. With binaural inputs, we first apply …
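A sketch of the kind of spatial features such binaural systems feed to the learner: per-T-F interaural level and phase differences computed from the two ear signals. The paper extracts its cues from an auditory filterbank, so treat this STFT version as a simplified stand-in.

```python
import numpy as np
from scipy.signal import stft

def binaural_features(left, right, fs=16000, nfft=320):
    """Per-T-F spatial cues: interaural level difference (ILD, in dB) and
    interaural phase difference (IPD). Frame sizes are illustrative."""
    _, _, L = stft(left, fs=fs, nperseg=nfft)
    _, _, R = stft(right, fs=fs, nperseg=nfft)
    ild = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
    ipd = np.angle(L * np.conj(R))         # wrapped phase difference
    return ild, ipd
```

Spatial cues localize the target while spectral cues discriminate speech from noise, which is why the two feature types complement each other in reverberation.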
-
Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2017-07-25 Ashwin Bellur, Mounya Elhilali
Parsing natural acoustic scenes using computational methodologies poses many challenges. Given the rich and complex nature of the acoustic environment, data mismatch between train and test conditions is a major hurdle in data-driven audio processing systems. In contrast, the brain exhibits a remarkable ability at segmenting acoustic scenes with relative ease. When tackling challenging listening conditions …
-
The Impact of Data Dependence on Speaker Recognition Evaluation. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2017-07-01 Jin Chu Wu, Alvin F Martin, Craig S Greenberg, Raghu N Kacker
The data dependency caused by repeated use of the same subjects has an impact on the standard error (SE) of the detection cost function (DCF) in speaker recognition evaluation. The DCF is defined as a weighted sum of the probabilities of type I and type II errors at a given threshold. A two-layer data structure is constructed: target scores are grouped into target sets based on the dependency, and likewise …
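For concreteness, the DCF at a single threshold is just the weighted sum the abstract describes. The sketch below uses the cost weights and target prior common in NIST speaker recognition evaluations, which may differ from the paper's settings.

```python
def detection_cost(p_miss, p_fa, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """DCF at one operating threshold: weighted sum of the miss and
    false-alarm probabilities. Weights follow common NIST SRE settings,
    not necessarily those used in the paper."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
```

The paper's concern is the standard error of this quantity when the same subjects contribute many trials, making the scores statistically dependent.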
-
Robust Harmonic Features for Classification-Based Pitch Estimation. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2017-05-01 Dongmei Wang, Chengzhu Yu, John H L Hansen
Pitch estimation in diverse naturalistic audio streams remains a challenge for speech processing and spoken language technology. In this study, we investigate the use of robust harmonic features for classification-based pitch estimation. The proposed pitch estimation algorithm is composed of two stages: pitch candidate generation and target pitch selection. Based on energy intensity and spectral envelope …
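A sketch of stage one, candidate generation: score each hypothesized F0 by summing spectral energy at its harmonics and keep the best scorers. The grid, harmonic count, and scoring rule here are simplifications; the paper's second stage selects the target pitch with a trained classifier.

```python
import numpy as np

def pitch_candidates(frame_spec, freqs, f0_grid, n_harm=5, top_k=3):
    """Harmonic-summation candidate generation for one spectral frame.
    frame_spec: magnitude spectrum; freqs: bin frequencies (Hz)."""
    scores = []
    for f0 in f0_grid:
        # Sum energy at the nearest bins to the first n_harm harmonics.
        bins = [np.argmin(np.abs(freqs - h * f0)) for h in range(1, n_harm + 1)]
        scores.append(frame_spec[bins].sum())
    order = np.argsort(scores)[::-1][:top_k]
    return f0_grid[order]

# Example grid for adult speech: cands = pitch_candidates(spec, freqs,
#                                        np.arange(60.0, 400.0, 2.0))
```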
-
A Deep Ensemble Learning Method for Monaural Speech Separation. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2016-12-06 Xiao-Lei Zhang, DeLiang Wang
Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between …
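The core idea is that sub-models seeing different amounts of temporal context make complementary errors. The sketch below averages mask estimates from branches with different receptive fields; the paper instead stacks ensembles of DNNs, so this is only a structural analogy.

```python
import torch
import torch.nn as nn

class MultiContextEnsemble(nn.Module):
    """Average mask estimates from branches with different context widths
    (odd conv kernels here; sizes are illustrative)."""
    def __init__(self, n_bins=257, widths=(1, 5, 11)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Conv1d(n_bins, n_bins, w, padding=w // 2),
                           nn.Sigmoid())
             for w in widths])

    def forward(self, x):                  # x: (batch, n_bins, frames)
        return torch.stack([b(x) for b in self.branches]).mean(dim=0)
```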
-
The Hearing-Aid Audio Quality Index (HAAQI). IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2016-05-03 James M Kates, Kathryn H Arehart
This paper presents an index designed to predict music quality for individuals listening through hearing aids. The index is "intrusive", that is, it compares the degraded signal being evaluated to a reference signal. The index is based on a model of the auditory periphery that includes the effects of hearing loss. Outputs from the auditory model are used to measure changes in the signal time-frequency …
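The skeleton of any intrusive index is a similarity measurement between reference and degraded representations. The toy function below simply correlates envelope features; HAAQI itself first passes both signals through an auditory-periphery model that includes the listener's hearing loss, which is omitted here.

```python
import numpy as np

def intrusive_similarity(ref_env, deg_env):
    """Toy intrusive comparison: correlate time-frequency envelope features
    of reference and degraded signals. Not the HAAQI computation itself."""
    r = np.corrcoef(ref_env.ravel(), deg_env.ravel())[0, 1]
    return max(r, 0.0)                     # clamp: anti-correlation is noise-like
```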
-
Complex Ratio Masking for Monaural Speech Separation. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2016-04-14 Donald S Williamson, Yuxuan Wang, DeLiang Wang
Speech separation systems usually operate on the short-time Fourier transform (STFT) of noisy speech and enhance only the magnitude spectrum while leaving the phase spectrum unchanged. This practice rests on the long-held belief that the phase spectrum is unimportant for speech enhancement. Recent studies, however, suggest that phase is important for perceptual quality, leading some researchers to consider …
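The complex ideal ratio mask (cIRM) is the training target that makes joint magnitude-and-phase enhancement possible: the mask M satisfying S = M * Y for clean STFT S and noisy STFT Y. The sketch below computes its real and imaginary parts; in practice these are compressed to a bounded range before being used as DNN targets.

```python
import numpy as np

def complex_ideal_ratio_mask(S, Y, eps=1e-12):
    """cIRM = S / Y, expanded via S * conj(Y) / |Y|^2 so the real and
    imaginary parts can be estimated as two separate targets."""
    denom = Y.real ** 2 + Y.imag ** 2 + eps
    m_real = (Y.real * S.real + Y.imag * S.imag) / denom
    m_imag = (Y.real * S.imag - Y.imag * S.real) / denom
    return m_real, m_imag
```

Applying the mask is the complex multiply shown in the dereverberation-and-denoising sketch above, which corrects the noisy phase as well as the magnitude.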
-
Relationships between vocal function measures derived from an acoustic microphone and a subglottal neck-surface accelerometer. IEEE/ACM Trans. Audio Speech Lang. Process. (IF 3.398) Pub Date: 2016-04-12 Daryush D Mehta, Jarrad H Van Stan, Robert E Hillman
Monitoring subglottal neck-surface acceleration has received renewed attention due to the ability of low-profile accelerometers to confidentially and noninvasively track properties related to normal and disordered voice characteristics and behavior. This study investigated the ability of subglottal neck-surface acceleration to yield vocal function measures traditionally derived from the acoustic voice …