-
Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-14 Rui Wang, Li Li, Tomoki Toda
-
Active Discovering New Slots for Task-Oriented Conversation IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-13 Yuxia Wu, Tianhao Dai, Zhedong Zheng, Lizi Liao
-
Unsupervised Disentanglement Learning Model for Exemplar-Guided Paraphrase Generation IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-13 Linjian Li, Yi Cai, Xin Wu
-
Question-Directed Reasoning With Relation-Aware Graph Attention Network for Complex Question Answering Over Knowledge Graph IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-13 Geng Zhang, Jin Liu, Guangyou Zhou, Kunsong Zhao, Zhiwen Xie, Bo Huang
-
HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-11 Aidan O. T. Hogg, Mads Jenkins, He Liu, Isaac Squires, Samuel J. Cooper, Lorenzo Picinali
-
KGAgent: Learning a Deep Reinforced Agent for Keyphrase Generation IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-11 Yu Yao, Peng Yang, Guangzhen Zhao, Guoshun Yin
-
BaSFormer: A Balanced Sparsity Regularized Attention Network for Transformer IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-06 Shuoran Jiang, Qingcai Chen, Yang Xiang, Youcheng Pan, Xiangping Wu
-
Let Topic Flow: A Unified Topic-guided Segment-wise Dialogue Summarization Framework IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-06 Qinyu Han, Zhihao Yang, Hongfei Lin, Tian Qin
-
Reverberant Source Separation using NTF with Delayed Subsources and Spatial Priors IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-06 Mieszko Fraś, Konrad Kowalczyk
-
A User-centric Approach for Deep Residual-Echo Suppression in Double-talk IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-06 Amir Ivry, Israel Cohen, Baruch Berdugo
-
Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-04 Yingming Gao, Peter Birkholz, Ya Li
-
Envelope-Based Multichannel Noise Reduction for Cochlear Implant Applications IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-04 Luciana M. X. de Souza, Márcio H. Costa, Renata C. Borges
-
Decomposed Meta-Learning for Few-Shot Sequence Labeling IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-03-04 Tingting Ma, Qianhui Wu, Huiqiang Jiang, Jieru Lin, Börje F. Karlsson, Tiejun Zhao, Chin-Yew Lin
-
On Local Temporal Embedding for Semi-Supervised Sound Event Detection IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-28 Lijian Gao, Qirong Mao, Ming Dong
-
Hierarchical Multi-granularity Interaction Graph Convolutional Network for Long Document Classification IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-28 Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin
-
Efficient Joint Optimization of Sampling Rate Offsets Using Entire Multichannel Signal IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-23 Yoshiki Masuyama, Kouei Yamaoka, Takao Kawamura, Nobutaka Ono
-
Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-23 Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas
-
Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-23 Taihui Wang, Feiran Yang, Jun Yang
-
Time-domain Speech Super-resolution with GAN based Modeling for Telephony Speaker Verification IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-23 Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak
-
EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-23 Chenfeng Miao, Qingying Zhu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao
-
Acoustic Imaging with Circular Microphone Array: A New Approach for Sound Field Analysis IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-23 Marco Olivieri, Amy Bastine, Mirco Pezzoli, Fabio Antonacci, Thushara Abhayapala, Augusto Sarti
-
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-23 Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari
-
Speech Enhancement for Cochlear Implant Recipients using Deep Complex Convolution Transformer with Frequency Transformation IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-22 Nursadul Mamun, John H. L. Hansen
-
R2: A Novel Recall & Ranking Framework for Legal Judgment Prediction IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-19 Yuquan Le, Zhe Quan, Jiawei Wang, Da Cao, Kenli Li
-
Please donate to save a Life: Inducing Politeness to handle Resistance in Persuasive Dialogue Agents IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-19 Kshitij Mishra, Mauajama Firdaus, Asif Ekbal
-
NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-19 Adrián Barahona-Ríos, Tom Collins
-
Accented Text-to-Speech Synthesis with Limited Data IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-16 Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li
-
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-16 Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian
-
Cross-Domain Aspect-based Sentiment Classification with Tripartite Graph Modeling IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-14 Xiaotong Jiang, Ruirui Bai, Zhongqing Wang, Guodong Zhou
-
Constant Elevation-Beamwidth Beamforming with Concentric Ring Arrays IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2024-02-13 Orel Peretz, Israel Cohen
-
Distribution Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2022-02-01 Weijie Yu, Chen Xu, Jun Xu, Liang Pang, Ji-Rong Wen
Projecting the input text pair into a common semantic space where the matching function can be readily learned is an essential step for asymmetrical text matching. In practice, it is often observed that the feature vectors from asymmetrical texts tend to become gradually indistinguishable in the semantic space as the model is trained. However, this phenomenon is overlooked in existing studies
-
Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2022-01-25 Ying Zhou, Xuefeng Liang, Yu Gu, Yifei Yin, Longshan Yao
In recent years, speech emotion recognition technology has become significant in widespread applications such as call centers, social robots, and health care, and has therefore attracted much attention in both industry and academia. Since emotions existing in an entire utterance may have varied probabilities, speech emotion is likely to be ambiguous, which poses great challenges
-
Cross-Domain Slot Filling as Machine Reading Comprehension: A New Perspective IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2022-01-05 Jian Liu, Mengshi Yu, Yufeng Chen, Jinan Xu
With intelligent dialogue systems becoming more and more important in our daily lives, slot filling, one of the most important components of an intelligent dialogue system, has attracted considerable attention from academia and industry. Despite many advancements in the single-domain learning paradigm for slot filling, leveraging resources from different domains to boost learning for a target domain remains
-
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2022-01-26 Poul Hoang, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen
Enhancement of a desired speech signal in the presence of competing or interfering speech remains an unsolved problem, as it can be hard to determine which of the speech signals is the one of interest. In this paper, we propose a multichannel noise reduction algorithm which uses the presence of the user’s own voice signal, e.g. during conversations with the target speaker, as an asset to efficiently
-
SSAP: Storylines and Sentiment Aware Pre-Trained Model for Story Ending Generation IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2022-01-25 Yongkang Liu, Qingbao Huang, Jing Li, Linzhang Mo, Yi Cai, Qing Li
As an interesting but under-explored task, story ending generation aims at generating an appropriate ending for an incomplete story. The challenges of the task are to deeply understand the story context, mine the storylines hidden in the story, and generate rational endings in logic and sentiment. Although existing pre-trained approaches have been proven effective to this task, how to learn to generate
-
Retrieve-and-Edit Domain Adaptation for End2End Aspect Based Sentiment Analysis IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2022-01-25 Zhuang Chen, Tieyun Qian
End-to-end aspect based sentiment analysis (E2E-ABSA) aims to jointly extract aspect terms and predict aspect-level sentiment for opinion reviews. Though supervised methods show effectiveness for E2E-ABSA tasks, the annotation cost is extremely high due to the necessity of fine-grained labels. Recent attempts alleviate this problem using the domain adaptation technique to transfer the word-level common
-
Improving Unsupervised Extractive Summarization by Jointly Modeling Facet and Redundancy IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2021-01-01 Xinnian Liang, Jing Li, Shuangzhi Wu, Mu Li, Zhoujun Li
-
Audio object classification using distributed beliefs and attention. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2020-01-15 Ashwin Bellur, Mounya Elhilali
One of the unique characteristics of human hearing is its ability to recognize acoustic objects even in the presence of severe noise and distortions. In this work, we explore two mechanisms underlying this ability: 1) redundant mapping of acoustic waveforms along distributed latent representations and 2) adaptive feedback based on prior knowledge to selectively attend to targets of interest. We propose
-
Speaker-Independent Silent Speech Recognition from Flesh-Point Articulatory Movements Using an LSTM Neural Network. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-10-03 Myungjong Kim, Beiming Cao, Ted Mau, Jun Wang
Silent speech recognition (SSR) converts non-audio information such as articulatory movements into text. SSR has the potential to enable persons with laryngectomy to communicate through natural spoken expression. Current SSR systems have largely relied on speaker-dependent recognition models. The high degree of variability in articulatory patterns across different speakers has been a barrier for developing
-
Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-08-17 Donald S Williamson, DeLiang Wang
In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing
-
Silent Speech Recognition as an Alternative Communication Device for Persons with Laryngectomy. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-20 Geoffrey S Meltzner, James T Heaton, Yunbin Deng, Gianluca De Luca, Serge H Roy, Joshua C Kline
Each year thousands of individuals require surgical removal of their larynx (voice box) due to trauma or disease, and thereby require an alternative voice source or assistive device to verbally communicate. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of speech musculature can be recorded from
-
Deep Learning Based Binaural Speech Separation in Reverberant Environments. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2017-10-24 Xueliang Zhang, DeLiang Wang
Speech signals are usually degraded by room reverberation and additive noise in real environments. This paper focuses on separating the target speech signal in reverberant conditions from binaural inputs. Binaural separation is formulated as a supervised learning problem, and we employ deep learning to map from both spatial and spectral features to a training target. With binaural inputs, we first apply
-
Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2017-07-25 Ashwin Bellur, Mounya Elhilali
Parsing natural acoustic scenes using computational methodologies poses many challenges. Given the rich and complex nature of the acoustic environment, data mismatch between train and test conditions is a major hurdle in data-driven audio processing systems. In contrast, the brain exhibits a remarkable ability at segmenting acoustic scenes with relative ease. When tackling challenging listening conditions
-
The Impact of Data Dependence on Speaker Recognition Evaluation. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2017-07-01 Jin Chu Wu, Alvin F Martin, Craig S Greenberg, Raghu N Kacker
The data dependency due to multiple use of the same subjects has impact on the standard error (SE) of the detection cost function (DCF) in speaker recognition evaluation. The DCF is defined as a weighted sum of the probabilities of type I and type II errors at a given threshold. A two-layer data structure is constructed: target scores are grouped into target sets based on the dependency, and likewise
-
Robust Harmonic Features for Classification-Based Pitch Estimation. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2017-02-13 Dongmei Wang, Chengzhu Yu, John H L Hansen
Pitch estimation in diverse naturalistic audio streams remains a challenge for speech processing and spoken language technology. In this study, we investigate the use of robust harmonic features for classification-based pitch estimation. The proposed pitch estimation algorithm is composed of two stages: pitch candidate generation and target pitch selection. Based on energy intensity and spectral envelope
-
A Deep Ensemble Learning Method for Monaural Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2016-12-06 Xiao-Lei Zhang, DeLiang Wang
Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between
-
The Hearing-Aid Audio Quality Index (HAAQI). IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2016-05-03 James M Kates, Kathryn H Arehart
This paper presents an index designed to predict music quality for individuals listening through hearing aids. The index is "intrusive", that is, it compares the degraded signal being evaluated to a reference signal. The index is based on a model of the auditory periphery that includes the effects of hearing loss. Outputs from the auditory model are used to measure changes in the signal time-frequency
-
Complex Ratio Masking for Monaural Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2016-04-14 Donald S Williamson, Yuxuan Wang, DeLiang Wang
Speech separation systems usually operate on the short-time Fourier transform (STFT) of noisy speech, and enhance only the magnitude spectrum while leaving the phase spectrum unchanged. This is done because there was a belief that the phase spectrum is unimportant for speech enhancement. Recent studies, however, suggest that phase is important for perceptual quality, leading some researchers to consider
-
Relationships between vocal function measures derived from an acoustic microphone and a subglottal neck-surface accelerometer. IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2016-04-12 Daryush D Mehta, Jarrad H Van Stan, Robert E Hillman
Monitoring subglottal neck-surface acceleration has received renewed attention due to the ability of low-profile accelerometers to confidentially and noninvasively track properties related to normal and disordered voice characteristics and behavior. This study investigated the ability of subglottal neck-surface acceleration to yield vocal function measures traditionally derived from the acoustic voice
-
Sound Event Recognition Using Auditory-Receptive-Field Binary Pattern and Hierarchical-Diving Deep Belief Network IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-08-01 Chien-Yao Wang, Jia-Ching Wang, Andri Santoso, Chin-Chin Chiang, Chung-Hsien Wu
Automatic sound event recognition (SER) has recently attracted renewed interest. Although practical SER systems have many useful applications in everyday life, SER is challenging owing to the variations among sounds and noises in the real-world environment. This paper presents a novel feature extraction and classification method to solve the problem of SER. An audio–visual descriptor, called the aud
-
Joint POS Tagging and Dependence Parsing With Transition-Based Neural Networks IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-08-01 Liner Yang, Meishan Zhang, Yang Liu, Maosong Sun, Nan Yu, Guohong Fu
While part-of-speech (POS) tagging and dependency parsing are observed to be closely related, existing work on joint modeling with manually crafted feature templates suffers from the feature sparsity and incompleteness problems. In this paper, we propose an approach to joint POS tagging and dependency parsing using transition-based neural networks. Three neural network based classifiers are designed
-
3D Room Geometry Inference Based on Room Impulse Response Stacks IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-05-01 Youssef El Baba, Andreas Walther, Emanuel A. P. Habets
Room geometry inference is concerned with the localization of reflective boundaries in an enclosed space. This paper outlines a method for inferring room geometry based on the positions of loudspeakers and real or image microphones, which are computed using sets of times of arrival (TOAs) obtained from room impulse responses (RIRs). These RIRs describe the acoustic propagation between the loudspeakers
-
Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-01 Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark A. Hasegawa-Johnson
It is challenging to obtain large amounts of native (matched) labels for speech audio in underresourced languages. This challenge is often due to a lack of literate speakers of the language, or in extreme cases, a lack of universally acknowledged orthography as well. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak
-
Spread Spectrum Audio Watermarking Using Multiple Orthogonal PN Sequences and Variable Embedding Strengths and Polarities IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-01 Yong Xiang, Iynkaran Natgunanathan, Dezhong Peng, Guang Hua, Bo Liu
Copyright protection of audio data is a serious problem and spread spectrum (SS) based audio watermarking is a promising technology to tackle this problem. Although a number of SS-based audio watermarking methods have been reported in the literature, they cannot achieve high robustness and embedding capacity at the same time. In this paper, we propose a novel SS-based audio watermarking method that
-
Binaural Speaker Localization Integrated Into an Adaptive Beamformer for Hearing Aids IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-01 Mehdi Zohourian, Gerald Enzner, Rainer Martin
In this paper, we present and compare novel algorithms to localize simultaneous speakers using four microphones distributed on a pair of binaural hearing aids. The framework consists of two groups of localization algorithms, namely, beamforming-based and statistical model based localization algorithms. We first generalize our previously proposed methods based on beamforming techniques to the binaural
-
Optimization of RNN-Based Speech Activity Detection IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-01 Gregory Gelly, Jean-Luc Gauvain
Speech activity detection (SAD) is an essential component of automatic speech recognition systems impacting the overall system performance. This paper investigates an optimization process for recurrent neural network (RNN) based SAD. This process optimizes all system parameters including those used for feature extraction, the NN weights, and the back-end parameters. Three cost functions are considered
-
Boundary Matching Filters for Spherical Microphone and Loudspeaker Arrays IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-01 Cesar D. Salvador, Shuichi Sakamoto, Jorge Trevino, Yoiti Suzuki
Conversion of microphone array signals into loudspeaker array signals is an essential process in high-definition spatial audio. This paper presents the theory of boundary matching filters (BMFs) for spherical array signal conversion. BMFs adapt the physical boundary conditions used during recording to the ones required for reproduction by relying on a theoretical framework provided by the Kirchhoff–Helmholtz
-
Context-Aware Answer Sentence Selection With Hierarchical Gated Recurrent Neural Networks IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-01 Chuanqi Tan, Furu Wei, Qingyu Zhou, Nan Yang, Bowen Du, Weifeng Lv, Ming Zhou
In this paper, we study the task of reading comprehension style answer sentence selection that aims to select the best sentence from a given passage to answer a question. Unlike most previous works that match the question and each candidate sentence separately, we observe that the context information among sentences in the same passage plays a vital role in this task. We propose modeling context information
-
Comparing Fusion Models for DNN-Based Audiovisual Continuous Speech Recognition IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-01 Ahmed Hussen Abdelaziz
Audiovisual fusion is one of the most challenging tasks that continues to attract substantial research interest in the field of audiovisual automatic speech recognition (AV-ASR). In the last few decades, many approaches for integrating the audio and video modalities were proposed to enhance the performance of automatic speech recognition in both clean and noisy conditions. However, very few studies
-
Suppression by Selecting Wavelets for Feature Compression in Distributed Speech Recognition IEEE ACM Trans. Audio Speech Lang. Process. (IF 5.4) Pub Date : 2018-03-01 Syu-Siang Wang, Payton Lin, Yu Tsao, Jeih-Weih Hung, Borching Su
Distributed speech recognition (DSR) splits the processing of data between a mobile device and a network server. In the front-end, features are extracted and compressed to transmit over a wireless channel to a back-end server, where the incoming stream is received and reconstructed for recognition tasks. In this paper, we propose a feature compression algorithm termed suppression by selecting wavelets