Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-29 Javier Tejedor, Doroteo T. Toledano
The vast amount of information stored in audio repositories makes it necessary to develop efficient and automatic methods to search audio content. In that direction, search on speech (SoS) has received much attention in recent decades. To motivate the development of automatic systems, the ALBAYZIN evaluations have included a search on speech challenge since 2012. This challenge releases several databases
-
Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-26 Serhat Hizlisoy, Recep Sinan Arslan, Emel Çolakoğlu
Analyzing songs is a problem being investigated to aid various operations on music access platforms. Chief among these problems is identifying the person singing a song. In this study, a singer identification application covering Turkish singers and working for the Turkish language is proposed to address this problem. Mel-spectrogram and octave-based
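As a rough illustration of the mel-spectrogram front end this entry mentions, the following minimal sketch extracts a log-mel feature vector from an audio clip; the FFT, hop, and mel-band settings are illustrative assumptions, not the paper's configuration.

```python
# Minimal log-mel feature sketch (illustrative settings, not the paper's).
import numpy as np
import librosa

sr = 22050
# Stand-in for a vocal clip: a 2-second 440 Hz tone.
y = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr * 2) / sr)

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)  # log compression (dB)
feature_vector = log_mel.mean(axis=1)           # simple per-clip pooling
print(feature_vector.shape)                     # (128,)
```

A machine learning classifier of the kind the entry describes would then be trained on such per-clip vectors.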
-
Sound field reconstruction using neural processes with dynamic kernels EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-20 Zining Liang, Wen Zhang, Thushara D. Abhayapala
Accurately representing the sound field with high spatial resolution is crucial for immersive and interactive sound field reproduction technology. In recent studies, there has been a notable emphasis on efficiently estimating sound fields from a limited number of discrete observations. In particular, kernel-based methods using Gaussian processes (GPs) with a covariance function to model spatial correlations
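As a sketch of the kernel-based Gaussian process idea this entry refers to, the following interpolates a scalar pressure field from a few microphone observations; the squared-exponential kernel and its hyperparameters are illustrative assumptions, not the paper's dynamic kernels.

```python
# GP posterior-mean interpolation of a sound field (illustrative kernel).
import numpy as np

def rbf_kernel(X1, X2, length_scale=0.3):
    """Squared-exponential covariance between two sets of 2-D positions."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_interpolate(X_obs, p_obs, X_query, noise_var=1e-4):
    """GP posterior mean of the field at the query positions."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    k_star = rbf_kernel(X_query, X_obs)
    return k_star @ np.linalg.solve(K, p_obs)

rng = np.random.default_rng(0)
X_obs = rng.uniform(-1, 1, size=(8, 2))                      # 8 mic positions
p_obs = np.cos(2 * np.pi * (X_obs @ np.array([1.0, 0.5])))   # toy field values
X_query = np.array([[0.0, 0.0], [0.25, 0.25]])
print(gp_interpolate(X_obs, p_obs, X_query))
```

The neural processes in the paper replace this fixed covariance function with learned, dynamic kernels.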
-
Automatic classification of the physical surface in sound uroflowmetry using machine learning methods EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-16 Marcos Lazaro Alvarez, Laura Arjona, Miguel E. Iglesias Martínez, Alfonso Bahillo
This work constitutes the first approach for automatically classifying the surface that the voiding flow impacts in non-invasive sound uroflowmetry tests using machine learning. Often, the voiding flow impacts the toilet walls (traditionally made of ceramic) instead of the water in the toilet. This may cause a reduction in the strength of the recorded audio signal, leading to a decrease in the amplitude
-
Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-12 Huda Barakat, Oytun Turk, Cenk Demiroglu
Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech (TTS) models possess the capability to generate speech of exceptionally high quality, closely mimicking human speech. Nevertheless, given the wide array of applications now employing TTS models, mere high-quality speech generation is no longer sufficient
-
Vulnerability issues in Automatic Speaker Verification (ASV) systems EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-10 Priyanka Gupta, Hemant A. Patil, Rodrigo Capobianco Guido
Claimed identities of speakers can be verified by means of automatic speaker verification (ASV) systems, also known as voice biometric systems. Focusing on security and robustness against spoofing attacks on ASV systems, and observing that investigating the attacker’s perspective can lead the way to preventing known and unknown threats to ASV systems, several countermeasures (CMs) have
-
Blind extraction of guitar effects through blind system inversion and neural guitar effect modeling EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-07 Reemt Hinrichs, Kevin Gerkens, Alexander Lange, Jörn Ostermann
Audio effects are a ubiquitous tool in music production due to the interesting ways in which they can shape the sound of music. Guitar effects, the subset of all audio effects focusing on guitar signals, are commonly used in popular music to shape the guitar sound to fit specific genres or to create more variety within musical compositions. Automatic extraction of guitar effects and their parameter
-
Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-03 Sivaramakrishna Yecchuri, Sunny Dayal Vanambathina
Recent advancements in deep learning-based speech enhancement models have extensively used attention mechanisms, demonstrating their effectiveness in achieving state-of-the-art results. This paper proposes a transformer attention network based sub-convolutional U-Net (TANSCUNet) for speech enhancement. Instead of adopting conventional RNNs and temporal convolutional networks for sequence modeling,
-
Acoustical feature analysis and optimization for aesthetic recognition of Chinese traditional music EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-02-02 Lingyun Xie, Yuehong Wang, Yan Gao
Chinese traditional music, a vital expression of Chinese cultural heritage, possesses both a profound emotional resonance and artistic allure. This study sets forth to refine and analyze the acoustical features essential for the aesthetic recognition of Chinese traditional music, utilizing a dataset spanning five aesthetic genres. Through recursive feature elimination, we distilled an initial set of
-
Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-01-20 Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael Kimwele, Adane Mamuye
Speech coding is a method to reduce the amount of data needed to represent speech signals by exploiting their statistical properties. Recently, neural network prediction models have gained attention in the speech coding process for reconstructing nonlinear and nonstationary speech signals. This study proposes a novel approach to improve speech coding performance by
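To make the predict-quantize-reconstruct loop behind ADPCM concrete, here is a toy DPCM coder; the fixed first-order predictor and uniform quantizer are illustrative stand-ins for the GRU predictor and adaptive quantization the paper studies.

```python
# Toy DPCM loop: quantize the prediction residual, not the raw sample.
import numpy as np

def dpcm_encode_decode(x, a=0.9, step=0.05):
    x_hat = np.zeros_like(x)              # decoder-side reconstruction
    codes = np.zeros_like(x)
    pred = 0.0
    for n in range(len(x)):
        residual = x[n] - pred            # prediction error
        q = np.round(residual / step)     # quantize the residual
        codes[n] = q
        x_hat[n] = pred + q * step        # decoder adds back the prediction
        pred = a * x_hat[n]               # first-order prediction of next sample
    return codes, x_hat

t = np.linspace(0, 1, 8000)
x = 0.6 * np.sin(2 * np.pi * 200 * t)
codes, x_hat = dpcm_encode_decode(x)
print("SNR (dB):", 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2)))
```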
-
Correction: Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-01-15 Stijn Kindt, Jenthe Thienpondt, Luca Becker, Nilesh Madhu
Correction: EURASIP Journal on Audio, Speech, and Music Processing 2023, 46 (2023) https://doi.org/10.1186/s13636-023-00310-w Following publication of the original article [1], we have been notified that in Figure 14, each cluster subfigure contained an additional bottom row; these rows have been removed. The original article has been corrected.
-
Generating chord progression from melody with flexible harmonic rhythm and controllable harmonic density EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-01-15 Shangda Wu, Yue Yang, Zhaowen Wang, Xiaobing Li, Maosong Sun
Melody harmonization, which involves generating a chord progression that complements a user-provided melody, continues to pose a significant challenge. A chord progression must not only be in harmony with the melody but also interdependent with its rhythmic pattern. While previous neural network-based systems have been successful in producing chord progressions for given melodies, they have not adequately
-
Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-01-11 Junya Koguchi, Masanori Morise
Musical instrument sound synthesis (MISS) often utilizes a text-to-speech framework because of its similarity to speech in terms of generating sounds from symbols. Moreover, a plucked string instrument, such as electric bass guitar (EBG), shares acoustical similarities with speech. We propose an attack-sustain (AS) representation of the playing technique to take advantage of this similarity. The AS
-
Significance of relative phase features for shouted and normal speech classification EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-01-06 Khomdet Phapatanaburi, Longbiao Wang, Meng Liu, Seiichi Nakagawa, Talit Jumphoo, Peerapong Uthansakul
Shouted and normal speech classification plays an important role in many speech-related applications. The existing works are often based on magnitude-based features and ignore phase-based features, which are directly related to magnitude information. In this paper, the importance of phase-based features is explored for the detection of shouted speech. The novel contributions of this work are as follows
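For context, relative phase features are commonly computed by fixing the phase of a base frequency bin and shifting the other bins proportionally; the following sketch assumes that standard formulation and illustrative frame settings rather than the paper's exact setup.

```python
# Relative phase: normalize each frame's phase to a base frequency bin.
import numpy as np

def relative_phase(frames, base_bin=1):
    """frames: (num_frames, frame_len) array of windowed time-domain frames."""
    spec = np.fft.rfft(frames, axis=1)
    theta = np.angle(spec)                            # raw STFT phase
    k = np.arange(spec.shape[1])
    # Shift bin k by (k / base_bin) times the base bin's phase
    psi = theta - (k[None, :] / base_bin) * theta[:, base_bin][:, None]
    return np.angle(np.exp(1j * psi))                 # wrap to (-pi, pi]

rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 512)) * np.hanning(512)
print(relative_phase(frames).shape)                   # (10, 257)
```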
-
Deep semantic learning for acoustic scene classification EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2024-01-03 Yun-Fei Shao, Xin-Xin Ma, Yong Ma, Wei-Qiang Zhang
Acoustic scene classification (ASC) is the process of identifying the acoustic environment or scene from which an audio signal is recorded. In this work, we propose an encoder-decoder-based approach to ASC, which is borrowed from the SegNet in image semantic segmentation tasks. We also propose a novel feature normalization method named Mixup Normalization, which combines channel-wise instance normalization
-
Online distributed waveform-synchronization for acoustic sensor networks with dynamic topology EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-12-18 Aleksej Chinaev, Niklas Knaepper, Gerald Enzner
Acoustic sensing by multiple devices connected in a wireless acoustic sensor network (WASN) creates new opportunities for multichannel signal processing. However, the autonomy of agents in such a network still necessitates the alignment of sensor signals to a common sampling rate. It has been demonstrated that the sampling rate offset (SRO) between any node pair can be retrieved by waveform-based estimation
-
Signal processing and machine learning for speech and audio in acoustic sensor networks EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-12-17 Walter Kellermann, Rainer Martin, Nobutaka Ono
Nowadays, we are surrounded by a plethora of recording devices, including mobile phones, laptops, tablets, smartwatches, and camcorders, among others. However, conventional multichannel signal processing methods usually cannot be applied to jointly process the signals recorded by multiple distributed devices because synchronous recording is essential. Thus, commercially available microphone array
-
Lightweight target speaker separation network based on joint training EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-12-06 Jing Wang, Hanyue Liu, Liang Xu, Wenjing Yang, Weiming Yi, Fang Liu
Target speaker separation aims to separate the speech components of the target speaker from mixed speech and remove extraneous components such as noise. In recent years, deep learning-based speech separation methods have made significant breakthroughs and have gradually become mainstream. However, these existing methods generally face problems with system latency and performance upper limits due to
-
Piano score rearrangement into multiple difficulty levels via notation-to-notation approach EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-12-05 Masahiro Suzuki
Musical score rearrangement is an emerging area in symbolic music processing, which aims to transform a musical score into a different style. This study focuses on the task of changing the playing difficulty of piano scores, addressing two challenges in musical score rearrangement. First, we address the challenge of handling musical notation on scores. While symbolic music research often relies on
-
Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-12-05 Pierre-Amaury Grumiaux, Mathieu Lagrange
The task of bandwidth extension addresses the generation of missing high frequencies of audio signals based on knowledge of the low-frequency part of the sound. This task applies to various problems, such as audio coding or audio restoration. In this article, we focus on efficient bandwidth extension of monophonic and polyphonic musical signals using a differentiable digital signal processing (DDSP)
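As a sketch of the harmonic-plus-noise decomposition underlying such DDSP models, the following synthesizes a tone as a sum of harmonics plus a stochastic component; in a DDSP system these fixed amplitudes would instead be predicted by a neural network, and all values here are illustrative.

```python
# Harmonic-plus-noise synthesis: harmonics of f0 plus additive noise.
import numpy as np

def harmonic_plus_noise(f0=220.0, sr=16000, dur=0.5,
                        harm_amps=(0.5, 0.3, 0.2), noise_gain=0.02):
    t = np.arange(int(sr * dur)) / sr
    harm = sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t)
               for k, a in enumerate(harm_amps))
    noise = noise_gain * np.random.default_rng(0).standard_normal(len(t))
    return harm + noise

y = harmonic_plus_noise()
print(y.shape, float(np.max(np.abs(y))))
```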
-
Effective acoustic parameters for automatic classification of performed and synthesized Guzheng music EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-12-01 Huiwen Xue, Chenxin Sun, Mingcheng Tang, Chenrui Hu, Zhengqing Yuan, Min Huang, Zhongzhe Xiao
This study focuses on exploring the acoustic differences between synthesized Guzheng pieces and real Guzheng performances, with the aim of improving the quality of synthesized Guzheng music. A dataset designed for generalizability, with multiple sources and genres, is constructed as the basis of the analysis. Classification accuracy of up to 93.30% with a single feature puts forward the fact that although
-
Predominant audio source separation in polyphonic music EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-11-24 Lekshmi Chandrika Reghunath, Rajeev Rajan
Predominant source separation is the separation of one or more desired predominant signals, such as voice or leading instruments, from polyphonic music. The proposed work uses time-frequency filtering on predominant source separation and conditional adversarial networks to improve the perceived quality of isolated sounds. The pitch tracks corresponding to the prominent sound sources of the polyphonic
-
A survey of technologies for automatic Dysarthric speech recognition EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-11-11 Zhaopeng Qian, Kejing Xiao, Chongchong Yu
Speakers with dysarthria often struggle to accurately pronounce words and effectively communicate with others. Automatic speech recognition (ASR) is a powerful tool for extracting the content from speakers with dysarthria. However, the narrow concept of ASR typically only covers technologies that process acoustic modality signals. In this paper, we broaden the scope of this concept so that the generalized
-
Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-11-04 Kavya Manohar, Jayan A R, Rajeev Rajan
This article presents the research work on improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling. The speech recognition system is built using a deep neural network–hidden Markov model (DNN-HMM)-based automatic speech recognition (ASR). We propose a novel method, syllable-byte pair encoding (S-BPE), that combines linguistically
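To illustrate the subword idea, here is a minimal plain-BPE merge learner on a toy corpus; note that the paper's S-BPE additionally constrains merges with syllable information, which this sketch does not model.

```python
# Plain byte-pair encoding: repeatedly merge the most frequent symbol pair.
from collections import Counter

def learn_bpe(words, num_merges=10):
    vocab = Counter({tuple(w): c for w, c in Counter(words).items()})
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, count in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)          # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for word, count in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1]); i += 2
                else:
                    merged.append(word[i]); i += 1
            new_vocab[tuple(merged)] += count
        vocab = new_vocab
    return merges

print(learn_bpe(["low", "lower", "newest", "widest"] * 5, num_merges=5))
```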
-
Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-31 Stijn Kindt, Jenthe Thienpondt, Luca Becker, Nilesh Madhu
Speaker embeddings, from the ECAPA-TDNN speaker verification network, were recently introduced as features for the task of clustering microphones in ad hoc arrays. Our previous work demonstrated that, in comparison to signal-based Mod-MFCC features, using speaker embeddings yielded a more robust and logical clustering of the microphones around the sources of interest. This work aims to further establish
-
W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-28 Hao Huang, Lin Wang, Jichen Yang, Ying Hu, Liang He
Non-parallel data voice conversion (VC) has achieved considerable breakthroughs due to self-supervised pre-trained representation (SSPR) being used in recent years. Features extracted by the pre-trained model are expected to contain more content information. However, in common VC with SSPR, there is no special mechanism to remove speaker information during content representation extraction by
-
YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-19 Le Ma, Xinda Wu, Ruiyuan Tang, Chongjun Zhong, Kejun Zhang
Appropriate background music in e-commerce advertisements can help stimulate consumption and build product image. However, many factors such as emotion and product category should be taken into account, which makes manually selecting music time-consuming and demanding of professional knowledge, so it becomes crucial to automatically recommend music for video. Since there is no e-commerce advertisement dataset
-
Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-13 Jingtan Li, Mengkai Sun, Zhonghao Zhao, Xingcan Li, Gaigai Li, Chen Wu, Kun Qian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller
Snoring affects 57% of men, 40% of women, and 27% of children in the USA. Moreover, snoring is highly correlated with obstructive sleep apnoea (OSA), which is characterised by loud and frequent snoring. OSA is also closely associated with various life-threatening diseases such as sudden cardiac arrest and is regarded as a grave medical ailment. Preliminary studies have shown that in the USA, OSA
-
Transformer-based autoencoder with ID constraint for unsupervised anomalous sound detection EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-13 Jian Guan, Youde Liu, Qiuqiang Kong, Feiyang Xiao, Qiaoxi Zhu, Jiantong Tian, Wenwu Wang
Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two mainstream methods. However, the AE-based methods could be limited, as the features learned from normal sounds can also fit anomalous sounds, reducing the model’s ability to detect
-
Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-12 Chunxi Wang, Maoshen Jia, Xinfeng Zhang
In recent years, the speaker-independent, single-channel speech separation problem has made significant progress with the development of deep neural networks (DNNs). However, separating the speech of each interested speaker from an environment that includes the speech of other speakers, background noise, and room reverberation remains challenging. In order to solve this problem, a speech separation
-
Speech emotion recognition based on Graph-LSTM neural network EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-11 Yan Li, Yapeng Wang, Xu Yang, Sio-Kei Im
Currently, Graph Neural Networks have been extended to the field of speech signal processing, as graphs offer a more compact and flexible way to represent speech sequences. However, the relationship structures used in recent studies tend to be relatively uncomplicated. Moreover, the graph convolution module exhibits limitations that impede its adaptability to intricate application scenarios
-
An acoustic echo canceller optimized for hands-free speech telecommunication in large vehicle cabins EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-07 Amin Saremi, Balaji Ramkumar, Ghazaleh Ghaffari, Zonghua Gu
Acoustic echo cancelation (AEC) is a system identification problem that has been addressed by various techniques and most commonly by normalized least mean square (NLMS) adaptive algorithms. However, performing a successful AEC in large commercial vehicles has proved complicated due to the size and challenging variations in the acoustic characteristics of their cabins. Here, we present a wideband fully
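Since the entry names NLMS explicitly, here is a minimal textbook NLMS echo canceller; the filter length, step size, and toy echo path are illustrative assumptions, not the paper's wideband design.

```python
# NLMS AEC: adaptively estimate the echo path driven by the far-end signal x,
# and subtract the estimated echo from the microphone signal d.
import numpy as np

def nlms_aec(x, d, L=128, mu=0.5, eps=1e-8):
    w = np.zeros(L)                     # adaptive echo-path estimate
    e = np.zeros(len(d))                # echo-cancelled output
    for n in range(L, len(d)):
        x_buf = x[n - L:n][::-1]        # most recent L far-end samples
        y = w @ x_buf                   # estimated echo
        e[n] = d[n] - y                 # residual after cancellation
        w += mu * e[n] * x_buf / (x_buf @ x_buf + eps)  # normalized update
    return e

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                               # far-end stand-in
h = rng.standard_normal(64) * np.exp(-np.arange(64) / 10)    # toy echo path
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(16000)
e = nlms_aec(x, d)
print("residual echo power:", np.mean(e[8000:] ** 2))
```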
-
Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-10-04 Elisa Tengan, Thomas Dietzen, Filip Elvander, Toon van Waterschoot
In this paper, two approaches are proposed for estimating the direction of arrival (DOA) and power spectral density (PSD) of stationary point sources by using a single, rotating, directional microphone. These approaches are based on a method previously presented by the authors, in which point source DOAs were estimated by using a broadband signal model and solving a group-sparse optimization problem
-
Cascade algorithms for combined acoustic feedback cancelation and noise reduction EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-09-21 Santiago Ruiz, Toon van Waterschoot, Marc Moonen
This paper presents three cascade algorithms for combined acoustic feedback cancelation (AFC) and noise reduction (NR) in speech applications. A prediction error method (PEM)-based adaptive feedback cancelation (PEM-based AFC) algorithm is used for the AFC stage, while a multichannel Wiener filter (MWF) is applied for the NR stage. A scenario with M microphones and 1 loudspeaker is considered, without
-
Learning-based robust speaker counting and separation with the aid of spatial coherence EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-09-20 Yicheng Hsu, Mingsian R. Bai
A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction, a spatial coherence matrix (SCM) is computed using whitened relative transfer functions (wRTFs) across time frames. The global activity functions of each speaker are estimated from a simplex constructed using the eigenvectors of the SCM, while the local
-
Acoustic object canceller: removing a known signal from monaural recording using blind synchronization EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-09-11 Takao Kawamura, Kouei Yamaoka, Yukoh Wakabayashi, Nobutaka Ono, Ryoichi Miyazaki
In this paper, we propose a technique for removing a specific type of interference from a monaural recording. Nonstationary interferences are generally challenging to eliminate from such recordings. However, if the interference is a known sound like a cell phone ringtone, music from a CD or streaming service, or a radio or TV broadcast, its source signal can be easily obtained. In our method, we define
-
The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-09-07 Lekai Zhang, Yingfan Wang, Kailun He, Hailong Zhang, Baixi Xing, Xiaofeng Liu, Fo Hu
Traffic congestion can lead to negative driving emotions, significantly increasing the likelihood of traffic accidents. Reducing negative driving emotions as a means to mitigate speeding, reckless overtaking, and aggressive driving behaviors is a viable approach. Among the potential methods, affective speech has been considered one of the most promising. However, research on humor-based affective speech
-
Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-09-05 Zhiyong Chen, Shugong Xu
Speaker recognition, the process of automatically identifying a speaker based on individual characteristics in speech signals, presents significant challenges when addressing heterogeneous-domain conditions. Federated learning, a recent development in machine learning methods, has gained traction in privacy-sensitive tasks, such as personal voice assistants in home environments. However, its application
-
Dual input neural networks for positional sound source localization EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-08-30 Eric Grinstein, Vincent W. Neo, Patrick A. Naylor
In many signal processing applications, metadata may be advantageously used in conjunction with a high dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such
-
Training audio transformers for cover song identification EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-08-25 Te Zeng, Francis C. M. Lau
In the past decades, convolutional neural networks (CNNs) have been commonly adopted in audio perception tasks that aim to learn latent representations. However, for audio analysis, CNNs may exhibit limitations in effectively modeling temporal contextual information. Analogous to the successes of the transformer architecture in the fields of computer vision and audio classification, to capture long-range
-
Channel and temporal-frequency attention UNet for monaural speech enhancement EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-08-14 Shiyun Xu, Zehua Zhang, Mingjiang Wang
The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancement tasks aimed at improving speech quality. In this study, we propose a monaural speech enhancement model called the channel and temporal-frequency attention UNet (CTFUNet). CTFUNet takes the noisy
-
Microphone utility estimation in acoustic sensor networks using single-channel signal features EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-08-03 Michael Günther, Andreas Brendel, Walter Kellermann
In multichannel signal processing with distributed sensors, choosing the optimal subset of observed sensor signals to be exploited is crucial in order to maximize algorithmic performance and reduce computational load, ideally both at the same time. In the acoustic domain, signal cross-correlation is a natural choice to quantify the usefulness of microphone signals, i.e., microphone utility, for coherent
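For contrast with the single-channel features the paper proposes, the pairwise baseline it alludes to can be sketched as ranking microphones by maximum normalized cross-correlation against a reference channel; the signals and delays below are illustrative.

```python
# Utility proxy: peak normalized cross-correlation with a reference channel.
import numpy as np

def max_normalized_xcorr(a, b):
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    xc = np.correlate(a, b, mode="full") / len(a)
    return np.abs(xc).max()

rng = np.random.default_rng(0)
src = rng.standard_normal(4000)
ref = src + 0.05 * rng.standard_normal(4000)
mics = [np.roll(src, 5) + 0.1 * rng.standard_normal(4000),   # close microphone
        np.roll(src, 40) + 1.0 * rng.standard_normal(4000)]  # distant, noisy one
for i, m in enumerate(mics):
    print(f"mic {i} utility ~ {max_normalized_xcorr(ref, m):.3f}")
```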
-
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-07-01 Xingwei Liang, Zehua Zhang, Ruifeng Xu
Personalized voice triggering is a key technology in voice assistants and serves as the first step for users to activate the voice assistant. Personalized voice triggering involves keyword spotting (KWS) and speaker verification (SV). Conventional approaches to this task include developing KWS and SV systems separately. This paper proposes a single system called the multi-task deep cross-attention
-
Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-06-30 Yuting Zhou, Hongjie Wan
The goal of sound event detection and localization (SELD) is to identify each individual sound event class and its activity time from a piece of audio, while estimating its spatial location at the time of activity. The Conformer combines the advantages of convolutional layers and the Transformer and is effective in tasks such as speech recognition. However, it achieves high performance by relying on complex
-
Automatic detection of attachment style in married couples through conversation analysis EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-31 Tuğçe Melike Koçak, Büşra Çilem Dibek, Esma Nafiye Polat, Nilüfer Kafesçioğlu, Cenk Demiroğlu
Analysis of couple interactions using speech processing techniques is an increasingly active multi-disciplinary field that poses challenges such as automatic relationship quality assessment and behavioral coding. Here, we focused on the prediction of individuals’ attachment style using interactions of recently married (1–15 months) couples. For low-level acoustic feature extraction, in addition to
-
Parallel processing of distributed beamforming and multichannel linear prediction for speech denoising and deverberation in wireless acoustic sensor networks EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-22 Zhe Han, Yuxuan Ke, Xiaodong Li, Chengshi Zheng
In recent years, more and more smart home devices with microphones have come into our lives; it is highly desirable to connect these microphones as wireless acoustic sensor networks (WASNs) so that these devices can be better controlled in an enclosure. For indoor applications, both environmental noise and room reverberation may severely degrade speech quality, and thus both need to be removed to improve
-
Variational Autoencoders for chord sequence generation conditioned on Western harmonic music complexity EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-15 Luca Comanducci, Davide Gioiosa, Massimiliano Zanoni, Fabio Antonacci, Augusto Sarti
In recent years, the adoption of deep learning techniques has enabled major breakthroughs in the automatic music generation research field, sparking a renewed interest in generative music. A great deal of work has focused on the possibility of conditioning the generation process in order to create music according to human-understandable parameters. In this paper, we propose a technique
-
Paralinguistic and spectral feature extraction for speech emotion classification using machine learning techniques EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-15 Tong Liu, Xiaochen Yuan
Emotion plays a dominant role in speech. The same utterance with different emotions can lead to a completely different meaning. The ability to express a variety of emotions while speaking is also a typical human characteristic. Accordingly, technology is trending toward advanced speech emotion classification algorithms to meet the demand of enhancing the interaction between computers and human beings
-
Speech emotion recognition based on emotion perception EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-12 Gang Liu, Shifang Cai, Ce Wang
Speech emotion recognition (SER) is a hot topic in speech signal processing. With the development of cheap computing power and the proliferation of research in data-driven methods, deep learning approaches are prominent solutions to SER nowadays. SER is a challenging task due to the scarcity of datasets and the lack of emotion perception. Most existing networks for SER are based on computer
-
Time-domain adaptive attention network for single-channel speech separation EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-11 Kunpeng Wang, Hao Zhou, Jingxiang Cai, Wenna Li, Juan Yao
Recent years have witnessed great progress in single-channel speech separation by applying self-attention-based networks. Despite the excellent performance in mining relevant long-sequence contextual information, self-attention networks cannot perfectly focus on subtle details in speech signals, such as temporal or spectral continuity, spectral structure, and timbre. To tackle this problem, we propose
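As background for the attention mechanism discussed here, a minimal numpy sketch of scaled dot-product self-attention over a frame sequence follows; the shapes and random projections are illustrative, not the paper's network.

```python
# Scaled dot-product self-attention over T frames of d-dimensional features.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # (T, T) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over key frames
    return weights @ V                                # context-mixed features

rng = np.random.default_rng(0)
T, d = 50, 16
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (50, 16)
```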
-
Explicit-memory multiresolution adaptive framework for speech and music separation EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-09 Ashwin Bellur, Karan Thakkar, Mounya Elhilali
The human auditory system employs a number of principles to facilitate the selection of perceptually separated streams from a complex sound mixture. The brain leverages multi-scale redundant representations of the input and uses memory (or priors) to guide the selection of a target sound from the input mixture. Moreover, feedback mechanisms refine the memory constructs resulting in further improvement
-
MUSIB: musical score inpainting benchmark EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-05 Mauricio Araneda-Hernandez, Felipe Bravo-Marquez, Denis Parra, Rodrigo F. Cádiz
Music inpainting is a sub-task of automated music generation that aims to infill incomplete musical pieces to help musicians in their musical composition process. Many methods have been developed for this task. However, we observe a tendency for each method to be evaluated using different datasets and metrics in the papers where they are presented. This lack of standardization hinders an adequate comparison
-
A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-05-01 Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter. Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs). By deriving new metrics analyzing the dereverberation performance in various time
-
MYRiAD: a multi-array room acoustic database EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-04-26 Thomas Dietzen, Randall Ali, Maja Taseska, Toon van Waterschoot
In the development of acoustic signal processing algorithms, their evaluation in various acoustic environments is of utmost importance. In order to advance evaluation in realistic and reproducible scenarios, several high-quality acoustic databases have been developed over the years. In this paper, we present another complementary database of acoustic recordings, referred to as the Multi-arraY Room
-
Voice activity detection in the presence of transient based on graph EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-04-20 Xiao-Yuan Guo, Chun-Xian Gao, Hui Liu
Although voice activity detection has achieved satisfactory performance in quasi-stationary noisy environments, it remains a significant challenge in the presence of transients, since transients are more dominant than speech. This paper studies the differences between speech and transients in terms of nonlinear dynamic characteristics and proposes a new method for accurately detecting speech and transients. Limited
-
Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-04-07 Pu Wang, Hugo Van hamme
With the rise of deep learning, spoken language understanding (SLU) for command-and-control applications such as a voice-controlled virtual assistant can offer reliable hands-free operation to physically disabled individuals. However, due to data scarcity, it is still a challenge to process dysarthric speech. Pre-training (part of) the SLU model with supervised automatic speech recognition (ASR) targets
-
Three-stage training and orthogonality regularization for spoken language recognition EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-04-06 Zimu Li, Yanyan Xu, Dengfeng Ke, Kaile Su
Spoken language recognition has made significant progress in recent years, for which automatic speech recognition has been used as a parallel branch to extract phonetic features. However, there is still a lack of a better training strategy for such architectures of two individual branches. In this paper, we analyze the mostly used two-stage training strategies and reveal a trade-off between the recognition
-
AAM: a dataset of Artificial Audio Multitracks for diverse music information retrieval tasks EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-03-23 Fabian Ostermann, Igor Vatolkin, Martin Ebeling
We present a new dataset of 3000 artificial music tracks with rich annotations based on real instrument samples and generated by algorithmic composition with respect to music theory. Our collection provides ground truth onset information and has several advantages compared to many available datasets. It can be used to compare and optimize algorithms for various music information retrieval tasks like
-
Deep learning-based wave digital modeling of rate-dependent hysteretic nonlinearities for virtual analog applications EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-03-08 Oliviero Massi, Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini
Electromagnetic components greatly contribute to the peculiar timbre of analog audio gear. Indeed, distortion effects due to the nonlinear behavior of magnetic materials are known to play an important role in enriching the harmonic content of an audio signal. However, despite the abundant research that has been devoted to the characterization of nonlinearities in the context of virtual analog modeling
-
A latent rhythm complexity model for attribute-controlled drum pattern generation EURASIP J. Audio Speech Music Proc. (IF 2.4) Pub Date : 2023-02-17 Alessandro Ilic Mezza, Massimiliano Zanoni, Augusto Sarti
Most music listeners have an intuitive understanding of the notion of rhythm complexity. Musicologists and scientists, however, have long sought objective ways to measure and model such a distinctively perceptual attribute of music. Whereas previous research has mainly focused on monophonic patterns, this article presents a novel perceptually-informed rhythm complexity measure specifically designed