Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder

Frontiers of Information Technology & Electronic Engineering

Abstract

Much recent progress in monaural speech separation (MSS) has been achieved through a series of deep learning architectures based on autoencoders, which use an encoder to condense the input signal into compressed features and then feed these features into a decoder to construct a specific audio source of interest. However, these approaches can neither learn generative factors of the original input for MSS nor construct each audio source in mixed speech. In this study, we propose a novel weighted-factor autoencoder (WFAE) model for MSS, which introduces a regularization loss in the objective function to isolate one source without containing other sources. By incorporating a latent attention mechanism and a supervised source constructor in the separation layer, WFAE can learn source-specific generative factors and a set of discriminative features for each source, leading to MSS performance improvement. Experiments on benchmark datasets show that our approach outperforms the existing methods. In terms of three important metrics, WFAE has great success on a relatively challenging MSS case, i.e., speaker-independent MSS.

Author information

Corresponding author

Correspondence to Qi-rong Mao.

Additional information

Project supported by the Key Project of the National Natural Science Foundation of China (No. U1836220), the National Natural Science Foundation of China (No. 61672267), the Qing Lan Talent Program of Jiangsu Province, China, and the Key Innovation Project of Undergraduate Students in Jiangsu Province, China (No. 201810299045Z).

Contributors

Jing-jing CHEN and Qi-rong MAO designed the research. Jing-jing CHEN processed the data. Jing-jing CHEN and Qi-rong MAO drafted the manuscript. You-cai QIN, Shuang-qing QIAN, and Zhi-shen ZHENG helped organize the manuscript. Jing-jing CHEN and Qi-rong MAO revised and finalized the paper.

Compliance with ethics guidelines

Jing-jing CHEN, Qi-rong MAO, You-cai QIN, Shuang-qing QIAN, and Zhi-shen ZHENG declare that they have no conflict of interest.

About this article

Cite this article

Chen, Jj., Mao, Qr., Qin, Yc. et al. Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder. Front Inform Technol Electron Eng 21, 1639–1650 (2020). https://doi.org/10.1631/FITEE.2000019

  • DOI: https://doi.org/10.1631/FITEE.2000019
