Identification of VoIP Speech With Multiple Domain Deep Features
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 12-18-2019, DOI: 10.1109/tifs.2019.2960635
Yuankun Huang, Bin Li, Mauro Barni, Jiwu Huang

Identifying whether a phone call comes from VoIP (Voice over Internet Protocol) is a challenging but little-investigated audio forensic problem. As a previous study showed, existing feature-based methods do not work well. In this paper, we propose a robust data-driven approach, called CNN-MLS (convolutional neural network based multi-domain learning scheme), to distinguish VoIP calls from mobile phone calls. To better expose the differences between the two call types, we first high-pass filter the data and then extract deep features from both the temporal domain and the spectral domain. Two CNN architectures are designed, one for each domain, and techniques such as auxiliary classifiers and individual subnet training are used to accelerate network convergence. The deep features are finally fused in a classification module that identifies the phone call type. The proposed method is evaluated on the VPCID (VoIP Phone Call Identification Database) dataset under various testing conditions, with particular attention to test data from a source that does not match the training sources. Experimental results show that our method achieves satisfactory accuracy, exceeding that of existing methods, on two-second-long inputs, implying that an alert may be activated shortly after a VoIP call is made.
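
The preprocessing stage described above (high-pass filtering followed by a temporal view and a spectral view of the same signal) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the filter design, frame length, hop size, or spectral representation, so the first-order filter, `alpha`, `frame_len`, `hop`, and all function names here are hypothetical. The two CNNs and the fusion module are omitted.

```python
import cmath
import math


def high_pass(samples, alpha=0.95):
    """First-order high-pass filter; alpha is an assumed value, not from the paper."""
    if not samples:
        return []
    out = [0.0]  # start at zero so a constant (DC) input is removed entirely
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def log_spectrogram(samples, frame_len=64, hop=32):
    """Log-magnitude spectrum of Hann-windowed frames (plain DFT, for clarity)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        windowed = [samples[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(w * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i, w in enumerate(windowed))
            spectrum.append(math.log(abs(acc) + 1e-8))  # small offset avoids log(0)
        frames.append(spectrum)
    return frames


def two_domain_inputs(samples):
    """Return (temporal input, spectral input) derived from one filtered signal."""
    filtered = high_pass(samples)
    return filtered, log_spectrogram(filtered)
```

In this sketch the filtered waveform would feed a temporal-domain network and the log-spectrogram a spectral-domain network; a real system would likely use an FFT library rather than the explicit DFT loop shown here.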

Updated: 2024-08-22