Detection of heterogeneous parallel steganography for low bit-rate VoIP speech streams
Introduction
Steganalysis aims to detect the existence of confidential messages (or secret information) inside apparently innocent carriers [1], [2], [3]. It is a countermeasure against steganography, the technique of embedding secret information into digital carriers [4], [5], [6].
In this paper, we focus on detecting a new type of steganography, which applies multiple orthogonal steganographic methods to a single carrier. Steganographic methods are called orthogonal if their processes of information hiding and extraction operate independently [7]. This new type of steganography has two characteristics. First, it is heterogeneous because it uses two or more different steganographic methods to hide information. Second, it is parallel because the confidential messages hidden with each steganographic method are embedded individually. Therefore, we call this kind of steganography heterogeneous parallel steganography (HPS). In contrast, we refer to traditional steganography, which uses only one steganographic method, as single steganography.
Streaming media, such as Voice over Internet Protocol (VoIP), is a suitable carrier for HPS. Streaming media is audio or video content sent in compressed form over the Internet. It consists of a series of packets that contain protocol headers and payloads (e.g., audio and/or video frames). Taking VoIP as an example, the steganographic methods that embed messages during the process of linear prediction coding (LPC) are orthogonal to the steganographic methods that combine information hiding with pitch period prediction. Therefore, these two kinds of steganographic methods can be combined to build HPS for VoIP.
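The notion of orthogonality above can be illustrated with a minimal sketch. It is a toy model, not the QIM or PMS algorithms themselves: the `Frame` type, its two fields, and the parity-based embedding rules are all assumptions made for illustration. The point it shows is that two methods writing to disjoint frame parameters can embed and extract independently, in either order.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Hypothetical speech frame with two independent parameters."""
    lpc_index: int    # stand-in for a quantization index produced during LPC
    pitch_delay: int  # stand-in for a pitch delay produced by pitch prediction


def embed_lpc(frame: Frame, bit: int) -> Frame:
    # Toy method A: force the parity of the LPC index to equal the secret bit.
    return replace(frame, lpc_index=(frame.lpc_index & ~1) | bit)


def extract_lpc(frame: Frame) -> int:
    return frame.lpc_index & 1


def embed_pitch(frame: Frame, bit: int) -> Frame:
    # Toy method B: force the parity of the pitch delay to equal the secret bit.
    return replace(frame, pitch_delay=(frame.pitch_delay & ~1) | bit)


def extract_pitch(frame: Frame) -> int:
    return frame.pitch_delay & 1


def hps_embed(frames, bits_a, bits_b):
    # Both methods are applied to every frame; each touches only its own field.
    return [embed_pitch(embed_lpc(f, a), b)
            for f, a, b in zip(frames, bits_a, bits_b)]


frames = [Frame(10, 57), Frame(33, 40), Frame(7, 99)]
bits_a, bits_b = [1, 0, 1], [0, 1, 1]
stego = hps_embed(frames, bits_a, bits_b)
recovered_a = [extract_lpc(f) for f in stego]
recovered_b = [extract_pitch(f) for f in stego]
```

Because each toy method modifies a different field, the two messages are recovered separately and the order of embedding does not matter, which is the property the text calls orthogonal.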
However, it is challenging for existing steganalysis methods to detect HPS on streaming media. General steganalysis methods usually extract universal features (e.g., mel-cepstrum coefficients) and train a classifier to detect various steganographic methods [8], [9], [10]. However, most of these methods are incapable of achieving high detection accuracy [11]. Targeted steganalysis methods focus on a specific steganographic method [12], [13], [14]. Hence, they are unable to detect the other steganographic methods used in HPS. Fig. 1 presents the difference, for targeted steganalysis methods, between detecting single steganography and detecting HPS that uses two orthogonal steganographic methods alternately on streaming media. Ensemble methods combine the detection results of multiple targeted steganalysis methods. They are still not satisfactory for detecting HPS because each targeted steganalysis method makes its detection independently, based only on its own extracted features. Therefore, it is necessary to design a steganalysis method for detecting HPS on streaming media.
This paper targets the detection of HPS in low bit-rate VoIP speech streams, a widely used kind of streaming media. Two orthogonal steganographic methods, i.e., Quantization Index Modulation (QIM) [15] and Pitch Modulation Steganography (PMS) [16], are used to form HPS. We propose a novel deep model named Steganalysis Feature Fusion Network (SFFN). SFFN combines three neural network structures, i.e., the convolutional neural network (CNN), the recurrent neural network (RNN) and the fully-connected network (FCN). Unlike ensemble methods, which simply combine the detection results of multiple targeted steganalysis methods, our method effectively fuses the steganalysis features extracted for the different steganographic methods used in HPS. Consequently, our method is capable of achieving high accuracy in the task of detecting HPS for low bit-rate VoIP speech streams. In summary, our contributions are as follows.
- We propose a novel deep steganalysis method for the task of detecting HPS in low bit-rate VoIP speech streams. It effectively fuses the extracted steganalysis features so as to detect HPS with high precision.
- The experimental results demonstrate that our approach achieves state-of-the-art detection accuracy in this task. Moreover, it can perform real-time detection.
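The contrast drawn above between combining detection results and fusing features can be sketched as follows. This is only an illustration of the two decision structures, not SFFN itself: the logistic scorer stands in for any trained detector, and every weight, bias, feature value and threshold below is a hypothetical number chosen for the example.

```python
import math


def logistic_score(features, weights, bias):
    """Probability-like score from a linear model (stand-in for a detector)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_decision(feats_a, feats_b, clf_a, clf_b, threshold=0.5):
    # Decision-level combination: each targeted detector sees only its own
    # features, and the final answer is the OR of the individual decisions.
    score_a = logistic_score(feats_a, *clf_a)
    score_b = logistic_score(feats_b, *clf_b)
    return score_a >= threshold or score_b >= threshold


def fused_decision(feats_a, feats_b, clf_joint, threshold=0.5):
    # Feature-level fusion: concatenate both feature vectors and let a single
    # classifier weigh the evidence from both methods jointly.
    fused = list(feats_a) + list(feats_b)
    return logistic_score(fused, *clf_joint) >= threshold


# Hypothetical weak evidence for each of the two embedding methods.
feats_a, feats_b = [0.4], [0.4]
clf_a = ([2.0], -1.0)            # hypothetical (weights, bias)
clf_b = ([2.0], -1.0)
clf_joint = ([2.0, 2.0], -1.0)

by_ensemble = ensemble_decision(feats_a, feats_b, clf_a, clf_b)
by_fusion = fused_decision(feats_a, feats_b, clf_joint)
```

With these illustrative numbers, neither individual detector crosses its threshold, while the joint classifier does, because the weak evidence from both feature sets accumulates before the decision is made.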
Section snippets
Payload-based steganography for low bit-rate VoIP
VoIP, a widely used kind of streaming media, provides an economical means of telephone communication because data networks are easy to access. Since analysis-by-synthesis LPC achieves a high compression ratio with satisfactory speech quality, LPC-based low bit-rate speech codecs, e.g., G.723.1 and G.729, are widely applied to VoIP. For these codecs, information can be hidden in the payloads (i.e., speech frames) during the process of encoding speech.
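As a rough illustration of hiding information during encoding, the following sketch shows the general idea of quantization index modulation on a vector-quantization codebook. It is a simplified toy, not the CNV-QIM method of [15]: the four-entry codebook and the partition by index parity are assumptions chosen for brevity.

```python
def nearest(codebook, candidates, vector):
    """Index (among candidates) of the codeword closest to vector."""
    return min(
        candidates,
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def qim_embed(codebook, vector, bit):
    # The codebook is split into two sub-codebooks (here: by index parity).
    # The secret bit selects which sub-codebook the encoder may quantize to.
    candidates = [i for i in range(len(codebook)) if i % 2 == bit]
    return nearest(codebook, candidates, vector)


def qim_extract(index):
    # The receiver reads the bit from the sub-codebook the index belongs to.
    return index % 2


# Hypothetical 2-D codebook and input vector.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vector = (0.9, 0.1)

index_for_1 = qim_embed(codebook, vector, 1)
index_for_0 = qim_embed(codebook, vector, 0)
```

Embedding a bit that disagrees with the unconstrained nearest codeword forces a slightly worse quantization choice, which is the kind of statistical trace that steganalysis features try to capture.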
Steganographic methods which are
Proposed method
We propose an effective steganalysis method for the task of detecting HPS for low bit-rate VoIP speech streams. In this paper, we focus on detection of HPS that is made up of QIM and PMS. In this section, we introduce the details of the proposed method including processing of input data, network structure and training procedure.
Algorithm 1. Decoding the ACD codewords of a frame to the pitch delays.
Input: The ACD codewords.
Output: The pitch delays.
1: ≔ 143;
Experiments and discussion
To evaluate the performance of the proposed SFFN steganalysis method, we conduct several experiments in this section, covering speech segment length, embedding rate, input data processing, fine-tuning and time consumption. Two single steganographic methods and one HPS method are used as the information hiding methods. They are CNV-QIM [15], PMS [16] and HPS formed by CNV-QIM and PMS. For performance benchmarking, we compare the proposed method with four steganalysis methods. The
Conclusion
In this paper, we approach a new task of detecting heterogeneous parallel steganography (HPS) on streaming media. Specifically, we focus on one medium, i.e., low bit-rate VoIP speech streams, with HPS formed by two orthogonal steganographic methods, i.e., QIM and PMS. Since it is challenging for existing steganalysis methods to detect HPS, we propose a novel deep model named Steganalysis Feature Fusion Network (SFFN) to tackle the task. SFFN can effectively extract steganalysis
CRediT authorship contribution statement
Yuting Hu: Conceptualization, Methodology, Software, Writing - original draft. Yihua Huang: Validation, Formal analysis, Data curation. Zhongliang Yang: Writing - review & editing. Yongfeng Huang: Resources, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is supported by the National Key R&D Program (2018YFB0804103) and the National Natural Science Foundation of China (No. U1705261).
References (39)
- et al. Quantitative steganalysis of spatial LSB based stego images using reduced instances and features. Pattern Recognition Letters (2018)
- et al. MP3 steganalysis based on joint point-wise and block-wise correlations. Information Sciences (2020)
- et al. Video steganography: a review. Neurocomputing (2019)
- et al. Steganalysis of joint codeword quantization index modulation steganography based on codeword Bayesian network. Neurocomputing (2018)
- et al. LPC parameters substitution for speech information hiding. The Journal of China Universities of Posts and Telecommunications (2009)
- et al. Convolutional neural networks for hyperspectral image classification. Neurocomputing (2017)
- et al. CNN-based steganalysis of MP3 steganography in the entropy code domain
- et al. A fast and efficient text steganalysis method. IEEE Signal Processing Letters (2019)
- et al. AHCM: Adaptive Huffman Code Mapping for audio steganography based on psychoacoustic model. IEEE Transactions on Information Forensics and Security (2019)
- et al. CNN-based adversarial embedding for image steganography. IEEE Transactions on Information Forensics and Security (2019)
- Covert voice over internet protocol communications based on spatial model. Science China Technological Sciences
- Temporal derivative-based spectrum and mel-cepstrum audio steganalysis. IEEE Transactions on Information Forensics and Security
- Steganalysis of QIM steganography in low-bit-rate speech signals. IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Detection of covert voice-over Internet protocol communications using sliding window-based steganalysis. IET Communications
- Detection of quantization index modulation steganography in G.723.1 bit stream based on quantization index sequence analysis. Journal of Zhejiang University Science C
- Steganography integration into a low-bit rate speech codec. IEEE Transactions on Information Forensics and Security
Yuting Hu received the B.E. degree in electronic engineering in 2016 from Tsinghua University, Beijing, China, where she is currently working toward the Ph.D. degree. Her current research interests include steganography and steganalysis.
Yihua Huang received the B.E. degree in telecommunications engineering with management in 2020 from Beijing University of Posts and Telecommunications. He is planning to pursue the M.S. degree in computer science further. His research interests now focus on VoIP steganalysis and deep learning.
Zhongliang Yang received his B.S. degree in electronic science and technology from Sichuan University in 2015. He received the Ph.D. degree in electronic engineering from Tsinghua University in 2020. His research interests include information hiding and natural language processing.
Yongfeng Huang received the Ph.D. degree in computer science and engineering from Huazhong University of Science and Technology in 2000. He is currently a Professor with the Department of Electronic Engineering, Tsinghua University. His research interests include information hiding and natural language processing.