Elsevier

Neurocomputing

Volume 419, 2 January 2021, Pages 70-79

Detection of heterogeneous parallel steganography for low bit-rate VoIP speech streams

https://doi.org/10.1016/j.neucom.2020.08.002

Abstract

This paper considers a new task: detecting heterogeneous parallel steganography (HPS) on streaming media. The task is to detect confidential messages hidden in the frames of streaming media by multiple kinds of orthogonal steganographic methods. In this work, we target HPS in low bit-rate Voice over Internet Protocol (VoIP) speech streams, a widely used streaming medium. Specifically, two steganographic methods, Quantization Index Modulation and Pitch Modulation Steganography, are used to form the HPS. Detecting HPS on low bit-rate VoIP speech streams is challenging for existing steganalysis methods. To address this, we propose a novel deep model named Steganalysis Feature Fusion Network (SFFN). SFFN consists of three sub-networks: a feature learning network, a feature fusion network and a classification network. With these three sub-networks, SFFN can effectively extract steganalysis features for each steganographic method used in HPS and fuse them to make a credible prediction. The experimental results demonstrate that our method outperforms state-of-the-art steganalysis methods when detecting HPS. In addition, our method meets the requirement of real-time detection.

Introduction

Steganalysis aims to detect the existence of confidential messages (or secret information) inside apparently innocent carriers [1], [2], [3]. It is a countermeasure against steganography, the technique of embedding secret information into digital carriers [4], [5], [6].

In this paper, we focus on detecting a new type of steganography, which applies multiple orthogonal steganographic methods to a single carrier. Steganographic methods are called orthogonal if their processes of information hiding and extraction operate independently [7]. This new type of steganography has two characteristics. First, it is heterogeneous because it uses two or more different steganographic methods to hide information. Second, it is parallel because the confidential messages hidden with each steganographic method are embedded individually. Therefore, we call this kind of steganography heterogeneous parallel steganography (HPS). In contrast, we refer to traditional steganography, which uses only one steganographic method, as single steganography.
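The notion of orthogonality can be made concrete with a toy sketch. The example below is only an illustration under stated assumptions: the `Frame` type, its two fields and the least-significant-bit embedding are hypothetical stand-ins, not the QIM or PMS methods considered in this paper. It shows two methods that write to disjoint parameters of the same frame, so that each message can be embedded and extracted without regard to the other.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Toy speech frame with two independently coded parameters (hypothetical)."""
    lpc_index: int    # stand-in for an LPC codebook index
    pitch_delay: int  # stand-in for a pitch-delay codeword


def set_lsb(value: int, bit: int) -> int:
    """Overwrite the least significant bit of value with bit."""
    return (value & ~1) | (bit & 1)


# Method A hides one bit in the LPC-related field only.
def embed_a(frame: Frame, bit: int) -> Frame:
    return replace(frame, lpc_index=set_lsb(frame.lpc_index, bit))


def extract_a(frame: Frame) -> int:
    return frame.lpc_index & 1


# Method B hides one bit in the pitch-related field only.
def embed_b(frame: Frame, bit: int) -> Frame:
    return replace(frame, pitch_delay=set_lsb(frame.pitch_delay, bit))


def extract_b(frame: Frame) -> int:
    return frame.pitch_delay & 1


cover = Frame(lpc_index=42, pitch_delay=77)
stego_ab = embed_b(embed_a(cover, 1), 0)  # apply A, then B
stego_ba = embed_a(embed_b(cover, 0), 1)  # apply B, then A

# The order of embedding does not matter, and each message is recovered intact.
print(stego_ab == stego_ba, extract_a(stego_ab), extract_b(stego_ab))
```

Because the two toy methods touch different fields, neither embedding disturbs the other, which is the property that the term orthogonal describes.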

Streaming media, such as Voice over Internet Protocol (VoIP), is a suitable carrier for HPS. Streaming media is audio or video content sent in compressed form over the Internet. It consists of a series of packets that contain protocol headers and payloads (e.g., audio and/or video frames). Taking VoIP as an example, the steganographic methods that embed messages during linear prediction coding (LPC) are orthogonal to the steganographic methods that combine information hiding with pitch period prediction. Therefore, these two kinds of steganographic methods can be combined to build HPS for VoIP.

However, detecting HPS on streaming media is challenging for existing steganalysis methods. General steganalysis methods usually extract universal features (e.g., mel-cepstrum coefficients) and train a classifier to detect various kinds of steganography [8], [9], [10], but most of them cannot achieve high detection accuracy [11]. Targeted steganalysis methods focus on one specific kind of steganography [12], [13], [14] and are therefore unable to detect the other steganography used in HPS. Fig. 1 presents, for targeted steganalysis methods, the difference between detecting single steganography and detecting HPS that uses two orthogonal steganographic methods alternately on streaming media. Ensemble methods combine the detection results of multiple targeted steganalysis methods. They are still unsatisfactory for detecting HPS because each targeted steganalysis method makes its decision independently, based only on its own extracted features. Therefore, a steganalysis method designed for detecting HPS on streaming media is needed.
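The difference between combining detection results and combining features can be sketched with a small numerical example. Everything in it is an assumption made for illustration: the feature values, the threshold, the weights and the function names are hand-picked and hypothetical, and the linear fusion rule is a simple stand-in rather than the learned fusion network proposed in this paper. It assumes a case in which each method-specific feature is individually too weak to trigger its own detector.

```python
import math

THRESHOLD = 0.5  # hypothetical decision threshold of each targeted detector


def targeted_decision(feature: float) -> bool:
    """One targeted detector: looks only at its own feature."""
    return feature > THRESHOLD


def ensemble_predict(features) -> bool:
    """Decision-level combination: flag stego if any detector fires."""
    return any(targeted_decision(f) for f in features)


def fusion_predict(features, weights, bias) -> bool:
    """Feature-level combination: one classifier sees all features jointly."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z)) > 0.5


WEIGHTS, BIAS = (1.0, 1.0), -0.7  # hand-picked, not learned

cover = (0.1, 0.1)        # weak evidence for both methods
hps_sample = (0.4, 0.4)   # moderate evidence for both, neither above threshold

print(ensemble_predict(cover), fusion_predict(cover, WEIGHTS, BIAS))
print(ensemble_predict(hps_sample), fusion_predict(hps_sample, WEIGHTS, BIAS))
```

In this constructed case the decision-level ensemble misses the sample because neither detector fires on its own, while the joint classifier accumulates the two moderate pieces of evidence.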

This paper targets the detection of HPS in low bit-rate VoIP speech streams, a widely used kind of streaming media. Two orthogonal steganographic methods, Quantization Index Modulation (QIM) [15] and Pitch Modulation Steganography (PMS) [16], are used to form HPS. We propose a novel deep model named Steganalysis Feature Fusion Network (SFFN). SFFN combines three neural network structures: the convolutional neural network (CNN), the recurrent neural network (RNN) and the fully-connected network (FCN). Unlike ensemble methods, which simply combine the detection results of multiple targeted steganalysis methods, our method effectively fuses the steganalysis features extracted for the different steganographic methods used in HPS. Consequently, our method achieves high accuracy in detecting HPS for low bit-rate VoIP speech streams. Our contributions can be summarized as follows.

  • We propose a novel deep steganalysis method for detecting HPS in low bit-rate VoIP speech streams. It effectively fuses the extracted steganalysis features to achieve high-precision detection.

  • The experimental results demonstrate that our approach achieves state-of-the-art detection accuracy in this task. Moreover, it can perform real-time detection.

Section snippets

Payload-based steganography for low bit-rate VoIP

VoIP, a broadly used kind of streaming media, provides an economical protocol for telephone communication thanks to its easy access to data networks. Since analysis-by-synthesis LPC achieves a high compression ratio with satisfactory speech quality, LPC-based low bit-rate speech codecs, e.g., G.723.1 and G.729, are widely applied to VoIP. For these codecs, information can be hidden in the payloads (i.e., speech frames) during speech encoding.

Steganographic methods which are

Proposed method

We propose an effective steganalysis method for detecting HPS in low bit-rate VoIP speech streams. In this paper, we focus on detecting HPS that is made up of QIM and PMS. In this section, we introduce the details of the proposed method, including the processing of input data, the network structure and the training procedure.

Algorithm 1. Decoding the ACD codewords of a frame to the pitch delays.
Input
 The ACD codewords a=(a1,a2).
Output
 The pitch delays p=(pint1,pfra1,pint2,pfra2).
1: Γmax ≔ 143;
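
The remaining steps of Algorithm 1 are not reproduced in this excerpt. For orientation only, the sketch below decodes two adaptive-codebook delay codewords following the pitch-delay decoding rules of ITU-T G.729. This is an assumption: the excerpt does not state which codec Algorithm 1 targets, although the value 143 matches the maximum pitch delay of G.729. The function name, the constant names and the 8-bit and 5-bit codeword widths are likewise taken from G.729 rather than from the paper, so the code should not be read as the authors' algorithm. Fractional parts are returned in units of one third of a sample.

```python
from typing import Tuple

GAMMA_MAX = 143  # maximum integer pitch delay (the value assigned in step 1)
PIT_MIN = 20     # lower bound of the second-subframe search window (G.729)


def decode_pitch_delays(a1: int, a2: int) -> Tuple[int, int, int, int]:
    """Decode ACD codewords (a1, a2) into (p_int1, p_fra1, p_int2, p_fra2).

    Assumes G.729 conventions: a1 is an 8-bit absolute codeword for the
    first subframe, a2 is a 5-bit codeword coded relative to the first delay.
    """
    if not (0 <= a1 <= 255 and 0 <= a2 <= 31):
        raise ValueError("codeword out of range")

    # First subframe: 1/3-sample resolution below 85, integer resolution above.
    if a1 < 197:
        p_int1 = (a1 + 2) // 3 + 19
        p_fra1 = a1 - 3 * p_int1 + 58
    else:
        p_int1 = a1 - 112
        p_fra1 = 0

    # Second subframe: search window of ten integer delays around p_int1.
    t_min = max(p_int1 - 5, PIT_MIN)
    t_max = t_min + 9
    if t_max > GAMMA_MAX:
        t_max = GAMMA_MAX
        t_min = t_max - 9

    k = (a2 + 2) // 3 - 1
    p_int2 = k + t_min
    p_fra2 = a2 - 2 - 3 * k
    return p_int1, p_fra1, p_int2, p_fra2


print(decode_pitch_delays(0, 2))
print(decode_pitch_delays(255, 2))
```

The second call exercises the clamp against the maximum delay: the first delay decodes to 143, so the search window for the second subframe is shifted down to end at 143.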

Experiments and discussion

To evaluate the performance of the proposed SFFN steganalysis method, this section reports several experiments covering speech segment length, embedding rate, input data processing, fine-tuning and time consumption. Two single steganographic methods and one HPS method are used as the information hiding methods: CNV-QIM [15], PMS [16] and HPS formed by CNV-QIM and PMS. For performance benchmarking, we compare the proposed method with four steganalysis methods. The

Conclusion

In this paper, we approach a new task of detecting heterogeneous parallel steganography (HPS) on streaming media. Specifically, we focus on one medium, low bit-rate VoIP speech streams, with HPS formed by two orthogonal steganographic methods, QIM and PMS. Because detecting HPS is challenging for existing steganalysis methods, we propose a novel deep model named Steganalysis Feature Fusion Network (SFFN) to tackle the task. SFFN can effectively extract steganalysis

CRediT authorship contribution statement

Yuting Hu: Conceptualization, Methodology, Software, Writing - original draft. Yihua Huang: Validation, Formal analysis, Data curation. Zhongliang Yang: Writing - review & editing. Yongfeng Huang: Resources, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is supported by the National Key R&D Program (2018YFB0804103) and the National Natural Science Foundation of China (No. U1705261).

References (39)

  • Y.F. Huang et al., Covert voice over internet protocol communications based on spatial model, Science China Technological Sciences (2016)
  • C. Kraetzer, J. Dittmann, Mel-cepstrum-based steganalysis for VoIP steganography, in: Security, Steganography, and...
  • C. Kraetzer, J. Dittmann, Pros and cons of Mel-cepstrum based audio steganalysis using SVM classification, in:...
  • Q.Z. Liu et al., Temporal derivative-based spectrum and mel-cepstrum audio steganalysis, IEEE Transactions on Information Forensics and Security (2009)
  • S.B. Li et al., Steganalysis of QIM steganography in low-bit-rate speech signals, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017)
  • Y.F. Huang et al., Detection of covert voice-over Internet protocol communications using sliding window-based steganalysis, IET Communications (2011)
  • S.B. Li et al., Detection of quantization index modulation steganography in G.723.1 bit stream based on quantization index sequence analysis, Journal of Zhejiang University Science C (2012)
  • B. Xiao, Y.F. Huang, S.Y. Tang, An approach to information hiding in low bit-rate speech stream, in: IEEE GLOBECOM...
  • Y.F. Huang et al., Steganography integration into a low-bit rate speech codec, IEEE Transactions on Information Forensics and Security (2012)

Yuting Hu received the B.E. degree in electronic engineering in 2016 from Tsinghua University, Beijing, China, where she is currently working toward Ph.D. degree. Her current research interests include steganography and steganalysis.

Yihua Huang received the B.E. degree in telecommunications engineering with management in 2020 from Beijing University of Posts and Telecommunications. He is planning to pursue the M.S. degree in computer science further. His research interests now focus on VoIP steganalysis and deep learning.

Zhongliang Yang received his B.S. degree in electronic science and technology from Sichuan University in 2015. He received the Ph.D. degree in electronic engineering from Tsinghua University in 2020. His research interests include information hiding and natural language processing.

Yongfeng Huang received the Ph.D. degree in computer science and engineering from Huazhong University of Science and Technology in 2000. He is currently a Professor with the Department of Electronic Engineering, Tsinghua University. His research interests include information hiding and natural language processing.