Exploiting Language Model for Efficient Linguistic Steganalysis: An Empirical Study
arXiv - CS - Multimedia. Pub Date: 2021-07-26, DOI: arxiv-2107.12168
Biao Yi, Hanzhou Wu, Guorui Feng, Xinpeng Zhang

Recent advances in linguistic steganalysis have successively applied CNNs, RNNs, GNNs and other deep learning models to detect secret information in generated texts. These methods tend to seek ever stronger feature extractors to achieve better steganalysis performance. However, we have found through experiments that there is actually a significant difference between automatically generated steganographic texts and carrier texts in terms of the conditional probability distribution of individual words. This kind of statistical difference can be naturally captured by the language model used to generate the steganographic texts, which motivates us to give the classifier a priori knowledge of the language model to enhance its steganalysis ability. To this end, we present two methods for efficient linguistic steganalysis in this paper. One is to pre-train an RNN-based language model, and the other is to pre-train a sequence autoencoder. Experimental results show that, compared to a randomly initialized RNN classifier, both methods improve performance to different degrees and converge significantly faster. Moreover, our methods achieve the best detection results.
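
As a concrete illustration of the first method, the sketch below pre-trains an RNN language model and then transfers its embedding and recurrent weights into a binary steganalysis classifier, instead of initializing the classifier randomly. This is a minimal sketch, assuming PyTorch; the sizes (VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM), class names, and elided training loops are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10000, 128, 256  # assumed sizes

class RNNLanguageModel(nn.Module):
    """Next-token predictor: models p(w_t | w_1..w_{t-1})."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, tokens):                    # tokens: (batch, seq)
        states, _ = self.rnn(self.embed(tokens))  # (batch, seq, hidden)
        return self.head(states)                  # next-token logits

class SteganalysisClassifier(nn.Module):
    """Binary classifier: steganographic text vs. carrier text."""
    def __init__(self, pretrained_lm: RNNLanguageModel):
        super().__init__()
        # The a priori knowledge transfer: reuse the pre-trained
        # embedding and recurrent layers rather than random weights.
        self.embed = pretrained_lm.embed
        self.rnn = pretrained_lm.rnn
        self.classify = nn.Linear(HIDDEN_DIM, 2)

    def forward(self, tokens):
        states, _ = self.rnn(self.embed(tokens))
        return self.classify(states[:, -1])       # logits from final state

# Stage 1: pre-train `lm` with next-token cross-entropy on text data
# (training loop elided). Stage 2: fine-tune `clf` on labeled examples.
lm = RNNLanguageModel()
clf = SteganalysisClassifier(lm)
```

The second method would follow the same transfer pattern, but with sequence reconstruction (a sequence autoencoder) as the pre-training objective rather than next-token prediction.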

Updated: 2021-07-27