PBDT: Python Backdoor Detection Model Based on Combined Features
Security and Communication Networks Pub Date : 2021-09-14 , DOI: 10.1155/2021/9923234
Yong Fang 1 , Mingyu Xie 1 , Cheng Huang 1

Application security is essential in today's period of rapid development. A backdoor is a means by which attackers invade a system to achieve illegal purposes and damage users' rights, and it poses a serious threat to network security. Adequate measures to defend against such attacks are therefore urgently needed. Previous research focused mainly on PHP webshells, with less work on Python backdoor files, and language differences make those methods not entirely applicable. This paper proposes a Python backdoor detection model named PBDT based on combined features. The model summarizes the functional modules and functions commonly found in backdoor files and extracts the number of calls to them in the text to form sample features. It also considers statistical characteristics of the text, including information entropy and the longest string, among others, to identify obfuscated Python code. In addition, the opcode sequence is used to represent code characteristics, via TF-IDF vectors and a FastText classifier, to eliminate the influence of interference items. Finally, the Random Forest algorithm is used to build the classifier. On a dataset that covers most types of backdoors and includes some obfuscated samples, the model achieves an accuracy of 97.70% and a true negative rate (TNR) of 98.66%, showing good classification performance in Python backdoor detection.
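
The feature families named in the abstract (call counts, text statistics, opcode sequences) can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' implementation: the watch lists `SUSPICIOUS_MODULES` and `SUSPICIOUS_BUILTINS` and all function names below are hypothetical, "longest string" is assumed here to mean the longest whitespace-delimited token, and the abstract does not specify how the paper computes any of these features.

```python
import ast
import dis
import math
from collections import Counter

# Hypothetical watch lists; the paper's actual module/function list is not given in the abstract.
SUSPICIOUS_MODULES = {"socket", "subprocess", "os", "pty", "base64"}
SUSPICIOUS_BUILTINS = {"exec", "eval", "compile"}


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def longest_token_length(text):
    """Length of the longest whitespace-delimited token (assumed meaning of 'longest string')."""
    return max((len(tok) for tok in text.split()), default=0)


def suspicious_call_count(source):
    """Count calls such as socket.socket(...) or eval(...) that hit the watch lists."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in SUSPICIOUS_BUILTINS:
            count += 1
        elif (isinstance(func, ast.Attribute)
              and isinstance(func.value, ast.Name)
              and func.value.id in SUSPICIOUS_MODULES):
            count += 1
    return count


def opcode_names(source):
    """Opcode names of the module-level code object; exact names vary by Python version."""
    code = compile(source, "<sample>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


def extract_features(source):
    """Bundle the illustrative features into one dictionary."""
    return {
        "entropy": shannon_entropy(source),
        "longest_token": longest_token_length(source),
        "suspicious_calls": suspicious_call_count(source),
        "opcodes": opcode_names(source),
    }
```

In a pipeline like the one the abstract describes, the numeric features would be concatenated with a vectorized form of the opcode sequence (for example TF-IDF) and passed to a classifier such as Random Forest. Those steps need third-party libraries and are left out of this sketch.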

Updated: 2021-09-14