File fragment recognition based on content and statistical features


arXiv - CS - Multimedia, Pub Date: 2021-02-25, DOI: arxiv-2102.12699
Marzieh Masoumi, Ahmad Keshavarz, Reza Fotohi

The rapid development and widespread use of digital devices such as smartphones has put people at risk of internet crime. Evidence of a crime stored in a computer file can easily be made unreachable by changing the file's prefix or by applying other algorithms. In more complex cases, either the file is divided into parts or the parts that carry information about the file type are deleted; this is where the problem of file fragment recognition arises. Known files are divided into fragments, and different classification algorithms are applied to recognize them. Because of its importance in cybercrime investigation and in antivirus software, identifying the type of a file fragment has received considerable attention and has been addressed in many articles. The main goal of researchers in this field is to increase accuracy on widely used file types, since recognizing the type of the file under study is a sensitive task. Failure to identify the correct file type causes the results and evidence to deviate from the main issue, or prevents a conclusion from being reached. In this paper, the file is first divided into fragments. Then the fragment features, obtained from the Binary Frequency Distribution, are reduced by two feature reduction algorithms, Sequential Forward Selection and Sequential Floating Forward Selection, which delete sparse features and thereby increase accuracy and speed. Finally, the reduced features are given to three multiclass classifiers, Multilayer Perceptron, Support Vector Machines, and K-Nearest Neighbor, for classification and comparison of the results. The proposed recognition algorithm can recognize six types of useful files and may distinguish the type of a file fragment with higher accuracy than similar prior work.
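The first two stages described in the abstract (fragmenting a file, then extracting and reducing frequency features) can be illustrated with a minimal sketch. Several choices here are assumptions, since the abstract does not give them: the 512-byte fragment size, a 256-bin byte-value histogram standing in for the frequency-distribution features, and leave-one-out 1-nearest-neighbor accuracy as the selection criterion. All function names are hypothetical. Only plain Sequential Forward Selection is shown; the floating variant and the MLP and SVM classifiers are omitted.

```python
from collections import Counter


def split_into_fragments(data: bytes, size: int = 512) -> list[bytes]:
    """Cut a byte string into fixed-size fragments, dropping a short tail.

    The 512-byte size is an assumption, not a value from the paper.
    """
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


def byte_frequency(fragment: bytes) -> list[float]:
    """Return a 256-bin normalized histogram of the byte values in a fragment."""
    counts = Counter(fragment)
    n = len(fragment)
    return [counts.get(b, 0) / n for b in range(256)]


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier.

    Distances use only the feature indices listed in `features`.
    """
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, k, candidates):
    """Greedily add the highest-scoring feature until k features are chosen."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda f: loo_1nn_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In use, each labeled fragment would be mapped through `byte_frequency` to build `X`, with the known file types as `y`, and `sequential_forward_selection(X, y, k, range(256))` would return the `k` bin indices to keep. The greedy search re-scores every remaining feature at each step, so this pure-Python version is only practical for small toy datasets.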


Updated: 2021-02-26
