A Proposed Model for Source Code Reuse Detection in Computer Programs
Iranian Journal of Science and Technology, Transactions of Electrical Engineering (IF 2.4), Pub Date: 2021-01-20, DOI: 10.1007/s40998-020-00403-8
Zahra Setoodeh, Mohammad Reza Moosavi, Mostafa Fakhrahmad, Mohammad Bidoki

Source code reuse detection has grown in significance as a common plagiarism prevention practice in academic research. For a large collection of source code, manual detection of reuse is impractical, and there is a clear need for automatic, highly accurate tools. This paper introduces a structure-based approach for recognizing source code (SOCO) reuse in reference programs. The proposed model consists of three main phases: preprocessing, sequence generation, and decision-making based on estimated similarities. First, the important instructions in each code file are identified, and the source code is converted to a string of specific tokens. A sequence alignment process is then carried out, and a tree representation of the source code is constructed. In the third phase, the similarity values among the code files are estimated using three different innovative strategies based on both lexical and structural comparison of the source code. Finally, the system decides, for each pair of files, whether reuse has occurred. The SOCO-2014 corpus is used to evaluate the method. Comparative experimental results for our model and those of the contest participants indicate that the proposed method's performance is acceptable and promising.
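The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of token-based reuse detection, not the authors' model. The token map, the `tokenize`, `lcs_length`, `similarity`, and `is_reuse` names, and the 0.8 threshold are all assumptions made for illustration. A longest-common-subsequence score stands in for the paper's sequence alignment step, and the tree representation and the three similarity strategies are not reproduced here.

```python
import re

# Hypothetical keyword-to-token map (an assumption, not taken from the paper).
TOKEN_MAP = {
    "if": "I", "else": "E", "for": "L", "while": "L",
    "return": "R", "switch": "S", "case": "C",
}
# Punctuation kept as structural tokens (also an assumption).
STRUCTURAL = set("=(){}")


def tokenize(source: str) -> str:
    """Reduce source text to a string of structural tokens."""
    # Drop // and /* */ comments; string literals are not handled specially.
    source = re.sub(r"//[^\n]*|/\*.*?\*/", "", source, flags=re.DOTALL)
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|[=(){}]", source):
        if word in TOKEN_MAP:
            tokens.append(TOKEN_MAP[word])
        elif word in STRUCTURAL:
            tokens.append(word)
        # Identifiers and type names are dropped, so renaming has no effect.
    return "".join(tokens)


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(src_a: str, src_b: str) -> float:
    """Score in [0, 1]: 2 * LCS / (combined length of both token strings)."""
    tok_a, tok_b = tokenize(src_a), tokenize(src_b)
    if not tok_a and not tok_b:
        return 1.0
    return 2 * lcs_length(tok_a, tok_b) / (len(tok_a) + len(tok_b))


def is_reuse(src_a: str, src_b: str, threshold: float = 0.8) -> bool:
    """Flag a pair as reuse when the score reaches the assumed threshold."""
    return similarity(src_a, src_b) >= threshold


original = "int sum(int n) { int s = 0; for (int i = 0; i < n; i++) { s = s + i; } return s; }"
renamed = "int total(int k) { int acc = 0; for (int j = 0; j < k; j++) { acc = acc + j; } return acc; }"
other = "void hello() { print(); }"
```

With these inputs, `original` and `renamed` reduce to the same token string, so the pair is flagged, while `other` scores below the threshold. Dropping identifiers is what makes the comparison robust to renaming, at the cost of more false positives on short, structurally generic files.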

Updated: 2021-01-20
