Flow Chart Generation-Based Source Code Similarity Detection Using Process Mining

Scientific Programming (IF 1.672), Pub Date: 2020-07-07, DOI: 10.1155/2020/8865413
Feng Zhang¹,², Lulu Li¹, Cong Liu³, Qingtian Zeng¹,²


Source code similarity detection has extensive applications in computer programming teaching and software intellectual property protection. In programming courses, students may use complex source code obfuscation techniques, e.g., opaque predicates, loop unrolling, and function inlining and outlining, to reduce the similarity between code fragments and evade plagiarism detection. Existing source code similarity detection approaches consider only static features of source code, which makes it difficult for them to cope with these more complex obfuscation techniques. In this paper, we propose a novel source code similarity detection approach that uses process mining to take the dynamic, runtime features of source code into account. More specifically, given two pieces of source code, their running logs are obtained by instrumenting and executing the code. Next, process mining is applied to the collected running logs to obtain a flow chart for each piece of source code. Finally, the similarity of the two pieces of source code is measured by computing the similarity of the two flow charts. Experimental results show that the proposed approach can handle more complex obfuscation techniques, including opaque predicates, loop unrolling, and function inlining and outlining, which existing work cannot handle properly. Therefore, we argue that our approach defeats commonly used code obfuscation techniques more effectively than existing state-of-the-art approaches to source code similarity detection.
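The pipeline in the abstract (instrument, execute, mine a flow model from the logs, compare the models) can be illustrated with a small Python sketch. The abstract does not say which process mining algorithm or which flow chart similarity measure the paper uses, so everything here is an assumption made for illustration: a directly-follows graph stands in for the mined flow chart, Jaccard similarity over its edges stands in for the flow chart similarity, and the `emit` calls and the three sample programs are hypothetical hand-written instrumentation.

```python
from collections import Counter


def run_and_log(program, *args):
    """Execute an instrumented program and return its running log (event labels)."""
    log = []
    program(log.append, *args)
    return log


def directly_follows(log):
    """Count how often each event is immediately followed by another event."""
    return Counter(zip(log, log[1:]))


def graph_similarity(log_a, log_b):
    """Jaccard similarity of the edge sets of two directly-follows graphs.

    This is a stand-in measure, not the one used in the paper.
    """
    edges_a = set(directly_follows(log_a))
    edges_b = set(directly_follows(log_b))
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)


# Hypothetical instrumented programs: emit(label) marks an executed statement.
def sum_loop(emit, n):
    emit("init")
    total = 0
    for i in range(n):
        emit("add")
        total += i
    emit("return")
    return total


def sum_unrolled(emit, n):
    # Same computation, obfuscated with loop unrolling and an opaque predicate.
    emit("init")
    total = 0
    i = 0
    while i + 1 < n:
        emit("add")
        total += i
        emit("add")
        total += i + 1
        i += 2
    if i < n:
        emit("add")
        total += i
    if n * n >= 0:  # opaque predicate: always true for integers
        emit("return")
    return total


def find_max(emit, n):
    # An unrelated program, for contrast.
    emit("init")
    best = 0
    for i in range(n):
        emit("compare")
        if i > best:
            emit("update")
            best = i
    emit("return")
    return best


if __name__ == "__main__":
    original = run_and_log(sum_loop, 4)
    obfuscated = run_and_log(sum_unrolled, 4)
    unrelated = run_and_log(find_max, 4)
    print("original vs obfuscated:", graph_similarity(original, obfuscated))
    print("original vs unrelated: ", graph_similarity(original, unrelated))
```

In this toy setting the two summation programs differ textually but emit the same sequence of events at runtime, so their directly-follows graphs coincide, while the unrelated program yields a different graph. A realistic setup would instrument the code automatically and mine logs from many inputs rather than a single run.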


Updated: 2020-07-07
