An intelligent decision support system for software plagiarism detection in academia, International Journal of Intelligent Systems

An intelligent decision support system for software plagiarism detection in academia
International Journal of Intelligent Systems (IF 5.0), Pub Date: 2021-02-21, DOI: 10.1002/int.22399
Farhan Ullah, Sohail Jabbar, Leonardo Mostarda

Source code plagiarism is an academic offense that undermines students' learning habits. Online services allow students to hire professional developers to complete their regular programming assignments, which makes plagiarism easier to commit. The proposed approach proceeds as follows. First, raw source code is cleaned of noisy data to extract meaningful code, since the actual logic matters most to programmers. Second, tokenization-based pre-processing converts the filtered code into meaningful tokens: the code is broken into small instances, and the number of occurrences of each is recorded as its frequency. Third, a local and global weighting scheme estimates the significance of each feature within an individual document or across a group of documents, which shows how useful each feature is for the next phase. Fourth, singular value decomposition (SVD) reduces the dimensionality of these features while preserving the actual semantics of the source code; this removes redundant, noisy information and retains only the features most effective for plagiarism detection. Fifth, latent semantic analysis (LSA) mines the underlying semantics of the source code in the form of latent variables. The LSA features are then used as input to cosine similarity to measure plagiarism between different source codes. To validate the proposed approach, we used topic modeling to group the relevant features into different topics.
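The cleaning, tokenization, weighting, SVD/LSA, and cosine-similarity steps can be sketched end to end in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the comment-stripping rules assume C-like source, the weighting is log-entropy (one common local × global scheme used with LSA; the abstract does not name the paper's scheme), and the truncated SVD is obtained by power iteration on the document Gram matrix AᵀA, whose eigenvectors are the right singular vectors of A and whose eigenvalues are the squared singular values. Each document's LSA coordinates are the rows of V_kΣ_k. All function names (`clean`, `tokenize`, `log_entropy_matrix`, `lsa_doc_vectors`, `similarity_matrix`) are hypothetical, and the topic-modeling validation step is not covered.

```python
import math
import random
import re
from collections import Counter


def clean(source: str) -> str:
    """Step 1: strip C/Java-style comments and collapse whitespace.
    Assumption: C-like syntax; the paper's cleaning rules are not given."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)                # line comments
    return re.sub(r"\s+", " ", source).strip()


def tokenize(code: str) -> Counter:
    """Step 2: identifiers/keywords, numbers, or single symbols, counted
    so that each token carries its frequency."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def log_entropy_matrix(counts: list) -> tuple:
    """Step 3: term x document matrix with a hypothetical log-entropy scheme:
    local weight log2(1 + tf), global weight 1 - normalized entropy."""
    terms = sorted(set().union(*counts))
    n = len(counts)
    matrix = []
    for t in terms:
        tf = [c[t] for c in counts]
        gf = sum(tf)
        entropy = sum((f / gf) * math.log(f / gf) for f in tf if f > 0)
        g = 1.0 + entropy / math.log(n) if n > 1 else 1.0
        matrix.append([g * math.log2(1 + f) for f in tf])
    return terms, matrix


def lsa_doc_vectors(matrix: list, k: int, iters: int = 500) -> list:
    """Steps 4-5: rank-k LSA coordinates (rows of V_k * Sigma_k), found by
    power iteration with deflation on the Gram matrix A^T A."""
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    rng = random.Random(0)
    docs = [[] for _ in range(n)]
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-12:
            break  # remaining singular values are (numerically) zero
        sigma = math.sqrt(lam)
        for j in range(n):
            docs[j].append(sigma * v[j])
        # Deflate so the next pass finds the next-largest component.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return docs


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(sources: list, k: int = 2) -> list:
    """Pairwise cosine similarity of submissions in the k-dim LSA space."""
    counts = [tokenize(clean(s)) for s in sources]
    _, matrix = log_entropy_matrix(counts)
    docs = lsa_doc_vectors(matrix, k)
    return [[cosine(a, b) for b in docs] for a in docs]


if __name__ == "__main__":
    subs = [
        "int add(int x, int y) { return x + y; } // sum",
        "/* copied */ int add(int x,\n int y) {\n  return x + y;\n}",
        "void show(char *s) { while (*s) putchar(*s++); }",
    ]
    sims = similarity_matrix(subs)
    print(f"sim(0,1)={sims[0][1]:.3f}  sim(0,2)={sims[0][2]:.3f}")
```

In the usage example, the second submission differs from the first only in comments and layout, so cleaning makes their token counts identical and their similarity is 1 up to rounding, while the unrelated third submission scores far lower. When k equals the rank of the weighted matrix, the LSA cosine reduces to the cosine of the weighted term columns; choosing a smaller k discards the weakest dimensions, which is the noise-reduction role the abstract assigns to SVD. For realistic corpora, a library routine such as `numpy.linalg.svd` would replace the power iteration.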

Updated: 2021-04-27
