A New Hybrid Technique for Detection of Plagiarism from Text Documents, Arabian Journal for Science and Engineering

A New Hybrid Technique for Detection of Plagiarism from Text Documents
Arabian Journal for Science and Engineering (IF 2.6), Pub Date: 2020-05-11, DOI: 10.1007/s13369-020-04565-9
Lovepreet Ahuja, Vishal Gupta, Rohit Kumar

Plagiarism occurs when we use the ideas, expressions, work, or words of other authors without giving them the required attribution. A major contributing factor is the large amount of data and information on the internet that can be accessed quickly. The proposed system introduces an extrinsic plagiarism detection approach that is inspired by cognition: it uses semantic knowledge to detect the plagiarized parts of a text without human involvement. A lexical database such as WordNet helps the computer interpret the data and information. Most current plagiarism detection systems fail to detect highly complex cases of plagiarism. The proposed system uses the Dice measure as the similarity measure for the semantic resemblance between a pair of sentences. It also uses linguistic features such as path similarity and a depth estimation measure to compute the resemblance between a pair of words, and it combines these features by assigning different weights to them. The system can identify restructuring, paraphrasing, verbatim copying, and synonymized plagiarism. It has been evaluated on the PAN-PC-11 corpus. The results indicate that it outperforms other existing systems on PAN-PC-11 in terms of precision, recall, F-measure, and PlagDet score. The approach is innovative, but its results are close to, though reasonably better than, those of the existing systems.
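
The abstract names the ingredients (path similarity, a depth-based measure, weighted combination, and a Dice score over sentence pairs) but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. The tiny `PARENT` taxonomy is a hypothetical stand-in for WordNet, the depth measure is assumed to be Wu-Palmer style, and the weights (`w_path`, `w_depth`) and the `threshold` are illustrative values that do not come from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet's hypernym tree.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "truck": "vehicle",
}


def ancestors(word: str) -> List[str]:
    """Return the chain from `word` up to the root, inclusive."""
    chain = [word]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(word: str) -> int:
    """Depth of a node, counting the root as depth 1."""
    return len(ancestors(word))


def lowest_common_subsumer(a: str, b: str) -> str:
    """Deepest node that is an ancestor of both words."""
    b_chain = set(ancestors(b))
    return next(node for node in ancestors(a) if node in b_chain)


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + number of edges on the shortest path between a and b)."""
    lcs_depth = depth(lowest_common_subsumer(a, b))
    edges = (depth(a) - lcs_depth) + (depth(b) - lcs_depth)
    return 1.0 / (1.0 + edges)


def depth_similarity(a: str, b: str) -> float:
    """Assumed Wu-Palmer style measure: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs_depth = depth(lowest_common_subsumer(a, b))
    return 2.0 * lcs_depth / (depth(a) + depth(b))


def word_similarity(a: str, b: str, w_path: float = 0.5, w_depth: float = 0.5) -> float:
    """Weighted combination of the two word-level features (weights are illustrative)."""
    if a == b:
        return 1.0
    if a not in PARENT or b not in PARENT:
        return 0.0  # words outside the toy taxonomy only match exactly
    return w_path * path_similarity(a, b) + w_depth * depth_similarity(a, b)


def dice_similarity(s1: str, s2: str, threshold: float = 0.4) -> float:
    """Dice score where two words 'match' if their similarity reaches `threshold`.

    With exact matching only, this reduces to 2|A ∩ B| / (|A| + |B|).
    """
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a or not b:
        return 0.0
    matched_a = sum(any(word_similarity(x, y) >= threshold for y in b) for x in a)
    matched_b = sum(any(word_similarity(y, x) >= threshold for x in a) for y in b)
    return (matched_a + matched_b) / (len(a) + len(b))


if __name__ == "__main__":
    print(dice_similarity("the dog chased the car", "a cat chased the truck quickly"))
```

In this sketch, "dog" and "cat" count as a match because they share the parent "animal", so a sentence in which words were swapped for close relatives still scores highly against its source. A real system would look words up in WordNet itself and would need word-sense handling, which the toy taxonomy sidesteps.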

Updated: 2020-05-11
