Automatic plagiarism detection in obfuscated text
Pattern Analysis and Applications ( IF 3.9 ) Pub Date : 2020-04-22 , DOI: 10.1007/s10044-020-00882-9
Alaa Saleh Altheneyan , Mohamed El Bachir Menai

Plagiarism is a serious problem in education, research, publishing and other fields. Automatic plagiarism detection systems are crucial for ensuring the integrity and genuineness of intellectual work. There are different types of plagiarism, such as copy–paste, obfuscation and translation. In particular, obfuscated text is one of the hardest types of plagiarism to detect. In this paper, we propose an automatic plagiarism detection system for obfuscated text based on a support vector machine classifier that exploits a set of lexical, syntactic and semantic features. We evaluated the performance of the proposed system on benchmark English and Arabic corpora made available by the PAN Workshop series: PAN 2012, PAN 2013, PAN 2014 and PAN@FIRE2015. We also compared the performance of our system to the performances of other systems that participated in the PAN competitions. The obtained results show that our system had the best performance in terms of the F-measure on the PAN 2012 and on the PAN@FIRE2015 obfuscated sub-corpora, was among the top four on the PAN 2013 corpus and was among the top two on the PAN 2014 corpus.
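The rankings above are stated in terms of the F-measure. As a reminder of what that score combines, here is a minimal sketch of the standard F-measure (the weighted harmonic mean of precision and recall). The function name `f_measure`, its count-based inputs and the `beta` parameter are illustrative assumptions; the abstract does not say how the detections were counted, so this is not the paper's evaluation code.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Standard F-beta score computed from detection counts.

    Assumption: the counts are hypothetical units of detected plagiarism
    (for example passages); the source does not specify the unit used.
    Returns 0.0 whenever precision or recall is undefined or both are zero.
    """
    predicted = true_positives + false_positives  # everything the system flagged
    actual = true_positives + false_negatives     # everything that was truly plagiarised
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    # beta = 1 gives the balanced harmonic mean of precision and recall
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 6 correct detections, 2 false alarms, 6 misses
    print(f_measure(6, 2, 6))
```

With `beta = 1` precision and recall are weighted equally, so a system cannot score well by flagging everything (high recall, low precision) or by flagging almost nothing (the reverse).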

Updated: 2020-04-22