Using word semantic concepts for plagiarism detection in text documents, Information Retrieval



Using word semantic concepts for plagiarism detection in text documents
Information Retrieval, Pub Date: 2021-07-14, DOI: 10.1007/s10791-021-09394-4
Chia-Yang Chang, Shie-Jue Lee, Chih-Hung Wu, Chih-Feng Liu, Ching-Kuan Liu

Plagiarism is a common problem in the modern age. With the advance of the Internet, it is increasingly convenient to access other people’s writings and publications. Plagiarism may occur when someone uses the content of a text in an improper way. Because plagiarism infringes intellectual property rights, it is a serious problem. However, detecting plagiarism effectively is challenging. Traditional methods, such as the vector space model or bag-of-words, fall short of providing a good solution because they cannot handle the semantics of words satisfactorily. In this paper, we propose a new method for plagiarism detection. We use Word2vec to transform words into word vectors, which can reveal the semantic relationships among different words. Using these word vectors, words are clustered into concepts. Documents and their paragraphs are then represented in terms of concepts, so that plagiarism detection can be done more effectively. A number of experiments are conducted to demonstrate the good performance of the proposed method.
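
The pipeline outlined in the abstract (word vectors, then concepts, then concept-based document representations) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the tiny hand-written 2-D vectors stand in for real Word2vec output, the greedy threshold clustering stands in for whatever clustering the paper actually uses, and the names `WORD_VECTORS`, `cluster_words`, and `concept_vector` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy vectors; a real system would take these from a trained Word2vec model.
WORD_VECTORS = {
    "car": (0.90, 0.10), "automobile": (0.88, 0.12), "vehicle": (0.85, 0.20),
    "fast": (0.10, 0.90), "quick": (0.12, 0.88), "rapid": (0.15, 0.85),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_words(vectors, threshold=0.95):
    """Greedy clustering (an assumed stand-in, not the paper's algorithm):
    a word joins the first concept whose seed vector is similar enough,
    otherwise it starts a new concept."""
    seeds = []
    concept_of = {}
    for word in sorted(vectors):  # sorted for a deterministic result
        vec = vectors[word]
        for concept_id, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                concept_of[word] = concept_id
                break
        else:
            seeds.append(vec)
            concept_of[word] = len(seeds) - 1
    return concept_of


def concept_vector(text, concept_of, n_concepts):
    """Represent a text as counts of the concepts its known words belong to."""
    counts = Counter(concept_of[w] for w in text.lower().split() if w in concept_of)
    return [counts.get(c, 0) for c in range(n_concepts)]


def jaccard(text_a, text_b):
    """Plain word-overlap baseline, for contrast with the concept representation."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


concept_of = cluster_words(WORD_VECTORS)
n_concepts = max(concept_of.values()) + 1

source = "fast car"
suspect = "quick automobile"        # paraphrase with no shared words
other = "vehicle automobile car"    # touches only one of the two concepts

sim_suspect = cosine(concept_vector(source, concept_of, n_concepts),
                     concept_vector(suspect, concept_of, n_concepts))
sim_other = cosine(concept_vector(source, concept_of, n_concepts),
                   concept_vector(other, concept_of, n_concepts))
word_overlap = jaccard(source, suspect)
```

In this toy setup the paraphrased pair shares no words, so the word-overlap baseline scores it zero, while the concept representation scores it higher than the text that touches only one concept. This mirrors the abstract's argument that bag-of-words style methods miss semantic similarity; it says nothing about the paper's actual accuracy.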


Last updated: 2021-07-14
