Abstract
Plagiarism occurs when we use the ideas, expressions, work, and words of other authors without giving them the required attribution. A major contributing factor is the large amount of data and information on the internet that can be accessed quickly. This paper proposes an extrinsic plagiarism detection approach inspired by cognition: it uses semantic knowledge to detect the plagiarized parts of a text without human involvement. A lexical database such as WordNet helps computers interpret the data and information. Most current plagiarism detection systems fail to detect highly complex cases of plagiarism. The proposed system uses the Dice measure to find the semantic resemblance between a pair of sentences. It also uses linguistic features such as path similarity and a depth estimation measure to compute the resemblance between a pair of words, and combines these features by assigning different weights to them. It can identify restructuring, paraphrasing, verbatim copying, and synonymized plagiarism. The system has been evaluated on the PAN-PC-11 corpus. The results indicate that it outperforms other existing systems on PAN-PC-11 in terms of precision, recall, F-measure, and PlagDet score. The approach is innovative, although its results are close to, while still reasonably better than, those of the existing systems.
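To make the two similarity levels named in the abstract more concrete, the following Python sketch shows a Dice coefficient over sentence tokens and a weighted combination of a path-based and a depth-based word similarity. It is an illustration under stated assumptions, not the authors' implementation: the toy taxonomy `PARENT` stands in for WordNet, the depth-based score uses a Wu-Palmer-style formula as a stand-in for the paper's depth estimation measure, and the equal weights `w_path` and `w_depth` are placeholders because the abstract does not give the actual values.

```python
def dice(a_tokens, b_tokens):
    """Dice coefficient between two token collections: 2|A∩B| / (|A| + |B|)."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical toy taxonomy (child -> parent); the root has parent None.
# A real system would query WordNet instead.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(node):
    """Return the chain from node up to the root, starting with node itself."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_and_depth_similarity(w1, w2):
    """Return (path_sim, depth_sim) for two words in the toy taxonomy."""
    c1, c2 = ancestors(w1), ancestors(w2)
    # Lowest common ancestor: first node on w1's chain that also lies on w2's.
    lca = next(n for n in c1 if n in c2)
    distance = c1.index(lca) + c2.index(lca)   # edges on the shortest path
    path_sim = 1 / (1 + distance)
    depth = lambda n: len(ancestors(n))        # root has depth 1
    # Assumption: Wu-Palmer-style depth score as the depth-based feature.
    depth_sim = 2 * depth(lca) / (depth(w1) + depth(w2))
    return path_sim, depth_sim


def word_similarity(w1, w2, w_path=0.5, w_depth=0.5):
    """Weighted combination of the two features; weights are placeholders."""
    path_sim, depth_sim = path_and_depth_similarity(w1, w2)
    return w_path * path_sim + w_depth * depth_sim


if __name__ == "__main__":
    print(dice("the cat sat".split(), "the cat slept".split()))
    print(word_similarity("dog", "cat"), word_similarity("dog", "car"))
```

In this sketch, words that share a closer common ancestor ("dog" and "cat") score higher than words related only through the root ("dog" and "car"), which is the behaviour a weighted path and depth combination is meant to capture.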
Cite this article
Ahuja, L., Gupta, V. & Kumar, R. A New Hybrid Technique for Detection of Plagiarism from Text Documents. Arab J Sci Eng 45, 9939–9952 (2020). https://doi.org/10.1007/s13369-020-04565-9
DOI: https://doi.org/10.1007/s13369-020-04565-9