A New Hybrid Technique for Detection of Plagiarism from Text Documents

  • Research Article-Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

Plagiarism occurs when the ideas, expressions, work, or words of other authors are used without the required attribution. A major contributing factor is the large amount of data and information on the internet that can be accessed quickly. The proposed system introduces an extrinsic plagiarism detection approach inspired by cognition: it uses semantic knowledge to detect plagiarized passages in text without human involvement. A lexical database such as WordNet helps computers interpret data and information. Most current plagiarism detection systems fail to detect highly complex cases of plagiarism. The proposed system uses the Dice measure to compute the semantic resemblance between a pair of sentences. It also uses linguistic features such as path similarity and a depth estimation measure to compute the resemblance between a pair of words, and it combines these features by assigning different weights to them. It can identify restructuring, paraphrasing, verbatim copying, and synonymized plagiarism. It has been evaluated on the PAN-PC-11 corpus. The results indicate that the proposed system outperforms other existing systems on PAN-PC-11 in terms of precision, recall, F-measure, and PlagDet score. The approach is innovative, although its results are close to, and reasonably better than, those of the existing systems.
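The two similarity levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the Dice coefficient below works on plain token sets, the depth-based measure is a Wu-Palmer-style stand-in for the paper's depth estimation measure, the tiny `PARENT` taxonomy is a hypothetical replacement for WordNet, and the weights `w_path` and `w_depth` are assumed values.

```python
# Hypothetical toy taxonomy (child -> parent), standing in for WordNet.
PARENT = {
    "vehicle": "entity",
    "animal": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "dog": "animal",
    "cat": "animal",
}


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?\"'()") for w in sentence.lower().split())
    return {t for t in tokens if t}


def dice(a, b):
    """Dice coefficient 2|A n B| / (|A| + |B|) between two token sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def ancestors(word):
    """Return the chain from `word` up to its root, starting with `word`."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(word):
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(ancestors(word))


def lowest_common_ancestor(w1, w2):
    """Deepest node shared by both ancestor chains, or None if none exists."""
    seen = set(ancestors(w2))
    for node in ancestors(w1):
        if node in seen:
            return node
    return None


def path_similarity(w1, w2):
    """1 / (1 + shortest path length between the two words)."""
    lca = lowest_common_ancestor(w1, w2)
    if lca is None:
        return 0.0
    distance = (depth(w1) - depth(lca)) + (depth(w2) - depth(lca))
    return 1.0 / (1.0 + distance)


def depth_similarity(w1, w2):
    """Wu-Palmer-style measure: 2 * depth(lca) / (depth(w1) + depth(w2))."""
    lca = lowest_common_ancestor(w1, w2)
    if lca is None:
        return 0.0
    return 2.0 * depth(lca) / (depth(w1) + depth(w2))


def word_similarity(w1, w2, w_path=0.5, w_depth=0.5):
    """Weighted combination of the two word-level features (assumed weights)."""
    return w_path * path_similarity(w1, w2) + w_depth * depth_similarity(w1, w2)


if __name__ == "__main__":
    s1 = tokenize("The cat sat on the mat.")
    s2 = tokenize("The cat lay on the mat.")
    print("sentence Dice:", dice(s1, s2))
    print("car vs truck:", word_similarity("car", "truck"))
    print("car vs dog:", word_similarity("car", "dog"))
```

In this toy setup, words that share a close ancestor (car, truck) score higher than words related only through the root (car, dog). A real system would replace the hand-written taxonomy with WordNet lookups and tune the weights on a corpus such as PAN-PC-11.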

Author information

Corresponding author

Correspondence to Vishal Gupta.

About this article

Cite this article

Ahuja, L., Gupta, V. & Kumar, R. A New Hybrid Technique for Detection of Plagiarism from Text Documents. Arab J Sci Eng 45, 9939–9952 (2020). https://doi.org/10.1007/s13369-020-04565-9

  • DOI: https://doi.org/10.1007/s13369-020-04565-9