Efficient model similarity estimation with robust hashing

  • Regular Paper
  • Software and Systems Modeling

Abstract

As model-driven engineering (MDE) is increasingly adopted in complex industrial scenarios, modeling artefacts become a key strategic asset for companies. As such, any MDE ecosystem must provide mechanisms to protect and exploit them. Current approaches depend on calculating the relative similarity between pairs of models. Unfortunately, model similarity calculation mechanisms are computationally expensive, which prevents their use in large repositories or on very large models. To address this, this paper explores the adaptation of the robust hashing technique to the MDE domain as an efficient estimation method for model similarity. Robust hashing algorithms (i.e., hashing algorithms that generate similar outputs from similar input data) have proved useful as a key building block in intellectual property protection, authenticity assessment, and fast comparison and retrieval solutions in different application domains. We present a detailed method for generating robust hashes for different types of models. Our approach translates to the MDE domain diverse techniques such as summary extraction, minhash generation and locality-sensitive hash function families, originally developed for the comparison and classification of large datasets. We validate our approach with a prototype implementation and show that: (1) our approach can deal with any graph-based model representation; (2) a strong correlation exists between the similarity calculated directly on the robust hashes and a distance metric calculated over the original models; and (3) our approach scales well on large models and greatly reduces the time required to find similar models in large repositories.
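To make the minhash idea mentioned above concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that each model has already been reduced to a set of string summaries (the summary format, the seeded SHA-1 hash family and all function names are illustrative assumptions). Two signatures agree at a given position with probability equal to the Jaccard similarity of the underlying sets, so the fraction of agreeing positions estimates that similarity without comparing the models directly.

```python
import hashlib


def _hash(seed: int, summary: str) -> int:
    """Deterministic 64-bit hash of a summary, parameterised by a seed."""
    digest = hashlib.sha1(f"{seed}:{summary}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(summaries, num_hashes=128):
    """Build a signature holding one minimum per seeded hash function."""
    items = set(summaries)
    if not items:
        raise ValueError("cannot build a signature from an empty summary set")
    return [min(_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of agreeing positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, used here only as a reference value."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical element summaries for two small class models (assumed format).
model_a = {"Class:Person", "Attr:Person.name", "Attr:Person.age",
           "Ref:Person.address"}
model_b = {"Class:Person", "Attr:Person.name", "Attr:Person.age",
           "Ref:Person.employer"}

if __name__ == "__main__":
    sig_a = minhash_signature(model_a)
    sig_b = minhash_signature(model_b)
    print("exact Jaccard:   ", jaccard(model_a, model_b))
    print("minhash estimate:", estimated_similarity(sig_a, sig_b))
```

Because signatures have a fixed length, comparing two of them costs the same regardless of model size. In this sketch the estimate becomes more accurate as `num_hashes` grows, at the price of longer signatures.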

Notes

  1. http://web.emn.fr/x-info/atlanmod/index.php?title=Zoos

  2. https://gitlab.com/smartine/RobustModelHashing

  3. Note that EMF models may have more than one root, or may store their elements at the package level without a root (in which case all those elements can be considered roots). In that case, root contents are added sequentially to A.

  4. Using the size of the EMFCompare differences list as a metric is only meaningful when the models compared are of similar size.

  5. https://code.google.com/archive/a/eclipselabs.org/p/ecore-mutator

  6. We used Spearman rank correlation, obtaining coefficients ranging from 81 to 90.

  7. Experiments were performed on an Intel® Core™ i5-6200U CPU @ 2.30 GHz (4 cores), running Ubuntu 16.04.

  8. https://github.com/atlanmod/mondo-atlzoo-benchmark/tree/master/fr.inria.atlanmod.instantiator

  9. Hashing may also be used as a key building block in the construction of a watermarking scheme [1] for models, but to be most effective the hash creation process should differ from the one presented here [39].

References

  1. Fridrich, J., Goljan, M.: Robust hash functions for digital watermarking. In Proceedings International Conference on Information Technology: Coding and Computing, 2000., pp. 178–183. IEEE, (2000)

  2. Lee, S.-H., Kwon, K.-R.: Robust 3D mesh model hashing based on feature object. Digit. Sign. Process. 22(5), 744–759 (2012)

  3. Steinebach, M., Klöckner, P., Reimers, N., Wienand, D., Wolf, P.: Robust Hash Algorithms for Text. In IFIP International Conference on Communications and Multimedia Security, pp. 135–144. Springer Berlin Heidelberg, Berlin, Heidelberg, (2013). ISBN 978-3-642-40779-6

  4. Rivest, R.: The MD5 message-digest algorithm. (1992)

  5. Eastlake, D., Jones, P.: US secure hash algorithm 1 (SHA1). Technical report (2001)

  6. Feistel, H.: Cryptography and computer privacy. Sci. Am. 228(5), 15–23 (1973)

  7. Broder, A. Z.: On the resemblance and containment of documents. In Proceedings on Compression and Complexity of Sequences 1997., pp. 21–29. IEEE, (1997)

  8. Martínez, S., Gérard, S., Cabot, J.: Robust hashing for models. In Proceedings of the 21th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pages 312–322, (2018)

  9. Steinberg, D., Budinsky, F., Paternostro, M., Merks, E.: EMF: Eclipse Modeling Framework 2.0. Addison-Wesley Professional, 2nd edition, 2009. ISBN 0321331885

  10. Syriani, E., Bill, R., Wimmer, M.: Domain-specific model distance measures. J. Object Technol. 18(3), 1–19 (2019)

  11. Bézivin, J.: On the unification power of models. Softw. Syst. Model. 4(2), 171–188 (2005)

  12. Lano, K., Rahimi, S.K.: Slicing techniques for UML models. J. Object Technol. 10(11), 1–49 (2011)

  13. Blouin, A., Combemale, B., Baudry, B., Beaudoux, O.: Modeling model slicers. In International Conference on Model Driven Engineering Languages and Systems, pages 62–76. Springer, (2011)

  14. Struber, D., Rubin, J., Taentzer, G., Chechik, M.: Splitting models using information retrieval and model crawling techniques. In International Conference on Fundamental Approaches to Software Engineering, pages 47–62. Springer, (2014)

  15. Brottier, E., Fleurey, F., Steel, J., Baudry, B., Le Traon, Y.: Metamodel-based test generation for model transformations: an algorithm and a tool. In Software Reliability Engineering, 2006. ISSRE’06. 17th International Symposium on, pp. 85–94. IEEE, (2006)

  16. Scheidgen, M.: Reference representation techniques for large models. In Proceedings of the Workshop on Scalability in Model Driven Engineering, p 5. ACM, (2013)

  17. Reddy, R., France, R., Ghosh, S., Fleurey, F., Baudry, B.: Model composition-a signature-based approach. In Aspect Oriented Modeling (AOM) Workshop, (2005)

  18. Leskovec, J., Rajaraman, A., Ullman, J. D.: Mining of massive datasets. Cambridge university press, (2014)

  19. Juels, A., Wattenberg, M.: A fuzzy commitment scheme. In Proceedings of the 6th ACM conference on Computer and communications security, pp. 28–36. ACM, (1999)

  20. Jouault, F., Allilaire, F., Bézivin, J., Kurtev, I.: Atl: A model transformation tool. Sci. Comput. Program. 72(1–2), 31–39 (2008)

  21. Troya, J., Fleck, M., Kessentini, M., Wimmer, M., Alkhaze, B.: Rules and helpers dependencies in atl–technical report. Universidad de Sevilla, (2016)

  22. Kehrer, T., Kelter, U., Pietsch, P., Schmidt, M.: Adaptability of model comparison tools. In 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp. 306–309. IEEE, (2012)

  23. Kinneer, C., Herzig, S. J. I.: Dissimilarity measures for clustering space mission architectures. In Proceedings of the 21th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 392–402, (2018)

  24. Semeráth, O., Farkas, R., Bergmann, G., Varró, D.: Diversity of graph models and graph generators in mutation testing. Int. J. Softw. Tools Technol. Transf. 22(1), 57–78 (2020)

  25. Kolovos, D. S., Di Ruscio, D., Pierantonio, A., Paige, R. F.: Different models for model matching: An analysis of approaches to support model differencing. In 2009 ICSE Workshop on Comparison and Versioning of Software Models, pp. 1–6. IEEE, (2009)

  26. Brun, C., Pierantonio, A.: Model differences in the eclipse modeling framework. UPGRADE, Eur J Inf Prof 9(2), 29–34 (2008)

  27. Ferdjoukh, A., Galinier, F., Bourreau, E., Chateau, A., Nebut, C.: Measuring differences to compare sets of models and improve diversity in MDE. In ICSEA: International Conference on Software Engineering Advances, (2017)

  28. Ferdjoukh, A., Galinier, F., Bourreau, E., Chateau, A., Nebut, C.: Measurement and generation of diversity and meaningfulness in model driven engineering. (2018)

  29. Toulmé, A., Inc, I.: Presentation of EMF compare utility. In Eclipse Modeling Symposium, pages 1–8, (2006)

  30. Wachsmuth, G.: Metamodel adaptation and model co-adaptation. In European Conference on Object-Oriented Programming, pages 600–624. Springer, (2007)

  31. Ledeczi, A., Maroti, M., Bakay, A., Karsai, G., Garrett, J., Thomason, C., Nordstrom, G., Sprinkle, J., Volgyesi, P.: The generic modeling environment. In Workshop on Intelligent Signal Processing, Budapest, Hungary, volume 17, page 1, (2001)

  32. López, J. A. H., Cuadrado, J. S.: Mar: a structure-based search engine for models. In Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 57–67, (2020)

  33. Martínez, S., Wimmer, M., Cabot, J.: Efficient plagiarism detection for software modeling assignments. Comput. Sci. Edu. 30, 187–215 (2020)

  34. Basciani, F., Di Rocco, J., Di Ruscio, D., Di Salle, A., Iovino, L., Pierantonio, A.: Mdeforge: an extensible web-based modeling platform. In CloudMDE@ MoDELS, 1242, 66–75 (2014)

  35. Wille, D., Babur, Ö., Cleophas, L., Seidl, C., van den Brand, M., Schaefer, I.: Improving custom-tailored variability mining using outlier and cluster detection. Sci. Comput. Program. 163, 62–84 (2018)

  36. Constant, O.: EMF Diff/Merge (2012)

  37. Kolovos, D. S.: Establishing correspondences between models with the epsilon comparison language. In European Conference on Model Driven Architecture-Foundations and Applications, pp. 146–157. Springer, (2009)

  38. Falleri, J., Huchard, M., Lafourcade, M., Nebut, C.: Metamodel matching for automatic model transformation generation. In International Conference on Model Driven Engineering Languages and Systems, pp. 326–340. Springer, (2008)

  39. Martínez, S., Gérard, S., Cabot, J.: On watermarking for collaborative model-driven engineering. IEEE Access 6, 29715–29728 (2018)

  40. Papi, F. G., Hübner, J. F., de Brito, M.: Instrumenting accountability in MAS with blockchain. Accountability and Responsibility in Multiagent Systems, p 20

  41. Neisse, R., Steri, G., Nai-Fovino, I.: A blockchain-based approach for data accountability and provenance tracking. arXiv preprint arXiv:1706.04507, (2017)

  42. Karsh, R.K., Laskar, R.H., Richhariya, B.B.: Robust image hashing using ring partition-PGNMF and local features. SpringerPlus 5(1), 1995 (2016)

  43. Liu, Y., Xiao, Y.: A robust image hashing algorithm resistant against geometrical attacks. Radio Eng. 22(4), 1072–1081 (2013)

  44. Swaminathan, A., Mao, Y., Wu, M.: Robust and secure image hashing. IEEE Trans. Inf. Forens. Secur. 1(2), 215–230 (2006)

  45. Venkatesan, R., Koon, S-M., Jakubowski, M. H., Moulin, P.: Robust image hashing. In Proceedings 2000 International Conference on Image Processing 2000, vol. 3, pp. 664–666. IEEE, (2000)

  46. Tarmissi, K., Hamza, A.B.: Information-theoretic hashing of 3D objects using spectral graph theory. Exp. Syst. Appl. 36(5), 9409–9414 (2009)

  47. Coskun, B., Sankur, B.: Robust video hash extraction. In 2004 12th European Signal Processing Conference, pp. 2295–2298. IEEE, (2004)

  48. De Roover, C., De Vleeschouwer, C., Lefebvre, F., Macq, B.: Robust video hashing based on radial projections of key frames. IEEE Trans. Sign. Process. 53(10), 4020–4037 (2005)

  49. Cochez, M.: Locality-sensitive hashing for massive string-based ontology matching. In Proceedings of the IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), pp. 134–140. IEEE, (2014)

  50. Noyrit, F., Gérard, S., Terrier, F.: Computer assisted integration of domain-specific modeling languages using text analysis techniques. In International Conference on Model Driven Engineering Languages and Systems, pp. 505–521. Springer, (2013)

  51. Babur, Ö., Cleophas, L.: Using n-grams for the automated clustering of structural models. In International Conference on Current Trends in Theory and Practice of Informatics, pp. 510–524. Springer, (2017)

  52. Babur, Ö., Cleophas, L., van den Brand, M.: Metamodel clone detection with SAMOS. J. Comput. Lang. 51, 57–74 (2019)

  53. Cavnar, W.B., Trenkle, J.M.: N-gram-based text categorization. In Proceedings of the 3rd Symposium on Document Analysis and Information Retrieval (SDAIR), (1994)

  54. Bézivin, J., Jouault, F., Valduriez, P.: On the need for megamodels. In Proceedings of the OOPSLA/GPCE: Best Practices for Model-Driven Software Development workshop, 19th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, (2004)

  55. Song, J., Yang, Y., Li, X., Huang, Z., Yang, Y.: Robust hashing with local models for approximate similarity search. IEEE Trans. Cybern. 44(7), 1225–1236 (2014)

  56. Pietsch, C., Ohrndorf, M., Kelter, U., Kehrer, T.: Incrementally slicing editable submodels. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 913–918. IEEE, (2017)

  57. Taentzer, G., Kehrer, T., Pietsch, C., Kelter, U.: A formal framework for incremental model slicing. In International Conference on Fundamental Approaches to Software Engineering, pp. 3–20. Springer, Cham (2018)

Acknowledgements

This work has been partially funded by the Spanish government (LOCOSS project - PID2020-114615RB-I00) and the ECSEL Joint Undertaking (AIDOaRt project - grant agreement No 101007350).

Author information

Corresponding author

Correspondence to Salvador Martínez.

Additional information

Communicated by Gregor Engels.

About this article

Cite this article

Martínez, S., Gérard, S. & Cabot, J. Efficient model similarity estimation with robust hashing. Softw Syst Model 21, 337–361 (2022). https://doi.org/10.1007/s10270-021-00915-9

  • DOI: https://doi.org/10.1007/s10270-021-00915-9
