Efficient model similarity estimation with robust hashing

Martínez, Salvador; Gérard, Sébastien; Cabot, Jordi

doi:10.1007/s10270-021-00915-9

Regular Paper
Published: 05 August 2021

Volume 21, pages 337–361, (2022)

Software and Systems Modeling

Abstract

As model-driven engineering (MDE) is increasingly adopted in complex industrial scenarios, modeling artefacts become a key strategic asset for companies. As such, any MDE ecosystem must provide mechanisms to protect and exploit them. Current approaches depend on calculating the relative similarity between pairs of models. Unfortunately, model similarity calculation mechanisms are computationally expensive, which prevents their use on large repositories or very large models. This paper therefore explores the adaptation of the robust hashing technique to the MDE domain as an efficient estimation method for model similarity. Robust hashing algorithms (i.e., hashing algorithms that generate similar outputs from similar input data) have proved useful as a key building block in intellectual property protection, authenticity assessment, and fast comparison and retrieval solutions for different application domains. We present a detailed method for the generation of robust hashes for different types of models. Our approach is based on the translation to the MDE domain of diverse techniques such as summary extraction, minhash generation and locality-sensitive hash function families, originally developed for the comparison and classification of large datasets. We validate our approach with a prototype implementation and show that: (1) our approach can deal with any graph-based model representation; (2) a strong correlation exists between the similarity calculated directly on the robust hashes and a distance metric calculated over the original models; and (3) our approach scales well on large models and greatly reduces the time required to find similar models in large repositories.
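
The minhash and locality-sensitive hashing ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy model encoding (a dictionary of typed nodes with outgoing references), the function names, the seeded SHA-1 hash family and the parameter values (128 hash functions, 32 bands) are all assumptions chosen for illustration. The sketch flattens each model into a set of string fragments, builds a minhash signature per model, estimates Jaccard similarity from the signatures alone, and uses banding to decide whether two models are candidates for a closer comparison.

```python
import hashlib


def element_tokens(model):
    """Flatten a toy graph model into a set of string fragments.

    `model` maps a node name to (node_type, [referenced node names]).
    This encoding is hypothetical, not the summary extraction of the paper.
    """
    tokens = set()
    for node, (node_type, targets) in model.items():
        tokens.add(f"{node_type}:{node}")
        for target in targets:
            tokens.add(f"{node}->{target}")
    return tokens


def _seeded_hash(token, seed):
    """One member of an assumed hash family: SHA-1 over 'seed:token'."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes=128):
    """For each hash function, keep the minimum hash over all tokens."""
    return [min(_seeded_hash(t, seed) for t in tokens)
            for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two token sets, for reference."""
    return len(a & b) / len(a | b)


def lsh_buckets(signature, bands=32):
    """Split a signature into bands; each (band index, rows) pair is a bucket key."""
    rows = len(signature) // bands
    return {(i, tuple(signature[i * rows:(i + 1) * rows]))
            for i in range(bands)}


def are_candidates(sig_a, sig_b, bands=32):
    """Two models are candidates if they share at least one band bucket."""
    return bool(lsh_buckets(sig_a, bands) & lsh_buckets(sig_b, bands))


# Two versions of a small, made-up class model.
library_v1 = {
    "Library": ("Class", ["Book", "Member"]),
    "Book": ("Class", ["Author"]),
    "Author": ("Class", []),
    "Member": ("Class", []),
}
library_v2 = dict(library_v1, Loan=("Class", ["Book", "Member"]))

tokens_v1 = element_tokens(library_v1)
tokens_v2 = element_tokens(library_v2)
sig_v1 = minhash_signature(tokens_v1)
sig_v2 = minhash_signature(tokens_v2)

print("exact Jaccard:     ", jaccard(tokens_v1, tokens_v2))
print("minhash estimate:  ", estimated_similarity(sig_v1, sig_v2))
print("LSH candidate pair:", are_candidates(sig_v1, sig_v2))
```

The signatures have a fixed length regardless of model size, so comparing two models costs a pass over 128 integers rather than a full model match. In a repository setting, the band buckets would typically be stored in an index so that only models sharing a bucket are compared at all.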

Notes

http://web.emn.fr/x-info/atlanmod/index.php?title=Zoos
https://gitlab.com/smartine/RobustModelHashing
Note that EMF models may have more than one root, or may simply store their elements at the package level without a root (in which case we can consider all those elements as roots). In that case, root contents are added sequentially to A.
Using the size of the EMFCompare differences list as a metric requires models of similar size for it to be meaningful.
https://code.google.com/archive/a/eclipselabs.org/p/ecore-mutator
We used Spearman rank correlation, obtaining coefficients ranging from 81 to 90.
Experiments are performed on an Intel® Core™ i5-6200U CPU @ 2.30 GHz with 4 cores, running Ubuntu 16.04.
https://github.com/atlanmod/mondo-atlzoo-benchmark/tree/master/fr.inria.atlanmod.instantiator
Hashing may also be used as a key building block in the construction of a watermarking scheme [1] for models, but to be most effective the hash creation process should differ from the one presented here [39].

References

1. Fridrich, J., Goljan, M.: Robust hash functions for digital watermarking. In Proceedings International Conference on Information Technology: Coding and Computing, pp. 178–183. IEEE, (2000)
2. Lee, S.-H., Kwon, K.-R.: Robust 3D mesh model hashing based on feature object. Digit. Sign. Process. 22(5), 744–759 (2012)
3. Steinebach, M., Klöckner, P., Reimers, N., Wienand, D., Wolf, P.: Robust hash algorithms for text. In IFIP International Conference on Communications and Multimedia Security, pp. 135–144. Springer Berlin Heidelberg, Berlin, Heidelberg, (2013). ISBN 978-3-642-40779-6
4. Rivest, R.: The MD5 message-digest algorithm. (1992)
5. Eastlake, D., Jones, P.: US secure hash algorithm 1 (SHA1). Technical report, (2001)
6. Feistel, H.: Cryptography and computer privacy. Sci. Am. 228(5), 15–23 (1973)
7. Broder, A. Z.: On the resemblance and containment of documents. In Proceedings on Compression and Complexity of Sequences 1997, pp. 21–29. IEEE, (1997)
8. Martínez, S., Gérard, S., Cabot, J.: Robust hashing for models. In Proceedings of the 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 312–322, (2018)
9. Steinberg, D., Budinsky, F., Paternosto, M., Merks, E.: EMF: Eclipse Modeling Framework 2.0. Addison-Wesley Professional, 2nd edition, (2009). ISBN 0321331885
10. Syriani, E., Bill, R., Wimmer, M.: Domain-specific model distance measures. J. Object Technol. 18(3), 1–19 (2019)
11. Bézivin, J.: On the unification power of models. Softw. Syst. Model. 4(2), 171–188 (2005)
12. Lano, K., Rahimi, S.K.: Slicing techniques for UML models. J. Object Technol. 10(11), 1–49 (2011)
13. Blouin, A., Combemale, B., Baudry, B., Beaudoux, O.: Modeling model slicers. In International Conference on Model Driven Engineering Languages and Systems, pp. 62–76. Springer, (2011)
14. Strüber, D., Rubin, J., Taentzer, G., Chechik, M.: Splitting models using information retrieval and model crawling techniques. In International Conference on Fundamental Approaches to Software Engineering, pp. 47–62. Springer, (2014)
15. Brottier, E., Fleurey, F., Steel, J., Baudry, B., Le Traon, Y.: Metamodel-based test generation for model transformations: an algorithm and a tool. In Software Reliability Engineering, 2006. ISSRE’06. 17th International Symposium on, pp. 85–94. IEEE, (2006)
16. Scheidgen, M.: Reference representation techniques for large models. In Proceedings of the Workshop on Scalability in Model Driven Engineering, p. 5. ACM, (2013)
17. Reddy, R., France, R., Ghosh, S., Fleurey, F., Baudry, B.: Model composition: a signature-based approach. In Aspect Oriented Modeling (AOM) Workshop, (2005)
18. Leskovec, J., Rajaraman, A., Ullman, J. D.: Mining of massive datasets. Cambridge University Press, (2014)
19. Juels, A., Wattenberg, M.: A fuzzy commitment scheme. In Proceedings of the 6th ACM Conference on Computer and Communications Security, pp. 28–36. ACM, (1999)
20. Jouault, F., Allilaire, F., Bézivin, J., Kurtev, I.: ATL: A model transformation tool. Sci. Comput. Program. 72(1–2), 31–39 (2008)
21. Troya, J., Fleck, M., Kessentini, M., Wimmer, M., Alkhaze, B.: Rules and helpers dependencies in ATL – technical report. Universidad de Sevilla, (2016)
22. Kehrer, T., Kelter, U., Pietsch, P., Schmidt, M.: Adaptability of model comparison tools. In 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp. 306–309. IEEE, (2012)
23. Kinneer, C., Herzig, S. J. I.: Dissimilarity measures for clustering space mission architectures. In Proceedings of the 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 392–402, (2018)
24. Semeráth, O., Farkas, R., Bergmann, G., Varró, D.: Diversity of graph models and graph generators in mutation testing. Int. J. Softw. Tools Technol. Transf. 22(1), 57–78 (2020)
25. Kolovos, D. S., Di Ruscio, D., Pierantonio, A., Paige, R. F.: Different models for model matching: An analysis of approaches to support model differencing. In 2009 ICSE Workshop on Comparison and Versioning of Software Models, pp. 1–6. IEEE, (2009)
26. Brun, C., Pierantonio, A.: Model differences in the Eclipse Modeling Framework. UPGRADE, Eur. J. Inf. Prof. 9(2), 29–34 (2008)
27. Ferdjoukh, A., Galinier, F., Bourreau, E., Chateau, A., Nebut, C.: Measuring differences to compare sets of models and improve diversity in MDE. In ICSEA: International Conference on Software Engineering Advances, (2017)
28. Ferdjoukh, A., Galinier, F., Bourreau, E., Chateau, A., Nebut, C.: Measurement and generation of diversity and meaningfulness in model driven engineering. (2018)
29. Toulmé, A., Inc, I.: Presentation of EMF compare utility. In Eclipse Modeling Symposium, pp. 1–8, (2006)
30. Wachsmuth, G.: Metamodel adaptation and model co-adaptation. In European Conference on Object-Oriented Programming, pp. 600–624. Springer, (2007)
31. Ledeczi, A., Maroti, M., Bakay, A., Karsai, G., Garrett, J., Thomason, C., Nordstrom, G., Sprinkle, J., Volgyesi, P.: The generic modeling environment. In Workshop on Intelligent Signal Processing, Budapest, Hungary, vol. 17, p. 1, (2001)
32. López, J. A. H., Cuadrado, J. S.: MAR: a structure-based search engine for models. In Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 57–67, (2020)
33. Martínez, S., Wimmer, M., Cabot, J.: Efficient plagiarism detection for software modeling assignments. Comput. Sci. Edu. 30, 187–215 (2020)
34. Basciani, F., Di Rocco, J., Di Ruscio, D., Di Salle, A., Iovino, L., Pierantonio, A.: MDEForge: an extensible web-based modeling platform. In CloudMDE@MoDELS, 1242, 66–75 (2014)
35. Wille, D., Babur, Ö., Cleophas, L., Seidl, C., van den Brand, M., Schaefer, I.: Improving custom-tailored variability mining using outlier and cluster detection. Sci. Comput. Program. 163, 62–84 (2018)
36. Constant, O.: EMF Diff/Merge. (2012)
37. Kolovos, D. S.: Establishing correspondences between models with the Epsilon Comparison Language. In European Conference on Model Driven Architecture-Foundations and Applications, pp. 146–157. Springer, (2009)
38. Falleri, J., Huchard, M., Lafourcade, M., Nebut, C.: Metamodel matching for automatic model transformation generation. In International Conference on Model Driven Engineering Languages and Systems, pp. 326–340. Springer, (2008)
39. Martínez, S., Gérard, S., Cabot, J.: On watermarking for collaborative model-driven engineering. IEEE Access 6, 29715–29728 (2018)
40. Papi, F. G., Hübner, J. F., de Brito, M.: Instrumenting accountability in MAS with blockchain. Accountability and Responsibility in Multiagent Systems, p. 20
41. Neisse, R., Steri, G., Nai-Fovino, I.: A blockchain-based approach for data accountability and provenance tracking. arXiv preprint arXiv:1706.04507, (2017)
42. Karsh, R.K., Laskar, R.H., Richhariya, B.B.: Robust image hashing using ring partition-PGNMF and local features. SpringerPlus 5(1), 1995 (2016)
43. Liu, Y., Xiao, Y.: A robust image hashing algorithm resistant against geometrical attacks. Radio Eng. 22(4), 1072–1081 (2013)
44. Swaminathan, A., Mao, Y., Wu, M.: Robust and secure image hashing. IEEE Trans. Inf. Forens. Secur. 1(2), 215–230 (2006)
45. Venkatesan, R., Koon, S.-M., Jakubowski, M. H., Moulin, P.: Robust image hashing. In Proceedings 2000 International Conference on Image Processing, vol. 3, pp. 664–666. IEEE, (2000)
46. Tarmissi, K., Hamza, A.B.: Information-theoretic hashing of 3D objects using spectral graph theory. Exp. Syst. Appl. 36(5), 9409–9414 (2009)
47. Coskun, B., Sankur, B.: Robust video hash extraction. In 2004 12th European Signal Processing Conference, pp. 2295–2298. IEEE, (2004)
48. De Roover, C., De Vleeschouwer, C., Lefebvre, F., Macq, B.: Robust video hashing based on radial projections of key frames. IEEE Trans. Sign. Process. 53(10), 4020–4037 (2005)
49. Cochez, M.: Locality-sensitive hashing for massive string-based ontology matching. In Proceedings of the IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), pp. 134–140. IEEE, (2014)
50. Noyrit, F., Gérard, S., Terrier, F.: Computer assisted integration of domain-specific modeling languages using text analysis techniques. In International Conference on Model Driven Engineering Languages and Systems, pp. 505–521. Springer, (2013)
51. Babur, Ö., Cleophas, L.: Using n-grams for the automated clustering of structural models. In International Conference on Current Trends in Theory and Practice of Informatics, pp. 510–524. Springer, (2017)
52. Babur, Ö., Cleophas, L., van den Brand, M.: Metamodel clone detection with SAMOS. J. Comput. Lang. 51, 57–74 (2019)
53. Cavnar, W. B., Trenkle, J. M.: N-gram-based text categorization. In Proceedings of the 3rd Symposium on Document Analysis and Information Retrieval (SDAIR), (1994)
54. Bézivin, J., Jouault, F., Valduriez, P.: On the need for megamodels. In Proceedings of the OOPSLA/GPCE: Best Practices for Model-Driven Software Development workshop, 19th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, (2004)
55. Song, J., Yang, Y., Li, X., Huang, Z., Yang, Y.: Robust hashing with local models for approximate similarity search. IEEE Trans. Cybern. 44(7), 1225–1236 (2014)
56. Pietsch, C., Ohrndorf, M., Kelter, U., Kehrer, T.: Incrementally slicing editable submodels. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 913–918. IEEE, (2017)
57. Taentzer, G., Kehrer, T., Pietsch, C., Kelter, U.: A formal framework for incremental model slicing. In International Conference on Fundamental Approaches to Software Engineering, pp. 3–20. Springer, Cham, (2018)

Acknowledgements

This work has been partially funded by the Spanish government (LOCOSS project - PID2020-114615RB-I00) and the ECSEL Joint Undertaking (AIDOaRt project - grant agreement No 101007350).

Author information

Authors and Affiliations

IMT Atlantique, Lab-STICC, UMR 6285, Brest, France
Salvador Martínez
Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France
Sébastien Gérard
ICREA - UOC, Barcelona, Spain
Jordi Cabot

Corresponding author

Correspondence to Salvador Martínez.

Additional information

Communicated by Gregor Engels.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Martínez, S., Gérard, S. & Cabot, J. Efficient model similarity estimation with robust hashing. Softw Syst Model 21, 337–361 (2022). https://doi.org/10.1007/s10270-021-00915-9

Received: 27 April 2020
Revised: 11 April 2021
Accepted: 19 July 2021
Published: 05 August 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10270-021-00915-9
