Efficient model similarity estimation with robust hashing, Software and Systems Modeling

Software and Systems Modeling (IF 2.0), Pub Date: 2021-08-05, DOI: 10.1007/s10270-021-00915-9
Salvador Martínez, Sébastien Gérard, Jordi Cabot

As model-driven engineering (MDE) is increasingly adopted in complex industrial scenarios, modeling artefacts become a key and strategic asset for companies. As such, any MDE ecosystem must provide mechanisms to protect and exploit them. Current approaches depend on the calculation of the relative similarity among pairs of models. Unfortunately, model similarity calculation mechanisms are computationally expensive, which prevents their use in large repositories or on very large models. In this sense, this paper explores the adaptation of the robust hashing technique to the MDE domain as an efficient estimation method for model similarity. Indeed, robust hashing algorithms (i.e., hashing algorithms that generate similar outputs from similar input data) have proved useful as a key building block in intellectual property protection, authenticity assessment and fast comparison and retrieval solutions for different application domains. We present a detailed method for the generation of robust hashes for different types of models. Our approach is based on the translation to the MDE domain of diverse techniques such as summary extraction, minhash generation and locality-sensitive hash function families, originally developed for the comparison and classification of large datasets. We validate our approach with a prototype implementation and show that: (1) our approach can deal with any graph-based model representation; (2) a strong correlation exists between the similarity calculated directly on the robust hashes and a distance metric calculated over the original models; and (3) our approach scales well on large models and greatly reduces the time required to find similar models in large repositories.
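The abstract names three building blocks: summary extraction, MinHash generation and locality-sensitive hashing (LSH). As a rough illustration of how such pieces usually fit together, the Python sketch below assumes a model is encoded as a set of labeled edges `(source_type, reference_name, target_type)` and turns each edge into a string token. The names `extract_summary`, `minhash_signature`, `estimate_similarity`, `jaccard` and `lsh_candidates`, the edge encoding, the 128 hash functions and the 32-band split are all hypothetical choices for this example, not the paper's method or parameters. It implements the generic MinHash/LSH scheme for Jaccard similarity; the paper correlates hash similarity with a distance metric over the original models, which need not be Jaccard.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from itertools import combinations


def extract_summary(edges: set[tuple[str, str, str]]) -> set[str]:
    """Flatten labeled graph edges into string tokens (a hypothetical stand-in for summary extraction)."""
    return {f"{src}-{ref}->{dst}" for src, ref, dst in edges}


def _seeded_hash(seed: int, token: str) -> int:
    """64-bit hash of `token` under the hash function numbered `seed`."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens: set[str], num_hashes: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over all tokens."""
    if not tokens:
        raise ValueError("cannot build a MinHash signature for an empty summary")
    return [min(_seeded_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing signature positions, which approximates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, used here only to sanity-check the estimate."""
    return len(a & b) / len(a | b)


def lsh_candidates(signatures: dict[str, list[int]], bands: int = 32) -> set[tuple[str, str]]:
    """Band the signatures; models that match on any whole band become candidate pairs."""
    buckets: dict[tuple, set[str]] = defaultdict(set)
    for name, sig in signatures.items():
        if len(sig) % bands:
            raise ValueError("signature length must be divisible by the number of bands")
        rows = len(sig) // bands
        for b in range(bands):
            # Key on the band index too, so equal slices from different bands do not collide.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs: set[tuple[str, str]] = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    # Made-up example data: two versions of a small library model and an unrelated shop model.
    library_v1 = {
        ("Library", "books", "Book"),
        ("Book", "author", "Writer"),
        ("Book", "title", "String"),
        ("Writer", "name", "String"),
        ("Library", "members", "Member"),
    }
    library_v2 = library_v1 | {("Member", "loans", "Book")}
    shop = {("Shop", "items", "Product"), ("Product", "price", "Float")}

    models = {"library_v1": library_v1, "library_v2": library_v2, "shop": shop}
    summaries = {name: extract_summary(m) for name, m in models.items()}
    sigs = {name: minhash_signature(s) for name, s in summaries.items()}

    # Exact Jaccard is 5/6 for these inputs; the MinHash estimate should land near it.
    print("exact Jaccard v1/v2:", jaccard(summaries["library_v1"], summaries["library_v2"]))
    print("MinHash estimate v1/v2:", estimate_similarity(sigs["library_v1"], sigs["library_v2"]))
    print("LSH candidate pairs:", lsh_candidates(sigs))
```

With b bands of r rows, a pair whose Jaccard similarity is s becomes a candidate with probability 1 − (1 − s^r)^b, so the band split roughly sets the similarity at which pairs start to be reported (about (1/b)^(1/r), close to 0.42 for 32 bands of 4 rows). Only the candidate pairs would then need a full, expensive model comparison, which is the kind of saving the abstract describes for large repositories.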

Updated: 2021-08-09
