A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario, International Journal of Information Technology & Decision Making


International Journal of Information Technology & Decision Making (IF 2.5), Pub Date: 2020-06-12, DOI: 10.1142/s0219622020500182
Francesco Cauteruccio₁, Paolo Lo Giudice₂, Lorenzo Musarella₂, Giorgio Terracina₁, Domenico Ursino₃, Luca Virgili₃


Knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making across sources characterized by disparate formats. In the past, a wide variety of approaches to derive interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources generally involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address the interschema property derivation issue in this new scenario. In this paper, we aim to contribute to this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources.


Updated: 2020-06-12
