Integration and classification approach based on probabilistic semantic association for big data
Complex & Intelligent Systems (IF 5.0), Pub Date: 2021-10-11, DOI: 10.1007/s40747-021-00548-x
Vishnu Vandana Kolisetty, Dharmendra Singh Rajput

Integration through classification provides a unified representation of the diverse data sources in big data. The main challenges of big data analysis stem from varying granularities, irreconcilable data models, and multipart interdependencies between data content. Previously designed models have had difficulty integrating and analyzing big data because of highly complex, dynamic, multi-source, and heterogeneous information variation, and have also had difficulty processing and classifying the associations among the attributes in a schema. In this paper, we propose an integration and classification approach built on a Probabilistic Semantic Association (PSA) method that generates feature patterns for big data sources. The PSA approach is trained to learn the association and dependency patterns between data classes and incoming data so that data objects are mapped accurately. It first builds a data integration mechanism that transforms the data into a structured form and learns to use the trained knowledge to classify the probabilistic associations between the data and the knowledge patterns. It then builds a data analysis mechanism that analyzes the PSA-mapped data to evaluate integration efficiency. An experimental evaluation is performed on a real-time crime dataset generated from multiple locations and covering various event classes. The results confirm that using knowledge patterns from accurate classification is an appropriate way to enhance the integration of multi-source data. Precision, recall, fall-out rate, and F-measure support the efficiency of the proposed PSA method. In comparison with the state-of-the-art classification method and with the SC-LDA algorithm, PSA shows an improvement in prediction accuracy and enhances data integration.
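The abstract names four evaluation measures without defining them. As a reading aid, the sketch below computes precision, recall, fall-out rate, and F-measure from the confusion counts of a single event class, using the standard textbook definitions. It is not taken from the paper: the `ConfusionCounts` class, the `evaluate` function, and the example counts are hypothetical, and the paper may aggregate these measures across classes in a different way.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion counts for one event class (hypothetical helper, not from the paper)."""
    tp: int  # records correctly mapped to the class
    fp: int  # records wrongly mapped to the class
    fn: int  # records of the class that were mapped elsewhere
    tn: int  # records correctly kept out of the class


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict[str, float]:
    """Compute the four measures named in the abstract from standard definitions."""
    precision = _ratio(c.tp, c.tp + c.fp)   # TP / (TP + FP)
    recall = _ratio(c.tp, c.tp + c.fn)      # TP / (TP + FN)
    fall_out = _ratio(c.fp, c.fp + c.tn)    # FP / (FP + TN), the false positive rate
    # F-measure here is F1, the harmonic mean of precision and recall.
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "fall_out": fall_out,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not results from the paper.
    counts = ConfusionCounts(tp=80, fp=20, fn=10, tn=90)
    for name, value in evaluate(counts).items():
        print(f"{name}: {value:.3f}")
```

A lower fall-out rate is better, while higher values are better for the other three measures, which is why the four are usually reported together.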

Updated: 2021-10-12
