Big data quality framework: a holistic approach to continuous quality management
Journal of Big Data (IF 8.1), Pub Date: 2021-05-29, DOI: 10.1186/s40537-021-00468-0
Ikbal Taleb, Mohamed Adel Serhani, Chafik Bouhaddioui, Rachida Dssouli

Big Data is an essential research area for governments, institutions, and private agencies seeking to support their analytics decisions. The term covers how data is collected, processed, and analyzed to generate value-added, data-driven insights and decisions. Degradation in data quality may have unpredictable consequences, because confidence in the data and its source is lost. In the Big Data context, characteristics such as volume, multiple heterogeneous data sources, and fast data generation increase the risk of quality degradation and call for efficient mechanisms to check data worthiness. However, ensuring Big Data Quality (BDQ) is costly and time-consuming, since it requires substantial computing resources. Maintaining quality throughout the Big Data lifecycle requires quality profiling and verification before any processing decision is made. This paper proposes a BDQ Management Framework that enhances pre-processing activities while strengthening data control. The framework introduces a new concept, the Big Data Quality Profile, which captures the quality outline, requirements, attributes, dimensions, scores, and rules. Using the framework's Big Data profiling and sampling components, a faster and more efficient data quality estimation is carried out before and after an intermediate pre-processing phase. The exploratory profiling component plays an initial role in quality profiling: it uses a set of predefined quality metrics to evaluate important data quality dimensions. It generates quality rules by applying various pre-processing activities and their related functions. These rules mainly target the Data Quality Profile and result in quality scores for the selected quality attributes.
The paper discusses the framework implementation and dataflow management across the various quality management processes, and concludes with ongoing work on framework evaluation and deployment to support quality evaluation decisions.
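
The sampling-based estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no metric formulas, so the completeness definition (share of non-missing values), the averaging over repeated random samples, and every name used here (`completeness`, `estimate_quality`, `failing_attributes`, the `requirements` mapping standing in for a quality profile) are assumptions made for illustration only.

```python
import random
from statistics import mean


def completeness(rows, attribute):
    """Share of rows whose value for `attribute` is present (assumed metric)."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(attribute) not in (None, ""))
    return present / len(rows)


def estimate_quality(rows, attributes, sample_size, n_samples=10, seed=0):
    """Estimate per-attribute completeness from repeated random samples.

    Scoring samples instead of the full dataset is what keeps the estimate
    cheap; the mean over samples is the reported score.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    scores = {attribute: [] for attribute in attributes}
    for _ in range(n_samples):
        sample = rng.sample(rows, size)
        for attribute in attributes:
            scores[attribute].append(completeness(sample, attribute))
    return {attribute: mean(values) for attribute, values in scores.items()}


def failing_attributes(scores, requirements):
    """Attributes whose estimated score is below the required minimum.

    `requirements` is a hypothetical stand-in for the requirements part of
    a quality profile: a mapping from attribute name to minimum score.
    """
    return sorted(
        attribute
        for attribute, minimum in requirements.items()
        if scores.get(attribute, 0.0) < minimum
    )


if __name__ == "__main__":
    # Toy dataset: every row has an id, every second row lacks an email.
    data = [
        {"id": i, "email": "user@example.org" if i % 2 == 0 else None}
        for i in range(1000)
    ]
    estimated = estimate_quality(data, ["id", "email"], sample_size=100)
    print(estimated)
    print(failing_attributes(estimated, {"id": 0.95, "email": 0.95}))
```

Attributes returned by `failing_attributes` would be the candidates for pre-processing, after which the same estimate can be rerun to compare scores before and after that phase.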




Updated: 2021-05-30