A Tailor-made Data Quality Approach for Higher Educational Data
Journal of Data and Information Science (IF 1.5), Pub Date: 2020-07-09, DOI: 10.2478/jdis-2020-0029
Cinzia Daraio¹, Renato Bruni¹, Giuseppe Catalano¹, Alessandro Daraio¹, Giorgio Matteucci¹, Monica Scannapieco², Daniel Wagner-Schuster³, Benedetto Lepori⁴

Abstract
Purpose: This paper addresses the definition of data quality procedures for knowledge organizations such as Higher Education Institutions. Its main purpose is to present the flexible approach developed for monitoring the data quality of the European Tertiary Education Register (ETER) database, to illustrate how it works, and to highlight the main challenges that remain in this domain.
Design/methodology/approach: The proposed data quality methodology is based on two kinds of checks: one assesses the consistency of cross-sectional data, and the other evaluates the stability of multiannual data. The methodology has an operational and empirical orientation: the checks do not assume any theoretical distribution when determining the threshold parameters that identify potential outliers, inconsistencies, and errors in the data.
Findings: We show that the proposed cross-sectional and multiannual checks help identify outliers and extreme observations and detect ontological inconsistencies not described in the available metadata. For this reason, they can be a useful complement to the processing of the available information.
Research limitations: The coverage of the study is limited to European Higher Education Institutions. The cross-sectional and multiannual checks are not yet fully integrated.
Practical implications: Considering the quality of the available data and information is important for data-quality-aware empirical investigations, because it highlights problems and the areas in which to invest to improve the coverage and interoperability of data in future data collection initiatives.
Originality/value: The data-driven quality checks proposed in this paper may serve as a reference for building and monitoring the data quality of new databases, or of existing databases for other countries or systems, whose units of analysis are highly heterogeneous and complex, without relying on pre-specified theoretical distributions.
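The two kinds of checks described in the abstract can be illustrated with a minimal sketch. It is not the ETER procedure, whose exact threshold rules are not given here; purely as an assumption for illustration, it uses Tukey's interquartile-range fences computed from the empirical quartiles of the observed values, a rule that likewise avoids assuming a theoretical distribution. The cross-sectional check is assumed to compare a ratio of two variables (e.g. students per staff) across institutions in one year, and the multiannual check to compare year-on-year relative changes. The function names (`empirical_bounds`, `cross_sectional_check`, `multiannual_check`), the variables `students` and `staff`, the institution codes, the data, and the multiplier `k = 1.5` are all hypothetical.

```python
from statistics import quantiles

# Hypothetical illustration only: Tukey IQR fences stand in for the
# data-driven thresholds of the ETER checks, which are not specified here.


def empirical_bounds(values, k=1.5):
    """Return (lower, upper) Tukey fences from the empirical quartiles.

    No theoretical distribution is assumed; the bounds depend only on the
    observed values. Requires at least two values.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cross_sectional_check(records, num, den, k=1.5):
    """Flag units whose ratio num/den is extreme within a single year.

    records maps a unit id to a dict of that year's variables; units with
    a missing or zero denominator are skipped.
    """
    ratios = {uid: r[num] / r[den] for uid, r in records.items() if r.get(den)}
    lower, upper = empirical_bounds(list(ratios.values()), k)
    return sorted(uid for uid, x in ratios.items() if x < lower or x > upper)


def multiannual_check(series, k=1.5):
    """Flag (unit, year) pairs whose year-on-year relative change is extreme.

    series maps a unit id to {year: value}; changes from a zero value are
    skipped. The fences are computed over the changes of all units pooled.
    """
    changes = {}
    for uid, by_year in series.items():
        years = sorted(by_year)
        for prev, cur in zip(years, years[1:]):
            if by_year[prev]:
                changes[(uid, cur)] = (by_year[cur] - by_year[prev]) / by_year[prev]
    lower, upper = empirical_bounds(list(changes.values()), k)
    return sorted(key for key, x in changes.items() if x < lower or x > upper)


# Hypothetical institutions: students and academic staff in one year.
records = {
    "A": {"students": 1000, "staff": 100},
    "B": {"students": 1200, "staff": 100},
    "C": {"students": 900, "staff": 100},
    "D": {"students": 1100, "staff": 100},
    "E": {"students": 1050, "staff": 100},
    "F": {"students": 5000, "staff": 100},
}

# Hypothetical staff headcounts over three years.
series = {
    "A": {2014: 100, 2015: 104, 2016: 108},
    "B": {2014: 200, 2015: 210, 2016: 205},
    "C": {2014: 50, 2015: 51, 2016: 250},
}

print(cross_sectional_check(records, "students", "staff"))  # ['F']
print(multiannual_check(series))  # [('C', 2016)]
```

The multiplier `k` acts as the threshold parameter: a larger value flags fewer units. With only a handful of units, as in this toy data, a single extreme value also inflates the upper quartile, so in practice the fences would be computed over the full population of institutions.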

Updated: 2020-07-09