Algorithmic methods to explore the automation of the appraisal of structured and unstructured digital data
Records Management Journal ( IF 0.8 ) Pub Date : 2020-07-03 , DOI: 10.1108/rmj-09-2019-0049
Basma Makhlouf Shabou , Julien Tièche , Julien Knafou , Arnaud Gaudinat

This paper describes interdisciplinary research conducted in Switzerland at the Geneva School of Business Administration (HES-SO) and supported by the State Archives of Neuchâtel (Office des archives de l'Etat de Neuchâtel, OAEN). The problem addressed is a classic one: how to extract and discriminate relevant data within a large volume of records whose formats and contents are diverse and complex. The goal of the study is to provide a framework and a proof of concept for software that helps in taking defensible decisions on the retention and disposal of records and data offered to the OAEN. To this end, the authors designed two axes: an archival axis, which proposes archival metrics for the appraisal of structured and unstructured data, and a data mining axis, which proposes algorithmic methods as complementary or additional metrics for the appraisal process.

Built on these two axes, this exploratory study designs archival metrics paired with data mining metrics and tests their feasibility, in order to make the digital appraisal process as systematic, or even automatic, as possible. Under Axis 1, the authors followed three steps. First, they designed a conceptual framework for the appraisal of records and data based on a detailed three-dimensional approach (trustworthiness, exploitability, representativeness), and defined the main principles and postulates to guide the operationalization of these conceptual dimensions. Second, the operationalization produced metrics expressed as variables, supported by a quantitative method for their measurement and scoring. Third, the authors shared the conceptual framework, with its dimensions and operationalized variables (metrics), with experienced professionals for validation. The experts' feedback gave the authors an indication of the relevance and the feasibility of these metrics, two aspects that may demonstrate the acceptability of such a method in real-life archival practice. In parallel, Axis 2 proposes functionalities that cover not only macro analysis of data but also the algorithmic methods needed to compute digital archival and data mining metrics. On that basis, three use cases were proposed to illustrate plausible scenarios for the application of such a solution.

The main results demonstrate the feasibility of measuring the value of data and records with a reproducible method. More specifically, for Axis 1, the authors applied the metrics in a flexible and modular way and defined the main principles needed to enable a computational scoring method. The expert consultation on the relevance of 42 metrics indicates an acceptance rate above 80%. In addition, the results show that 60% of all metrics can be automated. Regarding Axis 2, 33 functionalities were developed and proposed under six main types: macro analysis, micro analysis, statistics, retrieval, administration and, finally, decision modeling and machine learning. The relevance of the metrics and functionalities rests on the theoretical validity and computational character of their method. These results are largely satisfactory and promising.

This study offers a valuable aid for improving the validity and performance of archival appraisal processes and decision-making. The transferability and applicability of these archival and data mining metrics could be considered for other types of data: an adaptation of the method and its metrics could be tested on research data, medical data or banking data.
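
The abstract does not spell out the quantitative scoring method, so the following is only a minimal sketch of how metric scores might be combined per dimension in a modular way. The three dimension names come from the abstract; the metric names, the weights, the normalization to [0, 1] and the weighted-mean aggregation are all assumptions made for illustration and are not the authors' method.

```python
from dataclasses import dataclass

# The three appraisal dimensions named in the abstract.
DIMENSIONS = ("trustworthiness", "exploitability", "representativeness")


@dataclass(frozen=True)
class Metric:
    name: str        # hypothetical metric name, not taken from the paper
    dimension: str   # must be one of DIMENSIONS
    weight: float    # hypothetical relative weight within its dimension
    score: float     # measurement normalized to [0, 1] (assumed convention)


def dimension_scores(metrics):
    """Return the weighted mean score per dimension.

    Dimensions without any metric are left out, so metrics can be
    added or removed without changing the rest of the computation.
    """
    totals = {}
    for m in metrics:
        if m.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {m.dimension}")
        if not 0.0 <= m.score <= 1.0:
            raise ValueError(f"score out of range for {m.name}")
        if m.weight <= 0:
            raise ValueError(f"weight must be positive for {m.name}")
        weight_sum, weighted_score_sum = totals.get(m.dimension, (0.0, 0.0))
        totals[m.dimension] = (
            weight_sum + m.weight,
            weighted_score_sum + m.weight * m.score,
        )
    return {d: ws / w for d, (w, ws) in totals.items()}


def appraisal_score(metrics):
    """Unweighted mean of the per-dimension scores (an assumed aggregation)."""
    scores = dimension_scores(metrics)
    if not scores:
        raise ValueError("at least one metric is required")
    return sum(scores.values()) / len(scores)


# Hypothetical metrics for a single record set.
example = [
    Metric("checksum_valid", "trustworthiness", 1.0, 1.0),
    Metric("metadata_complete", "trustworthiness", 1.0, 0.5),
    Metric("open_format", "exploitability", 2.0, 0.5),
]
print(dimension_scores(example))  # {'trustworthiness': 0.75, 'exploitability': 0.5}
print(appraisal_score(example))   # 0.625
```

Keeping each metric as an independent record is one way to reflect the "flexible and modular" application described above: an institution could drop metrics that cannot be automated, or change the weights, without touching the aggregation code.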

Updated: 2020-07-03