Data governance: Organizing data for trustworthy Artificial Intelligence
Introduction
Organizations in general, and public sector organizations in particular, increasingly collect and use Big and Open Linked Data (BOLD) (Janssen, Matheus, & Zuiderwijk, 2015). The rise of BOLD, combined with machine learning and other forms of Artificial Intelligence (AI), results in the increasing use of Big Data Algorithmic Systems (BDAS). Such systems are used to make decisions on: access to affordable loans for individuals with limited credit files; matching skills to jobs to promote access to employment; administering school admissions while helping individuals choose the right school; and mitigating the risk of disparate treatment of individuals by law enforcement while helping build trust between the public and law enforcement (Executive Office of the President, 2016).
The use of BDAS to improve and open government has been met with considerable enthusiasm. However, BDAS rely heavily on data combined from various sources: some controlled by the organization itself, others by partner organizations, and yet others by unknown entities. Without control over such data to ensure quality and compliance, BDAS would be too risky to be entrusted with consequential decisions. Therefore, many organizations are turning to data governance as a means to exercise control over the quality of their data and over compliance with relevant legal and ethical requirements, in order to guarantee that the resulting decisions are trustworthy. Trustworthiness, which can be directly controlled or indirectly influenced (Yang & Anguelov, 2013), refers to the properties through which a trusted entity serves the interests of the trustor (Levi & Stoker, 2000). In the situation under study, the trustor (an organization) entrusts its system (BDAS, which itself uses BOLD and AI) with making sound decisions.
Data governance is about allocating authority and control over data (Brackett & Earley, 2009) and exercising such authority through decision-making in data-related matters (Plotkin, 2013). To fulfil its goals, data governance should focus not just on data, but on the systems through which data is collected, managed and used. In particular, people are essential in these systems (Benfeldt, Persson, & Madsen, 2020); thus data governance should provide incentives and sanctions to stimulate desirable behaviour of the persons involved in collecting, managing and using data. Beyond a single organization, data governance depends on collaboration between the organizations and persons that make up the system. This multi-organizational context requires trusted frameworks ensuring that the right data is shared securely and reliably between all participating organizations, in compliance with the General Data Protection Regulation (GDPR) (European Parliament and European Council, 2016) and other relevant laws and regulations.
Consistent with this context, we define data governance as:
Organizations and their personnel defining, applying and monitoring the patterns of rules and authorities for directing the proper functioning of, and ensuring the accountability for, the entire life-cycle of data and algorithms within and across organizations.
This definition takes into account both data and data processing by AI and other algorithms, considers that both data and algorithms change during their respective life-cycles, accounts for the personnel responsible for creating and using data and algorithms, and adopts a systems (multi-organizational) view.
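To make the elements of this definition more concrete, the following is a minimal, hypothetical Python sketch; it is an illustration under stated assumptions, not a model proposed in this article. It assumes that data and algorithms share a simple life-cycle (the `Stage` values are invented), that "rules and authorities" can be expressed as a table of which (organization, role) pairs may move an asset into a stage (`RULES`, with made-up organizations `MunicipalityA` and `AgencyB`), and that monitoring and accountability are served by logging every decision, whether allowed or denied.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical life-cycle stages shared by data and algorithms."""
    COLLECT = 1
    PREPARE = 2
    USE = 3
    RETIRE = 4


@dataclass
class Asset:
    """A governed asset: a dataset or an algorithm (kind is 'data' or 'algorithm')."""
    name: str
    kind: str
    owner_org: str
    stage: Stage = Stage.COLLECT
    audit_log: list = field(default_factory=list)


# Defining: hypothetical rules stating which (organization, role) pairs may move
# an asset into each stage. Real rules would derive from policy and law (e.g. GDPR).
RULES = {
    Stage.PREPARE: {("MunicipalityA", "data steward")},
    Stage.USE: {("MunicipalityA", "data steward"), ("AgencyB", "analyst")},
    Stage.RETIRE: {("MunicipalityA", "data owner")},
}


def advance(asset: Asset, org: str, role: str) -> bool:
    """Applying: move the asset to its next stage only if a rule authorizes the actor.

    Monitoring: every attempt, allowed or not, is appended to the audit log,
    so accountability is preserved across organizations.
    """
    if asset.stage == Stage.RETIRE:
        target, allowed = None, False  # nothing follows retirement
    else:
        target = Stage(asset.stage + 1)
        allowed = (org, role) in RULES.get(target, set())
    asset.audit_log.append((org, role, target, allowed))
    if allowed:
        asset.stage = target
    return allowed


model = Asset("loan-eligibility-model", "algorithm", "MunicipalityA")
print(advance(model, "AgencyB", "analyst"))             # False: AgencyB may not prepare
print(advance(model, "MunicipalityA", "data steward"))  # True: COLLECT -> PREPARE
print(advance(model, "AgencyB", "analyst"))             # True: PREPARE -> USE
print(model.stage.name, len(model.audit_log))           # USE 3
```

The same `Asset` type covers both data and algorithms, and the rule table can grant authority to personnel of another organization, which mirrors the definition's life-cycle and multi-organizational scope in the simplest possible form.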
Data governance is a success factor for BDAS (Brous, Janssen, & Krans, 2020) and has an overall positive effect on the performance of organizations that apply BDAS (Zhang, Zhao, & Kumar, 2016). Its purpose is to increase the value of data and to minimize data-related costs and risks (Abraham, Schneider, & vom Brocke, 2019). Given the consequential and repetitive nature of BDAS decision-making, mistakes in data governance that affect the working of such systems can have profound legal, financial and social implications for the organizations involved, for citizens and businesses, and for society at large. Such mistakes can result in systemic bias, unlawful decisions, large financial exposures, political crises, loss of life, or any combination thereof. In an interconnected world, where data is collected by (and about) governments, businesses and citizens and is processed by different entities using various algorithms, dependencies grow, mistakes accumulate, and accountability is gradually lost.
This rationale leads directly to the threefold goal of this article: first, to define and conceptualize data governance for AI-based BDAS; second, to review the challenges of and approaches to such governance; and third, to propose the concept of trusted AI-based BDAS and a framework for data governance for such systems.
The rest of the article is structured as follows. Section 2 introduces the concept of data governance and then data governance for AI-based BDAS. Section 3 outlines different forms of data governance for AI-based BDAS. Section 4 formulates the main proposal: trusted AI-based BDAS and a data governance framework for such systems. The proposal consists of a system-level governance model for BDAS (Section 4.1), data stewardship and base registries as the foundation for data governance (Section 4.2), and a trusted framework and self-sovereign identities for data sharing (Section 4.3). Finally, Section 5 outlines essential data governance principles.
Data governance
Data governance has received scant attention and is often overlooked by organizations in their efforts to realize BDAS and create Fair, Accountable and Transparent (FAT) algorithms. The focus is often on experimenting with AI, while acquiring and preparing data for AI, which typically consumes most of the time, receives less consideration. However, the ubiquitous nature of data, when using large volumes and varieties of data from multiple sources, the uncertain impact of data flows on data quality,
Data governance approaches
A common challenge with data governance is that data flows and logic may not follow the structure of an organization. The mismatch between organizational structure and data usage can easily result in data silos, duplications, unclear responsibilities, and missing control of data over its entire life-cycle. This is particularly the case for BDAS, which typically cross departmental boundaries, are not bound to any single function or process, and have to deal with data originating in
Data governance for trusted BDAS
This section formulates the main proposal of this article: the concept of trusted AI-based BDAS and a framework for data governance for such systems. The proposal consists of three elements: a system-level governance model for BDAS (Section 4.1), data stewardship and base registries (Section 4.2), and a trusted data-sharing framework based on self-sovereign identities and data-sharing agreements (Section 4.3).
Essential data governance principles
Although sound data governance is the foundation of trustworthy BDAS, this area is often overlooked. Data governance for BDAS is a complex field, and developing BDAS without due attention to data governance poses a significant risk. Data governance can be viewed as organizations and their personnel defining, applying and monitoring the patterns of rules and authorities for directing the proper functioning of, and ensuring the accountability for, the entire life-cycle of data and algorithms
References (29)
- et al. Data governance: A conceptual framework, structured review, and research agenda. International Journal of Information Management (2019).
- Stewardship and usefulness: Policy principles for information-based transparency. Government Information Quarterly (2010).
- et al. The challenges and limits of big data algorithms in technocratic governance. Government Information Quarterly (2016).
- et al. Adaptive governance: Towards a stable, accountable and responsive government. Government Information Quarterly (2016).
- et al. Blockchain in government: Benefits and implications of distributed ledger technology for information sharing. Government Information Quarterly (2017).
- Big data: A report on algorithmic systems, opportunity, and civil rights (2016).
- et al. Data governance as a collective action problem. Information Systems Frontiers (2020).
- et al. Ethical and socially-aware data labels. Paper presented at the Annual International Symposium on Information Management and Big Data (2018).
- et al. The DAMA guide to the data management body of knowledge (DAMA-DMBOK guide) (2009).
- et al. Data governance as success factor for data science. Responsible Design, Implementation and Use of Information and Communication Technology: 19th IFIP WG 6.11 Conference on e-Business, e-Services, and e-Society, I3E 2020, Skukuza, South Africa, April 6–8, 2020, Proceedings, Part I, 12066 (2020).
- Managing information sharing and stewardship for public-sector collaboration: A management control approach. Public Management Review.
- Data glitches: Monsters in your data.
- COBIT 5 and enterprise governance of information technology: Building blocks and research opportunities. Journal of Information Systems.
- A first look at identity management schemes on the blockchain. IEEE Security & Privacy.
Marijn Janssen is a full Professor in ICT & Governance and head of the Information and Communication Technology (ICT) research group of the Technology, Policy and Management (TPM) Faculty of Delft University of Technology.
Paul Brous is a researcher at the Information and Communication Technology (ICT) research group of the Technology, Policy and Management (TPM) Faculty of Delft University of Technology.
Elsa Estevez holds the UNESCO Chair on Knowledge Societies and Digital Governance at Universidad Nacional del Sur, is an Independent Researcher at the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), and is a full professor at Universidad Nacional de La Plata, all in Argentina.
Luís Soares Barbosa is the deputy head of UNU-EGOV and full professor at the Department of Informatics at the University of Minho.
Tomasz Janowski is head of the Department of Informatics in Management at the Faculty of Economics and Management, Gdańsk University of Technology, Poland and invited professor at the Department for E-Governance and Administration, Faculty of Business and Globalization, Danube University Krems, Austria.