Decision Support Systems

Volume 150, November 2021, 113664

Interpretable data science for decision making

https://doi.org/10.1016/j.dss.2021.113664

Abstract

This paper describes the foundations of interpretable data science for decision making and serves as an editorial to the corresponding special issue. Interpretable data science analyzes data that summarizes domain relationships to produce knowledge that is readily understandable by human decision makers. To this end, we contextualize the current role of interpretable data science in improved business decision making and introduce the notion of an interpretable decision support system (iDSS). We discuss five underlying characteristics of an iDSS, i.e., performance, scalability, comprehensibility, justifiability and actionability. The paper further zooms in on pertinent data science decisions in the input, processing and output stages of designing an iDSS. For each contributing paper in this special issue, we describe its major contributions to the field of interpretable data science for decision making.

Introduction

Recent advances in information technology, together with the increased focus on digitalization due to the COVID-19 pandemic, have made it ever more attractive for organizations to collect, store and analyze various types of (big) company data. Data science is on the rise and is becoming a game changer for many organizations. A recent global survey [1] by McKinsey & Company confirms these trends:

  1) organizations attribute 20% or more of their earnings before interest and taxes (EBIT) to data science;

  2) 50% of the respondents claim to have adopted data science in at least one business function; and

  3) respondents working in organizations that perform strongly on data science are 2.3 times more likely than others to consider their C-suite leaders very effective.

The main goal of data science is to support and improve decision-making processes. Organizations' data science roadmaps typically contain applications falling under descriptive (“what has happened?”), diagnostic (“why are things happening?”), predictive (“what will happen?”) and prescriptive (“what should be done to make things happen?”) analytics. Many businesses rely on advanced statistical and machine learning algorithms to support operational decision making across various business domains and processes, including credit risk [2], customer relationship management [3], human resource management [4], finance [5], fraud detection [6], inventory management [7] and fleet management [8], amongst others. However, investments in improving data science capabilities are not always reflected in additional revenues or decreased costs. Decision makers might be reluctant to rely on statistical or machine learning models if it is not immediately clear how their decision outcomes are obtained. They are often skeptical towards data science outcomes and tend to contrast data science-steered decisions with their own business logic and intuition. Giving insight into the underlying model drivers is therefore a must in helping to personalize decision-making strategies. This challenge is compounded by the fact that companies collect a wide variety of information, resulting in data that is high dimensional in terms of both observations and variables, and that combines structured data with unstructured data such as text, audio or images.

Current streams in the data science literature mainly focus on investigating the beneficial impact of data preprocessing methods [9], new data sources like text or audio [10], sophisticated and scalable algorithmic developments [11], or novel statistical evaluation metrics [12]. Although these techniques are highly relevant in the back end of the data science pipeline, we see a practical need and challenging opportunities for more research in bringing the outcomes of the data science pipeline closer to the needs of business decision makers. Indeed, the prevalent focus on data and technology has resulted in a strong emphasis on the data science practice itself, while neglecting the interpretability and usefulness of the resulting outcomes to business users. Interpretable data science analyzes data that summarizes domain relationships to produce knowledge that is readily understandable by human decision makers. It is necessary for the following reasons.

First, adopting new information technologies is risky and might lead to failure. To maximize the chances of success, organizations have to overcome resistance to change amongst their employees and gain trust [13]. Especially in less mature analytics organizations, interpretable data science offers an insightful solution: business decision makers can understand how decisions are formulated and how these data science insights correspond with existing business knowledge, which mitigates the risk of adoption.

Second, several analytical applications, for instance credit scoring, are required by law to rely on interpretable analytical solutions. Credit risk professionals, for example, must use interpretable credit scoring models that provide insight into the probability of default (PD), loss given default (LGD) or exposure at default (EAD) in a Basel or IFRS 9 context. Moreover, the General Data Protection Regulation (GDPR), introduced by the European Commission on May 25th, 2018, includes a right to explanation, meaning that any individual has the right to an explanation for any decision made by an algorithm. Together, these regulations give organizations strong legal reasons to incorporate and deploy interpretable data science algorithms.

Third, organizations realize that managing big data analytical solutions is not merely a technological and purely black-box methodological issue. Business users need the transparent, reliable and interpretable insights offered by data science tools to steer their decisions and reduce the associated risk. Extant research [e.g. [14,15]] found that data science methodologies should be designed to seamlessly integrate with business users' capabilities and knowledge in a certain domain. Interpretable data science solutions are therefore needed to reduce risk and make sound (business) decisions, and optimizing the interaction between human and data science approaches is a pressing goal.

Fourth, data science is considered one of the main drivers of innovation in a variety of business contexts. The literature to date confirms that data science solutions are more efficient and effective than human decision making, but they are often prone to algorithmic biases. For instance, [16] show in the context of crowd lending that machines are more efficient than crowd investors and that the investment decisions made by algorithms improve upon investors' decisions; however, they find suggestive evidence that the machine decisions are biased with respect to gender and race.
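As a minimal illustration of how such a bias can surface, the hypothetical Python sketch below audits a lending model's approval rates across gender groups; the data, column names and the 0.8 cut-off (the "four-fifths rule" from U.S. adverse-impact practice) are illustrative assumptions, not taken from [16].

```python
# Hypothetical bias audit for a lending model's decisions. All data and
# names below are illustrative assumptions, not from the cited study.
import pandas as pd

# Toy approval decisions produced by a hypothetical lending model.
decisions = pd.DataFrame({
    "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
    "approved": [0, 0, 1, 0, 1, 1, 0, 1],
})

# Approval rate per group.
rates = decisions.groupby("gender")["approved"].mean()
print(rates)

# Demographic parity ratio: lowest group approval rate divided by the
# highest. The "four-fifths rule" flags ratios below 0.8 as potential
# adverse impact.
parity_ratio = rates.min() / rates.max()
print(f"Demographic parity ratio: {parity_ratio:.2f}")
if parity_ratio < 0.8:
    print("Potential adverse impact: inspect the model's drivers.")
```

Such an audit only detects a disparity; explaining and resolving it requires insight into the model's drivers, which is exactly what an interpretable solution provides.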
Another example is given by [17], who discuss the algorithmic failure of Australia's Centrelink RoboDebt scheme. Welfare payments were made on the basis of self-reported fortnightly income, which was cross-referenced against a predicted fortnightly income computed as a simple average of the annual earnings reported to the Australian Tax Office. This prediction was used in a DSS to auto-generate debt notices without any further human intervention or explanation. The DSS proved highly discriminatory against the casualized, often lower-paid, workforce: if a construction worker was unable to find work for the first eight months of the financial year but earned 16,000 AUD in the last four months, an automated debt would have been raised against this person. The scheme was a disaster for the Australian government, with more than 470,000 debts wrongfully raised. To identify and resolve such algorithmic biases, organizations must therefore have full insight into the functioning and argumentation power of the data science solution.
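The construction worker example can be made concrete with a short sketch of the income-averaging logic described above. The threshold-based payment rule, fortnight counts and benefit amounts are illustrative assumptions; the real scheme was considerably more complex.

```python
# Minimal sketch of the flawed RoboDebt income-averaging logic. The payment
# rule and all amounts are illustrative assumptions.

FORTNIGHTS_PER_YEAR = 26

# No income for the first 8 months (~17 fortnights); 16,000 AUD earned over
# the last 4 months (~9 fortnights).
actual_income = [0.0] * 17 + [16000 / 9] * 9

# RoboDebt's prediction: a simple average of the annual earnings reported
# to the Australian Tax Office, spread evenly over every fortnight.
predicted = sum(actual_income) / FORTNIGHTS_PER_YEAR  # ~615 AUD per fortnight

# Hypothetical welfare rule: full benefit when fortnightly income is below
# a threshold, nothing otherwise.
THRESHOLD, BENEFIT = 500.0, 250.0

def benefit_paid(income: float) -> float:
    return BENEFIT if income < THRESHOLD else 0.0

# Benefits actually paid on truthful self-reported fortnightly income.
paid = sum(benefit_paid(i) for i in actual_income)
# Entitlement recomputed by the DSS using the averaged income.
entitled = benefit_paid(predicted) * FORTNIGHTS_PER_YEAR

# The averaged ~615 AUD exceeds the threshold in *every* fortnight, so the
# DSS concludes the worker was never entitled and auto-raises a debt for
# the full amount legitimately received.
print(f"Paid on actual income:      {paid:8.2f} AUD")
print(f"'Entitlement' from average: {entitled:8.2f} AUD")
print(f"Auto-generated debt:        {paid - entitled:8.2f} AUD")
```

Averaging 16,000 AUD over 26 fortnights yields roughly 615 AUD per fortnight, contradicting the truthfully reported zero income in the work-free months and triggering a wrongful debt for everything the worker legitimately received.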

This introduction to the special issue entitled “Interpretable Data Science for Decision Making” contributes to the information systems (IS) literature in the following ways. First, we define an interpretable decision support system (iDSS) and contextualize the concept of interpretable data science for decision making. Second, we introduce and summarize the various papers that have been accepted for publication. Finally, we conclude with directions for further research in the field of interpretable data science for improved decision making.


iDSS: interpretable decision support system

An iDSS is a computerized and interpretable solution used to support decisions in an organization, with the goal of generating direct or indirect value. Within the context of this special issue, we define the core of the iDSS as an interpretable data science solution. We further zoom in on the iDSS as the marriage between interpretable data science and improved decision making. Fig. 1 visualizes the five core dimensions of an iDSS, which are discussed in detail below.

  • Performance: An iDSS is

Overview of contributions to the special issue

This section summarizes the ten papers of the special issue and gives insight into how they contribute to the intersection of interpretable data science and improved decision making.

Baesens et al.'s paper concludes that financial institutions increasingly rely upon data-driven methods for developing fraud detection systems and that interpretability is of utmost importance for the management to have confidence in the model and for designing fraud prevention strategies. Moreover, models that

Final notes & areas for further research

The different papers in this special issue show that interpretable data science is a blooming area of research. The focus of research to date lies on striking the right tradeoff between the various dimensions of the iDSS defined in Fig. 1. While the prevailing understanding has long been that increasing performance requires restricting the other dimensions, this special issue shows that this is no longer necessarily the case. The amount of performance that must be traded for an increase in the one or

Acknowledgements

We thank the authors of the papers collected in this special issue and the editor-in-chief Prof. James Marsden for his leadership in supporting this special issue and his comments on this editorial.

References (43)

  • D. Martens et al., Predicting going concern opinion with data mining, Decis. Support Syst. (2008).
  • K. Coussement et al., A Bayesian approach for incorporating expert opinions into decision support systems: a case study of online consumer-satisfaction detection, Decis. Support Syst. (2015).
  • X. Zhang et al., HOBA: a novel feature engineering methodology for credit card fraud detection with a deep learning architecture, Inf. Sci. (2021).
  • Y. Lucas et al., Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs, Future Gener. Comput. Syst. (2020).
  • D. Oosterlinck et al., From one-class to two-class classification by incorporating expert knowledge: novelty detection in human behaviour, Eur. J. Oper. Res. (2020).
  • K. Coussement et al., Improved marketing decision making in a customer churn prediction context using generalized additive models, Expert Syst. Appl. (2010).
  • S. Lessmann et al., Benchmarking state-of-the-art classification algorithms for credit scoring: an update of research, Eur. J. Oper. Res. (2015).
  • K. Coussement et al., Improving direct mail targeting through customer response modeling, Expert Syst. Appl. (2015).
  • K. Coussement et al., Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers, Expert Syst. Appl. (2009).
  • T. Balakrishnan et al., The State of AI in 2020.
  • D. Pessach et al., Employees recruitment: a prescriptive analytics approach via machine learning and mathematical programming, Decis. Support Syst. (2020).