Interpretable data science for decision making
Introduction
Recent advances in information technology, together with the increased focus on digitalization due to the COVID-19 pandemic, have made it ever more attractive for organizations to collect, store, and analyze various types of (big) company data. Data science is on the rise and is becoming a game changer for many organizations. A recent global survey [1] by McKinsey & Company confirms these trends:
- 1) organizations attribute 20% or more of their earnings before interest and taxes (EBIT) to data science;
- 2) 50% of the respondents claim to have adopted data science in at least one business function; and,
- 3) respondents working in high-performing organizations with respect to data science are 2.3 times more likely than others to consider their C-suite leaders very effective.
The main goal of data science is to support and improve decision-making processes. Organizations' data science roadmaps typically contain applications falling under descriptive (“what has happened?”), diagnostic (“why is it happening?”), predictive (“what will happen?”) and prescriptive (“what should be done to make it happen?”) analytics. Many businesses rely on advanced statistical and machine learning algorithms to support operational decision making across various business domains and processes, including credit risk [2], customer relationship management [3], human resource management [4], finance [5], fraud detection [6], inventory management [7], and fleet management [8], amongst others. However, investments in improving data science capabilities are not always reflected in additional revenues or decreased costs. Decision makers might be reluctant to rely on statistical or machine learning models if it is not immediately clear how their decision outcomes are obtained. They are often skeptical towards data science outcomes and tend to contrast data science-steered decisions with their own business logic and intuitions. Giving insight into the underlying model drivers is therefore a must for personalizing decision-making strategies. Moreover, companies are collecting a wide variety of information, resulting in data that are high dimensional in both the number of observations and the number of variables, and that combine structured data with unstructured data such as text, audio, or images.
Current streams in the data science literature mainly focus on investigating the beneficial impact of data preprocessing methods [9], new data sources like text or audio [10], sophisticated and scalable algorithmic developments [11] or novel statistical evaluation metrics [12]. Although these techniques are highly relevant in the back end of the data science pipeline, we see a practical need and challenging opportunities for more research in bringing the outcomes of the data science pipeline closer to the needs of business decision makers. Indeed, the prevalent focus on data and technology has resulted in a strong emphasis on the data science practice itself, while neglecting the interpretability and usefulness of the resulting outcomes for business users. Interpretable data science analyzes data that summarizes domain relationships to produce knowledge that is readily understandable by human decision makers. Such interpretability is necessary for the following reasons. First, adopting new information technologies is risky and might lead to failure. In order to maximize the chances of success, organizations have to overcome the resistance to change amongst their employees and gain trust [13]. Especially in less mature analytics organizations, interpretable data science offers an insightful solution where business decision makers can understand how decisions are formulated and how these data science insights correspond with existing business knowledge, thereby mitigating the risk of adoption. Second, several analytical applications, for instance credit scoring, are required by law to use analytical solutions that are interpretable. For instance, credit risk professionals must look into interpretable credit scoring models that provide insight into the probability of default (PD), loss given default (LGD) or exposure at default (EAD) in a Basel or IFRS 9 context.
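As a minimal illustration of what such interpretability looks like in practice, consider a logistic-regression-style scorecard, the kind of model credit risk professionals favor for PD estimation. All feature names and coefficients below are hypothetical, invented purely for this sketch and not drawn from any real credit model; the point is that each feature contributes additively to the log-odds of default, so a risk officer can trace exactly how a given PD was obtained.

```python
import math

# Hypothetical scorecard: coefficients are illustrative only, not taken
# from any real credit scoring model. Each feature contributes additively
# to the log-odds of default, which is what makes the model interpretable.
COEFFICIENTS = {
    "intercept": -2.0,
    "debt_to_income": 3.0,      # a higher ratio raises default risk
    "late_payments": 0.8,       # per late payment in the past year
    "years_employed": -0.15,    # employment stability lowers risk
}

def probability_of_default(debt_to_income, late_payments, years_employed):
    """Return the PD together with the per-feature log-odds contributions."""
    contributions = {
        "intercept": COEFFICIENTS["intercept"],
        "debt_to_income": COEFFICIENTS["debt_to_income"] * debt_to_income,
        "late_payments": COEFFICIENTS["late_payments"] * late_payments,
        "years_employed": COEFFICIENTS["years_employed"] * years_employed,
    }
    log_odds = sum(contributions.values())
    pd_estimate = 1.0 / (1.0 + math.exp(-log_odds))
    return pd_estimate, contributions

pd_estimate, parts = probability_of_default(
    debt_to_income=0.4, late_payments=2, years_employed=5
)
print(f"PD = {pd_estimate:.3f}")
for name, value in parts.items():
    # Each line is a human-readable reason behind the score.
    print(f"  {name:>15}: {value:+.2f} log-odds")
```

Because every contribution is visible, a rejected applicant can be told which attributes drove the decision, which is precisely the kind of insight regulators and business users ask for.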
The European Commission introduced the General Data Protection Regulation (GDPR) on May 25th 2018, including the right to explanation. The latter means that any individual has the right to an explanation for any decision made by an algorithm. This constitutes a legal reason why organizations need to better incorporate and deploy interpretable data science algorithms. Third, organizations realize that managing big data analytical solutions is not merely a technological and purely black-box methodological issue. Business users need transparent, reliable, and interpretable insights offered by data science tools to steer their decisions and reduce the associated risk. Extant research [e.g. [14,15]] found that data science methodologies should be designed to seamlessly integrate with business users' capabilities and knowledge in a certain domain. Therefore, interpretable data science solutions are needed for reducing risks and making sound (business) decisions, and optimizing the interaction between humans and data science approaches is a pressing goal. Fourth, data science is considered one of the main drivers of innovation in a variety of business contexts. Literature to date confirms that data science solutions are more efficient and effective than human decision making, but are often prone to algorithmic biases. For instance, [16] show in the context of crowd lending that machines are more efficient than crowd investors and that algorithmic investment decisions improve upon investors' decisions. However, they find suggestive evidence that the machine decisions are biased with respect to gender and race. Another example is given by [17], who discuss the algorithmic failure of Australia's Centrelink RoboDebt scheme. Welfare payments were made using the self-reported fortnightly income and were cross-referenced against a predicted fortnightly income taken as a simple average of the annual earnings reported to the Australian Tax Office.
The latter was used in a DSS to auto-generate debt notices without any further human intervention or explanation. This DSS has proven to be highly discriminatory against the casualized, often lower-paid, workforce. If a construction worker was unable to find work for the first eight months of the financial year but earned 16,000 AUD in the last four months, an automated debt would have been raised against this person. The scheme was a disaster for the Australian government, with more than 470,000 debts wrongfully raised. Therefore, to identify and resolve such algorithmic biases, organizations must have full insight into the functioning and argumentation power of the data science solution.
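The averaging flaw behind the construction-worker example above can be sketched as follows. The figures come from the editorial's example; the 26-fortnight year and the exact split between idle and working fortnights are illustrative assumptions, and the function is not the actual Centrelink implementation.

```python
# Sketch of the RoboDebt-style income-averaging flaw. Figures follow the
# editorial's construction-worker example; the fortnight split is approximate.
FORTNIGHTS_PER_YEAR = 26

def averaged_fortnightly_income(annual_income):
    """Predicted fortnightly income as a simple average of annual earnings."""
    return annual_income / FORTNIGHTS_PER_YEAR

# A worker who earned nothing for the first eight months of the financial
# year and 16,000 AUD over the last four months (roughly 17 idle fortnights
# followed by 9 working ones).
annual_income = 16_000
actual_by_fortnight = [0.0] * 17 + [16_000 / 9] * 9
predicted = averaged_fortnightly_income(annual_income)

# In the idle fortnights the worker truthfully reported zero income, yet the
# averaged prediction asserts roughly 615 AUD per fortnight. The scheme
# auto-converted this mismatch into a debt notice, with no human review.
idle_mismatches = [predicted - actual
                   for actual in actual_by_fortnight if actual == 0.0]
print(f"Predicted per fortnight: {predicted:.0f} AUD")
print(f"Income overstated in {len(idle_mismatches)} idle fortnights")
```

The sketch makes the bias mechanical: any worker with lumpy earnings is misrepresented by the uniform average, while a worker with steady income is not.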
This introduction to the special issue entitled “Interpretable Data Science for Decision Making” contributes to the information systems (IS) literature stream in the following ways. First, we define an interpretable decision support system (iDSS) and contextualize the concept of interpretable data science for decision making. Second, we introduce and summarize the various papers that have been accepted for publication. Finally, we conclude with some directions for further research in the field of interpretable data science for improved decision making.
iDSS: interpretable decision support system
An iDSS is a computerized and interpretable solution used to support decisions in an organization with the goal of generating direct or indirect value. Within the context of this special issue we define the core of the iDSS as an interpretable data science solution. We further frame the iDSS as the marriage between interpretable data science and improved decision making. Fig. 1 visualizes the five core dimensions of an iDSS, which are discussed in detail below.
- Performance: An iDSS is
Overview of contributions to special issue
This section summarizes the ten papers of the special issue and gives insight into how they contribute to the intersection of interpretable data science and improved decision making.
Baesens et al.'s paper concludes that financial institutions increasingly rely upon data-driven methods for developing fraud detection systems and that interpretability is of utmost importance for the management to have confidence in the model and for designing fraud prevention strategies. Moreover, models that
Final notes & areas for further research
The different papers in this special issue show that interpretable data science is a blooming area of research. The focus of research to date lies on the correct tradeoff between the various dimensions of the iDSS defined in Fig. 1. While the prevailing understanding has long been that to increase performance, one must restrict the other dimensions, this special issue shows that this is no longer necessarily the case. The amount of performance that must be traded for an increase in the one or
Acknowledgements
We thank the authors of the papers collected in this special issue and the editor-in-chief Prof. James Marsden for his leadership in supporting this special issue and his comments on this editorial.
References (43)
- et al., Two-stage consumer credit risk modelling using heterogeneous ensemble learning, Decis. Support. Syst. (2019)
- et al., Leveraging fine-grained transaction data for customer life event predictions, Decis. Support. Syst. (2020)
- et al., An approach to operational aircraft maintenance planning, Decis. Support. Syst. (2010)
- et al., A comparative analysis of data preparation algorithms for customer churn prediction: a case study in the telecommunication industry, Decis. Support. Syst. (2017)
- et al., Cost-sensitive business failure prediction when misclassification costs are uncertain: a heterogeneous ensemble selection approach, Eur. J. Oper. Res. (2020)
- et al., Artificial intelligence for decision making in the era of Big Data – evolution, challenges and research agenda, Int. J. Inf. Manag. (2019)
- et al., Algorithmic bias in data-driven innovation in the age of AI, Int. J. Inf. Manag. (2021)
- et al., New insights into churn prediction in the telecommunication sector: a profit driven data mining approach, Eur. J. Oper. Res. (2012)
- et al., A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees, Eur. J. Oper. Res. (2018)
- et al., Performance of classification models from a user perspective, Decis. Support. Syst. (2011)
- Predicting going concern opinion with data mining, Decis. Support. Syst.
- A Bayesian approach for incorporating expert opinions into decision support systems: a case study of online consumer-satisfaction detection, Decis. Support. Syst.
- HOBA: a novel feature engineering methodology for credit card fraud detection with a deep learning architecture, Inf. Sci.
- Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs, Futur. Gener. Comput. Syst.
- From one-class to two-class classification by incorporating expert knowledge: novelty detection in human behaviour, Eur. J. Oper. Res.
- Improved marketing decision making in a customer churn prediction context using generalized additive models, Expert Syst. Appl.
- Benchmarking state-of-the-art classification algorithms for credit scoring: an update of research, Eur. J. Oper. Res.
- Improving direct mail targeting through customer response modeling, Expert Syst. Appl.
- Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers, Expert Syst. Appl.
- The State of AI in 2020
- Employees recruitment: a prescriptive analytics approach via machine learning and mathematical programming, Decis. Support. Syst.