Attentive statement fraud detection: Distinguishing multimodal financial data with fine-grained attention

https://doi.org/10.1016/j.dss.2022.113913

Highlights

  • Financial statement fraud jeopardizes the reliability and quality of the financial reporting process.

  • Leveraging multimodal information for financial statement fraud detection has recently become of great interest.

  • We propose a novel attention-based multimodal deep learning method.

  • Experimental results show that RCMA outperformed the state-of-the-art benchmarks.

Abstract

Financial statement fraud committed by listed companies directly jeopardizes the reliability of the financial reporting process. Leveraging multimodal information for financial statement fraud detection (FSFD) has recently become of great interest to academic research and industrial applications. Unfortunately, the predictive ability of multimodal information in FSFD remains largely underexplored, particularly the fusion ambiguity embedded in and among multi-modalities. In this study, we propose a novel attention-based multimodal deep learning method, named RCMA, for accurate FSFD. RCMA synthesizes a fine-grained attention mechanism comprising three innovative attention modules, i.e., ratio-aware attention, chapter-aware attention, and modality-aware attention. The first two attention mechanisms help to liberate the predictive power of the financial modality and the textual modality on FSFD, respectively. Moreover, the proposed modality-aware attention mechanism enables better coordination between the two modalities. Furthermore, to ensure effective learning on the attention-based multimodal embedding, we design a novel loss function named Focal and Consistency Loss, or FCL. It considers class imbalance and modality consistency simultaneously, to specialize the optimization for FSFD. The experimental results on a real-world dataset show that the proposed RCMA outperformed the state-of-the-art benchmarks on the FSFD task. Furthermore, interpretation analysis visualizes the attention weights of different ratio groups, chapters, and modalities from RCMA, and illustrates how these interpretations influence stakeholders' decision process for FSFD.

Introduction

Financial statement fraud aims to deceive statement users (e.g., regulators and investors) with conscious misstatements, such as manipulating financial reports and disclosing incorrect financial information. Fraud triangle theory, proposed by Turner et al. [1], explains the reasons for financial statement fraud from three perspectives: pressure, attitude, and opportunity. Companies may commit fraud under pressure from a poor cash position, a loss of customers, and so on [2]. They generally hold the attitude that such activities are not criminal and that competitors are participating in them, using this logic to rationalize fraudulent activities. Furthermore, weak internal controls, imperfect corporate governance, and low consequences for violations may provide companies with the opportunity to commit fraud. Fraudulent behavior could cause adverse effects including: 1) jeopardizing the reliability and quality of the financial reporting process; 2) diminishing public confidence in the accounting and audit profession; and 3) compromising the efficiency of the capital market [2]. Under these circumstances, both academia and industry focus on financial statement fraud detection (FSFD) to provide early warning for relevant stakeholders, thereby contributing to their decision processes. Specifically, FSFD could help investors make informed decisions and avoid huge investment losses [3]. Auditors can benefit from FSFD by paying more attention to companies with a high probability of fraud. For government regulators, accurate FSFD enables them to judge whether to conduct regulatory intervention in the market.

While FSFD has been dominated by quantitative financial data, recent studies have attempted to leverage the joint effect of relevant multimodal data to enhance FSFD [4]. Specifically, the financial modality (for example, financial ratios extracted from financial statements), with its periodicity, intelligibility, and domain-specificity, is one of the most effective and convenient ways to learn about a company's financial position [5]. Meanwhile, the textual modality embedded in financial reports has also been explored for FSFD [6]. Therefore, coordinating multimodal information to enhance FSFD is inevitable. Various machine learning methods have been developed for FSFD based on multimodal financial data [7,8]. However, traditional machine learning methods, in spite of their considerable interpretability, suffer significant deficiencies in learning powerful representations from unstructured multimodal data. In contrast, deep learning methods show strong advantages in learning from multimodal data. Multimodal deep learning methods improve prediction performance by synergizing multiple data modalities in an end-to-end training process.

As indicated in prior studies [9,10], the success of multimodal deep learning lies in the reconciliation of multiple deep-learning-based feature representations. This motivates the exploration of the complementary predictive effects of multi-modalities for FSFD. Nevertheless, the fusion ambiguity embedded in and among multi-modalities is often underexplored when multimodal deep learning methods are applied to FSFD. Specifically, for the financial modality, current-year financial data, along with previous-year and industry-level financial data, contribute to FSFD [3]. These various types of financial data have been used in FSFD to exert their complementary predictive roles. However, since different ratio groups naturally have different discriminative power in terms of FSFD, their direct combination may result in fusion ambiguity and subsequently hinder the prediction performance. As for the textual modality, researchers have commonly constructed it from the Management Discussion and Analysis (MD&A) of financial reports. The MD&A has been identified as the part most likely to embed fraudulent information, because it contains self-furnished narrative content [11]. In this study, we posit that the MD&A and other chapters all contribute to the detection of financial fraud and help reveal deceptive accounting behaviors. For example, a change of important leaders should be a matter of great importance to the company; therefore, manipulators tend to disguise this fact in the chapter Corporate Governance, to remedy the reporter's reliability. Furthermore, the textual information conveyed in different chapters contributes differently to FSFD. This is because chapters are set to accommodate narratives of distinct aspects within the company. This calls for treating the chapters differently when feeding them into FSFD.

From the perspective of multimodality, existing studies on FSFD have fused multimodal data to explore their complementary predictive effects [12,13]. Unfortunately, the direct summation of modalities may ignore the fusion ambiguities among multiple modalities, hence weakening the predictive ability of the relatively informative modality in FSFD.

In this regard, the attention mechanism, which generates weight schemes for the input elements in deep learning, shows its potential for exploring modality-wise predictive abilities and coordinating useful multimodal information for FSFD. Existing FSFD studies have applied attention mechanisms to the textual modality to explore multi-level (e.g., word-level, sentence-level, and chapter-level) fusion ambiguity, liberating its predictive power [14]. However, these attention mechanisms focused on unimodal features, ignoring the distinct predictive power of different modalities. When fusing multimodal features for FSFD, the fusion ambiguities embedded in and among multimodal features are key factors affecting the final prediction performance. Nevertheless, they have been insufficiently investigated by prior studies. Another consideration in FSFD is the class imbalance problem. Previous studies that constructed one-to-one matching samples for FSFD ignored the fact that the real number of fraudulent companies is smaller than that of healthy companies in the capital market [15]. Therefore, class imbalance is another issue to be solved when building an accurate FSFD model.
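
To make the class-imbalance concern concrete, the sketch below shows a focal-style loss, which down-weights easy examples and up-weights the rare class, combined with a simple penalty on disagreement between the two modality predictions. The exact form of the Focal and Consistency Loss named in the abstract is not given in this excerpt, so the squared-gap consistency term, the function names, and the values of `gamma`, `alpha`, and `lam` are all assumptions for illustration only.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.75):
    """Binary focal loss for one sample.
    p: predicted probability of the fraud class; y: 1 = fraud, 0 = non-fraud.
    gamma and alpha are illustrative defaults, not values from the paper."""
    eps = 1e-12
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    a_t = alpha if y == 1 else 1.0 - alpha    # heavier weight on the rare fraud class
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

def consistency_penalty(p_fin, p_txt):
    """Hypothetical consistency term: squared gap between the fraud
    probabilities predicted from the financial and textual modalities."""
    return (p_fin - p_txt) ** 2

def focal_consistency_loss(p_fused, p_fin, p_txt, y, lam=0.5):
    """Assumed combination: focal loss on the fused prediction plus a
    weighted consistency penalty across the two modalities."""
    return focal_loss(p_fused, y) + lam * consistency_penalty(p_fin, p_txt)

# A confidently wrong prediction on a fraud sample costs far more than a
# confidently right one, and modality disagreement adds a further penalty.
easy = focal_consistency_loss(0.9, 0.8, 0.8, y=1)
hard = focal_consistency_loss(0.1, 0.8, 0.2, y=1)
print(easy, hard)
```

Setting `gamma=0` and `alpha=0.5` reduces `focal_loss` to half of the ordinary binary cross-entropy, which is a convenient sanity check when tuning.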

Hence, the aim of this study is to leverage multimodal information for FSFD, using a novel method, RCMA, which synthesizes three fine-grained attention mechanisms (i.e., Ratio-aware, Chapter-aware, and Modality-aware Attention mechanisms) to capitalize on the financial modality, the textual modality, and their combination. In RCMA, both the fusion ambiguity and class imbalance problems are taken into consideration. Meanwhile, weights are assigned to multi-modalities as a two-step decision process for relevant stakeholders. First, we extracted the financial modality from numerical financial statements and the textual modality from narrative financial reports, to capture the multifaceted factors involved in fraudulent behavior. Specifically, we gathered the financial data from different accounting dimensions. Additionally, we extracted the textual modality from all chapters of the financial report, to exhaustively mine the useful information. Second, for the fusion ambiguity embedded in the financial modality, we designed a Ratio-aware (RA) attention mechanism based on a Multilayer Perceptron (MLP) to discriminate the financial ratios' distinctive predictive abilities for FSFD. To address the fusion ambiguity embedded in the textual modality, we designed a Chapter-aware (CA) attention mechanism based on Long Short-Term Memory (LSTM). This provides a comprehensive view for the prediction model to assign different chapters specific weights. Third, we set a Modality-aware (MA) attention layer to coordinate the modalities' different predictive abilities and realize multimodal fusion. Fourth, to achieve a better multimodal learning effect, we proposed a novel learning objective named Focal and Consistency Loss, or FCL. It considers the class-imbalance issue and highlights the consistency among modalities to increase the reliability of the prediction. We have focused our attention on addressing the following three research questions.

Research Question 1. Does the fusion of multi-modal data improve FSFD predictive ability over the financial modality or the textual modality alone?

Research Question 2. Relative to existing FSFD methods, how effectively can RCMA detect financial statement fraud from multi-modal data?

Research Question 3. Can RCMA contribute to people's understanding of FSFD and support their decision process?

To answer these research questions, we organize the evaluation results around them. First, we answer Research Question 1 and Research Question 2 through separate modality-wise and method-wise comparisons. Second, we conduct an ablation analysis to further illustrate how each module of our proposed method (the RA, CA, and MA attention mechanisms) influences the overall performance of RCMA. This also bears on Research Question 2, that is, how our approach improves FSFD performance by combining the individual components of RCMA. Third, we answer Research Question 3 through an interpretation analysis of the attention weights, which shows the distinct predictive ability of different ratio groups, chapters, and modalities. It also illustrates how these interpretation results contribute to stakeholders' decision support process for FSFD. Note that the evaluations of the proposed RCMA are based on 2070 firm-year observations collected from the China Security Market Accounting Research Database (CSMARD).

We summarize the main contributions of this paper as follows:

(1) We propose a novel multimodal deep learning framework that considers the fusion ambiguity embedded both in and among multimodal data for FSFD. In the framework, the distinctive predictive abilities of different financial ratio groups in the financial modality, different chapters in the textual modality, and different modalities are coordinated to enhance FSFD. This inspires future FSFD studies to fuse multi-modalities into the deep learning framework while considering modality-specific characteristics.

(2) We propose a fine-grained attention-based multimodal deep learning method for FSFD. In RCMA, the MLP-based RA and LSTM-based CA attention mechanisms are designed to leverage the financial and textual modalities, respectively. Meanwhile, the MA attention mechanism realizes an adaptive use of multimodal data. Unlike existing methods that fail to sufficiently distinguish the distinct roles of multimodal data, RCMA allows intra- and inter-modality fusion ambiguity to be effectively reconciled.

(3) We empirically evaluated RCMA based on the real-world dataset collected from the CSMARD, and the experimental results verified the RCMA's superior FSFD performance. Furthermore, interpretation analysis visualizes the attention weights of different ratio groups, chapters, and modalities from RCMA, and illustrates how these interpretations influence stakeholders' decision support process for FSFD.
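
The three attention levels described in these contributions share one pattern: score each input, normalize the scores with a softmax, and take the weighted sum. The sketch below illustrates that pattern only. It is not the authors' implementation: plain dot-product scoring against a query vector stands in for the MLP-based and LSTM-based scorers, and the embeddings, the `query` vector, and the function names are made-up assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, query):
    """Weight each vector by softmax(dot(vector, query)) and return the
    weighted sum together with the attention weights."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

# Toy inputs (hypothetical numbers, not taken from the paper).
ratio_groups = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.8]]  # three ratio-group embeddings
chapters = [[0.5, 0.5], [0.1, 0.9]]                  # two chapter embeddings
query = [1.0, 0.5]                                   # stand-in for learned parameters

fin_vec, ratio_w = attention_pool(ratio_groups, query)         # ratio-aware step
txt_vec, chapter_w = attention_pool(chapters, query)           # chapter-aware step
fused, modality_w = attention_pool([fin_vec, txt_vec], query)  # modality-aware step

print("ratio weights:", ratio_w)
print("chapter weights:", chapter_w)
print("modality weights:", modality_w)
```

Because each set of weights sums to one, the weights can be read directly as relative importance, which is the property the interpretation analysis relies on.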

Section snippets

Related work

In this section, we first review previous studies related to multi-modalities in FSFD. Second, we present studies about multimodal deep learning and discuss its connections to FSFD. Third, we introduce prior studies on attention mechanisms and their applications to FSFD.

Framework

To handle the fusion ambiguity embedded simultaneously in and among multi-modal features in FSFD, we propose the RCMA model. In particular, RCMA comprises three steps: intra-modal representation, inter-modal fusion, and the learning strategy for FSFD. In intra-modal representation, two modalities are extracted to capture multifaceted factors. Specifically, the financial modality comprises financial ratios that reflect various aspects of financial status. An

Experimental design

In this section, an experiment was conducted to investigate whether the proposed RCMA can achieve a decent prediction performance for FSFD. Specifically, the dataset adopted in the experiment, the metrics used to evaluate the prediction performance, and the experimental procedures, are presented in detail.

Main results

In this section, the means and standard deviations (Std) of AUC are presented in Table 3 Panel A. Moreover, Type I and Type II errors under different modalities and methods are shown in Table 3 Panel B.
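
As a reminder of how these metrics are computed, the following sketch derives AUC and the two error rates from predicted fraud scores. It assumes the statistical convention with fraud as the positive class (Type I error = share of non-fraud firms flagged as fraud, Type II error = share of fraud firms missed); the paper's own definitions are not shown in this excerpt and may differ. The scores, labels, and the 0.5 threshold are made up for illustration.

```python
def auc(scores, labels):
    """AUC as the probability that a random fraud sample is scored above a
    random non-fraud sample (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def error_rates(scores, labels, threshold=0.5):
    """Return (type_i, type_ii) under the assumed convention:
    type_i  = false positives / all non-fraud samples,
    type_ii = false negatives / all fraud samples."""
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return fp / labels.count(0), fn / labels.count(1)

# Toy predictions: 1 = fraud, 0 = non-fraud (hypothetical data).
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
labels = [1, 1, 0, 0, 0]

print("AUC:", auc(scores, labels))
print("Type I, Type II:", error_rates(scores, labels))
```

Unlike the error rates, AUC does not depend on the chosen threshold, which is one reason it is reported separately.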

To further clarify the experimental results, the best values of methods under each modality are presented in boldface. In Table 3 Panel A, the fused multiple modalities Mm show superiority over unimodal features across all the methods. In particular, RCMA under Mm achieves the highest AUC value,

Conclusion and future work

With the rise of deep learning models, in terms of the multimodal FSFD task, the representations of different modalities can be learned automatically and fused in a well-designed deep network for accurate predictions. Nevertheless, the fusion ambiguity embedded in and among multiple modalities is often neglected when applying deep multimodal learning in FSFD, directly weakening the final prediction accuracy. In this situation, this study proposes an RCMA method with ratio-aware, chapter-aware,

CRediT authorship contribution statement

Gang Wang: Conceptualization, Methodology. Jingling Ma: Methodology, Software, Formal analysis. Gang Chen: Methodology, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (72071062), Science Fund for Distinguished Young Scholars of AnHui (2208085J12), Anhui Provincial Key Research and Development Program (202104a05020038), and Fundamental Research Funds for the Central Universities (PA2021KCPY0032).

References (52)

  • Q. Xu et al. Aspect-based sentiment classification with multi-attention network. Neurocomputing (2020)
  • F. Mai et al. Deep learning models for bankruptcy prediction using textual disclosures. Eur. J. Oper. Res. (2019)
  • M. Kraus et al. Decision support from financial disclosures with deep neural networks and transfer learning. Decis. Support. Syst. (2017)
  • M. Doumpos et al. Corporate failure prediction in the European energy sector: a multicriteria approach and the effect of country characteristics. Eur. J. Oper. Res. (2017)
  • P. du Jardin. A two-stage classification technique for bankruptcy prediction. Eur. J. Oper. Res. (2016)
  • D. Veganzones et al. An investigation of bankruptcy prediction in imbalanced datasets. Decis. Support. Syst. (2018)
  • J. Turner et al. An analysis of the fraud triangle. The University of Memphis Working Paper (2003)
  • K. Nguyen. Financial Statement Fraud: Motives, Methods, Cases and Detection (2010)
  • A. Abbasi et al. MetaFraud: a meta-learning framework for detecting financial fraud. MIS Q. (2012)
  • J.L. Perols et al. Finding needles in a haystack: using data analytics to improve fraud prediction. Account. Rev. (2017)
  • W. Dong et al. Leveraging financial social media data for corporate fraud detection. J. Manag. Inf. Syst. (2018)
  • W.T. Mongwe et al. A survey of automated financial statement fraud detection with relevance to the South African context. South African Comput. J. (2020)
  • S. De et al. A comprehensive multi-modal NDE data fusion approach for failure assessment in aircraft lap-joint mimics. IEEE Trans. Instrum. Meas. (2013)
  • S. Li et al. Attention-based multi-modal fusion network for semantic scene completion, in
  • L. Purda et al. Accounting variables, deception, and a bag of words: assessing the tools of fraud detection. Contemp. Account. Res. (2015)
  • P.M. Dechow et al. Predicting material accounting misstatements. Contemp. Account. Res. (2011)

    Gang Wang is a professor at the School of Management, Hefei University of Technology. He received his Ph.D. degree in Management Science and Engineering from the School of Management, Fudan University. His current research interests include business analytics, machine learning, fintech, and industrial big data analytics. His work has been published in such journals as MIS Quarterly, Decision Support Systems, Information Processing and Management, IEEE Transactions on Instrumentation and Measurement, IEEE Transactions on Systems, Man, and Cybernetics: Systems, and IEEE Intelligent Systems, and in the proceedings of such conferences as the International Conference on Information Systems (ICIS), Pacific Asia Conference on Information Systems (PACIS), and Hawaii International Conference on System Sciences (HICSS). He serves as an associate editor for Decision Support Systems.

    Jingling Ma is a Ph.D. candidate in the School of Management at Hefei University of Technology. Her research interests include business analytics and fintech. Her work has appeared in Applied Soft Computing.

    Gang Chen is a Ph.D. student in management science and engineering at Fudan University. His research interests include fintech, social media analytics, and multimodal deep learning. His work has appeared in such journals as MIS Quarterly, IEEE Intelligent Systems, Electronic Commerce Research and Applications, and Applied Soft Computing, and in the proceedings of such conferences as the Pacific Asia Conference on Information Systems (PACIS) and Workshop on Information Technologies and Systems (WITS).
