Attentive statement fraud detection: Distinguishing multimodal financial data with fine-grained attention
Introduction
Financial statement fraud aims to deceive statement users (e.g., regulators and investors) through deliberate misstatements, such as manipulating financial reports and disclosing incorrect financial information. Fraud triangle theory, proposed by Turner et al. [1], explains the reasons for financial statement fraud from three perspectives: pressure, attitude, and opportunity. Companies may commit fraud under pressure from a poor cash position, a loss of customers, and so on [2]. They generally rationalize fraudulent activities by holding the attitude that such activities are not criminal and that competitors are engaging in them as well. Furthermore, weak internal controls, imperfect corporate governance, and mild consequences for violations may give companies the opportunity to commit fraud. Fraudulent behavior can cause adverse effects, including: 1) jeopardizing the reliability and quality of the financial reporting process; 2) diminishing public confidence in the accounting and audit professions; and 3) compromising the efficiency of the capital market [2]. Under these circumstances, both academia and industry focus on financial statement fraud detection (FSFD) to provide early warnings for relevant stakeholders and thereby support their decision processes. Specifically, FSFD helps investors make informed decisions and avoid huge investment losses [3]. Auditors can benefit from FSFD by paying more attention to companies with a high probability of fraud. For government regulators, accurate FSFD helps them judge whether to intervene in the market.
Although FSFD has been dominated by quantitative financial data, recent studies have attempted to leverage the joint effect of relevant multimodal data to enhance FSFD [4]. Specifically, the financial modality (for example, financial ratios extracted from financial statements), with its periodicity, intelligibility, and domain specificity, is one of the most effective and convenient ways to learn about a company's financial position [5]. Meanwhile, the textual modality embedded in financial reports has also been explored for FSFD [6]. Coordinating multimodal information is therefore a natural way to enhance FSFD. Various machine learning methods have been developed for FSFD based on multimodal financial data [7,8]. However, conventional machine learning methods, despite their considerable interpretability, have significant deficiencies in learning powerful representations from unstructured multimodal data. In contrast, deep learning methods show strong advantages in learning from multimodal data: multimodal deep learning methods improve prediction performance by combining multiple data modalities in an end-to-end training process.
As indicated in prior studies [9,10], the success of multimodal deep learning lies in reconciling multiple deep-learning-based feature representations. This motivates exploring the complementary predictive effects of multiple modalities for FSFD. Nevertheless, the fusion ambiguity embedded in and among modalities is often underexplored when multimodal deep learning methods are applied to FSFD. Specifically, for the financial modality, the current-year financial data, along with the previous year's and industry-level financial data, contribute to FSFD [3]. These various types of financial data have been used in FSFD for their complementary predictive roles. However, since different ratio groups naturally have different discriminative power for FSFD, combining them directly may result in fusion ambiguity and subsequently hinder prediction performance. As for the textual modality, researchers have commonly constructed it from the Management Discussion and Analysis (MD&A) of financial reports. The MD&A has been identified as the part most likely to embed fraudulent information because it contains self-furnished narrative content [11]. In this study, we posit that the MD&A and the other chapters all contribute to detecting financial fraud and help reveal deceptive accounting behavior. For example, a change of important leaders is a matter of great importance to a company; manipulators therefore tend to disguise this fact in the Corporate Governance chapter to preserve the apparent reliability of the report. Furthermore, the textual information conveyed in different chapters contributes differently to FSFD, because chapters are designed to accommodate narratives about distinct aspects of the company. This raises the need to treat the chapters distinctly when feeding them into FSFD.
From the perspective of multimodality, existing studies on FSFD have fused multimodal data to explore their complementary predictive effects [12,13]. Unfortunately, directly summing modalities may ignore the fusion ambiguities among them, thereby weakening the predictive ability of the more informative modality in FSFD.
In this regard, the attention mechanism, which generates weighting schemes for the input elements in deep learning, shows potential for exploring modality-wise predictive abilities and coordinating useful multimodal information for FSFD. Existing FSFD studies have applied attention mechanisms to the textual modality to explore multi-level (e.g., word-level, sentence-level, and chapter-level) fusion ambiguity and release its predictive power [14]. However, these attention mechanisms focused on unimodal features, ignoring the distinct predictive power of different modalities. When multimodal features are fused for FSFD, the fusion ambiguity embedded in and among them is a key factor affecting the final prediction performance, yet it has been insufficiently investigated by prior studies. Another consideration in FSFD is the class imbalance problem. Previous studies that constructed one-to-one matched samples for FSFD ignored the fact that fraudulent companies are outnumbered by healthy companies in the capital market [15]. Class imbalance is therefore another issue to be solved when building an accurate FSFD model.
Hence, the aim of this study is to leverage multimodal information for FSFD using a novel method, RCMA, which combines three fine-grained attention mechanisms (i.e., Ratio-aware, Chapter-aware, and Modality-aware Attention mechanisms) to capitalize on the financial modality, the textual modality, and their combination. RCMA takes both the fusion ambiguity and the class imbalance problems into consideration. Meanwhile, weights are assigned to the modalities to support a two-step decision process for relevant stakeholders. First, we extract the financial modality from numerical financial statements and the textual modality from narrative financial reports to capture the multifaceted factors involved in fraudulent behavior. Specifically, we gather financial data from different accounting dimensions and extract the textual modality from all chapters of the financial report to mine useful information exhaustively. Second, for the fusion ambiguity embedded in the financial modality, we design a Ratio-aware (RA) attention mechanism based on a Multilayer Perceptron (MLP) to discriminate the distinctive predictive abilities of the financial ratios for FSFD. For the fusion ambiguity embedded in the textual modality, we design a Chapter-aware (CA) attention mechanism based on Long Short-Term Memory (LSTM), which gives the prediction model a comprehensive view for assigning specific weights to different chapters. Third, we add a Modality-aware (MA) attention layer that coordinates the different predictive abilities of the modalities to realize multimodal fusion. Fourth, to achieve a better multimodal learning effect, we propose a novel learning objective named Focal and Consistency Loss (FCL). It accounts for the class imbalance issue and emphasizes consistency among modalities to increase the reliability of the prediction. We address the following three research questions.
Research Question 1. Does fusing multimodal data improve on the predictive ability of the financial modality or the textual modality alone for FSFD?
Research Question 2. Relative to existing FSFD methods, how effectively can RCMA detect financial statement fraud from multimodal data?
Research Question 3. Can RCMA contribute to people's understanding of FSFD and support their decision processes?
To answer these research questions, we organize the evaluation results around them. First, we answer Research Question 1 and Research Question 2 through separate modality-wise and method-wise comparisons. Second, we conduct an ablation analysis to illustrate how each module of the proposed method (the RA, CA, and MA attention mechanisms) influences the overall performance of RCMA. This also bears on Research Question 2, that is, how our approach improves FSFD performance by combining the individual components of RCMA. Third, we answer Research Question 3 through an interpretation analysis of the attention weights, which intuitively shows the distinct predictive abilities of different ratio groups, chapters, and modalities. It also illustrates how these interpretation results contribute to stakeholders' decision support process for FSFD. Note that the evaluations of the proposed RCMA are based on 2070 firm-year observations collected from the China Security Market Accounting Research Database (CSMARD).
We summarize the main contributions of this paper as follows:
(1) We propose a novel multimodal deep learning framework that considers the fusion ambiguity embedded both in and among multimodal data for FSFD. In the framework, the distinctive predictive abilities of different financial ratio groups in the financial modality, different chapters in the textual modality, and different modalities are coordinated to enhance FSFD. This can inspire future FSFD studies to fuse multiple modalities in a deep learning framework while considering modality-specific characteristics.
(2) We propose a fine-grained attention-based multimodal deep learning method for FSFD. In RCMA, the MLP-based RA and LSTM-based CA attention mechanisms leverage the financial and textual modalities, respectively, while the MA attention mechanism enables adaptive use of multimodal data. Unlike existing methods, which fail to sufficiently distinguish the distinct roles of multimodal data, RCMA allows intra- and inter-modality fusion ambiguity to be effectively reconciled.
(3) We empirically evaluate RCMA on a real-world dataset collected from the CSMARD, and the experimental results verify RCMA's superior FSFD performance. Furthermore, the interpretation analysis visualizes the attention weights that RCMA assigns to different ratio groups, chapters, and modalities, and illustrates how these interpretations influence stakeholders' decision support process for FSFD.
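To make the idea of attention-weighted fusion concrete, the following is a minimal sketch of generic softmax attention pooling over a set of representation vectors. It is not the RCMA implementation: the scoring function (a tanh of a dot product with a scoring vector), the vector named query, and the toy inputs are all assumptions introduced for illustration, since the exact scoring networks are not given in this excerpt. The same pattern could in principle be applied to ratio groups, chapters, or modalities.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    reps: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse equal-length representation vectors with softmax attention.

    reps  : one vector per element (ratio group, chapter, or modality).
    query : hypothetical learned scoring vector (an assumption of this sketch).
    Returns (attention weights, weighted-sum fused vector).
    """
    # One scalar relevance score per element.
    scores = [math.tanh(sum(q * x for q, x in zip(query, rep))) for rep in reps]
    # Normalise the scores so the weights are positive and sum to one.
    weights = softmax(scores)
    dim = len(reps[0])
    # Weighted sum of the input vectors, dimension by dimension.
    fused = [sum(w * rep[i] for w, rep in zip(weights, reps)) for i in range(dim)]
    return weights, fused


if __name__ == "__main__":
    # Toy inputs: a financial and a textual representation of dimension 3.
    financial = [0.9, 0.1, 0.4]
    textual = [0.2, 0.7, 0.3]
    query = [1.0, -0.5, 0.25]  # illustrative values, not learned

    weights, fused = attention_fuse([financial, textual], query)
    print("modality weights:", [round(w, 3) for w in weights])
    print("fused representation:", [round(v, 3) for v in fused])
```

Because the weights are normalised, they can be read directly as the relative contribution of each element, which is the property the interpretation analysis relies on.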
Related work
In this section, we first review previous studies related to multi-modalities in FSFD. Second, we present studies on multimodal deep learning and discuss its connections to FSFD. Third, we introduce prior studies on attention mechanisms and their applications to FSFD.
Framework
To handle the fusion ambiguity embedded simultaneously in and among multimodal features in FSFD, we propose the RCMA model. In particular, RCMA comprises three steps: intra-modal representation, inter-modal fusion, and a learning strategy for FSFD. In intra-modal representation, two modalities are extracted to capture multifaceted factors. Specifically, the financial modality comprises financial ratios that reflect various aspects of financial status. An
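As an illustration of the learning strategy, the sketch below combines a standard binary focal loss, which down-weights easy samples to cope with class imbalance, with a cross-modal consistency penalty. The exact form of the Focal and Consistency Loss is not given in this excerpt, so the consistency term (a squared gap between the per-modality fraud probabilities), the trade-off weight lam, and the default values of alpha and gamma are assumptions chosen for illustration only.

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.75, gamma: float = 2.0) -> float:
    """Binary focal loss for a single sample.

    p     : predicted probability of the fraud class (y = 1).
    alpha : weight of the fraud class (illustrative default).
    gamma : focusing parameter; gamma = 0 gives weighted cross-entropy.
    """
    eps = 1e-12  # guards against log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def consistency_penalty(p_fin: float, p_txt: float) -> float:
    """Assumed consistency term: squared gap between the fraud
    probabilities predicted from the financial and textual modalities."""
    return (p_fin - p_txt) ** 2


def focal_consistency_loss(
    p_fused: float,
    p_fin: float,
    p_txt: float,
    y: int,
    lam: float = 0.5,
    alpha: float = 0.75,
    gamma: float = 2.0,
) -> float:
    """Focal loss on the fused prediction plus a weighted consistency term.

    lam is a hypothetical trade-off weight introduced for this sketch.
    """
    return focal_loss(p_fused, y, alpha, gamma) + lam * consistency_penalty(p_fin, p_txt)


if __name__ == "__main__":
    # A fraudulent sample (y = 1) on which the two modalities disagree.
    print(focal_consistency_loss(p_fused=0.7, p_fin=0.9, p_txt=0.3, y=1))
    # The same fused prediction with agreeing modalities incurs no penalty.
    print(focal_consistency_loss(p_fused=0.7, p_fin=0.6, p_txt=0.6, y=1))
```

The focal term follows the widely used formulation with a class weight and a focusing exponent; other consistency measures, such as a divergence between the modality-level distributions, would fit the same structure.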
Experimental design
In this section, we describe an experiment conducted to investigate whether the proposed RCMA achieves decent prediction performance for FSFD. Specifically, the dataset adopted in the experiment, the metrics used to evaluate prediction performance, and the experimental procedures are presented in detail.
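For readers less familiar with the evaluation metrics, the following is a small sketch of how AUC and the two error types can be computed from predicted fraud scores. It assumes that fraud is the positive class, that Type I error is the share of non-fraudulent firms flagged as fraudulent, and that Type II error is the share of fraudulent firms that are missed; some FSFD studies use the opposite naming, and the convention used in the experiments is not stated in this excerpt. The 0.5 threshold and the toy data are likewise illustrative.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random fraud sample is scored
    above a random non-fraud sample (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rates(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (Type I, Type II) error rates under the assumed convention.

    Type I  : non-fraud firms flagged as fraud / all non-fraud firms.
    Type II : fraud firms not flagged / all fraud firms.
    """
    n_neg = sum(1 for y in labels if y == 0)
    n_pos = sum(1 for y in labels if y == 1)
    if n_neg == 0 or n_pos == 0:
        raise ValueError("error rates need at least one sample of each class")
    false_pos = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    false_neg = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return false_pos / n_neg, false_neg / n_pos


if __name__ == "__main__":
    # Toy imbalanced sample: six healthy firms (0) and two fraudulent firms (1).
    labels = [0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.80, 0.45]
    print("AUC:", auc(scores, labels))
    print("Type I, Type II:", error_rates(scores, labels))
```

AUC is threshold-free, which makes it a convenient headline metric on imbalanced data, whereas the two error rates depend on the chosen decision threshold and expose the trade-off between false alarms and missed fraud.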
Main results
In this section, the means and standard deviations (Std) of AUC are presented in Table 3 Panel A. Moreover, the Type I and Type II errors under different modalities and methods are shown in Table 3 Panel B.
To further clarify the experimental results, the best value among the methods under each modality is presented in boldface. In Table 3 Panel A, the fused multiple modalities Mm are superior to unimodal features across all the methods. In particular, RCMA under Mm achieves the highest AUC value,
Conclusion and future work
With the rise of deep learning models, the representations of different modalities in the multimodal FSFD task can be learned automatically and fused in a well-designed deep network for accurate predictions. Nevertheless, the fusion ambiguity embedded in and among multiple modalities is often neglected when deep multimodal learning is applied to FSFD, which directly weakens the final prediction accuracy. To address this, this study proposes an RCMA method with ratio-aware, chapter-aware,
CRediT authorship contribution statement
Gang Wang: Conceptualization, Methodology. Jingling Ma: Methodology, Software, Formal analysis. Gang Chen: Methodology, Visualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (72071062), the Science Fund for Distinguished Young Scholars of Anhui (2208085J12), the Anhui Provincial Key Research and Development Program (202104a05020038), and the Fundamental Research Funds for the Central Universities (PA2021KCPY0032).
References (52)
- Financial fraud detection using vocal, linguistic and financial cues. Decis. Support. Syst. (2015)
- Intelligent financial fraud detection: a comprehensive review. Comput. Secur. (2016)
- Identification of fraudulent financial statements using linguistic credibility analysis. Decis. Support. Syst. (2011)
- Mining corporate annual reports for intelligent detection of financial statement fraud – a comparative study of machine learning methods. Knowl.-Based Syst. (2017)
- Deep learning for detecting financial statement fraud. Decis. Support. Syst. (2020)
- Detecting evolutionary financial statement fraud. Decis. Support. Syst. (2011)
- Detecting the financial statement fraud: the analysis of the differences between data mining techniques and experts’ judgments. Knowl.-Based Syst. (2015)
- Making words work: using financial text as a predictor of financial events. Decis. Support. Syst. (2010)
- Enhancement of fraud detection for narratives in annual reports. Int. J. Account. Inf. Syst. (2017)
- Detecting financial restatements using data mining techniques. Expert Syst. Appl. (2017)
- Aspect-based sentiment classification with multi-attention network. Neurocomputing.
- Deep learning models for bankruptcy prediction using textual disclosures. Eur. J. Oper. Res.
- Decision support from financial disclosures with deep neural networks and transfer learning. Decis. Support. Syst.
- Corporate failure prediction in the European energy sector: a multicriteria approach and the effect of country characteristics. Eur. J. Oper. Res.
- A two-stage classification technique for bankruptcy prediction. Eur. J. Oper. Res.
- An investigation of bankruptcy prediction in imbalanced datasets. Decis. Support. Syst.
- An analysis of the fraud triangle, The University of Memphis Working Paper
- Financial Statement Fraud: Motives, Methods, Cases and Detection
- MetaFraud: a meta-learning framework for detecting financial fraud. MIS Q.
- Finding needles in a haystack: using data analytics to improve fraud prediction. Account. Rev.
- Leveraging financial social media data for corporate fraud detection. J. Manag. Inf. Syst.
- A survey of automated financial statement fraud detection with relevance to the South African context, South African Comput. J.
- A comprehensive multi-modal NDE data fusion approach for failure assessment in aircraft lap-joint mimics. IEEE Trans. Instrum. Meas.
- Attention-based multi-modal fusion network for semantic scene completion, in
- Accounting variables, deception, and a bag of words: assessing the tools of fraud detection. Contemp. Account. Res.
- Predicting material accounting misstatements. Contemp. Account. Res.
Gang Wang is a professor at the School of Management, Hefei University of Technology. He received his Ph.D. degree in Management Science and Engineering from the School of Management, Fudan University. His current research interests include business analytics, machine learning, fintech, and industrial big data analytics. His work has been published in such journals as MIS Quarterly, Decision Support Systems, Information Processing and Management, IEEE Transactions on Instrumentation and Measurement, IEEE Transactions on Systems, Man, and Cybernetics: Systems, and IEEE Intelligent Systems, and in the proceedings of such conferences as the International Conference on Information Systems (ICIS), Pacific Asia Conference on Information Systems (PACIS), and Hawaii International Conference on System Sciences (HICSS). He serves as an associate editor for Decision Support Systems.
Jingling Ma is a Ph.D. candidate in the School of Management at Hefei University of Technology. Her research interests include business analytics and fintech. Her work has appeared in Applied Soft Computing.
Gang Chen is a Ph.D. student in management science and engineering at Fudan University. His research interests include fintech, social media analytics, and multimodal deep learning. His work has appeared in such journals as MIS Quarterly, IEEE Intelligent Systems, Electronic Commerce Research and Applications, and Applied Soft Computing, and in the proceedings of such conferences as the Pacific Asia Conference on Information Systems (PACIS) and Workshop on Information Technologies and Systems (WITS).