Corporate Bankruptcy Prediction Using Machine Learning Methodologies with a Focus on Sequential Data

Computational Economics

Abstract

We examine whether corporate bankruptcy predictions can be improved by utilizing the recurrent neural network (RNN) and long short-term memory (LSTM) algorithms, which can process sequential data. Employing the RNN and LSTM methodologies improves bankruptcy prediction performance relative to using other classification techniques, such as logistic regression, support vector machine, and random forest methods. Because performance indicators, such as sensitivity and specificity, differ depending on the methodology, selecting a model that suits the purpose of the bankruptcy predictions is necessary. Our ensemble model, a synthesis of all methodologies, exhibits the best forecasting performance. In the test sample for the ensemble model, none of the observations with a default probability of less than 10% defaults within one year.

Availability of data and material

The data used in this study were downloaded from Wharton Research Data Services (subscription required).

Code Availability

This research was conducted using scikit-learn v0.23.1 and TensorFlow v2.3.1 on Python v3.8.5. The code used in the study is available upon request.

Notes

  1. Detailed explanations of these metrics are provided in the Appendix.

  2. This sample period is the best option given our computational resources. However, a preliminary study analyzing the logistic regression, support vector machine, and random forest methodologies from January 1961 to December 2019 finds that the random forest model, which is an ensemble learning model, performs the best. That result is consistent with our current empirical results.

  3. We use this simple method to fill in missing data because correcting missing values is not the main focus of this study. We choose not to drop observations with missing variables because doing so would cost us 16.12% of the total available observations. Moreover, because our analysis requires 12 months of continuous observations, the data loss from dropping all such observations could be even larger.

  4. Although these methodologies do not account for the dynamics of the features, they can still take that information as input. We try using a total of 96 explanatory variables in these models but nevertheless cannot construct forecasting models, mainly owing to memory overflow. The RNN and LSTM methods can retain only the necessary information, taking the order of the explanatory variables into account, whereas the other methodologies need to estimate all relevant parameters. The authors are grateful to an anonymous referee for pointing out this issue.

  5. We thank an anonymous referee for suggesting ways to structure the training data.

  6. In Table 2 and in all subsequent figures and tables, Logistic stands for logistic regression, SVM stands for support vector machine, RF stands for random forest, RNN stands for simple recurrent neural network, LSTM stands for long short-term memory, and Ensemble is a model that averages the predicted probabilities calculated by the individual models.

  7. Rather et al. (2015) also show that a hybrid model that combines RNNs and other linear models performs the best in predicting stock returns.

  8. Because the training period of 2007–2014 includes the global financial crisis, these results are acceptable.

  9. We appreciate an anonymous reviewer’s suggestions regarding this table’s layout.
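
Two ideas from the notes above can be sketched in plain Python: collecting 12 months of continuous observations into one sequence (note 3) and averaging the predicted probabilities of the individual models (note 6). This is a minimal sketch under stated assumptions, not the code used in the study: the function names, the month-index encoding, and the example probabilities are hypothetical.

```python
from statistics import mean


def build_windows(monthly_rows, window=12):
    """Return every run of `window` consecutive monthly feature rows.

    monthly_rows is a list of (month_index, features) pairs for one firm,
    sorted by month_index with no duplicate months. The month_index
    encoding (for example year * 12 + month) is an assumption of this sketch.
    """
    windows = []
    for start in range(len(monthly_rows) - window + 1):
        chunk = monthly_rows[start:start + window]
        months = [month for month, _ in chunk]
        # With sorted, distinct months, the run is continuous exactly
        # when the last month is window - 1 steps after the first.
        if months[-1] - months[0] == window - 1:
            windows.append([features for _, features in chunk])
    return windows


def ensemble_probability(model_probs):
    """Average the predicted probabilities of the individual models.

    model_probs maps a model name to its list of predicted probabilities,
    one per observation, with all lists in the same observation order.
    """
    lengths = {len(probs) for probs in model_probs.values()}
    if len(lengths) != 1:
        raise ValueError("every model must score the same observations")
    # zip(*...) walks the observations; mean() averages across models.
    return [mean(column) for column in zip(*model_probs.values())]


# Hypothetical predicted probabilities for three observations.
example_probs = {
    "Logistic": [0.10, 0.60, 0.30],
    "SVM": [0.20, 0.70, 0.10],
    "RF": [0.00, 0.90, 0.20],
    "RNN": [0.10, 0.80, 0.40],
    "LSTM": [0.10, 0.50, 0.50],
}
ensemble = ensemble_probability(example_probs)

# Hypothetical firm with one missing month (month 12 is absent).
rows = [(m, [float(m)]) for m in list(range(12)) + [13]]
firm_windows = build_windows(rows)
```

In this sketch a window that spans a missing month is simply skipped, which is one way to enforce the continuity requirement; the study itself fills in missing values rather than dropping them.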

References

  • Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589–609

  • Aretz, K., Florackis, C., & Kostakis, A. (2018). Do stock returns really decrease with default risk? New international evidence. Management Science, 64(8), 3821–3842

  • Barboza, F., Kimura, H., & Altman, E. (2017). Machine learning models and bankruptcy prediction. Expert Systems with Applications, 83, 405–417

  • Beaver, W. H. (1966). Financial ratios as predictors of failure. Journal of Accounting Research, 4, 71–111

  • Bonfim, D. (2009). Credit risk drivers: Evaluating the contribution of firm level information and of macroeconomic dynamics. Journal of Banking & Finance, 33(2), 281–299

  • Brogaard, J., Li, D., & Xia, Y. (2017). Stock liquidity and default risk. Journal of Financial Economics, 124(3), 486–502

  • Campbell, J. Y., Hilscher, J., & Szilagyi, J. (2008). In search of distress risk. Journal of Finance, 63(6), 2899–2939

  • Charitou, A., Neophytou, E., & Charalambous, C. (2004). Predicting corporate failure: Empirical evidence for the UK. European Accounting Review, 13(3), 465–497

  • Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357

  • Chen, S., Härdle, W. K., & Jeong, K. (2010). Forecasting volatility with support vector machine-based GARCH model. Journal of Forecasting, 29(4), 406–433

  • Choudhary, M. A., & Haider, A. (2012). Neural network models for inflation forecasting: An appraisal. Applied Economics, 44(20), 2631–2635

  • Cohen, R. B., Polk, C., & Vuolteenaho, T. (2003). The value spread. Journal of Finance, 58(2), 609–641

  • Dakovic, R., Czado, C., & Berg, D. (2010). Bankruptcy prediction in Norway: A comparison study. Applied Economics Letters, 17(17), 1739–1746

  • Du Jardin, P. (2018). Failure pattern-based ensembles applied to bankruptcy forecasting. Decision Support Systems, 107, 64–77

  • Du Jardin, P., Veganzones, D., & Séverin, E. (2019). Forecasting corporate bankruptcy using accrual-based models. Computational Economics, 54(1), 7–43

  • Duan, J., Sun, J., & Wang, T. (2012). Multiperiod corporate default prediction—A forward intensity approach. Journal of Econometrics, 170(1), 191–209

  • Falavigna, G. (2012). Financial ratings with scarce information: A neural network approach. Expert Systems with Applications, 39(2), 1784–1792

  • Figlewski, S., Frydman, H., & Liang, W. (2012). Modeling the effect of macroeconomic factors on corporate default and credit rating transitions. International Review of Economics and Finance, 21(1), 87–105

  • Foreman, R. D. (2003). A logistic analysis of bankruptcy within the US local telecommunications industry. Journal of Economics and Business, 55(2), 135–166

  • García, V., Marqués, A. I., Sánchez, J. S., & Ochoa-Domínguez, H. J. (2019). Dissimilarity-based linear models for corporate bankruptcy prediction. Computational Economics, 53(3), 1019–1031

  • Glover, B. (2016). The expected cost of default. Journal of Financial Economics, 119(2), 284–299

  • Härdle, W., Lee, Y. J., Schäfer, D., & Yeh, Y. R. (2009). Variable selection and oversampling in the use of smooth support vector machines for predicting the default risk of companies. Journal of Forecasting, 28(6), 512–534

  • Herbrich, R., Keilbach, M., Graepel, T., Bollmann-Sdorra, P., & Obermayer, K. (1999). Neural networks in economics. In Computational Techniques for Modelling Learning in Economics (pp. 169–196). Springer, Boston, MA

  • Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780

  • Huang, W., Nakamori, Y., & Wang, S. Y. (2005). Forecasting stock market movement direction with support vector machine. Computers & Operations Research, 32(10), 2513–2522

  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer

  • Jessen, C., & Lando, D. (2015). Robustness of distance-to-default. Journal of Banking & Finance, 50, 493–505

  • Kim, H., Cho, H., & Ryu, D. (2018). An empirical study on credit card loan delinquency. Economic Systems, 42(3), 437–449

  • Kim, H., Cho, H., & Ryu, D. (2019). Default risk characteristics of construction surety bonds. Journal of Fixed Income, 29(1), 77–87

  • Kim, H., Cho, H., & Ryu, D. (2020). Corporate default predictions using machine learning: Literature review. Sustainability, 12(16), 6325

  • Kim, H., Cho, H., & Ryu, D. (2021). Forecasting consumer credit recovery failure: Classification approaches. Journal of Credit Risk, Forthcoming.

  • Kuan, C.-M., & Liu, T. (1995). Forecasting exchange rates using feedforward and recurrent neural networks. Journal of Applied Econometrics, 10(4), 347–364

  • Kukuk, M., & Rönnberg, M. (2013). Corporate credit default models: A mixed logit approach. Review of Quantitative Finance and Accounting, 40(3), 467–483

  • Lee, Y.-C. (2007). Application of support vector machines to corporate credit rating prediction. Expert Systems with Applications, 33(1), 67–74

  • Nam, C., Kim, T., Park, N., & Lee, H. (2008). Bankruptcy prediction using a discrete-time duration model incorporating temporal and macroeconomic dependencies. Journal of Forecasting, 27(6), 493–506

  • Nelson, D. M., Pereira, A. C., & de Oliveira, R. A. (2017). Stock market's price movement prediction with LSTM neural networks. In 2017 International Joint Conference on Neural Networks, IEEE, 1419–1426

  • Odom, M. D., & Sharda, R. (1990). A neural network model for bankruptcy prediction. In 1990 IJCNN International Joint Conference on Neural Networks, IEEE, 163–168.

  • Ohlson, J. A. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109–131

  • Pan, Y., Wang, T. Y., & Weisbach, M. S. (2018). How management risk affects corporate debt. Review of Financial Studies, 31(9), 3491–3531

  • Piri, S., Delen, D., & Liu, T. (2018). A synthetic informative minority over-sampling (SIMO) algorithm leveraging support vector machine to enhance learning from imbalanced datasets. Decision Support Systems, 106, 15–29

  • Rather, A. M., Agarwal, A., & Sastry, V. N. (2015). Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications, 42(6), 3234–3241

  • Selvin, S., Vinayakumar, R., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2017). Stock price prediction using LSTM, RNN and CNN-sliding window model. In 2017 International Conference on Advances in Computing, Communications and Informatics, IEEE, 1643–1647

  • Shumway, T. (2001). Forecasting bankruptcy more accurately: A simple hazard model. Journal of Business, 74(1), 101–124

  • Siami-Namini, S., & Namin, A. S. (2018). Forecasting economics and financial time series: ARIMA vs. LSTM. arXiv:1803.06386

  • Tian, S., Yu, Y., & Guo, H. (2015). Variable selection and corporate bankruptcy forecasts. Journal of Banking & Finance, 52, 89–100

  • Traczynski, J. (2017). Firm default prediction: A Bayesian model-averaging approach. Journal of Financial and Quantitative Analysis, 52(3), 1211–1245

  • Trustorff, J.-H., Konrad, P. M., & Leker, J. (2011). Credit risk prediction using support vector machines. Review of Quantitative Finance and Accounting, 36, 565–581

  • Veganzones, D., & Séverin, E. (2018). An investigation of bankruptcy prediction in imbalanced datasets. Decision Support Systems, 112, 111–124

  • Wilson, R. L., & Sharda, R. (1994). Bankruptcy prediction using neural networks. Decision Support Systems, 11(5), 545–557

  • Yang, Z., Platt, M. B., & Platt, H. D. (1999). Probabilistic neural networks in bankruptcy prediction. Journal of Business Research, 44(2), 67–74

  • Zhou, L. (2013). Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods. Knowledge-Based Systems, 41, 16–25

  • Zmijewski, M. E. (1984). Methodological issues related to the estimation of financial distress prediction models. Journal of Accounting Research, 22, 59–82

Funding

This work was supported by the 2020 Yeungnam University Research Grant.

Author information

Contributions

Proposal & original idea, H.K. and D.R.; conceptualization, H.K. and H.C.; modeling, H.K. and D.R.; methodology, H.K. and H.C.; validation, D.R.; resources, H.C.; software, H.K.; literature review, H.K., H.C., and D.R.; economic & business implication, D.R.; writing—original draft preparation, H.K., H.C., and D.R.; writing—review & editing, H.K. and D.R.; discussion, H.K. and D.R.; project administration, D.R. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Doojin Ryu.

Ethics declarations

Conflict of interest

There is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this study, we use five representative classification performance measures to assess each model’s prediction performance. The first four measures (accuracy, precision, recall, and the F1 score) are calculated as follows:

$$\text{Accuracy}=\frac{TP+TN}{TP+FP+TN+FN} \tag{10}$$
$$\text{Precision}=\frac{TP}{TP+FP} \tag{11}$$
$$\text{Recall}=\frac{TP}{TP+FN} \tag{12}$$
$$\text{F1 score}=\frac{2\times \text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}} \tag{13}$$

where TP stands for true positive and represents the number of non-bankrupt firms classified as non-bankrupt, TN stands for true negative and represents the number of bankrupt firms classified as bankrupt, FP stands for false positive and represents the number of bankrupt firms classified as non-bankrupt, and FN stands for false negative and represents the number of non-bankrupt firms classified as bankrupt. The fifth measure, the area under the receiver operating characteristic curve, is the area under the graph with the recall on the Y-axis and one minus the specificity on the X-axis, where the specificity is calculated as TN/(FP + TN).
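
As an illustration only, the measures defined above can be sketched in plain Python. The function names, the zero-denominator convention, and the example counts are assumptions made for this sketch, not the code used in the study. Labels follow the Appendix convention, in which the positive class (1) is non-bankrupt and the negative class (0) is bankrupt.

```python
def _ratio(numerator, denominator):
    # Assumed convention for this sketch: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 score (Eqs. 10-13) and specificity."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(tn, fp + tn),
    }


def roc_auc(labels, scores):
    """Area under the ROC curve: recall (Y) against 1 - specificity (X).

    labels are 1 (positive, non-bankrupt) or 0 (negative, bankrupt);
    scores are the predicted probabilities of the positive class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are needed to draw a ROC curve")
    # Sweep the threshold from high to low; tied scores move together.
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie:
            points.append((fp / negatives, tp / positives))
    # Trapezoidal rule over consecutive ROC points.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical confusion-matrix counts and scores, for illustration only.
metrics = classification_metrics(tp=90, tn=5, fp=3, fn=2)
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
```

Because the positive class here is non-bankrupt, recall measures how many healthy firms are correctly kept, while specificity measures how many bankrupt firms are correctly flagged; which of the two matters more depends on the purpose of the prediction.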

About this article

Cite this article

Kim, H., Cho, H. & Ryu, D. Corporate Bankruptcy Prediction Using Machine Learning Methodologies with a Focus on Sequential Data. Comput Econ 59, 1231–1249 (2022). https://doi.org/10.1007/s10614-021-10126-5

  • DOI: https://doi.org/10.1007/s10614-021-10126-5
