Stock investment strategy combining earnings power index and machine learning

https://doi.org/10.1016/j.accinf.2022.100576

Highlights

  • We proposed using machine-learning models to capture the relationship between Earnings Power Index factors and excess returns.

  • We evaluated the model in predicting stock returns using the hedge portfolio test for the top and bottom 20% of observations.

  • Most portfolios formed with EPI-related variables presented positive returns regardless of the holding period.

  • The proposed approach can provide investors with a profitable mid-term strategy by estimating the probability of return changes.

Abstract

We propose an intermediate-term stock investment strategy based on fundamental analysis and machine learning. The approach uses predictors from the Earnings Power Index (EPI) as input variables, derived from cross-sectional and time-series data in a company’s financial statements. Machine-learning methods allow us to validate the link between financial factors and excess returns directly. We then select stocks whose returns are likely to increase by the time the next financial statement is disclosed. To verify the usefulness of the proposed approach, we use data on companies listed publicly on the Korean stock market from 2013 to 2019. We examine the profitability of the trading strategy under ten machine-learning techniques by forming long, short, and hedge portfolios with three different measures. Most portfolios formed with EPI-related variables present positive returns regardless of the holding period. In particular, a two-layer neural network with a sigmoid function performs best for both the 3-month and 6-month holding periods. Our results show that incorporating machine learning is useful for mid-term stock investment. Further research into the possible convergence of financial statement analysis and machine-learning techniques is warranted.

Introduction

Predicting stock markets is a challenging task for both academics and investors seeking to increase returns on investment. Information that affects stock prices ranges from macroeconomic factors, such as economic growth rates and exchange rates, to firm-specific data. Stakeholders rely on financial statements to obtain important information about future profits and companies’ intrinsic values. As a result, financial statements can help investors make rational decisions and build better investment portfolios (Kothari, 2001, Richardson et al., 2010).

Accounting and finance research has long supported the view that key financial indicators derived from fundamental analyses have significant predictive power for future earnings and explanatory potential with respect to the intrinsic value of a company (Abarbanell and Bushee, 1997, Abarbanell and Bushee, 1998, Lev and Thiagarajan, 1993, Mohanram, 2005, Penman and Zhang, 2002, Penman and Zhang, 2006, Piotroski, 2000; Wieland, 2011; Wahlen and Wieland, 2011). These studies show that financial variables can predict the direction of future earnings, and that an investment strategy that takes advantage of these forecasts can perform well. However, the conventional approach to financial statement analysis has been criticized because it selects variables arbitrarily, without a theoretical basis (Richardson et al., 2010, Shin et al., 2017). To address these drawbacks, Penman and Zhang (2006), who focus on the sustainability of earnings using the residual income model, suggest applying a summary measure known as the predicted earnings increase score (PEIS). Wahlen and Wieland (2011) successfully apply Penman and Zhang’s index. Other studies report that the momentum effect can improve the accuracy of forecasts of the direction of future profit (Fairfield et al., 1996, Hirst et al., 2007). Following these strands of research, Song et al. (2020) propose an Earnings Power Index (EPI), which adds to the PEIS indicators three candidate factors derived from time-series trends. This index provides a comprehensive list of elements that have been shown to predict future earnings. Here, we exploit the individual factors of the EPI.

Previous studies focused on predicting future profit, using logistic or linear regression models to select significant variables (Abarbanell and Bushee, 1997, Lev and Thiagarajan, 1993, Nissim and Penman, 2001, Ou and Penman, 1989, Penman and Zhang, 2006). However, linear models can struggle to identify complex relationships among variables. Meanwhile, as the amount and complexity of data increase and the infrastructure to collect and manage data efficiently becomes available, algorithms tailored to big data become necessary (Fayyad et al., 1996). To this end, machine-learning techniques have been used to estimate the unknown function that connects inputs and outputs. In the field of stock price prediction, machine-learning techniques have been applied through two approaches: fundamental analysis and technical analysis (Ballings et al., 2015, Bustos and Pomares-Quimbaya, 2020, Nti et al., 2020). Technical analysis has been used for predictions based on immediate price trends (Bustos and Pomares-Quimbaya, 2020, Nti et al., 2020), whereas in fundamental analysis, data become available only when financial statements are disclosed, making the approach appropriate for mid- to long-term forecasting.

Feature engineering is an important determinant of final model performance in most machine-learning applications: a small set of carefully derived variables often performs better than many raw variables. In this research, rather than using a larger number of raw variables from a firm’s financial statements (Tsai et al., 2011, Ballings et al., 2015, Bao et al., 2020), we select EPI-related factors as derived variables for stock price prediction. We therefore propose using machine-learning models to examine the relationship between EPI indicators and excess returns.

This study’s methodology differs from earlier research into fundamental analysis in that it integrates financial statement analysis with machine learning. For financial statement analysis, machine learning can capture the complicated relationship between inputs and outputs more effectively than linear models. This integrated approach lets us consider more sophisticated indicators while retaining interpretability grounded in theory. The indicators come from research into the prediction of future earnings (Nissim and Penman, 2001, Penman and Zhang, 2006, Wahlen and Wieland, 2011, Song et al., 2020). To validate the model for predicting the rise in stock returns, we compare the abnormal returns on the shares most likely to increase in price with those on the shares most likely to decrease in price. We follow the hedge portfolio strategy used by Holthausen and Larcker (1992) and the three-factor model of Fama and French (1993) to evaluate the efficiency of the model. However, the forecast and return measurement period is revised to 3 months and 6 months to suit the research purpose of intermediate-term investment. In addition, we use ten machine-learning techniques as predictive models of abnormal returns to directly predict the signs of stock returns. Then, we calculate the hedge portfolio returns from a long position in predicted winners and a short position in predicted losers, and compute four different measures for that portfolio: market-adjusted return, Jensen alpha, size-adjusted return, and Fama-French three-factor model return.
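The hedge portfolio test described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function name, the argument names, and the use of a single market return for the holding period are assumptions made for illustration, and only the market-adjusted measure is shown. The 20% cut-off follows the highlights.

```python
from statistics import mean


def hedge_portfolio_return(predicted_up_prob, stock_returns, market_return, fraction=0.2):
    """Market-adjusted returns of a long-short (hedge) portfolio.

    predicted_up_prob: model scores, one per stock (higher = more likely to rise)
    stock_returns:     realized holding-period returns, in the same order
    market_return:     realized market return over the same holding period
    fraction:          share of observations in each leg (0.2 = top and bottom 20%)
    All names here are hypothetical, chosen for this illustration only.
    """
    if len(predicted_up_prob) != len(stock_returns):
        raise ValueError("scores and returns must have the same length")
    n_leg = int(len(stock_returns) * fraction)
    if n_leg < 1:
        raise ValueError("not enough observations to form both legs")

    # Rank stock indices from highest to lowest predicted probability.
    ranked = sorted(
        range(len(stock_returns)),
        key=lambda i: predicted_up_prob[i],
        reverse=True,
    )
    winners, losers = ranked[:n_leg], ranked[-n_leg:]

    # Market-adjusted return = stock return minus market return.
    long_ret = mean(stock_returns[i] - market_return for i in winners)
    short_ret = mean(stock_returns[i] - market_return for i in losers)

    # The hedge return is the long leg minus the short leg, so it is positive
    # when predicted winners outperform predicted losers.
    return {"long": long_ret, "short": short_ret, "hedge": long_ret - short_ret}
```

A positive hedge value means the predicted winners outperformed the predicted losers over the holding period, which is the property the test is meant to detect.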

This study differs from previous efforts in that machine learning is used to examine the influence of financially validated factors on mid-term investment. Using machine-learning models, we examine whether fundamental analysis can help investors make rational decisions and demonstrate the usefulness of EPI-related information for predicting future returns. We also empirically test whether machine-learning models targeting intermediate-term investments can generate abnormal returns in practice.
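For reference, the four abnormal-return measures named in this introduction are commonly written in the textbook forms below. These are standard definitions, not necessarily the exact specifications estimated in this study. The notation is assumed here: R_{i,t} is the return on stock i, R_{p,t} the portfolio return, R_{m,t} the market return, R_{f,t} the risk-free rate, R_{s(i),t} the return on the size-matched portfolio of stock i, and SMB_t and HML_t the Fama-French size and value factors.

```latex
\begin{align}
\text{Market-adjusted return:}\quad & AR_{i,t} = R_{i,t} - R_{m,t} \\
\text{Jensen's alpha:}\quad & R_{p,t} - R_{f,t} = \alpha_p + \beta_p \,(R_{m,t} - R_{f,t}) + \varepsilon_{p,t} \\
\text{Size-adjusted return:}\quad & SAR_{i,t} = R_{i,t} - R_{s(i),t} \\
\text{Fama-French three-factor:}\quad & R_{p,t} - R_{f,t} = \alpha_p + b_p \,(R_{m,t} - R_{f,t}) + s_p \, SMB_t + h_p \, HML_t + \varepsilon_{p,t}
\end{align}
```

In the two regression-based measures, the intercept \alpha_p is the abnormal return: the part of the portfolio's excess return that the risk factors do not explain.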

The remainder of this paper is organized as follows. Section 2 reviews related research and discusses the direction of the study. Section 3 describes our proposed stock selection strategy, and Section 4 details the experimental results of our strategy. Section 5 presents conclusions and implications for future research.

Section snippets

Prediction of excess returns initiated by factor investment

A factor investment strategy is based on exploring factors that can generate excess profit in a market over the long term. Numerous studies have attempted to identify which factors can explain the value premium, targeting steady profits in the market. The earliest research primarily verifies a rational capital asset pricing model (CAPM) under the efficient market hypothesis, in which a stock price immediately reflects information (Fama, 1970). First, empirical analysis, which connects financial

Proposed approach

This section describes the proposed approach using machine-learning techniques. It consists of three steps from data preparation to stock selection for investing, as summarized in Fig. 1.

Empirical setting

The analysis period is from 2013 to 2019, and we collect quarterly financial data for 1,878 companies listed at the time on the Korea Composite Stock Price Index (KOSPI) and Korea Securities Dealers Automated Quotation (KOSDAQ). There are 690 companies on the KOSPI and 1,188 in the KOSDAQ market. The data cover a wide range of industries (see Table 4) and are processed into 15 variables.

We collect data for 28,399 observations and partition these data into a training sample (of 20,413

Discussion

We confirm that the combination of financial statement analysis and machine learning can be effective. Beyond the experimental results, we also seek to validate the usefulness of the EPI indicators and the profitability of the proposed investment strategy.

Conclusions

We examine the usefulness of machine-learning techniques in intermediate-term investment strategies, combining these techniques with financial indicators that have been proven to predict future profits. We use machine learning to estimate complicated relationships between variables. This study confirms binary classifiers’ performance in predicting stock returns by performing the standard investment hedge portfolio test using the top and bottom 20% of available observations. As a result, most

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (72)

  • C.F. Huang

    A hybrid stock selection model using genetic algorithms and support vector regression

    Appl. Soft Comput. J.

    (2012)
  • K. Ishibashi et al.

    Model Selection for Financial Statement Analysis: Variable Selection with Data Mining Technique

    Procedia Computer Science

    (2016)
  • S.P. Kothari

    Capital markets research in accounting

    J. Account. Econ.

    (2001)
  • M.R. Reinganum

    Misspecification of capital asset pricing: Empirical anomalies based on earnings' yields and market values

    Journal of financial Economics

    (1981)
  • S. Richardson et al.

    Accounting anomalies and fundamental analysis: A review of recent research advances

    J. Account. Econ.

    (2010)
  • P. Rikhardsson et al.

    Business intelligence & analytics in management accounting research: Status and future focus

    Int. J. Account. Inf. Syst.

    (2018)
  • S.V. Stehman

    Selecting and interpreting measures of thematic classification accuracy

    Remote Sens. Environ.

    (1997)
  • T.L. Stober

    Summary financial statement measures and analysts' forecasts of earnings

    Journal of Accounting and Economics

    (1992)
  • S.G. Sutton et al.

    “The reports of my death are greatly exaggerated”—Artificial intelligence research in accounting

    Int. J. Account. Inf. Syst.

    (2016)
  • C.F. Tsai et al.

    Predicting stock returns by classifier ensembles

    Applied Soft Computing Journal

    (2011)
  • J. Zhang et al.

    A novel data-driven stock price trend prediction system

    Expert Systems with Applications

    (2018)
  • J.S. Abarbanell et al.

    Fundamental Analysis, Future Earnings, and Stock Prices

    J. Account. Res.

    (1997)
  • J.S. Abarbanell et al.

    Abnormal returns to a fundamental analysis strategy

    Account. Rev.

    (1998)
  • G. Albanis et al.

    Combining heterogeneous classifiers for stock selection

    Intell. Syst. Accounting, Financ. Manag.

    (2007)
  • Y. Bao et al.

    Detecting accounting fraud in publicly traded US firms using a machine learning approach

    Journal of Accounting Research

    (2020)
  • A. Baranes et al.

    Earning movement prediction using machine learning- Support Vector Machines (SVM)

    J. Manag. Inf. Decis. Sci.

    (2019)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)
  • M.M. Carhart

    On persistence in mutual fund performance

    J. Finance

    (1997)
  • L. Chen et al.

    An Alternative Three-Factor Model

    Behavioral & Experimental Finance eJournal

    (2011)
  • T. Chen et al.

    XGBoost: A scalable tree boosting system, in

  • M.J. Cooper et al.

    Asset growth and the cross-section of stock returns

    J. Finance

    (2008)
  • N. Cristianini et al.

    An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

    Cambridge: Cambridge University Press

    (2000)
  • G. Cybenko

    Approximation by superpositions of a sigmoidal function

    Math. Control Signals Syst.

    (1989)
  • K. Daniel et al.

    Evidence on the characteristics of cross sectional variation in stock returns

    J. Finance

    (1997)
  • P.M. Fairfield et al.

    Accounting classification and the predictive content of earnings

    (1996)
  • E.F. Fama

    Efficient Capital Markets: A Review of Theory and Empirical Work

    J. Finance

    (1970)