Inferring trade directions in fast markets

https://doi.org/10.1016/j.finmar.2021.100635

Highlights

  • New algorithm (FI) significantly outperforms established alternatives, particularly at low timestamp precisions.

  • Improvements carry over to applications based on the classification results, e.g. estimation of transaction costs.

  • A risk-averse investor would prefer to use the FI algorithm to estimate transaction costs instead of the Lee-Ready algorithm.

  • The interpolation method and the bulk volume classification algorithm do not offer improvements.

Abstract

The reliability of trade classification algorithms that identify the liquidity demander in financial markets transaction data has been questioned due to an increase in the frequency of quote changes. Hence, this paper proposes a new method. While established algorithms rely on an ad hoc assignment of trades to quotes, the proposed full-information (FI) algorithm actively searches for the quote that matches a trade. The FI algorithm outperforms the existing ones, particularly at low timestamp precision: for data timestamped to seconds, misclassification is reduced by half compared to the popular Lee-Ready algorithm. These improvements also carry over into empirical applications such as the estimation of transaction costs. The recently proposed interpolation method and bulk volume classification algorithm do not offer improvements.

Introduction

The trade direction of the liquidity demanding side of the order flow is a necessary ingredient for many traditional measures of market liquidity (Huang and Stoll, 1996; Fong et al., 2017) and remains a popular indicator of informed trading (see, e.g., Hu, 2014, 2018; Bernile et al., 2016; Muravyev, 2016; Chordia et al., 2019). Typically, studies on such topics rely on trade classification algorithms, most prominently the Lee and Ready (1991) algorithm, to obtain an indicator of the liquidity demanding side of each transaction, the trade initiator.

O'Hara (2015) and Easley et al. (2016) note that the new reality of fast markets poses certain challenges to the reliability of established trade classification algorithms.1 Yet, few attempts have been made to adjust or design new algorithms.

In this paper, I propose a new algorithm, the full-information algorithm, and show that it outperforms the common alternatives, including under the challenging conditions of fast markets. The algorithm is implemented in the increasingly popular Python language and made available on GitHub.2

The established methods, most notably the algorithms of Lee and Ready (1991) (LR), Ellis et al. (2000) (EMO), and Chakrabarty et al. (2007) (CLNV), classify trades based on the proximity of the transaction price to the quotes in effect at the time of the trade. This is problematic due to the increased frequency of order submission and cancellation. With several quote changes taking place at the time of the trade, it is not clear which quotes to select for the decision rule of the algorithm. For example, Angel et al. (2015) record an average of almost 700 quote changes per minute for all stocks in the MTAQ data in 2012.3 In the sample of NASDAQ trade and quote data studied in this paper, I find a median of 17 quote changes per second in which at least one trade occurs.

The problem of imprecise timestamps relative to the frequency of quote changes is not confined to U.S. data or the TAQ equity data. Futures transaction data are often studied with relatively low timestamp granularity (e.g., Bernile et al., 2016), and European transaction data collected under the Markets in Financial Instruments Directive (MiFID) are timestamped only to seconds.4

Chakrabarty et al. (2015) and Panayides et al. (2019) find accuracies of the LR algorithm of around 85% and 79%, respectively. Chakrabarty et al. (2015) use a combination of NASDAQ ITCH and DTAQ data over a three-month period in 2011. Panayides et al. (2019) use data from Euronext timestamped to seconds in 2007–2008. For data from the LSE in 2017 that is timestamped to the microsecond, the authors find an even worse performance of only 46% accuracy.

Older studies analyzing the accuracy of the LR algorithm, as well as the alternative EMO and CLNV algorithms, find classification accuracies ranging from 75% to 93% (e.g., Ellis et al., 2000; Finucane, 2000; Lee and Radhakrishna, 2000; Odders-White, 2000; Theissen, 2001; Chakrabarty et al., 2007). The concern in many of these studies has been a time delay between reported quotes and trades rather than insufficient timestamp granularity. The effect, however, is the same: the true trade-quote correspondence is unknown. The traditional response has been to lag quote times by varying degrees, depending on the sample under study.5

The algorithm proposed in this paper takes a new approach to the issue of unknown trade-quote correspondence. Instead of selecting a single pair of ask and bid quotes before the classification step, it matches the transaction to its corresponding quote at the same time as it is classified. The idea is that a trade executed against the ask must leave its footprint on the ask-side, while a trade against the bid must leave its footprint on the bid-side. Finding these footprints is equivalent to simultaneously finding the quote corresponding to a trade and classifying it.
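
As a rough illustration of this footprint idea (not the paper's actual FI implementation, which is available on GitHub), the following Python sketch assumes that each quote update recorded within a trade's timestamp is given as a (side, price, depth change) tuple, and classifies the trade only when exactly one side of the book shows a matching depth reduction.

```python
# Illustrative sketch of the footprint-matching idea (not the published FI
# implementation). Assumes each quote update within the trade's timestamp is
# given as (side, price, depth_change), where depth_change < 0 means the
# displayed depth at that price was reduced.

def classify_by_footprint(trade_price, trade_size, quote_updates):
    """Return 'buy', 'sell', or None if no unambiguous footprint is found."""
    candidates = set()
    for side, price, depth_change in quote_updates:
        # A trade that consumes standing liquidity reduces depth by its size
        # at the transaction price on the side it executed against.
        if price == trade_price and depth_change == -trade_size:
            candidates.add('buy' if side == 'ask' else 'sell')
    # Only classify when exactly one side shows a matching footprint.
    return candidates.pop() if len(candidates) == 1 else None


# Example: a 100-share trade at 10.01 with a matching ask-side depth reduction.
updates = [('bid', 10.00, +200), ('ask', 10.01, -100), ('ask', 10.02, +300)]
print(classify_by_footprint(10.01, 100, updates))  # -> 'buy'
```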

Recent proposals to counter the problems that modern fast markets pose for trade classification include Easley et al.'s (2012) bulk volume classification (BVC) algorithm and Holden and Jacobsen's (2014) interpolation method. The latter interpolates trade and quote times of imprecisely timestamped data before applying one of the traditional algorithms. The BVC algorithm is a more radical change to trade classification and questions the use of the aggressor flag in the context of extracting informed trading from the order flow altogether. Chakrabarty et al. (2015) and Panayides et al. (2019) show that the LR algorithm outperforms the BVC algorithm with respect to identifying the trade initiator, although the results by Panayides et al. (2019) depend on specific modelling and sample choices.

To evaluate the new algorithm against the LR, EMO, CLNV, and BVC algorithms, I use data from NASDAQ's electronic limit order book. The sample runs from May to July 2011 with a total of over 134 million transactions timestamped to nanoseconds. The data contain the trade direction of the executed standing orders in the limit order book. Hence, the liquidity supplying and demanding side for each transaction is known, which allows me to evaluate the ability of the algorithms to recover this information.

The NASDAQ data, of course, do not contain the same number of trades and quote changes as, for example, the consolidated tape and possibly other high-frequency databases.6 This is, however, not a problem per se as we are interested in the effect of high order submission and cancellation rates relative to the data timestamp precision. To simulate this problem, I truncate the timestamp precision of the NASDAQ data at various frequencies.
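
As a simple illustration of such truncation, timestamps can be floored to a coarser unit before the classification step. The following pandas sketch uses assumed column names and is not the paper's code.

```python
# Hypothetical sketch: truncate nanosecond timestamps to coarser precision
# to simulate lower timestamp granularity. Column names are assumptions.
import pandas as pd

trades = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2011-05-02 09:30:00.123456789", "2011-05-02 09:30:00.987654321"]
    ),
    "price": [10.01, 10.02],
})

# Floor nanosecond timestamps to, e.g., millisecond and second precision.
trades["ts_ms"] = trades["timestamp"].dt.floor("ms")
trades["ts_s"] = trades["timestamp"].dt.floor("s")
print(trades)
```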

I find that the new algorithm outperforms the competing classification algorithms. At every considered timestamp precision, the new algorithm does not perform worse than the others and it offers considerable improvement in classification accuracy at lower timestamp precisions. For the data timestamped to the second, the new algorithm correctly classifies the trade initiator for 95% of the trading volume, compared to 90% for the best competitor, the EMO algorithm. Importantly, I find that the interpolation of timestamps considerably decreases the classification accuracy for the LR, EMO, and CLNV algorithms compared to the traditional approach of working with the last quote before the time of the trade based on the original timestamp. In addition, the BVC algorithm generally performs worse than the traditional approaches.

To give the improvement in classification accuracy more economic meaning, I apply the trade classification methods to the estimation of transaction costs. The transaction costs in turn are used in a portfolio optimization exercise. The results show that an investor with a mean-variance utility function would be willing to forgo up to 33 bps on yearly returns to use the proposed algorithm to estimate transaction costs instead of the LR algorithm.7

The improved accuracy of the proposed algorithm derives from using order book dynamics to identify trade-quote correspondences. The extent to which it can do so depends on the precise data structure, which will vary with different datasets. To demonstrate how improvements can still be achieved under very limited order book information, I repeat the analysis with noise added to transaction and quote timestamps. The setup is chosen to resemble what one might encounter in data from a consolidated tape with different latencies from the various exchanges to the tape. The algorithm continues to outperform existing methods when adapted to this new data structure.
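
As an illustration of how such a data structure might be simulated, the following sketch adds randomly drawn reporting latencies to event timestamps; the latency distribution and its scale are assumptions, not those used in the paper.

```python
# Hedged sketch: add random reporting latency to event timestamps to mimic a
# consolidated tape with varying exchange-to-tape delays. The latency
# distribution and magnitudes here are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
events = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2011-05-02 09:30:00.000100", "2011-05-02 09:30:00.000250"]
    )
})

# Draw a per-event latency (here exponential with a 500-microsecond mean).
latency_us = rng.exponential(scale=500, size=len(events))
events["tape_timestamp"] = events["timestamp"] + pd.to_timedelta(latency_us, unit="us")
print(events)
```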

The remainder of the paper is structured as follows. In Section 2, I introduce the traditional algorithms, as well as the proposed full-information algorithm. In Section 3, I present the NASDAQ TotalView-ITCH data used in the following exercises. The main results on the classification accuracy obtained under the most accurate and granular data structure are presented in Section 4. In Section 5, I introduce adjustments to the proposed algorithm for a less granular and accurate data structure, as one might encounter when working with data from a consolidated tape, and present results on the classification accuracy. In Section 6, the various algorithms are utilized to optimize portfolios under transaction costs. In Section 7, I present a separate comparison to the BVC algorithm, while in Section 8, I outline a few limitations of this study related to the external validity of the results presented here. Finally, I conclude in Section 9.

Section snippets

The LR, EMO, and CLNV decision rules

The LR algorithm compares the transaction price to the mid-point of the ask and bid quote at the time the trade took place. If the transaction price is greater (smaller) than the mid-point, the trade is buyer-(seller-)initiated. If the transaction price is equal to the mid-point, the trade initiator is assigned according to the tick-test: if the transaction price is greater (smaller) than the last price that is not equal to the current transaction price, the trade is buyer-(seller-)initiated.
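
As a concrete illustration, the following Python sketch implements the decision rule just described; the function signature and the handling of cases without a valid reference price for the tick test are assumptions.

```python
# A minimal sketch of the LR decision rule as described above; variable names
# and the treatment of unclassifiable midpoint trades are assumptions.

def lee_ready(price, ask, bid, last_different_price):
    """Classify a trade as 'buy' or 'sell' using the quote rule plus tick test."""
    mid = (ask + bid) / 2
    if price > mid:
        return 'buy'
    if price < mid:
        return 'sell'
    # Midpoint trade: fall back on the tick test against the last price that
    # differs from the current transaction price.
    if last_different_price is None:
        return None
    return 'buy' if price > last_different_price else 'sell'


print(lee_ready(10.03, 10.03, 10.01, None))    # above midpoint -> 'buy'
print(lee_ready(10.02, 10.03, 10.01, 10.025))  # at midpoint, downtick -> 'sell'
```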

Data

The evaluation of the algorithms is based on equity trading data from NASDAQ's electronic limit order book constructed from NASDAQ's TotalView-ITCH data.9 The

Classification accuracy at different timestamp precisions

I analyze the improvements that can be achieved over the traditional algorithms by applying them and my proposed full-information (FI) algorithm to the data with varying timestamp precisions. The traditional algorithms are used in combination with the quote matching rule of using the last quotes from before the time of the trade (denoted by LR, EMO, and CLNV), and using the interpolated time of trades and quotes (denoted by LRi, EMOi, and CLNVi) as proposed by Holden and Jacobsen (2014). The
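
For reference, the "last quote before the trade" matching rule can be implemented with an as-of join. The following sketch uses pandas.merge_asof with assumed column names and is not the paper's code.

```python
# Hedged sketch of the "last quote before the trade" matching rule using
# pandas.merge_asof; column names are assumptions.
import pandas as pd

trades = pd.DataFrame({
    "time": pd.to_datetime(["09:30:01", "09:30:05"]),
    "price": [10.02, 10.00],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["09:30:00", "09:30:04"]),
    "bid": [10.00, 9.99],
    "ask": [10.03, 10.02],
})

# Attach the most recent quote with a timestamp strictly before each trade.
matched = pd.merge_asof(trades, quotes, on="time", allow_exact_matches=False)
print(matched)
```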

Data structure

So far, I have assumed the same level of data granularity (summarized in Data Structure 1) that is provided by the reconstructed limit order book from the NASDAQ TotalView-ITCH data. The advantage of the FI algorithm over the traditional approaches derives from the information offered by this granularity. In this section, I relax the assumptions in Data Structure 1 and present appropriate adjustments to the algorithm.

Specifically, I assume:

Data Structure 2. Aggregated Quote Changes and

Portfolio optimization under transaction costs

To provide an example of how differences in trade classification can influence the results in an application, I use the competing algorithms (excluding the interpolation method) to estimate transaction costs. These estimates in turn are used in a portfolio optimization exercise assuming an investor with mean-variance utility function.15 The differences in the investor's utility obtained under the different
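
As a stylized illustration of this type of exercise, the following sketch maximizes a mean-variance utility net of proportional transaction costs; the utility specification, cost model, and parameter values are assumptions rather than those used in the paper.

```python
# Minimal sketch of a mean-variance optimization with proportional transaction
# costs; the exact utility specification, cost model, and parameters here are
# assumptions.
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.08, 0.05])            # expected returns
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])       # covariance matrix
cost = np.array([0.002, 0.001])        # estimated trading costs per unit traded
w_old = np.array([0.5, 0.5])           # current portfolio weights
gamma = 5.0                            # risk-aversion coefficient

def neg_utility(w):
    # Mean-variance utility net of proportional costs on the rebalancing trades.
    # The absolute-value cost term is non-smooth; a convex solver could be used
    # instead for larger problems.
    tc = cost @ np.abs(w - w_old)
    return -(w @ mu - 0.5 * gamma * w @ sigma @ w - tc)

res = minimize(neg_utility, w_old,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
print(res.x)  # optimal weights given the cost estimates
```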

Comparison to BVC

The bulk volume classification (BVC) algorithm of Easley et al. (2012) is not directly targeted at estimating the trade initiator, as the trade initiator is believed to only insufficiently reflect trading intentions in today's markets. Instead, it is designed to better link the order imbalance with the information contained in the order flow. The measure has become a popular choice in the VPIN literature.
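
To illustrate the mechanics, the following sketch splits each volume bar into buy and sell volume according to the standardized price change over the bar; the bar construction and the choice of a normal CDF here are simplifying assumptions.

```python
# Hedged sketch of the bulk volume classification idea: the volume of each bar
# is attributed to buyers in proportion to the CDF of the standardized price
# change. The distributional choice and bar construction are assumptions.
import numpy as np
from scipy.stats import norm

def bvc_buy_volume(bar_volume, price_change, sigma):
    """Volume in each bar attributed to buyers."""
    buy_frac = norm.cdf(price_change / sigma)
    return bar_volume * buy_frac

volumes = np.array([1000.0, 800.0, 1200.0])   # volume per bar
dp = np.array([0.02, -0.01, 0.00])            # price change per bar
sigma = dp.std(ddof=1)                        # estimated std. dev. of price changes
print(bvc_buy_volume(volumes, dp, sigma))
```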

Despite the different focus of the BVC algorithm, the output produced by it can

Limitations

The ultimate test for the usefulness of the FI algorithm would be to test its external validity by applying the algorithm to the DTAQ data with the transactions from the tape matched to those from NASDAQ's ITCH data as in Chakrabarty et al. (2015). Such an exercise would allow me to evaluate the FI algorithm's performance for one of the most commonly used data sources for research in equities. Similarly, replicating studies that use traditional algorithms would allow me to evaluate the impact

Conclusion

The reliability of traditional trade classification algorithms that identify the trade direction of the liquidity demander in transaction data has been questioned in light of the increase in trading and quotation frequency. This paper, therefore, proposes a new method, the full-information algorithm, and examines its performance against traditional algorithms, such as the Lee-Ready algorithm, and more recent alternatives: the bulk volume classification algorithm of Easley et al. (2012) and the

References (36)

  • R.D. Huang et al., Dealer versus auction markets: a paired comparison of execution costs on NASDAQ and the NYSE, J. Financ. Econ. (1996)
  • S. Kremer et al., Causes and consequences of short-term institutional herding, J. Bank. Finance (2013)
  • O. Ledoit et al., A well-conditioned estimator for large-dimensional covariance matrices, J. Multivariate Anal. (2004)
  • C.M. Lee et al., Inferring investor behavior: evidence from TORQ data, J. Financ. Mark. (2000)
  • E.R. Odders-White, On the occurrence and consequences of inaccurate trade classification, J. Financ. Mark. (2000)
  • M. O'Hara, High frequency market microstructure, J. Financ. Econ. (2015)
  • M.A. Panayides et al., Bulk volume classification and information detection, J. Bank. Finance (2019)
  • M. Perlin et al., On the performance of the tick test, Q. Rev. Econ. Finance (2014)

Acknowledgements

I would like to thank the editor, two anonymous referees, as well as Evangelos Benos, Bidisha Chakrabarty, Geoff Coppins, Emanuel Gasteiger, Dieter Nautz, Nicholas Vause, Gavin Wallis, Georg Weizsäcker and the Seminar/Conference participants at the Bank of England, Freie Universität Berlin, Humboldt University, the CFE-CMStatistic Conference, the CEF Conference, the FMND Workshop and the YFS Conference for their valuable comments. Any views expressed are solely those of the author and so cannot be taken to represent those of the Bank of England or to state Bank of England policy. This paper should therefore not be reported as representing the views of the Bank of England or members of the Monetary Policy Committee, Financial Policy Committee or Prudential Regulation Committee.
