Estimation of high-dimensional integrated covariance matrix based on noisy high-frequency data with multiple observations

https://doi.org/10.1016/j.spl.2020.108996

Abstract

In this paper, we study the estimation of the integrated covariance matrix from noisy high-frequency data with multiple transactions, using random matrix theory. We further prove that the proposed estimator is asymptotically optimal for portfolio selection.

Introduction

The estimation of the population covariance matrix is a fundamental problem in statistics. It is well known that, even for i.i.d. (independent and identically distributed) samples, the sample covariance matrix is not a consistent estimator in the high-dimensional setting, where the dimension $p$ and the sample size $n$ go to infinity proportionally. Hence, a large body of work has addressed this problem. Another challenge in estimating covariance matrices is that the i.i.d. condition may be too strong for practical use, especially in financial applications. For example, with the rapid development of computer science, tick-by-tick high-frequency data have become increasingly available. It is commonly assumed that the latent log price process, denoted by $(X_t)$, follows the diffusion model

$$dX_t = \mu_t\,dt + \Theta_t\,dW_t, \qquad t \in [0,1], \tag{1}$$

where $X_t$ is a $p$-dimensional log price process, $\mu_t$ is a $p$-dimensional drift process, $\Theta_t$ is a $p \times p$ matrix called the covolatility process, and $W_t$ is a $p$-dimensional standard Brownian motion. The interval $[0,1]$ is the period of interest, say, one trading day. The covariation of interest between asset returns is the so-called integrated covariance (ICV) matrix, defined as

$$\Sigma = \int_0^1 \Theta_t \Theta_t^T\,dt.$$

The ICV matrix plays a crucial role in financial applications such as portfolio optimization and risk management. Motivated by the wide applicability of high-frequency data, in this paper we consider the estimation of the ICV matrix in the high-dimensional setting.
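The following is a hedged illustration of the diffusion model and the ICV matrix. It is not the paper's estimator. The sketch assumes constant drift $\mu$ and constant covolatility $\Theta$, so that $\Sigma = \Theta\Theta^T$. It simulates $X_t$ with an Euler scheme on an equally spaced grid and compares $\Sigma$ with the realized covariance $\sum_k \Delta X_k \Delta X_k^T$. That quantity approximates the ICV only when the latent prices are observed without noise and $p$ is small. The function names are hypothetical.

```python
import numpy as np

def simulate_log_prices(theta, mu, n, rng):
    """Euler scheme for dX_t = mu dt + Theta dW_t on [0, 1] with n equal steps.

    Assumes constant drift `mu` (shape (p,)) and constant covolatility `theta`
    (p x p), so the integrated covariance is simply theta @ theta.T.
    """
    p = theta.shape[0]
    dt = 1.0 / n
    dW = rng.standard_normal((n, p)) * np.sqrt(dt)      # Brownian increments
    dX = mu * dt + dW @ theta.T                         # row k: mu dt + Theta dW_k
    return np.vstack([np.zeros(p), np.cumsum(dX, axis=0)])  # X_0 = 0

def realized_covariance(X):
    """Sum of outer products of consecutive log-price increments."""
    dX = np.diff(X, axis=0)
    return dX.T @ dX

rng = np.random.default_rng(0)
theta = np.array([[0.20, 0.00], [0.10, 0.15]])
mu = np.array([0.05, 0.02])
X = simulate_log_prices(theta, mu, n=100_000, rng=rng)
icv = theta @ theta.T
rcv = realized_covariance(X)
print(np.round(icv, 4))
print(np.round(rcv, 4))
```

With many noise-free observations and $p = 2$, the two matrices agree to a few decimal places. The difficulties discussed next are what break this simple picture.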

Estimating the ICV matrix is very difficult. The first difficulty is high dimensionality. When the dimension $p$ is of the same order of magnitude as the sample size $n$, it is impossible to consistently estimate $O(p^2)$ free parameters from a data set of $np$ observations, which is itself of order $p^2$. Hence, a special structure, such as sparsity (e.g. Fan et al., 2013), is usually imposed on the covariance matrix. When no particular structure is imposed, random matrix theory is an effective tool for analyzing high-dimensional covariance matrices (see Bai and Silverstein, 2010 for details). The second difficulty is microstructure noise. In practice, observed high-frequency data are always contaminated by market microstructure noise, which is induced by various frictions in the trading process. The accumulated microstructure noise severely distorts statistical inference about the latent price process, so the estimator of the ICV matrix has to be built from the noisy observations. The third difficulty is multiple transactions. In tick-by-tick transaction data, there is often more than one record within a single recording time interval (see Figure 1 in the supplement for details). With multiple transactions, one issue is that the order of transactions within each recording time interval is unavailable or incorrectly recorded. Another issue is asynchronous trading, meaning that different stocks are not traded synchronously within each time interval.
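The first two difficulties can be made concrete with a small simulation sketch. Python/NumPy is our choice here; the paper contains no code.

- **Part (a)** draws i.i.d. Gaussian data with identity covariance. The sample eigenvalues spread roughly over the Marchenko-Pastur interval $[(1-\sqrt{c})^2, (1+\sqrt{c})^2]$ with $c = p/n$, instead of concentrating at 1.
- **Part (b)** assumes, for simplicity, a single asset with constant volatility and i.i.d. Gaussian noise with standard deviation $\omega$. The realized variance of the noisy prices then picks up an extra term of about $2m\omega^2$, which grows with the number $m$ of observations.

All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# (a) High dimensionality: the true covariance is the identity, yet the sample
# eigenvalues spread out when p/n does not vanish.
p, n = 200, 400
Z = rng.standard_normal((n, p))
S = Z.T @ Z / n                      # sample covariance (mean known to be zero)
eig = np.linalg.eigvalsh(S)
c = p / n
print("sample eigenvalue range:", eig.min().round(3), eig.max().round(3))
print("Marchenko-Pastur edges:", round((1 - c**0.5) ** 2, 3), round((1 + c**0.5) ** 2, 3))

# (b) Microstructure noise (assumed i.i.d. Gaussian): the realized variance of
# noisy prices is inflated by roughly 2 * n_obs * omega^2.
n_obs, sigma, omega = 23_400, 0.2, 0.001       # n_obs equally spaced points in [0, 1]
dX = sigma * np.sqrt(1.0 / n_obs) * rng.standard_normal(n_obs)
X = np.concatenate([[0.0], np.cumsum(dX)])     # latent log prices
Y = X + omega * rng.standard_normal(n_obs + 1) # observed Y_t = X_t + eps_t
rv_clean = np.sum(np.diff(X) ** 2)
rv_noisy = np.sum(np.diff(Y) ** 2)
print("integrated variance:", sigma**2)
print("RV of latent prices:", round(rv_clean, 4))
print("RV of noisy prices:", round(rv_noisy, 4),
      "vs sigma^2 + 2*n_obs*omega^2 =", sigma**2 + 2 * n_obs * omega**2)
```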

In this paper, we consider the estimation of the high-dimensional ICV matrix based on noisy high-frequency data with multiple transactions. Using random matrix theory, we propose a nonlinear shrinkage estimator of the ICV matrix. It retains the eigenvectors and nonlinearly shrinks the eigenvalues of a generalized sample covariance matrix based on self-normalized returns. We show that the proposed estimator has two desirable properties. First, it eliminates the impacts of both microstructure noise and multiple transactions. Second, its limiting nonlinear shrinkage function depends solely on the limiting spectral distribution of the generalized sample covariance matrix. For financial applications, we further prove that the proposed estimator is asymptotically optimal for portfolio selection.
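The sketch below illustrates the general shape of an estimator that keeps the sample eigenvectors and replaces the eigenvalues. It rests on simplifying assumptions:

- i.i.d. Gaussian returns with a known diagonal population covariance;
- an ordinary sample covariance in place of the paper's generalized sample covariance based on self-normalized returns;
- the infeasible oracle eigenvalues $u_i^T\Sigma u_i$, which minimize the Frobenius loss among estimators of the form $U\,\mathrm{diag}(d)\,U^T$ built on the sample eigenvectors $U$.

This is not the paper's estimator, which has to work without knowing $\Sigma$. It only shows what this rotation-equivariant structure can achieve, measured by Frobenius loss and by the true variance of the global minimum-variance (GMV) portfolio $w = C^{-1}\mathbf{1}/(\mathbf{1}^T C^{-1}\mathbf{1})$. Using the GMV portfolio as the portfolio-selection criterion is our choice for illustration.

```python
import numpy as np

def rotation_equivariant(U, d):
    """Keep the eigenvectors U and replace the eigenvalues by d: U diag(d) U^T."""
    return (U * d) @ U.T

def gmv_weights(cov):
    """Global minimum-variance weights w = C^{-1} 1 / (1^T C^{-1} 1)."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()

rng = np.random.default_rng(2)
p, n = 100, 200
sigma = np.diag(np.linspace(1.0, 5.0, p))          # assumed true covariance
Z = rng.standard_normal((n, p)) @ np.sqrt(sigma)   # rows ~ N(0, sigma); entrywise sqrt ok for diagonal sigma
S = Z.T @ Z / n                                    # sample covariance (zero mean known)
lam, U = np.linalg.eigh(S)

# Oracle eigenvalues u_i^T sigma u_i: infeasible in practice, uses the true sigma.
d_oracle = np.einsum("ji,jk,ki->i", U, sigma, U)
sigma_oracle = rotation_equivariant(U, d_oracle)

w_opt = gmv_weights(sigma)
for name, est in [("sample", S), ("oracle shrinkage", sigma_oracle), ("true sigma", sigma)]:
    w = gmv_weights(est)
    print(f"{name:16s}  Frobenius loss {np.linalg.norm(est - sigma):7.3f}  "
          f"true GMV variance {w @ sigma @ w:.5f}")
```

The sample covariance $S$ is itself of the form $U\,\mathrm{diag}(\lambda)\,U^T$, so the oracle's Frobenius loss can never exceed that of $S$. The portfolio comparison, by contrast, is empirical and depends on the random draw.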

Notation

For any matrix $A$, we use $\|A\|_F = \sqrt{\operatorname{tr}(AA^T)}$ to denote its Frobenius norm. For any $p \times p$ Hermitian matrix $A$, the empirical spectral distribution (ESD) is defined as

$$F^A(x) = \frac{1}{p}\sum_{j=1}^{p} I\big(\lambda_j(A) \le x\big), \qquad x \in \mathbb{R},$$

where $\lambda_j(A)$, $j = 1, \ldots, p$, are the eigenvalues of $A$ and $I(\cdot)$ denotes the indicator function. The limit of the ESD as $p \to \infty$, if it exists, is referred to as the limiting spectral distribution (LSD). The Stieltjes transform of a function $G$ of bounded variation is defined by

$$m_G(z) = \int \frac{1}{\lambda - z}\,dG(\lambda), \qquad z \in \mathbb{C} \setminus \operatorname{Supp}(G),$$

where $\operatorname{Supp}(G)$ denotes the support of $G$. For any vector $x$, $\|x\|$ stands for its Euclidean norm, and $\|A\|$ denotes the spectral norm of a matrix $A$. Let $\operatorname{diag}(A)$ be the diagonal matrix obtained by setting the off-diagonal entries of $A$ to zero.
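Here is a small NumPy sketch of these definitions; the helper names are hypothetical. It evaluates the ESD $F^A(x)$ and the Stieltjes transform of the ESD, $m_{F^A}(z) = \frac{1}{p}\sum_j (\lambda_j(A) - z)^{-1}$. It checks the latter against the resolvent form $\frac{1}{p}\operatorname{tr}\big((A - zI)^{-1}\big)$, and also shows the Frobenius norm, spectral norm, and $\operatorname{diag}(A)$ conventions on a random symmetric matrix.

```python
import numpy as np

def esd(A, x):
    """Empirical spectral distribution F^A(x) = (1/p) * #{j : lambda_j(A) <= x}."""
    return np.mean(np.linalg.eigvalsh(A) <= x)

def stieltjes_esd(A, z):
    """Stieltjes transform of F^A at z: (1/p) * sum_j 1 / (lambda_j(A) - z)."""
    return np.mean(1.0 / (np.linalg.eigvalsh(A) - z))

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                  # real symmetric, hence Hermitian
p = A.shape[0]
z = 0.3 + 1.0j                     # Im z > 0, so z lies outside the (real) support

m_resolvent = np.trace(np.linalg.inv(A - z * np.eye(p))) / p
print("Stieltjes transform:", stieltjes_esd(A, z), m_resolvent)
print("F^A(0):", esd(A, 0.0))
print("Frobenius norm:", np.sqrt(np.trace(A @ A.T)), np.linalg.norm(A, "fro"))
print("spectral norm:", np.linalg.norm(A, 2))
print("diag(A):\n", np.diag(np.diag(A)))   # off-diagonal entries set to zero
```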


Observations and main results

In practice, instead of the latent log price process $(X_t)$, we observe the contaminated data $Y_t = X_t + \varepsilon_t$, where $(X_t)$ is given in (1) and $(\varepsilon_t)$ denotes the noise process. In the presence of multiple transactions, $L_i^{(q)}$ denotes the number of transactions of the $q$th stock at recording time $t_i = i/n$, for $q = 1, \ldots, p$ and $i = 1, \ldots, n$. For any process $(V_t)$ (which can be $(X_t)$, $(Y_t)$, or $(\varepsilon_t)$), suppose that $V_{i,j}^{(q)}$ is the $j$th observation of the $q$th stock during the time interval $(t_{i-1}, t_i]$, with $j = 1, \ldots, L_i^{(q)}$, $q = 1, \ldots, p$ and $i = 1, \ldots, n$. The
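To illustrate the indexing $V_{i,j}^{(q)}$ and $L_i^{(q)}$, here is a hypothetical helper that buckets one stock's ticks into the recording intervals $(t_{i-1}, t_i]$. To avoid floating-point errors at interval boundaries, it assumes integer timestamps (for example, seconds after the open) and a day length divisible by $n$. Both are assumptions for this sketch, and the toy ticks are made up. The within-interval index $j$ simply follows input order, since the true trade order may be unavailable.

```python
from collections import defaultdict

def group_by_interval(ticks, day_seconds, n):
    """Hypothetical helper: bucket one stock's ticks into n recording intervals.

    ticks: iterable of (s, value) with integer timestamps s in (0, day_seconds].
    Interval i covers (t_{i-1}, t_i] with t_i = i * day_seconds / n.
    Returns {i: [V_{i,1}, ..., V_{i,L_i}]}; j is just the order in `ticks`.
    """
    if day_seconds % n:
        raise ValueError("this sketch assumes day_seconds is a multiple of n")
    width = day_seconds // n
    groups = defaultdict(list)
    for s, v in ticks:
        i = -(-s // width)          # integer ceil(s / width): s in (t_{i-1}, t_i]
        groups[i].append(v)
    return dict(groups)

def transaction_counts(groups, n):
    """L_i for i = 1..n (0 when no trade falls in the interval)."""
    return [len(groups.get(i, [])) for i in range(1, n + 1)]

# Two stocks, a 60-second "day" split into n = 3 intervals of 20 seconds.
stock_1 = [(5, 10.00), (12, 10.01), (20, 10.02), (33, 10.00), (59, 9.99)]
stock_2 = [(25, 50.10), (40, 50.12), (41, 50.11), (60, 50.15)]
n = 3
for q, ticks in enumerate([stock_1, stock_2], start=1):
    g = group_by_interval(ticks, 60, n)
    print(q, transaction_counts(g, n), g)
```

In this toy input, stock 1 has three trades in the first interval while stock 2 has none. This is the asynchronous-trading issue described in the Introduction.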

Proofs

Proof of Theorem 1

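Let $F^{\breve S}$ be the LSD of $\breve S_M$, let $m_{\breve S}(z)$ be the corresponding Stieltjes transform, let $\underline{m}_{\breve S}(z) = \frac{c-1}{z} + c\,m_{\breve S}(z)$ for all $z \in \mathbb{C}^+$, and let $\breve{\underline{m}}_{\breve S}(\xi) = \lim_{z \in \mathbb{C}^+ \to \xi} \underline{m}_{\breve S}(z)$. Define

$$\Delta_{\breve S_M}(x) = \frac{1}{p}\sum_{i=1}^{p} \breve u_{M,i}^T\, \breve\Sigma\, \breve u_{M,i}\; I\big(\lambda_{\breve S_M,i} \le x\big),$$

where $\breve u_{M,i}$ is the eigenvector corresponding to the $i$th largest eigenvalue $\lambda_{\breve S_M,i}$ of $\breve S_M$. The existence of the above functions follows from Theorem 2.3 of Wang et al. (2019) and Lemma 6.1 of Bai and Silverstein (2010).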
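Finite-sample analogues of these definitions can be checked numerically. The sketch below makes three substitutions, all assumptions of the sketch:

- Gaussian data;
- an ordinary sample covariance $S = Z^TZ/n$ standing in for $\breve S_M$;
- a diagonal matrix standing in for $\breve\Sigma$, with $p \le n$.

Two identities follow:

- **Companion transform.** The $n \times n$ companion matrix $ZZ^T/n$ shares the nonzero eigenvalues of $S$ and adds $n - p$ zeros. Its Stieltjes transform therefore equals $\frac{c-1}{z} + c\,m(z)$ exactly, with $c = p/n$.
- **Limit of $\Delta$.** $\Delta(x)$ is a nondecreasing step function that reaches $\operatorname{tr}(\Sigma)/p$ as $x \to \infty$, because $\sum_i u_i^T\Sigma u_i = \operatorname{tr}(U^T\Sigma U) = \operatorname{tr}(\Sigma)$.

```python
import numpy as np

def stieltjes(M, z):
    """Stieltjes transform of the ESD of a symmetric matrix M at z."""
    return np.mean(1.0 / (np.linalg.eigvalsh(M) - z))

rng = np.random.default_rng(4)
p, n = 50, 120                                    # assumes p <= n
c = p / n
sigma = np.diag(np.linspace(0.5, 2.0, p))         # stand-in for the population matrix
Z = rng.standard_normal((n, p)) @ np.sqrt(sigma)  # entrywise sqrt is fine for diagonal sigma
S = Z.T @ Z / n                                   # p x p, stand-in for the sample matrix
S_companion = Z @ Z.T / n                         # n x n, same nonzero eigenvalues

z = 1.0 + 0.5j                                    # a point in C^+
lhs = stieltjes(S_companion, z)
rhs = (c - 1) / z + c * stieltjes(S, z)
print(lhs, rhs)

# eigh returns ascending eigenvalues; the ordering does not affect Delta(x).
lam, U = np.linalg.eigh(S)
weights = np.einsum("ji,jk,ki->i", U, sigma, U)   # u_i^T sigma u_i

def delta(x):
    """Delta(x) = (1/p) * sum_i u_i^T sigma u_i * 1{lambda_i <= x}."""
    return np.sum(weights * (lam <= x)) / p

print(delta(1.0), delta(np.inf), np.trace(sigma) / p)
```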

First, we show that the results in Theorem 1 hold when the notations based on $S_M$ are

Acknowledgments

The authors thank Yimin Xiao (the editor), the associate editor, and the anonymous referees for their helpful comments that improved the article significantly. Wang’s work is supported by National Natural Science Foundation of China (11871322) and Shanghai University of Finance and Economics Graduate Innovation Program Project Research Innovation Fund (CXJI-2018-411). Xia’s research is supported by the National Natural Science Foundation of China Grant 11871322.

References (9)

  • Jacod, J., et al. (2009). Microstructure noise in the continuous case: the pre-averaging approach. Stochastic Process. Appl.
  • Bai, Z., et al. (2010). Spectral Analysis of Large Dimensional Random Matrices.
  • Fan, J., et al. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc.
  • Ledoit, O., et al. (2011). Eigenvectors of some large sample covariance matrix ensembles. Probab. Theory Related Fields.
There are more references available in the full text version of this article.
