Extracting a low-dimensional predictable time series

  • Research Article · Optimization and Engineering

Abstract

Large-scale multi-dimensional time series can be found in many disciplines, including finance, econometrics, biomedical engineering, and industrial engineering systems. It has long been recognized that the time-dependent components of the vector time series often reside in a subspace, leaving its complement independent over time. In this paper we develop a method for projecting the time series onto a low-dimensional time series that is predictable, in the sense that an auto-regressive model achieves low prediction error. Our formulation and method follow ideas from principal component analysis, so we refer to the extracted low-dimensional time series as principal time series. In one special case we can compute the optimal projection exactly; in others, we give a heuristic method that seems to work well in practice. The effectiveness of the method is demonstrated on synthetic and real time series.
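To make the notion of predictability concrete, here is a minimal sketch (our own illustration, not the authors' code: the function names, the AR order `M`, the least-squares fit, and the variance normalization are all assumptions). It scores a fixed orthonormal projection of a vector time series by the normalized one-step prediction error of an auto-regressive model fit to the projected series; the paper's method additionally optimizes over the projection, which this sketch does not attempt.

```python
import numpy as np

def ar_prediction_error(z, M):
    """Normalized one-step-ahead error of an order-M AR model fit to z (T x k).

    The mean squared residual is divided by the overall variance of z so
    that the score is scale-free (our choice of normalization).
    """
    T, _ = z.shape
    # Regress z[t] on z[t-1], ..., z[t-M] by least squares.
    X = np.hstack([z[M - i:T - i] for i in range(1, M + 1)])
    Y = z[M:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(np.mean((Y - X @ coef) ** 2) / np.var(z))

def predictability(x, W, M=2):
    """Score an orthonormal projection W (n x k): lower means more predictable."""
    return ar_prediction_error(x @ W, M)

# Toy data: one slow AR(1) factor buried in a 10-dimensional noisy series.
rng = np.random.default_rng(0)
T, n = 2000, 10
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.95 * u[t - 1] + 0.1 * rng.standard_normal()
v = rng.standard_normal(n)
x = np.outer(u, v) + 0.5 * rng.standard_normal((T, n))

W_true = (v / np.linalg.norm(v)).reshape(-1, 1)        # direction of the factor
W_rand = np.linalg.qr(rng.standard_normal((n, 1)))[0]  # random direction
# The factor direction scores markedly lower (more predictable) than random.
print(predictability(x, W_true), predictability(x, W_rand))
```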

Acknowledgements

We would like to express our appreciation to Professor Peter Stoica for his valuable and constructive suggestions during the preparation of this paper. We also thank Peter Nystrup for pointing us to related work.

Author information

Correspondence to Yining Dong.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Derivation of (6)

We show how to derive expression (6) in this appendix. For simplicity, we drop the superscript \(k+1\) from \(A^{k+1}\), \(A_i^{k+1}\), \(i=1,\ldots ,M\), and \(S_\tau ^{k+1}\), \(\tau \in {\mathbf{Z}}\), and the superscript k from \(W^k\).

When A is fixed, we have

$$\begin{aligned} f(w) &= \mathop \mathbf{Tr}\left( -2A\begin{bmatrix}S_1 \\ S_2 \\ \vdots \\ S_M\end{bmatrix} + A\begin{bmatrix}S_0 & S_1^T & \cdots & S_{M-1}^T\\ S_{1} & S_0 & \cdots & S_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots \\ S_{M-1} & S_{M-2} & \cdots & S_0 \end{bmatrix}A^T\right) \\ &= -2\sum _{i=1}^M\mathop \mathbf{Tr}(A_iS_i) + \mathop \mathbf{Tr}\left( \begin{bmatrix}S_0 & S_1^T & \cdots & S_{M-1}^T\\ S_{1} & S_0 & \cdots & S_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots \\ S_{M-1} & S_{M-2} & \cdots & S_0 \end{bmatrix}\begin{bmatrix} A_1^TA_1 & A_1^TA_2 & \cdots & A_1^TA_M\\ A_2^TA_1 & A_2^TA_2 & \cdots & A_2^TA_M\\ \vdots & \vdots & \ddots & \vdots \\ A_M^TA_1 & A_M^TA_2 & \cdots & A_M^TA_M\end{bmatrix}\right) . \end{aligned}$$

We partition each \(A_i\), \(i=1,2,\ldots ,M\), into the following submatrices,

$$\begin{aligned} A_i = \begin{bmatrix} A_{i,11} & A_{i,12} \\ A_{i,21} & A_{i,22} \end{bmatrix} \quad \text {for} \; i = 1,2,\ldots ,M, \end{aligned}$$

where \(A_{i,11} \in {\mathbf{R}}^{k\times k}\), \(A_{i,12} \in {\mathbf{R}}^{k\times 1}\), \(A_{i,21} \in {\mathbf{R}}^{1\times k}\), \(A_{i,22} \in {\mathbf{R}}\). With this notation, we can expand \(\mathop \mathbf{Tr}(A_iS_i)\) as

$$\begin{aligned} \mathop \mathbf{Tr}(A_iS_i) &= \mathop \mathbf{Tr}\left( \begin{bmatrix}A_{i,11} & A_{i,12} \\ A_{i,21} & A_{i,22} \end{bmatrix} \begin{bmatrix}W^T\Sigma _iW & W^T\Sigma _iw \\ w^T \Sigma _i W & w^T\Sigma _iw \end{bmatrix}\right) \\ &= w^T(A_{i,22}\Sigma _i)w + (\Sigma _iWA_{i,12}+\Sigma _i^TW A_{i,21}^T)^Tw + d, \end{aligned}$$

where d is a constant. For the second term in f(w), we have

$$\begin{aligned} &\mathop \mathbf{Tr}\left( \begin{bmatrix}S_0 & S_1^T & \cdots & S_{M-1}^T\\ S_{1} & S_0 & \cdots & S_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots \\ S_{M-1} & S_{M-2} & \cdots & S_0 \end{bmatrix}\begin{bmatrix}A_1^TA_1 & A_1^TA_2 & \cdots & A_1^TA_M\\ A_2^TA_1 & A_2^TA_2 & \cdots & A_2^TA_M\\ \vdots & \vdots & \ddots & \vdots \\ A_M^TA_1 & A_M^TA_2 & \cdots & A_M^TA_M\end{bmatrix}\right) \\ &\quad = \mathop \mathbf{Tr}(S_0A_1^TA_1+S_1^TA_2^TA_1+\cdots +S_{M-1}^TA_M^TA_1) \\ &\qquad + \mathop \mathbf{Tr}(S_1A_1^TA_2+S_0A_2^TA_2+\cdots +S_{M-2}^TA_M^TA_2) + \cdots \\ &\qquad + \mathop \mathbf{Tr}(S_{M-1}A_1^TA_M+S_{M-2}^TA_2^TA_M+\cdots +S_0 A_M^TA_M) \\ &\quad = \sum _{i, j}\mathop \mathbf{Tr}(S_{j-i}A_i^TA_j), \end{aligned}$$

where \(S_{-\tau } = S_\tau ^T\), and where, up to terms constant in w, \(\mathop \mathbf{Tr}(S_{j-i}A_i^TA_j)\) can be expanded as

$$\begin{aligned} \mathop \mathbf{Tr}(S_{j-i}A_i^TA_j) &= \mathop \mathbf{Tr}\left( \begin{bmatrix}W^T\Sigma _{j-i}W & W^T \Sigma _{j-i}w\\ w^T\Sigma _{j-i}W & w^T\Sigma _{j-i}w \end{bmatrix} \begin{bmatrix} A_{i,11}^TA_{j,11} +A_{i,21}^TA_{j,21} & A_{i,11}^T A_{j,12}+A_{i,21}^TA_{j,22}\\ A_{i,12}^TA_{j,11}+A_{i,22}A_{j,21} & A_{i,12}^TA_{j,12} + A_{i,22}A_{j,22}\end{bmatrix}\right) \\ &= (A_{i,12}^TA_{j,11}+A_{i,22}A_{j,21})W^T\Sigma _{j-i}w + (A_{i,11}^TA_{j,12}+A_{i,21}^TA_{j,22})^TW^T\Sigma _{j-i}^Tw \\ &\quad + w^T(A_{i,12}^TA_{j,12}+A_{i,22}A_{j,22})\Sigma _{j-i}w. \end{aligned}$$
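The block-trace identity above is easy to verify numerically under the convention \(S_{-\tau }=S_\tau ^T\). The following check is our own (block sizes are arbitrary and purely illustrative) and simply compares both sides on random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 4, 3  # each S_tau and A_i block is n x n here (illustrative sizes only)

# Random S_0, ..., S_{M-1}, extended to negative lags via S_{-tau} = S_tau^T.
S = {tau: rng.standard_normal((n, n)) for tau in range(M)}
for tau in range(1, M):
    S[-tau] = S[tau].T

A_blocks = [rng.standard_normal((n, n)) for _ in range(M)]

# Left-hand side: trace of the block-Toeplitz matrix times the Gram block matrix.
T_mat = np.block([[S[p - q] for q in range(M)] for p in range(M)])
G = np.block([[A_blocks[i].T @ A_blocks[j] for j in range(M)] for i in range(M)])
lhs = np.trace(T_mat @ G)

# Right-hand side: sum_{i,j} Tr(S_{j-i} A_i^T A_j).
rhs = sum(np.trace(S[j - i] @ A_blocks[i].T @ A_blocks[j])
          for i in range(M) for j in range(M))

assert np.isclose(lhs, rhs)
```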

Summing all terms, we obtain the following expression for f(w),

$$\begin{aligned} f(w) = w^TBw - 2c^Tw+d, \end{aligned}$$

where d is a constant and

$$\begin{aligned} B &= \sum \limits _{1 \le i,j \le M} (A_{i,12}^TA_{j,12}+A_{i,22}A_{j,22})\Sigma _{j-i} - \sum \limits _{i=1}^M A_{i,22}(\Sigma _i+\Sigma _i^T), \\ c &= \sum \limits _{i=1}^M(\Sigma _iWA_{i,12}+\Sigma _i^TWA_{i,21}^T) - \sum \limits _{1\le i< j \le M}\Sigma _{j-i}^TW(A_{j,11}^TA_{i,12}+A_{i,22} A_{j,21}^T) \\ &\quad - \sum \limits _{1\le i < j \le M}\Sigma _{j-i}W(A_{i,11}^TA_{j,12}+A_{j,22} A_{i,21}^T). \end{aligned}$$

The constant term d can be ignored when minimizing f(w). It is easy to show that \(B \succ 0\), so f is a strictly convex quadratic in w with a unique minimizer.
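Since \(B \succ 0\), the quadratic is minimized where its gradient \(2Bw - 2c\) vanishes, i.e., at the solution of \(Bw = c\). A minimal sketch of this step follows (our own, not the paper's code; the paper's actual update for w may impose further constraints, such as orthogonality to the columns of W, which we omit here):

```python
import numpy as np

def minimize_quadratic(B, c):
    """Minimize f(w) = w^T B w - 2 c^T w + d for B > 0 by solving B w = c.

    The Cholesky factorization both certifies positive definiteness
    (it raises an error for indefinite B) and solves the system stably.
    """
    L = np.linalg.cholesky(B)  # B = L L^T
    return np.linalg.solve(L.T, np.linalg.solve(L, c))

# Toy check with a random positive definite B (illustrative only).
rng = np.random.default_rng(2)
n = 5
Q = rng.standard_normal((n, n))
B = Q @ Q.T + n * np.eye(n)   # positive definite by construction
c = rng.standard_normal(n)
w_star = minimize_quadratic(B, c)
assert np.allclose(B @ w_star, c)  # stationarity: the gradient vanishes
```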

About this article

Cite this article

Dong, Y., Qin, S.J. & Boyd, S.P. Extracting a low-dimensional predictable time series. Optim Eng 23, 1189–1214 (2022). https://doi.org/10.1007/s11081-021-09643-x
