Block tensor train decomposition for missing data estimation


Abstract

We propose a method for imputing missing values in large-scale matrix data based on a low-rank tensor approximation technique called the block tensor train (BTT) decomposition. Given sparsely observed data points, the proposed method iteratively computes the singular value decomposition (SVD) of the underlying data matrix with missing values. The SVD is computed via a low-rank BTT decomposition, which dramatically reduces storage and time complexity for large-scale data matrices admitting a low-rank tensor structure. An iterative soft-thresholding algorithm for missing data estimation is implemented using an alternating least squares method for BTT decomposition. Experimental results on simulated data and real benchmark data demonstrate that the proposed method can accurately estimate a large number of missing values compared with a standard matrix-based method. The R source code of the BTT-based imputation method is available at https://github.com/namgillee/BTTSoftImpute.
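
The abstract describes the algorithm only at a high level. As a rough illustration, the following is a minimal R sketch of the classical matrix-based soft-impute iteration (in the spirit of Mazumder, Hastie, and Tibshirani's spectral regularization) that the proposed method builds on: missing entries are filled with the current estimate, the singular values of the completed matrix are soft-thresholded, and the process repeats until convergence. The paper's contribution replaces the dense svd() call below with a low-rank BTT-based SVD computed by alternating least squares; that step is not implemented here, and the function name soft_impute and its parameters are illustrative, not taken from the BTTSoftImpute package.

```r
# Minimal matrix-based soft-impute sketch (not the BTT-based variant).
soft_impute <- function(X, lambda, max_iter = 100L, tol = 1e-4) {
  obs <- !is.na(X)                       # mask of observed entries
  Z   <- matrix(0, nrow(X), ncol(X))     # current completed estimate
  for (it in seq_len(max_iter)) {
    Y      <- Z
    Y[obs] <- X[obs]                     # observed data + current imputations
    s      <- svd(Y)                     # dense SVD; the paper computes this
                                         # step via a BTT decomposition instead
    d      <- pmax(s$d - lambda, 0)      # soft-threshold the singular values
    Z_new  <- s$u %*% (d * t(s$v))       # reconstruction U diag(d) V'
    done   <- sum((Z_new - Z)^2) <= tol * max(sum(Z^2), 1)
    Z      <- Z_new
    if (done) break                      # stop on small relative change
  }
  Z
}

# Usage: complete a rank-2 matrix with 40% of its entries missing.
set.seed(1)
A <- matrix(rnorm(200), 100, 2) %*% matrix(rnorm(160), 2, 80)
X <- A
X[sample(length(X), 0.4 * length(X))] <- NA
Z_hat <- soft_impute(X, lambda = 1)
```

For matrices whose dimensions factor as I = I_1 x ... x I_d and J = J_1 x ... x J_d, keeping the iterate in (block) tensor train form and running the SVD step directly in that format is what lets the method scale: the dense iterate is never formed explicitly.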



Acknowledgements

This study was supported by a 2017 Research Grant from Kangwon National University and by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2017R1C1B5076912).

Author information

Corresponding author

Correspondence to Jong-Min Kim.


About this article

Cite this article

Lee, N., Kim, JM. Block tensor train decomposition for missing data estimation. Stat Papers 59, 1283–1305 (2018). https://doi.org/10.1007/s00362-018-1043-8

