Abstract
We propose a method for imputing missing values in large-scale matrix data, based on a low-rank tensor approximation technique called the block tensor train (BTT) decomposition. Given sparsely observed data points, the proposed method iteratively computes the singular value decomposition (SVD) of the underlying data matrix with missing values. The SVD is carried out in a low-rank BTT format, which dramatically reduces storage and time complexity for large-scale data matrices that admit a low-rank tensor structure. Missing values are estimated by an iterative soft-thresholding algorithm, implemented via an alternating least squares method for BTT decomposition. Experimental results on simulated data and real benchmark data demonstrate that the proposed method can accurately estimate a large number of missing values, compared with a standard matrix-based method. The R source code of the BTT-based imputation method is available at https://github.com/namgillee/BTTSoftImpute.
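For concreteness, the following is a minimal R sketch of the iterative soft-thresholding (soft-impute) scheme described above. It uses R's dense svd() where the proposed method would use a BTT-based SVD fitted by alternating least squares; the function name soft_impute and its arguments are illustrative assumptions, not the released BTTSoftImpute API.

```r
## A minimal sketch (not the released BTTSoftImpute API) of the iterative
## soft-thresholding scheme described in the abstract. For clarity it uses
## R's dense svd(); the proposed method instead computes this SVD via a
## low-rank BTT decomposition fitted by alternating least squares.
soft_impute <- function(X, lambda, max_iter = 100, tol = 1e-4) {
  obs <- !is.na(X)                        # mask of observed entries
  Z   <- matrix(0, nrow(X), ncol(X))      # current completed-matrix estimate
  for (iter in seq_len(max_iter)) {
    Y <- X
    Y[!obs] <- Z[!obs]                    # fill missing entries with estimate
    s <- svd(Y)                           # dense SVD stands in for BTT-SVD
    d <- pmax(s$d - lambda, 0)            # soft-threshold singular values
    Z_new <- s$u %*% (d * t(s$v))         # reconstruct low-rank estimate
    if (sum((Z_new - Z)^2) / max(sum(Z^2), 1) < tol) {
      Z <- Z_new
      break
    }
    Z <- Z_new
  }
  Z
}

## Illustrative usage on a synthetic rank-5 matrix with 50% missing entries
set.seed(1)
A <- matrix(rnorm(100 * 5), 100, 5) %*% matrix(rnorm(5 * 80), 5, 80)
X <- A
X[sample(length(X), length(X) %/% 2)] <- NA
Z <- soft_impute(X, lambda = 2)
```

The key design point, as the abstract notes, is that the SVD step inside the loop is the bottleneck for large matrices, and replacing it with a BTT-based SVD is what makes the iteration scale.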
Acknowledgements
This study was supported by a 2017 Research Grant from Kangwon National University and by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2017R1C1B5076912).
Cite this article
Lee, N., Kim, JM. Block tensor train decomposition for missing data estimation. Stat Papers 59, 1283–1305 (2018). https://doi.org/10.1007/s00362-018-1043-8