1 Introduction

Tensor-formatted data is becoming abundant in machine learning applications. Among the many tensor-related machine learning problems, tensor completion has gained increased popularity in recent years. Tensor completion imputes the unknown elements of a partially observed tensor by exploiting its low-rank structure. Popular real-world applications of tensor completion are found in recommendation systems (Karatzoglou et al. 2010; Zheng et al. 2010), computer vision (Liu et al. 2009), and multi-relational link prediction (Rai et al. 2015). Though there exist many methods to perform tensor completion (Song et al. 2017), globally optimal solutions are obtained mainly through convex low-rank tensor norms, making them an active area of research.

Over the years, many researchers have proposed different low-rank inducing norms to minimize the rank of tensors; however, none of these norms is universally better than the others. The main challenge in designing norms for tensors is that they have multiple dimensions and different definitions of rank (Tucker rank, CP rank, TT-rank), making it difficult for a single norm to induce low-rankness with respect to all the properties of a tensor. Most tensor norms have been designed with a focus on a specific rank; the overlapped trace norm (Tomioka and Suzuki 2013) and latent trace norms (Wimalawarne et al. 2014) constrain the multilinear ranks, the tensor nuclear norm (Yuan and Zhang 2016; Yang et al. 2015; Lim and Comon 2014) constrains the CP rank, and the Schatten TT norm (Imaizumi et al. 2017) constrains the TT-rank. However, targeting a specific rank to constrain may not always be practical, since we may not know the most suitable rank for a tensor in advance.

Most tensor norms reshape tensors by rearranging their elements as matrices to induce low-rankness with respect to a mode or a set of modes. However, this reshaping is specific to obtaining the relevant ranks that a norm constrains. An alternative view was presented by Mu et al. (2014) with the square norm, where the tensor is reshaped as a balanced matrix without considering the structure of its ranks. The square norm has been shown to have better sample complexities for higher order tensors (tensors with more than three modes) than some of the existing norms such as the overlapped trace norm (Yuan and Zhang 2016). However, this norm only considers the special case of reshaping a tensor as a matrix whose two dimensions are close to each other. Other possibilities of how reshaping tensors beyond matrices affects the inducement of low-rankness have not been investigated.

In this paper, we propose generalized reshaping strategies for tensors and develop low-rank inducing tensor norms. We demonstrate that reshaping a higher order tensor as another tensor and applying the tensor nuclear norm leads to better inducement of low-rankness compared to applying existing low-rank norms on the original tensor or its matrix unfoldings. Furthermore, we propose the reshaped latent tensor nuclear norm, which combines multiple reshaped tensors to obtain the best performance among a set of reshaped tensors. Using generalization bounds, we show that the proposed norms give lower Rademacher complexities compared to existing norms. Using simulations and real-world data experiments, we justify our theoretical analysis and show that our proposed methods give better performance for tensor completion compared to other convex norms.

Throughout this paper we use the following notations. We represent a K-mode (K-way) tensor as \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K}}\). The mode-k unfolding (Kolda and Bader 2009) of a tensor \({\mathcal {T}}\) is given by \(T_{(k)} \in {\mathbb {R}}^{n_{k}\times \prod _{j \ne k}{n_{j}}}\), which is obtained by concatenating all slices along mode k. We indicate the tensor product (Hackbusch 2012) between vectors \(u_{i} \in {\mathbb {R}}^{n_{i}}, \; i = 1,\ldots ,K\) using the notation \(\otimes\) as \((u_{1}\otimes \cdots \otimes u_{K})_{i_{1},\ldots ,i_{K}} = \prod _{l=1}^{K}u_{l,i_{l}}\). The k-mode product of a tensor \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_k \cdots \times n_{K}}\) and a vector \(v \in {\mathbb {R}}^{n_k}\) is defined as \({\mathcal {T}} \times _{k} v = \sum _{i_{k}=1}^{n_k} {\mathcal {T}}_{i_1,i_2,\ldots ,i_{k},\ldots ,i_{K}}v_{i_k}\). The largest singular value of \({\mathcal {T}}\) is given by \(\gamma _1({\mathcal {T}})\). The rank of a matrix \(A \in {\mathbb {R}}^{n \times m}\) is given by \(\mathrm {Rank}(A)\).
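For concreteness, a minimal NumPy sketch of the mode-k unfolding and the k-mode product with a vector is given below; the column ordering of the unfolding is one of several equivalent conventions, and the function names and example dimensions are ours.

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding: a matrix of shape (n_k, prod of the other dimensions).
    The column ordering is one fixed convention; any such ordering gives the
    same matrix rank, which is what the norms discussed here depend on."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_k_vec_product(T, v, k):
    """k-mode product with a vector: contracts mode k, returning a (K-1)-mode tensor."""
    return np.tensordot(T, v, axes=([k], [0]))

# Small usage example with arbitrary dimensions.
T = np.random.randn(3, 4, 5)
print(unfold(T, 1).shape)                            # (4, 15)
print(mode_k_vec_product(T, np.ones(5), 2).shape)    # (3, 4)
```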

2 Review of low-rank tensor norms

Designing convex low-rank inducing norms for tensors is a challenging task. Over the years, several tensor norms have been proposed, each having certain advantages over the others. The main challenge in defining tensor norms is the multi-dimensionality of tensors and the existence of different ranks (e.g., the CP rank and the multilinear (Tucker) rank). A common criterion for designing low-rank tensor norms is to induce low-rankness by minimizing a particular rank. A commonly used rank is the multilinear rank, which represents the rank with respect to each mode of a tensor. Given a tensor \({\mathcal {W}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K}}\), we obtain the rank of each unfolding \(r_{k} = \mathrm {Rank}(W_{(k)}),\;k=1,\ldots ,K\), and define the multilinear rank as \((r_{1},\ldots ,r_{K})\). To minimize the multilinear rank, the overlapped trace norm has been defined (Liu et al. 2009; Tomioka and Suzuki 2013), which for a tensor \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K}}\) is

$$\begin{aligned} \Vert {\mathcal {T}} \Vert _{\mathrm {overlap}} = \underset{k=1}{\overset{K}{\sum }} \Vert T_{(k)}\Vert _{\mathrm {tr}}, \end{aligned}$$

where \(\Vert \cdot \Vert _{\mathrm {tr}}\) is the matrix nuclear norm (a.k.a. trace norm) (Fazel et al. 2001), the sum of the non-zero singular values of a matrix. A limitation of this norm is that it performs poorly for tensors with high variations in the multilinear rank (Tomioka and Suzuki 2013; Wimalawarne et al. 2014).

The latent trace norm (Tomioka and Suzuki 2013) has been proposed to overcome limitations of the overlapped trace norm; it allows the freedom to learn ranks with respect to each mode unfolding by considering a latent decomposition of the tensor. More specifically, the latent trace norm learns latent tensors \({\mathcal {T}}^{(k)}, \;k=1,\ldots ,K\) as

$$\begin{aligned} \Vert {\mathcal {T}} \Vert _{\mathrm {latent}} = \underset{{\mathcal {T}}^{(1)} + \ldots +{\mathcal {T}}^{(K)} = {\mathcal {T}} }{\inf } \underset{k=1}{\overset{K}{\sum }} \Vert T_{(k)}^{(k)} \Vert _{\mathrm {tr}} . \end{aligned}$$

This norm was shown to be more robust than the overlapped trace norm for tensors with high variations in the multilinear rank (Tomioka and Suzuki 2013). The latent trace norm has been further extended to the scaled latent trace norm (Wimalawarne et al. 2014), which considers the relative rank of each latent tensor by scaling each term with the inverse square root of the corresponding mode dimension.

Another popular rank for tensors is the CANDECOMP/PARAFAC (CP) rank (Carroll and Chang 1970; Harshman 1970; Hitchcock 1927; Kolda and Bader 2009), which can be considered a higher order extension of the matrix rank. Recently, minimization of the CP rank has gained the attention of many researchers, since it has been shown to lead to a better sample complexity than multilinear rank based norms (Yuan and Zhang 2016). The tensor nuclear norm (Yuan and Zhang 2016; Yang et al. 2015; Lim and Comon 2014) has been defined as a convex surrogate for minimizing the CP rank of a tensor. For a tensor \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K}}\) with rank R, \(\mathrm {Rank}({\mathcal {T}}) = R\), the tensor nuclear norm is defined as

$$\begin{aligned} \Vert {\mathcal {T}}\Vert _{*}&= \inf \Bigg \{ \sum _{j=1}^{R} \gamma _{j} \,\Big |\, {\mathcal {T}} = \sum _{j=1}^{R} \gamma _{j} u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Kj}, \nonumber \\ \Vert u_{kj}\Vert _{2}^{2}&= 1, \gamma _{j} \ge \gamma _{j+1} > 0 \Bigg \}, \end{aligned}$$
(1)

where \(u_{kj} \in {\mathbb {R}}^{n_k}\) for \(k=1,\ldots ,K\) and \(j=1,\ldots ,R\).

The latest addition to convex low-rank tensor norms is the Schatten TT norm (Imaizumi et al. 2017), which minimizes the tensor train rank (Oseledets 2011) of tensors. The Schatten TT norm is defined as

$$\begin{aligned} \Vert {\mathcal {T}} \Vert _{s,T} = \frac{1}{K-1} \sum _{k=1}^{K-1} \Vert Q_{k}({\mathcal {T}}) \Vert _{\mathrm {tr}}, \end{aligned}$$

where \(Q_{k}: {\mathcal {T}} \rightarrow {\mathbb {R}}^{(\prod _{j \le k} n_{j}) \times (\prod _{j > k} n_{j})}\) is an operator that reshapes the tensor \({\mathcal {T}}\) into a matrix by combining the first k modes as rows and the remaining \(K-k\) modes as columns. This norm has been shown to be suitable for higher order tensors.

It has also been shown that low-rank tensor norms can be designed without restricting to a specific rank. The square norm (Mu et al. 2014) reshapes a tensor as a matrix and applies the matrix nuclear norm as

$$\begin{aligned} \Vert {\mathcal {T}} \Vert _{\square } = \Bigg \Vert \mathrm {reshape}\Bigg (T_{(1)}, \prod _{i=1}^{j} n_{i}, \prod _{i=j+1}^{K} n_{i} \Bigg ) \Bigg \Vert _{\mathrm {tr}}, \end{aligned}$$

where the function \(\mathrm {reshape}()\) reshapes \({\mathcal {T}}\) into a matrix with approximately equal dimensions for an appropriately chosen \(j\). This norm has been shown to have a better sample complexity for tensor completion than the overlapped trace norm.
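For concreteness, the following sketch illustrates the reshaping behind the square norm: it chooses the split index that best balances the two matrix dimensions while keeping the original mode order, and then evaluates the matrix nuclear norm. Permuting modes before splitting, as with the reshaping sets introduced in Sect. 3, can yield an even more balanced matrix; the function name and example dimensions are ours.

```python
import numpy as np

def square_norm(T):
    """Square norm sketch: split the modes (in their original order) at the index
    that best balances rows and columns, reshape into that matrix, and take the
    matrix nuclear norm."""
    dims, total = T.shape, T.size
    split = min(range(1, T.ndim),
                key=lambda j: abs(int(np.prod(dims[:j])) - total / np.prod(dims[:j])))
    M = T.reshape(int(np.prod(dims[:split])), -1)
    return np.linalg.norm(M, ord='nuc'), M.shape

T = np.random.randn(6, 8, 4, 3)
value, shape = square_norm(T)
print(shape, value)    # matrix shape (48, 12) for these dimensions
```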

We point out that all the existing tensor norms except the tensor nuclear norm reshape tensors as matrices to induce low-rankness with respect to two sets of mode arrangements. As a result, these norms focus on constraining the multilinear rank of a tensor. However, the tensor nuclear norm has been shown to lead to a better sample complexity for tensor completion than multilinear rank based tensor norms (Yuan and Zhang 2016). Hence, the lack of tensor nuclear norm regularization for reshaped tensors among existing norms may result in sub-optimal solutions.

3 Proposed method: tensor reshaping and tensor nuclear norm

In this paper, we investigate extending the tensor nuclear norm to higher order tensors. We explore methods to combine tensor reshaping with the tensor nuclear norm.

3.1 Generalized tensor reshaping

First, we introduce the following notation for the product of tensor dimensions. For a given tuple \((n_{1},\ldots ,n_{p})\), we denote the product of its elements by \(\mathsf {prod}(n_{1},\ldots ,n_{p}) = n_{1}n_{2}\cdots n_{p}\). Next, we define generalized reshaping for tensors.

Definition 1

(Tensor Reshaping) Let us consider a tensor \({\mathcal {X}} \in {\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}}\) and the set of its mode dimensions \(D = \{n_{1}, n_{2},\ldots , n_{K}\}\). Given M sets \(D_{i} \subset D,\;i=1,\ldots ,M\), that are pairwise disjoint, \(D_{i} \cap D_{j} = \emptyset\) for \(i \ne j\), and together cover D, the reshaping operator is defined as

$$\begin{aligned} \varPi _{(D_{1},\ldots ,D_{M})}:{\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}} \rightarrow {\mathbb {R}}^{\textsf {prod}(D_{1}) \times \cdots \times \textsf {prod}(D_{M})}, \end{aligned}$$

and the inverse operator is denoted by \(\varPi _{(D_{1},\ldots ,D_{M})}^{\top }\). Further, we denote the reshaping of \({\mathcal {X}}\) by the sets \((D_{1},\ldots ,D_{M})\) as \({\mathcal {X}}_{(D_{1},\ldots ,D_{M})}\).

We point out that when \(| D_{1} | = \cdots = | D_{M} | = 1\) (so that \(M = K\)), there is no reshaping of the tensor, \({\mathcal {X}}_{(D_{1},\ldots ,D_{M})} = {\mathcal {X}}\). Unfolding a tensor along mode k (Kolda and Bader 2009) is equivalent to defining two sets \(D_{1} = \{n_{k}\}\) and \(D_{2} = \{n_{1}, \ldots ,n_{k-1},n_{k+1}, \ldots , n_{K}\}\). Further, we can obtain the reshaping of a tensor as a matrix for the square norm (Mu et al. 2014) by specifying two sets \(D_{1}\) and \(D_{2}\) with \(\textsf {prod}(D_{1}) \approx \textsf {prod}(D_{2})\).
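The following NumPy sketch gives one possible implementation of the reshaping operator \(\varPi _{(D_{1},\ldots ,D_{M})}\) and its inverse, where, for concreteness, each set \(D_{i}\) is specified by the indices of the modes it groups rather than by their dimensions; the helper names are ours.

```python
import numpy as np

def reshape_groups(X, groups):
    """Pi_{(D_1,...,D_M)}: permute the modes of X so that each group is contiguous,
    then merge each group of modes into a single mode."""
    order = [k for g in groups for k in g]
    new_shape = [int(np.prod([X.shape[k] for k in g])) for g in groups]
    return np.transpose(X, order).reshape(new_shape)

def reshape_groups_inverse(Y, groups, original_shape):
    """Pi^T_{(D_1,...,D_M)}: undo reshape_groups and recover the original tensor."""
    order = [k for g in groups for k in g]
    permuted_shape = [original_shape[k] for k in order]
    Z = Y.reshape(permuted_shape)
    return np.transpose(Z, np.argsort(order))

# Example: group modes 0 and 2 of a 10x10x40x40 tensor, keep modes 1 and 3 separate.
X = np.random.randn(10, 10, 40, 40)
groups = [[0, 2], [1], [3]]
Y = reshape_groups(X, groups)
print(Y.shape)                                                    # (400, 10, 40)
print(np.allclose(reshape_groups_inverse(Y, groups, X.shape), X)) # True
```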

3.2 Reshaped tensor nuclear norm

We propose a class of tensor norms by combining generalized tensor reshaping with the tensor nuclear norm. We name the proposed norms Reshaped Tensor Nuclear Norms. To define the proposed norms, we consider a K-mode tensor \({\mathcal {X}} \in {\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}}\) and reshaping sets \(D_{i},\;i=1,\ldots ,M\), adhering to Definition 1. We define the reshaped tensor nuclear norm as

$$\begin{aligned} \Vert {\mathcal {X}}_{(D_1,\ldots ,D_M)} \Vert _{*}, \end{aligned}$$

where \(\Vert \cdot \Vert _{*}\) is the tensor nuclear norm defined in (1). This norm is convex, since the tensor nuclear norm (1) is convex and reshaping is a linear operation.

3.3 Reshaped latent nuclear norm

A practical limitation in applying the proposed reshaped tensor norm is the difficulty of selecting the most suitable reshaping set out of all possible reshaping combinations. This is critical since we would not know the ranks of the tensor prior to training a learning model. To overcome this difficulty, we propose the Reshaped Latent Tensor Nuclear Norm by extending the latent trace norm (Tomioka and Suzuki 2013) to reshaped tensors.

Let us consider a collection of G reshaping sets \(D_{\mathrm {L}} = (D^{(1)},\ldots ,D^{(G)})\), where each \(D^{(s)} = (D^{(s)}_1,\ldots ,D^{(s)}_{m_{s}}),\;s=1,\ldots ,G\), consists of a reshaping set for an \(m_{s}\)-mode reshaped tensor. Further, we consider \({\mathcal {W}}\) as a sum of G latent tensors \({\mathcal {W}}^{(g)},\;g=1,\ldots ,G\), so that \({\mathcal {W}} = \sum _{k=1}^{G} {\mathcal {W}}^{(k)}\). We define the reshaped latent tensor nuclear norm as

$$\begin{aligned} \Vert {\mathcal {W}} \Vert _{\mathrm {r\_latent}(D_{\mathrm {L}})} = \inf _{{\mathcal {W}}^{(1)} + \cdots + {\mathcal {W}}^{(G)}= {\mathcal {W}}} \sum _{k=1}^{G} \Vert {\mathcal {W}}^{(k)}_{(D^{(k)}_{1},\ldots ,D^{(k)}_{m_{k}})} \Vert _{*}. \end{aligned}$$
(2)

We point out that the above norm differs from the latent trace norm (Tomioka and Suzuki 2013), since it considers reshaping sets defined by the user, whereas the latent trace norm considers all mode-wise tensor unfoldings. Furthermore, the above norm uses the tensor nuclear norm, while the latent trace norm is built using the matrix nuclear norm.

3.4 Completion models

Now, we propose tensor completion models regularized by the proposed norms. Let us consider a partially observed tensor \({\mathcal {X}} \in {\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}}\). Given that \({\mathcal {X}}\) has m observed elements, we define the mapping from \({\mathcal {X}}\) to its observed elements by \(\varOmega :{\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}} \rightarrow {\mathbb {R}}^{m}\). Given a reshaping set \({(D_{1},\ldots ,D_{M})}\), the completion model regularized by the reshaped tensor nuclear norm is given as

$$\begin{aligned}&\min _{{\mathcal {W}}}\frac{1}{2}\Vert \varOmega ({\mathcal {X}}) - \varOmega ({\mathcal {W}}) \Vert _{\mathrm {F}}^{2}\nonumber \\&\quad \mathrm {s.t.} \;\; \Vert {\mathcal {W}}_{(D_{1},\ldots ,D_{M})}\Vert _{*} \le \lambda , \end{aligned}$$
(3)

where \(\lambda\) is a regularization parameter. For a selected set of reshaping sets \(D_{\mathrm {L}}=(D^{(1)},\ldots ,D^{(G)})\), a completion model regularized by the reshaped latent tensor nuclear norm is given as

$$\begin{aligned}&\min _{{\mathcal {W}}^{(1)}+\cdots +{\mathcal {W}}^{(G)}={\mathcal {W}}}\frac{1}{2}\Vert \varOmega ({\mathcal {X}}) - \varOmega ({\mathcal {W}}^{(1)}+\cdots +{\mathcal {W}}^{(G)}) \Vert _{\mathrm {F}}^{2}\nonumber \\&\quad \mathrm {s.t.} \;\; \Vert {\mathcal {W}}\Vert _{\mathrm {r\_latent}(D_{\mathrm {L}})} \le \lambda , \end{aligned}$$
(4)

where \(\lambda\) is a regularization parameter.
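For illustration only, the observation mapping \(\varOmega\) and the data-fit term of (3) and (4) can be realized with a boolean mask over the observed entries, as in the following sketch (names and shapes are ours):

```python
import numpy as np

def observed_loss(X, W, mask):
    """0.5 * ||Omega(X) - Omega(W)||_F^2, with Omega realized as a boolean mask
    selecting the m observed entries of the tensor."""
    diff = (X - W)[mask]            # vector of length m
    return 0.5 * float(diff @ diff)

# Example: a 4-mode tensor with roughly 30% of its entries observed.
shape = (10, 10, 40, 40)
X = np.random.randn(*shape)
mask = np.random.rand(*shape) < 0.3
print(observed_loss(X, np.zeros(shape), mask))
```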

4 Theory

We investigate theoretical properties of our proposed methods to identify optimal conditions for reshaping tensors. For our analysis, we use generalization bounds based on the transductive Rademacher complexity (El-Yaniv and Pechyony 2007; Shamir and Shalev-Shwartz 2014).

We consider the learning problem in (3) and denote the set of indexes of the observed elements of \({\mathcal {X}}\) by \(\mathrm {S}\), where each index \((i_{1},\ldots ,i_{K})\) of an observed element of \({\mathcal {X}}\) is assigned to an element \(\alpha _j \in \mathrm {S}\) for some \(1 \le j \le |\mathrm {S}|\). We consider the observed elements as the training set, denoted by \({\mathrm {S}}_{\mathrm {Train}}\), and the rest as the test set, denoted by \({\mathrm {S}}_{\mathrm {Test}}\). For convenience in deriving the Rademacher complexity, we consider the special case of \(|{\mathrm {S}}_{\mathrm {Train}}| = |{\mathrm {S}}_{\mathrm {Test}}| = |\mathrm {S}|/2\), as in Shamir and Shalev-Shwartz (2014).

Given a reshaping set \((D_1,\ldots ,D_M)\), we consider the hypothesis class \(\textsf {W} = \{{\mathcal {W}}| \Vert {\mathcal {W}}_{(D_{1},\ldots ,D_{M})}\Vert _{*} \le t \}\) for a given t. Given a loss function \(l(\cdot ,\cdot )\) and a set \(\mathrm {S}\), we define the empirical loss as

$$\begin{aligned} L_{\mathrm {S}}(l \circ {\mathcal {W}}) := \frac{1}{|\mathrm {S}|} \Bigg [ \sum _{(i_{1},\ldots ,i_{K}) \in \mathrm {S} } l({\mathcal {X}}_{i_{1},\ldots ,i_{K}}, {\mathcal {W}}_{i_{1},\ldots ,i_{K}}) \Bigg ] . \end{aligned}$$

Given that \(\max _{(i_{1},\ldots ,i_{K}),\, {\mathcal {W}} \in \textsf {W} } |l({\mathcal {X}}_{i_{1},\ldots ,i_{K}},{\mathcal {W}}_{i_{1},\ldots ,i_{K}})| \le b_{l}\), it is straightforward to extend the generalization bounds for matrices from Shamir and Shalev-Shwartz (2014) to tensors, and the resulting bound holds with probability \(1-\delta\) as

$$\begin{aligned}L_{{\mathrm {S}}_{\rm {Test}}}(l \circ {\mathcal {W}}) - L_{{\mathrm {S}}_{\rm {Train}}}(l \circ {\mathcal {W}}) &\le 4R_{\mathrm {S}}(l \circ {\mathcal {W}}) \\&\quad + b_{l}\Bigg (\frac{11 + 4\sqrt{\log {\frac{1}{\delta }}}}{\sqrt{|{\mathrm {S}}_{\rm {Train}} |} }\Bigg ), \end{aligned}$$

where \(R_{\mathrm {S}}(l \circ {\mathcal {W}})\) is the transductive Rademacher complexity (El-Yaniv and Pechyony 2007; Shamir and Shalev-Shwartz 2014) defined as

$$\begin{aligned} R_{\mathrm {S}}(l \circ {\mathcal {W}}) = \frac{1}{|\mathrm {S}|} {\mathbb {E}}_{\sigma } \Bigg [ \sup _{{\mathcal {W}} \in \textsf {W}} \sum _{j=1}^{|\mathrm {S}|} \sigma _{j}\, l({\mathcal {X}}_{\alpha _{j}}, {\mathcal {W}}_{\alpha _{j}}) \Bigg ], \end{aligned}$$
(5)

where \(\sigma _{j} \in \{-1,1\},\;j=1,\ldots ,|\mathrm {S}|\), are Rademacher variables taking each value with probability 0.5.

The following theorem gives the Rademacher complexity for completion models regularized by a reshaped tensor nuclear norm.

Table 1 Rademacher complexities for convex norm regularized completion models for a K-mode tensor \({\mathcal {T}} \in {\mathbb {R}}^{n \times \cdots \times n}\) with multilinear rank \((r_1,\ldots ,r_K)\). \(\gamma _{1}({\mathcal {X}})\) is the largest singular value of \({\mathcal {X}}\), \(D^{(g)}=(D^{(g)}_1,\ldots ,D^{(g)}_{m_{g}}),\;g=1,\ldots ,G\), are the G reshaping sets, and c, \(\varLambda\), and \(B_{{\mathcal {T}}}\) are constants

Theorem 1

Consider a K-mode tensor \({\mathcal {W}} \in {\mathbb {R}}^{n_1 \times n_2 \times \cdots \times n_K}\). Let us consider any M reshaping sets \((D_{1},\ldots ,D_{M})\) with a hypothesis class of \(\textsf {W} = \{{\mathcal {W}}| \Vert {\mathcal {W}}_{(D_{1},\ldots ,D_{M})}\Vert _{*} \le t \}\). Suppose that for all \((i_1,\ldots ,i_K)\), \(l({\mathcal {X}}_{i_1,\ldots ,i_K}, \cdot )\) is \(\varLambda\)-Lipschitz continuous. Then,

(a) given that \({\mathcal {W}}\) has a multilinear rank of \((r_1,\ldots ,r_K)\), we obtain

$$\begin{aligned}R_{\mathrm {S}}(l \circ {\mathcal {W}}) &\le \frac{c\varLambda }{|\mathrm {S}|}\bigg ( \frac{\prod _{k=1}^{K}r_{k}}{\max _{j = 1,\ldots ,M} \prod _{i \in D_{j}} r_{i}} \bigg ) \gamma _1({\mathcal {W}}_{(D_1,\ldots ,D_{M})})\\&\quad \log (4M)\sum _{j = 1}^{M} \sqrt{ \prod _{p \in D_{j}} n_{p}}, \end{aligned}$$

(b) given that \({\mathcal {W}}\) has a CP rank of \(r_{cp}\), we obtain

$$\begin{aligned} R_{\mathrm {S}}(l \circ {\mathcal {W}}) \le \frac{c\varLambda }{|\mathrm {S}|}r_{cp} \gamma _1({\mathcal {W}}_{(D_1,\ldots ,D_{M})})\log (4M)\sum _{j = 1}^{M} \sqrt{ \prod _{p \in D_{j}} n_{p}}, \end{aligned}$$

where c is a constant.

Using Theorem 1, we can obtain the Rademacher complexity for the tensor nuclear norm by considering \(|D_1| = |D_2| = \cdots = |D_K|=1\), and for the square norm by considering two reshaping sets \(D_1\) and \(D_2\) such that \(\prod _{p \in D_1}n_{p} \approx \prod _{q \in D_2} n_{q}\). We summarize the Rademacher complexities of convex low-rank tensor norms in Table 1 for a tensor with equal mode dimensions (\(n_1=n_2=\cdots =n_{K}=n\)).

From Table 1 and Theorem 1, we see that norms constructed using the tensor nuclear norm lead to better bounds than the overlapped trace norm, the latent trace norm, and the scaled latent trace norm. Further, we see that the mode based component of the Rademacher complexity takes its smallest value for the tensor nuclear norm \((\log (4K)\sqrt{Kn})\); indeed, for any reshaping set, \(\log (4K)\sqrt{Kn} \le \log (4M)\sum _{j = 1}^{M} \sqrt{ n^{|D_j|}}\). This observation might lead us to conclude that the tensor nuclear norm is better than all the other norms. However, for a multilinear rank such that \(1 < r_1 \le r_2 \le \cdots \le r_K\), we can always find \(M < K\) reshaping sets \(D_1, D_2, \ldots , D_M\) such that \(\frac{\prod _{k=1}^{K}r_{k}}{\max _{j = 1,\ldots ,K} r_{j}} \ge \frac{\prod _{k=1}^{K}r_{k}}{\max _{j = 1,\ldots ,M} \prod _{i \in D_{j}} r_{i}}\). In other words, we can reshape the tensor such that the Rademacher complexity for the reshaped tensor nuclear norm is bounded with a smaller rank based component than that of the tensor nuclear norm.
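To make this comparison concrete, the following sketch evaluates the rank based and mode based components of the bound in Theorem 1(a) for a few candidate reshapings of a tensor with mode dimensions (10, 10, 40, 40) and multilinear rank (9, 9, 3, 3), the setting of the first simulation in Sect. 6.1; constants, the Lipschitz factor, and the spectral norm factor are omitted, and the helper is ours.

```python
import numpy as np

def theorem1_components(n, r, groups):
    """Rank based and mode based components of the Theorem 1(a) bound for a
    reshaping into the mode groups `groups` (lists of mode indices).
    Constants, the Lipschitz factor, and the spectral norm are omitted."""
    M = len(groups)
    rank_comp = np.prod(r) / max(np.prod([r[i] for i in g]) for g in groups)
    mode_comp = np.log(4 * M) * sum(np.sqrt(np.prod([n[i] for i in g])) for g in groups)
    return rank_comp, mode_comp

# Mode dimensions and multilinear rank of the first simulated tensor in Sect. 6.1.
n, r = (10, 10, 40, 40), (9, 9, 3, 3)
for groups in ([[0], [1], [2], [3]],        # no reshaping (tensor nuclear norm)
               [[0, 2], [1, 3]],            # balanced matrix (square norm)
               [[0, 1], [2], [3]]):         # high-rank modes merged
    print(groups, theorem1_components(n, r, groups))
```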

It is not known how reshaping a tensor changes the CP rank of the original tensor into the rank of the reshaped tensor, except that \(\mathrm {Rank}({\mathcal {X}}_{(D_{1},\ldots ,D_{M})}) \le r_{cp}\) (Mu et al. 2014; Lemma 4 in the “Appendix”). However, Theorem 1 shows that reshaping results in a mode based component of \(\log (4M)\sum _{j = 1}^{M} \sqrt{ n^{|D_j|}}\) in the Rademacher complexity, which indicates that selecting a reshaping set that gives a lower mode based component can lead to a lower generalization bound compared to the square norm or the tensor nuclear norm. Furthermore, it is clear that reshaping a tensor and regularizing with the tensor nuclear norm lead to a lower generalization bound compared to multilinear rank based norms such as the overlapped trace norm, the latent trace norm, and the scaled latent trace norm, as well as the tensor train rank based Schatten TT norm.

The next theorem provides the Rademacher complexity for completion models regularized by the reshaped latent tensor nuclear norm.

Theorem 2

Let us consider a K-mode tensor \({\mathcal {W}} \in {\mathbb {R}}^{n_1 \times \cdots \times n_K}\) and a collection of G reshaping sets \(D_{\mathrm {L}}=(D^{(1)},\ldots ,D^{(G)})\), where each \(D^{(s)} = (D^{(s)}_1,\ldots ,D^{(s)}_{M_{s}}),\;s=1,\ldots ,G\), consists of a reshaping set for an \(M_{s}\)-mode reshaped tensor. Consider the hypothesis class \(\textsf {W}_{\textsf {rl}} = \{ {\mathcal {W}} | {\mathcal {W}}^{(1)} + \cdots + {\mathcal {W}}^{(G)} = {\mathcal {W}}, \Vert {\mathcal {W}}\Vert _{\mathrm {r\_latent}(D_{\mathrm {L}})} \le t \}\) for a given t. Suppose that for all \((i_1,\ldots ,i_K)\), \(l({\mathcal {X}}_{i_1,\ldots ,i_K}, \cdot )\) is \(\varLambda\)-Lipschitz continuous. Then,

(a) when \({\mathcal {W}}\) has a multilinear rank of \((r_{1},\ldots ,r_{K})\), we obtain

$$\begin{aligned}R_{\mathrm {S}}(l \circ {\mathcal {W}}) &\le \frac{c\varLambda }{|\mathrm {S}|}\min _{g \in G} \bigg ( \frac{\prod _{k=1}^{K}r_{k}}{\max _{j = 1,\ldots ,M_{g}} \prod _{i \in D^{(g)}_{j}} r_{i}} \bigg ) \gamma _1\left({\mathcal {W}}^{(g)}_{(D^{(g)}_{1},\ldots ,D^{(g)}_{M_{g}})}\right)\\&\quad \max _{g \in G} \log (4M_g)\sum _{j = 1}^{M_{g}}\sqrt{ \prod _{p \in D^{(g)}_{j}} n_{p} }. \end{aligned}$$

(b) when \({\mathcal {W}}\) has a CP rank of \(r_{cp}\), we obtain

$$\begin{aligned}R_{\mathrm {S}}(l \circ {\mathcal {W}}) &\le \frac{c\varLambda }{|\mathrm {S}|}r_{cp}\min _{g} \gamma _1\left({\mathcal {W}}^{(g)}_{(D^{(g)}_{1},\ldots ,D^{(g)}_{M_{g}})}\right)\\&\quad \max _{g \in G} \log (4M_g) \sum _{j = 1}^{M_{g}}\sqrt{ \prod _{p \in D^{(g)}_{j}} n_{p} }. \end{aligned}$$

where c is a constant.

Theorem 2 shows that the reshaped latent tensor nuclear norm bounds the Rademacher complexity by the largest mode based component that results from all the reshaping sets. Further, with the multilinear rank of the tensor, the Rademacher complexity is bounded by the smallest rank based component that results from all the reshaping sets. This observation indicates that properly selecting a collection of reshaping sets for the reshaped latent tensor nuclear norm can lead to a lower generalization bound.

We want to point out that the largest singular values (\(\gamma _1(\cdot )\)) appearing in Theorems 1 and 2 could be further upper bounded by taking the largest singular value over all possible reshaping sets of the tensor. However, we do not apply such a bound, in order to keep the stated Rademacher complexities tight.

4.1 Optimal reshaping strategies

Given some understanding of the ranks of the tensor, Theorem 1 can be used to select a reshaping set such that the reshaped tensor has a smaller rank and relatively small mode dimensions. However, since we do not know the ranks in advance, selecting a reshaping set such that the reshaped tensor does not have large mode dimensions would lead to a better performance.

To avoid the difficulty of choosing a single reshaping set, we can use the reshaped latent tensor nuclear norm with several reshaping sets that agree with our observations from Theorem 1. However, since the Rademacher complexity is bounded by the largest mode based component, as shown in Theorem 2, it is important not to select reshaping sets that result in a tensor with large mode dimensions. A general strategy is to include the original tensor and add reshaping sets that do not result in mode dimensions much larger than those of the original tensor; a simple enumeration of such sets is sketched below.
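One simple way to instantiate this strategy is the following sketch: it enumerates reshaping sets that merge at most one pair of modes, keeps only those whose merged dimension stays below a user-chosen threshold, and always includes the unreshaped tensor. The threshold max_merged_dim and the helper name are our own illustrative choices.

```python
from itertools import combinations

def candidate_reshapings(dims, max_merged_dim):
    """Enumerate reshaping sets (as lists of mode-index groups) that merge at most
    one pair of modes and whose merged dimension does not exceed max_merged_dim.
    The unreshaped tensor is always included."""
    K = len(dims)
    sets = [[[k] for k in range(K)]]                       # the original tensor
    for a, b in combinations(range(K), 2):
        if dims[a] * dims[b] <= max_merged_dim:
            groups = [[k] for k in range(K) if k not in (a, b)] + [[a, b]]
            sets.append(groups)
    return sets

print(candidate_reshapings((10, 10, 10, 10, 10), max_merged_dim=100))
```

For a \(10 \times 10 \times 10 \times 10 \times 10\) tensor with max_merged_dim = 100, this produces the ten two-mode groupings used in Sect. 6.1 together with the original tensor.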

5 Optimization procedures

It has been shown that computing the tensor nuclear norm is NP-hard (Hillar and Lim 2013), which makes solving problems (3) and (4) exactly computationally difficult. In Yang et al. (2015), an approximation method has been proposed to compute the spectral norm by computing the largest singular vectors with respect to each mode, which is combined with the Frank-Wolfe optimization method to solve (3). We adopt their approximation method to solve our proposed completion models with reshaped tensors, (3) and (4). We found that solutions obtained with the approximation method agree with our theoretical results on generalization bounds given in Sect. 4. However, there is no theoretical analysis available to understand how well the approximation method performs compared to an exact solution.

The optimization method proposed in Yang et al. (2015) approximates the spectral norm using a recursive algorithm based on singular value decompositions with respect to each mode. We adopt a simpler approach, given in Algorithm 1, which we believe is easier to implement. Using this approximation, we provide an optimization procedure to solve the completion model regularized by a single reshaped norm in Algorithm 2. The procedure in Algorithm 2 is similar to the Frank-Wolfe based optimization procedure proposed in Yang et al. (2015); the additions in Algorithm 2 are the computation of the spectral norm of the reshaped tensor in step 7 and the conversion of the reshaped tensor back to the original dimensions in step 12. Here, we recall Definition 1 for the reshaping operator \(\varPi _{(D_{1},\ldots ,D_{M})}()\) and its inverse \(\varPi _{(D_{1},\ldots ,D_{M})}^{\top }()\) for any given reshaping set \((D_1,D_2,\ldots ,D_M)\).

[Algorithm 1: Approximate computation of the tensor spectral norm]
[Algorithm 2: Frank-Wolfe optimization for the completion model (3) with the reshaped tensor nuclear norm]
[Algorithm 3: Frank-Wolfe optimization for the completion model (4) with the reshaped latent tensor nuclear norm]
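To convey the flavor of the spectral norm approximation, the sketch below is our own simplified stand-in rather than a faithful transcription of Algorithm 1: it estimates the tensor spectral norm and the corresponding rank-1 factors by alternating power iterations over the modes, and it is not guaranteed to reach the global optimum. In a Frank-Wolfe iteration, such rank-1 factors of the reshaped gradient, scaled by the constraint radius \(\lambda\), would provide the update direction.

```python
import numpy as np

def approx_spectral_norm(T, n_iter=50, seed=0):
    """Estimate the tensor spectral norm and rank-1 factors of T by alternating
    power iterations (a heuristic; not guaranteed to reach the global optimum)."""
    rng = np.random.default_rng(seed)
    K = T.ndim
    us = [rng.standard_normal(n) for n in T.shape]
    us = [u / np.linalg.norm(u) for u in us]
    for _ in range(n_iter):
        for k in range(K):
            v = T
            for j in range(K - 1, -1, -1):   # contract every mode except k
                if j != k:
                    v = np.tensordot(v, us[j], axes=([j], [0]))
            us[k] = v / np.linalg.norm(v)
    val = T
    for j in range(K - 1, -1, -1):           # multilinear form at the unit vectors
        val = np.tensordot(val, us[j], axes=([j], [0]))
    return float(abs(val)), us

# Example: spectral norm estimate of a random reshaped tensor.
X = np.random.randn(100, 40, 40)
sigma, factors = approx_spectral_norm(X)
print(sigma, [u.shape for u in factors])
```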

Next, we give an algorithm to solve the completion model regularized by the reshaped latent tensor nuclear norm. The Frank-Wolfe optimization method has also been applied to efficiently solve learning models regularized by latent trace norms (Guo et al. 2017). We follow their approach to design a Frank-Wolfe method for the reshaped latent tensor nuclear norm; Algorithm 3 shows the optimization steps. From Lemma 1, we know that at each iteration t of the Frank-Wolfe procedure we need to find the reshaping with the largest spectral norm; this is done in lines 7–11 of Algorithm 3.

6 Experiments

In this section, we give simulation and real-data experiments.

6.1 Simulation experiments

We created simulation experiments for tensor completion using tensors with fixed multilinear ranks and CP ranks. We create a K-mode tensor with multilinear rank \((r_{1},\ldots ,r_{K})\) by generating a tensor \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K} }\) using the Tucker decomposition (Kolda and Bader 2009) as \({\mathcal {T}} = {\mathcal {C}} \times _{1} U_{1} \times _{2} U_{2} \times _{3} \cdots \times _{K} U_{K}\), where \({\mathcal {C}} \in {\mathbb {R}}^{r_{1} \times \cdots \times r_{K} }\) is a core tensor whose elements are sampled from a normal distribution, specifying the multilinear rank \((r_{1},\ldots ,r_{K})\), and \(U_{k} \in {\mathbb {R}}^{n_{k} \times r_{k}},\;k=1,\ldots ,K\), are component matrices with orthonormal columns. We create a tensor with CP rank r using the CP decomposition (Kolda and Bader 2009) as \({\mathcal {T}} = \sum _{i=1}^{r} c_{i} u_{1i} \otimes u_{2i} \otimes \cdots \otimes u_{Ki}\), where \(c_{i} \in {\mathbb {R}}^{+}\) and \(u_{ki} \in {\mathbb {R}}^{n_{k}},\;k=1,\ldots ,K,\;i=1,\ldots ,r\), are sampled from a normal distribution and normalized such that \(\Vert u_{ki}\Vert _{2}^{2} = 1\). From the total number of elements in the generated tensors, we randomly selected 10, 40, and 70 percent as training sets; from the remaining elements we selected 10 percent as the validation set, and the rest were taken as test data. We conducted 3 simulations for each randomly generated tensor.
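The following sketch shows one way to generate such test tensors; it is an illustration of the construction described above, with function names and seeding of our own choosing.

```python
import numpy as np

def tucker_tensor(dims, ranks, seed=0):
    """Tensor with (at most) the given multilinear rank: a random Gaussian core
    multiplied along each mode by a factor matrix with orthonormal columns."""
    rng = np.random.default_rng(seed)
    T = rng.standard_normal(ranks)
    for k, (n, r) in enumerate(zip(dims, ranks)):
        U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # n x r, orthonormal columns
        T = np.tensordot(T, U, axes=([k], [1]))            # contract mode k with U
        T = np.moveaxis(T, -1, k)                          # put the new mode back at k
    return T

def cp_tensor(dims, rank, seed=0):
    """Tensor with (at most) the given CP rank: a sum of scaled rank-1 terms
    built from normalized Gaussian vectors."""
    rng = np.random.default_rng(seed)
    T = np.zeros(dims)
    for _ in range(rank):
        factors = [rng.standard_normal(n) for n in dims]
        factors = [u / np.linalg.norm(u) for u in factors]
        term = factors[0]
        for u in factors[1:]:
            term = np.multiply.outer(term, u)
        T += abs(rng.standard_normal()) * term
    return T

T1 = tucker_tensor((10, 10, 40, 40), (9, 9, 3, 3))   # as in the first simulation
T2 = cp_tensor((10,) * 5, 3)                          # as in the second simulation
print(T1.shape, T2.shape)
```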

For all simulation experiments, we tested completion using our proposed completion models: (3) with the reshaped tensor nuclear norm (abbreviated RTNN) and (4) with the reshaped latent tensor nuclear norm (abbreviated RLTNN). Additionally, we performed completion using the tensor nuclear norm (TNN) without reshaping and the square norm (SN). As further baseline methods, we used tensor completion regularized by the overlapped trace norm (OTN), the scaled latent trace norm (SLTN), and the Schatten TT norm (Imaizumi et al. 2017) (STTN). As the performance measure of completion, we calculated the mean squared error (MSE) on the validation and test data. For all completion models, we cross-validated regularization parameters over powers \(2^{x}\), with x ranging from \(-5\) to 15 in steps of 0.25.

Fig. 1 Performances of completion of the tensors: (a) tensor \({\mathcal {T}}_1 \in {\mathbb {R}}^{10 \times 10 \times 40 \times 40}\) with multilinear rank (9, 9, 3, 3) and (b) tensor \({\mathcal {T}}_2 \in {\mathbb {R}}^{10 \times 10 \times 10 \times 10 \times 10 }\) with CP rank 3

Fig. 2 Performances of completion of the tensors: (a) tensor \({\mathcal {T}}_3 \in {\mathbb {R}}^{10 \times 10 \times 40 \times 40}\) with multilinear rank (3, 3, 35, 35) and (b) tensor \({\mathcal {T}}_4 \in {\mathbb {R}}^{10 \times 10 \times 10 \times 10 \times 10}\) with CP rank 243

For our first simulation experiment, we created a 4-way tensor \({\mathcal {T}}_{1} \in {\mathbb {R}}^{n_{1} \times n_{2} \times n_{3} \times n_{4}}\) with \(n_{1} = n_{2} = 10, n_{3} =n_{4} = 40\) and a multilinear rank of \((r_1,r_2,r_3,r_4) = (9,9,3,3)\). Following Mu et al. (2014), we can reshape \({\mathcal {T}}_1\) using the reshaping set \((D_1,D_2) = ((n_1,n_3),(n_2,n_4))\), which creates a square matrix for the square norm. From Theorem 1, we see that the rank based components in the Rademacher complexity for the nuclear norm and the square norm are \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,\ldots ,4}r_{j}) = 81\) and \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,2}\prod _{i \in D_{j}} r_{i}) = 27\), respectively. Further, Theorem 1 shows that the mode based components for the nuclear norm and the square norm are \(\log (4\cdot 4)(\sum _{k=1}^{4}\sqrt{n_k}) \approx 53\) and \(\log (4\cdot 2)(\sqrt{n_1n_3} + \sqrt{n_2n_4}) \approx 83\), respectively. This leads to a lower generalization bound for the nuclear norm compared to the square norm, justifying its better performance shown in Fig. 1a. However, Theorem 1 indicates that the lowest generalization bound is obtained by using the reshaping set \((D'_1,D'_2, D'_3) = ((n_1,n_2),n_3,n_4)\), which combines the high ranked modes (modes 1 and 2), resulting in a rank based component of \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,2,3}\prod _{i \in D'_{j}} r_{i}) = 9\) and a mode based component of \(\log (4\cdot 3)(\sqrt{n_1n_2} + \sqrt{n_3} + \sqrt{n_4}) \approx 56\). Figure 1a agrees with our theoretical analysis, showing that our proposed reshaped tensor nuclear norm obtains the best performance compared to the other norms. For the reshaped latent tensor nuclear norm, we combined the two reshaping sets \(((n_1,n_2,n_3,n_4), ((n_1,n_2),n_3,n_4))\). Applying Theorem 2, we see that this combination leads to a lower Rademacher complexity. However, this combination only gave a performance comparable to the reshaped tensor nuclear norm.

For our second simulation, we created a 5-mode tensor \({\mathcal {T}}_{2} \in {\mathbb {R}}^{n_1 \times n_2 \times \cdots \times n_5}\), where \(n_1 = n_2 = \cdots =n_5=10\), with a CP rank of 3. From Theorem 1, we know that we can only consider the mode based component of the Rademacher complexity to obtain a lower generalization bound. For the square norm we can use a reshaping set such as \((D_1,D_2) =((n_1,n_2),(n_3,n_4,n_5))\), which results in a mode based component of \(\log (4\cdot 2)(\sqrt{n_1n_2} + \sqrt{n_3n_4n_5}) \approx 86\). The tensor nuclear norm leads to a mode based component of \(\log (4\cdot 5)(\sum _{k=1}^{5}\sqrt{n_k}) \approx 47\). As an alternative reshaping, we propose to combine any two modes to create a reshaping set such as \((D'_1,D'_2,D'_3,D'_4) =(n_1,n_2,n_3,(n_4,n_5))\) for the reshaped tensor nuclear norm, which leads to a mode based component of \(\log (4\cdot 4)(\sqrt{n_1} + \sqrt{n_2} + \sqrt{n_3} + \sqrt{n_4n_5}) \approx 54\). Comparing the Rademacher complexities using the mode based components, we see that the lowest generalization bound is given by the tensor nuclear norm. Figure 1b shows that this theoretical observation is accurate, since the tensor nuclear norm gives the best performance compared to the other two reshaped norms. For the reshaped latent tensor nuclear norm we used all 10 two-mode combinations, which resulted in reshaping sets \(D =(((n_1,n_2),n_3,n_4,n_5),(n_1,(n_2,n_3),n_4,n_5),\ldots ,\) \((n_1,n_2,n_3,(n_4,n_5)))\). Figure 1b shows that the reshaped latent tensor nuclear norm outperformed the tensor nuclear norm.

The next simulation focuses on a different multilinear rank for the 4-way tensor \({\mathcal {T}}_3 \in {\mathbb {R}}^{n_{1} \times n_{2} \times n_{3} \times n_{4} }\) with \(n_{1} = n_{2} = 10, n_{3} =n_{4} = 40\). Figure 2a shows the simulation experiment with multilinear rank (3, 3, 35, 35). Again following Mu et al. (2014), we can reshape \({\mathcal {T}}_3\) using a reshaping set of \((D_1,D_2) = ((n_1,n_3),(n_2,n_4))\) or \((D_1,D_2) = ((n_1,n_4),(n_2,n_3))\) to create a square matrix for the square norm. From Theorem 1, we observe that the square norm results in a rank based component of \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,2}\prod _{i \in D_{j}} r_{i}) = 105\) and a mode based component of \(\log (4\cdot 2)(\sqrt{n_1n_3} + \sqrt{n_2n_4}) \approx 63\). However, if we combine the high ranked modes 3 and 4 to create the reshaping set \((D'_1,D'_2, D'_3) = (n_1,n_2,(n_3,n_4))\) for the reshaped tensor nuclear norm, then the rank based component decreases to \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,2,3}\prod _{i \in D'_{j}} r_{i}) = 9\) and the mode based component decreases to \(\log (4\cdot 2)(\sqrt{n_1} \sqrt{n_3} + \sqrt{n_2n_4}) \approx 55\). Furthermore, the tensor nuclear norm leads to a rank based component of \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,\ldots ,4}r_{j}) = 315\) and a mode based component of \(\log (4\cdot 4)(\sum _{k=1}^{4}\sqrt{n_k}) \approx 53\), resulting in a larger generalization bound compared to the proposed reshaping set \((D'_1,D'_2, D'_3) = (n_1,n_2,(n_3,n_4))\). This analysis is confirmed by the experimental results in Fig. 2a, where the reshaped tensor nuclear norm gives the best performance. Using Theorem 2, we find that if we use the reshaping sets \(((n_1,n_2,n_3,n_4),((n_1,n_2),n_3,n_4))\) for the reshaped latent tensor nuclear norm, the Rademacher complexity will be bounded by the smaller rank based component from the reshaping set \((n_1,n_2,n_3,n_4)\) and the mode based component from the reshaping set \(((n_1,n_2),n_3,n_4)\). However, the reshaped latent tensor nuclear norm was not able to perform better than the tensor nuclear norm or the proposed reshaped norm with \((D'_1,D'_2, D'_3) = (n_1,n_2,(n_3,n_4))\).

The final simulation result, shown in Fig. 2b, is for a tensor \({\mathcal {T}}_4 \in {\mathbb {R}}^{10 \times 10 \times 10 \times 10 \times 10}\) with a CP rank of 243. For this experiment, we used the same reshaping strategies as in the previous CP rank experiment. We see that when the fraction of training samples is less than 40 percent, the tensor nuclear norm gives the best performance. When the fraction of training samples increases beyond 40 percent, the reshaped latent tensor nuclear norm outperforms the tensor nuclear norm.

6.2 Multi-view video completion

We performed completion on multi-view video data using the EPFL Multi-camera Pedestrian Videos data set (Berclaz et al. 2011). Videos in this data set capture four people sequentially entering a room and walking around, recorded from 4 views by 4 synchronized cameras. We down-sampled each video frame to a height of 96 and a width of 120 pixels to obtain an RGB color image of dimensions \(96 \times 120 \times 3\). We sequentially selected 391 frames from each video. Combining all the video frames from all views resulted in a tensor of dimensions \(96 \times 120 \times 3 \times 391 \times 4\) (height × width × color × frames × views).

To evaluate completion, we randomly removed entries from the multi-view tensor and performed completion using the remaining elements. We randomly selected 2, 4, 8, 16, 32, and 64 percent of the total number of elements in the tensor as training elements. As our validation set we selected 10 percent of the total number of elements, and the rest of the remaining elements were taken as the test set. For the square norm, we considered the reshaping set \(\mathrm {((height,width),(color,frames,views))}\). For the reshaped tensor nuclear norm, we experimentally found that the reshaping set \(\mathrm {((height,views),(width,color),(frames))}\) gives the best performance. To create the collection of reshaping sets for the reshaped latent tensor nuclear norm, we combined the reshaping sets of the square norm and the reshaped tensor nuclear norm with the unreshaped original tensor. The resulting collection was \(D= ((\mathrm {height,width,color,frames,views}),\) \(\mathrm {((height,width),}\) \(\mathrm {(color,frames,views))},\) \(\mathrm {((height,views),}\mathrm {(width,color),(frames))})\). We cross-validated all the completion models with regularization parameters chosen from \(10^{-1},10^{-0.75},10^{-0.5},\ldots ,10^{7}\).

Fig. 3 Tensor completion of the multi-view video tensor

Figure 3 shows that when the training set is small (i.e., the observed tensor is sparse), the reshaped tensor nuclear norm and the tensor nuclear norm give good performance compared to the square norm. When the percentage of observed elements increases beyond 16 percent, the square norm outperforms the other norms. However, the reshaped latent tensor nuclear norm is adaptive to all fractions of training samples and gives the overall best performance.

7 Conclusions

In this paper, we generalize tensor reshaping for low-rank tensor regularization and introduce the reshaped tensor nuclear norm and the reshaped latent tensor nuclear norm. We propose tensor completion models regularized by the proposed norms. Using a generalization bound analysis of the proposed completion models, we show that the proposed norms lead to smaller Rademacher complexity bounds compared to existing norms. Further, using our theoretical analysis, we discuss optimal conditions for creating reshaped tensor nuclear norms. Simulation and real-data experiments confirm our theoretical analysis.

Our research opens up several future research directions. The most important one is developing theoretically guaranteed optimization methods for completion models regularized by the proposed tensor nuclear norms. Though the approximation method we adopted from Yang et al. (2015) for computing the tensor spectral norm within the Frank-Wolfe procedure gives performance that agrees with our generalization bounds, its approximation error is not known. We believe that future theoretical investigations are needed to understand the qualitative properties of the proposed optimization procedures. Furthermore, optimization methods for nuclear norms that scale to large higher order tensors would be an important future research direction. Another important direction is to further explore the theoretical foundation of tensor completion using the reshaped tensor nuclear norm. In this regard, recovery bounds (Yuan and Zhang 2016) would provide stronger bounds on sample complexities for the proposed method.