1 Introduction

Tensor-formatted data is becoming abundant in machine learning applications. Among the many tensor-related machine learning problems, tensor completion has gained increased popularity in recent years. Tensor completion imputes the unknown elements of a partially observed tensor by exploiting its low-rank structure. Popular real-world applications of tensor completion are found in recommendation systems (Karatzoglou et al. 2010; Zheng et al. 2010), computer vision (Liu et al. 2009), and multi-relational link prediction (Rai et al. 2015). Though there exist many methods to perform tensor completion (Song et al. 2017), globally optimal solutions are obtained mainly through convex low-rank tensor norms, making them an active area of research.

Over the years, many researchers have proposed different low-rank inducing norms to minimize the rank of tensors; however, none of these norms is universally better than the others. The main challenge in designing norms for tensors is that they have multiple dimensions and different definitions of rank (Tucker rank, CP rank, TT-rank), making it difficult for a single norm to induce low-rankness with respect to all the properties of a tensor. Most tensor norms have been designed with a focus on a specific rank; the overlapped trace norm (Tomioka and Suzuki 2013) and latent trace norms (Wimalawarne et al. 2014) constrain the multilinear ranks, the tensor nuclear norm (Yuan and Zhang 2016; Yang et al. 2015; Lim and Comon 2014) constrains the CP rank, and the Schatten TT norm (Imaizumi et al. 2017) constrains the TT-rank. However, targeting a specific rank to constrain may not always be practical, since we may not know the most suitable rank for a tensor in advance.

Most tensor norms reshape tensors by rearranging their elements as matrices to induce low-rankness with respect to a mode or a set of modes. However, this reshaping is specific to obtaining the relevant ranks that a norm constrains. An alternative view was presented by Mu et al. (2014) with the square norm, where the tensor is reshaped as a balanced matrix without considering the structure of its ranks. The square norm has been shown to have better sample complexities for higher order tensors (tensors with more than three modes) than some of the existing norms such as the overlapped trace norm (Yuan and Zhang 2016). However, this norm only considers the special case of reshaping a tensor as a matrix whose two dimensions are close to each other. Other possibilities of how reshaping tensors beyond matrices affects the inducement of low-rankness have not been investigated.

In this paper, we propose generalized reshaping strategies for tensors and develop low-rank inducing tensor norms. We demonstrate that reshaping a higher order tensor as another tensor and applying the tensor nuclear norm leads to better inducement of low-rankness compared to applying existing low-rank norms on the original tensor or its matrix unfoldings. Furthermore, we propose the reshaped latent tensor nuclear norm, which combines multiple reshaped tensors to obtain the best performance among a set of reshaped tensors. Using generalization bounds, we show that the proposed norms give lower Rademacher complexities compared to existing norms. Using simulations and real-world data experiments, we justify our theoretical analysis and show that our proposed methods give better performance for tensor completion compared to other convex norms.

Throughout this paper we use the following notations. We represent a K-mode (K-way) tensor as \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K}}\). The mode-k unfolding (Kolda and Bader 2009) of a tensor \({\mathcal {T}}\) is given by \(T_{(k)} \in {\mathbb {R}}^{n_{k}\times \prod _{j \ne k}{n_{j}}}\), which is obtained by concatenating all slices along mode k. We indicate the tensor product (Hackbusch 2012) between vectors \(u_{i} \in {\mathbb {R}}^{n_{i}}, \; i = 1,\ldots ,K\) using the notation \(\otimes\) as \((u_{1}\otimes \cdots \otimes u_{K})_{i_{1},\ldots ,i_{K}} = \prod _{l=1}^{K}u_{l,i_{l}}\). The k-mode product of a tensor \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_k \cdots \times n_{K}}\) and a vector \(v \in {\mathbb {R}}^{n_k}\) is defined as \({\mathcal {T}} \times _{k} v = \sum _{i_{k}=1}^{n_k} {\mathcal {T}}_{i_1,i_2,\ldots ,i_{k},\ldots ,i_{K}}v_{i_k}\). The largest singular value of \({\mathcal {T}}\) is given by \(\gamma _1({\mathcal {T}})\). The rank of a matrix \(A \in {\mathbb {R}}^{n \times m}\) is given by \(\mathrm {Rank}(A)\).
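For concreteness, a minimal NumPy sketch of the mode-k unfolding and the k-mode product with a vector is given below; the column ordering of the unfolding is one of several equivalent conventions, and the function names and example dimensions are ours.

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding: a matrix of shape (n_k, prod of the other dimensions).
    The column ordering is one fixed convention; any such ordering gives the
    same matrix rank, which is what the norms discussed here depend on."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_k_vec_product(T, v, k):
    """k-mode product with a vector: contracts mode k, returning a (K-1)-mode tensor."""
    return np.tensordot(T, v, axes=([k], [0]))

# Small usage example with arbitrary dimensions.
T = np.random.randn(3, 4, 5)
print(unfold(T, 1).shape)                            # (4, 15)
print(mode_k_vec_product(T, np.ones(5), 2).shape)    # (3, 4)
```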

2 Review of low-rank tensor norms

Designing convex low-rank inducing norms for tensors is a challenging task. Over the years, several tensor norms have been proposed, each having certain advantages over the others. The main challenge in defining tensor norms is the multi-dimensionality of tensors and the existence of different ranks (e.g., the CP rank and the multilinear (Tucker) rank). A common criterion for designing low-rank tensor norms is to induce low-rankness by minimizing a particular rank. A commonly used rank is the multilinear rank, which represents the rank with respect to each mode of a tensor. Given a tensor \({\mathcal {W}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K}}\), we obtain the rank of each unfolding \(r_{k} = \mathrm {Rank}(W_{(k)}),\;k=1,\ldots ,K\), and define the multilinear rank as \((r_{1},\ldots ,r_{K})\). To minimize the multilinear rank, the overlapped trace norm has been defined (Liu et al. 2009; Tomioka and Suzuki 2013), which for a tensor \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K}}\) is

$$\begin{aligned} \Vert {\mathcal {T}} \Vert _{\mathrm {overlap}} = \underset{k=1}{\overset{K}{\sum }} \Vert T_{(k)}\Vert _{\mathrm {tr}}, \end{aligned}$$

where \(\Vert \cdot \Vert _{\mathrm {tr}}\) is the matrix nuclear norm (a.k.a. trace norm) (Fazel et al. 2001), the sum of the non-zero singular values of a matrix. A limitation of this norm is that it performs poorly for tensors with high variations in the multilinear rank (Tomioka and Suzuki 2013; Wimalawarne et al. 2014).

The latent trace norm (Tomioka and Suzuki 2013) has been proposed to overcome limitations of the overlapped trace norm; it allows the freedom to learn ranks with respect to each mode unfolding by considering a latent decomposition of the tensor. More specifically, the latent trace norm learns latent tensors \({\mathcal {T}}^{(k)}, \;k=1,\ldots ,K\) as

$$\begin{aligned} \Vert {\mathcal {T}} \Vert _{\mathrm {latent}} = \underset{{\mathcal {T}}^{(1)} + \ldots +{\mathcal {T}}^{(K)} = {\mathcal {T}} }{\inf } \underset{k=1}{\overset{K}{\sum }} \Vert T_{(k)}^{(k)} \Vert _{\mathrm {tr}} . \end{aligned}$$

This norm was shown to be more robust than the overlapped trace norm for tensors with high variations in the multilinear rank (Tomioka and Suzuki 2013). The latent trace norm has been further extended to the scaled latent trace norm (Wimalawarne et al. 2014), which considers the relative rank of each latent tensor by scaling each term with the inverse square root of the corresponding mode dimension.

Another popular rank for tensors is the CANDECOMP/PARAFAC (CP) rank (Carroll and Chang 1970; Harshman 1970; Hitchcock 1927; Kolda and Bader 2009), which can be considered a higher order extension of the matrix rank. Recently, minimization of the CP rank has gained the attention of many researchers, since it has been shown to lead to a better sample complexity than multilinear rank based norms (Yuan and Zhang 2016). The tensor nuclear norm (Yuan and Zhang 2016; Yang et al. 2015; Lim and Comon 2014) has been defined as a convex surrogate for minimizing the CP rank of a tensor. For a tensor \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K}}\) with rank R, \(\mathrm {Rank}({\mathcal {T}}) = R\), the tensor nuclear norm is defined as

$$\begin{aligned} \Vert {\mathcal {T}}\Vert _{*}&= \inf \Bigg \{ \sum _{j=1}^{R} \gamma _{j} \,\Big |\, {\mathcal {T}} = \sum _{j=1}^{R} \gamma _{j} u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Kj}, \nonumber \\ \Vert u_{kj}\Vert _{2}^{2}&= 1, \gamma _{j} \ge \gamma _{j+1} > 0 \Bigg \}, \end{aligned}$$
(1)

where \(u_{kj} \in {\mathbb {R}}^{n_k}\) for \(k=1,\ldots ,K\) and \(j=1,\ldots ,R\).

The latest addition to convex low-rank tensor norms is the Schatten TT norm (Imaizumi et al. 2017), which minimizes the tensor train rank (Oseledets 2011) of tensors. The Schatten TT norm is defined as

$$\begin{aligned} \Vert {\mathcal {T}} \Vert _{s,T} = \frac{1}{K-1} \sum _{k=1}^{K-1} \Vert Q_{k}({\mathcal {T}}) \Vert _{\mathrm {tr}}, \end{aligned}$$

where \(Q_{k}: {\mathcal {T}} \rightarrow {\mathbb {R}}^{(\prod _{j \le k} n_{j}) \times (\prod _{j > k} n_{j})}\) is an operator that reshapes the tensor \({\mathcal {T}}\) into a matrix by combining the first k modes as rows and the remaining \(K-k\) modes as columns. This norm has been shown to be suitable for higher order tensors.

It has also been shown that low-rank tensor norms can be designed without restricting to a specific rank. The square norm (Mu et al. 2014) reshapes a tensor as a matrix and applies the matrix nuclear norm as

$$\begin{aligned} \Vert {\mathcal {T}} \Vert _{\square } = \Bigg \Vert \mathrm {reshape}\Bigg (T_{(1)}, \prod _{i=1}^{j} n_{i}, \prod _{i=j+1}^{K} n_{i} \Bigg ) \Bigg \Vert _{\mathrm {tr}}, \end{aligned}$$

where the function \(\mathrm {reshape}()\) reshapes \({\mathcal {T}}\) into a matrix with approximately equal dimensions for an appropriately chosen \(j\). This norm has been shown to have a better sample complexity for tensor completion than the overlapped trace norm.
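For concreteness, the following sketch illustrates the reshaping behind the square norm: it chooses the split index that best balances the two matrix dimensions while keeping the original mode order, and then evaluates the matrix nuclear norm. Permuting modes before splitting, as with the reshaping sets introduced in Sect. 3, can yield an even more balanced matrix; the function name and example dimensions are ours.

```python
import numpy as np

def square_norm(T):
    """Square norm sketch: split the modes (in their original order) at the index
    that best balances rows and columns, reshape into that matrix, and take the
    matrix nuclear norm."""
    dims, total = T.shape, T.size
    split = min(range(1, T.ndim),
                key=lambda j: abs(int(np.prod(dims[:j])) - total / np.prod(dims[:j])))
    M = T.reshape(int(np.prod(dims[:split])), -1)
    return np.linalg.norm(M, ord='nuc'), M.shape

T = np.random.randn(6, 8, 4, 3)
value, shape = square_norm(T)
print(shape, value)    # matrix shape (48, 12) for these dimensions
```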

We point out that all the existing tensor norms except the tensor nuclear norm reshape tensors as matrices to induce low-rankness with respect to two sets of mode arrangements. As a result, these norms focus on constraining the multilinear rank of a tensor. However, the tensor nuclear norm has been shown to lead to a better sample complexity for tensor completion than multilinear rank based tensor norms (Yuan and Zhang 2016). Hence, the lack of tensor nuclear norm regularization for reshaped tensors among existing norms may result in sub-optimal solutions.

3 Proposed method: tensor reshaping and tensor nuclear norm

In this paper, we investigate extending the tensor nuclear norm to higher order tensors. We explore methods to combine tensor reshaping with the tensor nuclear norm.

3.1 Generalized tensor reshaping

First, we introduce the following notation for the product of tensor dimensions. For a given tuple \((n_{1},\ldots ,n_{p})\), we denote the product of its elements by \(\mathsf {prod}(n_{1},\ldots ,n_{p}) = n_{1}n_{2}\cdots n_{p}\). Next, we define generalized reshaping for tensors.

Definition 1

(Tensor Reshaping) Let us consider a tensor \({\mathcal {X}} \in {\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}}\) and the set of its mode dimensions \(D = \{n_{1}, n_{2},\ldots , n_{K}\}\). Given M sets \(D_{i} \subset D,\;i=1,\ldots ,M\), that are pairwise disjoint, \(D_{i} \cap D_{j} = \emptyset\) for \(i \ne j\), and together cover D, the reshaping operator is defined as

$$\begin{aligned} \varPi _{(D_{1},\ldots ,D_{M})}:{\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}} \rightarrow {\mathbb {R}}^{\textsf {prod}(D_{1}) \times \cdots \times \textsf {prod}(D_{M})}, \end{aligned}$$

and the inverse operator is denoted by \(\varPi _{(D_{1},\ldots ,D_{M})}^{\top }\). Further, we denote the reshaping of \({\mathcal {X}}\) by the sets \((D_{1},\ldots ,D_{M})\) as \({\mathcal {X}}_{(D_{1},\ldots ,D_{M})}\).

We point out that when \(| D_{1} | = \cdots = | D_{M} | = 1\) (so that \(M = K\)), there is no reshaping of the tensor, \({\mathcal {X}}_{(D_{1},\ldots ,D_{M})} = {\mathcal {X}}\). Unfolding a tensor along mode k (Kolda and Bader 2009) is equivalent to defining two sets \(D_{1} = \{n_{k}\}\) and \(D_{2} = \{n_{1}, \ldots ,n_{k-1},n_{k+1}, \ldots , n_{K}\}\). Further, we can obtain the reshaping of a tensor as a matrix for the square norm (Mu et al. 2014) by specifying two sets \(D_{1}\) and \(D_{2}\) with \(\textsf {prod}(D_{1}) \approx \textsf {prod}(D_{2})\).
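The following NumPy sketch gives one possible implementation of the reshaping operator \(\varPi _{(D_{1},\ldots ,D_{M})}\) and its inverse, where, for concreteness, each set \(D_{i}\) is specified by the indices of the modes it groups rather than by their dimensions; the helper names are ours.

```python
import numpy as np

def reshape_groups(X, groups):
    """Pi_{(D_1,...,D_M)}: permute the modes of X so that each group is contiguous,
    then merge each group of modes into a single mode."""
    order = [k for g in groups for k in g]
    new_shape = [int(np.prod([X.shape[k] for k in g])) for g in groups]
    return np.transpose(X, order).reshape(new_shape)

def reshape_groups_inverse(Y, groups, original_shape):
    """Pi^T_{(D_1,...,D_M)}: undo reshape_groups and recover the original tensor."""
    order = [k for g in groups for k in g]
    permuted_shape = [original_shape[k] for k in order]
    Z = Y.reshape(permuted_shape)
    return np.transpose(Z, np.argsort(order))

# Example: group modes 0 and 2 of a 10x10x40x40 tensor, keep modes 1 and 3 separate.
X = np.random.randn(10, 10, 40, 40)
groups = [[0, 2], [1], [3]]
Y = reshape_groups(X, groups)
print(Y.shape)                                                    # (400, 10, 40)
print(np.allclose(reshape_groups_inverse(Y, groups, X.shape), X)) # True
```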

3.2 Reshaped tensor nuclear norm

We propose a class of tensor norms by combining generalized tensor reshaping with the tensor nuclear norm. We name the proposed norms Reshaped Tensor Nuclear Norms. To define the proposed norms, we consider a K-mode tensor \({\mathcal {X}} \in {\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}}\) and reshaping sets \(D_{i},\;i=1,\ldots ,M\), adhering to Definition 1. We define the reshaped tensor nuclear norm as

$$\begin{aligned} \Vert {\mathcal {X}}_{(D_1,\ldots ,D_M)} \Vert _{*}, \end{aligned}$$

where \(\Vert \cdot \Vert _{*}\) is the tensor nuclear norm defined in (1). This norm is convex, since the tensor nuclear norm (1) is convex and reshaping is a linear operation.

3.3 Reshaped latent nuclear norm

A practical limitation in applying the proposed reshaped tensor norm is the difficulty of selecting the most suitable reshaping set out of all possible reshaping combinations. This is critical since we would not know the ranks of the tensor prior to training a learning model. To overcome this difficulty, we propose the Reshaped Latent Tensor Nuclear Norm by extending the latent trace norm (Tomioka and Suzuki 2013) to reshaped tensors.

Let us consider a collection of G reshaping sets \(D_{\mathrm {L}} = (D^{(1)},\ldots ,D^{(G)})\), where each \(D^{(s)} = (D^{(s)}_1,\ldots ,D^{(s)}_{m_{s}}),\;s=1,\ldots ,G\), consists of a reshaping set for an \(m_{s}\)-mode reshaped tensor. Further, we consider \({\mathcal {W}}\) as a sum of G latent tensors \({\mathcal {W}}^{(g)},\;g=1,\ldots ,G\), so that \({\mathcal {W}} = \sum _{k=1}^{G} {\mathcal {W}}^{(k)}\). We define the reshaped latent tensor nuclear norm as

$$\begin{aligned} \Vert {\mathcal {W}} \Vert _{\mathrm {r\_latent}(D_{\mathrm {L}})} = \inf _{{\mathcal {W}}^{(1)} + \cdots + {\mathcal {W}}^{(G)}= {\mathcal {W}}} \sum _{k=1}^{G} \Vert {\mathcal {W}}^{(k)}_{(D^{(k)}_{1},\ldots ,D^{(k)}_{m_{k}})} \Vert _{*}. \end{aligned}$$
(2)

We point out that the above norm differs from the latent trace norm (Tomioka and Suzuki 2013), since it considers reshaping sets defined by the user, whereas the latent trace norm considers all mode-wise tensor unfoldings. Furthermore, the above norm uses the tensor nuclear norm, while the latent trace norm is built using the matrix nuclear norm.

3.4 Completion models

Now, we propose tensor completion models regularized by the proposed norms. Let us consider a partially observed tensor \({\mathcal {X}} \in {\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}}\). Given that \({\mathcal {X}}\) has m observed elements, we define the mapping from \({\mathcal {X}}\) to its observed elements by \(\varOmega :{\mathbb {R}}^{n_{1} \times n_{2} \times \cdots \times n_{K}} \rightarrow {\mathbb {R}}^{m}\). Given a reshaping set \({(D_{1},\ldots ,D_{M})}\), the completion model regularized by the reshaped tensor nuclear norm is given as

$$\begin{aligned}&\min _{{\mathcal {W}}}\frac{1}{2}\Vert \varOmega ({\mathcal {X}}) - \varOmega ({\mathcal {W}}) \Vert _{\mathrm {F}}^{2}\nonumber \\&\quad \mathrm {s.t.} \;\; \Vert {\mathcal {W}}_{(D_{1},\ldots ,D_{M})}\Vert _{*} \le \lambda , \end{aligned}$$
(3)

where \(\lambda\) is a regularization parameter. For a selected set of reshaping sets \(D_{\mathrm {L}}=(D^{(1)},\ldots ,D^{(G)})\), a completion model regularized by the reshaped latent tensor nuclear norm is given as

$$\begin{aligned}&\min _{{\mathcal {W}}^{(1)}+\cdots +{\mathcal {W}}^{(G)}={\mathcal {W}}}\frac{1}{2}\Vert \varOmega ({\mathcal {X}}) - \varOmega ({\mathcal {W}}^{(1)}+\cdots +{\mathcal {W}}^{(G)}) \Vert _{\mathrm {F}}^{2}\nonumber \\&\quad \mathrm {s.t.} \;\; \Vert {\mathcal {W}}\Vert _{\mathrm {r\_latent}(D_{\mathrm {L}})} \le \lambda , \end{aligned}$$
(4)

where \(\lambda\) is a regularization parameter.
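For illustration only, the observation mapping \(\varOmega\) and the data-fit term of (3) and (4) can be realized with a boolean mask over the observed entries, as in the following sketch (names and shapes are ours):

```python
import numpy as np

def observed_loss(X, W, mask):
    """0.5 * ||Omega(X) - Omega(W)||_F^2, with Omega realized as a boolean mask
    selecting the m observed entries of the tensor."""
    diff = (X - W)[mask]            # vector of length m
    return 0.5 * float(diff @ diff)

# Example: a 4-mode tensor with roughly 30% of its entries observed.
shape = (10, 10, 40, 40)
X = np.random.randn(*shape)
mask = np.random.rand(*shape) < 0.3
print(observed_loss(X, np.zeros(shape), mask))
```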

4 Theory

We investigate theoretical properties of our proposed methods to identify optimal conditions for reshaping tensors. For our analysis, we use generalization bounds based on the transductive Rademacher complexity (El-Yaniv and Pechyony 2007; Shamir and Shalev-Shwartz 2014).

We consider the learning problem in (3) and denote the set of indexes of the observed elements of \({\mathcal {X}}\) by \(\mathrm {S}\), where each index \((i_{1},\ldots ,i_{K})\) of an observed element of \({\mathcal {X}}\) is assigned to an element \(\alpha _j \in \mathrm {S}\) for some \(1 \le j \le |\mathrm {S}|\). We consider the observed elements as the training set, denoted by \({\mathrm {S}}_{\mathrm {Train}}\), and the rest as the test set, denoted by \({\mathrm {S}}_{\mathrm {Test}}\). For convenience in deriving the Rademacher complexity, we consider the special case of \(|{\mathrm {S}}_{\mathrm {Train}}| = |{\mathrm {S}}_{\mathrm {Test}}| = |\mathrm {S}|/2\), as in Shamir and Shalev-Shwartz (2014).

Given a reshaping set \((D_1,\ldots ,D_M)\), we consider the hypothesis class \(\textsf {W} = \{{\mathcal {W}}| \Vert {\mathcal {W}}_{(D_{1},\ldots ,D_{M})}\Vert _{*} \le t \}\) for a given t. Given a loss function \(l(\cdot ,\cdot )\) and a set \(\mathrm {S}\), we define the empirical loss as

$$\begin{aligned} L_{\mathrm {S}}(l \circ {\mathcal {W}}) := \frac{1}{|\mathrm {S}|} \Bigg [ \sum _{(i_{1},\ldots ,i_{K}) \in \mathrm {S} } l({\mathcal {X}}_{i_{1},\ldots ,i_{K}}, {\mathcal {W}}_{i_{1},\ldots ,i_{K}}) \Bigg ] . \end{aligned}$$

Given that \(\max _{(i_{1},\ldots ,i_{K}),\, {\mathcal {W}} \in \textsf {W} } |l({\mathcal {X}}_{i_{1},\ldots ,i_{K}},{\mathcal {W}}_{i_{1},\ldots ,i_{K}})| \le b_{l}\), it is straightforward to extend the generalization bounds for matrices from Shamir and Shalev-Shwartz (2014) to tensors, and the resulting bound holds with probability \(1-\delta\) as

$$\begin{aligned}L_{{\mathrm {S}}_{\rm {Test}}}(l \circ {\mathcal {W}}) - L_{{\mathrm {S}}_{\rm {Train}}}(l \circ {\mathcal {W}}) &\le 4R_{\mathrm {S}}(l \circ {\mathcal {W}}) \\&\quad + b_{l}\Bigg (\frac{11 + 4\sqrt{\log {\frac{1}{\delta }}}}{\sqrt{|{\mathrm {S}}_{\rm {Train}} |} }\Bigg ), \end{aligned}$$

where \(R_{\mathrm {S}}(l \circ {\mathcal {W}})\) is the transductive Rademacher complexity (El-Yaniv and Pechyony 2007; Shamir and Shalev-Shwartz 2014) defined as

$$\begin{aligned} R_{\mathrm {S}}(l \circ {\mathcal {W}}) = \frac{1}{|\mathrm {S}|} {\mathbb {E}}_{\sigma } \Bigg [ \sup _{{\mathcal {W}} \in \textsf {W}} \sum _{j=1}^{|\mathrm {S}|} \sigma _{j}\, l({\mathcal {X}}_{\alpha _{j}}, {\mathcal {W}}_{\alpha _{j}}) \Bigg ], \end{aligned}$$
(5)

where \(\sigma _{j} \in \{-1,1\},\;j=1,\ldots ,|\mathrm {S}|\), are Rademacher variables taking each value with probability 0.5.

The following theorem gives the Rademacher complexity for completion models regularized by a reshaped tensor nuclear norm.

Table 1 Rademacher complexities for convex norm regularized completion models for a K-mode tensor \({\mathcal {T}} \in {\mathbb {R}}^{n \times \cdots \times n}\) with multilinear rank \((r_1,\ldots ,r_K)\). \(\gamma _{1}({\mathcal {X}})\) is the largest singular value of \({\mathcal {X}}\), \(D^{(g)}=(D^{(g)}_1,\ldots ,D^{(g)}_{m_{g}}),\;g=1,\ldots ,G\), are the G reshaping sets, and c, \(\varLambda\), and \(B_{{\mathcal {T}}}\) are constants

Theorem 1

Consider a K-mode tensor \({\mathcal {W}} \in {\mathbb {R}}^{n_1 \times n_2 \times \cdots \times n_K}\). Let us consider any M reshaping sets \((D_{1},\ldots ,D_{M})\) with a hypothesis class of \(\textsf {W} = \{{\mathcal {W}}| \Vert {\mathcal {W}}_{(D_{1},\ldots ,D_{M})}\Vert _{*} \le t \}\). Suppose that for all \((i_1,\ldots ,i_K)\), \(l({\mathcal {X}}_{i_1,\ldots ,i_K}, \cdot )\) is \(\varLambda\)-Lipschitz continuous. Then,

(a) given that \({\mathcal {W}}\) has a multilinear rank of \((r_1,\ldots ,r_K)\), we obtain

$$\begin{aligned}R_{\mathrm {S}}(l \circ {\mathcal {W}}) &\le \frac{c\varLambda }{|\mathrm {S}|}\bigg ( \frac{\prod _{k=1}^{K}r_{k}}{\max _{j = 1,\ldots ,M} \prod _{i \in D_{j}} r_{i}} \bigg ) \gamma _1({\mathcal {W}}_{(D_1,\ldots ,D_{M})})\\&\quad \log (4M)\sum _{j = 1}^{M} \sqrt{ \prod _{p \in D_{j}} n_{p}}, \end{aligned}$$

(b) given that \({\mathcal {W}}\) has a CP rank of \(r_{cp}\), we obtain

$$\begin{aligned} R_{\mathrm {S}}(l \circ {\mathcal {W}}) \le \frac{c\varLambda }{|\mathrm {S}|}r_{cp} \gamma _1({\mathcal {W}}_{(D_1,\ldots ,D_{M})})\log (4M)\sum _{j = 1}^{M} \sqrt{ \prod _{p \in D_{j}} n_{p}}, \end{aligned}$$

where c is a constant.

Using Theorem 1, we can obtain the Rademacher complexity for the tensor nuclear norm by considering \(|D_1| = |D_2| = \cdots = |D_K|=1\), and for the square norm by considering two reshaping sets \(D_1\) and \(D_2\) such that \(\prod _{p \in D_1}n_{p} \approx \prod _{q \in D_2} n_{q}\). We summarize the Rademacher complexities of convex low-rank tensor norms in Table 1 for a tensor with equal mode dimensions (\(n_1=n_2=\cdots =n_{K}=n\)).

From Table 1 and Theorem 1, we see that norms constructed using the tensor nuclear norm lead to better bounds than the overlapped trace norm, the latent trace norm, and the scaled latent trace norm. Further, we see that the mode based component of the Rademacher complexity takes its smallest value for the tensor nuclear norm \((\log (4K)\sqrt{Kn})\); indeed, for any reshaping set, \(\log (4K)\sqrt{Kn} \le \log (4M)\sum _{j = 1}^{M} \sqrt{ n^{|D_j|}}\). This observation might lead us to conclude that the tensor nuclear norm is better than all the other norms. However, for a multilinear rank such that \(1 < r_1 \le r_2 \le \cdots \le r_K\), we can always find \(M < K\) reshaping sets \(D_1, D_2, \ldots , D_M\) such that \(\frac{\prod _{k=1}^{K}r_{k}}{\max _{j = 1,\ldots ,K} r_{j}} \ge \frac{\prod _{k=1}^{K}r_{k}}{\max _{j = 1,\ldots ,M} \prod _{i \in D_{j}} r_{i}}\). In other words, we can reshape the tensor such that the Rademacher complexity for the reshaped tensor nuclear norm is bounded with a smaller rank based component than that of the tensor nuclear norm.
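To make this comparison concrete, the following sketch evaluates the rank based and mode based components of the bound in Theorem 1(a) for a few candidate reshapings of a tensor with mode dimensions (10, 10, 40, 40) and multilinear rank (9, 9, 3, 3), the setting of the first simulation in Sect. 6.1; constants, the Lipschitz factor, and the spectral norm factor are omitted, and the helper is ours.

```python
import numpy as np

def theorem1_components(n, r, groups):
    """Rank based and mode based components of the Theorem 1(a) bound for a
    reshaping into the mode groups `groups` (lists of mode indices).
    Constants, the Lipschitz factor, and the spectral norm are omitted."""
    M = len(groups)
    rank_comp = np.prod(r) / max(np.prod([r[i] for i in g]) for g in groups)
    mode_comp = np.log(4 * M) * sum(np.sqrt(np.prod([n[i] for i in g])) for g in groups)
    return rank_comp, mode_comp

# Mode dimensions and multilinear rank of the first simulated tensor in Sect. 6.1.
n, r = (10, 10, 40, 40), (9, 9, 3, 3)
for groups in ([[0], [1], [2], [3]],        # no reshaping (tensor nuclear norm)
               [[0, 2], [1, 3]],            # balanced matrix (square norm)
               [[0, 1], [2], [3]]):         # high-rank modes merged
    print(groups, theorem1_components(n, r, groups))
```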

It is not known how reshaping a tensor changes the CP rank of the original tensor into the rank of the reshaped tensor, except that \(\mathrm {Rank}({\mathcal {X}}_{(D_{1},\ldots ,D_{M})}) \le r_{cp}\) (Mu et al. 2014; Lemma 4 in the “Appendix”). However, Theorem 1 shows that reshaping results in a mode based component of \(\log (4M)\sum _{j = 1}^{M} \sqrt{ n^{|D_j|}}\) in the Rademacher complexity, which indicates that selecting a reshaping set that gives a lower mode based component can lead to a lower generalization bound compared to the square norm or the tensor nuclear norm. Furthermore, it is clear that reshaping a tensor and regularizing with the tensor nuclear norm lead to a lower generalization bound compared to multilinear rank based norms such as the overlapped trace norm, the latent trace norm, and the scaled latent trace norm, as well as the tensor train rank based Schatten TT norm.

The next theorem provides the Rademacher complexity for completion models regularized by the reshaped latent tensor nuclear norm.

Theorem 2

Let us consider a K-mode tensor \({\mathcal {W}} \in {\mathbb {R}}^{n_1 \times \cdots \times n_K}\) and a collection of G reshaping sets \(D_{\mathrm {L}}=(D^{(1)},\ldots ,D^{(G)})\), where each \(D^{(s)} = (D^{(s)}_1,\ldots ,D^{(s)}_{M_{s}}),\;s=1,\ldots ,G\), consists of a reshaping set for an \(M_{s}\)-mode reshaped tensor. Consider the hypothesis class \(\textsf {W}_{\textsf {rl}} = \{ {\mathcal {W}} | {\mathcal {W}}^{(1)} + \cdots + {\mathcal {W}}^{(G)} = {\mathcal {W}}, \Vert {\mathcal {W}}\Vert _{\mathrm {r\_latent}(D_{\mathrm {L}})} \le t \}\) for a given t. Suppose that for all \((i_1,\ldots ,i_K)\), \(l({\mathcal {X}}_{i_1,\ldots ,i_K}, \cdot )\) is \(\varLambda\)-Lipschitz continuous. Then,

(a) when \({\mathcal {W}}\) has a multilinear rank of \((r_{1},\ldots ,r_{K})\), we obtain

$$\begin{aligned}R_{\mathrm {S}}(l \circ {\mathcal {W}}) &\le \frac{c\varLambda }{|\mathrm {S}|}\min _{g \in G} \bigg ( \frac{\prod _{k=1}^{K}r_{k}}{\max _{j = 1,\ldots ,M_{g}} \prod _{i \in D^{(g)}_{j}} r_{i}} \bigg ) \gamma _1\left({\mathcal {W}}^{(g)}_{(D^{(g)}_{1},\ldots ,D^{(g)}_{M_{g}})}\right)\\&\quad \max _{g \in G} \log (4M_g)\sum _{j = 1}^{M_{g}}\sqrt{ \prod _{p \in D^{(g)}_{j}} n_{p} }. \end{aligned}$$

(b) when \({\mathcal {W}}\) has a CP rank of \(r_{cp}\), we obtain

$$\begin{aligned}R_{\mathrm {S}}(l \circ {\mathcal {W}}) &\le \frac{c\varLambda }{|\mathrm {S}|}r_{cp}\min _{g} \gamma _1\left({\mathcal {W}}^{(g)}_{(D^{(g)}_{1},\ldots ,D^{(g)}_{M_{g}})}\right)\\&\quad \max _{g \in G} \log (4M_g) \sum _{j = 1}^{M_{g}}\sqrt{ \prod _{p \in D^{(g)}_{j}} n_{p} }. \end{aligned}$$

where c is a constant.

Theorem 2 shows that the reshaped latent tensor nuclear norm bounds the Rademacher complexity by the largest mode based component that results from all the reshaping sets. Further, with the multilinear rank of the tensor, the Rademacher complexity is bounded by the smallest rank based component that results from all the reshaping sets. This observation indicates that properly selecting a collection of reshaping sets for the reshaped latent tensor nuclear norm can lead to a lower generalization bound.

We want to point out that the largest singular values (\(\gamma _1(\cdot )\)) appearing in Theorems 1 and 2 could be further upper bounded by taking the largest singular value over all possible reshaping sets of the tensor. However, we do not apply such a bound, in order to keep the stated Rademacher complexities tight.

4.1 Optimal reshaping strategies

Given some understanding of the ranks of the tensor, Theorem 1 can be used to select a reshaping set such that the reshaped tensor has a smaller rank and relatively small mode dimensions. However, since we do not know the ranks in advance, selecting a reshaping set such that the reshaped tensor does not have large mode dimensions would lead to a better performance.

To avoid the difficulty of choosing a single reshaping set, we can use the reshaped latent tensor nuclear norm with several reshaping sets that agree with our observations from Theorem 1. However, since the Rademacher complexity is bounded by the largest mode based component, as shown in Theorem 2, it is important not to select reshaping sets that result in a tensor with large mode dimensions. A general strategy is to include the original tensor and add reshaping sets that do not result in mode dimensions much larger than those of the original tensor; a simple enumeration of such sets is sketched below.
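One simple way to instantiate this strategy is the following sketch: it enumerates reshaping sets that merge at most one pair of modes, keeps only those whose merged dimension stays below a user-chosen threshold, and always includes the unreshaped tensor. The threshold max_merged_dim and the helper name are our own illustrative choices.

```python
from itertools import combinations

def candidate_reshapings(dims, max_merged_dim):
    """Enumerate reshaping sets (as lists of mode-index groups) that merge at most
    one pair of modes and whose merged dimension does not exceed max_merged_dim.
    The unreshaped tensor is always included."""
    K = len(dims)
    sets = [[[k] for k in range(K)]]                       # the original tensor
    for a, b in combinations(range(K), 2):
        if dims[a] * dims[b] <= max_merged_dim:
            groups = [[k] for k in range(K) if k not in (a, b)] + [[a, b]]
            sets.append(groups)
    return sets

print(candidate_reshapings((10, 10, 10, 10, 10), max_merged_dim=100))
```

For a \(10 \times 10 \times 10 \times 10 \times 10\) tensor with max_merged_dim = 100, this produces the ten two-mode groupings used in Sect. 6.1 together with the original tensor.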

5 Optimization procedures

It has been shown that computing the tensor nuclear norm is NP-hard (Hillar and Lim 2013), which makes solving problems (3) and (4) exactly computationally difficult. In Yang et al. (2015), an approximation method has been proposed to compute the spectral norm by computing the largest singular vectors with respect to each mode, which is combined with the Frank-Wolfe optimization method to solve (3). We adopt their approximation method to solve our proposed completion models with reshaped tensors, (3) and (4). We found that solutions obtained with the approximation method agree with our theoretical results on generalization bounds given in Sect. 4. However, there is no theoretical analysis available to understand how well the approximation method performs compared to an exact solution.

The optimization method proposed in Yang et al. (2015) approximates the spectral norm using a recursive algorithm based on singular value decompositions with respect to each mode. We adopt a simpler approach, given in Algorithm 1, which we believe is easier to implement. Using this approximation, we provide an optimization procedure to solve the completion model regularized by a single reshaped norm in Algorithm 2. The procedure in Algorithm 2 is similar to the Frank-Wolfe based optimization procedure proposed in Yang et al. (2015); the additions in Algorithm 2 are the computation of the spectral norm of the reshaped tensor in step 7 and the conversion of the reshaped tensor back to the original dimensions in step 12. Here, we recall Definition 1 for the reshaping operator \(\varPi _{(D_{1},\ldots ,D_{M})}()\) and its inverse \(\varPi _{(D_{1},\ldots ,D_{M})}^{\top }()\) for any given reshaping set \((D_1,D_2,\ldots ,D_M)\).

[Algorithm 1: Approximate computation of the tensor spectral norm]
[Algorithm 2: Frank-Wolfe optimization for the completion model (3) with the reshaped tensor nuclear norm]
[Algorithm 3: Frank-Wolfe optimization for the completion model (4) with the reshaped latent tensor nuclear norm]
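To convey the flavor of the spectral norm approximation, the sketch below is our own simplified stand-in rather than a faithful transcription of Algorithm 1: it estimates the tensor spectral norm and the corresponding rank-1 factors by alternating power iterations over the modes, and it is not guaranteed to reach the global optimum. In a Frank-Wolfe iteration, such rank-1 factors of the reshaped gradient, scaled by the constraint radius \(\lambda\), would provide the update direction.

```python
import numpy as np

def approx_spectral_norm(T, n_iter=50, seed=0):
    """Estimate the tensor spectral norm and rank-1 factors of T by alternating
    power iterations (a heuristic; not guaranteed to reach the global optimum)."""
    rng = np.random.default_rng(seed)
    K = T.ndim
    us = [rng.standard_normal(n) for n in T.shape]
    us = [u / np.linalg.norm(u) for u in us]
    for _ in range(n_iter):
        for k in range(K):
            v = T
            for j in range(K - 1, -1, -1):   # contract every mode except k
                if j != k:
                    v = np.tensordot(v, us[j], axes=([j], [0]))
            us[k] = v / np.linalg.norm(v)
    val = T
    for j in range(K - 1, -1, -1):           # multilinear form at the unit vectors
        val = np.tensordot(val, us[j], axes=([j], [0]))
    return float(abs(val)), us

# Example: spectral norm estimate of a random reshaped tensor.
X = np.random.randn(100, 40, 40)
sigma, factors = approx_spectral_norm(X)
print(sigma, [u.shape for u in factors])
```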

Next, we give an algorithm to solve the completion model regularized by the reshaped latent tensor nuclear norm. The Frank-Wolfe optimization method has also been applied to efficiently solve learning models regularized by latent trace norms (Guo et al. 2017). We follow their approach to design a Frank-Wolfe method for the reshaped latent tensor nuclear norm; Algorithm 3 shows the optimization steps. From Lemma 1, we know that at each iteration t of the Frank-Wolfe procedure we need to find the reshaping with the largest spectral norm; this is done in lines 7–11 of Algorithm 3.

6 Experiments

In this section, we give simulation and real-data experiments.

6.1 Simulation experiments

We created simulation experiments for tensor completion using tensors with fixed multilinear ranks and CP ranks. We create a K-mode tensor with multilinear rank \((r_{1},\ldots ,r_{K})\) by generating a tensor \({\mathcal {T}} \in {\mathbb {R}}^{n_{1} \times \cdots \times n_{K} }\) using the Tucker decomposition (Kolda and Bader 2009) as \({\mathcal {T}} = {\mathcal {C}} \times _{1} U_{1} \times _{2} U_{2} \times _{3} \cdots \times _{K} U_{K}\), where \({\mathcal {C}} \in {\mathbb {R}}^{r_{1} \times \cdots \times r_{K} }\) is a core tensor whose elements are sampled from a normal distribution, specifying the multilinear rank \((r_{1},\ldots ,r_{K})\), and \(U_{k} \in {\mathbb {R}}^{n_{k} \times r_{k}},\;k=1,\ldots ,K\), are component matrices with orthonormal columns. We create a tensor with CP rank r using the CP decomposition (Kolda and Bader 2009) as \({\mathcal {T}} = \sum _{i=1}^{r} c_{i} u_{1i} \otimes u_{2i} \otimes \cdots \otimes u_{Ki}\), where \(c_{i} \in {\mathbb {R}}^{+}\) and \(u_{ki} \in {\mathbb {R}}^{n_{k}},\;k=1,\ldots ,K,\;i=1,\ldots ,r\), are sampled from a normal distribution and normalized such that \(\Vert u_{ki}\Vert _{2}^{2} = 1\). From the total number of elements in the generated tensors, we randomly selected 10, 40, and 70 percent as training sets; from the remaining elements we selected 10 percent as the validation set, and the rest were taken as test data. We conducted 3 simulations for each randomly generated tensor.
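The following sketch shows one way to generate such test tensors; it is an illustration of the construction described above, with function names and seeding of our own choosing.

```python
import numpy as np

def tucker_tensor(dims, ranks, seed=0):
    """Tensor with (at most) the given multilinear rank: a random Gaussian core
    multiplied along each mode by a factor matrix with orthonormal columns."""
    rng = np.random.default_rng(seed)
    T = rng.standard_normal(ranks)
    for k, (n, r) in enumerate(zip(dims, ranks)):
        U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # n x r, orthonormal columns
        T = np.tensordot(T, U, axes=([k], [1]))            # contract mode k with U
        T = np.moveaxis(T, -1, k)                          # put the new mode back at k
    return T

def cp_tensor(dims, rank, seed=0):
    """Tensor with (at most) the given CP rank: a sum of scaled rank-1 terms
    built from normalized Gaussian vectors."""
    rng = np.random.default_rng(seed)
    T = np.zeros(dims)
    for _ in range(rank):
        factors = [rng.standard_normal(n) for n in dims]
        factors = [u / np.linalg.norm(u) for u in factors]
        term = factors[0]
        for u in factors[1:]:
            term = np.multiply.outer(term, u)
        T += abs(rng.standard_normal()) * term
    return T

T1 = tucker_tensor((10, 10, 40, 40), (9, 9, 3, 3))   # as in the first simulation
T2 = cp_tensor((10,) * 5, 3)                          # as in the second simulation
print(T1.shape, T2.shape)
```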

For all simulation experiments, we tested completion using our proposed completion models: (3) with the reshaped tensor nuclear norm (abbreviated RTNN) and (4) with the reshaped latent tensor nuclear norm (abbreviated RLTNN). Additionally, we performed completion using the tensor nuclear norm (TNN) without reshaping and the square norm (SN). As further baseline methods, we used tensor completion regularized by the overlapped trace norm (OTN), the scaled latent trace norm (SLTN), and the Schatten TT norm (Imaizumi et al. 2017) (STTN). As the performance measure of completion, we calculated the mean squared error (MSE) on the validation and test data. For all completion models, we cross-validated regularization parameters over powers \(2^{x}\), with x ranging from \(-5\) to 15 in steps of 0.25.

Fig. 1 Performances of completion of the tensors: (a) tensor \({\mathcal {T}}_1 \in {\mathbb {R}}^{10 \times 10 \times 40 \times 40}\) with multilinear rank (9, 9, 3, 3) and (b) tensor \({\mathcal {T}}_2 \in {\mathbb {R}}^{10 \times 10 \times 10 \times 10 \times 10 }\) with CP rank 3

Fig. 2 Performances of completion of the tensors: (a) tensor \({\mathcal {T}}_3 \in {\mathbb {R}}^{10 \times 10 \times 40 \times 40}\) with multilinear rank (3, 3, 35, 35) and (b) tensor \({\mathcal {T}}_4 \in {\mathbb {R}}^{10 \times 10 \times 10 \times 10 \times 10}\) with CP rank 243

For our first simulation experiment, we created a 4-way tensor \({\mathcal {T}}_{1} \in {\mathbb {R}}^{n_{1} \times n_{2} \times n_{3} \times n_{4}}\) with \(n_{1} = n_{2} = 10, n_{3} =n_{4} = 40\) and a multilinear rank of \((r_1,r_2,r_3,r_4) = (9,9,3,3)\). Following Mu et al. (2014), we can reshape \({\mathcal {T}}_1\) using the reshaping set \((D_1,D_2) = ((n_1,n_3),(n_2,n_4))\), which creates a square matrix for the square norm. From Theorem 1, we see that the rank based components in the Rademacher complexity for the nuclear norm and the square norm are \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,\ldots ,4}r_{j}) = 81\) and \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,2}\prod _{i \in D_{j}} r_{i}) = 27\), respectively. Further, Theorem 1 shows that the mode based components for the nuclear norm and the square norm are \(\log (4\cdot 4)(\sum _{k=1}^{4}\sqrt{n_k}) \approx 53\) and \(\log (4\cdot 2)(\sqrt{n_1n_3} + \sqrt{n_2n_4}) \approx 83\), respectively. This leads to a lower generalization bound for the nuclear norm compared to the square norm, justifying its better performance shown in Fig. 1a. However, Theorem 1 indicates that the lowest generalization bound is obtained by using the reshaping set \((D'_1,D'_2, D'_3) = ((n_1,n_2),n_3,n_4)\), which combines the high ranked modes (modes 1 and 2), resulting in a rank based component of \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,2,3}\prod _{i \in D'_{j}} r_{i}) = 9\) and a mode based component of \(\log (4\cdot 3)(\sqrt{n_1n_2} + \sqrt{n_3} + \sqrt{n_4}) \approx 56\). Figure 1a agrees with our theoretical analysis, showing that our proposed reshaped tensor nuclear norm obtains the best performance compared to the other norms. For the reshaped latent tensor nuclear norm, we combined the two reshaping sets \(((n_1,n_2,n_3,n_4), ((n_1,n_2),n_3,n_4))\). Applying Theorem 2, we see that this combination leads to a lower Rademacher complexity. However, this combination only gave a performance comparable to the reshaped tensor nuclear norm.

For our second simulation, we created a 5-mode tensor \({\mathcal {T}}_{2} \in {\mathbb {R}}^{n_1 \times n_2 \times \cdots \times n_5}\), where \(n_1 = n_2 = \cdots =n_5=10\), with a CP rank of 3. From Theorem 1, we know that we can only consider the mode based component of the Rademacher complexity to obtain a lower generalization bound. For the square norm we can use a reshaping set such as \((D_1,D_2) =((n_1,n_2),(n_3,n_4,n_5))\), which results in a mode based component of \(\log (4\cdot 2)(\sqrt{n_1n_2} + \sqrt{n_3n_4n_5}) \approx 86\). The tensor nuclear norm leads to a mode based component of \(\log (4\cdot 5)(\sum _{k=1}^{5}\sqrt{n_k}) \approx 47\). As an alternative reshaping, we propose to combine any two modes to create a reshaping set such as \((D'_1,D'_2,D'_3,D'_4) =(n_1,n_2,n_3,(n_4,n_5))\) for the reshaped tensor nuclear norm, which leads to a mode based component of \(\log (4\cdot 4)(\sqrt{n_1} + \sqrt{n_2} + \sqrt{n_3} + \sqrt{n_4n_5}) \approx 54\). Comparing the Rademacher complexities using the mode based components, we see that the lowest generalization bound is given by the tensor nuclear norm. Figure 1b shows that this theoretical observation is accurate, since the tensor nuclear norm gives the best performance compared to the other two reshaped norms. For the reshaped latent tensor nuclear norm we used all 10 two-mode combinations, which resulted in reshaping sets \(D =(((n_1,n_2),n_3,n_4,n_5),(n_1,(n_2,n_3),n_4,n_5),\ldots ,\) \((n_1,n_2,n_3,(n_4,n_5)))\). Figure 1b shows that the reshaped latent tensor nuclear norm outperformed the tensor nuclear norm.

The next simulation focuses on a different multilinear rank for the 4-way tensor \({\mathcal {T}}_3 \in {\mathbb {R}}^{n_{1} \times n_{2} \times n_{3} \times n_{4} }\) with \(n_{1} = n_{2} = 10, n_{3} =n_{4} = 40\). Figure 2a shows the simulation experiment with multilinear rank (3, 3, 35, 35). Again following Mu et al. (2014), we can reshape \({\mathcal {T}}_3\) using a reshaping set of \((D_1,D_2) = ((n_1,n_3),(n_2,n_4))\) or \((D_1,D_2) = ((n_1,n_4),(n_2,n_3))\) to create a square matrix for the square norm. From Theorem 1, we observe that the square norm results in a rank based component of \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,2}\prod _{i \in D_{j}} r_{i}) = 105\) and a mode based component of \(\log (4\cdot 2)(\sqrt{n_1n_3} + \sqrt{n_2n_4}) \approx 63\). However, if we combine the high ranked modes 3 and 4 to create the reshaping set \((D'_1,D'_2, D'_3) = (n_1,n_2,(n_3,n_4))\) for the reshaped tensor nuclear norm, then the rank based component decreases to \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,2,3}\prod _{i \in D'_{j}} r_{i}) = 9\) and the mode based component decreases to \(\log (4\cdot 2)(\sqrt{n_1} \sqrt{n_3} + \sqrt{n_2n_4}) \approx 55\). Furthermore, the tensor nuclear norm leads to a rank based component of \(\prod _{k=1}^{K}r_{k} / (\max _{j = 1,\ldots ,4}r_{j}) = 315\) and a mode based component of \(\log (4\cdot 4)(\sum _{k=1}^{4}\sqrt{n_k}) \approx 53\), resulting in a larger generalization bound compared to the proposed reshaping set \((D'_1,D'_2, D'_3) = (n_1,n_2,(n_3,n_4))\). This analysis is confirmed by the experimental results in Fig. 2a, where the reshaped tensor nuclear norm gives the best performance. Using Theorem 2, we find that if we use the reshaping sets \(((n_1,n_2,n_3,n_4),((n_1,n_2),n_3,n_4))\) for the reshaped latent tensor nuclear norm, the Rademacher complexity will be bounded by the smaller rank based component from the reshaping set \((n_1,n_2,n_3,n_4)\) and the mode based component from the reshaping set \(((n_1,n_2),n_3,n_4)\). However, the reshaped latent tensor nuclear norm was not able to perform better than the tensor nuclear norm or the proposed reshaped norm with \((D'_1,D'_2, D'_3) = (n_1,n_2,(n_3,n_4))\).

The final simulation result, shown in Fig. 2b, is for a tensor \({\mathcal {T}}_4 \in {\mathbb {R}}^{10 \times 10 \times 10 \times 10 \times 10}\) with a CP rank of 243. For this experiment, we used the same reshaping strategies as in the previous CP rank experiment. We see that when the fraction of training samples is less than 40 percent, the tensor nuclear norm gives the best performance. When the fraction of training samples increases beyond 40 percent, the reshaped latent tensor nuclear norm outperforms the tensor nuclear norm.

6.2 Multi-view video completion

We performed completion on multi-view video data using the EPFL Multi-camera Pedestrian Videos data set (Berclaz et al. 2011). Videos in this data set capture four people sequentially entering a room and walking around, recorded from 4 views by 4 synchronized cameras. We down-sampled each video frame to a height of 96 and a width of 120 pixels to obtain an RGB color image of dimensions \(96 \times 120 \times 3\). We sequentially selected 391 frames from each video. Combining all the video frames from all views resulted in a tensor of dimensions \(96 \times 120 \times 3 \times 391 \times 4\) (height × width × color × frames × views).

To evaluate completion, we randomly removed entries from the multi-view tensor and performed completion using the remaining elements. We randomly selected 2, 4, 8, 16, 32, and 64 percent of the total number of elements in the tensor as training elements. As our validation set we selected 10 percent of the total number of elements, and the rest of the remaining elements were taken as the test set. For the square norm, we considered the reshaping set \(\mathrm {((height,width),(color,frames,views))}\). For the reshaped tensor nuclear norm, we experimentally found that the reshaping set \(\mathrm {((height,views),(width,color),(frames))}\) gives the best performance. To create the collection of reshaping sets for the reshaped latent tensor nuclear norm, we combined the reshaping sets of the square norm and the reshaped tensor nuclear norm with the unreshaped original tensor. The resulting collection was \(D= ((\mathrm {height,width,color,frames,views}),\) \(\mathrm {((height,width),}\) \(\mathrm {(color,frames,views))},\) \(\mathrm {((height,views),}\mathrm {(width,color),(frames))})\). We cross-validated all the completion models with regularization parameters chosen from \(10^{-1},10^{-0.75},10^{-0.5},\ldots ,10^{7}\).

Fig. 3 Tensor completion of the multi-view video tensor

Figure 3 shows that when the training set is small (i.e., the observed tensor is sparse), the reshaped tensor nuclear norm and the tensor nuclear norm give good performance compared to the square norm. When the percentage of observed elements increases beyond 16 percent, the square norm outperforms the other norms. However, the reshaped latent tensor nuclear norm is adaptive to all fractions of training samples and gives the overall best performance.

7 Conclusions

In this paper, we generalize tensor reshaping for low-rank tensor regularization and introduce the reshaped tensor nuclear norm and the reshaped latent tensor nuclear norm. We propose tensor completion models regularized by the proposed norms. Using a generalization bound analysis of the proposed completion models, we show that the proposed norms lead to smaller Rademacher complexity bounds compared to existing norms. Further, using our theoretical analysis, we discuss optimal conditions for creating reshaped tensor nuclear norms. Simulation and real-data experiments confirm our theoretical analysis.

Our research opens up several future research directions. The most important one is developing theoretically guaranteed optimization methods for completion models regularized by the proposed tensor nuclear norms. Though the approximation method we adopted from Yang et al. (2015) for computing the tensor spectral norm within the Frank-Wolfe procedure gives performance that agrees with our generalization bounds, its approximation error is not known. We believe that future theoretical investigations are needed to understand the qualitative properties of the proposed optimization procedures. Furthermore, optimization methods for nuclear norms that scale to large higher order tensors would be an important future research direction. Another important direction is to further explore the theoretical foundation of tensor completion using the reshaped tensor nuclear norm. In this regard, recovery bounds (Yuan and Zhang 2016) would provide stronger bounds on sample complexities for the proposed method.