Computation of low-rank tensor approximation under existence constraint via a forward-backward algorithm
Introduction
In numerous applications, signals or data may depend on several quantities such as spatial coordinates, velocity, time, frequency, temperature, etc., and are therefore naturally represented by higher-order arrays of numerical values, generally known as tensors. Basically, a tensor is a mathematical object that enjoys multilinearity properties under changes of the coordinate system [1]. For our purposes, it will be sufficient to view a tensor of order N as a multi-dimensional array in which every element is accessed via N indices. For instance, a first-order tensor is a vector, which is simply a column of numbers; a second-order tensor is a matrix; a third-order tensor appears as numbers arranged in a rectangular box (or a cube, if all three modes have the same dimension); and so on. Tensors of order higher than two possess properties that are not enjoyed by matrices and vectors.
We shall mainly focus on the so-called Canonical Polyadic (CP) decomposition of tensors [2]. Because of its rediscovery forty years later, it received other names such as Parafac [3], [4] or Candecomp [5]. The acronym "CP" can smartly stand for either "Canonical Polyadic" or "Candecomp/Parafac", as pointed out in Kiers [6], Comon et al. [7]. We shall adopt this terminology. The CP decomposition has already been used in various fields [8], [9], [10], [11], [12], [13]. The main interest in using the CP decomposition lies in its uniqueness, under rather mild hypotheses [14], [15]. Other tensor decompositions exist, but permit only compression and not parameter identification since they are not unique [16], [17]. Various CP decomposition algorithms can encounter problems of slowness or sometimes lack of convergence; such cases may be due to tensor degeneracies. These situations have been well categorized by Richard Harshman [18] into the following three cases: bottleneck, when two or more factors in one of the modes are almost collinear [19]; swamp, when all modes have at least two quasi-collinear factors [19], [20], [21], a generalization of the bottleneck; and CP-degeneracy, when some of the factors diverge to infinity while tending to cancel each other, producing a better data fit [22], [23].
The traditional algorithm to compute the CP decomposition is ALS (Alternating Least Squares), which was initially proposed in Carroll and Chang [5]. Basically, ALS is an iterative algorithm that progressively updates each unknown parameter individually, in an alternating fashion, starting from an initial guess. ALS continues until it can no longer improve the solution, or until it reaches the allowed maximum number of iterations. As a result, the system to be solved is turned into three simple least squares (LS) problems. The ALS algorithm converges towards a local minimum under mild conditions [24]. However, its convergence towards the global minimum can sometimes be slow, if it is ever reached. Furthermore, the algorithm may, in certain cases, fall into swamps [20], where the convergence rate is very low and the error between two successive iterations remains almost unchanged. Various solutions have been proposed to address the slow convergence of the ALS algorithm, for instance [11], [25], [26]. The idea is to produce a good initialization via a Generalized Eigenvalue Decomposition (GEVD) of two tensor slices. But such an initialization assumes that two of the factor matrices are of full rank, and that the third one does not contain zero elements. Another way to increase the ALS convergence speed is to compress the tensor via a Tucker3 decomposition [27], [28]. In [11], Tucker3 compression is applied together with an initialization based on proper analysis; this process has become a common practice to reduce the computational burden [7]. Other tricks to speed up convergence include the Enhanced Line Search (ELS) method [19], which has demonstrated its effectiveness for tensors affected by degenerate factors (factor degeneracies), or a reduction of ALS sensitivity to initialization [29]. The above algorithms enhance the convergence speed of the ALS algorithm but are still unable to handle swamp phenomena.
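The three LS subproblems mentioned above can be sketched as follows in NumPy. This is an illustration of the generic ALS scheme for a third-order tensor, not the implementation evaluated in this paper; the unfolding conventions, stopping rule and tolerance are assumptions made for the sketch.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker (Khatri-Rao) product of U (J x R) and V (K x R)."""
    return np.einsum("jr,kr->jkr", U, V).reshape(-1, U.shape[1])

def als_cp(T, R, n_iter=500, tol=1e-8, seed=0):
    """Alternating Least Squares for a rank-R CP model of a third-order tensor T."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    T1 = T.reshape(I, -1)                     # mode-1 unfolding (I x JK)
    T2 = np.moveaxis(T, 1, 0).reshape(J, -1)  # mode-2 unfolding (J x IK)
    T3 = np.moveaxis(T, 2, 0).reshape(K, -1)  # mode-3 unfolding (K x IJ)
    err_prev = np.inf
    for _ in range(n_iter):
        # each update is a plain linear least-squares problem in one factor
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
        err = np.linalg.norm(T1 - A @ khatri_rao(B, C).T)
        if abs(err_prev - err) < tol:         # stop when no more progress is made
            break
        err_prev = err
    return A, B, C, err
```

In a swamp, the loop above exits on the progress test while the residual `err` is still large, which is precisely the behavior the constrained and proximal variants discussed below try to avoid.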
Besides, a partitioned ALS algorithm was proposed in Tichavskỳ et al. [30], Phan et al. [31], in which the factor matrices are partitioned into blocks, and blocks of different factor matrices are updated jointly. In contrast to alternating algorithms, all-at-once optimization algorithms [7], [32], [33], [34] allow simultaneous estimation of the matrix factors. These algorithms have shown good performance in dealing with swamp or bottleneck phenomena when certain required conditions are satisfied.
Recent studies [12], [34], [35] have shown that the introduction of coherence constraints overcomes these phenomena. The solution proposed in Farias et al. [35] is a direct adaptation of ALS using the Dykstra projection algorithm over all correlation matrices. The proposals [12], [34], which estimate the factor matrices simultaneously, have proved efficient and stable for computing the CP decomposition not only in the normal case, but also in difficult cases, where the estimated factors are close to collinear.
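The coherence of a factor matrix, the quantity these constraints bound, is the largest absolute normalized inner product between distinct columns. A minimal NumPy sketch of this standard definition (an illustration, not code from [12], [34], [35]):

```python
import numpy as np

def coherence(M):
    """Largest |<m_i, m_j>| / (||m_i|| ||m_j||) over distinct columns i != j of M."""
    G = M.conj().T @ M                  # Gram matrix of the columns
    n = np.linalg.norm(M, axis=0)
    C = np.abs(G) / np.outer(n, n)      # normalized column correlations
    np.fill_diagonal(C, 0.0)            # discard the trivial i == j terms
    return C.max()
```

Orthonormal columns give coherence 0, duplicated columns give coherence 1, and a swamp corresponds to factor matrices whose coherence approaches 1 in every mode.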
This document presents a performance evaluation of iterative descent algorithms for computing the CP decomposition: the unconstrained gradient descent algorithm [10], the constrained gradient descent algorithm [12], the proximal gradient descent algorithm [34] and the accelerated proximal gradient descent algorithm [34]. We also propose a new algorithm that links the logarithmic barrier function with proximal methods [36], which are now reliable and powerful optimization tools, leading to a variety of proximal algorithms, such as the Proximal Point algorithm, the Proximal Gradient algorithm, the Forward-Backward algorithm, and several others involving linearization and/or splitting [37]. Since the CP decomposition problem in its simultaneous-estimation version is a non-convex problem [38], it is necessary to develop efficient and fast algorithms for it. Among non-convex splitting methods, one can find the works [39], [40], where a monotone descent of the objective value is imposed to ensure convergence, while some recent works [41], [42] introduce a generic method for non-convex non-smooth problems based on Kurdyka-Łojasiewicz theory. We shall be particularly interested in the Forward-Backward proximal algorithm, following the convergence proof of [40], as it satisfies the assumptions of our novel CP decomposition formulation.
The paper is organized as follows. Section 2 states notation and basic properties of tensors. In Section 3, the formulation and conditions of CP approximation under a coherence constraint are explained. In Section 4 we present the novel formulation of the CP decomposition as well as our new optimization algorithm. In Section 5 we report computer experiments, and we conclude in Section 6.
Section snippets
Preliminaries
First, we start by introducing some basic notation that will be used in this article. Scalars are denoted by lower-case letters, e.g., a; vectors are denoted by lower-case bold letters, e.g., a; matrices are denoted by upper-case bold letters, e.g., M; and tensors are denoted by calligraphic letters, e.g., T. Also, the conjugate of a scalar a will be denoted a*, and its modulus |a|. The conjugate of the matrix M will be denoted M*, its transpose M^T and its Hermitian transpose M^H.
Low rank
The problem we want to solve is the following: given a third-order tensor, compute its best approximation of rank R. In other words, for a given rank R, the goal is to minimize an objective function of the form given in (8). As an alternative, we can also write the minimization of (8) using the vectorization defined in (3), where the symbol ⊗ represents the Kronecker product [49]. …
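As a small numerical sanity check of the vectorized viewpoint, the following NumPy snippet (an illustration under standard conventions, not the paper's code) builds a rank-R tensor from its factor matrices and verifies that its vectorization equals the column-wise Kronecker (Khatri-Rao) product of the factors applied to a vector of ones, i.e. with all scaling factors absorbed into the columns:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 3, 4, 5, 2
A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))

# rank-R tensor built from the factor matrices
T = np.einsum("ir,jr,kr->ijk", A, B, C)

# column-wise Kronecker (Khatri-Rao) product of the three factors
KR = np.einsum("ir,jr,kr->ijkr", A, B, C).reshape(-1, R)

# vectorized model: vec(T) = KR @ 1, so the LS objective can be written
# on the vectorized tensor exactly as on the tensor itself
assert np.allclose(T.reshape(-1), KR @ np.ones(R))
```

This identity is what lets the rank-R fitting problem be rewritten as an ordinary least-squares problem in the stacked unknowns.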
Problem formulation
For the sake of simplicity of writing, we will stack the factor matrices in a single vector of unknowns. Note that, in this article, we have chosen to define the penalty function as the logarithmic term of the coherence constraint. Formally, we face a constrained optimization problem of the form:
This optimization problem with inequality constraint can be solved by converting it to an unconstrained optimization problem with an appropriate penalty,
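The resulting penalized, unconstrained problem is then amenable to forward-backward (proximal gradient) splitting: a gradient step on the smooth term alternates with a proximal step on the remaining one. The following sketch illustrates the iteration on a generic toy splitting, an l1-penalized least-squares problem, not the paper's CP functional with its logarithmic barrier; step size and iteration count are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (the 'backward' step for this toy g)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(A, b, lam=0.1, step=None, n_iter=2000):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by forward-backward splitting."""
    if step is None:
        # 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                         # forward (explicit gradient) step
        x = soft_threshold(x - step * grad, step * lam)  # backward (proximal) step
    return x
```

Only the proximal operator changes from one penalty to another; with the logarithmic barrier of the coherence constraint, the same forward-backward template applies once the corresponding proximal (or gradient) step is available.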
Results and discussions
In this section, we compare various algorithms based on computer experiments performed under two scenarios: (i) the first one is a concrete example in which it is necessary to deal with the swamp phenomenon, i.e. where the factor matrices are highly correlated in all three modes and the constraint is not satisfied at convergence, and (ii) the second scenario addresses the normal case of the CP decomposition, where the limit point lies in the admissible region, i.e. when the
Conclusions
We have provided a novel formulation for the CP decomposition related to the logarithmic barrier function. We also described a new method to compute this decomposition for both normal and difficult cases, i.e. when the problem of swamp arises. This method is based on the Forward Backward splitting algorithm, with a simple and effective monitoring strategy capable of calculating the minimal CP decomposition of three-way arrays. We performed a complete comparison based on computer experiments,
CRediT authorship contribution statement
Marouane Nazih: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - original draft, Validation. Khalid Minaoui: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - review & editing, Validation. Elaheh Sobhani: Conceptualization, Methodology, Funding acquisition, Data curation, Formal analysis, Writing - original draft, Validation. Pierre Comon: Conceptualization, Methodology, Funding acquisition, Data
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
References (60)
- et al., Blind separation of sources, part I: an adaptive algorithm based on neuromimetic architecture, Signal Process. (1991)
- Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra Appl. (1977)
- et al., On Kruskal's uniqueness condition for the Candecomp/Parafac decomposition, Linear Algebra Appl. (2007)
- et al., Two-factor degeneracies and a stabilization of PARAFAC, Chemom. Intell. Lab. Syst. (1997)
- et al., Some convergence results on the regularized alternating least-squares method for tensor decomposition, Linear Algebra Appl. (2013)
- et al., Improving the speed of multi-way algorithms: Part I. Tucker3, Chemom. Intell. Lab. Syst. (1998)
- et al., Improving the speed of multiway algorithms: Part II. Compression, Chemom. Intell. Lab. Syst. (1998)
- et al., Low complexity damped Gauss–Newton algorithms for Candecomp/Parafac, SIAM J. Matrix Anal. Appl. (2013)
- et al., Coherence constrained alternating least squares, 2018 26th European Signal Processing Conference (EUSIPCO) (2018)
- et al., Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl. (2008)
- Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity
- A progression strategy of proximal algorithm for the unconstrained optimization, 2018 4th International Conference on Optimization and Applications (ICOA)
- Tensor decompositions: state of the art and applications, Keynote Address in IMA Conf. Mathematics in Signal Processing, Warwick, UK
- The expression of a tensor or a polyadic as a sum of products, J. Math. Phys.
- Determination and proof of minimum uniqueness conditions for PARAFAC1, UCLA Work. Pap. Phonetics
- Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart–Young" decomposition, Psychometrika
- Towards a standardized notation and terminology in multiway analysis, J. Chemom.
- Tensor decompositions, alternating least squares and other tales, J. Chemom.
- Fluorescence spectroscopy and multi-way techniques. PARAFAC, Anal. Methods
- CP decomposition approach to blind separation for DS-CDMA system using a new performance index, EURASIP J. Adv. Signal Process.
- Blind PARAFAC receivers for DS-CDMA systems, IEEE Trans. Signal Process.
- Joint source estimation and localization, IEEE Trans. Signal Process.
- Denoising of hyperspectral images using the PARAFAC model and statistical performance analysis, IEEE Trans. Geosci. Remote Sens.
- Some mathematical notes on three-mode factor analysis, Psychometrika
- Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl.
- How 3-MFA data can cause degenerate PARAFAC solutions, among other relationships, Multiway Anal.
- Enhanced line search: a novel method to accelerate PARAFAC, SIAM J. Matrix Anal. Appl.
- Slowly converging PARAFAC sequences: swamps and two-factor degeneracies, J. Chemom.
- The problem and nature of degenerate solutions or decompositions of 3-way arrays, Talk at the Tensor Decompositions Workshop, Palo Alto, CA
Cited by (1)
- Computation of the nonnegative canonical tensor decomposition with two accelerated proximal gradient algorithms, Digital Signal Processing: A Review Journal (2022). Citation excerpt: "To the best of our knowledge, proximal algorithms have not yet been provided in the literature to simultaneously estimate factor matrices of the NCP. This all-at-once computation was considered for the CP decomposition in [29,56] but under a coherence constraint. One such approach has also been used under the nonnegative constraint in [51] for which the minimization was performed using the Enhanced Line Search (ELS) and the alternating backtracking with ELS for the gradient and quasi-Newton algorithms."